P.Mean >> Category >> R software (created 2007-08-22).

These pages discuss how to program using R software, an open source package for statistical analysis. Also see Category: SPSS software, Category: Statistical computing.

My blog also has some entries about R software.

I've written a few R programs and converted the output to html using the knitr package. These programs are mostly short tutorials. Here's a list of the programs I've written so far.

2. qqplot. A program to show how to check the normality assumption using a normal probability plot (also known as a qqplot or quantile-quantile plot).
1. axis. A program to show some of the options to manipulate the axes on a plot.
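
As a rough illustration of what these two tutorials cover, here is a minimal sketch using simulated data. It is not the code from the tutorials themselves. The base R functions `qqnorm` and `qqline` draw a normal probability plot, and `axis` adds custom axes once the defaults are suppressed. The seed, sample size, and tick positions are arbitrary choices.

```r
# Simulated data; the seed and sample size are arbitrary choices for illustration.
set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)

# Normal probability plot: points near the reference line suggest approximate normality.
qq <- qqnorm(x, main = "Normal probability plot")
qqline(x)

# Custom axes: suppress the default axes, then add your own.
plot(1:10, (1:10)^2, axes = FALSE, xlab = "x", ylab = "x squared")
axis(side = 1, at = c(1, 5, 10))              # bottom axis with three ticks
axis(side = 2, at = c(0, 50, 100), las = 1)   # left axis with horizontal labels
box()
```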

2013

26. P.Mean: Testing an R function (created 2013-11-02). I am working on a grant resubmission and one of the things I need to write up is more detail about the process we will use to develop the programs that we need to monitor patient accrual. I write a lot of programs, but almost all of them are programs that are run once, in a very specialized and tightly controlled setting. If you develop programs that other people will use, you need to test them against a range of inputs to make sure that they do what you want them to do. This is one of the topics covered in a short course I took at the Joint Statistical Meetings, Practical Software Engineering for Statisticians, taught by Murray Stokely of Google.
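
To make the idea of testing against a range of inputs concrete, here is a minimal sketch using only base R. The function `accrual_fraction` and its arguments are hypothetical, invented for this illustration. It is not taken from the accrual programs or from the short course.

```r
# Hypothetical function under test: the proportion of a recruitment target reached.
accrual_fraction <- function(enrolled, target) {
  stopifnot(is.numeric(enrolled), is.numeric(target), target > 0, enrolled >= 0)
  enrolled / target
}

# Typical input.
stopifnot(isTRUE(all.equal(accrual_fraction(25, 100), 0.25)))

# Boundary input: no patients enrolled yet.
stopifnot(accrual_fraction(0, 100) == 0)

# Invalid input should produce an error rather than a silent wrong answer.
bad_input <- tryCatch(accrual_fraction(10, 0), error = function(e) "error")
stopifnot(identical(bad_input, "error"))
```

Each `stopifnot` call halts with an error if its condition fails, so a script like this can be rerun after every change to the function.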

2012

25. P.Mean: Animations in R (created 2012-12-08). About twenty years ago, computers got fast enough to provide smooth animations of small and moderate-sized data sets. There was a lot of effort to incorporate animation, such as 3D rotation of point clouds, into statistical software programs. The results looked stunning, but I'm not sure if it led to many great insights. I experimented with programs like JMP, but never really felt too comfortable with them. So, I gave up on animation for the most part. But there is one area where animation makes sense and that is in teaching.

24. P.Mean: Mapping my runs in R (created 2012-10-10). I started running as part of a 2011 New Year's resolution to build up my stamina to the point where I could run a five kilometer race. I didn't care too much how fast I ran, but I did want to run the whole way without stopping and without taking a walking break. I've run in about a dozen different five kilometer and four mile races. In the middle of 2011, I bought an iPhone with a built-in GPS system. It is the coolest thing ever. I used that iPhone to track my runs and started sharing the maps of my runs with other details on my running log. You can produce nice running routes using Google Maps, but I wanted to be able to manipulate the data a bit, so I developed a simple program in R. Here are the details.

23. P.Mean: Data management in R versus SAS (created 2012-08-27). Someone on LinkedIn was arguing that data management is easier in SAS than in R. A lot of times these claims are subjective. What is easier for one person may be more difficult for another. Also, you need to consider whether it is easy in that it is efficient (uses little computer time), fast to program, less likely to need debugging, or simpler for a non-statistician to understand and cross-check the code. It's probably some mix of these. So anyway, this person on LinkedIn was challenging the group to come up with a "simple" way in R to replicate a common data management scenario. Here's my response to that challenge.

22. The Monthly Mean: Honorable mention in the "Applications of R in Business" contest (January/February 2012)

2011

21. P.Mean: My entry into the Applications of R in Business Competition (created 2011-10-20). I recently heard about a contest sponsored by Revolution Analytics. Revolution Analytics is offering $20,000 in prizes to the best examples of applying R to business problems. This competition is designed to grow the collection of on-line materials describing how to use R, and to spur adoption of R and Revolution R for business applications. The contest is open to all R users worldwide. http://www.inside-r.org/howto/enter. I want to submit the R code that I developed for the accrual paper published by Byron Gajewski and me in 2008.

2010

20. P.Mean: Putting variable names into a model automatically (created 2010-09-20). I always have trouble with including a changing variable name into a sequence of statistical models in R, so when someone wrote about it on the R-Help list, I thought I should try some of the suggestions and then write them down here so I don't forget.
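
Two common base R approaches to this problem are `reformulate` and `as.formula(paste(...))`. The sketch below is a minimal illustration and is not necessarily what was suggested on the R-Help list. The `mtcars` data set ships with R, and the choice of variables is arbitrary.

```r
# Loop over predictor names and fit one model per predictor.
# mtcars is a data set that ships with R; the choice of variables is arbitrary.
predictors <- c("wt", "hp", "disp")
fits <- list()
for (v in predictors) {
  f <- reformulate(v, response = "mpg")   # builds the formula mpg ~ wt, etc.
  fits[[v]] <- lm(f, data = mtcars)
}

# An equivalent way to build the same kind of formula from strings.
f2 <- as.formula(paste("mpg", "~", "wt"))

sapply(fits, function(m) coef(m)[2])      # slope for each predictor
```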

19. P.Mean: Lessons learned the hard way: don't presume to know how your software handles missing value codes (created 2010-05-28). I'm working on an interesting project that involves summing up rvu's (resource value units) across certain records for a given patient. Some of the rvu's are missing. How should the program handle these missing rvu's? We discussed this by email and agreed to ignore missing rvu's in the sum. This is effectively the same as replacing the missing rvu's with zero. There are two cases worth worrying about, though, and handling those cases makes me realize just how tricky missing values are.
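
The equivalence between ignoring missing values and replacing them with zero can be checked directly in R. This is a minimal sketch with made-up rvu values. The all-missing patient is an illustrative case and is not necessarily one of the two cases the entry refers to.

```r
# Hypothetical rvu values for three patients; NA marks a missing value.
rvu <- list(
  patient_a = c(1.5, NA, 2.0),
  patient_b = c(NA_real_, NA_real_),
  patient_c = c(0.5, 1.0)
)

# Default behaviour: any missing value makes the whole sum missing.
sapply(rvu, sum)

# Ignoring missing values gives the same totals as replacing NA with zero.
ignore_na <- sapply(rvu, sum, na.rm = TRUE)
zero_fill <- sapply(rvu, function(x) sum(ifelse(is.na(x), 0, x)))

# An illustrative case to watch: a patient whose values are all missing
# gets a total of 0 rather than NA.
ignore_na["patient_b"]
```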

18. P.Mean: Should I learn R instead of SAS (created 2010-04-05). I got a question from a statistician beginning her career asking whether she should learn SAS or R. That's a very personal question and there is no perfect answer. Here is what I wrote.

2009

17. P.Mean: Randomly generating simple math problems using R (created 2009-11-30). To help drill simple concepts in math for my second grade son, I developed a series of R programs to generate these problems randomly. It makes use of the sample function on a sequence of integers and allows you to limit or expand the scope of the problems generated. It is far from perfect, but it shows a few simple tricks in R.
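
A minimal sketch of the idea, assuming addition problems only. The function name `make_problems` and its arguments `lo` and `hi` are hypothetical and are not the names used in the original programs.

```r
# Generate n random addition problems; lo and hi limit the scope of the operands.
# The function name and arguments are hypothetical.
make_problems <- function(n, lo = 0, hi = 10) {
  a <- sample(lo:hi, n, replace = TRUE)
  b <- sample(lo:hi, n, replace = TRUE)
  data.frame(problem = paste(a, "+", b, "="), answer = a + b)
}

set.seed(2009)  # arbitrary seed so the worksheet can be reproduced
worksheet <- make_problems(5, lo = 1, hi = 9)
print(worksheet$problem)
```

Widening or narrowing the range `lo:hi` expands or limits the difficulty of the problems.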

Other resources:

Vance A. Data Analysts Captivated by R's Power. The New York Times. 2009. Excerpt: To some people R is just the 18th letter of the alphabet. To others, it's the rating on racy movies, a measure of an attic's insulation or what pirates in movies say. R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it. [Accessed October 21, 2009]. Available at: www.nytimes.com/2009/01/07/technology/business-computing/07program.html

Randall Pruim. Foundations and Applications of Statistics: An Introduction Using R. Description: This is a description on the publisher's website of a book that uses R to illustrate important theoretical concepts in Mathematical Statistics. This book is not available until March 2011. [Accessed October 20, 2010]. Available at: http://www.ams.org/bookstore-getitem/item=AMSTEXT-13.

Phillip Grosjean. Integrated Development Environments / Script Editors for R. Description: There are several text editors that can integrate with R, including the ability to highlight syntax and submit code directly to R. This page provides a good summary of these editors, including the editor that I use, TextPad. [Accessed November 4, 2010]. Available at: http://www.sciviews.org/_rgui/projects/Editors.html.

Webpage (Word document): Joseph G. Voelkel. An Introduction to R for Windows. Excerpt: "This technical report is designed to show you how to download a copy of R and set it up to be convenient on your Windows computer and provide you with a some simple examples of what R can do and how R processes data. There are many sources of information on R. This report is designed just to get you started." [Accessed on March 29, 2011]. http://www.rit.edu/~w-sgas/about/TR%202005-6.doc

Søren Højsgaard. Miscellaneous material on R and StatWeave. Description: This page has links to several interesting papers on R, such as R in a few hours - a brief introduction, an introduction to basic graphics in R, some graphs made by the lattice package, reshaping data with the reshape function, a note on using the sqldf package for managing moderately large datasets in R, and linear algebra in R - a brief introduction. [Accessed October 13, 2010]. Available at: http://genetics.agrsci.dk/%7Esorenh/misc/index.html.

Sonego P. One R Tip A Day. Description: A blog site that offers tips on how to do things in R. It also lists other R resources. [Accessed October 21, 2009]. Available at: onertipaday.blogspot.com/

Rossini A, James DA. Open Source Statistical Software (OS3) in Pharma Development: A case study with R. Description: This article describes some of the real and imagined problems with validating an open source program (R) for FDA. This is a PowerPoint presentation from the 2007 Use R! conference. [Accessed October 21, 2009]. user2007.org/program/presentations/rossini.pdf

Alex Guazzelli, Michael Zeller, Wen-Ching Lin, Graham Williams. PMML: An Open Standard for Sharing Models. The R Journal. 2009;1(1):60-65. Excerpt: "The PMML package exports a variety of predictive and descriptive models from R to the Predictive Model Markup Language (Data Mining Group, 2008). PMML is an XML-based language and has become the de-facto standard to represent not only predictive and descriptive models, but also data pre- and post-processing. In so doing, it allows for the interchange of models among different tools and environments, mostly avoiding proprietary issues and incompatibilities." [Accessed May 29, 2010]. Available at: http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf.

Kabacoff R. Quick-R: Home Page. Excerpt: R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for experienced users of popular statistical packages such as SAS, SPSS, Stata, and Systat (although current R users should also find it useful). My goal is to help you quickly access this language in your work. [Accessed October 21, 2009]. URL: www.statmethods.net

Fox J. The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software. September 2005, Volume 14, Issue 9. Abstract: Unlike S-PLUS, R does not incorporate a statistical graphical user interface (GUI), but it does include tools for building GUIs. Based on the tcltk package (which furnishes an interface to the Tcl/Tk GUI toolkit), the Rcmdr package provides a basic-statistics graphical user interface to R called the "R Commander." The design objectives of the R Commander were as follows: to support, through an easy-to-use, extensible, cross-platform GUI, the statistical functionality required for a basic-statistics course (though its current functionality has grown to include support for linear and generalized-linear models, and other more advanced features); to make it relatively difficult to do unreasonable things; and to render visible the relationship between choices made in the GUI and the R commands that they generate. The R Commander uses a simple and familiar menu/dialog-box interface. Top-level menus include File, Edit, Data, Statistics, Graphs, Models, Distributions, Tools, and Help, with the complete menu tree given in the paper. Each dialog box includes a Help button, which leads to a relevant help page. Menu and dialog-box selections generate R commands, which are recorded in a script window and are echoed, along with output, to an output window. The script window also provides the ability to edit, enter, and re-execute commands. Error messages, warnings, and some other information appear in a separate messages window. Data sets in the R Commander are simply R data frames, and can be read from attached packages or imported from files. Although several data frames may reside in memory, only one is "active" at any given time. There may also be an active statistical model (e.g., an R lm or glm object).
The purpose of this paper is to introduce and describe the use of the R Commander GUI; to describe the design and development of the R Commander; and to explain how the R Commander GUI can be extended. The second part of the paper (following a brief introduction) can serve as an introductory guide for students who will use the R Commander. [Accessed October 21, 2009]. Available at: www.jstatsoft.org/v14/i09

Hornik, K. R FAQ. Description: This page answers frequently asked questions about R. There are companion FAQ lists for Windows and Macintosh users that detail specific issues for those platforms. [Accessed October 21, 2009]. URL: cran.r-project.org/doc/FAQ/R-FAQ.html

Harrell FE. R for Clinical Trial Reporting: Reproducible Research, Quality and Validation. [Accessed October 21, 2009]. Description: The title says it all. This is a PowerPoint presentation from the 2007 Use R! conference. Available at: user2007.org/program/presentations/harrell.pdf

R Development Core Team. R Manuals. Excerpt: The following manuals for R were created on Debian Linux and may differ from the manuals for Mac or Windows on platform-specific pages, but most parts will be identical for all platforms. The correct version of the manuals for each platform are part of the respective R installations. Here they can be downloaded as PDF files or directly browsed as HTML. [Accessed October 21, 2009]. Available at: cran.r-project.org/manuals.html

Carey V. R Journal. Description: This newsletter offers informal, but peer-reviewed articles about new features in R, with a special emphasis on new R packages. A previous link to R News at cran.r-project.org/doc/Rnews/ is now obsolete, but the webmaster was smart enough to leave the old link standing to avoid linkrot. [Accessed October 21, 2009]. Available at: journal.r-project.org/

R Development Core Team. R: Regulatory Compliance and Validation Issues A Guidance Document for the Use of R in Regulated Clinical Trial Environments. Excerpt: [This document] is intended to provide a reasonable consensus position on the part of the R Foundation for Statistical Computing (hereafter referred to as the R Foundation) relative to the use of R within these regulated environments and to provide a common foundation for end users to meet their own internal standard operating procedures, documentation requirements and regulatory obligations. The R Foundation for Statistical Computing makes no warranties, expressed or implied, in this document. [Accessed October 21, 2009]. www.r-project.org/doc/R-FDA.pdf

Patrick Burns. R Relative to Statistical Packages: Comment 1 on Technical Report Number 1 (Version 1.0) Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS. Excerpt: "The technical report Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS focuses on comparing strengths and weaknesses of SAS, SPSS and Stata. There is a section on R, which some have suspected damns R with faint praise. In particular, R is characterized as hard to learn. Finally there are sections on a number of very specialized pieces of statistical software. The primary purpose of this comment is to provide an alternative view of the role that R has in the realm of statistical software." [Accessed January 14, 2010]. Available at: http://www.ats.ucla.edu/stat/technicalreports/Number1/R_relative_statpack.pdf.

Robert Muenchen. R-SAS-SPSS Add-on Module Comparison. Excerpt: "R has over 3,000 add-on packages, many containing multiple procedures, so it can do most of the things that SAS and SPSS can do and quite a bit more. The table below focuses only on SAS and SPSS products and which of them have counterparts in R. As a result, some categories are extremely broad (e.g. regression) while others are quite narrow (e.g. conjoint analysis). This table does not contain the hundreds of R packages that have no counterparts in the form of SAS or SPSS products. There are many important topics (e.g. mixed models) offered by all three that are not listed because neither SAS Institute nor IBM's SPSS Company sell a product focused just on that." [Accessed January 14, 2010]. Available at: http://r4stats.com/add-on-modules.

Baron J. R site search. Excerpt: This search will allow you to search the contents of the R functions, package vignettes, task views, and R-help mail archives. [Accessed November 13, 2009]. Available at: http://search.r-project.org/nmz.html

Smith D. Revolutions. News about R, statistics and the world of open source from the staff of REvolution Computing. Excerpt: Revolutions is a blog dedicated to news and information of interest to members of the R community. Revolutions is created, hosted, and maintained by REvolution Computing. We welcome contributions to this blog. To announce an event or to contribute a post, or simply to provide feedback or suggestions about this blog, please contact the blog editor: David Smith. [Accessed October 21, 2009]. Available at: http://blog.revolution-computing.com/

Webpage: Thomas Lumley. survey: analysis of complex survey samples. Excerpt: "Summary statistics, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase subsampling designs. Graphics. Predictive margins by direct standardization. PPS sampling without replacement. Principal components, factor analysis." [Accessed on September 20, 2011]. http://cran.r-project.org/web/packages/survey/index.html.

Alex Zolot. useR! 2010: Work with R on Amazon's Cloud. Abstract: "Usage of R is often constrained by available memory and/or cpu power. Cloud computing allows users to get as much resources as necessary in any specific moment. The tutorial will cover software tools and procedures that are useful to manage R applications on Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3) cloud services." [Accessed September 8, 2010]. Available at: http://user2010.org/tutorials/Zolot.html.

Verzani J. Using R for Introductory Statistics. Boca Raton FL: Chapman & Hall/CRC (2005). Description: This book provides a general introduction to statistics and incorporates examples in R throughout the text.

Soukup M. Using R: Perspectives of a FDA Statistical Reviewer. Description: The author provides perspective from the government's side about validation of statistical software in general and validation of R in particular. This is a PowerPoint presentation from the 2007 Use R! conference. [Accessed October 21, 2009]. user2007.org/program/presentations/soukup.pdf

2008

16. Stats: Running R on a web server (June 17, 2008). I'm working on a project for planning and monitoring accrual patterns in clinical trials. This will eventually lead, I hope, to a grant to support this work. I have some existing R scripts and want to examine the possibility of running those scripts on a web page.

2007

15. Stats: Stair step interpolation in R (November 15, 2007). I am working on some charts that show discrete (sudden) jumps at specific time points. This requires the use of stair step interpolation, because if you just connected the lines, it would imply a linear transition between consecutive points.
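
In base R, one way to get stair step interpolation is `plot(..., type = "s")` for the picture and `approx(..., method = "constant")` for the values. This is a minimal sketch with made-up time points. It may differ from the approach described in the entry itself.

```r
# Hypothetical data: a value that jumps at specific time points.
time  <- c(0, 2, 5, 9)
level <- c(10, 15, 12, 20)

# type = "s" holds each value until the next time point, then jumps.
plot(time, level, type = "s", xlab = "Time", ylab = "Level")
points(time, level, pch = 16)

# The same idea as numbers: evaluate the step curve at and between the jump points.
step_value <- approx(time, level, xout = c(1, 2, 4.9, 5), method = "constant")$y
step_value
```

Replacing `type = "s"` with `type = "l"` would draw the straight connecting lines that wrongly imply a linear transition.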

14. Stats: useR! 2007 conference in Ames, Iowa (April 6, 2007). I may not be able to go to it, but the R community has an annual meeting, useR!, that will be held this year in Ames, Iowa from August 8-10. The web site for this conference (user2007.org) provides some of the details.

13. Stats: R Wiki (March 27, 2007). I use R software for a lot of my complex data analyses and have written up a few web pages about various things you can do with R. It turns out that R has a Wiki site, R Wiki: rwiki.sciviews.org/doku.php.

12. Stats: Randomly dividing a dataset in R (March 16, 2007). I'm working with someone who wants to do a simple cross-validation of a statistical procedure. One simple way to do this is to randomly divide a data set into two pieces. Assume that you have a matrix or data frame (x) that has n rows and you want to split the data set into a group that has proportion p of the rows and a group that has the remaining proportion (1-p). You want to do this randomly. Here is the code in R to do this.
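
The full code is on the linked page. As a minimal sketch of one way to do the split with `sample` (not necessarily the approach used there), with made-up data and an arbitrary seed:

```r
# Randomly split the rows of x into two groups with proportions p and 1 - p.
# x, n, and p follow the description above; the data and seed are made up.
set.seed(316)
x <- data.frame(id = 1:20, y = rnorm(20))
n <- nrow(x)
p <- 0.7

# Choose round(n * p) row numbers at random, without replacement.
selected <- sample(n, size = round(n * p))

group1 <- x[selected, , drop = FALSE]    # proportion p of the rows
group2 <- x[-selected, , drop = FALSE]   # the remaining rows
```

The negative index `-selected` assumes at least one row was selected. With zero selected rows it would return an empty data frame instead of all the rows.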

11. Stats: Modular arithmetic and rounding in R (February 1, 2007). In certain programming situations, you need to perform calculations involving division that produce whole numbers as a result. For example, if you divided 27 by 4, you would get 6.75, but if you were using whole numbers only, then your result would be 6 with a remainder of 3. In R, the operator `%/%` produces an integer division, and the operator `%%` computes the remainder. So in R, the result of `27%/%4` would be `6` and the result of `27%%4` would be `3`.
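
The example from the entry can be typed directly at the R prompt. The lines on negative numbers and the rounding functions are additions for illustration.

```r
27 / 4     # ordinary division
27 %/% 4   # integer division
27 %% 4    # remainder

# The quotient and remainder reconstruct the original number.
quotient  <- 27 %/% 4
remainder <- 27 %% 4
4 * quotient + remainder == 27

# With a negative dividend, R rounds the quotient down, so the remainder
# takes the sign of the divisor.
neg_quotient  <- (-27) %/% 4
neg_remainder <- (-27) %% 4

# Related rounding functions in base R.
floor(6.75); ceiling(6.75); round(6.75); trunc(-6.75)
```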

2006

10. Stats: Graphics options in R (September 12, 2006). When you are producing graphics in R, the default option does not save your graphs for later review. You can change this in several ways. My comments will discuss the options for R running under Microsoft Windows. There are similar approaches that work for other systems.
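
One approach that works on any platform is to send the plot to a file device. This is a minimal sketch using `pdf` and `dev.off` from base R. It is a different technique from the Windows-specific options the entry discusses, and the file name is a temporary path chosen for illustration.

```r
# Write a plot to a PDF file instead of the screen.
# The file name is a temporary path chosen for illustration.
outfile <- file.path(tempdir(), "example-plot.pdf")
pdf(outfile, width = 6, height = 4)
plot(1:10, (1:10)^2, main = "Saved to a file")
dev.off()   # closing the device finishes writing the file

file.exists(outfile)
```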

9. Stats: R libraries for sample size justification (July 28, 2006). There are a lot of good commercial and free sources for sample size justification. Note that most people use the term power calculation, but there is more than one way to justify a sample size, so I try to avoid the term "power calculation" as being too restrictive. Anyway, I just noted an email on the MedStats list that suggests two R libraries.

8. Stats: Colors for R graphs (June 28, 2006). I tend to use color sparingly in graphs because most of my graphs end up in black and white in the final production. Even on my web pages, which appear in color, I try to avoid too much use of color because I often print these pages on a black and white printer.

2005

7. Stats: Object oriented features of R (December 19, 2005). If you want to do any serious data analysis in R, you need to learn some of the object oriented features that this program has.
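
A minimal sketch of the S3 system, the simplest of R's object oriented mechanisms. The class name `accrual` and its fields are hypothetical, and the entry itself may cover different features.

```r
# A fitted model is an object with a class attribute.
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)

# Generic functions such as summary() and print() dispatch on that class.
s <- summary(fit)      # calls summary.lm behind the scenes
class(s)

# Defining a print method for a new, hypothetical class "accrual".
print.accrual <- function(x, ...) {
  cat("Accrual record:", x$enrolled, "of", x$target, "patients\n")
  invisible(x)
}
rec <- structure(list(enrolled = 25, target = 100), class = "accrual")
print(rec)
```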

6. Stats: Group Sequential Monitoring of Clinical Trials in R (December 13, 2005). It is very expensive to purchase software that performs group sequential monitoring of clinical trials (sometimes called interim analysis). Group sequential monitoring is looking at a trial at selected time points during the study to see if you should stop the study early. There are a couple of functions in R that will do simple calculations, and the price, of course, is free.

5. Stats: Two nice R libraries (October 14, 2005). I found a couple of nice libraries in R available from CRAN (Comprehensive R Archive Network). The first, vcd, was recommended by a regular contributor to the epidemio-l list. This library provides visualization techniques with special emphasis on categorical data. I found the second library, epitools, when I went searching on the web for resources to calculate an exact confidence interval for a Poisson rate. In addition to the exact Poisson intervals, the package can perform age standardization, draw epidemic curves, and has a variety of useful utility functions and interesting data sets.
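
For readers who want an exact Poisson interval without installing anything, current versions of R include `poisson.test` in the stats package. The sketch below uses that function as a stand-in and does not use epitools. The event count and person-years are made up.

```r
# Hypothetical data: 10 events observed over 250 person-years.
events <- 10
person_years <- 250

# poisson.test() is in the stats package that ships with R.
result <- poisson.test(events, T = person_years)
result$estimate   # observed rate
result$conf.int   # exact 95% confidence interval for the rate
```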

4. Stats: A simple trick in R (October 11, 2005). There may be times when you have a string in R that represents a specific R command. How would you run this command?
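
One common answer is to combine `parse` and `eval`. This is a minimal sketch and may or may not be the trick the entry describes.

```r
# A string that holds an R command.
cmd <- "mean(c(2, 4, 9))"

# parse() turns the text into an unevaluated expression; eval() runs it.
result <- eval(parse(text = cmd))
result
```

Because this runs whatever the string contains, it is best reserved for strings you have built yourself.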

3. Stats: Dates in Excel and R (August 10, 2005). Every program uses a slightly different method for calculating date values. Excel, for example, counts the number of days since the start of 1900 (January 1, 1900=1) for Windows, but for the Macintosh it uses 1904 instead of 1900. R counts the number of days since the start of 1970 (January 1, 1970=0). It ignores fractional portions of the day.
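
A minimal sketch of converting Excel serial numbers to R dates. The serial numbers and origins are the ones used in the examples on R's `as.Date` help page. The origin for Windows Excel is December 30, 1899 rather than January 1, 1900 because Excel counts January 1, 1900 as day 1 and also treats 1900 as a leap year.

```r
# R stores a Date as the number of days since January 1, 1970.
as.numeric(as.Date("1970-01-01"))
as.numeric(as.Date("1970-01-02"))

# Converting Excel serial numbers (values taken from the examples in ?as.Date).
# Windows Excel: the origin 1899-12-30 allows for Excel treating 1900 as a leap year.
win_date <- as.Date(35981, origin = "1899-12-30")
# Excel for the Macintosh with the 1904 date system.
mac_date <- as.Date(34519, origin = "1904-01-01")

win_date
mac_date
```

Both serial numbers refer to the same calendar day, which is a quick check that the two origins are consistent.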

2. Stats: Moving R objects (July 28, 2005). I regularly work from home on my laptop, and when I need to re-run some analyses in R, I usually just re-create the original data sets. But there are several ways you can transfer objects from one R system to another.
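
Three base R options are sketched below. The object names and file paths are made up, and the entry itself may list other methods.

```r
# Objects to transfer (made-up examples).
scores <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
note <- "created on the desktop machine"

# Option 1: save several objects to one binary file, then load() on the other machine.
rdata_file <- file.path(tempdir(), "objects.RData")
save(scores, note, file = rdata_file)

# Option 2: save a single object; readRDS() returns it so you can choose the name.
rds_file <- file.path(tempdir(), "scores.rds")
saveRDS(scores, rds_file)
scores_copy <- readRDS(rds_file)

# Option 3: write a plain-text representation that can be pasted or emailed.
text_file <- file.path(tempdir(), "scores.R")
dput(scores, file = text_file)
scores_text <- dget(text_file)
```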

1. Stats: String manipulations in R (May 10, 2005). As part of my efforts to analyze microarray data, I am finding that I need to do simple string manipulations in R. Here is a list of functions that might help.
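
A few base R string functions, sketched on an example identifier. The selection is illustrative and may not match the list in the entry.

```r
# An example identifier, used only for illustration.
probe <- "AFFX-BioB-5_at"

nchar(probe)                          # number of characters
substr(probe, 1, 4)                   # extract characters 1 through 4
toupper(probe)                        # convert to upper case
parts <- strsplit(probe, "-")[[1]]    # split on hyphens
parts
sub("_at$", "", probe)                # remove a trailing "_at"
grepl("^AFFX", probe)                 # does the string start with "AFFX"?
paste(parts, collapse = ".")          # join the pieces with a different separator
```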
