Session 11 Introduction to R and RStudio

11.1 R and RStudio

R is a free software environment most commonly used for statistical analysis and computation. Because Learning Days participants arrive with different statistical backgrounds and preferred statistical software, we will use R to ensure that everyone is on the same page. We advocate the use of R more generally for its flexibility, wealth of applications, and comprehensive online support (mostly forums).

RStudio is a free, open source integrated development environment for R with a user interface that makes R much more user-friendly. R Markdown, a feature of RStudio, enables the easy output of code, results, and text in .pdf, .html, or .docx format.

This document provides a tutorial on downloading R and RStudio in addition to an introduction to the interface.

11.2 Downloading R and RStudio

11.2.1 Downloading R

R can be freely downloaded from CRAN at the link corresponding to your operating system:

11.2.2 Downloading RStudio

RStudio can be freely downloaded from the RStudio website, https://www.rstudio.com/products/rstudio/download/. In the table, click the green Download button at the bottom of the left column, “RStudio Desktop Open Source License”, as depicted in Figure 11.1. Once you select this button, the page will jump to a list of download options, as depicted in Figure 11.2.

  • For Windows, select Windows Vista/7/8/10.
  • For Mac OS X, select Mac OS X 10.6+ (64-bit).

Figure 11.1: Select Download in the “RStudio Desktop Open Source License” column.

Figure 11.2: Select the Windows Vista/7/8/10 link for Windows or the Mac OS X 10.6+ (64-bit) link for Mac.

11.3 RStudio Interface

When you open RStudio for the first time, there should be three panels visible, as depicted in Figure 11.3.

  • Console (left panel)
  • Accounting (upper right panel): includes Environment and History tabs
  • Miscellaneous (lower right panel)

Figure 11.3: When you open RStudio, there are three panels visible: the Console (left), Accounting (upper right), and Miscellaneous (lower right).

11.3.1 Console

One can execute all operations in the Console. For example, if you enter 4 + 4 and press the Enter/Return key, the Console will return [1] 8.

To make sure everyone is prepared to use R at Learning Days, we ask you to run three lines of code in the Console to download several R packages. Packages are bundles of reusable code, data, and documentation that allow for more efficient analysis in R. To run these lines, copy the following code into the Console and press your Return/Enter key. Note that you must be connected to the internet to download packages.

If the packages download successfully, your Console will resemble Figure 11.4, though the URLs will differ depending on your location.

Figure 11.4: An image of the console after executing the three lines of code listed above.

11.3.2 Editor

In order to write and save reproducible code, we will open a fourth panel, the Editor, by clicking on the icon of a white page with a plus sign in the upper-left corner of the RStudio interface and selecting R Script, as depicted in Figure 11.5.

Figure 11.5: Create a new R script and open the editor panel by selecting R Script from the dropdown menu.

Once the R script is opened, there should be four panels within the RStudio interface, now with the addition of the Editor panel. We can execute simple arithmetic by entering an expression in the Editor and pressing Control + Enter (Windows) or Command + Enter (Mac). The expression and the “answer” will appear in the Console, as depicted in Figure 11.6, with red boxes for emphasis.

Figure 11.6: An arithmetic expression is entered in the editor and evaluated in the console. The red boxes are added for emphasis.

R can be used for any arithmetic operation, including but not limited to addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (^).
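As a minimal sketch, each of the operators listed above can be tried one line at a time in the Console or the Editor. The particular numbers are arbitrary illustrations and are not taken from the guide's figures.

```r
# Basic arithmetic: each line prints its result to the Console
4 + 4     # addition
10 - 3    # subtraction
6 * 7     # multiplication
20 / 4    # division
2 ^ 5     # exponentiation
```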

11.3.3 Accounting

Beyond basic functions, we can also store values, data, and functions in the global environment. To assign a value to a variable, use the <- operator. All stored values, functions, and data will appear in the Environment tab in the Accounting panel. In Figure 11.7, we define the variable t to take the value \(3 \times \frac{6}{14}\), and can see that it is stored under Values. We also load a dataset. Here, “ChickWeight” is a dataset built into R; most datasets will be loaded from the web or from other files on your computer through an alternate method. We can see that ChickWeight contains 578 observations of 4 variables and is stored in the Environment. Clicking on the name ChickWeight opens a tab with the dataset in your Editor window.

Figure 11.7: The value 3 * (6/14) is assigned to the variable t (red) and the dataset ChickWeight is added to the global environment (blue). The boxes are added for emphasis.
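The assignment described above might look like the following sketch. Loading the built-in dataset with data() is an assumption: the figure does not make clear which command the guide used, and data() is only one way to bring ChickWeight into the global environment.

```r
# Assign the value 3 * (6/14) to the variable t
t <- 3 * (6 / 14)
t                    # typing the name prints the stored value

# Load the built-in ChickWeight dataset into the global environment
# (assumption: the guide's own loading command is not shown)
data(ChickWeight)

# List the names of everything stored in the global environment
ls()
```

After running these lines, both t and ChickWeight should be listed in the Environment tab.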

R provides many tools to analyze and view data that we will discuss in more depth at Learning Days. For now, we can learn some basic tools to examine the data. The function head() allows us to see the first six rows of the dataset. summary() summarizes each of the columns of the dataset and dim() provides the dimensions of the dataset in terms of the number of rows and then columns.
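The three functions can be called as in this short sketch, which should produce output along the lines of the listing that follows. The exact formatting depends on your R settings and on which packages are loaded.

```r
data(ChickWeight)      # built-in dataset

head(ChickWeight)      # first six rows
summary(ChickWeight)   # summary of each column
dim(ChickWeight)       # number of rows, then number of columns
```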

Grouped Data: weight ~ Time | Chick
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
     weight         Time          Chick     Diet   
 Min.   : 35   Min.   : 0.0   13     : 12   1:220  
 1st Qu.: 63   1st Qu.: 4.0   9      : 12   2:120  
 Median :103   Median :10.0   20     : 12   3:120  
 Mean   :122   Mean   :10.7   10     : 12   4:118  
 3rd Qu.:164   3rd Qu.:16.0   17     : 12          
 Max.   :373   Max.   :21.0   19     : 12          
                              (Other):506          
[1] 578   4

Unlike some other statistical software, R allows for the storage of multiple datasets, possibly of different dimensions, simultaneously. As such, it is quite flexible for analysis using multiple methods.

11.3.4 Miscellaneous

R provides a suite of tools, ranging from built-in plot functions to packages, for graphing data, models, estimates, and more. The final “Miscellaneous” panel allows for easy viewing of graphs in RStudio. Figure 11.8 depicts a plot in this panel. We will discuss how to plot data during Learning Days; for now, don’t worry about the graphing code in the Editor.

Figure 11.8: An example plot of the ChickWeight data made in R.

11.4 Learning to Use R

11.4.1 Online Resources

There are many helpful online resources to help you start learning R. We recommend two sources:

  • Code School, which runs entirely in your browser: https://www.codeschool.com/courses/try-r
  • Coursera, via an online R Programming course organized by Johns Hopkins University:
    1. Go to coursera.org
    2. Create an account (this is free!)
    3. Sign up for R Programming at Johns Hopkins University (instructor: Roger Peng) under the “Courses” tab
    4. Read the materials and watch the videos from the first week. The videos from the first week are about 2.5 hours long total.

11.4.2 Basic Practice

Here we provide some fragments of code to familiarize you with some basic practices in R. We recommend that you practice by typing the code fragments into your Editor and then evaluating them.

11.4.2.1 Setting up an R Session

In general, we read other files such as data or functions into R and write results like graphs or tables to files outside the R session. To do this, we must give R an “address” at which it can locate such files. The most efficient way to do this is often to set a working directory, a file path at which the relevant files are stored. We can identify the current working directory using getwd() and set a new one using setwd(). Note that the syntax of these file paths varies by operating system.
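A minimal sketch of these two functions follows. The use of tempdir() is only a stand-in so that the example runs on any machine; in practice you would pass the path of your own project folder, and the path printed by getwd() will differ from computer to computer.

```r
# Show the current working directory
getwd()

# Set a new working directory. tempdir() is a placeholder for illustration;
# replace it with the path to your own folder.
old_dir <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()

# Return to the original working directory
setwd(old_dir)
```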

[1] "/Users/nichino/Projects/learningdays-guide/Guide"

You may need to install packages beyond those listed above to execute certain functions. To install a package, we use install.packages(""), filling in the package name between the quotation marks. You need only install a package once.

Once a package is installed, it can be loaded and accessed using library(), where the package name is inserted between the parentheses (quotation marks are not needed).
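The install-once, load-every-session pattern can be sketched as follows. The package MASS is chosen purely for illustration (it ships with most R installations) and is not necessarily one of the packages requested for Learning Days; the repos argument pointing at the CRAN cloud mirror is likewise an assumption made so the call also works non-interactively.

```r
pkg <- "MASS"   # illustrative package name, not from the guide

# Install only if the package is not already available (requires internet)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package for the current session; no quotation marks needed
library(MASS)
```

Installation is needed only once per computer, whereas library() must be called in every new R session that uses the package.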

To clear R’s memory, namely the stored data, functions, or values that appear in the Environment tab of the Accounting panel, use rm(list = ls()). It may also be useful to set a random number seed with set.seed() to ensure that replication is possible, particularly when we work with simulation-based methods.
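A short sketch of both steps follows. The seed value and the use of rnorm() to draw random numbers are arbitrary choices for illustration.

```r
# Remove everything stored in the global environment
rm(list = ls())

# Set a seed so that random draws can be reproduced
set.seed(12345)        # the seed value itself is arbitrary
draw1 <- rnorm(3)      # three draws from a standard normal

set.seed(12345)        # resetting the same seed...
draw2 <- rnorm(3)      # ...reproduces the same three draws

identical(draw1, draw2)
```

Because the seed is reset before the second call, the two sets of draws match exactly, which is what makes simulation results replicable.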

11.4.2.2 R Basics

We now explore some of the basic commands we will use during Learning Days. In order to assign a scalar (single element) to a variable, we use the <- operator as discussed previously:
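For instance, a scalar assignment consistent with the output shown next could be written as below. The variable name a is hypothetical, since the guide's own name is not shown.

```r
# Assign the scalar 5 to a variable (the name `a` is hypothetical)
a <- 5
a      # typing the name prints the stored value
```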

[1] 5

We may also want to assign a vector of elements to a variable. Here we use the same <- operator, but focus on how to create the vector.
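Two common ways to create a vector are sketched here. The name b matches the vector used later in this section; the name v for the second vector is hypothetical. How many digits are printed depends on your R settings.

```r
# A sequence of consecutive integers, created with the colon operator
b <- 1:10
b

# An arbitrary set of values, combined with c()
# (the name `v` is hypothetical)
v <- c(1, 3, 2, 4, pi)
v
```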

 [1]  1  2  3  4  5  6  7  8  9 10
[1] 1.000 3.000 2.000 4.000 3.142

We can then refer to elements of a vector by giving their positions inside square brackets [].

# Extract elements of a vector:
b[1]                   # Returns position 1
b[5:4]                 # Returns positions 5 and 4, in that order
b[-1]                  # Returns all but the first number  

# Returns all numbers indicated as "TRUE"
b[c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)]  
                                                                          
# Assign new values to particular elements of a vector
# (the outputs later in this section assume b is left unchanged)
b[5] <- 0

There is a set of built-in functions that can be applied to vectors like b.
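One set of calls consistent with the six values shown next is sketched below, assuming b still holds the integers 1 through 10. The guide's own calls are not shown, so the selection and order here are inferred from that output.

```r
b <- 1:10

sum(b)    # sum of all elements
mean(b)   # arithmetic mean
max(b)    # largest element
min(b)    # smallest element
sd(b)     # sample standard deviation
var(b)    # sample variance
```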

[1] 55
[1] 5.5
[1] 10
[1] 1
[1] 3.028
[1] 9.167

We can also apply arithmetic transformations to all elements of a vector:
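The transformations below are one plausible reading of the output that follows (squares, square roots, natural logarithms, and exponentials of 1 through 10). They are inferred from the printed values rather than copied from the guide.

```r
b <- 1:10

b^2       # square each element
sqrt(b)   # square root of each element
log(b)    # natural logarithm of each element
exp(b)    # exponential of each element
```

Each function is applied element by element, so the result is always a vector of the same length as b.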

 [1]   1   4   9  16  25  36  49  64  81 100
 [1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000 3.162
 [1] 0.0000 0.6931 1.0986 1.3863 1.6094 1.7918 1.9459 2.0794 2.1972 2.3026
 [1]     2.718     7.389    20.086    54.598   148.413   403.429  1096.633  2980.958  8103.084 22026.466

Finally, we can evaluate logical statements (i.e., “is condition X true?”) on all elements of a vector:
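The guide's own expressions are not shown, so the sketch below is a guess: it lists one set of logical statements that would reproduce the pattern of TRUE and FALSE values in the output that follows, assuming b holds the integers 1 through 10.

```r
b <- 1:10

b == 2             # equal to 2?
b < 5              # strictly less than 5?
b >= 5             # greater than or equal to 5?
b <= 5 | b == 8    # at most 5 OR equal to 8?
b > 2 & b < 9      # greater than 2 AND less than 9?
is.na(b)           # is the element missing?
which(b < 5)       # positions at which the condition is TRUE
```

Note that == tests equality, whereas a single = would be read as assignment; | and & combine conditions element by element.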

 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
 [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1] 1 2 3 4

The basic logic of these commands applies to data structures much more complex than scalars and vectors. Understanding these basic features will make the more advanced topics covered during Learning Days easier to follow.