# Session 11 Introduction to R and RStudio

## 11.1 R and RStudio

R is a free software environment most commonly used for statistical analysis and computation. Because Learning Days participants arrive with different statistical backgrounds and preferred statistical software, we will use R to ensure that everyone is on the same page. More generally, we advocate R for its flexibility, its wealth of applications, and its extensive online support (mostly forums).

RStudio is a free, open-source integrated development environment (IDE) for R whose user interface makes R much more user-friendly. R Markdown, which is integrated into RStudio, makes it easy to combine code, results, and text in a single .pdf, .html, or Word (.docx) document.

RStudio can be downloaded for free from the RStudio website, https://www.rstudio.com/products/rstudio/download/. In the table, click the green Download button at the bottom of the left column, "RStudio Desktop Open Source License", as depicted in Figure 1. The page will then jump to a list of download options, as depicted in Figure 2.

• For Windows, select Windows Vista/7/8/10.
• For Mac OS X, select Mac OS X 10.6+ (64-bit).

[Figure 1: RStudio download page, showing the Download button for "RStudio Desktop Open Source License"]

[Figure 2: List of RStudio download options by operating system]

## 11.3 RStudio Interface

When you open RStudio for the first time, three panels should be visible, as depicted in Figure 3:

• Console (left panel)
• Accounting (upper right panel): includes Environment and History tabs
• Miscellaneous (lower right panel)

[Figure 3: The RStudio interface on first launch, with its three default panels]

### 11.3.1 Console

You can execute any R command in the Console. For example, if you enter 4 + 4 and press Enter/Return, the Console returns [1] 8 (the [1] indicates that 8 is the first element of the output).

To make sure everyone is prepared to use R at Learning Days, we ask you to run the line of code below in the Console to install several R packages. Packages are bundles of reusable code (mostly functions, often with data and documentation) that extend R and allow for more efficient analysis. To run this line, copy the code into the Console and press Return/Enter. Note that you must be connected to the internet to download packages.

```r
install.packages(c("ggplot2", "dplyr", "AER", "arm", "MASS", "sandwich", "lmtest", "randomizr"))
```

If the packages install successfully, your Console will resemble Figure 4, though the URLs will differ depending on your location.

[Figure 4: The Console after the packages have been installed]

### 11.3.2 Editor

To write and save reproducible code, we will open a fourth panel, the Editor, by clicking the icon of a white page with a plus sign in the upper-left corner of the RStudio interface and selecting R Script, as depicted in Figure 5.

[Figure 5: Opening a new R Script from the RStudio toolbar]

Once the R script is open, the RStudio interface should contain four panels, now including the Editor. We can execute simple arithmetic by typing a formula in the Editor and pressing Control + Enter (Windows) or Command + Enter (Mac), which runs the current line (or the selected code). The formula and the "answer" will appear in the Console, as depicted in Figure 6, with red boxes for emphasis.

[Figure 6: Arithmetic run from the Editor, with the result shown in the Console (red boxes added for emphasis)]

R can be used for any arithmetic operation, including but not limited to addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (^).
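A few of these operators can be tried directly in the Editor or Console. The sketch below is illustrative; %/% (integer division) and %% (remainder) are two further base R arithmetic operators that the text above does not mention.

```r
4 + 4        # addition: 8
10 - 3       # subtraction: 7
6 * 7        # multiplication: 42
14 / 4       # division: 3.5
2^10         # exponentiation: 1024
(4 + 4) * 2  # parentheses control the order of operations: 16
17 %/% 5     # integer division: 3
17 %% 5      # remainder (modulo): 2
```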

### 11.3.3 Accounting

Beyond basic arithmetic, we can also store values, data, and functions in the global environment. To assign a value to a variable, use the <- operator. All stored values, functions, and data appear in the Environment tab of the Accounting panel. In Figure 7, we define the variable t to take the value $3 \times \frac{6}{14}$ and can see that it is stored under Values. We also load a dataset: ChickWeight is a dataset built into R, whereas most datasets will be loaded from the web or from files on your computer by other means. We can see that ChickWeight contains 578 observations of 4 variables and is stored in the Environment. Clicking on the name ChickWeight opens the dataset in a new tab in the Editor panel.

[Figure 7: The value t and the ChickWeight dataset stored in the Environment tab]
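The steps shown in Figure 7 can also be typed into the Editor. This is a minimal sketch: data() copies the built-in ChickWeight dataset into the global environment, and ls() lists what is stored there, mirroring the Environment tab.

```r
t <- 3 * 6 / 14    # store a value; it appears under "Values" in the Environment tab
t                  # prints 1.285714 with R's default settings

data(ChickWeight)  # copy the built-in ChickWeight dataset into the global environment
ls()               # names of stored objects, e.g. "ChickWeight" "t"

# View(ChickWeight) opens the same spreadsheet-style tab as clicking its name
```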

R provides many tools for analyzing and viewing data, which we will discuss in more depth at Learning Days. For now, we introduce a few basic tools for examining data. The function head() shows the first six rows of a dataset, summary() summarizes each of its columns, and dim() returns its dimensions as the number of rows followed by the number of columns.

```r
head(ChickWeight)     # First 6 observations in the dataset
## Grouped Data: weight ~ Time | Chick
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
summary(ChickWeight)  # Summary of all variables
##      weight         Time          Chick     Diet
## Min.   : 35   Min.   : 0.0   13     : 12   1:220
## 1st Qu.: 63   1st Qu.: 4.0   9      : 12   2:120
## Median :103   Median :10.0   20     : 12   3:120
## Mean   :122   Mean   :10.7   10     : 12   4:118
## 3rd Qu.:164   3rd Qu.:16.0   17     : 12
## Max.   :373   Max.   :21.0   19     : 12
##                              (Other):506
dim(ChickWeight)      # Dimensions of the dataset: rows, then columns
## [1] 578   4
```
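A few other base R functions give quick views of the same dataset. nrow(), ncol(), and names() are standard base R functions not covered in the text above; the sketch assumes ChickWeight is available, as it is in any standard R installation.

```r
data(ChickWeight)
nrow(ChickWeight)     # number of rows: 578
ncol(ChickWeight)     # number of columns: 4
names(ChickWeight)    # column names: "weight" "Time" "Chick" "Diet"
head(ChickWeight, 3)  # head() also accepts the number of rows to show
```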

Unlike some other statistical software, R can hold multiple datasets, possibly of different dimensions, in memory at the same time. This makes it flexible for analyses that draw on several datasets or methods.
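As a small illustration, the sketch below holds three datasets of different sizes at once: ChickWeight, the built-in mtcars dataset (not otherwise used in this guide), and diet1, a hypothetical subset created here containing only the chicks fed Diet 1.

```r
data(ChickWeight)
data(mtcars)

dim(ChickWeight)  # 578 rows, 4 columns
dim(mtcars)       # 32 rows, 11 columns

# A third dataset derived from the first; ChickWeight itself is left unchanged
diet1 <- ChickWeight[ChickWeight$Diet == "1", ]
dim(diet1)        # 220 rows, 4 columns
```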

### 11.3.4 Miscellaneous

R provides a suite of tools, ranging from built-in plotting functions to packages such as ggplot2, for graphing data, models, and estimates. The final "Miscellaneous" panel makes it easy to view these graphs in RStudio. Figure 8 depicts a plot in this panel. We will discuss how to plot data during Learning Days; for now, don't worry about the plotting code in the Editor.

[Figure 8: A plot displayed in the Miscellaneous panel]
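For a first look, base R's hist() and plot() functions draw directly into this panel. The sketch below uses the ChickWeight dataset introduced earlier; the titles and axis labels are our own choices (weight is recorded in grams and Time in days since birth).

```r
data(ChickWeight)

# Histogram of all recorded weights; hist() also returns the bin counts invisibly
h <- hist(ChickWeight$weight,
          main = "Distribution of chick weights",
          xlab = "Weight (grams)")

# Scatter plot of weight against time, using the formula notation y ~ x
plot(weight ~ Time, data = ChickWeight,
     xlab = "Days since birth", ylab = "Weight (grams)")
```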

## 11.4 Learning to Use R

### 11.4.1 Online Resources

• Code School, whose Try R course runs entirely in your browser: https://www.codeschool.com/courses/try-r.
• Coursera, via an online R Programming course organized by Johns Hopkins University:
1. Go to coursera.org
2. Create an account (this is free!)
3. Sign up for R Programming at Johns Hopkins University (instructor: Roger Peng) under the “Courses” tab
4. Read the materials and watch the videos from the first week. The videos from the first week are about 2.5 hours long total.

### 11.4.2 Basic Practice

Here we provide some fragments of code to familiarize you with some basic practices in R. We recommend that you practice by typing the code fragments into your Editor and then evaluating them.

#### 11.4.2.1 Setting up an R Session

In general, we read files such as data or functions into R, and write results such as graphs or tables out to files stored outside the R session. To do this, we must give R an "address" at which it can locate such files. It is often most efficient to set a working directory: a folder in which the relevant files are stored. We can identify the current working directory using getwd() and set a new one using setwd(). Note that the syntax of file paths varies by operating system; on Windows, for example, backslashes in a path must be written as double backslashes or replaced with forward slashes inside R strings.

```r
getwd()
## [1] "/Users/nichino/Projects/learningdays-guide/Guide"
# Example path only: replace it with the folder that holds your own files
setwd("~TaraLyn/Dropbox/EGAP Learning Days Admin/Workshop 2018_2 (Uruguay)/")
```
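One way to sidestep differences between operating systems is to build paths with file.path(), which joins folder names with forward slashes; R accepts these on Windows, macOS, and Linux. The folder name below is hypothetical; substitute your own.

```r
# Hypothetical project folder inside your home directory ("~")
project_dir <- file.path("~", "Documents", "learning-days")
project_dir            # "~/Documents/learning-days"

# Only change the working directory if the folder actually exists
if (dir.exists(project_dir)) {
  setwd(project_dir)
}
getwd()                # confirm where R is now looking for files
```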

You may need to install packages beyond those listed above to use certain functions. To install a package, use install.packages(""), filling in the package name between the "" marks, as follows. You only need to install each package once on your computer.

```r
install.packages("Hmisc")
```

Once a package is installed, it can be loaded using library(), with the package name inserted between the parentheses (quotation marks are optional here). Unlike installing, loading must be repeated in every new R session.

```r
library(Hmisc)
```
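Because packages need to be installed only once but loaded in every session, it can help to check which packages are still missing before installing. The sketch below is one way to do this with base R's installed.packages() and setdiff(); the variable names are our own, and the install step is left commented out so nothing is downloaded unless you choose to.

```r
# Packages requested earlier in this guide
needed <- c("ggplot2", "dplyr", "AER", "arm", "MASS",
            "sandwich", "lmtest", "randomizr")

# Names of packages not yet installed on this computer
not_installed <- setdiff(needed, rownames(installed.packages()))
not_installed  # character(0) means everything is already installed

# Uncomment to install only the missing packages (requires internet):
# if (length(not_installed) > 0) install.packages(not_installed)

# Loading, by contrast, is needed in every new R session:
# library(ggplot2)
```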

To clear R's memory, namely the stored data, functions, and values that appear in the Accounting panel, use rm(list = ls()); note that this removes stored objects but does not unload packages. It may also be useful to set a random number seed so that results can be replicated, particularly when we work with simulation-based methods.

```r
rm(list = ls())
set.seed(2018)  # Optional: Set a seed to make output replicable
```
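To see what these two commands do, the sketch below creates two throwaway objects, clears them, and then shows that resetting the seed reproduces the same random draws. Note that running it clears your own environment too. runif(), which draws uniform random numbers, is used only for illustration, and the object names are hypothetical.

```r
x <- 1
y <- "hello"
ls()               # "x" "y" (plus anything else you have created)
rm(list = ls())    # remove every object listed by ls()
ls()               # character(0): the global environment is now empty

set.seed(2018)
first_draw <- runif(3)   # three random numbers between 0 and 1
set.seed(2018)
second_draw <- runif(3)  # same seed, so the same three numbers
identical(first_draw, second_draw)  # TRUE
```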

#### 11.4.2.2 R Basics

We now explore some of the basic commands we will use during Learning Days. To assign a scalar (single element) to a variable, we use the <- operator, as discussed previously:

```r
(a <- 5)  # "<-" assigns 5 to a; wrapping the assignment in () also prints the result
## [1] 5
```

We may also want to assign a vector of elements to a variable. Here we use the same <- command, but focus on how to create the vector.

```r
(b <- 1:10)  # ":" creates a sequence of consecutive integers
##  [1]  1  2  3  4  5  6  7  8  9 10
(v <- c(1, 3, 2, 4, pi))  # c() combines elements into a single vector
## [1] 1.000 3.000 2.000 4.000 3.142
```
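Two details about vectors are worth seeing in action: a vector holds elements of a single type, so c() converts mixed inputs to a common type, and seq() builds sequences with steps other than 1. seq(), class(), and length() are base R functions not otherwise discussed here.

```r
v <- c(1, 3, 2, 4, pi)
class(v)        # "numeric"
length(v)       # 5 elements

mixed <- c(1, "two", TRUE)
mixed           # "1" "two" "TRUE": every element became character text
class(mixed)    # "character"

seq(0, 1, by = 0.25)  # 0.00 0.25 0.50 0.75 1.00
```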

We can then refer to elements of a vector by their position, placed inside square brackets [].

```r
# Extract elements of a vector:
b[1]      # Returns position 1
b[5:4]    # Returns positions 5 and 4, in that order
b[-1]     # Returns all but the first element

# Returns the elements marked TRUE (positions 1, 3, 6, and 7)
b[c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)]

# Assign new values to particular elements of a vector
b[5] <- 0
b <- 1:10  # Reset b so that the examples below use the values 1 to 10 again
```
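The same indexing rules, with their results written as comments, are shown below. The last part modifies a copy of b (named b_copy here for illustration) so that b itself keeps the values 1 to 10, and it uses a logical condition, rather than a typed-out TRUE/FALSE vector, as the index.

```r
b <- 1:10
b[1]        # 1
b[5:4]      # 5 4
b[-1]       # 2 3 4 5 6 7 8 9 10
b[b > 7]    # 8 9 10: the condition produces the TRUE/FALSE index for us

b_copy <- b
b_copy[5] <- 0
b_copy      # 1 2 3 4 0 6 7 8 9 10
b           # still 1 2 3 4 5 6 7 8 9 10
```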

There is a set of built-in functions that can be applied to vectors like b.

```r
sum(b)   # Sum of all elements
## [1] 55
mean(b)  # Mean of all elements
## [1] 5.5
max(b)   # Maximum of all elements
## [1] 10
min(b)   # Minimum of all elements
## [1] 1
sd(b)    # Standard deviation of all elements
## [1] 3.028
var(b)   # Variance of all elements
## [1] 9.167
```
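These summary functions relate to one another in simple ways, which the sketch below checks. It also shows one behavior worth knowing early: if a vector contains a missing value (NA), functions such as mean() return NA unless you add na.rm = TRUE. The vector with_missing is a made-up example.

```r
b <- 1:10
all.equal(mean(b), sum(b) / length(b))  # TRUE: the mean is the sum divided by the count
all.equal(sd(b), sqrt(var(b)))          # TRUE: the SD is the square root of the variance

with_missing <- c(2, NA, 4)
mean(with_missing)                 # NA
mean(with_missing, na.rm = TRUE)   # 3: the NA is dropped first
```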

We can also apply arithmetic transformations to all elements of a vector:

```r
b^2     # Square each element
##  [1]   1   4   9  16  25  36  49  64  81 100
b^.5    # Square root of each element (equivalently, sqrt(b))
##  [1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000 3.162
log(b)  # Natural logarithm of each element
##  [1] 0.0000 0.6931 1.0986 1.3863 1.6094 1.7918 1.9459 2.0794 2.1972 2.3026
exp(b)  # e raised to the power of each element
##  [1]     2.718     7.389    20.086    54.598   148.413   403.429  1096.633  2980.958  8103.084 22026.466
```
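Arithmetic between two vectors of the same length also works element by element, and a single number is applied to every element. The short vectors below are made up for illustration; round() is base R's rounding function.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)
x + y                          # 11 22 33: first with first, second with second, ...
x * y                          # 10 40 90
y / 10                         # 1 2 3: the single number 10 is applied to every element
round(log(c(1, 10, 100)), 2)   # 0.00 2.30 4.61
```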

Finally, we can evaluate logical statements (i.e., "is condition X true?") on all elements of a vector:

```r
b == 2               # Is equal to
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
b < 5                # Less than
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
b >= 5               # Greater than or equal to
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
b <= 5 | b / 4 == 2  # | means OR
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
b > 2 & b < 9        # & means AND
##  [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
is.na(b)             # Indicates whether each element is missing (NA)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
which(b < 5)         # Gives the positions of elements meeting the condition
## [1] 1 2 3 4
```
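Because TRUE counts as 1 and FALSE as 0 in arithmetic, logical vectors can be summarized directly. The sketch below applies base R's sum(), mean(), any(), and all() to the same b = 1:10 used above.

```r
b <- 1:10
sum(b < 5)       # 4: how many elements are less than 5
mean(b < 5)      # 0.4: the proportion of elements less than 5
any(b > 9)       # TRUE: at least one element is greater than 9
all(b > 0)       # TRUE: every element is positive
b[which(b < 5)]  # 1 2 3 4: which() gives positions, which can then be used as an index
```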

The basic logic of these commands applies to data structures much more complex than scalars and vectors. Understanding these basic features will make the more advanced topics at Learning Days easier to follow.
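For example, a column of a dataset is itself a vector, so the same comparison, indexing, and summary tools apply to it. The sketch below uses the ChickWeight dataset from earlier; the threshold of 300 grams is an arbitrary choice for illustration.

```r
data(ChickWeight)
weights <- ChickWeight$weight   # $ extracts one column as a vector

heavy <- weights > 300          # logical vector, one entry per row
sum(heavy)                      # number of measurements above 300 grams
mean(weights[ChickWeight$Diet == "1"])  # average weight under Diet 1

ChickWeight[heavy, ]            # rows of the dataset where heavy is TRUE
```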