Overview
In this class you are going to be doing a lot of calculations. Many of these calculations are simple enough that they can be done with a calculator (which you will need for this class). Other calculations are sufficiently complicated that you will need to use something more powerful. R is the computer language that you will be using for most of your complicated statistical calculations, and this tutorial will introduce you to the basics of the language. You will use R to perform calculations, to produce professional-quality graphics, and to wrangle data (we’ll define that term a bit more carefully in another tutorial).
When you are asked to solve a problem involving statistics there is always some communication that is necessary in order to share your results. In an English class you would use a program like Microsoft Word to produce a paper or an essay. In this class you will use a program called RStudio. RStudio will allow you both to document the process you use to solve a problem and to perform some of the calculations. (You will do some of the calculations yourself and use RStudio to show your intermediate work.) You can think of RStudio as a cross between a word processor (for statistics) and a sophisticated calculator. One important distinction to keep in mind is the difference between the language (R) and the tool for using that language (RStudio).
For this class RStudio will be used via the cloud. This means that you can use most browsers (such as Safari, Firefox, Chrome, etc.) to access the program and store your files.
Advantages:
There are so many advantages to the approach that you will be learning that the statistics department requires you to use these tools. Here are a few of them:
- Your homework files will be stored online so you can use any internet-connected computer with a browser to access, edit, review, and ultimately create your homework.
- It will ultimately be very efficient to produce and communicate statistical results.
- You will be able to use the computer to perform calculations that are unfeasible to do by hand.
- If you use the tools correctly, you will be self-documenting your process (something that is vitally important in all STEM disciplines).
Disadvantages:
There are two disadvantages:
- It takes more work, initially, to do your homework.
- You will need to have internet access.
I say initially because, once you get the hang of the process, it will actually take you far less time to properly communicate and produce your statistical results. There is, however, a learning curve that can be very frustrating. Any student in a STEM field will need to produce some form of technical writing at a professional level before graduation, and the skills you learn mastering R and RStudio will make that substantially easier for you. In other words, suffer now to avoid suffering later.
Just like Microsoft Word has a special type of file for storing documents (a .docx file), RStudio has its own file format known as RMarkdown (an .Rmd file). For homework I will provide you with an RMarkdown file (on Canvas) that you will edit. The final step is to knit that document. The knit process will create an output file that you will submit to Canvas. As of 2020 RStudio allows you to knit an RMarkdown file into the following three formats:
- a .pdf file,
- an .html file,
- and a .docx file.
I’ll walk you through an example in class and you can see a video about it in the next section of this tutorial.
For this class you will need to turn in your homework to Canvas as a pdf file.
In your RMarkdown file you can embed a code chunk. You can use the Code menu and the Insert Chunk option to have RStudio type all the special characters necessary to define a code chunk. When the document is knitted the code chunk will be executed and the output will become part of your document. The language used is called R. This is how you perform your calculations and produce your graphs. The console is the portion of RStudio where you can run R immediately.
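To make the idea of a code chunk concrete, here is a small sketch you can run in the console. It builds the text of a tiny RMarkdown document line by line, so you can see the special characters that Insert Chunk types for you: a line of three backticks followed by {r} opens the chunk, and a line of three backticks closes it. The variable names (fence, rmd_lines) and the sample text are made up for illustration.

```r
# Build the three-backtick fence without typing it literally
fence <- strrep("`", 3)

# The lines of a tiny, hypothetical RMarkdown document
rmd_lines <- c(
  "Ordinary text goes outside the chunk.",
  "",
  paste0(fence, "{r}"),   # opens an R code chunk
  "2 + 3 * 5",            # R code that runs when the document is knitted
  fence                   # closes the chunk
)

# Print the document text to the console
cat(rmd_lines, sep = "\n")
```

Everything between the opening and closing fence lines is treated as R code; everything outside is treated as ordinary text.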
Workflow for Homework
Below is the basic outline for doing homework (each step is discussed in more detail afterwards). We will go over this carefully in class:
- Get into RStudio using a web browser
- Get the assignment from Canvas to RStudio
- Do the Assignment (this means answering the questions)
- Prepare the Assignment to be turned in (this is called knitting)
- Get the knitted assignment from RStudio to Canvas
- Double check the assignment uploaded correctly and looks right (it is your responsibility to ensure there are no formatting issues)
You may find it helpful to watch a screencast showing how to do Assignment 0. Warning: in the video, steps 1 and 2 (in the list above) are mixed together for the sake of efficiency.
Access RStudio
Use the link on Canvas to access the RStudio webpage. You will be asked for a username and password. Use your internetID (mine is dolan118; yours is different) and your email password. (You can also use the following address if you prefer to type it yourself: http://umm-rstudio.morris.umn.edu/)
Canvas to RStudio
For most assignments Canvas contains two versions of the homework. The first is computer-ready: it is a PDF file that you can download and look at immediately. This lets you start working out how to solve the problems right away, and it gives you a reference for checking that you haven’t messed anything up when you produce your final homework file.
The second version is an RMarkdown file (we will call this file the raw assignment). When the raw assignment file is transferred to RStudio it can be knitted into a computer-ready PDF file.
I use the following process to get my raw assignment to RStudio:
- Download the RMarkdown version of the assignment to my local computer
- Upload the RMarkdown file on my local computer to RStudio
The first step is fairly straightforward (although it is a bit different for some Macs).
The second step requires you to use the “Upload” button.
Do the Assignment
This is where you will edit the RMarkdown file using RStudio. The RMarkdown file converts simple text decorations to formatted output when it is knitted (for example, the text **example** will turn into bold text when knitted: example).
Knitting the assignment
There is a button in RStudio that says one of three things:
- Knit PDF
- Knit HTML
- Knit Word
Next to it is a tiny down arrow that lets you change between the three. Make certain your button says “Knit PDF”, then click it. This creates a new file called Assignment0.pdf. You may need to click on “Files” (notice the plural; it’s over on the right, not in the menu) to see it.
RStudio to Canvas
Again, there are a few steps:
- Get the file from RStudio to your computer
- Get the file from your computer to Canvas
RStudio to computer:
- Find your pdf file and put a checkmark in the box next to it
- Click on the “More” button and select “Export…”
- Tell your computer where to store the file
Your computer to Canvas:
- Find where the file was downloaded
- Log into Canvas
- Click on the Upload link
- Follow the directions
Be sure to SUBMIT the file.
Practice with R
The computer language understood by RStudio is R. Any R code in a code chunk that is part of an RMarkdown file is executed when the file is knitted. You can also run R code immediately in the Console.
R understands the basic arithmetic operations and it respects the order of operations. Thus 2+3*5 is evaluated as 2+(3*5). To override the usual order of operations you’ll need to use parentheses: (2+3)*5. NOTE: R requires all the operations to be explicitly specified. In other words, (2+3)5 will generate an error.
Operation | R Symbol | Example |
---|---|---|
Addition | + | 2+3 |
Subtraction | - | 2.1 - 3.4 |
Multiplication | * | 15*20.7 |
Division | / | 1/3 |
Grouping | () | (2+3)*5 |
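As a quick check of the rules above, the sketch below runs the operations from the table in the console and compares the two groupings of 2+3*5. The exponent operator ^ is not in the table, but it is standard R. The variable names are only for illustration.

```r
# Multiplication happens before addition
no_parens   <- 2 + 3 * 5      # same as 2 + (3 * 5)
with_parens <- (2 + 3) * 5    # parentheses force the addition first

print(no_parens)    # [1] 17
print(with_parens)  # [1] 25

# The other operations from the table
print(2.1 - 3.4)    # subtraction
print(15 * 20.7)    # multiplication
print(1 / 3)        # division

# Exponentiation uses ^ and is done before multiplication
power <- 2 * 3^2    # same as 2 * (3^2)
print(power)        # [1] 18
```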
Here’s a chance to practice some R. Write the R code required to add two plus two, and then run the code to see the results:
Write some R code that uses +, -, *, /, ^, and some parentheses (you can use several lines):
Variables in R
Just like algebra, R can use variables. Unlike algebra, R remembers what you assign to a variable from line to line:
x<-10
2*x+3
## [1] 23
It’s a good idea to use informative variable names:
age.of.teacher=46
2*age.of.teacher-12
## [1] 80
R has multiple ways to assign values to a variable. I would prefer to not add to the confusion by discussing them now, but you’ll be looking things up online so you need to know about them all. There are subtle differences between them that won’t matter to most of you so I’m not going to explain them. Here are the three most common forms of assignment:
x=10
2*x+1
## [1] 21
x<-11
2*x+1
## [1] 23
12->x
2*x+1
## [1] 25
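The sketch below illustrates the two points of this section: the three assignment forms store values the same way, and a variable keeps its value from line to line until you assign it again. The variable names and numbers are invented for the example.

```r
# Three ways to assign; each stores the value 10
a = 10
b <- 10
10 -> d

# All three variables hold the same value
print(identical(a, b))  # [1] TRUE
print(identical(b, d))  # [1] TRUE

# A variable keeps its value until it is reassigned
hours.worked <- 8
pay <- 15 * hours.worked
print(pay)              # [1] 120

hours.worked <- 10      # reassigning replaces the old value
print(hours.worked)     # [1] 10
print(pay)              # still 120: pay was computed before the change
```

Notice that changing hours.worked does not update pay automatically. R computes a value once, at the moment of assignment.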
Assign some value to the variable my.var and then divide it by 10:
Assigning Multiple Values to a Variable
Use the c() function to assign multiple values to a variable:
myVar<-c(1,2,7)
myVar
## [1] 1 2 7
Use [] to access individual elements:
myVar[3]
## [1] 7
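Here are a few more things you can do with a variable that holds several values, shown as a sketch. R counts positions starting from 1, and arithmetic is applied to every element at once. The variable name scores and its values are made up.

```r
scores <- c(4, 8, 15, 16)

print(length(scores))    # how many values are stored: 4
print(scores[2])         # the second element: 8
print(scores[c(1, 4)])   # the first and fourth elements: 4 16
print(scores * 2)        # every element doubled: 8 16 30 32
print(sum(scores))       # the total: 43
```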
Modify the code below to assign the values 2,3,10,5,-3 to the variable my.var and then examine the first element in my.var:
my.var<-c()
my.var[4]
A basic plot
Use the plot() function to graph the contents of a variable. You can control the title using main=:
x<-rnorm(100) # This makes 100 random values
y<-rnorm(100) # So does this
plot(x,y,main="my scatterplot")
I’ll put the code below. Run it a few times and see how the output changes. Change the title to something different too:
x<-rnorm(100) # This makes 100 random values
y<-rnorm(100) # So does this
plot(x,y,main="my scatterplot")
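The plot looks different on every run because rnorm() draws fresh random values each time. If you ever need the same picture twice, one common approach is to call set.seed() first. The sketch below assumes a seed of 42 (any number works) and also labels the axes with xlab= and ylab=, which are standard plot() arguments not covered above.

```r
set.seed(42)            # fix the random number generator's starting point
x <- rnorm(100)         # 100 random values
y <- rnorm(100)         # 100 more random values

set.seed(42)            # same seed, so the same values are drawn again
x.again <- rnorm(100)

print(identical(x, x.again))   # [1] TRUE

plot(x, y,
     main = "my reproducible scatterplot",
     xlab = "x values",
     ylab = "y values")
```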