R Setup


I’ll skip the steps needed to install R. I happen to currently be using RStudio Desktop, so I will briefly go through some of the steps needed to set up various version control applications.

I’ll also go through the basics of packages and other necessary processes.

Packages

A repository is a central location where many developed packages are located and available for download.

There are three big repositories:
1. CRAN (Comprehensive R Archive Network): R’s main repository (>12,100 packages available!)
2. Bioconductor: A repository mainly for bioinformatics-focused packages
3. GitHub: A very popular, open source repository (not R specific!)

CRAN

If you are installing from the CRAN repository, use the install.packages() function, with the name of the package you want to install in quotes between the parentheses (note: you can use either single or double quotes). For example, if you want to install the package “ggplot2”, you would use:

install.packages("ggplot2")

If you want to install multiple packages at once, you can do so by using a character vector, like:

install.packages(c("ggplot2", "devtools", "lme4"))

Bioconductor

The Bioconductor repository uses its own method to install packages. Current Bioconductor releases are installed through the BiocManager package, which is itself available from CRAN (the older biocLite() script has been retired). First, install BiocManager:

install.packages("BiocManager")

This makes the main install function of Bioconductor, BiocManager::install(), available to you. Following this, you pass the name of the package you want to install, in quotes, between the parentheses of the install command, like so:

BiocManager::install("GenomicFeatures")

GitHub

This is a more specific case that you probably won’t run into too often. In the event you want to do this, you first must find the package you want on GitHub and take note of both the package name AND the author of the package. Check out this guide for installing from GitHub, but the general workflow is:

  1. install.packages("devtools") - only run this if you don’t already have devtools installed. If you’ve been following along with this lesson, you may have installed it when we were practicing installations using the R console

  2. library(devtools) - more on what this command is doing immediately below this

  3. install_github("author/package") - replacing “author” and “package” with the author’s GitHub username and the name of the package

Load packages

Installing a package does not make its functions immediately available to you. First you must load the package into R; to do so, use the library() function. Think of this like any other software you install on your computer. Just because you’ve installed a program, doesn’t mean it’s automatically running - you have to open the program. Same with R. You’ve installed it, but now you have to “open” it. For example, to “open” the “ggplot2” package, you would run:

library(ggplot2)

Checking installed packages

If you aren’t sure if you’ve already installed a package, or want to check what packages are installed, you can use either of:

installed.packages() or library()

with nothing between the parentheses to check! Or look in the Packages Tab in the 4th quadrant.

Updating

You can check what packages need an update with a call to the function

old.packages()

This will identify all packages that have been updated since you installed them/last updated them. To update all packages, use

update.packages()

If you only want to update a specific package, simply install it again:

install.packages("packagename")  #which is the same as installing it

Unload packages

Sometimes you want to unload a package in the middle of a script - the package you have loaded may not play nicely with another package you want to use.

To unload a given package you can use the detach() function. For example,

detach("package:ggplot2", unload=TRUE)

would unload the ggplot2 package (that we loaded earlier). Within the RStudio interface, in the Packages tab, you can simply unload a package by unchecking the box beside the package name.

Uninstall packages

If you no longer want to have a package installed, you can simply uninstall it using the function remove.packages(). For example,

remove.packages("ggplot2")

Within RStudio, in the Packages tab, clicking on the “X” at the end of a package’s row will uninstall that package.

Help

Once you know what function within a package you want to use, you simply call it in the console like any other function we’ve been using throughout this lesson. Once a package has been loaded, it is as if it were a part of the base R functionality.

If you still have questions about what functions within a package are right for you or how to use them, many packages include “vignettes.” These are extended help files, that include an overview of the package and its functions, but often they go the extra mile and include detailed examples of how to use the functions in plain words that you can follow along with to see how to use the package. To see the vignettes included in a package, you can use the browseVignettes() function. For example, let’s look at the vignettes included in ggplot2:

browseVignettes("ggplot2")

You should see that there are two included vignettes: “Extending ggplot2” and “Aesthetic specifications” (depending on your ggplot2 version, the list may be longer). The Aesthetic specifications vignette is a great example of how vignettes can provide helpful, clear instructions on how to use the included functions.

Version control


Vocabulary

There is a lot of vocabulary involved in working with Git, and often the understanding of one word relies on your understanding of a different Git concept. Take some time to familiarize yourself with the words below and read over them a few times to see how the concepts relate.

Repository

Equivalent to the project’s folder/directory - all of your version controlled files (and the recorded changes) are located in a repository. This is often shortened to repo. Repositories are what are hosted on GitHub and through this interface you can either keep your repositories private and share them with select collaborators, or you can make them public - anybody can see your files and their history.

Commit

To commit is to save your edits and the changes made. A commit is like a snapshot of your files: Git compares the previous version of all of your files in the repo to the current version and identifies those that have changed since then. Files that have not changed keep their previously stored version, untouched. For files that have changed, Git compares the versions, logs the changes and stores the new version of your file. We’ll touch on this in the next section, but when you commit a file, typically you accompany that file change with a little note about what you changed and why.

When we talk about version control systems, commits are at the heart of them. If you find a mistake, you revert your files to a previous commit. If you want to see what has changed in a file over time, you compare the commits and look at the messages to see why and by whom.
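To make the snapshot idea concrete, here is a minimal sketch run in a throwaway repository. The file name analysis.R, the identity and the commit messages are placeholders chosen for illustration, not part of any real project.

```shell
# Work in a temporary directory so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"              # placeholder identity, this repo only
git config user.email "janedoe@example.com"

# First snapshot: a working script.
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Second snapshot: the same file, now with a mistake in it.
echo "x <- 1 +" > analysis.R
git commit --quiet -am "Break the script by accident"

# Look at the history: one line per commit, newest first.
git log --oneline

# Bring the file back as it was one commit earlier, then record that too.
git checkout HEAD~1 -- analysis.R
git commit --quiet -m "Restore working version of analysis script"

restored=$(cat analysis.R)
commit_count=$(git rev-list --count HEAD)
echo "analysis.R now contains: $restored ($commit_count commits recorded)"
```

Note that the mistake is not erased: all three commits stay in the history, which is what lets you see later what went wrong and when.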

Push

Updating the repository with your edits. Since Git involves making changes locally, you need to be able to share your changes with the common, online repository. Pushing is sending those committed changes to that repository, so now everybody has access to your edits.

Pull

Updating your local version of the repository to the current version, since others may have edited it in the meantime. Because the shared repository is hosted online, any of your collaborators (or even you on a different computer!) could have made changes to the files and pushed them to the shared repository, leaving you behind the times! The files you have locally on your computer may be outdated, so you pull to bring your local copy up to date with the main repository.
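Push and pull can be tried out without a GitHub account. In the sketch below, a bare repository on disk stands in for the shared online repository, and two clones (the hypothetical "mine" and "theirs") stand in for two collaborators. All names and the identity are assumptions made for the example.

```shell
work=$(mktemp -d)
cd "$work"

# Placeholder identity for every commit made in this sketch.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="janedoe@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="janedoe@example.com"

# A bare repository plays the role of the shared online repository.
git init --quiet --bare shared.git

# My local copy: make a first commit and push it to the shared repository.
git clone --quiet shared.git mine 2>/dev/null   # silences the "empty repository" warning
cd mine
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD
cd ..

# A collaborator gets their own local copy.
git clone --quiet shared.git theirs

# I keep working and push again; the collaborator's copy is now outdated.
cd mine
echo "second draft" >> notes.txt
git commit --quiet -am "Extend notes"
git push --quiet origin HEAD
cd ..

# The collaborator pulls to catch up.
cd theirs
before=$(wc -l < notes.txt)
git pull --quiet --ff-only
after=$(wc -l < notes.txt)
echo "lines in notes.txt before pull: $before, after pull: $after"
```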

Staging

The act of preparing a file for a commit. For example, if since your last commit you have edited three files for completely different reasons, you don’t want to commit all of the changes in one go; your message on why you are making the commit and what has changed will be complicated since three files have been changed for different reasons. So instead, you can stage just one of the files and prepare it for committing. Once you’ve committed that file, you can stage the second file and commit it. And so on. Staging allows you to separate out file changes into separate commits. Very helpful!
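The three-file scenario described above might look like this on the command line. This is only a sketch in a temporary repository; the file names and commit messages are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"              # placeholder identity, this repo only
git config user.email "janedoe@example.com"

# Three files edited for three unrelated reasons.
echo "clean the data" > clean.R
echo "plot the data"  > plot.R
echo "project notes"  > README.txt

# Nothing is staged yet; all three files are listed as untracked.
git status --short

# Stage and commit one file at a time, each with its own message.
git add clean.R
git commit --quiet -m "Add data cleaning script"

git add plot.R
git commit --quiet -m "Add plotting script"

git add README.txt
git commit --quiet -m "Add project notes"

# Each commit now lists exactly the one file it was about.
git log --oneline --name-only
```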

To summarize these commonly used terms so far and to test whether you’ve got the hang of this, files are hosted in a repository that is shared online with collaborators. You pull the repository’s contents so that you have a local copy of the files that you can edit. Once you are happy with your changes to a file, you stage the file and then commit it. You push this commit to the shared repository. This uploads your new file and all of the changes and is accompanied by a message explaining what changed, why and by whom.

Branch

When the same file has two simultaneous copies. When you are working locally and editing a file, you have in effect created a branch where your edits are not shared with the main repository (yet), so there are two versions of the file: the version that everybody has access to on the repository and your local edited version. Until you push your changes and merge them back into the main repository, you are working on a branch. Following a branch point, the version history splits in two: one line tracks the changes made to the original file in the repository, which others may be editing, and the other tracks your changes on your branch. A merge later brings the two back together.
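Git also lets you create a branch explicitly and give it a name. The sketch below, run in a temporary repository, shows two versions of the same file existing side by side. The branch name my-edits and the file report.txt are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"              # placeholder identity, this repo only
git config user.email "janedoe@example.com"

echo "line one" > report.txt
git add report.txt
git commit --quiet -m "Start report"

# The starting branch is called "master" or "main", depending on your Git settings.
main_branch=$(git symbolic-ref --short HEAD)

# Create a branch and commit an edit that exists only there.
git checkout --quiet -b my-edits
echo "line two" >> report.txt
git commit --quiet -am "Add a second line"

# The same file now has two versions, one per branch.
lines_on_branch=$(git show my-edits:report.txt | wc -l)
lines_on_main=$(git show "$main_branch":report.txt | wc -l)
echo "report.txt has $lines_on_branch lines on my-edits and $lines_on_main on $main_branch"

# List the branches; the asterisk marks the one you are on.
git branch
```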

Merge

Independent edits of the same file are incorporated into a single, unified file. Independent edits are identified by Git and are brought together into a single file, with both sets of edits incorporated. But, you can see a potential problem here - if both people made an edit to the same sentence that precludes one of the edits from being possible, we have a problem! Git recognizes this disparity (conflict) and asks for user assistance in picking which edit to keep.

[Figure: Flow of version control.]

Conflict

When multiple people make changes to the same file and Git is unable to merge the edits. You are presented with the option to manually try and merge the edits or to keep one edit over the other.
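Both outcomes, a clean merge and a conflict, can be reproduced in a temporary repository. In this sketch the branch names, file contents and the choice to keep the current branch's edit are all assumptions made for illustration; in practice you would open the file and decide which edit to keep.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"              # placeholder identity, this repo only
git config user.email "janedoe@example.com"

printf 'title\nintro\nmethods\nresults\nsummary\n' > report.txt
git add report.txt
git commit --quiet -m "Start report"
main_branch=$(git symbolic-ref --short HEAD)

# Case 1: independent edits to different parts of the file merge cleanly.
git checkout --quiet -b edit-title
sed 's/^title$/Title: My Report/' report.txt > report.tmp && mv report.tmp report.txt
git commit --quiet -am "Improve title"

git checkout --quiet "$main_branch"
sed 's/^summary$/Summary: all done/' report.txt > report.tmp && mv report.tmp report.txt
git commit --quiet -am "Improve summary"

git merge --quiet --no-edit edit-title       # both edits end up in report.txt

# Case 2: two different edits to the same line cannot be merged automatically.
git checkout --quiet -b edit-intro
sed 's/^intro$/Introduction (their wording)/' report.txt > report.tmp && mv report.tmp report.txt
git commit --quiet -am "Reword intro one way"

git checkout --quiet "$main_branch"
sed 's/^intro$/Introduction (my wording)/' report.txt > report.tmp && mv report.tmp report.txt
git commit --quiet -am "Reword intro another way"

if ! git merge --no-edit edit-intro > /dev/null 2>&1; then
    # Git stops and reports which files it could not merge.
    conflicted=$(git diff --name-only --diff-filter=U)
    echo "Conflict in: $conflicted"
    # Resolve by keeping the current branch's edit, then finish the merge.
    git checkout --ours -- report.txt
    git add report.txt
    git commit --quiet --no-edit
fi

cat report.txt
```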

Clone

Making a copy of an existing Git repository. If you have just been brought on to a project that has been tracked with version control, you would clone the repository to get access to and create a local version of all of the repository’s files and all of the tracked changes.
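As a rough sketch, cloning can be tried locally: below, a small repository on disk stands in for one hosted online, and the clone receives both the files and the full commit history. The directory names project and project-copy are hypothetical; with GitHub you would pass the repository's link to git clone instead of a local path.

```shell
work=$(mktemp -d)
cd "$work"

# Placeholder identity for the commits made in this sketch.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="janedoe@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="janedoe@example.com"

# An existing project with some history (stand-in for a hosted repository).
git init --quiet project
cd project
echo "v1" > data.txt
git add data.txt
git commit --quiet -m "Add data"
echo "v2" > data.txt
git commit --quiet -am "Update data"
cd ..

# Clone it: the copy gets the files and every tracked change.
git clone --quiet project project-copy
cd project-copy
git log --oneline

# The clone remembers where it came from under the name "origin".
cloned_commits=$(git rev-list --count HEAD)
origin_url=$(git config --get remote.origin.url)
echo "cloned $cloned_commits commits from $origin_url"
```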

Fork

A personal copy of a repository that you have taken from another person. If somebody is working on a cool project and you want to play around with it, you can fork their repository and then when you make changes, the edits are logged on your repository, not theirs.

GitHub


Install Git

git-scm.com/download

I learned this the hard way, so I’ll put it here: I spent 8 hours trying to figure out why RStudio wouldn’t show the Git tab described later in this section. When I installed Git, the installer put it in C/Program Files/Git. That didn’t seem like a problem until the Git tab failed to appear. After 8 hours of trying, I uninstalled Git and reinstalled it in a different directory: C/Users/XXXX/Git.

Configure Git

Now that Git is installed, we need to configure it for use with GitHub, in preparation for linking it with RStudio.

We need to tell Git what your username and email are, so that it knows how to name each commit as coming from you. To do so, in the command prompt (either Git Bash for Windows or Terminal for Mac), type:

git config --global user.name "Jane Doe"

with your desired username in place of “Jane Doe.” This is the name each commit will be tagged with. Following this, in the command prompt, type:

git config --global user.email janedoe@gmail.com

At this point, you should be set for the next step, but just to check, confirm your changes by typing:

git config --list

Doing so, you should see the username and email you selected above. If you notice any problems or want to change these values, just retype the original config commands from earlier with your desired changes.
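Since git config --list prints every setting, it can be easier to ask for one value at a time with --get. The sketch below does this inside a temporary repository and leaves out --global, so the values apply to that repository only and your real configuration is not touched. The name and email are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Without --global, these settings apply to this repository only.
git config user.name "Jane Doe"
git config user.email "janedoe@example.com"

# --get prints a single value instead of the whole list.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "Commits here will be recorded as: $name <$email>"
```

Repository-level values take precedence over the global ones, which is handy if one project needs a different email address.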

Once you are satisfied that your username and email are correct, exit the command line by typing exit and hitting Enter.

Global options

  • In RStudio, go to Tools > Global Options > Git/SVN

  • Sometimes the default path to the Git executable is not correct. Confirm that git.exe resides in the directory that RStudio has specified; if not, change the directory to the correct path. Otherwise, click OK or Apply.

  • So look for git.exe. Once I installed Git in the Users directory, RStudio found it automatically and pre-filled the location …../bin/git.exe

SSH RSA key

  • Still in the same window as above, click on Create RSA Key

  • Enter a passphrase if you want

  • Create

  • Close

  • Back at the same window, View Public Key

  • Copy the key

GitHub SSH key

To add the key to GitHub, go to github.com, log in if you have not already, and go to your account settings. There, go to “SSH and GPG keys” and click “New SSH key”. Paste the public key you copied from RStudio into the Key box and give it a Title related to RStudio. Confirm the addition of the key with your GitHub password.

GitHub GPG key

I was getting all kinds of errors about a missing GPG key when I tried to submit changes to GitHub from RStudio, so I decided to create a GPG key:

  • I looked up the steps for creating one on github.com

  • It is a long process that starts in CMD

  • After generating the key, copy it from the command prompt window

  • Back on GitHub, click on New GPG key

  • Follow the steps

Clone repository


  • Create a new repository on GitHub

  • Copy the link to the new repository

New RStudio project

  • Back in RStudio

  • Create a new project

  • Choose Version Control, since we want to clone the repository from GitHub here

  • Choose Git next

  • Enter the link to the new repository

  • Create

  • RStudio will clone the Git repository here

  • Create a new R script file

  • Enter code in it

  • Save the file

Commit file

  • In this pane you’ll see a Git tab

  • It took me 8 hours to figure out why mine never showed that tab

  • In this tab you’ll see the files created and cloned from GitHub

  • Click on the checkbox next to the file just created

  • The next window shows all the files that have been modified, with a checkmark next to the one file just selected

  • Type in a commit message in the box

  • Click commit

  • Close window

Push file

  • So far we’ve created a file

  • Saved it

  • Staged it

  • Committed it

  • What’s left is to PUSH it to GitHub

  • PUSH by clicking on the green UP arrow button

  • You get a response that it is completed; if you want, go to GitHub and check

Existing RStudio project

Sometimes, however, you may already have an R Project that you’ve been working on but that isn’t yet under version control or linked with GitHub.

If you have an existing project, skip this first step:

Go to File > New Project > New Directory > New Project and name your project. Since we are trying to emulate a time where you have a project not currently under version control, do NOT click “Create a git repository”. Click Create Project.

git bash

We’ve now created an R Project that is not currently under version control. Let’s fix that. First, let’s set it up to interact with Git. Open Git Bash or Terminal and navigate to the directory containing your project files. Move around directories by typing

cd dir/name/of/path/to/file
cd D/Education/~/~/filename

Initiate a local git repository

When the line before the dollar sign in the command prompt shows the directory of your project, you are in the correct location. Once here, type

git init     # (master) now appears after the prompt; next type
git add .

This initializes (init) this directory as a Git repository and stages all of the files in the directory (.) so they are ready to be committed to your local repository.

Commit to repository

Commit these changes to the git repository using

~ /d/Education/~/~/~udy_1 (master)
$ git commit -m "Initial commit"
[master (root-commit) 9368e77] Initial commit
9 files changed, 50 insertions(+)
create mode 100644 .Rhistory
create mode 100644 .Rproj.user/3D181177/pcs/files-pane.pper
create mode 100644 .Rproj.user/3D181177/pcs/source-pane.pper
create mode 100644 .Rproj.user/3D181177/pcs/windowlayoutstate.pper
create mode 100644 .Rproj.user/3D181177/pcs/workbench-pane.pper
create mode 100644 .Rproj.user/3D181177/rmd-outputs
create mode 100644 .Rproj.user/3D181177/saved_source_markers
create mode 100644 .Rproj.user/shared/notebooks/patch-chunk-names
create mode 100644 ~.Rproj
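The transcript above shows RStudio's own bookkeeping files (.Rhistory and everything under .Rproj.user) being committed along with the project. A common way to keep such files out of the history is to list them in a .gitignore file before running git add. The sketch below tries this in a temporary directory; the file names demo.Rproj and script.R are stand-ins, and the ignore list is a suggestion rather than a required set.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"              # placeholder identity, this repo only
git config user.email "janedoe@example.com"

# Stand-ins for the kinds of files seen in the transcript.
mkdir -p .Rproj.user/shared
echo "state"   > .Rproj.user/shared/placeholder
echo "history" > .Rhistory
echo "Version: 1.0" > demo.Rproj
echo 'print("Hello world")' > script.R

# Tell Git which files to leave out of version control.
printf '.Rproj.user\n.Rhistory\n.RData\n' > .gitignore

git add .
git commit --quiet -m "Initial commit"

# Only the project file, the script and .gitignore itself are tracked.
git ls-files
```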

R Markdown


R Markdown is a way of creating fully reproducible documents in which text and code are combined. Since you can easily combine text and code chunks in one document, you can integrate introductions, hypotheses, the code that you are running, the results of that code and your conclusions all in one place. Sharing what you did, why you did it and how it turned out becomes simple, and the person you share it with can re-run your code and get the exact same answers you got.

See the sample file I created in RStudio.

Easy commands

At this point, I hope we’ve convinced you that R Markdown is a useful way to keep your code/data and have set you up to be able to play around with it. To get you started, we’ll practice some of the formatting that is inherent to R Markdown documents.

To start, let’s look at bolding and italicising text. To bold text, you surround it by two asterisks on either side. Similarly, to italicise text, you surround the word with a single asterisk on either side. **bold** and *italics* respectively.

We’ve also seen from the default document that you can make section headers. To do this, you start the line with a series of hash marks (#) followed by a space. The number of hash marks determines what level of heading it is. One hash is the highest level and will make the largest text (see the first line of this lecture), two hashes is the next highest level and so on. Play around with this formatting and make a series of headers, like so:

# Header level 1
## Header level 2
### Header level 3...

The other thing we’ve seen so far is code chunks. To make an R code chunk, you can type three backticks, followed by curly brackets surrounding a lower case r, put your code on a new line and end the chunk with three more backticks. Thankfully, RStudio recognized you’d be doing this a lot and there are shortcuts, namely Ctrl+Alt+I (Windows) or Cmd+Option+I (Mac). Additionally, along the top of the source quadrant, there is the “Insert” button, which will also produce an empty code chunk. Try making an empty code chunk. Inside it, type the code print("Hello world"). When you knit your document, you will see this code chunk and the (admittedly simplistic) output of that chunk.

If you aren’t ready to knit your document yet, but want to see the output of your code, select the line of code you want to run and use Ctrl+Enter or hit the “Run” button along the top of your source window. The text “Hello world” should be output in your console window. If you have multiple lines of code in a chunk and you want to run them all in one go, you can run the entire chunk by using Ctrl+Shift+Enter OR hitting the green arrow button on the right side of the chunk OR going to the Run menu and selecting Run current chunk.

One final thing we will go into detail on is making bulleted lists, like the one at the top of this lesson. Lists are easily created by preceding each prospective bullet point by a single dash, followed by a space. Importantly, at the end of each bullet’s line, end with TWO spaces. This is a quirk of R Markdown that will cause spacing problems if not included.

  • Try

  • Making

  • Your

  • Own

  • Bullet

  • List!

This is a great starting point and there is so much more you can do with R Markdown. Thankfully, RStudio developers have produced an “R Markdown cheatsheet” that we urge you to go check out and see everything you can do with R Markdown! The sky is the limit!