R Setup
I’ll skip the steps needed to install R. I’m currently using RStudio Desktop, so I will briefly go through some of the steps needed to set up version control with it.
I’ll also go through the basics of packages and other necessary processes.
Packages
A repository is a central location where many developed packages are located and available for download.
There are three big repositories:
1. CRAN (Comprehensive R Archive Network): R’s main repository (>12,100 packages available!)
2. Bioconductor: A repository mainly for bioinformatics-focused packages
3. GitHub: A very popular, open source repository (not R specific!)
CRAN
If you are installing from the CRAN repository, use the install.packages() function, with the name of the package you want to install in quotes between the parentheses (note: you can use either single or double quotes). For example, if you want to install the package “ggplot2”, you would use:
install.packages("ggplot2")
If you want to install multiple packages at once, you can do so by using a character vector, like:
install.packages(c("ggplot2", "devtools", "lme4"))
Bioconductor
The Bioconductor repository uses its own method to install packages. The biocLite() script that older tutorials source from https://bioconductor.org/biocLite.R has been retired; current Bioconductor releases install through the BiocManager package, which itself comes from CRAN:
install.packages("BiocManager")
This makes the main install function of Bioconductor, BiocManager::install(), available to you. Following this, you call the package you want to install in quotes, between the parentheses of the install command, like so:
BiocManager::install("GenomicFeatures")
GitHub
This is a more specific case that you probably won’t run into too often. In the event you want to do this, you first must find the package you want on GitHub and take note of both the package name AND the author of the package. Check out this guide for installing from GitHub, but the general workflow is:
- install.packages("devtools") - only run this if you don’t already have devtools installed. If you’ve been following along with this lesson, you may have installed it when we were practicing installations using the R console
- library(devtools) - more on what this command is doing immediately below
- install_github("author/package") - replacing “author” and “package” with their GitHub username and the name of the package.
Load packages
Installing a package does not make its functions immediately available to you. First you must load the package into R; to do so, use the library() function. Think of this like any other software you install on your computer. Just because you’ve installed a program doesn’t mean it’s automatically running - you have to open the program. Same with R packages. You’ve installed the package, but now you have to “open” it. For example, to “open” the “ggplot2” package, you would run:
library(ggplot2)
Checking for updates
If you aren’t sure if you’ve already installed a package, or want to check what packages are installed, you can use either of:
installed.packages() or library()
with nothing between the parentheses to check! Or look in the Packages tab in the 4th (lower right) quadrant of RStudio.
Updating
You can check what packages need an update with a call to the function
old.packages()
This will identify all packages that have been updated since you installed them/last updated them. To update all packages, use
update.packages()
If you only want to update a specific package, just install it again with
install.packages("packagename") # updating one package is the same as installing it
Unload packages
Sometimes you want to unload a package in the middle of a script - the package you have loaded may not play nicely with another package you want to use.
To unload a given package you can use the detach() function. For example,
detach("package:ggplot2", unload=TRUE)
would unload the ggplot2 package (that we loaded earlier). Within the RStudio interface, in the Packages tab, you can simply unload a package by unchecking the box beside the package name.
Uninstall packages
If you no longer want to have a package installed, you can simply uninstall it using the function remove.packages(). For example,
remove.packages("ggplot2")
Within RStudio, in the Packages tab, clicking on the “X” at the end of a package’s row will uninstall that package.
Help
Once you know what function within a package you want to use, you simply call it in the console like any other function we’ve been using throughout this lesson. Once a package has been loaded, it is as if it were a part of the base R functionality.
If you still have questions about what functions within a package are right for you or how to use them, many packages include “vignettes.” These are extended help files that include an overview of the package and its functions, and often they go the extra mile with detailed, plain-language examples that you can follow along with to see how to use the package. To see the vignettes included in a package, you can use the browseVignettes() function. For example, let’s look at the vignettes included in ggplot2:
browseVignettes("ggplot2")
You should see that there are two included vignettes: “Extending ggplot2” and “Aesthetic specifications.” The Aesthetic specifications vignette is a great example of how vignettes can give helpful, clear instructions on how to use the included functions.
Version control
Vocabulary
There is a lot of vocabulary involved in working with Git, and often the understanding of one word relies on your understanding of a different Git concept. Take some time to familiarize yourself with the words below and read over them a few times to see how the concepts relate.
Repository
Equivalent to the project’s folder/directory - all of your version controlled files (and the recorded changes) are located in a repository. This is often shortened to repo. Repositories are what are hosted on GitHub and through this interface you can either keep your repositories private and share them with select collaborators, or you can make them public - anybody can see your files and their history.
Commit
To commit is to save your edits and the changes made. A commit is like a snapshot of your files: Git compares the previous version of all of your files in the repo to the current version and identifies those that have changed since then. For files that have not changed, it keeps the previously stored version, untouched. For files that have changed, it compares the versions, logs the changes and stores the new version of your file. We’ll touch on this in the next section, but when you commit a file, you typically accompany that file change with a little note about what you changed and why.
When we talk about version control systems, commits are at the heart of them. If you find a mistake, you revert your files to a previous commit. If you want to see what has changed in a file over time, you compare the commits and look at the messages to see why the changes were made and by whom.
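To make the idea concrete, here is a minimal sketch of comparing and reverting commits from Git Bash or Terminal. The throwaway directory, the file name stats.R and the commit messages are all made up for illustration, and the identity is set for this demo repository only. Note that git revert undoes a commit by adding a new commit on top, rather than erasing history.

```shell
set -e
repo=$(mktemp -d)                 # throwaway demo directory (assumption)
cd "$repo"
git init -q
git config user.name "Jane Doe"   # identity for this demo repo only
git config user.email janedoe@gmail.com

echo "mean(x)" > stats.R          # hypothetical file
git add stats.R
git commit -q -m "Summarise x with the mean"

echo "median(x)" > stats.R
git commit -q -am "Switch to the median"

git log --oneline                 # one line per commit, newest first
git diff HEAD~1 HEAD -- stats.R   # what changed between the two commits
git revert --no-edit HEAD         # new commit that undoes the last one
cat stats.R                       # prints "mean(x)"
```

After the revert, the log holds three commits: the two edits plus the one that undoes the second.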
Push
Updating the repository with your edits. Since Git involves making changes locally, you need to be able to share your changes with the common, online repository. Pushing is sending those committed changes to that repository, so now everybody has access to your edits.
Pull
Updating your local version of the repository to the current version, since others may have edited in the meantime. Because the shared repository is hosted online and any of your collaborators (or even yourself on a different computer!) could have made changes to the files and then pushed them to the shared repository, you may be behind the times! The files you have locally on your computer may be outdated, so you pull to bring your local copy up to date with the main repository.
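Push and pull can be tried out without GitHub at all. The sketch below is only an illustration: a local bare repository stands in for the shared online one, and the folder names alice and bob, the file report.txt and the commit messages are invented for the demo.

```shell
set -e
work=$(mktemp -d)                               # throwaway demo directory (assumption)
git init -q --bare "$work/shared.git"           # stands in for the shared online repository

# First collaborator: clone, commit, push
git clone -q "$work/shared.git" "$work/alice" 2>/dev/null   # hides the "empty repository" warning
cd "$work/alice"
git config user.name "Jane Doe"                 # identity for this demo repo only
git config user.email janedoe@gmail.com
echo "draft 1" > report.txt
git add report.txt
git commit -q -m "Add first draft of report"
git push -q origin HEAD                         # send the commit to the shared repository

# Second collaborator gets a copy of the project
git clone -q "$work/shared.git" "$work/bob"

# The first collaborator keeps working and pushes again
echo "draft 2" > report.txt
git commit -q -am "Revise report"
git push -q origin HEAD

# The second copy is now behind the times, so pull
cd "$work/bob"
git pull -q --ff-only
cat report.txt                                  # prints "draft 2"
```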
Staging
The act of preparing a file for a commit. For example, if since your last commit you have edited three files for completely different reasons, you don’t want to commit all of the changes in one go; your message on why you are making the commit and what has changed will be complicated since three files have been changed for different reasons. So instead, you can stage just one of the files and prepare it for committing. Once you’ve committed that file, you can stage the second file and commit it. And so on. Staging allows you to separate out file changes into separate commits. Very helpful!
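As a rough sketch of what staging looks like on the command line, the example below edits two files for unrelated reasons and commits them separately. The file names and messages are hypothetical, and everything happens in a throwaway directory.

```shell
set -e
repo=$(mktemp -d)                 # throwaway demo directory (assumption)
cd "$repo"
git init -q
git config user.name "Jane Doe"   # identity for this demo repo only
git config user.email janedoe@gmail.com

# Two files changed for completely different reasons
echo "x <- 1" > analysis.R
echo "Meeting notes" > notes.txt

# Stage and commit each change on its own
git add analysis.R
git commit -q -m "Add analysis script"
git add notes.txt
git commit -q -m "Add meeting notes"

git log --oneline                 # two commits, one per file
```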
To summarize these commonly used terms so far and to test whether you’ve got the hang of this, files are hosted in a repository that is shared online with collaborators. You pull the repository’s contents so that you have a local copy of the files that you can edit. Once you are happy with your changes to a file, you stage the file and then commit it. You push this commit to the shared repository. This uploads your new file and all of the changes and is accompanied by a message explaining what changed, why and by whom.
Branch
When the same file has two simultaneous copies. When you are working locally and editing a file, you have created a branch where your edits are not shared with the main repository (yet) - so there are two versions of the file: the version that everybody has access to on the repository and your local edited version of the file. Until you push your changes and merge them back into the main repository, you are working on a branch. After a branch point, the version history splits in two: one line tracks the changes made to the original file in the repository, which others may be editing, and the other tracks your changes on your branch, until the two are merged back together.
Merge
Independent edits of the same file are incorporated into a single, unified file. Independent edits are identified by Git and are brought together into a single file, with both sets of edits incorporated. But, you can see a potential problem here - if both people made an edit to the same sentence that precludes one of the edits from being possible, we have a problem! Git recognizes this disparity (conflict) and asks for user assistance in picking which edit to keep.
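Here is a small sketch of a branch and a clean merge, assuming two independent edits that touch different lines of the same file. The branch name my-edits, the file paper.txt and its contents are made up for the demo.

```shell
set -e
repo=$(mktemp -d)                          # throwaway demo directory (assumption)
cd "$repo"
git init -q
git config user.name "Jane Doe"            # identity for this demo repo only
git config user.email janedoe@gmail.com

printf 'title\nintro\nmethods\nresults\nend\n' > paper.txt
git add paper.txt
git commit -q -m "Start paper"
base=$(git rev-parse --abbrev-ref HEAD)    # name of the starting branch (master or main)

# Branch off and edit the first line
git checkout -q -b my-edits
printf 'TITLE\nintro\nmethods\nresults\nend\n' > paper.txt
git commit -q -am "Capitalise title"

# Meanwhile, the last line is edited on the original branch
git checkout -q "$base"
printf 'title\nintro\nmethods\nresults\nEND\n' > paper.txt
git commit -q -am "Capitalise ending"

# Bring both sets of edits together
git merge -q -m "Merge my-edits" my-edits
cat paper.txt                              # first line is TITLE, last line is END
```

Because the two edits are far enough apart in the file, Git can combine them without asking for help.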
Conflict
When multiple people make changes to the same file and Git is unable to merge the edits. You are presented with the option to manually try and merge the edits or to keep one edit over the other.
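A conflict is easy to provoke on purpose. In this sketch two branches change the same sentence, the merge stops, and the conflict is resolved by keeping one edit over the other. The branch name their-edit, the file sentence.txt and the sentences themselves are invented for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway demo directory (assumption)
cd "$repo"
git init -q
git config user.name "Jane Doe"            # identity for this demo repo only
git config user.email janedoe@gmail.com

echo "The sky is blue" > sentence.txt
git add sentence.txt
git commit -q -m "Add sentence"
base=$(git rev-parse --abbrev-ref HEAD)    # name of the starting branch (master or main)

git checkout -q -b their-edit
echo "The sky is grey" > sentence.txt
git commit -q -am "Say the sky is grey"

git checkout -q "$base"
echo "The sky is clear" > sentence.txt
git commit -q -am "Say the sky is clear"

# Both branches changed the same line, so the merge cannot finish on its own
if git merge -q -m "Merge their-edit" their-edit; then
  echo "merged cleanly"
else
  git status --short                       # "UU sentence.txt" marks the conflict
  git checkout --ours -- sentence.txt      # keep our edit over theirs
  git add sentence.txt
  git commit -q -m "Resolve conflict, keeping our edit"
fi
cat sentence.txt                           # prints "The sky is clear"
```

Instead of git checkout --ours you could open the file, edit the marked section by hand, and then add and commit it.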
Clone
Making a copy of an existing Git repository. If you have just been brought on to a project that has been tracked with version control, you would clone the repository to get access to and create a local version of all of the repository’s files and all of the tracked changes.
Fork
A personal copy of a repository that you have taken from another person. If somebody is working on a cool project and you want to play around with it, you can fork their repository and then when you make changes, the edits are logged on your repository, not theirs.
GitHub
Install Git
git-scm.com/download
I learned this the hard way, so I’ll put it here: I spent 8 hours trying to figure out why RStudio wouldn’t show the Git tab described later in this section. When I installed Git, it went into C:/Program Files/Git. I didn’t think that was a problem until the Git tab failed to appear. After 8 hours of trying, I uninstalled Git and reinstalled it in a different directory: C:/Users/XXXX/Git.
Configure Git
Now that Git is installed, we need to configure it for use with GitHub, in preparation for linking it with RStudio.
We need to tell Git what your username and email are, so that it knows how to name each commit as coming from you. To do so, in the command prompt (either Git Bash for Windows or Terminal for Mac), type:
git config --global user.name "Jane Doe"
with your desired username in place of “Jane Doe.” This is the name each commit will be tagged with. Following this, in the command prompt, type:
git config --global user.email janedoe@gmail.com
At this point, you should be set for the next step, but just to check, confirm your changes by typing:
git config --list
Doing so, you should see the username and email you selected above. If you notice any problems or want to change these values, just retype the original config commands from earlier with your desired changes.
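If the full list is too long to scan, single values can be read back with --get. This is just a sketch: the first line points HOME at a throwaway directory so the demo does not touch your real settings, which is my own precaution and not part of the normal workflow. Leave that line out to work on your actual configuration.

```shell
set -e
export HOME=$(mktemp -d)                          # demo only: keeps your real ~/.gitconfig untouched
git config --global user.name "Jane Doe"
git config --global user.email janedoe@gmail.com

git config --global --get user.name               # prints "Jane Doe"
git config --global --get user.email              # prints "janedoe@gmail.com"
```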
Once you are satisfied that your username and email are correct, exit the command line by typing exit and hitting Enter.
Link Git & RStudio
Global options
In RStudio, go to Tools > Global Options > Git/SVN
Sometimes the default path to the Git executable is not correct. Confirm that git.exe resides in the directory that RStudio has specified; if not, change the directory to the correct path. Otherwise, click OK or Apply.
So look for git.exe. Once I had installed Git in the Users directory, RStudio found it automatically and pre-filled the location …../bin/git.exe
SSH RSA key
Still in the same window as above, click on Create RSA Key
Enter a passphrase if you want
Create
Close
Back at the same window, View Public Key
Copy the key
GitHub SSH key
To add the key to GitHub, go to github.com, log in if you are not already, and go to your account settings. There, go to “SSH and GPG keys” and click “New SSH key”. Paste the public key you copied from RStudio into the Key box and give it a Title related to RStudio. Confirm the addition of the key with your GitHub password.
GitHub GPG key
I was getting all kinds of errors about a missing GPG key when I tried to submit changes to GitHub from RStudio, so I decided to create a GPG key:
I looked up the steps for creating one on github.com
Long process that starts in CMD
After generating the key, copy from command prompt window
Back to GitHub > Click on New GPG key
Follow steps
Clone repository
Create a new repository on GitHub
Copy the link to the new repository
New RStudio project
Back in RStudio
Create a new project
Choose Version Control, since we want to clone the repository from GitHub
Choose Git next
Enter the link to the new repository
Create
RStudio will clone the Git repository here
Create a new R script file
Enter code in it
Save the file
Commit file
In this pane you’ll see a Git tab
It took me 8 hours to figure out why mine never showed that tab
In this tab you’ll see the files created and cloned from GitHub
Click on checkbox next to file just created
Next window shows all the files that have been modified with the one file with a checkmark
Type in a commit message in the box
Click commit
Close window
Push file
So far we’ve created a file
Saved it
Staged it
Committed it
What’s left is to PUSH it to GitHub
PUSH by clicking on the green UP arrow button
You get a response when the push has completed, and if you want you can go to GitHub and check
Existing RStudio project
Sometimes, however, you may already have an R Project that you’ve been working on but that isn’t yet under version control or linked with GitHub.
If you already have such a project, skip this first step:
Go to File > New Project > New Directory > New Project and name your project. Since we are trying to emulate a time where you have a project not currently under version control, do NOT click “Create a git repository”. Click Create Project
git bash
We’ve now created an R Project that is not currently under version control. Let’s fix that. First, let’s set it up to interact with Git. Open Git Bash or Terminal and navigate to the directory containing your project files. Move around directories by typing
cd /name/of/path/to/file
In my case that was something like cd dir/Education/~/~/filename
Initiate a local git repository
When the prompt shows the correct directory location of your project on the line before the dollar sign, you are in the correct location. Once here, type
git init
(master) now appears after the prompt. Next, type
git add .
This initializes (init) this directory as a Git repository and stages all of the files in the directory (.) so that they can be committed to your local repository.
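To check what these two commands did, git status lists what is staged and waiting to be committed. Below is a minimal sketch in a throwaway directory; the file script.R is a made-up stand-in for your project files.

```shell
set -e
proj=$(mktemp -d)                        # throwaway stand-in for your project directory (assumption)
cd "$proj"
echo 'print("Hello world")' > script.R   # hypothetical project file

git init -q                              # turn the directory into a Git repository
git add .                                # stage everything in the directory
git status --short                       # prints "A  script.R": staged, not yet committed
```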
Commit to repository
Commit these changes to the git repository using
~ /d/Education/~/~/~udy_1 (master)
$ git commit -m "Initial commit"
[master (root-commit) 9368e77] Initial commit
 9 files changed, 50 insertions(+)
 create mode 100644 .Rhistory
 create mode 100644 .Rproj.user/3D181177/pcs/files-pane.pper
 create mode 100644 .Rproj.user/3D181177/pcs/source-pane.pper
 create mode 100644 .Rproj.user/3D181177/pcs/windowlayoutstate.pper
 create mode 100644 .Rproj.user/3D181177/pcs/workbench-pane.pper
 create mode 100644 .Rproj.user/3D181177/rmd-outputs
 create mode 100644 .Rproj.user/3D181177/saved_source_markers
 create mode 100644 .Rproj.user/shared/notebooks/patch-chunk-names
 create mode 100644 ~.Rproj
Link to github
At this point, we have created an R Project and have now linked it to Git version control. The next step is to link this with GitHub.
To do this, go to GitHub.com, and again, create a new repository:
Make sure the name is the exact same as your R project
Do NOT initialize a README file, .gitignore, or license.
Now you should see a page with an option to “Push an existing repository from the command line”, with instructions below containing code on how to do so.
In Git Bash or Terminal, copy and paste these lines of code to link your repository with GitHub.
After doing so, refresh your GitHub page.
When you re-open your project in RStudio, you should now have access to the Git tab in the upper right quadrant and can push to GitHub from within RStudio any future changes.
For this project, the commands and their output were:
git remote add origin https://github.com/LEEVE/JH_study_1.git
git branch -M main
git push -u origin main
Enumerating objects: 15, done.
Counting objects: 100% (15/15), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (15/15), 1.23 KiB | 1.23 MiB/s, done.
Total 15 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/LEEVE/JH_study_1.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.
R Markdown
R Markdown is a way of creating fully reproducible documents, in which both text and code can be combined. Reproducibility is one of its main benefits. Since you can easily combine text and code chunks in one document, you can integrate introductions, hypotheses, the code that you are running, the results of that code and your conclusions all in one place. Sharing what you did, why you did it and how it turned out becomes so simple - and the person you share it with can re-run your code and get the exact same answers you got.
See sample file I created in Rstudio.
Easy commands
At this point, I hope we’ve convinced you that R Markdown is a useful way to keep your code/data and have set you up to be able to play around with it. To get you started, we’ll practice some of the formatting that is inherent to R Markdown documents.
To start, let’s look at bolding and italicising text. To bold text, you surround it by two asterisks on either side. Similarly, to italicise text, you surround the word with a single asterisk on either side. For example, **bold** and *italics* produce bold and italic text, respectively.
We’ve also seen from the default document that you can make section headers. To do this, you put a series of hash marks (#). The number of hash marks determines what level of heading it is. One hash is the highest level and will make the largest text (see the first line of this lecture), two hashes is the next highest level and so on. Play around with this formatting and make a series of headers, like so:
# Header level 1
## Header level 2
### Header level 3...
The other thing we’ve seen so far is code chunks. To make an R code chunk, you can type three backticks, followed by curly brackets surrounding a lower case r, put your code on a new line and end the chunk with three more backticks. Thankfully, RStudio recognized you’d be doing this a lot and there are shortcuts, namely Ctrl+Alt+I (Windows) or Cmd+Option+I (Mac). Additionally, along the top of the source quadrant, there is the “Insert” button, which will also produce an empty code chunk. Try making an empty code chunk. Inside it, type the code print("Hello world"). When you knit your document, you will see this code chunk and the (admittedly simplistic) output of that chunk.
If you aren’t ready to knit your document yet, but want to see the output of your code, select the line of code you want to run and use Ctrl+Enter or hit the “Run” button along the top of your source window. The text “Hello world” should be output in your console window. If you have multiple lines of code in a chunk and you want to run them all in one go, you can run the entire chunk by using Ctrl+Shift+Enter OR hitting the green arrow button on the right side of the chunk OR going to the Run menu and selecting Run current chunk.
One final thing we will go into detail on is making bulleted lists, like the one at the top of this lesson. Lists are easily created by preceding each prospective bullet point by a single dash, followed by a space. Importantly, at the end of each bullet’s line, end with TWO spaces. This is a quirk of R Markdown that will cause spacing problems if not included.
- Try
- Making
- Your
- Own
- Bullet
- List!
This is a great starting point and there is so much more you can do with R Markdown. Thankfully, RStudio developers have produced an “R Markdown cheatsheet” that we urge you to go check out and see everything you can do with R Markdown! The sky is the limit!