Basics - General R

Directory


I’m going to assume you have already installed R and are possibly using RStudio as an IDE (as I am throughout this document).

You’ll often keep each project in its own directory to simplify matters. In RStudio this is easy: open the File menu, choose New Project, and set the directory for that project.

Working directory

You may have to import a data file from another directory, or for some other reason need to know which directory you are actually in at the moment.

getwd

setwd

Self-explanatory:

  • getwd() returns the current working directory
  • setwd() sets the working directory to the path you give it
getwd()
[1] "D:/yourdataiq/datawr/rfordata"
#setwd("....blah...")
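Here is a minimal sketch of the two functions working together. It switches into R's per-session temporary directory (tempdir(), chosen here only because that folder always exists) and then returns to where it started. The variable names are just for illustration.

```r
# Remember where we started
old_wd <- getwd()

# Move to the session's temporary directory (an arbitrary folder that always exists)
setwd(tempdir())
in_temp <- getwd()
print(in_temp)

# Go back to the original working directory
setwd(old_wd)
restored <- getwd()
print(restored)
```

Saving the old directory first is a handy habit, because setwd() changes the location for the rest of the session.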

Packages


You’ll make use of numerous packages in R. Of course R comes with many base functions but as an analyst you’ll need access to many more than the basics.

install

As you write scripts in separate files you might need to reference a new package. I like to keep all my packages in one file, not spread out over the individual files.

Here is a snippet that installs a package only if it is not already available, and then loads it:

if(!require('dplyr')) {
      install.packages('dplyr')
      library('dplyr')
}
Loading required package: dplyr

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Or, if you simply want to install a package and load it:

install.packages("lubridate")
library(lubridate)
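If you load many packages from one file, the install-if-missing idea can be wrapped in a small helper. This is only a sketch: ensure_package() is a made-up name, not something that ships with R. It relies on requireNamespace(), which checks whether a package is installed without attaching it.

```r
# ensure_package() is a hypothetical helper, not a base R function.
# It installs a package only when it is missing, then attaches it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an install
ensure_package("stats")
loaded <- "package:stats" %in% search()
print(loaded)
```

For a real project you would call it with the packages you need, for example ensure_package("dplyr").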

General


~

Tilde operator is used to define the relationship between dependent variable and independent variables in a statistical model formula. The variable on the left-hand side of tilde operator is the dependent variable and the variable(s) on the right-hand side of tilde operator is/are called the independent variable(s). So, tilde operator helps to define that dependent variable depends on the independent variable(s) that are on the right-hand side of tilde operator. (retrieved from tutorialspoint.com)

In a ggplot2 call such as facet_wrap(~species), the tilde introduces a one-sided formula: species, on the right-hand side, is the variable the plot is split by. facet_wrap is not a dependent variable; it is a function that separates the data into one panel per value of species.
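To make the left-hand and right-hand sides concrete, here is a small sketch using the built-in cars dataset. The choice of dist as the dependent variable and speed as the independent variable is mine, purely for illustration.

```r
# A two-sided formula: dist (dependent) is modelled by speed (independent)
f <- dist ~ speed
class(f)        # "formula"
all.vars(f)     # the variable names used in the formula

# Fit a linear model on the built-in cars dataset using that formula
fit <- lm(f, data = cars)
coef(fit)

# A one-sided formula has nothing on the left of the tilde
g <- ~ speed
length(g)       # 2, versus 3 for the two-sided formula above
```

The one-sided form is the kind you pass to functions that only need to know which variable to group or split by.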

help

Type ? followed by the function name, for example ?mean, and its help page will open in the other tab/window.

case sensitive

R is case-sensitive, so remember that mydata and myData are two different names.

variables

It can be pretty time-consuming to type out lots of values. To save time, we can use variables to represent the values. This lets us call out the values any time we need to with just the variable. Earlier, we learned about variables in SQL.

A variable is a representation of a value in R that can be stored for use later during programming. Variables can also be called objects.

As a data analyst, you’ll find variables are very useful when programming. For example, if you filter a dataset, you can assign the filtered result to a variable. That way, all you need later is the variable name to work with the filtered data.

When naming a variable in R, you can use a short phrase.

A variable name should start with a letter and can also contain numbers and underscores.

So the variable 5penguin wouldn’t work, because it starts with a number and R reports a syntax error. Also, just like function names, variable names are case-sensitive. Using all lowercase letters is good practice whenever possible.
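A quick sketch of those naming rules. The variable names and values here are invented for the example; make.names() is a base R function that shows how R would repair a name that breaks the rules.

```r
# Valid names: start with a letter, then letters, numbers, or underscores
penguin_5 <- "Adelie"
penguin_count <- 3

# Names are case-sensitive: these are two different variables
bill <- 10
Bill <- 20
different <- bill != Bill
print(different)

# make.names() shows how R would repair an invalid name such as "5penguin"
fixed <- make.names("5penguin")
print(fixed)   # "X5penguin"
```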

# comments

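As in some SQL dialects (MySQL, for example), a comment in R starts with #. R ignores everything from the # to the end of the line.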

Arithmetic Operators


Arithmetic operators let you perform basic math operations like addition, subtraction, multiplication, and division.

The table below summarizes the different arithmetic operators in R. The examples used in the table are based on the creation of two variables: x equals 2 and y equals 5. Note that you use the assignment operator to store these values:

x <- 2
y <- 5
Operator  Description                                                 Example code  Result/Output
+         Addition                                                    x + y         [1] 7
-         Subtraction                                                 x - y         [1] -3
*         Multiplication                                              x * y         [1] 10
/         Division                                                    x / y         [1] 0.4
%%        Modulus (returns the remainder after division)              y %% x        [1] 1
%/%       Integer division (returns an integer value after division)  y %/% x       [1] 2
^         Exponent                                                    y ^ x         [1] 25
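Integer division and modulus are two halves of the same calculation, which the following sketch checks with the same x and y. The names quotient, remainder, rebuilt, and doubled are only for illustration.

```r
x <- 2
y <- 5

quotient  <- y %/% x   # how many whole times x fits into y
remainder <- y %% x    # what is left over

# Rebuilding y from the two pieces gives back the original value
rebuilt <- quotient * x + remainder
print(rebuilt)         # 5

# Arithmetic operators also work element by element on vectors
doubled <- c(1, 2, 3) * x
print(doubled)         # 2 4 6
```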

Relational Operators


Relational operators, also known as comparators, allow you to compare values. Relational operators identify how one R object relates to another—like whether an object is less than, equal to, or greater than another object. The output for relational operators is either TRUE or FALSE (which is a logical data type, or boolean).

Operator  Description               Example code  Result/Output
<         Less than                 x < y         [1] TRUE
>         Greater than              x > y         [1] FALSE
<=        Less than or equal to     x <= 2        [1] TRUE
>=        Greater than or equal to  y >= 10       [1] FALSE
==        Equal to                  y == 5        [1] TRUE
!=        Not equal to              x != 2        [1] FALSE
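Two things are worth seeing in code: comparisons work on whole vectors, and == can surprise you with decimal numbers. This is a small sketch with made-up values; all.equal() is the base R function for comparing numbers with a tolerance.

```r
speeds <- c(4, 15, 25)

# A comparison returns one TRUE/FALSE per element
fast <- speeds > 20
print(fast)          # FALSE FALSE TRUE

# TRUE counts as 1, so sum() counts the matches
n_fast <- sum(fast)
print(n_fast)        # 1

# Decimal arithmetic is not exact, so == can be FALSE unexpectedly
a <- 0.1 + 0.2
exact <- a == 0.3
close <- isTRUE(all.equal(a, 0.3))
print(exact)         # FALSE
print(close)         # TRUE
```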

Logical Operators


Logical operators return a logical data type such as TRUE or FALSE.

Operator  Description
&         Element-wise logical AND
&&        Logical AND (compares single values)
|         Element-wise logical OR
||        Logical OR (compares single values)
!         Logical NOT

There are three primary types of logical operators:

  • AND (written as & or && in R): Solar.R > 150 & Wind > 10
  • OR (written as | or || in R): Solar.R > 150 | Wind > 10
  • NOT (written as !): !(Solar.R > 150 | Wind > 10); another example is Day != 1
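The conditions above can be tried out directly. I'm assuming here that Solar.R, Wind, and Day refer to columns of the built-in airquality dataset, which has columns with those names; the variable names on the left are just for this sketch.

```r
# Element-wise operators return one result per row
sunny     <- airquality$Solar.R > 150
windy     <- airquality$Wind > 10
both      <- sunny & windy
either    <- sunny | windy
neither   <- !(sunny | windy)

# Solar.R contains missing values, so some results are NA; na.rm skips them
n_both <- sum(both, na.rm = TRUE)
print(n_both)

# NOT of an OR is the same as AND of the two NOTs
same <- identical(neither, !sunny & !windy)
print(same)

# && and || are meant for single values, such as one row
first_row <- airquality$Wind[1] > 5 && airquality$Day[1] == 1
print(first_row)
```

Use & and | when filtering a whole column, and keep && and || for single yes/no checks such as the condition of an if statement.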

Assignment Operators


Assignment operators let you assign values to variables.

In many scripting languages you can just use the equal sign (=) to assign a value to a variable. In R, the best practice is to use the arrow assignment (<-). Technically, the single arrow assignment can point in the left or right direction, but the rightward assignment is not generally used in R code.

You can also use the double arrow assignment, known as a scoping assignment. But the scoping assignment is for advanced R users, so you won’t learn about it in this reading.

<- assign variable

To assign a value to a variable:

  • type the variable name, then the assignment operator, then the value: first_variable <- 12
  • the operator <- is a less-than sign followed by a hyphen; it looks like an arrow pointing from the right side (value) to the left side (variable name)
  • when you press Enter, the Environment panel will display the variables with their assigned values in a table
first_variable <- "yasha"
second_variable <- 7.7

The table below summarizes the assignment operators and example code in R. Notice that the output for each variable is its assigned value.

After running the example code, typing x generates the output shown in the last column.

Operator  Description                      Example code  Result/Output
<-        Leftwards assignment             x <- 2        [1] 2
<<-       Leftwards assignment (scoping)   x <<- 7       [1] 7
=         Leftwards assignment             x = 9         [1] 9
->        Rightwards assignment            11 -> x       [1] 11
->>       Rightwards assignment (scoping)  21 ->> x      [1] 21

Conditional Statements


A conditional statement is a declaration that if a certain condition holds, then a certain event must take place. For example, “If the temperature is above freezing, then I will go outside for a walk.” If the condition is true (the temperature is above freezing), then the event will occur (I will go for a walk). Conditional statements in R code have a similar logic. See the Control Structures Loops page.
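The walk example translates almost word for word into R. This is a minimal sketch; the temperature value, the Celsius unit, and the "stay inside" alternative are my own additions.

```r
temperature <- 5   # degrees Celsius, a made-up value

# "If the temperature is above freezing, then I will go outside for a walk."
if (temperature > 0) {
  plan <- "go outside for a walk"
} else {
  plan <- "stay inside"
}

print(plan)   # "go outside for a walk"
```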

Preview Data


data

data(cars) loads the prebuilt dataset cars.

tail

tail(cars, 15) gives us the last 15 rows. If we omit the second argument, we get the last 6 rows.
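A short sketch of tail() next to its counterpart head(), which does the same job from the top of the data. The variable names are only for illustration.

```r
# tail() defaults to the last 6 rows
last_rows <- tail(cars)
nrow(last_rows)    # 6

# Ask for a specific number of rows with the second argument
last_15 <- tail(cars, 15)
nrow(last_15)      # 15

# head() works the same way from the top of the data
first_3 <- head(cars, 3)
print(first_3)
```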

dim

dim(cars) gives us the row and column counts.

dim(cars)
[1] 50  2

nrow

 nrow(cars)
[1] 50

ncol

 ncol(cars)
[1] 2

object.size

 object.size(cars)
1648 bytes

names

names(cars)
[1] "speed" "dist" 

summary

summary() is covered in detail in Summarize - Arrange.

summary(cars)  
     speed           dist       
 Min.   : 4.0   Min.   :  2.00  
 1st Qu.:12.0   1st Qu.: 26.00  
 Median :15.0   Median : 36.00  
 Mean   :15.4   Mean   : 42.98  
 3rd Qu.:19.0   3rd Qu.: 56.00  
 Max.   :25.0   Max.   :120.00  

table

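table() counts how often each combination of values occurs. For a two-column data frame like cars, it gives us a cross-tabulation of speed against dist.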

table(cars)
     dist
speed 2 4 10 14 16 17 18 20 22 24 26 28 32 34 36 40 42 46 48 50 52 54 56 60 64
   4  1 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   7  0 1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   8  0 0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   9  0 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   10 0 0  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0
   11 0 0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
   12 0 0  0  1  0  0  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
   13 0 0  0  0  0  0  0  0  0  0  1  0  0  2  0  0  0  1  0  0  0  0  0  0  0
   14 0 0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  1  0
   15 0 0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0
   16 0 0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0
   17 0 0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0  0  0  0
   18 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0
   19 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0
   20 0 0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  1  0  1  0  1
   22 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   23 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
   24 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   25 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
     dist
speed 66 68 70 76 80 84 85 92 93 120
   4   0  0  0  0  0  0  0  0  0   0
   7   0  0  0  0  0  0  0  0  0   0
   8   0  0  0  0  0  0  0  0  0   0
   9   0  0  0  0  0  0  0  0  0   0
   10  0  0  0  0  0  0  0  0  0   0
   11  0  0  0  0  0  0  0  0  0   0
   12  0  0  0  0  0  0  0  0  0   0
   13  0  0  0  0  0  0  0  0  0   0
   14  0  0  0  0  1  0  0  0  0   0
   15  0  0  0  0  0  0  0  0  0   0
   16  0  0  0  0  0  0  0  0  0   0
   17  0  0  0  0  0  0  0  0  0   0
   18  0  0  0  1  0  1  0  0  0   0
   19  0  1  0  0  0  0  0  0  0   0
   20  0  0  0  0  0  0  0  0  0   0
   22  1  0  0  0  0  0  0  0  0   0
   23  0  0  0  0  0  0  0  0  0   0
   24  0  0  1  0  0  0  0  1  1   1
   25  0  0  0  0  0  0  1  0  0   0

str

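Gives us the structure of the data: the type of object, the number of observations and variables, and each column's type with its first few values.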

str(cars)
'data.frame':   50 obs. of  2 variables:
 $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
 $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

Logic


isTRUE

This function returns TRUE if its argument evaluates to TRUE; otherwise it returns FALSE.

isTRUE(6>4) 
[1] TRUE

identical

Returns TRUE if the two R objects passed to it are identical.

identical('twins','twins')
[1] TRUE
identical(5>4, 3<3.1)
[1] TRUE
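A few edge cases show how strict isTRUE() and identical() are compared with ==. This is a sketch with made-up values; the variable names are only for illustration.

```r
# isTRUE() only accepts a single TRUE value
two_trues <- isTRUE(c(TRUE, TRUE))   # FALSE: the argument has length 2
missing   <- isTRUE(NA)              # FALSE: NA is not TRUE

# identical() compares the whole object, including its type
int_vs_dbl <- identical(1L, 1)       # FALSE: integer versus double
equal_vals <- 1L == 1                # TRUE: == compares the values only

# identical() returns one answer; == returns one answer per element
whole   <- identical(c(1, 2, 3), c(1, 2, 3))   # TRUE
by_elem <- c(1, 2, 3) == c(1, 2, 4)            # TRUE TRUE FALSE

print(c(two_trues, missing, int_vs_dbl, equal_vals, whole))
print(by_elem)
```

Because identical() always returns a single TRUE or FALSE, it is the safer choice inside an if condition.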

xor

Exclusive OR takes two arguments. If exactly one argument evaluates to TRUE, it returns TRUE; otherwise it returns FALSE, including when both arguments are TRUE.

xor(5 == 6, !FALSE)
[1] TRUE
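The full set of cases is small enough to list. This sketch builds every combination of two logical inputs with expand.grid() and applies xor() to each row; the column names a, b, and result are my own.

```r
# All four combinations of two TRUE/FALSE inputs
inputs <- expand.grid(a = c(TRUE, FALSE), b = c(TRUE, FALSE))

# xor() is vectorised, so it handles every row at once
inputs$result <- xor(inputs$a, inputs$b)
print(inputs)
```

Only the two rows where a and b differ have result equal to TRUE.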

which

The which() function takes a logical vector as an argument and returns the indices of the vector that are TRUE

ints <- sample(10)
ints
 [1]  6  8  4  9  1  2  5 10  7  3
which(ints > 7)
[1] 2 4 8
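Since sample() shuffles differently on every run, your positions will not match the ones above. Here is a sketch that fixes the shuffle with set.seed() (the seed value 42 is arbitrary) and uses the indices to pull out the matching values.

```r
set.seed(42)          # arbitrary seed so the shuffle is repeatable
ints <- sample(10)

# which() returns positions, not values
idx <- which(ints > 7)
print(idx)

# Use the positions to get the values themselves
big_values <- ints[idx]
print(big_values)

# NA is treated like FALSE: only the TRUE positions are returned
na_case <- which(c(FALSE, TRUE, NA, TRUE))
print(na_case)        # 2 4
```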

any

The function any() takes a logical vector as an argument and returns TRUE if one or more of the elements in the logical vector is TRUE.

any(ints<0)
[1] FALSE

all

Similar to any(), but every element must be TRUE for it to return TRUE.

all(ints>0)
[1] TRUE
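Missing values change how any() and all() behave, so here is a small sketch with made-up vectors. The na.rm argument is part of both functions and tells them to ignore NA.

```r
vals <- c(3, NA, -1)

# One element is definitely negative, so the NA does not matter
has_negative <- any(vals < 0)    # TRUE

# One element definitely fails, so the NA does not matter here either
all_positive <- all(vals > 0)    # FALSE

# When the NA could change the answer, the result is NA
unknown_any <- any(c(NA, FALSE))                  # NA
unknown_all <- all(c(NA, TRUE))                   # NA

# na.rm = TRUE drops missing values before checking
known_any <- any(c(NA, FALSE), na.rm = TRUE)      # FALSE
known_all <- all(c(NA, TRUE), na.rm = TRUE)       # TRUE

print(c(has_negative, all_positive, unknown_any, unknown_all, known_any, known_all))
```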

Pipes


A pipe is a tool in R for expressing a sequence of multiple operations. It passes the output of one function into the next function as its input.

%>%

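The magrittr pipe is written as a % sign, followed by a > sign, and another % sign. Pipes can make your code easier to read and understand.

For example, this pipe filters and sorts the data. It uses R's native pipe, |>, which works the same way here; filter() and arrange() come from dplyr. In RStudio on Windows you can insert a pipe with CTRL+SHIFT+M.

ToothGrowth |> filter(dose == 0.5) |> arrange(len)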
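To see what a pipe actually does, here is a sketch that needs no extra packages. It uses the native pipe |>, which I'm assuming is available (it requires R 4.1.0 or later), and base R's subset() in place of dplyr's filter().

```r
# Piped version: read it left to right, one step at a time
piped <- ToothGrowth |>
  subset(dose == 0.5) |>
  nrow()

# Nested version: the same steps, read from the inside out
nested <- nrow(subset(ToothGrowth, dose == 0.5))

print(piped)                # 20
identical(piped, nested)    # TRUE
```

Both versions do the same work; the piped one just lists the steps in the order they happen.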

Swirl


https://swirlstats.com/

swirl is distributed as an R package, so you can install it from RStudio:

install.packages("swirl")

# after you install it once, just load the library as usual
library("swirl") 

# to start swirling use
swirl()