getwd()
[1] "D:/yourdataiq/datawr/rfordata"
#setwd("....blah...")
I’m going to assume you have already installed R and are possibly using RStudio as an IDE (as I am throughout this document).
You’ll often set up each project in its own directory to simplify matters. In RStudio it’s easy to create a project and set the directory for it: use the File dropdown menu and choose to create a project.
You’ll also often have to import a data file from another directory, or for some other reason need to know which directory you are actually in at the moment.
Self-explanatory:
You’ll make use of numerous packages in R. Of course R comes with many base functions but as an analyst you’ll need access to many more than the basics.
As you write scripts in separate files you might need to reference a new package. I like to keep all my packages in one file, not spread out over the individual files.
Here is a line that installs a package only if it is needed:
Loading required package: dplyr
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Or, if you simply want to install a package:
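The code for these two cases is not shown here, so the following is only a minimal sketch of how they might look. The helper name `ensure_package` is hypothetical (it is not from this page), and it is called with `stats`, which ships with R, so the sketch runs without downloading anything.

```r
# Hypothetical helper: load a package, installing it first only when it is missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)            # only reached when the package is absent
  }
  library(pkg, character.only = TRUE)
}

ensure_package("stats")   # stats ships with R, so nothing is installed here

# To simply install a package unconditionally (needs internet access):
# install.packages("dplyr")
```

Keeping a helper like this in one file matches the habit described above of collecting all package handling in a single place.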
Tilde operator is used to define the relationship between dependent variable and independent variables in a statistical model formula. The variable on the left-hand side of tilde operator is the dependent variable and the variable(s) on the right-hand side of tilde operator is/are called the independent variable(s). So, tilde operator helps to define that dependent variable depends on the independent variable(s) that are on the right-hand side of tilde operator. (retrieved from tutorialspoint.com)
In a call such as facet_wrap(~species), the formula is one-sided: species, on the right-hand side of the tilde, is the variable used to split the data, and nothing appears on the left-hand side. facet_wrap is not a dependent variable; it is a function that separates the data into one panel of the plot per value of that variable.
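To make the two uses of the tilde concrete, here is a small sketch using base R only. The choice of `lm()` and the built-in `cars` dataset is my own for illustration; the variable name `speed_only` is hypothetical.

```r
data(cars)

# Two-sided formula: dist (left) depends on speed (right).
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# One-sided formula: only a right-hand side, as used for faceting.
speed_only <- ~ speed
print(class(speed_only))      # "formula"
print(all.vars(speed_only))   # "speed"
```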
Just type ? followed by the function name (for example, ?mean) and a help page opens in the other tab/window.
R is case-sensitive, so keep that in mind.
It can be pretty time-consuming to type out lots of values. To save time, we can use variables to represent the values. This lets us call out the values any time we need to with just the variable. Earlier, we learned about variables in SQL.
A variable is a representation of a value in R that can be stored for use later during programming. Variables can also be called objects.
As a data analyst, you’ll find variables are very useful when programming. For example, if you want to filter a dataset, assign the result of the filtering function to a variable. That way, you can reuse the filtered data later just by using that variable.
When naming a variable in R, you can use a short phrase.
A variable name should start with a letter and can also contain numbers and underscores.
So the variable 5penguin wouldn’t work, because it starts with a number. Also, just like functions, variable names are case-sensitive. Using all lowercase letters is good practice whenever possible.
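A short sketch of these naming rules and of storing a filtered result for later use. The variable names are hypothetical, and the filter condition on the built-in `cars` dataset is only an illustration.

```r
# Valid name: starts with a letter, uses an underscore.
penguin_count <- 5

# Names are case-sensitive, so this is a different variable.
Penguin_count <- 50
print(penguin_count == Penguin_count)   # FALSE

# Store a filtered dataset once, then reuse it through the variable.
data(cars)
fast_cars <- cars[cars$speed > 20, ]
print(nrow(fast_cars))
```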
Arithmetic operators let you perform basic math operations like addition, subtraction, multiplication, and division.
The table below summarizes the different arithmetic operators in R. The examples used in the table are based on the creation of two variables: x equals 2 and y equals 5. Note that you use the assignment operator to store these values:
Operator | Description | Example Code | Result/ Output |
---|---|---|---|
+ | Addition | x + y | [1] 7 |
- | Subtraction | x - y | [1] -3 |
* | Multiplication | x * y | [1] 10 |
/ | Division | x / y | [1] 0.4 |
%% | Modulus (returns the remainder after division) | y %% x | [1] 1 |
%/% | Integer division (returns an integer value after division) | y %/% x | [1] 2 |
^ | Exponent | y ^ x | [1] 25 |
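The rows of the table can be reproduced with the sketch below, using the same x and y values as above.

```r
x <- 2
y <- 5

print(x + y)    # 7
print(x - y)    # -3
print(x * y)    # 10
print(x / y)    # 0.4
print(y %% x)   # 1, remainder of 5 divided by 2
print(y %/% x)  # 2, integer part of 5 divided by 2
print(y ^ x)    # 25
```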
Relational operators, also known as comparators, allow you to compare values. Relational operators identify how one R object relates to another—like whether an object is less than, equal to, or greater than another object. The output for relational operators is either TRUE or FALSE (which is a logical data type, or boolean).
Operator | Description | Example Code | Result/Output |
---|---|---|---|
< | Less than | x < y | [1] TRUE |
> | Greater than | x > y | [1] FALSE |
<= | Less than or equal to | x <= 2 | [1] TRUE |
>= | Greater than or equal to | y >= 10 | [1] FALSE |
== | Equal to | y == 5 | [1] TRUE |
!= | Not equal to | x != 2 | [1] FALSE |
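As a quick check of the table, the comparisons can be run as follows, again with x equal to 2 and y equal to 5.

```r
x <- 2
y <- 5

print(x < y)     # TRUE
print(x > y)     # FALSE
print(x <= 2)    # TRUE
print(y >= 10)   # FALSE
print(y == 5)    # TRUE
print(x != 2)    # FALSE
```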
Logical operators return a logical data type such as TRUE or FALSE.
Operator | Description |
---|---|
& | Element-wise logical AND |
&& | Logical AND |
\| | Element-wise logical OR |
\|\| | Logical OR |
! | Logical NOT |
There are three primary types of logical operators: AND (& or &&), OR (| or ||), and NOT (!).
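A minimal sketch of the difference between the element-wise and single-value forms, using small example vectors of my own choosing.

```r
x <- 2
y <- 5

# Element-wise forms compare vectors position by position.
print(c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE))    # TRUE FALSE FALSE
print(c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, TRUE))   # TRUE FALSE TRUE

# Doubled forms work on single values and return a single value.
print(x > 1 && y > 10)   # FALSE
print(x > 1 || y > 10)   # TRUE

# NOT flips a logical value.
print(!(x > 1))          # FALSE
```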
Assignment operators let you assign values to variables.
In many scripting programming languages you can just use the equal sign (=) to assign a variable. For R, the best practice is to use the arrow assignment (<-). Technically, the single arrow assignment can be used in the left or right direction. But the rightward assignment is not generally used in R code.
You can also use the double arrow assignment, known as a scoping assignment. But the scoping assignment is for advanced R users, so you won’t learn about it in this reading.
To assign a value to a variable, we use one of the assignment operators.
The table below summarizes the assignment operators and example code in R. Notice that the output for each variable is its assigned value.
Operator | Description | Example Code (after the sample code below, typing x will generate the output in the next column) | Result/ Output |
---|---|---|---|
<- | Leftwards assignment | x <- 2 | [1] 2 |
<<- | Leftwards assignment | x <<- 7 | [1] 7 |
= | Leftwards assignment | x = 9 | [1] 9 |
-> | Rightwards assignment | 11 -> x | [1] 11 |
->> | Rightwards assignment | 21 ->> x | [1] 21 |
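The non-scoping operators from the table can be tried in sequence, as in this sketch; each line overwrites the previous value of x.

```r
x <- 2     # leftwards assignment, the usual style in R
print(x)

x = 9      # also works, but <- is preferred
print(x)

11 -> x    # rightwards assignment, rarely used
print(x)
```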
A conditional statement is a declaration that if a certain condition holds, then a certain event must take place. For example, “If the temperature is above freezing, then I will go outside for a walk.” If the condition is true (the temperature is above freezing), then the event will occur (I will go for a walk). Conditional statements in R code have a similar logic. See the Control Structures Loops page.
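The walking example might be written in R roughly like this. The variable names and the temperature value are assumptions made for illustration.

```r
temperature_c <- 4   # hypothetical reading, in degrees Celsius

if (temperature_c > 0) {
  plan <- "go outside for a walk"
} else {
  plan <- "stay inside"
}

print(plan)
```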
data(cars) loads the prebuilt dataset cars.
To print the first 5 rows and first five columns of a large dataset
tail(cars, 15) gives us the last 15 rows. If we omit the second argument, we get the last 6 rows.
dim(cars) gives us the row and column counts.
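A short sketch of these inspection calls on `cars`. Since `cars` has only two columns, the row-and-column indexing below selects the first five rows and both columns; on a wider dataset you would use `1:5` for the columns as well.

```r
data(cars)

print(cars[1:5, 1:2])   # first 5 rows, first 2 columns
print(head(cars, 5))    # first 5 rows, all columns

last_15 <- tail(cars, 15)   # last 15 rows
last_6  <- tail(cars)       # default: last 6 rows

print(dim(cars))        # rows, then columns
```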
Is covered in detail in Summarize - Arrange
table(cars) gives us a cross-tabulated count of the data:
dist
speed 2 4 10 14 16 17 18 20 22 24 26 28 32 34 36 40 42 46 48 50 52 54 56 60 64
4 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 1 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0
15 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dist
speed 66 68 70 76 80 84 85 92 93 120
4 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 1 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0
18 0 0 0 1 0 1 0 0 0 0
19 0 1 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0
24 0 0 1 0 0 0 0 1 1 1
25 0 0 0 0 0 0 1 0 0 0
str(cars) gives us the structure of the data.
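The calls behind these two summaries are not shown here; I am assuming they are `table()` and `str()`, which produce output of this shape. A minimal sketch on the same dataset:

```r
data(cars)

# Structure: object type, dimensions, and the type of each column.
str(cars)

# Counts per value of a single column are easier to read than the full cross-table.
speed_counts <- table(cars$speed)
print(speed_counts)
```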
isTRUE() returns TRUE if its argument evaluates to TRUE; otherwise it returns FALSE.
identical() returns TRUE if the two R objects passed to it are identical.
xor() is exclusive or. It takes two arguments: if one evaluates to TRUE and the other to FALSE, it returns TRUE; otherwise it returns FALSE, even if both arguments are TRUE.
The which() function takes a logical vector as an argument and returns the indices of the vector that are TRUE
The function any() takes a logical vector as an argument and returns TRUE if one or more of the elements in the logical vector is TRUE.
all() is similar to any(), but every element has to be TRUE for it to return TRUE.
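The logical helper functions described above can be tried on a small vector. I am assuming the unnamed ones are base R's `isTRUE()`, `identical()`, `xor()`, and `all()`; the vector `ints` is a made-up example.

```r
ints <- c(3, 8, 12, 1)

print(isTRUE(6 > 4))                  # TRUE
print(identical("twins", "twins"))    # TRUE
print(xor(TRUE, FALSE))               # TRUE
print(xor(TRUE, TRUE))                # FALSE

print(which(ints > 7))                # 2 3, positions of the TRUE elements
print(any(ints < 0))                  # FALSE
print(all(ints > 0))                  # TRUE
```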
A pipe is a tool in R for expressing a sequence of multiple operations. It’s used to apply the output of one function into another function.
A pipe is written %>%: a % sign, followed by a > sign, and another % sign. Pipes can make your code easier to read and understand.
For example, a pipe can filter and then sort the data. In RStudio on Windows you can insert it with CTRL+SHIFT+M.
The %>% operator is not part of base R. It comes from the magrittr package and is also made available when you load dplyr.
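A filter-then-sort pipe might look like the sketch below. It assumes the dplyr package is installed; the dataset, the filter condition, and the name `fast_cars` are my own choices for illustration.

```r
library(dplyr)

data(cars)

# Read top to bottom: take cars, keep the fast ones, sort by stopping distance.
fast_cars <- cars %>%
  filter(speed > 20) %>%
  arrange(desc(dist))

print(fast_cars)
```

Without the pipe, the same steps would be nested as `arrange(filter(cars, speed > 20), desc(dist))`, which has to be read from the inside out.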
# comments
As in some SQL dialects (MySQL, for example), we use # to start a comment in R.