#---------------------- Load files
<- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/subject_test.txt")
subject_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/x_test.txt")
X_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/y_test.txt")
y_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/subject_train.txt")
subject_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/X_train.txt")
X_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/y_train.txt")
y_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/features.txt")
features <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/activity_labels.txt") labelfile
Merge
Countless times we’ll need to extract data from one source and merge it with data from another source. I’ll cover many real cases here up to a point where it becomes redundant and duplicate.
Tables & Dfs
More on this example can be found in Case Study: Samsung Fitbit - Tidying.
Data
Let’s bring in the data:
- Function to bring in the data (covered in function document)
- Data contains both training: _train and testing: _test data
- We won’t use the code below to bring in the data as Quarto has a problem making connection via a function
- So we’ll use direct read statements below
- Import these .txt files: subject_test, x_test, y_test, subject_train, x-train, y-train, features, activity_labels
- Note that I used both the function loadfile_to_table() and read.table()
Preview
- using str() we get the following description of each df
- subject_test: data.frame of 2947 rows and 1 variable
- y_test: data.frame of 2947 rows of 1 variable
- X_test: data.frame of 2947 rows of 561 variables
#----------------------- Preview
head(subject_test)
V1
1 2
2 2
3 2
4 2
5 2
6 2
str(subject_test)
'data.frame': 2947 obs. of 1 variable:
$ V1: int 2 2 2 2 2 2 2 2 2 2 ...
head(y_test)
V1
1 5
2 5
3 5
4 5
5 5
6 5
str(y_test)
'data.frame': 2947 obs. of 1 variable:
$ V1: int 5 5 5 5 5 5 5 5 5 5 ...
dim(X_test)
[1] 2947 561
Cbind
As you see from the structure:
- subject_test, y_test & X_test all have the same number of rows
- different number of columns
- cbind will work well with same number of rows
- we do the same with the _train data as well
<- cbind(subject_test,y_test,X_test)
testfile <- cbind(subject_train,y_train,X_train)
trainfile
dim(testfile)
[1] 2947 563
dim(trainfile)
[1] 7352 563
Rbind
We’ll combine both training and testing cbound/horizontally merged datasets from above but now we’ll merge them vertically: add one to the bottom of the other.
<- rbind(testfile,trainfile)
mergedfile dim(mergedfile)
[1] 10299 563