Merge

Countless times we’ll need to extract data from one source and merge it with data from another source. I’ll cover many real cases here up to a point where it becomes redundant and duplicate.

Tables & Dfs

More on this example can be found in Case Study: Samsung Fitbit - Tidying.

Data

Let’s bring in the data:

Function to bring in the data (covered in function document)
Data contains both training: _train and testing: _test data
We won’t use the code below to bring in the data as Quarto has a problem making connection via a function
So we’ll use direct read statements below
Import these .txt files: subject_test, x_test, y_test, subject_train, x-train, y-train, features, activity_labels
Note that I used both the function loadfile_to_table() and read.table()

Preview

using str() we get the following description of each df
subject_test: data.frame of 2947 rows and 1 variable
y_test: data.frame of 2947 rows of 1 variable
X_test: data.frame of 2947 rows of 561 variables

#---------------------- Load files
subject_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/subject_test.txt") 
X_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/x_test.txt")
y_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/y_test.txt")
subject_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/subject_train.txt")
X_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/X_train.txt")
y_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/y_train.txt")
features <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/features.txt")
labelfile <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/activity_labels.txt")

#----------------------- Preview

head(subject_test)

str(subject_test)

'data.frame':   2947 obs. of  1 variable:
 $ V1: int  2 2 2 2 2 2 2 2 2 2 ...

head(y_test)

str(y_test)

'data.frame':   2947 obs. of  1 variable:
 $ V1: int  5 5 5 5 5 5 5 5 5 5 ...

dim(X_test)

[1] 2947  561

Cbind

As you see from the structure:

subject_test, y_test & X_test all have the same number of rows
different number of columns
cbind will work well with same number of rows
we do the same with the _train data as well

testfile <- cbind(subject_test,y_test,X_test)
trainfile <- cbind(subject_train,y_train,X_train)

dim(testfile)

[1] 2947  563

dim(trainfile)

[1] 7352  563

Rbind

We’ll combine both training and testing cbound/horizontally merged datasets from above but now we’ll merge them vertically: add one to the bottom of the other.

mergedfile <- rbind(testfile,trainfile)
dim(mergedfile)

[1] 10299   563