Merge


Countless times we’ll need to extract data from one source and merge it with data from another source. I’ll cover many real cases here up to a point where it becomes redundant and duplicate.

Tables & Dfs


More on this example can be found in Case Study: Samsung Fitbit - Tidying.

Data

Let’s bring in the data:

  • Function to bring in the data (covered in function document)
  • Data contains both training: _train and testing: _test data
  • We won’t use the code below to bring in the data as Quarto has a problem making connection via a function
  • So we’ll use direct read statements below
  • Import these .txt files: subject_test, x_test, y_test, subject_train, x-train, y-train, features, activity_labels
  • Note that I used both the function loadfile_to_table() and read.table()

Preview

  • using str() we get the following description of each df
  • subject_test: data.frame of 2947 rows and 1 variable
  • y_test: data.frame of 2947 rows of 1 variable
  • X_test: data.frame of 2947 rows of 561 variables
#---------------------- Load files
subject_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/subject_test.txt") 
X_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/x_test.txt")
y_test <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/test/y_test.txt")
subject_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/subject_train.txt")
X_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/X_train.txt")
y_train <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/train/y_train.txt")
features <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/features.txt")
labelfile <- read.table("D:/Education/R/Data/har/unzipped/UCI HAR Dataset/activity_labels.txt")
#----------------------- Preview

head(subject_test)
  V1
1  2
2  2
3  2
4  2
5  2
6  2
str(subject_test)
'data.frame':   2947 obs. of  1 variable:
 $ V1: int  2 2 2 2 2 2 2 2 2 2 ...
head(y_test)
  V1
1  5
2  5
3  5
4  5
5  5
6  5
str(y_test)
'data.frame':   2947 obs. of  1 variable:
 $ V1: int  5 5 5 5 5 5 5 5 5 5 ...
dim(X_test)
[1] 2947  561

Cbind


As you see from the structure:

  • subject_test, y_test & X_test all have the same number of rows
  • different number of columns
  • cbind will work well with same number of rows
  • we do the same with the _train data as well
testfile <- cbind(subject_test,y_test,X_test)
trainfile <- cbind(subject_train,y_train,X_train)

dim(testfile)
[1] 2947  563
dim(trainfile)
[1] 7352  563

Rbind


We’ll combine both training and testing cbound/horizontally merged datasets from above but now we’ll merge them vertically: add one to the bottom of the other.

mergedfile <- rbind(testfile,trainfile)
dim(mergedfile)
[1] 10299   563