Convert Column

data

library(gt)
library(dplyr)

Conversion Key

There are times when observations are obscure and make no logical sense but are easier to collect that way, so we go ahead and collect them in the most efficient way and when it’s time to process we can always convert the data to something more understandable
Often it is more cost effective to follow this process than spend the resources designing mechanisms to alter the collection methods
Let’s look at extracteddata

extracteddata[ ,1:3] |> 
        gt_preview(top_n = 5)

	subject	activity	tBodyAcc-mean()-X
1	2	5	0.2571778
2	2	5	0.2860267
3	2	5	0.2754848
4	2	5	0.2702982
5	2	5	0.2748330
6..10298
10299	30	2	0.3515035

#Let's look at the way the activities are displayed in the column before conversion
unique(extracteddata$activity)

[1] 5 4 6 1 3 2

column to be edited

extracteddata is brought over from how_to_extract page
Looking at the first 3 columns of extracteddata
The second column named activity contains numbers from 1-6 which correspond to the keys for each activity
It would make more sense to replace the integers in column 2 with their respective description “walking, standing….

conversion keys

gt_preview(labelfile)

	V1	V2
1	1	WALKING
2	2	WALKING_UPSTAIRS
3	3	WALKING_DOWNSTAIRS
4	4	SITTING
5	5	STANDING
6	6	LAYING

so far we’ve identified activity as the column to edit
we’ll use the keys found in labelfile to swap out the values in activity
note that the key values are all capitalized, so we’ll convert to lower case as well

case_match

case_match is identical to CASE WHEN in SQL and goes back decades as older languages had similar functions
I’ll break this down into smaller steps:
- we know extracteddata$activity is a number 1:6
- we know labelfile(1,2) = WALKING
- we know labelfile(2,2) = WALKING_UPSTAIRS….
- extracteddata$activity contains values 1:6 so we can match the value in extracteddata$activity (which is 1:6) with the row number of labelfile and use the value column 2 of labelfile to assign to the old value of extracteddata$acitivity
- before we assign the value we convert it to lower case
- here is the code broken down by steps
- we have already seen labelfile and extracteddata$activity above, let’s look at what it would look like if we use extracteddata$activity as the row number of labelfile
- as you see now that new_labelfile has the same number of rows as extracteddata, and the new_labelfile$activity column is not longer numbers from 1:6 but the assigned values from the key df labelfile

new_labelfile <- labelfile[extracteddata$activity, 2]
gt_preview(new_labelfile, top_n = 5)

	data
1	STANDING
2	STANDING
3	STANDING
4	STANDING
5	STANDING
6..10298
10299	WALKING_UPSTAIRS

all we do now is use case_match to assign the values in new_labelfile column 2 to extracteddata$activity
after we convert to lower case
the code below shows the process

extracteddata$activity <- case_match(
        extracteddata$activity,
        extracteddata$activity ~ tolower(new_labelfile))

one line of code

#---------- Convert labelfile to lower case and assign to their numeric values above
extracteddata$activity <- case_match( 
        extracteddata$activity, 
        extracteddata$activity ~ tolower(labelfile[extracteddata$activity,2]))

confirm results

Let’s look at the activity labels after the case_match
And the first 3 columns of extracteddata

unique(extracteddata$activity)

[1] "standing"           "sitting"            "laying"            
[4] "walking"            "walking_downstairs" "walking_upstairs"

extracteddata[ ,1:3] |>  
        gt_preview(top_n = 5)

	subject	activity	tBodyAcc-mean()-X
1	2	standing	0.2571778
2	2	standing	0.2860267
3	2	standing	0.2754848
4	2	standing	0.2702982
5	2	standing	0.2748330
6..10298
10299	30	walking_upstairs	0.3515035