library(gt)
library(dplyr)
library(tidyverse)
library(stats) # for quantile
Combine Columns
Data
We’ll be using another version of the EPA PM2.5 data from the previous EPA - EDA pages. This version has been modified and saved locally in several CSV files.
- pm0 contains data from 1999
- pm1 contains data from 2012
- cnames.txt is a variable containing a list of character strings for the column names
- wcol_df is a data frame that holds the indices of the 5 columns we work with
- site0 and site1 hold the sensors in the NY State area for 1999 and 2012 respectively; each sensor is identified by County.Code and Site.ID concatenated with “.” as the separator
- both contains the list of sensors that were in use in NY State in both 1999 and 2012
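For orientation, the relationship between site0, site1 and both can be sketched with base R's intersect(). The identifiers below are made up for illustration and the `_demo` names are hypothetical; the real vectors come from the saved files.

```r
# Hypothetical site identifiers in "County.Code.Site.ID" form (not the real data)
site0_demo <- c("1.5", "1.12", "5.80", "27.1")   # sensors reporting in 1999
site1_demo <- c("1.5", "5.80", "29.5", "101.3")  # sensors reporting in 2012

# Keep only the sensors that appear in both years
both_demo <- intersect(site0_demo, site1_demo)
print(both_demo)
```

Because the identifiers are character strings, "5.80" and "5.8" would count as different sensors, which is why the same separator and format must be used everywhere.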
Case Study
Let’s concatenate the County.Code and Site.ID columns into a single column
- call it county.site, with the values separated by “.”, just as in the column name
- the resulting values will have the same form as the values in both, so we can match against them and complete our analysis
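As a quick illustration of what this concatenation produces, here is a minimal sketch on a few made-up county and site codes (the vectors `county` and `site` are hypothetical, not taken from the data files).

```r
# Made-up county and site codes, stored as integers
county <- c(27L, 3L, 3L)
site   <- c(1L, 10L, 1L)

# paste() is vectorized: it joins the two vectors element by element
county_site <- paste(county, site, sep = ".")
print(county_site)

# The result is character, so site 10 and site 1 in county 3
# stay distinct ("3.10" vs "3.1") instead of collapsing as numbers would
print(class(county_site))
print(county_site[2] == county_site[3])
```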
Paste
# Add a county.site key to each year's data
named_pm0 <- named_pm0 |>
  mutate(county.site = paste(County.Code, Site.ID, sep = "."))
named_pm1 <- named_pm1 |>
  mutate(county.site = paste(County.Code, Site.ID, sep = "."))
head(named_pm0)
  State.Code County.Code Site.ID     Date Sample.Value county.site
1          1          27       1 19990103           NA        27.1
2          1          27       1 19990106           NA        27.1
3          1          27       1 19990109           NA        27.1
4          1          27       1 19990112        8.841        27.1
5          1          27       1 19990115       14.920        27.1
6          1          27       1 19990118        3.878        27.1
head(named_pm1)
  State.Code County.Code Site.ID     Date Sample.Value county.site
1          1           3      10 20120101          6.7        3.10
2          1           3      10 20120104          9.0        3.10
3          1           3      10 20120107          6.5        3.10
4          1           3      10 20120110          7.0        3.10
5          1           3      10 20120113          5.8        3.10
6          1           3      10 20120116          8.0        3.10
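With county.site in place, the rows for sensors listed in both could be selected along these lines. This is only a sketch on a tiny invented data frame: `pm_demo` and `both_demo` are hypothetical stand-ins for named_pm0 / named_pm1 and both, and the filtering step is an assumption about how the analysis continues.

```r
library(dplyr)

# Invented miniature of the PM2.5 data (not the real measurements)
pm_demo <- data.frame(
  County.Code  = c(27, 27, 3, 3),
  Site.ID      = c(1, 1, 10, 2),
  Sample.Value = c(8.841, 14.920, 6.7, 9.0)
)

# Hypothetical list of sensors present in both years
both_demo <- c("27.1", "3.2")

# Build the key, then keep only rows whose key appears in both_demo
pm_sub <- pm_demo |>
  mutate(county.site = paste(County.Code, Site.ID, sep = ".")) |>
  filter(county.site %in% both_demo)

print(pm_sub)
```

Matching on the character key with %in% avoids comparing two columns at once and keeps the same subsetting code usable for either year.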