Import & Export

Import

download.file

download.file() will download any file, whether it is csv, xls, or another format.
We’ve already created the directory we’ll use.
Let’s say we have to download a .zip file from a site.
Set a time marker, dateDownloaded, so you can always tell which version of the data you are working with in case the data gets updated.

# keep the URL on one line: a line break inside the quotes becomes part of the string
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

download.file(fileUrl, destfile = "zipped.zip", method="curl")
dateDownloaded <- date()
dateDownloaded           # you can always print out date() without saving it
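
A minimal sketch of the same download wrapped so that it is skipped when the file is already on disk. The helper name download_once is hypothetical, and mode = "wb" is an assumption added here so that binary files such as .zip are written unchanged on Windows.

```r
# Hypothetical helper (not part of the original code): download a file
# only when it is not already present.
download_once <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))   # file already there, nothing downloaded
  }
  # create the target directory if needed
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" writes the file in binary mode, which matters for .zip on Windows
  download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}
```

It could be called as download_once(fileUrl, "zipped.zip"), followed by dateDownloaded <- date() to record the time.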

Unzip

Use unzip to extract the entire archive when you do not need to see the list of files first,
or when you have already seen it as described in the section below.

unzip("zipped.zip", exdir= "D:/~/Data/har/unzipped")

Load

RDS

You can pass the file path straight to readRDS, but I tend to break the code into two parts: open a connection, then read from it

con1 <- file("D:/Education/R/Data/EPA/summarySCC_PM25.rds")
con2 <- file("D:/Education/R/Data/EPA/Source_Classification_Code.rds")
NEI <- readRDS(con1)
SCC <- readRDS(con2)
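
A small round trip, assuming a throwaway file in a temporary location rather than the EPA files: saveRDS writes a single R object and readRDS restores it when given the path directly.

```r
# Throwaway example data; the EPA files above are not needed for this sketch
scores <- data.frame(id = 1:3, score = c(90, 85, 77))

rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)          # write one R object to disk

restored <- readRDS(rds_path)      # read it back by passing the path directly
identical(scores, restored)        # compare the restored object with the original
```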

Zipped

zipped .bz2

A zipped .bz2 file can be read directly with read.csv

# keep the path on one line: a line break inside the quotes becomes part of the path
storm_data <- read.csv("D:/Education/R/Data/JH_C5_week2/repdata_data_StormData.csv.bz2", header = TRUE)
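
To see this without the storm data, here is a sketch that writes a tiny bzip2-compressed CSV to a temporary file and reads it back. The data and file name are made up for illustration.

```r
# Write a small bzip2-compressed CSV to a temporary file (illustrative data)
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con, row.names = FALSE)
close(con)

# read.csv detects the compression and decompresses while reading
demo <- read.csv(bz_path, header = TRUE)
str(demo)
```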

Continuing with the “zipped.zip” example above: at times the zipped folder contains many files.
You can list the files within the zipped folder prior to unzipping it.
The reason: if you only need 1 or 2 files and not an entire large dataset, you can read those files specifically.

List files

zipped

You can list all the files in the zipped folder with the same command used to extract them, unzip, by setting list=TRUE. The result is a data frame with the columns Name, Length and Date.

all_files <- unzip("zipped.zip", list=TRUE)
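
Once the listing is available, only the wanted files need to be extracted. The sketch below assumes a hypothetical helper, extract_matching, that picks archive members by a regular expression and passes them to the files argument of unzip.

```r
# Hypothetical helper: extract only the archive members whose names match a pattern.
extract_matching <- function(zipfile, pattern, exdir = tempdir()) {
  listing <- unzip(zipfile, list = TRUE)            # data frame: Name, Length, Date
  wanted  <- grep(pattern, listing$Name, value = TRUE)
  if (length(wanted) == 0) {
    return(character(0))                            # nothing matched, extract nothing
  }
  unzip(zipfile, files = wanted, exdir = exdir)     # returns the extracted paths
}
```

For example, extract_matching("zipped.zip", "activity_labels\\.txt$") would extract only files whose names end in activity_labels.txt.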

File List

lapply

If you have a list of wanted files that you chose from above, or possibly all the files in a directory,
you can use lapply to go through the list and read them.
lapply returns a list, so the output is a list of data frames, one for each file.

# all_files is a data frame, so iterate over its Name column
# (the files must already be extracted, with paths relative to the working directory)
dataIn <- lapply(all_files$Name, read.csv)
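
The same pattern for all the files in a directory, sketched with two made-up CSV files in a temporary folder: list.files supplies the paths and lapply returns one data frame per file.

```r
# Create two small CSV files in a temporary directory (illustrative data)
csv_dir <- tempfile("lapply_demo")
dir.create(csv_dir)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(csv_dir, "b.csv"), row.names = FALSE)

# Full paths of every .csv file in the directory
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, named after the file it came from
dataIn <- lapply(csv_files, read.csv)
names(dataIn) <- basename(csv_files)
```

Naming the list elements makes it possible to pick a single table later, for example dataIn[["a.csv"]].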

read.table

refer to Basics - In & Out
as handy as read.table is, it has some drawbacks
a major one is that it reads the whole file into RAM, so large datasets might cause issues
you can substitute read.csv, or read_csv from the readr package, which is usually faster; note that both still load the data into memory

labelfile <- read.table("D:/~/har/activity_labels.txt")
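
When a file is large, a sketch like the following limits how much is read: nrows caps the number of rows and colClasses spares read.table from guessing the column types. The file and its contents are made up for illustration.

```r
# Illustrative file: a header line plus whitespace-separated rows
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id label", "1 WALKING", "2 SITTING", "3 STANDING"), txt_path)

# Read only the first two data rows and declare the column types up front
first_rows <- read.table(txt_path, header = TRUE, nrows = 2,
                         colClasses = c("integer", "character"))
```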

read.csv

pm0 <- read.csv("D:/yourdataiq/dataiq/datasets/pm0.csv")

readLines

use readLines instead of read.table for .txt files that are not tabular; it returns a character vector with one element per line

cnames <- readLines("D:/yourdataiq/dataiq/datasets/cnames.txt")
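
A short sketch with a made-up text file shows what readLines returns, and how the n argument limits the number of lines read.

```r
# Illustrative file with one column name per line
names_path <- tempfile(fileext = ".txt")
writeLines(c("Date", "State", "Sample.Value"), names_path)

cnames_demo <- readLines(names_path)          # every line, one element each
first_line  <- readLines(names_path, n = 1)   # only the first line
```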

Function

What if you want the user to input the directory, file name, and extension?
Create a function that does just that.
Sometimes it’s just easier to write the code directly, but coding is meant to make our lives easier, so here is such a function.
Quarto doesn’t work with a function to read the files, as it cannot establish a connection, but in an R script it works (see how_to_merge).

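loadfile_to_table <- function(directory, name, extension){ 
        # base data directory; setwd() is avoided because it returns the previous directory, not the new one
        fileDir <- "D:/~/Data/har" 
        # build <fileDir>/<directory>/<name><extension>
        wantedfile <- file.path(fileDir, directory, paste0(name, extension)) 
        read.table(wantedfile) 
        }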

then you just call it using

subject_test <- loadfile_to_table("test","subject_test",".txt")

Save

File Output

.txt & .csv

I’ll save both files in .txt and .csv formats
Verify the files were saved in the correct directory
Confirm operation with a timestamp

 library(readr)
 if(!file.exists("har/meanPerSubject.csv"))
         {write_csv(persubfile,"har/meanPerSubject.csv")}

 #______Save in txt format as well using both write.table & write_csv
 if(!file.exists("har/meanPerSubject.txt"))
         {write.table(persubfile,"har/meanPerSubject.txt")}
 if(!file.exists("har/meanPerActivity.txt"))
         {write_csv(peractivityfile,"har/meanPerActivity.txt")}
 
 list.files("har")
 dateUploaded <- date()
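
The write-if-missing check and the verification step can be combined in one place. The sketch below assumes a hypothetical helper, save_if_missing, and swaps in base write.csv for write_csv so that it runs without readr.

```r
# Hypothetical helper: write a data frame to CSV only if the file is missing,
# then report whether the file exists and when it was last modified.
save_if_missing <- function(df, path) {
  if (!file.exists(path)) {
    write.csv(df, path, row.names = FALSE)
  }
  data.frame(file = basename(path),
             exists = file.exists(path),
             modified = file.info(path)$mtime)
}

# Illustrative data written to a temporary file
out_path <- tempfile(fileext = ".csv")
save_if_missing(data.frame(subject = 1:2, mean = c(0.27, 0.28)), out_path)
```

A second call with the same path leaves the existing file untouched and only reports its status.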

png Output

save png

We can save a plot as a png with exact dimensions:
Here we first process the data
Set the png() function and parameters
Plot the graph, which is written to the png device instead of the screen
The .png file is not finalized until we close the device with dev.off()

library(dplyr)

# total emissions per year, one row per year
emm_year <- NEI |> 
        group_by(year) |> 
        summarise(Emm_per_year = sum(Emissions))

png(filename = "D:/yourdataiq/dataiq/images/plot1.png",
    width=480, height = 480, units = "px")
with(emm_year,
     plot(year,Emm_per_year, type="l", col="green",
          lwd=2, ylab="Total PM2.5 emissions"))
dev.off()
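
The open, plot, close sequence can be wrapped so the device is always closed. This is a sketch with a hypothetical helper, save_line_png; on.exit is used so that dev.off() runs even if plot() stops with an error.

```r
# Hypothetical helper: draw a line plot straight into a PNG file.
save_line_png <- function(x, y, path, width = 480, height = 480) {
  png(filename = path, width = width, height = height, units = "px")
  on.exit(dev.off())                 # close the device even if plot() fails
  plot(x, y, type = "l", lwd = 2)
  invisible(path)
}
```

It could be called as save_line_png(emm_year$year, emm_year$Emm_per_year, "plot1.png").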
