Dates - Times
Dates and times deserve their own page, as they behave a bit differently from the data types we’ve seen so far.
Packages
library(tidyverse) # loads dplyr, which provides mutate()
library(lubridate)
library(stringr) # string helpers (also loaded by tidyverse)
In R, there are three types of data that refer to an instant in time:
- A date (“2016-08-16”)
- A time within a day (“20:11:59 UTC”)
- And a date-time. This is a date plus a time (“2018-03-31 18:15:48 UTC”)
The time is given in UTC, which stands for Coordinated Universal Time. This is the primary standard by which the world regulates clocks and time.
- Dates are represented as objects of the Date class
- Dates are stored internally as the number of days since 1970-01-01
- Times are represented by the POSIXct or the POSIXlt class
- POSIXct stores a time internally as the number of seconds since 1970-01-01 00:00:00 UTC
- POSIXlt stores a time as a list of components (seconds, minutes, hours, ...)
Lubridate
ymd
When you run the function, R returns the date in yyyy-mm-dd format. It works the same way for any order of components: for month, day, and year, for example, you use mdy(), and R still returns the date in yyyy-mm-dd format.
These functions also take unquoted numbers and convert them into the yyyy-mm-dd format.
ymd("20210120") # the unquoted number works too: ymd(20210120)
[1] "2021-01-20"
ymd("2021-01-20")
[1] "2021-01-20"
mdy
mdy("January 20th, 2021")
[1] "2021-01-20"
dmy
Or, day, month, and year. R still returns the date in yyyy-mm-dd format.
dmy("20-Jan-2021")
[1] "2021-01-20"
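As a quick check of the idea that the component order only changes which function you call, here is a minimal sketch that parses the three example strings from above and compares the results. The variable names are made up for illustration.

```r
library(lubridate)

# Three spellings of the same calendar day, one parser per component order
d_ymd <- ymd("2021-01-20")
d_mdy <- mdy("January 20th, 2021")
d_dmy <- dmy("20-Jan-2021")

# Each result is a Date object, not a string
class(d_ymd)

# All three hold the same value
d_ymd == d_mdy && d_mdy == d_dmy
```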
Dates
today
To get the current date, you can run the today() function. The date appears as year, month, and day.
today()
[1] "2024-12-04"
now
To get the current date-time you can run the now() function. Note that the time appears to the nearest second.
now()
[1] "2024-12-04 09:22:52 CST"
When working with R, there are three ways you are likely to create a date or date-time:
- From a string
- From individual date-time components
- From an existing date/time object
R prints dates in the standard yyyy-mm-dd format by default.
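The string route is covered by ymd() and friends. For the second route, building a value from separate components, lubridate offers make_date() and make_datetime(). The sketch below assumes the components are already available as numbers; the values are made up.

```r
library(lubridate)

# Build a date from separate year, month, and day values
d <- make_date(year = 2021, month = 1, day = 20)
d

# Build a date-time from components; tz = "UTC" is the default, stated here for clarity
dt <- make_datetime(year = 2021, month = 1, day = 20,
                    hour = 20, min = 11, sec = 59, tz = "UTC")
dt
```

This is handy when a dataset stores year, month, and day in separate columns.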
Convert
Integer to Date
date_integer <- c(20120101, 20120104, 20120107, 20120110, 20120113, 20120116,
                  20120119, 20120122)
date_integer
[1] 20120101 20120104 20120107 20120110 20120113 20120116 20120119 20120122
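as.Date
- The first option is base R's as.Date(). It needs a character string and a format, so we convert the integers with as.character() first.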
as.Date(as.character(date_integer), "%Y%m%d")
[1] "2012-01-01" "2012-01-04" "2012-01-07" "2012-01-10" "2012-01-13"
[6] "2012-01-16" "2012-01-19" "2012-01-22"
ymd
- Or we can use ymd(), which accepts the integers directly
ymd(date_integer)
[1] "2012-01-01" "2012-01-04" "2012-01-07" "2012-01-10" "2012-01-13"
[6] "2012-01-16" "2012-01-19" "2012-01-22"
String to Date
as.Date
Dates can be coerced from a character string to a date using as.Date(). The result prints like a character string, but it is a Date object, not a string.
# Coerce a 'Date' object from character
x <- as.Date("1970-01-01")
x
[1] "1970-01-01"
class(x)
[1] "Date"
unclass
You can see the internal representation of a Date object by using the unclass() function. Remember that a date is stored as the number of days since 1970-01-01, so earlier dates are negative.
unclass(x)
[1] 0
unclass(as.Date("1970-01-02"))
[1] 1
unclass(as.Date("1960-12-25"))
[1] -3294
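Going the other way can help confirm what those numbers mean. This is a small sketch using base R only: as.Date() turns a day count back into a date when you give it the origin.

```r
# Day counts convert back to dates when you supply the origin
as.Date(0, origin = "1970-01-01")      # the origin itself
as.Date(-3294, origin = "1970-01-01")  # the 1960-12-25 example from above

# Because dates are day counts, adding a number moves the date by that many days
as.Date("1970-01-01") + 1
```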
Date/time data often comes as strings. You can convert strings into dates and date-times using the tools provided by lubridate. These tools automatically work out the date/time format.
- First, identify the order in which the year, month, and day appear in your dates.
- Then, arrange the letters y, m, and d in the same order.
- That gives you the name of the lubridate function that will parse your date.
For example, for the date 2021-01-20, the order is year, month, day, so you use ymd().
format(as.Date)
- We already saw how to use as.Date(). Now we can wrap it in format(), as in format(as.Date()), which does what it sounds like.
- Use it when we want to convert a string to a date and display it in a specific format. Note that format() returns a character string, not a Date.
- I’ll start the code here and the rest will be used in Extract further down in the Convert section.
- All we do is convert started_at with as.Date(), format it with the format = option, and use mutate() to store the result in a new column called date.
test_trip <- trips19 |>
  mutate(date = format(as.Date(started_at), format = "%m%d%Y"))
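To see what that expression does without the trips19 data, here is a sketch on a single value. The timestamp is made up and only stands in for one started_at entry.

```r
# A made-up timestamp string standing in for one started_at value
started_at <- "2019-01-01 00:04:37"

# as.Date() keeps the calendar date; format() then returns a character string
date_chr <- format(as.Date(started_at), format = "%m%d%Y")
date_chr         # "01012019"
class(date_chr)  # "character"
```

Because the result is text, it no longer sorts or subtracts like a date, so it is usually best kept as a display column next to the real Date.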
Date to String
toString
y = toString("12/26/2024") # toString() returns a single character string
y
[1] "12/26/2024"
format
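This one is very similar to format(as.Date()). When the column is already a Date, format() alone turns it into a string in the layout we ask for.
beDate = format(latest_weight$ActivityDate, "%m-%d-%y")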
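A self-contained version of the same idea, using one made-up date instead of the latest_weight data:

```r
# A single Date value to convert
d <- as.Date("2024-12-26")

format(d, "%m/%d/%Y")  # "12/26/2024"
format(d, "%m-%d-%y")  # "12-26-24"
as.character(d)        # "2024-12-26", the default layout
```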
Date to Datetime
The ymd() function and its variations create dates.
ymd_hms
To create a date-time from a date, add an underscore and one or more of the letters h, m, and s (hours, minutes, seconds) to the name of the function:
ymd_hms("2021-01-20 20:11:59")
[1] "2021-01-20 20:11:59 UTC"
mdy_hm("01/20/2021 08:01")
[1] "2021-01-20 08:01:00 UTC"
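The functions above parse strings. If you already hold a Date object, one option is lubridate's as_datetime(), which places the date at midnight UTC. A minimal sketch with a made-up date:

```r
library(lubridate)

d <- ymd("2021-01-20")

# Convert the Date to a date-time; the time part defaults to midnight UTC
dt <- as_datetime(d)
dt
class(dt)

# Add a time of day afterwards if one is known
dt + hours(20) + minutes(11) + seconds(59)
```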
Datetime to Date
OK, so how about when we want to switch back to a date? Earlier we used now() and got this value:
now()
[1] "2024-12-04 09:22:52 CST"
What if we want the date only?
as_date
as_date(now())
[1] "2024-12-04"
Split & Extract
To extract a specific day, month, or year from a date (we already used mdy() earlier to parse one), we can use:
day
wday
wday() returns the day of the week. The week_start argument sets which day counts as the first day of the week: 1 = Monday, 7 = Sunday.
wday(x, label=TRUE, week_start = 1) # for monday
month
year
Here is an example where we parse ActivityDate with mdy() and mutate it into a new column, DateOfActivity, so we don’t edit the original data. Then we extract the day, month, and year and mutate each into a new column (day, month, year).
dailyactivity_df_3_4 %>%
  mutate(DateOfActivity = mdy(ActivityDate)) %>%
  mutate(day = day(DateOfActivity)) %>%
  mutate(month = month(DateOfActivity)) %>%
  mutate(year = year(DateOfActivity)) %>%
  glimpse()
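The same steps can be tried on a tiny made-up data frame. The activity table and its two dates are hypothetical, and the wday() results assume lubridate's default week start (Sunday) has not been changed through options.

```r
library(lubridate)

# Hypothetical stand-in for the ActivityDate column
activity <- data.frame(ActivityDate = c("4/12/2016", "12/4/2024"))

# Parse into a new column so the original text stays untouched
activity$DateOfActivity <- mdy(activity$ActivityDate)

# Pull out the pieces
activity$day   <- day(activity$DateOfActivity)
activity$month <- month(activity$DateOfActivity)
activity$year  <- year(activity$DateOfActivity)

# Day of the week as a number: Sunday = 1 by default, Monday = 1 with week_start = 1
activity$wday_sun <- wday(activity$DateOfActivity)
activity$wday_mon <- wday(activity$DateOfActivity, week_start = 1)

activity
```

The first date is a Tuesday and the second a Wednesday, so wday_sun holds 3 and 4 while wday_mon holds 2 and 3.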
timeframes
Here is a complete example I pulled from a project I worked on, where I took a certain column (started_at), converted it to a date with as.Date(), then formatted the value as month, day, 4-digit year and saved it all in date.
I also extracted the month, the weekday, and the quarter as strings.
# Break started_at into date, weekday, month, quarter, day, and year columns
test_trip <- trips19 %>%
  mutate(
    date = format(as.Date(started_at), format = "%m%d%Y"), # month, day, 4-digit year as text
    week_day = weekdays(started_at), # full weekday name
    month_wor = months(started_at), # full month name
    quarter = quarters(started_at), # quarter, such as "Q1"
    num_day = day(started_at), # day of the month as a number
    blah = wday(started_at), # day of the week as a number, Sunday = 1
    blah_blah = wday(started_at, label = TRUE), # abbreviated weekday name
    blue = format(as.Date(started_at), format = "%A"), # same as week_day
    month = format(as.Date(started_at), format = "%m"), # month as a two-digit string
    day = format(as.Date(started_at), format = "%d"), # same day as num_day, but as text
    year = format(as.Date(started_at), format = "%Y") # %Y is the 4-digit year, %y the 2-digit year
  )
Time
You can always load the lubridate package if you plan on working with time
POSIXct is just a very large number under the hood (the seconds since 1970-01-01 UTC). It is a useful class when you want to store times in something like a data frame. You can coerce a number using as.POSIXct()
POSIXlt is a list underneath, and it stores other useful information such as the day of the week, day of the year, month, and day of the month. You can coerce to it using as.POSIXlt()
Some generic functions that work on dates and times:
- weekdays: gives the day of the week
- months: gives the month name
- quarters: gives the quarter ("Q1" to "Q4")
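As a sketch of coercing a number with as.POSIXct(), the snippet below turns a count of seconds into a date-time and back. The number is the one that unclass() prints on this page; the time zone is fixed to UTC here so the result does not depend on your machine, and the names returned by weekdays() and months() depend on your locale.

```r
# Seconds since 1970-01-01 00:00:00 UTC
secs <- 1733325773

# Coerce the number to POSIXct; origin and tz are stated explicitly
ct <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
ct

# The number is still there underneath
as.numeric(ct)

# The generic helpers work directly on the POSIXct value
weekdays(ct)
months(ct)
quarters(ct)
```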
POSIXlt
x <- Sys.time()
x
[1] "2024-12-04 09:22:52 CST"
p <- as.POSIXlt(x)
names(unclass(p))
[1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday"
[9] "isdst" "zone" "gmtoff"
p$sec #extract seconds
[1] 52.7057
p$mday #day of the month
[1] 4
POSIXct
So you see here that a POSIXct value doesn’t have a list underneath. We can coerce it from POSIXct to POSIXlt by using as.POSIXlt()
x <- Sys.time()
x
[1] "2024-12-04 09:22:52 CST"
unclass(x)
[1] 1733325773
p <- as.POSIXct(x)
p
[1] "2024-12-04 09:22:52 CST"
unclass(p) #so you see it's already in POSIXct format, it doesn't have any list elements
[1] 1733325773
names(unclass(p))
NULL
x$sec #this gives an error because POSIXct has no list elements, so I need to coerce it first
coerce
Since Sys.time() returns POSIXct by default, I can coerce it to POSIXlt by using as.POSIXlt()
p <- as.POSIXlt(x)
p$sec
[1] 52.80618
#or for a new one here:
t1 <- Sys.time()
t1
[1] "2024-12-04 09:22:52 CST"
class(t1)
[1] "POSIXct" "POSIXt"
unclass(t1)
[1] 1733325773
t2 <- as.POSIXlt(Sys.time())
class(t2)
[1] "POSIXlt" "POSIXt"
t2
[1] "2024-12-04 09:22:53 CST"
unclass(t2)
$sec
[1] 53.04481
$min
[1] 22
$hour
[1] 9
$mday
[1] 4
$mon
[1] 11
$year
[1] 124
$wday
[1] 3
$yday
[1] 338
$isdst
[1] 0
$zone
[1] "CST"
$gmtoff
[1] -21600
attr(,"tzone")
[1] "" "CST" "CDT"
attr(,"balanced")
[1] TRUE
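Two of those components are easy to misread: mon counts months from 0 and year counts years from 1900, which is why December 2024 shows up as mon 11 and year 124. A short sketch with a fixed time (UTC is chosen only to make it reproducible):

```r
# A fixed POSIXlt value so the results do not depend on the current time
lt <- as.POSIXlt("2024-12-04 09:22:53", tz = "UTC")

lt$mon + 1      # calendar month: 12
lt$year + 1900  # calendar year: 2024
lt$yday + 1     # day of the year counting from 1: 339
lt$wday         # 0 = Sunday, so 3 is Wednesday
```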
str(unclass)
For a more compact view of the unclassed object, wrap it in str()
#compact view of the list components
str(unclass(t2))
List of 11
$ sec : num 53
$ min : int 22
$ hour : int 9
$ mday : int 4
$ mon : int 11
$ year : int 124
$ wday : int 3
$ yday : int 338
$ isdst : int 0
$ zone : chr "CST"
$ gmtoff: int -21600
- attr(*, "tzone")= chr [1:3] "" "CST" "CDT"
- attr(*, "balanced")= logi TRUE
extract elements
If we want just the minutes from t2 above, we use
t2$min
[1] 22
weekdays
This function returns the day of the week. Let’s store today’s date in d1 and extract the day of the week from it
d1 <- today()
weekdays(d1)
[1] "Wednesday"
months
months() works the same way for the month of the year, on both dates and times
months(d1)
[1] "December"
months(t1)
[1] "December"
strptime
strptime() converts character strings to date-times when they are written in a different format. You have to pass it a format string as an argument that describes how the text is laid out. Look at the examples below, and check ?strptime for details
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
x <- strptime(datestring, "%B %d, %Y %H:%M")
x
[1] "2012-01-10 10:40:00 CST" "2011-12-09 09:10:00 CST"
class(x)
[1] "POSIXlt" "POSIXt"
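A variation on the example above, sketched under two assumptions: the session uses English month names (the %B code depends on the locale), and the times should be read as UTC rather than the local zone.

```r
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")

# Passing tz makes the result independent of the machine's time zone
x <- strptime(datestring, "%B %d, %Y %H:%M", tz = "UTC")
x

# strptime() returns POSIXlt; convert if a compact POSIXct value is wanted
as.POSIXct(x)

# Text that does not match the format becomes NA instead of raising an error
strptime("2012-01-10", "%B %d, %Y %H:%M", tz = "UTC")
```

Checking for NA after parsing is a cheap way to catch rows whose text did not match the format string.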
strptime 2
Store this in t3: “October 17, 1986 08:24”, then parse it into t4
t3 <- "October 17, 1986 08:24"
t4 <- strptime(t3, "%B %d, %Y %H:%M")
t4
[1] "1986-10-17 08:24:00 CDT"
difftime
We can subtract one time from another. For example, we assigned Sys.time() to t1 earlier, so we can check whether the time now is later than t1 and then subtract one from the other. Or we can use difftime(), which lets us set the units we want back. Of course, if the two values are within seconds of each other, as in this example, asking for the difference in days is not very useful because it will be close to zero. But let’s use it anyway:
Sys.time() > t1 #this will tell us if time has elapsed
[1] TRUE
Sys.time() - t1 #gives us the difference, in this case in seconds
Time difference of 0.3285639 secs
difftime(Sys.time(), t1, units ='days')
Time difference of 4.098898e-06 days
difftime() calculates the difference between two time values. In many cases it gives the right answer in a form that is not convenient for further calculations. In the example below, it gave the correct value in seconds but returned it as a difftime object (shown as drtn in the dataset), so each value is followed by its unit, like this: 360 secs, 445 secs, and so on.
So I coerced it into a number with as.numeric() right in the code, which gave us the value without the secs part.
#____________________CALCULATE ride_length AND DROP tripduration
all_trips19_20 <- all_trips19_20 %>%
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = 'secs')))
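Here is the same conversion on a single made-up ride, so the effect of as.numeric() is easy to see. The two timestamps are hypothetical and fixed to UTC.

```r
# One made-up ride: start and end times
started_at <- as.POSIXct("2019-01-01 00:04:37", tz = "UTC")
ended_at   <- as.POSIXct("2019-01-01 00:11:07", tz = "UTC")

# difftime() keeps the unit attached to the value
d <- difftime(ended_at, started_at, units = "secs")
d         # Time difference of 390 secs
class(d)  # "difftime"

# as.numeric() drops the unit and leaves a plain number
as.numeric(d)                  # 390
as.numeric(d, units = "mins")  # 6.5
```

Stating units explicitly in difftime() avoids surprises, since the automatic choice changes with the size of the gap.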
as.numeric
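Aside from using difftime(), you can subtract two dates directly and convert the result to a number of days with as.numeric(). The result is negative when the first date is the earlier one.
date1 <- as.Date("1970-01-01")
date1
[1] "1970-01-01"
date2 <- as.Date("2012-06-21")
date2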
[1] "2012-06-21"
as.numeric(date1 - date2)
[1] -15512