library(tidyverse) # attaches dplyr, tidyr and ggplot2
library(gt)        # table formatting
Severe weather effects - U.S.
Case Study
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer two basic questions about severe weather events:
Which types of severe weather events are most harmful with respect to population health in the U.S.?
Which types of severe weather events have the greatest economic impact?
Data Processing
Packages
Read data
Since the data comes as a bzip2-compressed CSV file (.csv.bz2), read.csv() can read it directly; there is no need to decompress it first.
storm_data <- read.csv("D:/Education/R/Data/JH_C5_week2/repdata_data_StormData.csv.bz2", header = TRUE)
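To illustrate reading a compressed file directly, here is a minimal sketch that writes a tiny made-up data frame to a temporary .csv.bz2 file and reads it back. The toy data and the temporary path are hypothetical and only stand in for the NOAA file.

```r
# Hypothetical toy data standing in for the storm database
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it to a temporary bzip2-compressed CSV
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy_back <- read.csv(path, header = TRUE)
```

The same call works on an uncompressed .csv, so the reading code does not need to change if the file is unpacked later.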
Data types
Let’s check that the columns we need are stored as types we can do arithmetic on.
str(storm_data)
Impact on health
The first goal of our study is to find the storm events that are most harmful to population health, so we’ll need to:
Group the data by storm event and sum the fatalities and injuries attributed to each one
Rank the events from most to least impactful in order to answer our question
Aggregate
aggregate() will allow us to group by EVTYPE and sum
Keep fatalities and injuries in separate data frames
Order the totals in descending order and choose the 10 most harmful events to show in the results section
# Total fatalities per event type, sorted from most to least
impact_fatal <- storm_data |>
  aggregate(FATALITIES ~ EVTYPE, sum)
impact_fatal <- impact_fatal[order(impact_fatal$FATALITIES, decreasing = TRUE), ]
# Total injuries per event type, sorted from most to least
impact_injured <- storm_data |>
  aggregate(INJURIES ~ EVTYPE, sum)
impact_injured <- impact_injured[order(impact_injured$INJURIES, decreasing = TRUE), ]
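The group, sum and rank steps can be sketched on a handful of made-up records. The toy data frame below is hypothetical; only the column names follow the storm database.

```r
# Hypothetical records: one row per event report
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 2, 4, 0)
)

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to least harmful and keep the top two
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(totals, 2)
```

Here head(totals, 2) plays the role that the [1:10, ] subset plays in the real analysis.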
Tables
In this section we’ll present the data in table format, showing only the 10 most impactful events, that is, the events that do the most harm to human health.
Fatalities
table_fatal <- impact_fatal[1:10, ] |>
  gt() |>
  tab_header(title = md("**Number of Fatalities per Event**"),
             subtitle = "10 most impactful events") |>
  tab_options(table.align = "left", table.width = pct(50))
Injuries
table_injured <- impact_injured[1:10, ] |>
  gt() |>
  tab_header(title = md("**Number of Injuries per Event**"),
             subtitle = "10 most impactful events") |>
  tab_options(table.align = "left", table.width = pct(50))
Combine the two tables
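# Combine the underlying data frames (not the gt objects) side by side;
# the columns become fatal.EVTYPE, fatal.FATALITIES, injured.EVTYPE, injured.INJURIES
tables <- data.frame(fatal = impact_fatal[1:10, ], injured = impact_injured[1:10, ])
tables |>
  gt() |>
  cols_label(
    fatal.EVTYPE = md("**Event**"),
    fatal.FATALITIES = md("**Fatalities**"),
    injured.EVTYPE = md("**Event**"),
    injured.INJURIES = md("**Injuries**")
  ) |>
  tab_header(title = md("**Event Type and Effect on Population of the U.S.**"),
             subtitle = "10 most impactful events")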
Economic Consequences
As with the effects on fatalities and injuries, let’s calculate the damage storms cause to property and crops. Before we proceed with the aggregation, note that the relevant data is spread across 5 columns:
colnames(storm_data[c(8,25:28)])
[1] "EVTYPE" "PROPDMG" "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
Multipliers
The documentation implies that the “..EXP” columns hold one of the following abbreviations: “K”, “M”, “B”
We can verify that with:
unique(storm_data$PROPDMGEXP)
[1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
“..EXP” columns are multipliers for values in the “..DMG” columns
K for thousands, M for millions, B for billions and we’ll ignore the rest
We need to convert the “..DMG” columns to real figures before we aggregate over the values
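Before ignoring the other codes, it can help to see how often each one occurs. The sketch below does this on a short made-up vector of codes; on the real data the same calls would be applied to storm_data$PROPDMGEXP. The sample values are hypothetical.

```r
# Hypothetical exponent codes, a small sample of the kinds listed above
exp_codes <- c("K", "K", "M", "", "B", "m", "5", "K", "?", "")

# Frequency of each code, most common first
code_counts <- sort(table(exp_codes), decreasing = TRUE)

# Share of rows whose code is one of the three documented multipliers
share_kmb <- mean(exp_codes %in% c("K", "M", "B"))
```

If the share on the real data is close to the share of rows with non-zero damage, ignoring the remaining codes should have little effect on the totals.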
Convert damage costs columns
We’ll multiply each “..DMG” value by the multiplier its “..EXP” code stands for, giving plain dollar amounts we can add up.
storm_data <- storm_data |>
  # Property damage in dollars; codes other than K, M, B are ignored (NA)
  mutate(PROPDMG_COST = case_when(
    PROPDMGEXP == "K" ~ PROPDMG * 1e3,
    PROPDMGEXP == "M" ~ PROPDMG * 1e6,
    PROPDMGEXP == "B" ~ PROPDMG * 1e9,
    TRUE ~ NA_real_)
  ) |>
  # Crop damage in dollars, using the crop exponent column
  mutate(CROPDMG_COST = case_when(
    CROPDMGEXP == "K" ~ CROPDMG * 1e3,
    CROPDMGEXP == "M" ~ CROPDMG * 1e6,
    CROPDMGEXP == "B" ~ CROPDMG * 1e9,
    TRUE ~ NA_real_)
  )
The remaining codes could be handled in a similar way, for example by treating “2”, “h” and “H” as a multiplier of 10^2:
#storm.selected$PROPDMGEXP[(storm.selected$PROPDMGEXP == "2") | (storm.selected$PROPDMGEXP == "h") | (storm.selected$PROPDMGEXP == "H")] <- 10^2
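One way to cover several codes at once is a named lookup vector. The sketch below is only an illustration: K, M and B come from the text above, while reading “h”/“H” as hundreds, “m” as millions and a digit n as 10^n is an assumption that the documentation may not support. The damage values are made up.

```r
# Assumed meaning of each code; only K, M and B come from the text above
multiplier <- c(
  K = 1e3, M = 1e6, B = 1e9,             # documented abbreviations
  h = 1e2, H = 1e2, m = 1e6,             # assumed: hundreds, lower-case million
  setNames(10^(0:8), as.character(0:8))  # assumed: digit n means 10^n
)

# Hypothetical damage figures and their codes
dmg  <- c(2.5, 10, 1, 3, 7)
code <- c("K", "M", "2", "?", "")

# Codes with no entry in the lookup ("?", "") become NA
cost <- dmg * unname(multiplier[code])
```

Keeping the mapping in one vector makes the assumptions easy to review and change without touching the conversion itself.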
Aggregate
So let’s group and sum the costs to property and crops for each event:
aggregate() will group by EVTYPE and sum
Order the totals in descending order and choose the 10 most costly events to show in the results section
# Total property damage per event type, sorted from most to least
impact_prop <- storm_data |>
  aggregate(PROPDMG_COST ~ EVTYPE, sum)
impact_prop <- impact_prop[order(impact_prop$PROPDMG_COST, decreasing = TRUE), ]
# Total crop damage per event type, sorted from most to least
impact_crop <- storm_data |>
  aggregate(CROPDMG_COST ~ EVTYPE, sum)
impact_crop <- impact_crop[order(impact_crop$CROPDMG_COST, decreasing = TRUE), ]
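Rows whose cost is NA (for example, because the exponent code was blank) do not break the sums: the formula interface of aggregate() drops them by default (na.action = na.omit). A minimal sketch on made-up values:

```r
# Hypothetical costs; the second FLOOD row has no usable multiplier
toy <- data.frame(
  EVTYPE       = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG_COST = c(5e6, NA, 3e6, 1e6)
)

# The NA row is omitted before summing, so FLOOD totals 5e6 rather than NA
impact <- aggregate(PROPDMG_COST ~ EVTYPE, data = toy, FUN = sum)
```

An event type whose rows are all NA would not appear in the result at all.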
Results
Question 1
- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Answer
Tornadoes appear to be the most harmful with respect to population health, with 5633 fatalities and 91346 injuries.
Plot
# Five deadliest event types, ordered by fatalities
figure1 <- ggplot(impact_fatal[1:5, ],
                  aes(x = reorder(EVTYPE, -FATALITIES),
                      y = FATALITIES,
                      fill = EVTYPE))
figure1 + geom_col(show.legend = FALSE, width = 0.8, color = "black") +
  coord_flip() +
  labs(x = "Weather Event Type",
       y = "Number of Fatalities") +
  theme_bw()
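The bar order in the plot comes from reorder(), which sorts the factor levels by the supplied values in ascending order; negating the values therefore puts the largest first. A small sketch with made-up numbers:

```r
# Hypothetical event types and counts
events <- c("A", "B", "C")
counts <- c(10, 30, 20)

# Levels sorted by -counts ascending, i.e. by counts descending
ordered_events <- reorder(events, -counts)
event_levels <- levels(ordered_events)
```

With coord_flip() the first level is drawn at the bottom of the axis, so the largest bar appears at the bottom; dropping the minus sign would put it at the top instead.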
Tables
Fatalities and Injuries per Event
table_fatal <- impact_fatal[1:10, ] |>
  gt() |>
  tab_header(title = md("**Number of Fatalities per Event**"),
             subtitle = "10 most impactful events") |>
  tab_options(table.align = "left", table.width = pct(50))
table_injured <- impact_injured[1:10, ] |>
  gt() |>
  tab_header(title = md("**Number of Injuries per Event**"),
             subtitle = "10 most impactful events") |>
  tab_options(table.align = "left", table.width = pct(50))
Event Type and Effect on Population of the U.S.: 10 most impactful events
Event | Fatalities | Event | Injuries |
---|---|---|---|
TORNADO | 5633 | TORNADO | 91346 |
EXCESSIVE HEAT | 1903 | TSTM WIND | 6957 |
FLASH FLOOD | 978 | FLOOD | 6789 |
HEAT | 937 | EXCESSIVE HEAT | 6525 |
LIGHTNING | 816 | LIGHTNING | 5230 |
TSTM WIND | 504 | HEAT | 2100 |
FLOOD | 470 | ICE STORM | 1975 |
RIP CURRENT | 368 | FLASH FLOOD | 1777 |
HIGH WIND | 248 | THUNDERSTORM WIND | 1488 |
AVALANCHE | 224 | HAIL | 1361 |
Question 2
- Across the United States, which types of events have the greatest economic consequences?
Answer
Floods appear to be the most costly in property damage, at about $144.66 B, while drought causes the most crop damage, at about $13.97 B.
Plot
# Five costliest event types by property damage
figure2 <- ggplot(impact_prop[1:5, ],
                  aes(x = reorder(EVTYPE, -PROPDMG_COST),
                      y = PROPDMG_COST,
                      fill = EVTYPE))
figure2 + geom_col(show.legend = FALSE, width = 0.8, color = "black") +
  coord_flip() +
  labs(x = "Weather Event Type",
       y = "Property Damage (USD)") +
  theme_bw()
Tables
Damage to Property
Property Damage Cost per Event: 10 most impactful events
Event | Property damage (USD) |
---|---|
FLOOD | 144.66B |
HURRICANE/TYPHOON | 69.31B |
TORNADO | 56.93B |
STORM SURGE | 43.32B |
FLASH FLOOD | 16.14B |
HAIL | 15.73B |
HURRICANE | 11.87B |
TROPICAL STORM | 7.70B |
WINTER STORM | 6.69B |
HIGH WIND | 5.27B |
Damage to Crops
Crop Damage Cost per Event: 10 most impactful events
Event | Crop damage (USD) |
---|---|
DROUGHT | 13.97B |
FLOOD | 5.66B |
RIVER FLOOD | 5.03B |
ICE STORM | 5.02B |
HAIL | 3.03B |
HURRICANE | 2.74B |
HURRICANE/TYPHOON | 2.61B |
FLASH FLOOD | 1.42B |
EXTREME COLD | 1.29B |
FROST/FREEZE | 1.09B |