Collaborators: moi, yo, myself
Required libraries:
library(tidyverse)
Read the COVID-19 data:
covid <- read_csv("../data/coronavirus.csv",
col_types = cols(Admin2 = col_character(),
Date = col_date('%m/%d/%Y')))
sampleSD <- function() sd(as.numeric(str_trim(str_split(readline('Enter samples, comma-separated: '), ',')[[1]])))
kurtosis <- function(x) mean((x - mean(x))^4)
normSamp <- rnorm(10^5)
heavyTail <- normSamp^2 * sign(normSamp)
thinTail <- sqrt(abs(normSamp)) * sign(normSamp)
kurtosis(normSamp)
## [1] 3.040607
kurtosis(thinTail)
## [1] 1.005853
kurtosis(heavyTail)
## [1] 109.2112
ggplot() +
geom_histogram(aes(x = normSamp, fill = 'normal'), alpha = 0.5) +
geom_histogram(aes(x = heavyTail, fill = 'heavy'), alpha = 0.5) +
geom_histogram(aes(x = thinTail, fill = 'thin'), alpha = 0.5) +
scale_fill_discrete()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
sampleStat <- function() {
fName <- str_trim(readline('Sample statistic name: ' ))
samps <- as.numeric(str_trim(str_split(readline('Enter samples, comma-separated: '), ',')[[1]]))
do.call(fName, list(samps))
}
covid %>%
select(Date, Country_Region, Province_State, Admin2, Case_Type, Cases) %>%
spread(Case_Type, Cases) ->
covid
marchUSA <- covid %>% filter(Country_Region == 'US',
Date >= '2020-03-01',
Date <= '2020-03-31')
ggplot(marchUSA %>% filter(Province_State == "Washington"),
aes(x = Date, y = Confirmed, color = Admin2)) +
geom_smooth() + geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
marchUSA$IFR0 <- marchUSA$Deaths/marchUSA$Confirmed
kingCo <- marchUSA %>% filter(Province_State == 'Washington',
Admin2 == 'King') %>% arrange(Date)
marchUSAtot <- marchUSA %>%
group_by(Date) %>%
summarize(Deaths = sum(Deaths),
Confirmed = sum(Confirmed)) %>%
mutate(IFR0 = Deaths/Confirmed)
ggplot() +
geom_smooth(data = kingCo, aes(x = Date, y = IFR0, color = 'King best guess')) +
geom_smooth(data = kingCo, aes(x = Date, y = IFR0*1.5, color = 'King high')) +
geom_smooth(data = kingCo, aes(x = Date, y = IFR0/10, color = 'King low')) +
geom_smooth(data = marchUSAtot, aes(x = Date, y = IFR0, color = 'USA best guess')) +
geom_smooth(data = marchUSAtot, aes(x = Date, y = IFR0*1.5, color = 'USA high')) +
geom_smooth(data = marchUSAtot, aes(x = Date, y = IFR0/10, color = 'USA low')) +
geom_abline(slope = 0, intercept = 0.001)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
From the March data, it appears at first glance that King County’s Infection Fatality Ratio is higher than the country’s as a whole: King County’s best-case scenario line is over twice as high as the country’s on the whole. However, the sensitivity scenarios show that this difference may be illusory: if the low scenario holds in King County (i.e. the virus spread much more widely than we initially thought, and 10x as many people had it as we believed), then King’s IFR would be below the best guess for the US. Other factors not on the plot, like the fact that the pandemic start in King County and thus there was more time for the disease to take its toll there, are not considered on this plot. Thus, even though the grey uncertainty estimates seem to indicate that the King best guess is notably higher than the US best guess, our sensitivity analysis indicates that the truth may be far less certain.
Finally, all scenarios’ fatality rates are well above that of the seasonal flu. Even if coronavirus isn’t as deadly as we think it is, it’s still worth worrying about much more than the flu, which justifies us taking it seriously.