I liked the Financial Times plots for tracking the evolution of COVID-19 (https://www.ft.com/coronavirus-latest), but they have since switched to different plots. So here I am more or less reproducing the original plots (and adding some others). This page is generated from an Rmarkdown document that I’ll be rerendering daily.
For now I am using the country data from https://covid.ourworldindata.org:
cases = read.csv("https://covid.ourworldindata.org/data/ecdc/total_cases.csv",
                 stringsAsFactors=FALSE)
cases$date = as.POSIXct(cases$date)
# doy = whole days elapsed since 2019-12-31
cases$doy = round(as.numeric(difftime(cases$date, as.POSIXct("2019-12-31"), units="days")))
deaths = read.csv("https://covid.ourworldindata.org/data/ecdc/total_deaths.csv",
                  stringsAsFactors=FALSE)
deaths$date = as.POSIXct(deaths$date)
deaths$doy = round(as.numeric(difftime(deaths$date, as.POSIXct("2019-12-31"), units="days")))
# World population data from worldbank.org:
temp = tempfile()
download.file("http://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv", temp)
pop = read.csv(unz(temp, "API_SP.POP.TOTL_DS2_en_csv_v2_1120881.csv"),
               skip=4, header=TRUE, stringsAsFactors=FALSE)
rownames(pop) = pop$Country.Name
#deaths.cty = colnames(deaths)
#deaths.cty = gsub(".", " ", deaths.cty, fixed=TRUE)
#ctymap = match(deaths.cty, rownames(pop))
# deaths.cty[which(is.na(ctymap))]
unlink(temp)
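The file name inside the World Bank zip carries a version suffix, so the hard-coded name above may stop matching after a data update. A minimal sketch of looking the name up instead, assuming the main data file is the only entry whose name starts with `API_` (the helper `find_data_file` and the example listing are hypothetical, not part of the original script):

```r
# Hypothetical helper: pick the main data file out of a zip listing.
# Assumption: the data file starts with "API_", the other entries with "Metadata_".
find_data_file = function(zip_names) {
  hits = grep("^API_.*\\.csv$", zip_names, value=TRUE)
  if (length(hits) != 1) stop("expected exactly one API_*.csv file, found ", length(hits))
  hits
}

# Made-up listing for illustration (nothing is downloaded here)
listing = c("Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2_1120881.csv",
            "API_SP.POP.TOTL_DS2_en_csv_v2_1120881.csv",
            "Metadata_Indicator_API_SP.POP.TOTL_DS2_en_csv_v2_1120881.csv")
data_file = find_data_file(listing)
```

In the script above, the listing would come from `unzip(temp, list=TRUE)$Name`, and `data_file` would replace the literal name passed to `unz()`. This has to run before `unlink(temp)`.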
Gray dashed lines are doubling times of 1, 2, 3, and 7 days (from steepest to shallowest)
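A rough sketch of how such reference lines can be computed: with a fixed doubling time of d days, a count that starts at n0 grows as n0 * 2^(t/d), which is a straight line with slope log10(2)/d on a log10 axis. The starting value of 10, the 28-day range, and the axis labels below are assumptions for illustration, not necessarily what the plots here use.

```r
# Expected count t days after reaching n0, for a fixed doubling time d (in days)
doubling_curve = function(t, d, n0=1) n0 * 2^(t / d)

days = 0:28
doubling_times = c(1, 2, 3, 7)
# One column per doubling time; n0=10 is an assumed starting threshold
ref = sapply(doubling_times, function(d) doubling_curve(days, d, n0=10))
colnames(ref) = paste0("d", doubling_times)

# On a log-scaled y axis each column is a straight line,
# steepest for d=1 and shallowest for d=7
matplot(days, ref, type="l", lty=2, col="gray", log="y",
        xlab="Days since 10 deaths", ylab="Cumulative deaths")
```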
By “naive” I mean: at any point in time, divide the cumulative number of deaths by the cumulative number of confirmed cases. This estimate is biased high because not all cases are detected (due to lack of testing), so the denominator is too small. On the other hand, it is biased low because some currently active cases will result in deaths later. It should eventually converge on the true case fatality rate (CFR) if testing becomes widespread and the epidemic dies down.
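As a minimal sketch of that calculation, using made-up cumulative counts (the helper `naive_cfr` and the `toy` data frame are hypothetical names, and the numbers are not real data):

```r
# Naive CFR: cumulative deaths divided by cumulative confirmed cases, day by day.
# Days with no confirmed cases yet are returned as NA instead of 0/0.
naive_cfr = function(deaths, cases) {
  ifelse(cases > 0, deaths / cases, NA)
}

# Made-up cumulative counts, for illustration only
toy = data.frame(doy=1:5,
                 cases=c(0, 10, 40, 100, 200),
                 deaths=c(0, 0, 1, 3, 8))
toy$cfr = naive_cfr(toy$deaths, toy$cases)
```

Applied to the real data, the same function would take matching columns of `deaths` and `cases` for one country.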
So far I’m using data collated by the NY Times. It is pretty strangely structured, but oh well:
states = read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv",
stringsAsFactors=FALSE)
states$date = as.POSIXct(states$date)
states$doy = round(as.numeric(difftime(states$date, as.POSIXct("2019-12-31"), units="days")))
# Get this into a more sensible structure:
# One row per day (including any days missing from the data), one column per state
us.deaths = t(tapply(states$deaths,
                     list(factor(states$state, levels=sort(unique(states$state))),
                          factor(states$doy, levels=min(states$doy):max(states$doy))),
                     identity))
us.deaths = as.data.frame(us.deaths)
dates = sort(unique(states$date))
us.deaths$date = dates
doys = sort(unique(states$doy))
us.deaths$doy = doys
us.cases = t(tapply(states$cases,
                    list(factor(states$state, levels=sort(unique(states$state))),
                         factor(states$doy, levels=min(states$doy):max(states$doy))),
                    identity))
us.cases = as.data.frame(us.cases)
us.cases$date = dates
us.cases$doy = doys
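The reshaping above gives one row per day between the first and last `doy`, while `dates` and `doys` only contain days that actually appear in the data. If a day were ever missing from the feed, the two would no longer line up. A small sketch on made-up data, rebuilding `doy` and `date` from the row names instead (this is an alternative I am assuming would be safer, not what the script above does):

```r
# Made-up long-format data in the same layout; day 3 is missing entirely
toy = data.frame(state=c("A", "B", "A", "B", "A"),
                 doy=c(1, 1, 2, 2, 4),
                 deaths=c(0, 1, 2, 3, 5))

# Same tapply reshaping as above: rows are days, columns are states
wide = t(tapply(toy$deaths,
                list(factor(toy$state),
                     factor(toy$doy, levels=min(toy$doy):max(toy$doy))),
                identity))
wide = as.data.frame(wide)

# Row names are the doy levels, so derive doy and date from them;
# doy counts days since 2019-12-31, as in the rest of the script
wide$doy = as.integer(rownames(wide))
wide$date = as.Date("2019-12-31") + wide$doy
```

The row for day 3 is kept and filled with `NA`, and every row still carries its correct date.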
Gray dashed lines are doubling times of 1, 2, 3, and 7 days (from steepest to shallowest)