Several R packages provide access to data on COVID-19 cases. In a previous post I tested coronavirus, an R package that provides access to data from Johns Hopkins University.
In this post I want to briefly introduce tidycovid19, a useful R package available on GitHub. In the code below I load the package (the comments show how to install it from GitHub):
```r
# library(devtools)
# install_github("joachim-gassen/tidycovid19")
library(tidycovid19)
```
With tidycovid19 you can retrieve data from several sources. The function “download_merged_data” retrieves data from all sources and merges it into a single data set. In the code below I call the function and display the variables contained in the data set (which is returned as a data frame):
```r
merged = download_merged_data(cached = TRUE)
colnames(merged)
## [1] "country" "iso3c" "date"
## [4] "confirmed" "deaths" "recovered"
## [7] "soc_dist" "mov_rest" "pub_health"
## [10] "soc_econ" "lockdown" "gcmr_retail_recreation"
## [13] "gcmr_grocery_pharmacy" "gcmr_parks" "gcmr_transit_stations"
## [16] "gcmr_workplaces" "gcmr_residential" "gtrends_score"
## [19] "gtrends_country_score" "region" "income"
## [22] "population" "land_area_skm" "pop_density"
## [25] "pop_largest_city" "life_expectancy" "gdp_capita"
## [28] "timestamp"
```
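Because the merged data set is an ordinary data frame keyed by country and date, a single country's time series can be pulled out with base R subsetting. The sketch below assumes that `download_merged_data(cached = TRUE)` succeeds and returns the columns listed above. The choice of Germany (`"DEU"`) and of the columns kept is only for illustration.

```r
library(tidycovid19)

# Download (or load the cached copy of) the merged data set
merged <- download_merged_data(cached = TRUE)

# Keep Germany only; which() drops any NA comparisons before row subsetting
deu <- merged[which(merged$iso3c == "DEU"),
              c("country", "date", "confirmed", "deaths")]

# Show the most recent observations for Germany
tail(deu)
```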
Data from a single source can be retrieved with a source-specific function. For example, “download_wbank_data” pulls data from the World Bank database.
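As a minimal sketch, the World Bank data alone could be downloaded and inspected as follows. This assumes that “download_wbank_data” accepts the same `cached` argument as “download_merged_data”. If your installed version of tidycovid19 differs, check its help page (`?download_wbank_data`).

```r
library(tidycovid19)

# Pull only the World Bank country-level indicators
# (assumption: cached = TRUE loads a pre-built copy, as for download_merged_data)
wbank <- download_wbank_data(cached = TRUE)

# List the variables provided by the World Bank download
colnames(wbank)
```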
The tidycovid19 package also provides a function for standard visualizations, which works well for quick exploration. The function, “plot_covid19_spread”, plots the development of cumulative deaths, starting from the day on which the respective country had recorded 100 COVID-19 cases in total.
Below I demonstrate the function using the merged data set, highlighting Germany, Spain and the USA. Setting edate_cutoff = 25 restricts the plot to the first 25 days after the 100th case was registered. By default, the plot uses a logarithmic scale:
```r
plot_covid19_spread(merged,
                    highlight = c("DEU","USA","ESP"),
                    intervention = "lockdown",
                    edate_cutoff = 25)
```
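The x-axis of this plot counts days since a country crossed a case threshold rather than calendar dates. This lets countries whose outbreaks started at different times be compared side by side. The following base-R sketch shows that kind of event-time alignment on hypothetical toy data. The helper `align_event_time`, its arguments, and the two made-up countries are illustrative assumptions, not tidycovid19's internal code.

```r
# Hypothetical toy data: cumulative confirmed cases for two made-up countries
toy <- data.frame(
  iso3c = rep(c("AAA", "BBB"), each = 5),
  date = rep(as.Date("2020-03-01") + 0:4, times = 2),
  confirmed = c(50, 90, 120, 200, 350,
                100, 180, 300, 500, 800)
)

# Hypothetical helper (not part of tidycovid19): re-index each country's series
# so that day 0 is the first date on which `var` reached `min_cases`,
# then keep only days 0..cutoff
align_event_time <- function(df, var = "confirmed", min_cases = 100, cutoff = 25) {
  parts <- lapply(split(df, df$iso3c), function(d) {
    d <- d[order(d$date), ]
    start <- d$date[which(d[[var]] >= min_cases)[1]]  # NA if never reached
    d$edate <- as.numeric(d$date - start)             # days since the threshold
    d[!is.na(d$edate) & d$edate >= 0 & d$edate <= cutoff, ]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

aligned <- align_event_time(toy, min_cases = 100, cutoff = 25)
print(aligned)
```

Country AAA first reaches 100 cases on its third day, so its series starts there. Country BBB starts on its first day. After alignment, both begin at day 0. The `cutoff` argument plays a role similar to `edate_cutoff` in the call above.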
“plot_covid19_spread” provides a wide range of parameters for adjusting the plot. For example, the type argument switches the metric from deaths to confirmed cases, and logarithmic scaling can be turned off by setting log_scale = FALSE (it defaults to TRUE):
```r
plot_covid19_spread(merged,
                    highlight = c("DEU","USA","ESP"),
                    intervention = "lockdown",
                    type = "confirmed",
                    log_scale = FALSE,
                    edate_cutoff = 25)
```
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python