Testing tidycovid19 data interface in R

Several R packages retrieve data on covid19 cases. In a previous post I tested coronavirus, an R package that provides access to data from Johns Hopkins University.

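In this post I want to briefly introduce tidycovid19, a handy R package available on GitHub. In the code below I load the package (the comments show how to install it from GitHub):

# install.packages("remotes")
# remotes::install_github("joachim-gassen/tidycovid19")
library(tidycovid19)
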
tidycovid19 retrieves data from several sources. The function “download_merged_data” downloads the data from all sources and merges them into one data set. In the code below I call the function and list the variables contained in the data set (which is returned as a data frame):

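# cached = TRUE downloads a cached copy of the data from the package's GitHub repository
merged = download_merged_data(cached = TRUE)
names(merged)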
##  [1] "country"                "iso3c"                  "date"                  
##  [4] "confirmed"              "deaths"                 "recovered"             
##  [7] "soc_dist"               "mov_rest"               "pub_health"            
## [10] "soc_econ"               "lockdown"               "gcmr_retail_recreation"
## [13] "gcmr_grocery_pharmacy"  "gcmr_parks"             "gcmr_transit_stations" 
## [16] "gcmr_workplaces"        "gcmr_residential"       "gtrends_score"         
## [19] "gtrends_country_score"  "region"                 "income"                
## [22] "population"             "land_area_skm"          "pop_density"           
## [25] "pop_largest_city"       "life_expectancy"        "gdp_capita"            
## [28] "timestamp"
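To make the merging step more concrete, here is a minimal base R sketch with made-up data. The two tables and the country codes AAA and BBB are hypothetical stand-ins for per-source downloads; I am assuming the sources can be joined on the keys iso3c and date (both appear in the merged data above). This is an illustration of the idea, not the package's actual merging code.

```r
# Hypothetical per-source tables with made-up values, keyed by country and date
cases_src <- data.frame(
  iso3c = c("AAA", "AAA", "BBB", "BBB"),
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02")),
  confirmed = c(130, 159, 84, 120)
)
measures_src <- data.frame(
  iso3c = c("AAA", "BBB"),
  date = as.Date(c("2020-03-02", "2020-03-02")),
  lockdown = c(0, 1)
)

# Left join on the shared keys: every country-day from cases_src is kept,
# and days without a matching row in measures_src get NA for lockdown
merged_toy <- merge(cases_src, measures_src, by = c("iso3c", "date"), all.x = TRUE)
print(merged_toy)
```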

Data from a single source can be retrieved with a source-specific function. For example, “download_wbank_data” pulls data from the World Bank database.
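The World Bank variables in the merged data (population, gdp_capita and so on) describe countries rather than single days, so I would expect them to be joined on the country code alone and repeated on every date. A small base R sketch of that idea, again with hypothetical country codes and made-up numbers; the per-100,000 column is only there to show why having population next to the daily counts is useful.

```r
# Hypothetical daily counts (made-up values)
daily <- data.frame(
  iso3c = c("AAA", "AAA", "BBB"),
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01")),
  deaths = c(2, 5, 10)
)
# Hypothetical country-level table standing in for World Bank indicators
wbank_toy <- data.frame(
  iso3c = c("AAA", "BBB"),
  population = c(1e6, 5e6)
)

# Join on the country code only: each country's population is repeated
# on all of its days
panel <- merge(daily, wbank_toy, by = "iso3c")
panel$deaths_per_100k <- panel$deaths / panel$population * 1e5
print(panel)
```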

The tidycovid19 package also provides a default visualization, which works well for a quick look at the data. The function “plot_covid19_spread” plots the development of cumulative deaths, starting from the day on which a country had recorded 100 covid19 cases in total.
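The alignment this plot is built on can be sketched in a few lines of base R. This is my own illustration, not the package's implementation: the helper align_event_time() and its arguments value_col, threshold and max_days are hypothetical names, the threshold is applied to the confirmed column as described above, and the input is made-up data.

```r
# Hypothetical helper: day 0 is the first day a country's cumulative confirmed
# cases reach `threshold`; only days 0..max_days of `value_col` are kept
align_event_time <- function(df, value_col = "deaths", threshold = 100,
                             max_days = 25) {
  per_country <- lapply(split(df, df$iso3c), function(ctry) {
    ctry <- ctry[order(ctry$date), ]
    reached <- which(ctry$confirmed >= threshold)
    if (length(reached) == 0) return(NULL)  # never reaches the threshold: drop
    day <- as.numeric(ctry$date - ctry$date[reached[1]])
    keep <- day >= 0 & day <= max_days
    data.frame(iso3c = ctry$iso3c[keep], edate = day[keep],
               value = ctry[[value_col]][keep])
  })
  res <- do.call(rbind, per_country)  # NULL entries are skipped by rbind
  rownames(res) <- NULL
  res
}

# Made-up data: AAA passes 100 confirmed cases on 2020-03-02, BBB never does
toy <- data.frame(
  iso3c = rep(c("AAA", "BBB"), each = 4),
  date = rep(as.Date("2020-03-01") + 0:3, times = 2),
  confirmed = c(50, 120, 200, 310, 10, 20, 40, 90),
  deaths = c(0, 1, 3, 6, 0, 0, 0, 1)
)
aligned <- align_event_time(toy, max_days = 2)
print(aligned)
```

Aligning countries on event time rather than calendar date is what makes countries that were hit at different times comparable in one plot.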

Below I demonstrate the function on the merged data set, highlighting Germany, Spain and the USA and marking the lockdown intervention. I show only the first 25 days after the 100th case was registered. By default, the plot uses a logarithmic scale:

plot_covid19_spread(merged,
                    highlight = c("DEU","USA","ESP"),
                    intervention = "lockdown", 
                    edate_cutoff = 25)

“plot_covid19_spread” offers many parameters for adjusting the plot. For example, the type argument switches the metric from “deaths” to “confirmed” cases, and log_scale = FALSE turns off the logarithmic scale (it defaults to TRUE):

plot_covid19_spread(merged,
                    highlight = c("DEU","USA","ESP"),
                    intervention = "lockdown",
                    type = "confirmed",
                    log_scale = FALSE,
                    edate_cutoff = 25)
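Why a logarithmic scale by default? A short sketch with made-up numbers: a series that doubles every three days has ever-growing daily increments on a linear scale, but a constant step of log10(2)/3 per day on a log scale. Growth at a constant rate therefore shows up as a straight line, and growth rates of different countries can be compared by their slopes.

```r
# Made-up series that doubles every 3 days, starting at 100
days <- 0:24
cases <- 100 * 2^(days / 3)

# Linear scale: the day-to-day increments keep getting larger
linear_steps <- diff(cases)

# Log scale: every day adds the same amount, log10(2) / 3
log_steps <- diff(log10(cases))
print(round(log_steps[1:3], 4))
```

In base R, plot(days, cases, type = "l", log = "y") draws this series as a straight line.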
