In previous posts I have demonstrated how to access automotive industry data from sources such as the US Federal Reserve, the German automotive manufacturer lobby organisation VDA, and Kaggle (where you can, for example, find web-scraped data on second-hand car listings on eBay in Germany).
In this post I want to provide another source of data on the US automotive industry. The data in this post has been collected and published by the United States Environmental Protection Agency (EPA). It can be found here, alongside other interesting data related to the US automotive industry: https://www.epa.gov/automotive-trends/download-automotive-trends-report#Full%20Report
Below I visualize a small subset of the data provided by the EPA. The subset contains annual US automobile sales shares by powertrain category. I downloaded the data and saved the subset as an .xls file. In R, .xls files can be read with the readxl package.
The R code for data reading, manipulation and visualization follows below:
# empty memory before we start our analysis
rm(list=ls())

# import readxl, a package for reading in excel files in R
library(readxl)

# read in data stored in .xls file, using read_xls
data_df = as.data.frame(read_xls("powertrains.xls"))

# let us see the header of the data frame
head(data_df)
##   year gasoline hybrid diesel other
## 1 1975    0.998      0  0.002     0
## 2 1976    0.998      0  0.002     0
## 3 1977    0.996      0  0.004     0
## 4 1978    0.991      0  0.009     0
## 5 1979    0.980      0  0.020     0
## 6 1980    0.957      0  0.043     0
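Since the columns are shares of total sales, the four powertrain columns should add up to 1 for every year. Here is a minimal sketch of that sanity check, using only the six rows printed above. The name head_df and the tolerance of 1e-9 are my own choices for illustration and not part of the EPA data set.

```r
# rows copied from the head() output above; head_df is a hypothetical name
head_df <- data.frame(
  year     = 1975:1980,
  gasoline = c(0.998, 0.998, 0.996, 0.991, 0.980, 0.957),
  hybrid   = rep(0, 6),
  diesel   = c(0.002, 0.002, 0.004, 0.009, 0.020, 0.043),
  other    = rep(0, 6)
)

# sum the four powertrain shares for each year
share_sums <- rowSums(head_df[, c("gasoline", "hybrid", "diesel", "other")])

# TRUE if every year adds up to 1 within a small (assumed) tolerance
shares_ok <- all(abs(share_sums - 1) < 1e-9)
print(shares_ok)
```

The same two lines can be run on the full data_df once the file has been read in. Rounding in the published figures may call for a looser tolerance there.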
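library(ggplot2)

# one line per powertrain; the color labels are mapped to actual colors in scale_color_manual
# note: ggplot2 3.4.0 and later prefer linewidth over size for lines
ggplot(data_df) +
  geom_path(mapping=aes(x=year,y=gasoline, color="gas"),size=2,alpha=0.5) +
  geom_path(mapping=aes(x=year,y=diesel, color="diesel"),size=2,alpha=0.5) +
  geom_path(mapping=aes(x=year,y=hybrid, color="hybrid"),size=2,alpha=0.5) +
  geom_path(mapping=aes(x=year,y=other, color="other"),size=2,alpha=0.5) +
  scale_color_manual(name="Legends", values=c(gas="red", diesel="blue", hybrid="green", other="orange")) +
  # the shares are stored as fractions (0 to 1), so format the axis ticks as percent
  scale_y_continuous(labels=scales::percent) +
  labs(title="US car sale power train time series", subtitle="EPA data, for 1975 - 2019") +
  xlab("year") +
  ylab("US automobile sales share [%]")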
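The plot above adds one geom_path layer per column. An alternative is to reshape the data into long format first, so that a single layer with a color mapping draws all four lines. The sketch below does this in base R on the six rows shown by head(). The names wide_df, long_df and powertrains are hypothetical and only serve the illustration.

```r
library(ggplot2)

# same six rows as in the head() output; wide_df is a hypothetical name
wide_df <- data.frame(
  year     = 1975:1980,
  gasoline = c(0.998, 0.998, 0.996, 0.991, 0.980, 0.957),
  hybrid   = rep(0, 6),
  diesel   = c(0.002, 0.002, 0.004, 0.009, 0.020, 0.043),
  other    = rep(0, 6)
)

powertrains <- c("gasoline", "hybrid", "diesel", "other")

# stack the four share columns into one column (base R only)
long_df <- data.frame(
  year       = rep(wide_df$year, times = length(powertrains)),
  powertrain = rep(powertrains, each = nrow(wide_df)),
  share      = unlist(wide_df[powertrains], use.names = FALSE)
)

# one layer; ggplot2 assigns a color to each powertrain category
p <- ggplot(long_df, aes(x = year, y = share, color = powertrain)) +
  geom_line(alpha = 0.5) +
  xlab("year") +
  ylab("US automobile sales share (fraction)")
```

Printing p draws the chart. With the long layout, an additional powertrain category in the data would show up in the plot without a further geom_path call.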
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python