EPA data analyzed in R for visualizing US powertrain shares

In previous posts I have demonstrated how to access automotive industry data from sources such as the US Federal Reserve, the VDA (the German automotive manufacturers' lobby organisation) and Kaggle (where you can find, for example, web-scraped data on second-hand car eBay postings in Germany).

In this post I want to point to another source of data on the US automotive industry. The data has been collected and published by the United States Environmental Protection Agency (EPA). It can be found here, along with other interesting data on the US automotive industry: https://www.epa.gov/automotive-trends/download-automotive-trends-report#Full%20Report

Below I visualize a small subset of the data provided by the EPA. The subset contains annual US automobile sales shares by powertrain category. I downloaded the data and saved the subset as an .xls file. In R, .xls files can be read with the readxl package.

The R code for data reading, manipulation and visualization follows below:

# empty memory before we start our analysis
rm(list=ls())
# import readxl, a package for reading in excel files in R
library(readxl)
# read in data stored in .xls file, using read_xls
data_df = as.data.frame(read_xls("powertrains.xls"))
# let us see the header of the data frame
head(data_df)
##   year gasoline hybrid diesel other
## 1 1975    0.998      0  0.002     0
## 2 1976    0.998      0  0.002     0
## 3 1977    0.996      0  0.004     0
## 4 1978    0.991      0  0.009     0
## 5 1979    0.980      0  0.020     0
## 6 1980    0.957      0  0.043     0
# import ggplot2 for plotting
library(ggplot2)
# draw one line per powertrain category; the color names map to the manual scale below
ggplot(data_df) + 
  geom_path(mapping=aes(x=year,y=gasoline, color="gas"),size=2,alpha=0.5) + 
  geom_path(mapping=aes(x=year,y=diesel, color="diesel"),size=2,alpha=0.5) + 
  geom_path(mapping=aes(x=year,y=hybrid, color="hybrid"),size=2,alpha=0.5) + 
  geom_path(mapping=aes(x=year,y=other, color="other"),size=2,alpha=0.5) +
  scale_color_manual(name="Powertrain",
                     values=c(gas="red",
                              diesel="blue",
                              hybrid="green",
                              other="orange")) +
  labs(title="US car sales shares by powertrain, time series",
       subtitle="EPA data, for 1975 - 2019") +
  xlab("year") + 
  # the shares are stored as fractions (e.g. 0.998), not as percentages
  ylab("US automobile sales share [fraction of total]")
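
As a possible alternative to drawing one layer per column, the wide data frame can be reshaped into a long format with one row per year and powertrain. The snippet below is only a minimal sketch in base R: it rebuilds the first rows shown by head(data_df) above instead of reading powertrains.xls, and the names share_cols, long_df and totals are my own choices for illustration, not part of the EPA data.

```r
# sample rows copied from the head(data_df) output above
# (stands in for the full data read from powertrains.xls)
data_df <- data.frame(
  year     = c(1975, 1976, 1977),
  gasoline = c(0.998, 0.998, 0.996),
  hybrid   = c(0, 0, 0),
  diesel   = c(0.002, 0.002, 0.004),
  other    = c(0, 0, 0)
)

# all columns except year hold sales shares
share_cols <- setdiff(names(data_df), "year")

# stack the share columns on top of each other (wide -> long)
long_df <- data.frame(
  year       = rep(data_df$year, times = length(share_cols)),
  powertrain = rep(share_cols, each = nrow(data_df)),
  share      = unlist(data_df[share_cols], use.names = FALSE)
)

# sanity check: the shares of each year should add up to (roughly) 1
totals <- tapply(long_df$share, long_df$year, sum)
print(totals)
```

With the data in this shape, a single layer such as ggplot(long_df, aes(x=year, y=share, color=powertrain)) + geom_line() should be enough to draw all categories, and ggplot2 builds the legend from the powertrain column.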
