In this post I provide an example of CAGR-based forecasting, using OICA vehicle production data for the Chinese automotive industry.
CAGR is the compound annual growth rate, i.e. the constant per-period growth rate that takes a starting value to an ending value.
If, for example, production output in the year 2000 is 1,000,000 units and a CAGR of 3% is expected, then the expected production output after 10 years is 1,000,000 × (1 + 0.03)^10 ≈ 1,343,916 units.
CAGR-based forecasting consists of a two-step workflow:
- calculate CAGR from historical data
- calculate future values assuming historical CAGR
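The two steps above can be sketched as a pair of small R helpers. The function names `calc_cagr` and `project_value` are hypothetical, chosen for illustration; they are not part of any package, and the numbers are illustrative only.

```r
# Step 1: CAGR from a first and a last observed value.
# n_periods is the number of growth periods between them
# (e.g. 13 for annual data from 2005 to 2018).
calc_cagr <- function(first_value, last_value, n_periods) {
  (last_value / first_value)^(1 / n_periods) - 1
}

# Step 2: project a base value forward, assuming the same CAGR holds.
project_value <- function(base_value, cagr, n_periods) {
  base_value * (1 + cagr)^n_periods
}

# Illustrative numbers only: 1,000,000 units growing at 3% for 10 periods
future <- project_value(1e6, 0.03, 10)
print(round(future))

# Recovering the rate from the two end points gives back the 3%
print(calc_cagr(1e6, future, 10))
```

Keeping the two steps separate makes it easy to calculate the rate on one historical window and apply it to a different base year.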
If annual values are predicted then CAGR must be calculated based on annual historical values. If monthly values are predicted then CAGR must be calculated from monthly historical values. And so on.
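To see why the periods must match, the sketch below compares applying an annual rate once per year with (incorrectly) applying the same rate once per month. The 3% rate and the starting value of 1,000,000 are illustrative assumptions, and the variable names are my own.

```r
annual_rate <- 0.03
start_value <- 1e6   # illustrative starting value

# Consistent: annual rate applied over 1 annual period
after_one_year <- start_value * (1 + annual_rate)^1

# Inconsistent: annual rate applied over 12 monthly periods
misapplied <- start_value * (1 + annual_rate)^12

# Monthly rate that compounds to the same annual growth
monthly_rate <- (1 + annual_rate)^(1 / 12) - 1
via_monthly <- start_value * (1 + monthly_rate)^12

print(c(after_one_year = after_one_year,
        misapplied     = misapplied,
        via_monthly    = via_monthly))
```

The consistent annual and monthly calculations agree, while mixing an annual rate with monthly periods overstates growth considerably.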
Below I use the CAGR methodology to forecast future automotive production output, measured in units produced annually. I use OICA automotive industry production data to calculate the historical annual CAGR and to predict future production output based on the calculated CAGR value.
The first step is to read in the data and filter it.
# import packages
library(readxl)
library(dplyr)

# import data
data_df = as.data.frame(read_xls("oica.xls"))

# filter out years of interest and Chinese data only
data_df = dplyr::filter(data_df,year>=2005,country=="China")

# view header of filtered table
head(data_df)
##   year country    total
## 1 2018   China 27809196
## 2 2017   China 29015434
## 3 2016   China 28118794
## 4 2015   China 24503326
## 5 2014   China 23731600
## 6 2013   China 22116825
# view tail of filtered table
tail(data_df)
##    year country    total
## 9  2010   China 18264761
## 10 2009   China 13790994
## 11 2008   China  9299180
## 12 2007   China  8882456
## 13 2006   China  7188708
## 14 2005   China  5717619
Next, the historical CAGR can be calculated, in this case for the years 2005 to 2018:
# calculate historical CAGR
# rows are sorted newest first: row 1 is 2018, the last row is 2005
cagr = (data_df$total[1]/data_df$total[length(data_df$total)])^(1/(length(data_df$total)-1)) - 1
Using the historical annual CAGR value I predict Chinese automotive production output measured in units produced annually, for 2030:
# predict production output in 2030, based on production output in 2018
# and on the CAGR value from 2005 to 2018
data_df$total[1]*(1+cagr)^(2030-2018)
## [1] 119761592
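As a cross-check that does not need the `oica.xls` file, the sketch below repeats the calculation with the two end-point values copied from the table output above (2005 and 2018) and also lists the implied year-by-year path up to 2030. The variable names are my own, and the final value should agree with the 2030 figure printed above.

```r
# End-point values copied from the filtered OICA table shown earlier
total_2005 <- 5717619
total_2018 <- 27809196

# Historical CAGR over the 13 annual periods from 2005 to 2018
cagr <- (total_2018 / total_2005)^(1 / (2018 - 2005)) - 1

# Projected output for each year from 2019 to 2030
years <- 2019:2030
forecast <- data.frame(
  year  = years,
  total = total_2018 * (1 + cagr)^(years - 2018)
)
print(forecast)
```

Listing every intermediate year makes the compounding visible: each forecast value is the previous one multiplied by the same factor `1 + cagr`.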
CAGR-based forecasting is clearly naive. It only works under strong assumptions, for example that growth is unlimited. In the case of automotive vehicle production this is not realistic. Hence, this forecasting methodology should only be applied over a limited time horizon. Moreover, CAGR-based forecasting requires some tolerance for variance, since actual values fluctuate around the smooth growth path it assumes.
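How sensitive the forecast is to the chosen history can be illustrated with values already shown in the table output above. The sketch below compares the CAGR over 2005 to 2018 with the CAGR over 2010 to 2018 and the 2030 forecast each one implies. The choice of 2010 as an alternative starting year is arbitrary and only for illustration, and `cagr_from` is a hypothetical helper.

```r
# Values copied from the filtered OICA table shown earlier
total_2005 <- 5717619
total_2010 <- 18264761
total_2018 <- 27809196

# Hypothetical helper: CAGR between two values n_periods apart
cagr_from <- function(first_value, last_value, n_periods) {
  (last_value / first_value)^(1 / n_periods) - 1
}

cagr_long  <- cagr_from(total_2005, total_2018, 2018 - 2005)
cagr_short <- cagr_from(total_2010, total_2018, 2018 - 2010)

# 2030 forecast under each historical window
forecast_long  <- total_2018 * (1 + cagr_long)^(2030 - 2018)
forecast_short <- total_2018 * (1 + cagr_short)^(2030 - 2018)

print(c(cagr_long = cagr_long, cagr_short = cagr_short))
print(c(forecast_long = forecast_long, forecast_short = forecast_short))
```

The shorter window yields a markedly lower growth rate and therefore a much lower 2030 forecast, which shows how strongly the result depends on the historical period used.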
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python