# Total Vehicle Sales Forecast

ECO 309 Economic Forecasting Final Project

Project paper, 2013, 48 pages

## Executive Summary

For this project I created a twelve-month forecast for Total Vehicle Sales in the United States using four different methods: exponential smoothing, decomposition, ARIMA, and multiple regression. I picked one dependent (Y) variable along with two independent (X) variables and collected 80 monthly observations for each variable. This historical data allowed me to create four different forecasting models that predict future vehicle sales with low risk of error. The best model according to the error measures was Winters' exponential smoothing, because it had the lowest MAPE and the lowest RMSE for both the fit period and the forecast period.

## Introduction

I chose Total Vehicle Sales in the United States as the Y variable because I have a strong interest in the auto industry and would like to work for a German car maker in the future. The auto industry is very vulnerable to the state of the economy, because people tend to postpone big-ticket purchases like a car when times are tough. Therefore, the variables that cause a change in vehicle sales must be indicators of economic performance. In order to forecast the dependent variable Y (Total Vehicle Sales), I chose two independent variables, X1 and X2, that are closely related to Y: Employees non-farm and the Personal Saving Rate. My hypothesis for the first X variable is that employment numbers are logically related to vehicle sales, because the more people are in the workforce, the more people earn the income necessary to make big-ticket purchases like a personal car. My hypothesis for the second X variable is that the personal saving rate has an inverse linear relationship with vehicle sales, because the more people hold on to their disposable income, the less spending occurs, which hurts vehicle sales.

Since I am using three completely different variables in my forecast, the means, ranges, and standard deviations for each variable are going to differ from each other. In order to avoid forecasting difficulty, it is important to look at the variations about the mean value for each variable. The Y variable Total Vehicle Sales has a mean value of 1130.5, a range of 919.9, and a standard deviation of 243.0. Since it is important that the standard deviation is less than 50% of the mean value to avoid forecasting difficulty, these numbers indicate that I should be able to get a pretty accurate forecast. The X variable Employees non-farm shows a mean value of 133,784 with a standard deviation of 3,463 and a low range of 11,769 which are great numbers for an independent variable. The X variable Personal Saving Rate with a mean value of 4.157 and a standard deviation of 1.389 also indicate that I should not run into difficulties producing a forecast. Below are descriptive statistics for all the variables used:

## Descriptive Statistics: Total Vehicle Sales, Employees non-farm, Saving Rate

| Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| Total Vehicle Sales | 68 | 0 | 1130.5 | 29.5 | 243.0 | 670.3 | 967.4 | 1090.3 | 1282.5 | 1590.2 |
| Employees non-farm | 68 | 0 | 133784 | 420 | 3463 | 127374 | 130916 | 133209 | 137029 | 139143 |
| Saving Rate | 68 | 0 | 4.157 | 0.168 | 1.389 | 2.000 | 2.800 | 4.350 | 5.200 | 8.300 |
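The rule of thumb applied above (standard deviation under 50% of the mean, i.e. a coefficient of variation below 0.5) can be checked quickly. A minimal Python sketch using the values from the table:

```python
# Forecastability rule of thumb from the text: the standard deviation should
# stay under 50% of the mean (coefficient of variation < 0.5).
# Mean/StDev pairs are taken from the descriptive statistics table above.
series_stats = {
    "Total Vehicle Sales": (1130.5, 243.0),
    "Employees non-farm": (133784.0, 3463.0),
    "Saving Rate": (4.157, 1.389),
}

def coefficient_of_variation(mean, stdev):
    """Return the ratio of standard deviation to mean."""
    return stdev / mean

for name, (mean, stdev) in series_stats.items():
    cv = coefficient_of_variation(mean, stdev)
    print(f"{name}: CV = {cv:.3f} ({'OK' if cv < 0.5 else 'risky'})")
```

All three series pass the check, consistent with the conclusion above that an accurate forecast should be attainable.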

Looking at the time series plot for the Y variable Total Vehicle Sales, one can notice a slight negative trend over the 68 observations studied. This is confirmed by the autocorrelation function, whose coefficients remain fairly large for several time periods before slowly declining. Furthermore, there could be a seasonal pattern in the Y variable, because there are spikes at the 12th and 24th lags of the autocorrelation function. This can be explained by the holiday sales events car dealers hold during the Christmas season. The time series plot for the X variable Employees non-farm tracks the Y variable closely and also shows a negative trend and seasonality, along with a noticeable cyclical pattern. The second X variable, Personal Saving Rate, shows only a slight positive trend and cycle. There is no seasonality here because the data I found was seasonally adjusted. Below are the time series plots for all three variables:

illustration not visible in this excerpt

Scatter plots are a great tool for showing the relationship between Y and each X variable. Both X variables have a moderate to strong linear relationship with the Y variable, but in opposite directions: Employees non-farm and Vehicle Sales are positively related, while Personal Saving Rate and Vehicle Sales exhibit a fairly strong negative linear relationship. The direction of each relationship is shown by the slope of the regression line, and its strength by how tightly the points cluster around that line. Many values lie very close to the regression line, indicating a strong linear relationship, though a few values far from the line show that there are extremes as well. Below are the scatter plots for each XY relationship:

illustration not visible in this excerpt

In researching X variables that help forecast the Y variable, the correlation matrix is an essential tool. It shows two values for each pair of variables: the Pearson correlation, which measures the strength of the linear relationship, and the p-value, which indicates whether that correlation is statistically significant. One wants at least 95% confidence, i.e. a p-value below 0.05. Both X variables have strong Pearson correlations with the Y variable and p-values of 0.000, which makes them significant and acceptable to use in the forecast. Furthermore, the correlation between the two X variables is weaker than each X variable's correlation with the Y variable. Since these correlations are logical and support the hypotheses made earlier, I will go on with the forecast. Below is the correlation matrix for all variables:

Correlations: Total Vehicle Sales, Employees non-farm, Saving Rate

| | Total Vehicle Sales | Employees non-farm |
|---|---|---|
| Employees non-farm | 0.673 (p = 0.000) | |
| Saving Rate | -0.608 (p = 0.000) | -0.438 (p = 0.000) |

Cell contents: Pearson correlation (p-value)
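As a rough illustration of how such a matrix is produced, the sketch below computes a Pearson correlation and its p-value with SciPy. The data are synthetic stand-ins; the project's 68 actual observations are not reproduced here.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in series -- NOT the project's actual data.
rng = np.random.default_rng(0)
employees = rng.normal(133784, 3463, 68)                         # X1 stand-in
sales = 0.05 * (employees - 133784) + rng.normal(1130, 120, 68)  # Y, linked to X1

# Pearson correlation measures linear strength; the p-value tests whether the
# correlation differs from zero. The text's rule: accept X when p < 0.05.
r, p = stats.pearsonr(employees, sales)
print(f"Pearson correlation = {r:.3f}, p-value = {p:.4f}")
```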

## Exponential Smoothing

The correct exponential smoothing method depends on the characteristics of the Y data. As you can tell from the time series plot below, the data series has a negative trend and seasonality shown by the repeated annual spikes in the data.

illustration not visible in this excerpt

In order to further analyze the characteristics of the Y data, it is helpful to look at the autocorrelation function. The slowly declining autocorrelation coefficients confirm that the data series has a negative trend. There is seasonality, though not strong, as shown by the spike at the 12th lag. Furthermore, there is also some cycle, since the coefficients repeatedly rise and fall.

illustration not visible in this excerpt

Since the data series has both trend and seasonality, the best method to use is Winters' exponential smoothing technique, the only exponential smoothing variant that captures seasonality. The plot for Total Vehicle Sales using Winters' method is shown below:

illustration not visible in this excerpt

The exponential smoothing coefficients that gave the lowest MAPE are alpha (level) = 0.6, gamma (trend) = 0.1, and delta (seasonal) = 0.8. A table showing the Y data (excluding the hold-out period), the fitted values, and the corresponding residuals is included in the appendix.

The goodness-of-fit measures attained with this model are MAPE = 5.89%, MAD = 63.70, MSD = 7698.66, and RMSE = 87.74. These accuracy measures are quite good and indicate that an accurate forecast can be made. The fit graph tracks the Y data closely, which indicates that trend, cycle, and seasonality have been accounted for. Below is a time series plot of the Y data compared with the fit period, followed by a time series plot of the residuals:

illustration not visible in this excerpt

illustration not visible in this excerpt

It can be seen that there are no significant signs of trend, cycle, or seasonality in the residual’s time series plot. Most values are around 0 which indicates randomness. In order to prove randomness, the autocorrelation function of the residuals can help. Below is the autocorrelation function of the residuals:

illustration not visible in this excerpt

Residual analysis with the autocorrelation function shows no remaining autocorrelation in the residuals, because no coefficients exceed the significance limits. Furthermore, the LBQ statistic at the 24th lag is 33.33, below the chi-square critical value of 36.41.

The histogram of the residuals does, however, show a slight skew to the left, indicating an underestimation bias; this is supported by the mean being shifted slightly to the right of zero. The mean of the residuals is 5.351, which is still very close to zero, so one can say that the residuals are random and the distribution approximately normal.

illustration not visible in this excerpt

The residuals are thus shown to be random: the trend, cycle, and seasonality present in the original data series do not appear in the residuals. This shows that the model successfully picks up the systematic variation in the Y data, so it should be able to generate an accurate forecast.

Below are the one year forecast and a time series plot for the Y data series including the hold out period (index 69-80):

illustration not visible in this excerpt

The accuracy of the forecast over the hold-out period is MAPE = 5.17% and RMSE = 74.79. Comparing the time series plot of the one-year forecast with the actual hold-out data for this period, the two series are very close and cross each other multiple times, so there is little systematic over- or underestimation.

illustration not visible in this excerpt

The forecast-period residuals appear fairly random, with the exception of index 7, where the residual takes an extreme negative value. However, the autocorrelation function shows no significant systematic patterns in the residuals, since all coefficients stay well within the significance limits. Furthermore, the LBQ value of 7.13 at the 10th lag is far below the chi-square critical value of 18.307. Below are the time series plot of the forecast residuals and the autocorrelation function:

illustration not visible in this excerpt

The error measures improved from the fit period to the hold-out period: the MAPE fell from 5.89% to 5.17% and the RMSE from 87.74 to 74.79. Furthermore, the forecast residuals show that trend, cycle, and seasonality have been accounted for, so there is no bias in the forecast. These results show that the forecast accuracy is acceptable and the model successful.
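The MAPE and RMSE comparisons used throughout this project can be reproduced with two short helper functions. A sketch with hypothetical actual and forecast values, for illustration only:

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical hold-out and forecast values, for illustration only.
actual = [1200.0, 1150.0, 1300.0]
predicted = [1180.0, 1170.0, 1250.0]
print(f"MAPE = {mape(actual, predicted):.2f}%  RMSE = {rmse(actual, predicted):.2f}")
```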

## Decomposition

The results table of the decomposition method of forecasting is included in the appendix.

In order to determine the seasonal component of the Y data, one needs to look at the seasonal indices. The seasonal indices for 12 periods (monthly data) as well as a time series plot of the seasonal indices are displayed below:

Seasonal Indices

| Period | Index |
|---|---|
| 1 | 1.14258 |
| 2 | 1.01495 |
| 3 | 1.05177 |
| 4 | 1.13192 |
| 5 | 0.94791 |
| 6 | 0.92322 |
| 7 | 0.84718 |
| 8 | 1.06925 |
| 9 | 0.79222 |
| 10 | 0.91807 |
| 11 | 1.12809 |
| 12 | 1.03285 |

illustration not visible in this excerpt

The seasonal indices show that vehicle sales are periodic: relatively high in the Christmas season, early spring, and late summer, and relatively low in late spring and early summer. This seasonal analysis makes it possible to adjust the Y data for seasonality.

Below is a time series plot comparing the Y data with the decomposition deseasonalized data:

illustration not visible in this excerpt

From this comparison it is noticeable that the deseasonalized series contains far fewer spikes and extreme up-and-down movements than the original Y data. This shows that the strong seasonality in the original Y data has been removed in the deseasonalized plot.

The goodness of fit as measured by the MAPE and RMSE indicates a large decrease in accuracy compared to the previous model: the MAPE went up to 12.7% and the RMSE to 157.146. These errors might be too high to accept, but they can be reduced by adjusting the forecast with a cycle factor.

To determine the residual distribution, one needs to look at the time series plot, the autocorrelation function, as well as a histogram of the residuals of the fit period:

illustration not visible in this excerpt

Based on these graphs, the residuals are definitely not random: the autocorrelation function reveals significant trend and some cycle. The very high LBQ value of 245 at the 12th lag confirms that the residuals are autocorrelated, i.e. not random. The mean, however, is very close to 0; since it is negative, there will be a slight tendency to over-forecast.

The one-year forecast for the hold out period using the decomposition model is displayed below:

illustration not visible in this excerpt

However, since the error measures observed earlier were too high, I adjusted the forecast data with the last cycle factor. The time series plot of the Y data including the adjusted one-year forecast is displayed below:

illustration not visible in this excerpt

The adjustment was necessary because the decomposition model does not pick up cycle. It raised the forecast values by about 33%, because the last cycle factor in the Y data was 1.33, and it improved the accuracy measures significantly: the MAPE was lowered to 8.22% and the RMSE to 130.279.
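The adjustment itself is simple arithmetic: each forecast value is scaled by the last observed cycle factor. A sketch with hypothetical unadjusted forecast values:

```python
# Decomposition forecasts carry no cycle forward, so each value is scaled by
# the last observed cycle factor (1.33, per the text). Values are illustrative.
last_cycle_factor = 1.33
raw_forecast = [980.0, 1010.0, 995.0]  # hypothetical unadjusted forecasts

adjusted = [round(f * last_cycle_factor, 1) for f in raw_forecast]
print(adjusted)  # each value raised by 33%
```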

The closeness of the new forecast to the actual hold out period can be seen in the following time series plot:

illustration not visible in this excerpt

Finally, we will look at the time series plot of the forecast residuals. The forecast residuals are much closer to 0 than before the adjustment through the cycle factor. However, there is still trend and cycle in the residuals.

illustration not visible in this excerpt

## ARIMA

Based on the analysis of the Y time series plot and autocorrelation function, it will be necessary to difference the data to make it stationary. Since the Y data has a significant trend, the first step is to difference the data for trend. The time series plot and autocorrelation function for the first trend difference are shown below:

illustration not visible in this excerpt

We can see that there is no significant trend anymore, so one difference suffices for the nonseasonal part of the model. To determine which model to use, we also need to look at the PACF of the first trend difference:

illustration not visible in this excerpt

Based on the ACF and PACF of the first trend difference, we can determine that this is an MA(1) model: there is one significant negative spike in the ACF, and the PACF coefficients decay slowly toward zero. Since the data also has seasonality, we take the seasonal difference as well. The time series plot and ACF for the first seasonal difference are shown below:

illustration not visible in this excerpt

We can see that the first difference is sufficient to take out significant seasonality and make the time series stationary. To determine which seasonal model coefficient to use, we need to look at the PACF below:

illustration not visible in this excerpt

It can be determined that the seasonal part of the model is also MA(1). The best ARIMA specification is therefore (0,1,1) for both the nonseasonal and the seasonal part, i.e. ARIMA(0,1,1)(0,1,1)12. Running this model gives the following results:

Final Estimates of Parameters

| Type | Coef | SE Coef | T | P |
|---|---|---|---|---|
| MA 1 | 0.3938 | 0.1263 | 3.12 | 0.003 |
| SMA 12 | 0.8476 | 0.1199 | 7.07 | 0.000 |

Differencing: 1 regular, 1 seasonal of order 12

Number of observations: original series 68, after differencing 55

Residuals: SS = 451577 (backforecasts excluded), MS = 8520, DF = 53

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

| Lag | 12 | 24 | 36 | 48 |
|---|---|---|---|---|
| Chi-Square | 10.8 | 31.8 | 44.0 | 51.2 |
| DF | 10 | 22 | 34 | 46 |
| P-Value | 0.371 | 0.081 | 0.116 | 0.277 |

We can see that the MA 1 and SMA 12 coefficients have t-values over 1.96 and p-values very close to zero. Also, the p-values of the Ljung-Box statistic are all above 0.05. These results indicate that the model coefficients are significant and the ARIMA model should produce good forecasts.

In order to determine the accuracy for the fit period, we will need to look at the MAPE and RMSE. These are shown below:

MAPE = 6.51131%, RMSE = 90.6118

The fit MAPE of 6.5% is decent, as is the RMSE of 90.6118, so the model is accurate based on its error measures. Furthermore, as noted above, the Ljung-Box p-values for lags 12, 24, 36, and 48 are all above 0.05, which allows us to declare the residuals random. After running the ARIMA model we produce a 12-month forecast. A time series plot of the forecast residuals is displayed below:

illustration not visible in this excerpt

Based on the LBQ values we can say that the forecast residuals are random, because the LBQ at lag 12 is below the chi-square critical value. However, the time series plot does indicate a slight trend.

The accuracy measures for the forecast period compared to the hold out period are displayed below:

MAPE = 6.49819%, RMSE = 96.7536

The MAPE is essentially unchanged while the RMSE rose slightly. This is normal, and we can still declare the ARIMA model accurate. A time series plot of the Y variable including the 12-month forecast is shown below:

[...]

## Details

- Pages: 48
- Year: 2013
- ISBN (eBook): 9783656735625
- ISBN (print): 9783656735601
- File size: 880 KB
- Language: English
- Catalog number: v279609
- Grade: 1.0
- Keywords: Eco forecasting, vehicle sales, ARIMA, Winters' method, variables