Application of Time Series Analysis to Traffic Accidents in Los Angeles

11/28/2019 ∙ by Qinghao Ye, et al. ∙ University of California, Riverside 0

With the improvements of Los Angeles in many aspects, people in mounting numbers tend to live or travel to the city. The primary objective of this paper is to apply a set of methods for the time series analysis of traffic accidents in Los Angeles in the past few years. The number of traffic accidents, collected from 2010 to 2019 monthly reveals that the traffic accident happens seasonally and increasing with fluctuation. This paper utilizes the ensemble methods to combine several different methods to model the data from various perspectives, which can lead to better forecasting accuracy. The IMA(1, 1), ETS(A, N, A), and two models with Fourier items are failed in independence assumption checking. However, the Online Gradient Descent (OGD) model generated by the ensemble method shows the perfect fit in the data modeling, which is the state-of-the-art model among our candidate models. Therefore, it can be easier to accurately forecast future traffic accidents based on previous data through our model, which can help designers to make better plans.



There are no comments yet.


page 14

page 15

page 19

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

As the country with the highest car ownership and the second largest car ownership per capita in the world, the traffic problem in the United States has always been the focus of people. According to The World Bank’s statistics "World Bank Data: Motor vehicles (per 1,000 people)" 26, the number of cars per thousand people in the United States reached 910 in 2017, and the total amount reached 255,009,283 13.

Los Angeles is the largest city in California and the second largest city in the United States, second only to New York City. The city covers an area of 469.1 square miles (1214.9 square kilometers). The Los Angeles metropolitan area consists of Los Angeles, Long Beach and Anaheim, has a population of approximately 13.31 million, and the greater Los Angeles area contains 18.7 million residents totally Bureau (2016). Both statistical data are roughly equal to New York.

Today, Los Angeles has developed into one of the greatest international metropolis, with highly developed industry of culture, science, technology, sports, international trade and education. Among all these industries, the thriving development of the culture and entertainment industry, which produces great films, TV series, and music, plays the most significant role, also forms the foundation of Los Angeles’ international reputation and global status.

For entertainment industry, Hollywood is the entertainment center of Los Angeles. It takes about 40 minutes by bus from downtown to Hollywood. Since, 1911, the very year the first film company established in Hollywood, the area has become the world’s film center. With so many stars and great movies, it is safe to say that Hollywood is the pioneer of global film industry. Celebrity Avenue is a world famous sight located in Hollywood, records more than 2000 highly-achieved celebrities in the fields of film, broadcasting and music.

When it comes to the education industry, the education in Los Angeles is well-known all over the world. Los Angeles is home to the world-famous California Institute of Technology, the University of California Los Angeles and University of Southern California. Apart from famous universities, Los Angeles Public Library, Huntington Library and Getty Museum are also important cultural and educational facilities.

In the sports industry, Los Angeles has also developed a very great environment for sports, provides opportunities for all kinds of sports. Los Angeles Angels, a Major League Baseball team in Los Angeles, California, part of the Pacific Coast League. Los Angeles Lakers, a basketball club founded in Minneapolis in 1947 but developed in Los Angeles, won sixteen championships in National Basketball Association. Also, Los Angeles held Olympic Summer Games twice in 1932 and 1984.

Due to the high level of development in various industries, Los Angeles, as an international metropolis, inevitably attracts a large number of immigrants to settle down. Among all the residents in Los Angeles, 11.2% of them are immigrants from other countries, of which Chinese are the most, accounting for 7.1%.

As the population continues to expand, the traffic, inevitably become a serious question. Simultaneously, Los Angeles government is constantly improving the urban transportation system: Los Angeles has one of the world’s largest highway systems, yet it is still the top 10 worst traffic city in the United States Sorensen (2009). The main roads connecting Los Angeles and surrounding cities include Interstate 5, Interstate 10, and US National Highway 101. The Los Angeles County Metropolitan Transportation Authority and other agencies operate numerous bus lines as well as subway and light rail lines. Los Angeles rail transit includes red and purple subway lines, as well as gold, blue, expo and green light rail lines. The orange and silver lines are BRT lines with stops and frequencies similar to light rails. The main train station in Los Angeles is the Union Station in the northeast of the city center, the center of the Southern California Metropolis Railway. The system connects Los Angeles with all neighboring counties and many suburbs.

Los Angeles metropolis (surrounding areas included) is the second largest metropolitan area in the United States. According to survey, approximately a third of the total population resides in the City of Los Angeles Ong and Miller (2005). However, the annual flow of passengers in the Los Angeles subway is only about 48.7 million. In contrast, in Hong Kong, where the population is much smaller than Los Angeles, the subway transports 1.563 billion passengers a year, almost the 30 times of that of Los Angeles.

A large number of population will inevitably bring a lot of traffic demand, but the low public transportation utilization rate has caused Los Angeles to become one of the ten most congested cities in the United States. What followed is the inevitable traffic accidents and safety problems. Our group selected traffic collision reported to LAPD as our data set, hoping to have a better understanding of the traffic condition in LA.

In short, the main contribution of this work can be highlighted in following aspects:

  • By adding the Fourier items, the complex seasonality can be well modeled by an intuitive manner. The proposed ensembled model outperforms other methods by a large margin.

  • Model diagnostics are fully conducted, and the components of ensemble model are also investigated.

The remaining parts are organized as follows. Section 2 generally explores the time series we are interested in and Section 3 introduces plenty of time series modeling methods. Then, Section 4 describes a number of experiments carried out on the training and validation set and reports the models’ performance. After that, Section 5 uses several diagnostic techniques to evaluate the validity of the candidate models. Finally, we conduct the forecast in Section 6 and conclude this paper.

2 Data Exploration

2.1 Data Description

(a) Original series.
(b) Seasonal plot of original series distribution.
Figure 1: Time series of traffic accidents in Los Angeles.

The Los Angeles traffic accidents are collected by City of Los Angeles from 2010 to 2019 monthly. The corresponding values are integer. The data set contains 113 data points. The plot of the data and data distribution are demonstrated in Figure 1. It shows that accidents are more likely to happen in October and less likely in February.

2.2 Box-Cox Transformation

In the real-world scenario, the distribution of running times data may not be normally distributed, which results in the distribution being asymmetrical leading to a bias in the model. To deal with the problem, George Box et al.

Box and Cox (1964)

propose a data transformation technique named Box-Cox Transformation which helps to stabilize the variance of the series. The mathematical expression is written as


where is the power coefficient of the transformation that we need to select. We can find out that when , the whole series is identical only with shift. When , the transformation is the logarithm. However, it is difficult to choose the parameter since the searching space is infinite. Therefore, Guerrero (Guerrero, 1993)

proposes a method to estimate the value of

which aims to minimize the coefficient of variation for subsequences of the original series. Figure 2 shows the original series and transformed series with .

When we tend to inverse the process of Box-Cox transformation, the transformed predicted value will not be the mean of the forecast distribution but the median. Since the median cannot be added up, so we should use bias adjustments to find the corrected version of the inverse of Box-Cox transformation, which is formulated as follows


where is the -step predict variance, the larger variance, the larger difference between mean and median.

2.3 Data Characteristics

Understanding of time series data is an essential part of the time series analysis process. Before we apply the model to fit the data, we should explore the statistical characteristics of the time series data which contains seasonality, trend, noisy, chaos, etc. In this part, we consider six main statistical characteristics of the given data. In order to vividly understand the data, we normalize the different metrics into to the significance of the characteristics that are presented or not. The larger the normalized value, the more significance it shows.

Trend and Seasonality

Because decomposition of time series can be applied to measure the trend and seasonality in time series. Following the STL decomposition Cleveland et al. (1990), we decompose the series as , which are trend component, seasonal component, and remainder component, respectively. Here, we utilize the strength of trend and the strength of seasonality proposed by Wang et al. Wang et al. (2006) to measure the trend and seasonality. The mathematical expressions are presented as

For strongly trend or seasonal series, there should be have more variation in trend or seasonality than remainder component leading to the larger or .

(a) Origin series without Box-Cox transformation.
(b) Box-Cox transformed series with .
Figure 2: Number of Traffic Accidents in Los Angeles data, which is a daily time series. The original series and transformed series are shown in the above figures.
Serial Correlation

To estimate the serial correlation, we use the Box-Pierce statistics in our time series analysis process. Box-Pierce statistic is formulated as , where is the autocorrelation with time lag , is the length of the sequence, and is the maximum lag we want to measure (we set ).


When analyzing the data, we are interested in the symmetry of the distribution of the data. To quantify the the degree of asymmetry of sequence values around the mean value, the skewness coefficient

adopted. It should be noticed that symmetric data lead to the skewness coefficient close to zero. Oppositely, if is large, the data would more likely be asymmetrical around the mean value.


In order to measure whether the time series is peaked or flat, we use Kurtosis coefficient

. For low kurtosis coefficient , the series tend to become flat rather than a sharp peak near the mean. So we can use the Kurtosis coefficient to measure whether it has a heavy tail or not.


Since the traditional autoregressive models are only capable of capturing short-range correlation of the time series data

Hosking (1984). To measure whether the data is self-similarity (long-range dependent), we use self-similarity parameter Hurst exponent Teräsvirta (1996) to quantify the long-range dependence. We follow Wang et al. Wang et al. (2006) to achieve the value. The higher the value , the more long-range dependence the data exhibits.

Trend Seasonality Serial Correlation Skewness Kurtosis Self-similarity
0.9370 0.6215 0.9852 0.0567 0.0039 0.9818
Table 1: Scaled statistical characteristics of the traffic accident time series. The higher statistics means higher possibility that the phenomenon would occur.

By using these statistical characteristics, we compute these statistics on the time series of traffic accidents in Log Angeles. Then we scale these values into for easier interpretation. The scaled statistical characteristics are recorded in Table 1. As the table shows, the traffic accident sequence has a trend and slightly seasonality, and we can observe that the series is highly correlated. Additionally, the distribution of the series value is balance since the skewness and Kurtosis coefficient are relatively low, showing the normality assumption of the data. On the other hand, the self-similarity parameter is high showing that the current value in the sequence is correlated with the data from a long time ago. Therefore, we might use some models with the ability to capture long-range dependence.

Besides, we examine the time series of traffic accidents is stationary or not. We adopt Augmented Dickey-Fuller test (ADF) to verify the result. ADF tests the null hypothesis that there is a unit root is present in the time series. If the time series has a unit root, it means that the testing series is not stationary and has some time-related structure. If the series is not stationary, we can use take the difference to the series several times to make it stationary. The results of ADF test are described in Table

2. In the table, we can observe that the p-value is large when we testing original series indicating that we failed to reject null hypothesis, which means that the original series has a unit root and is non-stationary. On the other hand, after taking the first-order difference of the series, it results in the small p-value showing the differenced series does not have a unit root and is stationary, which suggests that we should take first-order difference to the series before we applying the stationary model.

Testing Series Dickey-Fuller Lag order p-value Stationary
-1.6938 4 0.7022 No
-5.0588 4 < 0.01 Yes
Table 2: Augmented Dickey-Fuller test results for the traffic accidents series.

In addition, we compute the autocorrelation and partial autocorrelation among original and first-order differenced series, which is presented in Figure 3. In Figure 2(a), we can clearly see from the ACF plot that the time series is strongly correlated with its past, showing that it is not proper to use ARMA family model. Therefore, after taking the first-order difference, we still can clearly see the data is highly correlated since there are a large quantity of spikes in the ACF plot. But according to the result of ADF test above, we know the first-order differenced series is stationary, which means that we can fit ARMA family model to the data. From the PACF plot in Figure 2(b), we empirically regard its shape as the MA(1) process due to the fact that only lag 1 shows negative strong partial autocorrelation. Therefore, when we fit MA model to the first-order differenced series, we may choose as the order.

(a) ACF plot and PACF plot for the original series.
(b) ACF plot and PACF plot for the first-order differenced series.
Figure 3: ACF plots and PACF plots for original and first-order differented series, which are used for choosing the proper AR or MA family model.

When it comes to ARMA model choosing, we also calculate the extended autocorrelation function (EACF) of first-order differenced series in order to select the proper order . The calculated extended autocorrelations are demonstrated in Table 3. From the table, we can observe plenty of false negative results. However, it can be clearly seen that the triangle of "o"s starts at meaning that we might use ARIMA(1,1,1) model.

AR 0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o x x x x o o x x x o
1 x o o o o o x x o o o x o o
2 x o x o o o o o o o o x o o
3 o x o o o o o o o o o x o o
4 x x x o x o o o o o o o o o
5 x x x o o o o o o o o o o o
6 x x x o o o o o o o o o o o
7 x o x o x o x o o o o o o o
Table 3: EACF table for the first-order differenced series used for choosing the proper orders for ARMA model.

3 The Proposed Approaches

In this section, we first introduce a variety of parametric methods to model the time series data, which employ parameters to model the linear relation within residuals or previous data, like AutoRegressive model (AR), Moving Average model (MA), and AutoRegressive Integrated Moving Average model (ARIMA). After introducing ARIMA family model, we utilize bootstrapping sampling methods to augment the data set and perform the exponential smoothing methods to model the series. Beyond these two family models, we also perform Fourier terms to enhance the models’ ability to model the complex and multiple seasonality. At last, we propose the final model by adapting ensemble learning technique from the statistical learning area.

3.1 AutoRegressive Model (AR)

Intuitively, the sequence data tends to show some correlations with the previous data or trend, which means that future trend might base on the previous data points. Therefore, we can forecast the variable of interest using a linear combination of past values of the variable. Yule Udny Yule (1927) formulated an autoregressive model with the order (i.e. AR()) as



is the white noise at time step

, and is the drift value based on previous observed data points.

Once we have the observed data, we can estimate the parameters

using the ordinary least squares which can be derived by the following the equation


The Eq. (4

) can be solved by the Yule-Walker equations

Udny Yule (1927).

3.2 Moving Average (MA)

Instead of using the past values from the sequence, Slutzky Slutzky (1937) utilized past forecast errors to model the sequence. The proposed moving average model of order (i.e. MA()) is represented as


where is the white noise at time step , and is the drift value for the sequence.

Besides, it is trivial to verify that AR() model can be formulated as MA(), since


3.3 AutoRegressive Integrated Moving Average (ARIMA)

By combining AR model and MA model, Makridakis et al. proposed a non-seasonal autoregressive integrated moving average model (i.e. ARIMA()) which is written as


where denotes the times differenced series with respected to series .

Given the , we can estimate the parameters in the ARIMA(

) model by maximizing the likelihood of the probability of obtaining the observed data, which is equivalent to minimizing the summation of forecast errors as


Furthermore, in order to choose the proper order of and for ARIMA model, we use the automatic model selection technique proposed by Bozdogan Hamparsum Bozdogan (1987), which adapts Akaike’s Information Criterion (AIC) Aho et al. (2014), corrected AIC (AICc), and Bayesian Information Criterion (BIC) Aho et al. (2014) as the criterion to select the proper order and . The AIC, AICc, and BIC are formulated as


where when the condition holds true, else . is the likelihood of the sequence. According to the criteria of model selection Bozdogan (1987), a good model should have low values of AIC, AICc, and BIC. We follow this criteria to select the most proper order and . Hyndman et al. Hyndman et al. (2007) propose an algorithm to decide the proper orders, which is summarized in Algorithm 1.

0:    Training Sequence: .
0:    Optimal Model Configuration: .
1:  Using KPSS Test Kwiatkowski et al. (1992) to determine differences time .
2:  Differencing the times to get the differenced data .
3:  .
4:  Initialize base model ARIMA().
5:  repeat
6:     if  then
7:        .
8:        .
9:     end if
10:     Randomly adding or minusing or with () by grid search method.
11:  until No lower AICc can be found.
12:  return  .
Algorithm 1 Hyndman-Khandakar Algorithm

Simultaneously, we can incorporate the seasonality into the non-seasonal ARIMA to adapt the time series with seasonality.

3.3.1 Dynamic Harmonic Regression

Sometimes the series with long seasonal periods or complex seasonal patterns, the ARIMA model and Seasonal ARIMA model cannot capture the periodical pattern well. Meanwhile, if the series has more than one seasonal periods, it would be difficult to model the patterns only using ARIMA model. Therefore, to handle this problem, we extend ARIMA model with dynamic harmonic regression (DHR) due to the periodic seasonality can be handled using pairs of Fourier terms. We can formulate this model as


where is the intercept term, are estimated regression coefficients, and is seasonal period. is value of non-seasonal ARIMA() process at time step . By using Fourier terms, every period function can be approximated by the combination of sine and cosine function for large enough .

3.4 Exponential Smoothing

3.4.1 Simple Exponential Smoothing

Simple exponential smoothing is suitable for the scenario that the sequence shows no clear trend or seasonality. Mathematically, it can be written as


where is the fitted value at time step that needs to be estimated, and is the dampening value with . For the parameter estimation, we following the same protocol for solving parameters in AR model by minimizing Eq. (4). For solving this optimization problem, we can use some iterative optimization approaches such as gradient descent and Newton’s method. Moreover, we can divide Eq. (15) into two separate parts as , and . Here, we can interpret as the smoothed value (level) of the sequence at time step .

3.4.2 Error, Trend, Seasonality (ETS)

Adapting the idea of simple exponential smoothing, we can model the different aspects of the given series. For example, we are able to model the the trend, seasonal, and reminder components. These components can model long-term direction, periodicity, and error terms. Specially, the trend is usually expressed as the combination of a level term and slope term. Take Holt-Winters model Holt (2004); Winters (1960) for instance, this model has addictive trend and additive seasonality, which is formulated by


where is the slope at time step , denotes the estimate of seasonal component of the sequence at time step , and is the season number within a year. , and are the smoothing parameters with , . is the forecast step.

Moreover, we tend to regard Error, Trend, Seasonality (ETS) as the family of diverse combination of Error, Trend, and Seasonality. To specify, the error can be additive () or multiplicative (); the trend can be non-existent (), additive, damped additive (), multiplicative, or damped multiplicative (); the seasonality can be non-existent, additive, or multiplicative. For each combination of three components, we refer to the model as ETS(, , ). To summarize, there are total 30 different kind of models refer to Hyndman et al. Hyndman et al. (2002).

For components selection of ETS family models, we cab adopt the same protocol of ARIMA selection in which utilize the AICc as the criteria. Beside, the initial conditions and smoothing parameters can be estimated by maximizing the likelihood with respect to given data with simplex optimizer Nelder and Mead (1965).

3.4.3 Bootstrapping with ETS

In this part, we augment our data set with bootstrapping technique. For any time series data, we can use STL decomposition Cleveland et al. (1990) for seasonal series or loess Cleveland et al. (2017) for non-seasonal series to decompose the series into several components (i.e. trend, remainders, and seasonality only with seasonal series). Because the sequences generated by the STL decomposition are independent and other two terms are changed over time, we can independently simulate the remainder term in STL using bootstrapping method.

Therefore, to ensure the stationary property of the series, we bootstrap the remainder of the series after performing STL or loess decomposition. Kunsch proposed MBB process Kunsch (1989) which only requires stationary of the data, showing the desired data length is satisfied by sampling data block with equal size from the series repeatedly. Especially, given a series with length and the fixed block size , there exists blocks at the most. After bootstrapping the remainder, we combine them with the trend and seasonality term generated by STL. Then, the inverse of Box-Cox transformation with bias adjustment is applied to get the final bootstrapped series. The whole process is summarized in Algorithm 2. Figure 4 shows the series generated by bootstrapping. Empirically, we set the block size for monthly data since we should the seasonality is observed.

Figure 4: Number of Traffic Accidents in Los Angeles data after the bootstrapping procedure described in Algorithm 2. The black line is the original series, and the rest color ones are generated series. The graph is best viewed in color.
0:    Training Sequence: ts Number of Bootstrapped Times: Block Size:
0:    Bootstrapped Series: ts.boot
1:   BoxCox.lambda(ts)
2:  ts.transformed BoxCox(ts, )
3:  if ts is seasonal series then
4:     [trend, seasonal, remainder] stl(ts.transformed)
5:  else
6:     seasonal 0
7:     [trend, remainder] loess(ts.transformed)
8:  end if
9:  ts.boot[1] ts
10:  for i 2 to  do
11:     reminders.boot[i] mbb.bootstrap(remainder, )
12:     ts.boot.transformed[i] trend + seasonal + reminders.boot[i]
13:     ts.boot[i] InvBoxCox(ts.boot.transformed[i], )
14:  end for
15:  return  ts.boot
Algorithm 2 Generating Bootstrapped Time Series

3.4.4 ETS Based Complex Seasonal Patterns

Most canonical works do not consider and handle the complex seasonal patterns in the time series which are nested. Meanwhile, they are not capable of handling the non-linearity existing in the series. Besides, a large quantity of exponential smoothing models cannot cope with non-linearity problem and suffer from the overparameterization. So, TBATS De Livera et al. (2011) is a innovations space modeling framework used to alleviate these problems, which stands for Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components. Firstly, TBATS extend the Holt-winter addictive model with the Box-cox transformation, ARMA errors, and seasonal patterns formulated as following:


where is the time series after Box-cox transformation at time step , are the seasonal periods, is the number of seasonalities, is the trend with damping parameter , represents the local level in period , denotes the -th seasonal component at time , is an ARMA() process, and is the Gaussian white noise process with zero-mean and constant variance. are smoothing parameters. is the amount of harmonics for -th seasonal period. We maxmimize the likelihood and AICc to find the best model.

For TBATS model, each seasonal pattern is modeled by a trigonometric representation based on Fourier series. Meanwhile, it only needs two seed states without involving the length of the period, and it also can model the non-integer seasonal lengths. To specify the model’s structure, we denote model with its relevant arguments as TBATS

3.5 Ensemble Learning

For the standard supervised learning problem, the training examples tend to be involved with some random noise leading to the incorrect of the regressors or classifiers. To alleviate this problem, one of the practical techniques called ensemble learning

Dietterich (2000) is adopted in many fields. An ensemble of regressors is a set of regressors whose individual predictions are combined in some way to predict the new points or series. The more diverse the regressors, the more accurate the forecast can be. It has been verified by Bates and Granger Bates and Granger (1969) that the combination forecasts often lead to better forecast accuracy, and use a simple average has proven hard to beat. Suppose we have a set of candidate models , we can formulate this process as


where is the final prediction, is the weight of model at time step , and is the prediction generated by the model at time step . is the number of models in the set . In order to estimate the weight , we would use Online Gradient Descent (OGD) method Zinkevich (2003) to calculate the weight at each time step, and we name this model as OGD for abbreviation.

4 Experiments

4.1 Evaluation Metrics

For evaluation, we use the the Mean Average Precision Error (sMAPE) to measure the prediction errors due to the advantage of being unit-free. Formally, sMAPE is formulated as

where is the actual value of time series at time , and is the predicted value at time . Besides, the symmetric Mean Absolute Percentage Error (sMAPE) can also evaluate models’ performance and the formula is

In the same time, the Root Mean Square Error (RMSE) is adopted to measure the ability of model prediction. Besides, we evaluated our results using Mean Percentage Error (MPE) which can be written as

4.2 Evaluation Settings

For traffic accidents time series, we divide the series into three consecutive time periods: 80% time of time series for training, last 5% time of time series for testing, and the rest 15% are used for validation.

ARI(2,1) 199.667 3.077 3.801 3.921
IMA(1,1) 153.122 -0.077 3.429 3.435
ARIMA(1,1,1) 193.023 2.751 3.686 3.804
Simple Exponential Smoothing 193.816 2.824 2.735 3.844
ETS(A, N, A) 124.676 2.096 2.728 2.771
Bagged ETS 133.530 2.341 2.966 3.017
Dynamic Harmonic Regression () 121.624 1.974 2.661 2.701
TBATS(1,{0,0},-,{<12,5>}) 120.149 1.735 2.561 2.593
OGD 114.143 1.439 2.293 2.322
Table 4: Performance comparisons with different models on validation series.

4.3 Quantitative Results

We compare fitting results among the proposed models, and test these models on the validation series. The results are reported in Table 4. Several interesting observation can be found as follows.

  • IMA(1,1) model outperforms most of the competing ARIMA models by a large margin in the validation set. IMA(1,1) is better than the rest at least 30 on RMSE. This is because if we look at the forecast series generated by the IMA(1,1), we can see a clear increasing trend that is more compatible with the actual trend. However, for ARI(2,1) and ARIMA(1,1,1), these two models predict the series which perform constantly.

  • In the column of measuring MPE, only IMA(1,1) achieves negative value which closes to zero. It is on the grounds that the true series fluctuates around the IMA(1,1) predicted series. Therefore, according to the formula of MPE, we can see it can result in the negative value in the numerator if happens. So the smaller MPE cannot indicate the model fits better than other models.

  • The ETS(A, N, A) achieves better performance in all measurements than the Bagged version of ETS due to the random sampling noise in the generated bootstrapped series. As a consequence, ETS would result in more accurate predictions in the forecast.

  • Dynamic harmonic regression model and TBATS(1,{0,0},-,{<12,5>}) model obtain larger gains on each evaluation metrics. The shared characteristic of these two models is that they both utilize the Fourier terms to model the complex seasonality and non-linearity. For the estimated number of periods, it is surprising that both models give

    as the number of periods, which verifies that the original series involved with complex seasonality and non-linearity.

  • For the final model with ensemble method, we select models with small RMSE for the base model for final OGD model which consists of IMA(1,1), ETS(A, N, A), DHR (), and TBATS(1,{0,0},-,{<12,5>}). Meanwhile, the best performance strongly supports our intuition that different models would alleviate the random noise in the training series leading to a more accurate forecast.

Figure 5: Visualization of different models’ predictions on the validation series.

Besides, we visualize the forecast series and the ground truth series in Figure 5. For most predictions, their trends are similar to ground truth series, but these predictions tend to underestimate the actual number of accidents. It is possible that prediction errors would result from other factors such as policy, migration, and weather condition, etc.

5 Model Diagnostics

In this section, we would examine some good models mentioned in Table 4 to verify the results and choose the ultimate model for forecasting.

5.1 Residual Analysis of IMA(1,1) Model

Figure 6: Normality and independence assumptions checking for IMA(1,1) model.

Since we start from the ARIMA family model, we observe that IMA model is more proper than ARI or ARIMA model due to the fact that the first-order differenced series is stationary. So, we fit IMA(1,1) model represented as following:


where is the first-order differenced series. To assess the fitted IMA(1,1) model visually, we draw the figure of ACF plot of residuals generated by fitted IMA(1,1) model in Figure 6. From the ACF plot, we can clearly see that there exist some sharp spikes at lag 6, 7, 12, 16, 18, and 24, which indicates that the residuals are still correlated with each other. To verify our observation, Ljung-Box test shows that p-value is which is really close to 0. Therefore, we should reject the null hypothesis that the residuals are uncorrelated. We have sufficient evidence to conclude that the residuals are correlated. In consequence, the series cannot be well explained by IMA(1,1) model.

5.2 Residual Analysis of ETS(A, N, A) Model

From the fitted model, it can be pointed out that the ETS(A, N, A) model is equivalent to ARIMA. It shows that it not only considers the IMA(1, 1) process analyzed in Section 5.1 but also takes the seasonal components into account. Consequently, when we check the residuals of fitted ETS(A, N, A) model, we have enough evidence to conclude that the residuals are normally distributed due to the p-value 0.9916. Simultaneously, the ACF plot in Figure 7 demonstrates that all the autocorrelations are within the error bounds showing uncorrelation characteristic. However, to quantitatively uncorrelation, we use Ljung-Box test to check whether the residuals are uncorrelated from each other. Ljung-Box test generates the p-value 0.00214, showing that we should reject the null hypothesis that the data is uncorrelated under the significance level 0.05. Therefore, the model can not explain the series well.

Figure 7: Normality and independence assumptions checking for ETS(A, N, A).

5.3 Residual Analysis of DHR () Model

Figure 8: Normality and independence assumptions checking for DHR model.

With Fourier items being added, the ARIMA model will then have deterministic, but smoothed, seasonal effects. We choose the best according to AICc criteria. So after taking the Fourier item into consideration, we can see all the values in ACF plot are within the error bounds, which indicates that the residuals are uncorrelated. Meanwhile, we use Shapiro-Wilk test whether the residuals are normally distributed or not. It results in the p-value 0.9618, which points out the residuals are normally distributed. Moreover, we conduct the runs test to check the independence assumption which results in p-value 0.193 showing the residuals are independent. However, when we verify the autocorrelation with Ljung-Box test, we achieve the small p-value 0.007595 which means the residuals are still correlated indicating this model may not be a good fit.

5.4 Residual Analysis of TBATS(1,{0,0},-,{<12,5>}) Model

Figure 9: Normality and independence assumptions checking for TBATS(1,{0,0},-,{<12,5>}) model.

TBATS models the time series data from exponential smoothing method which is different from ARIMA family model due to the idea of decomposition. However, the shared feature of TBATS and Dynamic Harmonic Regression is that both methods utilize the Fourier items to model the complex seasonality. To check the validity of the model, we plot the residuals and their ACF in Figure 9. Similarly, following the protocol of checking model, we get the p-value 0.7057 from Shapiro-Wilk test, showing the residuals can be regarded as normally distributed. Besides, we conduct the run test for checking the independence which results in p-value 0.0382 showing that we do not have sufficient evidence to conclude that residuals are independent. Thus, the independence assumption is not satisfied. So the model cannot well model the series.

5.5 Composition Analysis and Residual Analysis of OGD Model

Through ensemble learning method, we build OGD model using IMA, ETS, DHR, and TBATS. Before we consider the residual of the fitted model, we visualize the weights of each model in OGD model as Figure 10

represents. In the subfigure on the left upper corner, we can clearly see that the total weights distribution is like uniform distribution. However, there are slight differences between uniform distribution since if we glance at the subfigure in the left lower corner since it shows that OGD method is slightly better than uniform method. At the same time, the ensemble method enables random errors to decrease as much as possible.

Figure 10: Visualization of the composition of OGD model by combining IMA(1,1), ETS(A, N, A), DHR (), and TBATS(1,{0,0},-,{<12,5>}).

Furthermore, we analysis the residual of fitted model for the sake of verifying its effectiveness. According to Figure 11

, we can observe the ACF plot that there is no spike in the plot and all the autocorrelation values are within the error bounds, which means that residuals of fitted OGD model are independent. To quantitatively verify the uncorrelation, we conduct Ljung-Box test to check whether the residuals are uncorrelated from each other. It shows that test statistic

with degree of freedom 20 has the p-value

indicating that we failed to reject the null hypothesis, which proves the uncorrelation. Apart from checking the uncorrelation, we also verify the normality of the residuals. In order to show the normality, we use Shapiro-Wilk test. It comes that the p-value is 0.9784 and means that the residuals are normally distributed. So we can regard these residuals as the errors generated by white noise process. Therefore, OGD models the whole time series quite well compared to other methods.

Figure 11: Normality and independence assumptions checking for OGD model.

6 Forecasting

In this section, we forecast the number of traffic accidents in the future using the OGD model mentioned in Section 3.5. For better understanding the accuracy of prediction, we predict the series on test series which dates from December 2018 to May 2019. In addition, we train our model on training series and validation series The testing results are summarized in Table 5.

Dec. 2018 Jan. 2019 Feb. 2019 Mar. 2019 Apr. 2019 May 2019 RMSE MPE MPAE sMPAE
Ground Truth 3699 3602 3597 3966 3766 4022 176.573 -3.573 4.256 4.151
Prediction (Mean) 3936.8 3870.3 3788.6 4020.6 3875.4 3875.4
SE () 232.7 234.4 253.8 261.4 268.9 276.1
SE () 357.3 359.9 289.7 401.5 412.9 424.1
Table 5:

Forecast results on the testing series with OGD Model compared to the ground truth series. The results include mean prediction value and standard error under 80% and 95% confident level.

As Table 5

presents, the predict values tend to be slightly higher than the actually observed values about 200, this might be caused by the population and economics in the Log Angeles, which suggests our model can explain adequately the traffic accident data. On the other hand, we can see as time pass by, the range of confidence intervals tends to be larger which means more uncertainty in future prediction, and actual observation values are in the interval showing the correctness of our method.

7 Discussion

7.1 Summary

In the paper, we tried to analyze the pattern of traffic accident happened last few years in Los Angeles. We investigated the time series we are interested in terms of statistical characteristics such as trend, seasonality, correlation, and skewness, etc. We also selected a set of candidate models and proposed ensemble method. Then, we fitted the models on the training series and validated these models on validation series to select the best model. Meanwhile, residual diagnostics of different models were fully presented. The ensembled model OGD showed the best fit among candidate models used for final forecasting on the test series.

7.2 Future Work

For further modeling the traffic accident time series, we can take the characteristic of the data. In other words, the distribution of traffic accident follows a Poisson distribution or negative binomial (NB) distribution. We can utilize these assumptions to adopt the Poisson regression or negative binomial regression model to fit the data. Moreover, the combination of generalized linear model and a stationary model can be more effective in modeling the data.


  • K. Aho, D. Derryberry, and T. Peterson (2014) Model selection for ecologists: the worldviews of aic and bic. Ecology 95 (3), pp. 631–636. Cited by: §3.3.
  • J. M. Bates and C. W. Granger (1969) The combination of forecasts. Journal of the Operational Research Society 20 (4), pp. 451–468. Cited by: §3.5.
  • G. E. Box and D. R. Cox (1964) An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological) 26 (2), pp. 211–243. Cited by: §2.2.
  • H. Bozdogan (1987) Model selection and akaike’s information criterion (aic): the general theory and its analytical extensions. Psychometrika 52 (3), pp. 345–370. Cited by: §3.3.
  • U. C. Bureau (2016) Annual estimates of the resident population: april 1, 2010 to july 1, 2016. Cited by: §1.
  • R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning (1990) STL: a seasonal-trend decomposition. Journal of official statistics 6 (1), pp. 3–73. Cited by: §2.3, §3.4.3.
  • W. S. Cleveland, E. Grosse, and W. M. Shyu (2017) Local regression models. In Statistical models in S, pp. 309–376. Cited by: §3.4.3.
  • A. M. De Livera, R. J. Hyndman, and R. D. Snyder (2011) Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association 106 (496), pp. 1513–1527. Cited by: §3.4.4.
  • T. G. Dietterich (2000)

    Ensemble methods in machine learning

    In International workshop on multiple classifier systems, pp. 1–15. Cited by: §3.5.
  • V. M. Guerrero (1993) Time-series analysis supported by power transformations. Journal of Forecasting 12 (1), pp. 37–48. Cited by: §2.2.
  • C. C. Holt (2004) Forecasting seasonals and trends by exponentially weighted moving averages. International journal of forecasting 20 (1), pp. 5–10. Cited by: §3.4.2.
  • J. R. Hosking (1984) Modeling persistence in hydrological time series using fractional differencing. Water resources research 20 (12), pp. 1898–1908. Cited by: §2.3.
  • [13] How many cars per capita in the usa. Note: Cited by: §1.
  • R. J. Hyndman, Y. Khandakar, et al. (2007) Automatic time series for forecasting: the forecast package for r. Monash University, Department of Econometrics and Business Statistics …. Cited by: §3.3.
  • R. J. Hyndman, A. B. Koehler, R. D. Snyder, and S. Grose (2002) A state space framework for automatic forecasting using exponential smoothing methods. International Journal of forecasting 18 (3), pp. 439–454. Cited by: §3.4.2.
  • H. R. Kunsch (1989) The jackknife and the bootstrap for general stationary observations. The annals of Statistics, pp. 1217–1241. Cited by: §3.4.3.
  • D. Kwiatkowski, P. C. Phillips, P. Schmidt, and Y. Shin (1992) Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root?. Journal of econometrics 54 (1-3), pp. 159–178. Cited by: 1.
  • J. A. Nelder and R. Mead (1965) A simplex method for function minimization. The computer journal 7 (4), pp. 308–313. Cited by: §3.4.2.
  • P. M. Ong and D. Miller (2005) Spatial and transportation mismatch in los angeles. Journal of Planning Education and Research 25 (1), pp. 43–56. Cited by: §1.
  • E. Slutzky (1937) The summation of random causes as the source of cyclic processes. Econometrica: Journal of the Econometric Society, pp. 105–146. Cited by: §3.2.
  • P. Sorensen (2009) Moving los angeles. Cited by: §1.
  • T. Teräsvirta (1996) Power properties of linearity tests for time series. Studies in nonlinear dynamics & econometrics 1 (1). Cited by: §2.3.
  • G. Udny Yule (1927) On a method of investigating periodicities in disturbed series, with special reference to wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society of London Series A 226, pp. 267–298. Cited by: §3.1, §3.1.
  • X. Wang, K. Smith, and R. Hyndman (2006) Characteristic-based clustering for time series data. Data mining and knowledge Discovery 13 (3), pp. 335–364. Cited by: §2.3, §2.3.
  • P. R. Winters (1960) Forecasting sales by exponentially weighted moving averages. Management science 6 (3), pp. 324–342. Cited by: §3.4.2.
  • [26] World bank data: motor vehicles (per 1,000 people).. Note: Cited by: §1.
  • M. Zinkevich (2003) Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936. Cited by: §3.5.