1 Introduction
As the country with the highest car ownership and the second largest car ownership per capita in the world, the traffic problem in the United States has always been the focus of people. According to The World Bank’s statistics "World Bank Data: Motor vehicles (per 1,000 people)" 26, the number of cars per thousand people in the United States reached 910 in 2017, and the total amount reached 255,009,283 13.
Los Angeles is the largest city in California and the second largest city in the United States, second only to New York City. The city covers an area of 469.1 square miles (1214.9 square kilometers). The Los Angeles metropolitan area consists of Los Angeles, Long Beach and Anaheim, has a population of approximately 13.31 million, and the greater Los Angeles area contains 18.7 million residents totally Bureau (2016). Both statistical data are roughly equal to New York.
Today, Los Angeles has developed into one of the greatest international metropolis, with highly developed industry of culture, science, technology, sports, international trade and education. Among all these industries, the thriving development of the culture and entertainment industry, which produces great films, TV series, and music, plays the most significant role, also forms the foundation of Los Angeles’ international reputation and global status.
For entertainment industry, Hollywood is the entertainment center of Los Angeles. It takes about 40 minutes by bus from downtown to Hollywood. Since, 1911, the very year the first film company established in Hollywood, the area has become the world’s film center. With so many stars and great movies, it is safe to say that Hollywood is the pioneer of global film industry. Celebrity Avenue is a world famous sight located in Hollywood, records more than 2000 highlyachieved celebrities in the fields of film, broadcasting and music.
When it comes to the education industry, the education in Los Angeles is wellknown all over the world. Los Angeles is home to the worldfamous California Institute of Technology, the University of California Los Angeles and University of Southern California. Apart from famous universities, Los Angeles Public Library, Huntington Library and Getty Museum are also important cultural and educational facilities.
In the sports industry, Los Angeles has also developed a very great environment for sports, provides opportunities for all kinds of sports. Los Angeles Angels, a Major League Baseball team in Los Angeles, California, part of the Pacific Coast League. Los Angeles Lakers, a basketball club founded in Minneapolis in 1947 but developed in Los Angeles, won sixteen championships in National Basketball Association. Also, Los Angeles held Olympic Summer Games twice in 1932 and 1984.
Due to the high level of development in various industries, Los Angeles, as an international metropolis, inevitably attracts a large number of immigrants to settle down. Among all the residents in Los Angeles, 11.2% of them are immigrants from other countries, of which Chinese are the most, accounting for 7.1%.
As the population continues to expand, the traffic, inevitably become a serious question. Simultaneously, Los Angeles government is constantly improving the urban transportation system: Los Angeles has one of the world’s largest highway systems, yet it is still the top 10 worst traffic city in the United States Sorensen (2009). The main roads connecting Los Angeles and surrounding cities include Interstate 5, Interstate 10, and US National Highway 101. The Los Angeles County Metropolitan Transportation Authority and other agencies operate numerous bus lines as well as subway and light rail lines. Los Angeles rail transit includes red and purple subway lines, as well as gold, blue, expo and green light rail lines. The orange and silver lines are BRT lines with stops and frequencies similar to light rails. The main train station in Los Angeles is the Union Station in the northeast of the city center, the center of the Southern California Metropolis Railway. The system connects Los Angeles with all neighboring counties and many suburbs.
Los Angeles metropolis (surrounding areas included) is the second largest metropolitan area in the United States. According to survey, approximately a third of the total population resides in the City of Los Angeles Ong and Miller (2005). However, the annual flow of passengers in the Los Angeles subway is only about 48.7 million. In contrast, in Hong Kong, where the population is much smaller than Los Angeles, the subway transports 1.563 billion passengers a year, almost the 30 times of that of Los Angeles.
A large number of population will inevitably bring a lot of traffic demand, but the low public transportation utilization rate has caused Los Angeles to become one of the ten most congested cities in the United States. What followed is the inevitable traffic accidents and safety problems. Our group selected traffic collision reported to LAPD as our data set, hoping to have a better understanding of the traffic condition in LA.
In short, the main contribution of this work can be highlighted in following aspects:

By adding the Fourier items, the complex seasonality can be well modeled by an intuitive manner. The proposed ensembled model outperforms other methods by a large margin.

Model diagnostics are fully conducted, and the components of ensemble model are also investigated.
The remaining parts are organized as follows. Section 2 generally explores the time series we are interested in and Section 3 introduces plenty of time series modeling methods. Then, Section 4 describes a number of experiments carried out on the training and validation set and reports the models’ performance. After that, Section 5 uses several diagnostic techniques to evaluate the validity of the candidate models. Finally, we conduct the forecast in Section 6 and conclude this paper.
2 Data Exploration
2.1 Data Description
The Los Angeles traffic accidents are collected by City of Los Angeles from 2010 to 2019 monthly. The corresponding values are integer. The data set contains 113 data points. The plot of the data and data distribution are demonstrated in Figure 1. It shows that accidents are more likely to happen in October and less likely in February.
2.2 BoxCox Transformation
In the realworld scenario, the distribution of running times data may not be normally distributed, which results in the distribution being asymmetrical leading to a bias in the model. To deal with the problem, George Box et al.
Box and Cox (1964)propose a data transformation technique named BoxCox Transformation which helps to stabilize the variance of the series. The mathematical expression is written as
(1) 
where is the power coefficient of the transformation that we need to select. We can find out that when , the whole series is identical only with shift. When , the transformation is the logarithm. However, it is difficult to choose the parameter since the searching space is infinite. Therefore, Guerrero (Guerrero, 1993)
proposes a method to estimate the value of
which aims to minimize the coefficient of variation for subsequences of the original series. Figure 2 shows the original series and transformed series with .When we tend to inverse the process of BoxCox transformation, the transformed predicted value will not be the mean of the forecast distribution but the median. Since the median cannot be added up, so we should use bias adjustments to find the corrected version of the inverse of BoxCox transformation, which is formulated as follows
(2) 
where is the step predict variance, the larger variance, the larger difference between mean and median.
2.3 Data Characteristics
Understanding of time series data is an essential part of the time series analysis process. Before we apply the model to fit the data, we should explore the statistical characteristics of the time series data which contains seasonality, trend, noisy, chaos, etc. In this part, we consider six main statistical characteristics of the given data. In order to vividly understand the data, we normalize the different metrics into to the significance of the characteristics that are presented or not. The larger the normalized value, the more significance it shows.
Trend and Seasonality
Because decomposition of time series can be applied to measure the trend and seasonality in time series. Following the STL decomposition Cleveland et al. (1990), we decompose the series as , which are trend component, seasonal component, and remainder component, respectively. Here, we utilize the strength of trend and the strength of seasonality proposed by Wang et al. Wang et al. (2006) to measure the trend and seasonality. The mathematical expressions are presented as
For strongly trend or seasonal series, there should be have more variation in trend or seasonality than remainder component leading to the larger or .
Serial Correlation
To estimate the serial correlation, we use the BoxPierce statistics in our time series analysis process. BoxPierce statistic is formulated as , where is the autocorrelation with time lag , is the length of the sequence, and is the maximum lag we want to measure (we set ).
Skewness
When analyzing the data, we are interested in the symmetry of the distribution of the data. To quantify the the degree of asymmetry of sequence values around the mean value, the skewness coefficient
adopted. It should be noticed that symmetric data lead to the skewness coefficient close to zero. Oppositely, if is large, the data would more likely be asymmetrical around the mean value.Kurtosis
In order to measure whether the time series is peaked or flat, we use Kurtosis coefficient
. For low kurtosis coefficient , the series tend to become flat rather than a sharp peak near the mean. So we can use the Kurtosis coefficient to measure whether it has a heavy tail or not.Selfsimilarity
Since the traditional autoregressive models are only capable of capturing shortrange correlation of the time series data
Hosking (1984). To measure whether the data is selfsimilarity (longrange dependent), we use selfsimilarity parameter Hurst exponent Teräsvirta (1996) to quantify the longrange dependence. We follow Wang et al. Wang et al. (2006) to achieve the value. The higher the value , the more longrange dependence the data exhibits.Trend  Seasonality  Serial Correlation  Skewness  Kurtosis  Selfsimilarity 
0.9370  0.6215  0.9852  0.0567  0.0039  0.9818 
By using these statistical characteristics, we compute these statistics on the time series of traffic accidents in Log Angeles. Then we scale these values into for easier interpretation. The scaled statistical characteristics are recorded in Table 1. As the table shows, the traffic accident sequence has a trend and slightly seasonality, and we can observe that the series is highly correlated. Additionally, the distribution of the series value is balance since the skewness and Kurtosis coefficient are relatively low, showing the normality assumption of the data. On the other hand, the selfsimilarity parameter is high showing that the current value in the sequence is correlated with the data from a long time ago. Therefore, we might use some models with the ability to capture longrange dependence.
Besides, we examine the time series of traffic accidents is stationary or not. We adopt Augmented DickeyFuller test (ADF) to verify the result. ADF tests the null hypothesis that there is a unit root is present in the time series. If the time series has a unit root, it means that the testing series is not stationary and has some timerelated structure. If the series is not stationary, we can use take the difference to the series several times to make it stationary. The results of ADF test are described in Table
2. In the table, we can observe that the pvalue is large when we testing original series indicating that we failed to reject null hypothesis, which means that the original series has a unit root and is nonstationary. On the other hand, after taking the firstorder difference of the series, it results in the small pvalue showing the differenced series does not have a unit root and is stationary, which suggests that we should take firstorder difference to the series before we applying the stationary model.Testing Series  DickeyFuller  Lag order  pvalue  Stationary 

1.6938  4  0.7022  No  
5.0588  4  < 0.01  Yes 
In addition, we compute the autocorrelation and partial autocorrelation among original and firstorder differenced series, which is presented in Figure 3. In Figure 2(a), we can clearly see from the ACF plot that the time series is strongly correlated with its past, showing that it is not proper to use ARMA family model. Therefore, after taking the firstorder difference, we still can clearly see the data is highly correlated since there are a large quantity of spikes in the ACF plot. But according to the result of ADF test above, we know the firstorder differenced series is stationary, which means that we can fit ARMA family model to the data. From the PACF plot in Figure 2(b), we empirically regard its shape as the MA(1) process due to the fact that only lag 1 shows negative strong partial autocorrelation. Therefore, when we fit MA model to the firstorder differenced series, we may choose as the order.
When it comes to ARMA model choosing, we also calculate the extended autocorrelation function (EACF) of firstorder differenced series in order to select the proper order . The calculated extended autocorrelations are demonstrated in Table 3. From the table, we can observe plenty of false negative results. However, it can be clearly seen that the triangle of "o"s starts at meaning that we might use ARIMA(1,1,1) model.
MA  

AR  0  1  2  3  4  5  6  7  8  9  10  11  12  13 
0  x  o  o  o  x  x  x  x  o  o  x  x  x  o 
1  x  o  o  o  o  o  x  x  o  o  o  x  o  o 
2  x  o  x  o  o  o  o  o  o  o  o  x  o  o 
3  o  x  o  o  o  o  o  o  o  o  o  x  o  o 
4  x  x  x  o  x  o  o  o  o  o  o  o  o  o 
5  x  x  x  o  o  o  o  o  o  o  o  o  o  o 
6  x  x  x  o  o  o  o  o  o  o  o  o  o  o 
7  x  o  x  o  x  o  x  o  o  o  o  o  o  o 
3 The Proposed Approaches
In this section, we first introduce a variety of parametric methods to model the time series data, which employ parameters to model the linear relation within residuals or previous data, like AutoRegressive model (AR), Moving Average model (MA), and AutoRegressive Integrated Moving Average model (ARIMA). After introducing ARIMA family model, we utilize bootstrapping sampling methods to augment the data set and perform the exponential smoothing methods to model the series. Beyond these two family models, we also perform Fourier terms to enhance the models’ ability to model the complex and multiple seasonality. At last, we propose the final model by adapting ensemble learning technique from the statistical learning area.
3.1 AutoRegressive Model (AR)
Intuitively, the sequence data tends to show some correlations with the previous data or trend, which means that future trend might base on the previous data points. Therefore, we can forecast the variable of interest using a linear combination of past values of the variable. Yule Udny Yule (1927) formulated an autoregressive model with the order (i.e. AR()) as
(3) 
where
is the white noise at time step
, and is the drift value based on previous observed data points.Once we have the observed data, we can estimate the parameters
using the ordinary least squares which can be derived by the following the equation
(4) 
The Eq. (4
) can be solved by the YuleWalker equations
Udny Yule (1927).3.2 Moving Average (MA)
Instead of using the past values from the sequence, Slutzky Slutzky (1937) utilized past forecast errors to model the sequence. The proposed moving average model of order (i.e. MA()) is represented as
(5) 
where is the white noise at time step , and is the drift value for the sequence.
Besides, it is trivial to verify that AR() model can be formulated as MA(), since
(6) 
3.3 AutoRegressive Integrated Moving Average (ARIMA)
By combining AR model and MA model, Makridakis et al. proposed a nonseasonal autoregressive integrated moving average model (i.e. ARIMA()) which is written as
(7) 
where denotes the times differenced series with respected to series .
Given the , we can estimate the parameters in the ARIMA(
) model by maximizing the likelihood of the probability of obtaining the observed data, which is equivalent to minimizing the summation of forecast errors as
(8) 
Furthermore, in order to choose the proper order of and for ARIMA model, we use the automatic model selection technique proposed by Bozdogan Hamparsum Bozdogan (1987), which adapts Akaike’s Information Criterion (AIC) Aho et al. (2014), corrected AIC (AICc), and Bayesian Information Criterion (BIC) Aho et al. (2014) as the criterion to select the proper order and . The AIC, AICc, and BIC are formulated as
(9)  
(10)  
(11) 
where when the condition holds true, else . is the likelihood of the sequence. According to the criteria of model selection Bozdogan (1987), a good model should have low values of AIC, AICc, and BIC. We follow this criteria to select the most proper order and . Hyndman et al. Hyndman et al. (2007) propose an algorithm to decide the proper orders, which is summarized in Algorithm 1.
Simultaneously, we can incorporate the seasonality into the nonseasonal ARIMA to adapt the time series with seasonality.
3.3.1 Dynamic Harmonic Regression
Sometimes the series with long seasonal periods or complex seasonal patterns, the ARIMA model and Seasonal ARIMA model cannot capture the periodical pattern well. Meanwhile, if the series has more than one seasonal periods, it would be difficult to model the patterns only using ARIMA model. Therefore, to handle this problem, we extend ARIMA model with dynamic harmonic regression (DHR) due to the periodic seasonality can be handled using pairs of Fourier terms. We can formulate this model as
(12)  
(13)  
(14) 
where is the intercept term, are estimated regression coefficients, and is seasonal period. is value of nonseasonal ARIMA() process at time step . By using Fourier terms, every period function can be approximated by the combination of sine and cosine function for large enough .
3.4 Exponential Smoothing
3.4.1 Simple Exponential Smoothing
Simple exponential smoothing is suitable for the scenario that the sequence shows no clear trend or seasonality. Mathematically, it can be written as
(15) 
where is the fitted value at time step that needs to be estimated, and is the dampening value with . For the parameter estimation, we following the same protocol for solving parameters in AR model by minimizing Eq. (4). For solving this optimization problem, we can use some iterative optimization approaches such as gradient descent and Newton’s method. Moreover, we can divide Eq. (15) into two separate parts as , and . Here, we can interpret as the smoothed value (level) of the sequence at time step .
3.4.2 Error, Trend, Seasonality (ETS)
Adapting the idea of simple exponential smoothing, we can model the different aspects of the given series. For example, we are able to model the the trend, seasonal, and reminder components. These components can model longterm direction, periodicity, and error terms. Specially, the trend is usually expressed as the combination of a level term and slope term. Take HoltWinters model Holt (2004); Winters (1960) for instance, this model has addictive trend and additive seasonality, which is formulated by
(16)  
(17)  
(18)  
(19) 
where is the slope at time step , denotes the estimate of seasonal component of the sequence at time step , and is the season number within a year. , and are the smoothing parameters with , . is the forecast step.
Moreover, we tend to regard Error, Trend, Seasonality (ETS) as the family of diverse combination of Error, Trend, and Seasonality. To specify, the error can be additive () or multiplicative (); the trend can be nonexistent (), additive, damped additive (), multiplicative, or damped multiplicative (); the seasonality can be nonexistent, additive, or multiplicative. For each combination of three components, we refer to the model as ETS(, , ). To summarize, there are total 30 different kind of models refer to Hyndman et al. Hyndman et al. (2002).
For components selection of ETS family models, we cab adopt the same protocol of ARIMA selection in which utilize the AICc as the criteria. Beside, the initial conditions and smoothing parameters can be estimated by maximizing the likelihood with respect to given data with simplex optimizer Nelder and Mead (1965).
3.4.3 Bootstrapping with ETS
In this part, we augment our data set with bootstrapping technique. For any time series data, we can use STL decomposition Cleveland et al. (1990) for seasonal series or loess Cleveland et al. (2017) for nonseasonal series to decompose the series into several components (i.e. trend, remainders, and seasonality only with seasonal series). Because the sequences generated by the STL decomposition are independent and other two terms are changed over time, we can independently simulate the remainder term in STL using bootstrapping method.
Therefore, to ensure the stationary property of the series, we bootstrap the remainder of the series after performing STL or loess decomposition. Kunsch proposed MBB process Kunsch (1989) which only requires stationary of the data, showing the desired data length is satisfied by sampling data block with equal size from the series repeatedly. Especially, given a series with length and the fixed block size , there exists blocks at the most. After bootstrapping the remainder, we combine them with the trend and seasonality term generated by STL. Then, the inverse of BoxCox transformation with bias adjustment is applied to get the final bootstrapped series. The whole process is summarized in Algorithm 2. Figure 4 shows the series generated by bootstrapping. Empirically, we set the block size for monthly data since we should the seasonality is observed.
3.4.4 ETS Based Complex Seasonal Patterns
Most canonical works do not consider and handle the complex seasonal patterns in the time series which are nested. Meanwhile, they are not capable of handling the nonlinearity existing in the series. Besides, a large quantity of exponential smoothing models cannot cope with nonlinearity problem and suffer from the overparameterization. So, TBATS De Livera et al. (2011) is a innovations space modeling framework used to alleviate these problems, which stands for Trigonometric seasonality, BoxCox transformation, ARMA errors, Trend and Seasonal components. Firstly, TBATS extend the Holtwinter addictive model with the Boxcox transformation, ARMA errors, and seasonal patterns formulated as following:
(20)  
(21)  
(22)  
(23)  
(24)  
(25)  
(26)  
(27) 
where is the time series after Boxcox transformation at time step , are the seasonal periods, is the number of seasonalities, is the trend with damping parameter , represents the local level in period , denotes the th seasonal component at time , is an ARMA() process, and is the Gaussian white noise process with zeromean and constant variance. are smoothing parameters. is the amount of harmonics for th seasonal period. We maxmimize the likelihood and AICc to find the best model.
For TBATS model, each seasonal pattern is modeled by a trigonometric representation based on Fourier series. Meanwhile, it only needs two seed states without involving the length of the period, and it also can model the noninteger seasonal lengths. To specify the model’s structure, we denote model with its relevant arguments as TBATS
3.5 Ensemble Learning
For the standard supervised learning problem, the training examples tend to be involved with some random noise leading to the incorrect of the regressors or classifiers. To alleviate this problem, one of the practical techniques called ensemble learning
Dietterich (2000) is adopted in many fields. An ensemble of regressors is a set of regressors whose individual predictions are combined in some way to predict the new points or series. The more diverse the regressors, the more accurate the forecast can be. It has been verified by Bates and Granger Bates and Granger (1969) that the combination forecasts often lead to better forecast accuracy, and use a simple average has proven hard to beat. Suppose we have a set of candidate models , we can formulate this process as(28) 
where is the final prediction, is the weight of model at time step , and is the prediction generated by the model at time step . is the number of models in the set . In order to estimate the weight , we would use Online Gradient Descent (OGD) method Zinkevich (2003) to calculate the weight at each time step, and we name this model as OGD for abbreviation.
4 Experiments
4.1 Evaluation Metrics
For evaluation, we use the the Mean Average Precision Error (sMAPE) to measure the prediction errors due to the advantage of being unitfree. Formally, sMAPE is formulated as
where is the actual value of time series at time , and is the predicted value at time . Besides, the symmetric Mean Absolute Percentage Error (sMAPE) can also evaluate models’ performance and the formula is
In the same time, the Root Mean Square Error (RMSE) is adopted to measure the ability of model prediction. Besides, we evaluated our results using Mean Percentage Error (MPE) which can be written as
4.2 Evaluation Settings
For traffic accidents time series, we divide the series into three consecutive time periods: 80% time of time series for training, last 5% time of time series for testing, and the rest 15% are used for validation.
Model  RMSE  MPE  MAPE  sMAPE 

ARI(2,1)  199.667  3.077  3.801  3.921 
IMA(1,1)  153.122  0.077  3.429  3.435 
ARIMA(1,1,1)  193.023  2.751  3.686  3.804 
Simple Exponential Smoothing  193.816  2.824  2.735  3.844 
ETS(A, N, A)  124.676  2.096  2.728  2.771 
Bagged ETS  133.530  2.341  2.966  3.017 
Dynamic Harmonic Regression ()  121.624  1.974  2.661  2.701 
TBATS(1,{0,0},,{<12,5>})  120.149  1.735  2.561  2.593 
OGD  114.143  1.439  2.293  2.322 
4.3 Quantitative Results
We compare fitting results among the proposed models, and test these models on the validation series. The results are reported in Table 4. Several interesting observation can be found as follows.

IMA(1,1) model outperforms most of the competing ARIMA models by a large margin in the validation set. IMA(1,1) is better than the rest at least 30 on RMSE. This is because if we look at the forecast series generated by the IMA(1,1), we can see a clear increasing trend that is more compatible with the actual trend. However, for ARI(2,1) and ARIMA(1,1,1), these two models predict the series which perform constantly.

In the column of measuring MPE, only IMA(1,1) achieves negative value which closes to zero. It is on the grounds that the true series fluctuates around the IMA(1,1) predicted series. Therefore, according to the formula of MPE, we can see it can result in the negative value in the numerator if happens. So the smaller MPE cannot indicate the model fits better than other models.

The ETS(A, N, A) achieves better performance in all measurements than the Bagged version of ETS due to the random sampling noise in the generated bootstrapped series. As a consequence, ETS would result in more accurate predictions in the forecast.

Dynamic harmonic regression model and TBATS(1,{0,0},,{<12,5>}) model obtain larger gains on each evaluation metrics. The shared characteristic of these two models is that they both utilize the Fourier terms to model the complex seasonality and nonlinearity. For the estimated number of periods, it is surprising that both models give
as the number of periods, which verifies that the original series involved with complex seasonality and nonlinearity. 
For the final model with ensemble method, we select models with small RMSE for the base model for final OGD model which consists of IMA(1,1), ETS(A, N, A), DHR (), and TBATS(1,{0,0},,{<12,5>}). Meanwhile, the best performance strongly supports our intuition that different models would alleviate the random noise in the training series leading to a more accurate forecast.
Besides, we visualize the forecast series and the ground truth series in Figure 5. For most predictions, their trends are similar to ground truth series, but these predictions tend to underestimate the actual number of accidents. It is possible that prediction errors would result from other factors such as policy, migration, and weather condition, etc.
5 Model Diagnostics
In this section, we would examine some good models mentioned in Table 4 to verify the results and choose the ultimate model for forecasting.
5.1 Residual Analysis of IMA(1,1) Model
Since we start from the ARIMA family model, we observe that IMA model is more proper than ARI or ARIMA model due to the fact that the firstorder differenced series is stationary. So, we fit IMA(1,1) model represented as following:
(29) 
where is the firstorder differenced series. To assess the fitted IMA(1,1) model visually, we draw the figure of ACF plot of residuals generated by fitted IMA(1,1) model in Figure 6. From the ACF plot, we can clearly see that there exist some sharp spikes at lag 6, 7, 12, 16, 18, and 24, which indicates that the residuals are still correlated with each other. To verify our observation, LjungBox test shows that pvalue is which is really close to 0. Therefore, we should reject the null hypothesis that the residuals are uncorrelated. We have sufficient evidence to conclude that the residuals are correlated. In consequence, the series cannot be well explained by IMA(1,1) model.
5.2 Residual Analysis of ETS(A, N, A) Model
From the fitted model, it can be pointed out that the ETS(A, N, A) model is equivalent to ARIMA. It shows that it not only considers the IMA(1, 1) process analyzed in Section 5.1 but also takes the seasonal components into account. Consequently, when we check the residuals of fitted ETS(A, N, A) model, we have enough evidence to conclude that the residuals are normally distributed due to the pvalue 0.9916. Simultaneously, the ACF plot in Figure 7 demonstrates that all the autocorrelations are within the error bounds showing uncorrelation characteristic. However, to quantitatively uncorrelation, we use LjungBox test to check whether the residuals are uncorrelated from each other. LjungBox test generates the pvalue 0.00214, showing that we should reject the null hypothesis that the data is uncorrelated under the significance level 0.05. Therefore, the model can not explain the series well.
5.3 Residual Analysis of DHR () Model
With Fourier items being added, the ARIMA model will then have deterministic, but smoothed, seasonal effects. We choose the best according to AICc criteria. So after taking the Fourier item into consideration, we can see all the values in ACF plot are within the error bounds, which indicates that the residuals are uncorrelated. Meanwhile, we use ShapiroWilk test whether the residuals are normally distributed or not. It results in the pvalue 0.9618, which points out the residuals are normally distributed. Moreover, we conduct the runs test to check the independence assumption which results in pvalue 0.193 showing the residuals are independent. However, when we verify the autocorrelation with LjungBox test, we achieve the small pvalue 0.007595 which means the residuals are still correlated indicating this model may not be a good fit.
5.4 Residual Analysis of TBATS(1,{0,0},,{<12,5>}) Model
TBATS models the time series data from exponential smoothing method which is different from ARIMA family model due to the idea of decomposition. However, the shared feature of TBATS and Dynamic Harmonic Regression is that both methods utilize the Fourier items to model the complex seasonality. To check the validity of the model, we plot the residuals and their ACF in Figure 9. Similarly, following the protocol of checking model, we get the pvalue 0.7057 from ShapiroWilk test, showing the residuals can be regarded as normally distributed. Besides, we conduct the run test for checking the independence which results in pvalue 0.0382 showing that we do not have sufficient evidence to conclude that residuals are independent. Thus, the independence assumption is not satisfied. So the model cannot well model the series.
5.5 Composition Analysis and Residual Analysis of OGD Model
Through ensemble learning method, we build OGD model using IMA, ETS, DHR, and TBATS. Before we consider the residual of the fitted model, we visualize the weights of each model in OGD model as Figure 10
represents. In the subfigure on the left upper corner, we can clearly see that the total weights distribution is like uniform distribution. However, there are slight differences between uniform distribution since if we glance at the subfigure in the left lower corner since it shows that OGD method is slightly better than uniform method. At the same time, the ensemble method enables random errors to decrease as much as possible.
Furthermore, we analysis the residual of fitted model for the sake of verifying its effectiveness. According to Figure 11
, we can observe the ACF plot that there is no spike in the plot and all the autocorrelation values are within the error bounds, which means that residuals of fitted OGD model are independent. To quantitatively verify the uncorrelation, we conduct LjungBox test to check whether the residuals are uncorrelated from each other. It shows that test statistic
with degree of freedom 20 has the pvalue
indicating that we failed to reject the null hypothesis, which proves the uncorrelation. Apart from checking the uncorrelation, we also verify the normality of the residuals. In order to show the normality, we use ShapiroWilk test. It comes that the pvalue is 0.9784 and means that the residuals are normally distributed. So we can regard these residuals as the errors generated by white noise process. Therefore, OGD models the whole time series quite well compared to other methods.6 Forecasting
In this section, we forecast the number of traffic accidents in the future using the OGD model mentioned in Section 3.5. For better understanding the accuracy of prediction, we predict the series on test series which dates from December 2018 to May 2019. In addition, we train our model on training series and validation series The testing results are summarized in Table 5.
Dec. 2018  Jan. 2019  Feb. 2019  Mar. 2019  Apr. 2019  May 2019  RMSE  MPE  MPAE  sMPAE  

Ground Truth  3699  3602  3597  3966  3766  4022  176.573  3.573  4.256  4.151 
Prediction (Mean)  3936.8  3870.3  3788.6  4020.6  3875.4  3875.4  
SE ()  232.7  234.4  253.8  261.4  268.9  276.1  
SE ()  357.3  359.9  289.7  401.5  412.9  424.1 
Forecast results on the testing series with OGD Model compared to the ground truth series. The results include mean prediction value and standard error under 80% and 95% confident level.
As Table 5
presents, the predict values tend to be slightly higher than the actually observed values about 200, this might be caused by the population and economics in the Log Angeles, which suggests our model can explain adequately the traffic accident data. On the other hand, we can see as time pass by, the range of confidence intervals tends to be larger which means more uncertainty in future prediction, and actual observation values are in the interval showing the correctness of our method.
7 Discussion
7.1 Summary
In the paper, we tried to analyze the pattern of traffic accident happened last few years in Los Angeles. We investigated the time series we are interested in terms of statistical characteristics such as trend, seasonality, correlation, and skewness, etc. We also selected a set of candidate models and proposed ensemble method. Then, we fitted the models on the training series and validated these models on validation series to select the best model. Meanwhile, residual diagnostics of different models were fully presented. The ensembled model OGD showed the best fit among candidate models used for final forecasting on the test series.
7.2 Future Work
For further modeling the traffic accident time series, we can take the characteristic of the data. In other words, the distribution of traffic accident follows a Poisson distribution or negative binomial (NB) distribution. We can utilize these assumptions to adopt the Poisson regression or negative binomial regression model to fit the data. Moreover, the combination of generalized linear model and a stationary model can be more effective in modeling the data.
References
 Model selection for ecologists: the worldviews of aic and bic. Ecology 95 (3), pp. 631–636. Cited by: §3.3.
 The combination of forecasts. Journal of the Operational Research Society 20 (4), pp. 451–468. Cited by: §3.5.
 An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological) 26 (2), pp. 211–243. Cited by: §2.2.
 Model selection and akaike’s information criterion (aic): the general theory and its analytical extensions. Psychometrika 52 (3), pp. 345–370. Cited by: §3.3.
 Annual estimates of the resident population: april 1, 2010 to july 1, 2016. Cited by: §1.
 STL: a seasonaltrend decomposition. Journal of official statistics 6 (1), pp. 3–73. Cited by: §2.3, §3.4.3.
 Local regression models. In Statistical models in S, pp. 309–376. Cited by: §3.4.3.
 Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association 106 (496), pp. 1513–1527. Cited by: §3.4.4.

Ensemble methods in machine learning
. In International workshop on multiple classifier systems, pp. 1–15. Cited by: §3.5.  Timeseries analysis supported by power transformations. Journal of Forecasting 12 (1), pp. 37–48. Cited by: §2.2.
 Forecasting seasonals and trends by exponentially weighted moving averages. International journal of forecasting 20 (1), pp. 5–10. Cited by: §3.4.2.
 Modeling persistence in hydrological time series using fractional differencing. Water resources research 20 (12), pp. 1898–1908. Cited by: §2.3.
 [13] How many cars per capita in the usa. Note: https://capitoltires.com/howmanycarspercapitaintheus.html Cited by: §1.
 Automatic time series for forecasting: the forecast package for r. Monash University, Department of Econometrics and Business Statistics …. Cited by: §3.3.
 A state space framework for automatic forecasting using exponential smoothing methods. International Journal of forecasting 18 (3), pp. 439–454. Cited by: §3.4.2.
 The jackknife and the bootstrap for general stationary observations. The annals of Statistics, pp. 1217–1241. Cited by: §3.4.3.
 Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root?. Journal of econometrics 54 (13), pp. 159–178. Cited by: 1.
 A simplex method for function minimization. The computer journal 7 (4), pp. 308–313. Cited by: §3.4.2.
 Spatial and transportation mismatch in los angeles. Journal of Planning Education and Research 25 (1), pp. 43–56. Cited by: §1.
 The summation of random causes as the source of cyclic processes. Econometrica: Journal of the Econometric Society, pp. 105–146. Cited by: §3.2.
 Moving los angeles. Cited by: §1.
 Power properties of linearity tests for time series. Studies in nonlinear dynamics & econometrics 1 (1). Cited by: §2.3.
 On a method of investigating periodicities in disturbed series, with special reference to wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society of London Series A 226, pp. 267–298. Cited by: §3.1, §3.1.
 Characteristicbased clustering for time series data. Data mining and knowledge Discovery 13 (3), pp. 335–364. Cited by: §2.3, §2.3.
 Forecasting sales by exponentially weighted moving averages. Management science 6 (3), pp. 324–342. Cited by: §3.4.2.
 [26] World bank data: motor vehicles (per 1,000 people).. Note: https://web.archive.org/web/20140209114811/http:/data.worldbank.org/indicator/IS.VEH.NVEH.P3 Cited by: §1.
 Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML03), pp. 928–936. Cited by: §3.5.
Comments
There are no comments yet.