BayesMAR
Bayesian Median Autoregressive model for time series forecasting
We develop a Bayesian median autoregressive (BayesMAR) model for time series forecasting. The proposed method utilizes time-varying quantile regression at the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. Motivated by a working Laplace likelihood approach in Bayesian quantile regression, BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models but with the Gaussian error replaced by a Laplace error, leading to a simple, robust, and interpretable modeling strategy for time series forecasting. We estimate model parameters by Markov chain Monte Carlo. Bayesian model averaging (BMA) is used to account for model uncertainty, including the uncertainty in the autoregressive order, in addition to a Bayesian model selection approach. The proposed methods are illustrated using simulations and real data applications. An application to U.S. macroeconomic data forecasting shows that BayesMAR leads to favorable and often superior predictive performance compared with the selected mean-based alternatives under various loss functions. The proposed methods are generic and can be used to complement a rich class of methods that build on AR models.
Time series forecasting is a longstanding problem in econometrics and statistics, where the overwhelming focus has been on mean-based models (Prado and West, 2010; Hyndman and Athanasopoulos, 2018). Although the conditional mean is the optimal forecast under the squared error loss, complex characteristics present in real data that violate model assumptions may hamper predictive performance. Flexible nonparametric methods (Ferraty and Vieu, 2006; Fan and Yao, 2008) built on minimal assumptions are an appealing remedy; however, they often sacrifice interpretability and involve large numbers of parameters, leading to daunting computation and communication issues. This article is motivated by an attempt to propose a simple, interpretable, and principled strategy that can improve upon mean-based models in real-world time series forecasting.
There is a rich literature on robust time series forecasting, partially motivated by the fact that time series are often measured subject to outliers. Examples include categorizing outliers (Fox, 1972; Akouemo and Povinelli, 2014), adjusting autoregressive (AR) models to offset the effects of outliers (Chen and Liu, 1993a, b), robustifying exponential smoothing and Holt-Winters seasonal methods via M-estimation (Croux et al., 2008), and weighted forecasts (Jose and Winkler, 2008), just to name a few. As a nonlinear alternative, the median is more robust than the mean. While the application of median-based methods in time series dates back at least to 1974, when John Tukey introduced the running median method (Tukey, 1974), there is surprisingly little work that comprehensively investigates the modeling, fitting, and uncertainty quantification of a median-based model in the context of time series forecasting, and in particular, how it compares with state-of-the-art mean-based methods using real data and under various loss functions.

In the quantile regression literature, which encompasses the median as a special case (Koenker and Bassett Jr, 1978), Koenker and Xiao (2006) propose quantile autoregression (QAR) models that depict the conditional distributions of the response more comprehensively at various quantile levels. Engle and Manganelli (2004) proposed the conditional autoregressive value at risk (CAViaR) model for risk management at extreme quantile levels, and have been followed by many others (Giacomini and Komunjer, 2005; Chen and So, 2006; Geweke and Keane, 2007; Gerlach et al., 2011). However, these methods are developed either for general quantile levels or for extreme quantile levels and have not been applied to time series forecasting. Moreover, QAR and CAViaR are semiparametric approaches, resorting to minimizing the check loss function for estimation and necessitating nontrivial modifications to likelihood-based order selection criteria when the order is unknown.
In this paper, we propose a simple strategy that extends the traditional AR model to a median AR model (MAR) for time series forecasting. The AR model is arguably one of the most popular methods in time series, serving as the building block for other models such as generalized autoregressive conditional heteroskedasticity models (GARCH; Bollerslev, 1986) and time-varying vector autoregressive models (TV-VAR; Primiceri, 2005). The proposed method utilizes time-varying quantile regression but focuses on the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. It relies on parametric assumptions, which aid interpretation and enable convenient uncertainty quantification and propagation through a principled Bayesian framework. Numerical experiments using U.S. macroeconomic data show that this simple MAR approach leads to favorable and often superior predictive performance compared with selected state-of-the-art mean-based methods that are much more complicated in nature; remarkably, the comparison holds not only under the absolute but also under the squared error loss. The proposed methods are generic and can complement any methods that build on AR models by altering the Gaussian error assumption therein.

The MAR model has a close connection with the working asymmetric Laplace likelihood approach in Bayesian quantile regression. The working likelihood formalized by Yu and Moyeed (2001) has recently gained increasing attention (Gerlach et al., 2011; Sriram et al., 2016; Youngman, 2018; Liu et al., 2019). It provides a principled and convenient framework to quantify uncertainty in parameter estimation, eliminating the challenging task of estimating the unknown conditional density functions required by conventional quantile regression for inference (Yang et al., 2016).
Although the asymmetric Laplace likelihood is generally not the true data-generating likelihood, a pragmatic view supporting its use is that the maximum a posteriori estimate resembles the usual quantile regression estimates that optimize the check loss function, and, theoretically, the posterior distribution of the parameters concentrates around the minimizer of the Kullback-Leibler divergence with respect to the true data-generating model (Kleijn and van der Vaart, 2006). Unlike quantile regression, where one may vary the quantile levels, the median is the primary quantile level of interest in time series forecasting. Therefore, the MAR model uses the Laplace distribution as the likelihood, alleviating the concern that the working likelihood is not a valid likelihood when considering multiple quantile levels. This fully parametric model enables routine posterior sampling. We estimate model parameters by Markov chain Monte Carlo, and propose a Bayesian model averaging (BMA) approach (Hoeting et al., 1999) to propagate the model uncertainty in the unknown autoregressive order, in addition to a Bayesian model selection strategy.

The rest of the paper is organized as follows. Section 2 introduces the MAR model, estimation, and forecasting procedure. In Section 3 we conduct simulations to compare the proposed approach with other competitive methods in terms of parameter estimation. Section 4 consists of a variety of applications using real-world economic data. Section 5 concludes the paper.
We propose a median autoregressive (MAR) model for time series forecasting as an alternative to the widely used mean-based models. Suppose we observe a time series $y = (y_1, \ldots, y_T)^T$. A MAR model with order $p$, denoted as MAR($p$), first assumes a time-varying quantile regression structure

(1)  $y_t = \theta_0 + \theta_1 y_{t-1} + \cdots + \theta_p y_{t-p} + \varepsilon_t,$

where $t = p+1, \ldots, T$, $\theta = (\theta_0, \theta_1, \ldots, \theta_p)^T$ is a vector of unknown coefficients, and $\varepsilon_t$ is a random error with median 0. Model (1) is semiparametric, as the error distribution is left unspecified other than the constraint of possessing a zero median. The classic AR model with a given order $p$, denoted as AR($p$), assumes a Gaussian error distribution with mean 0 and standard deviation $\sigma$, i.e., $\varepsilon_t \sim N(0, \sigma^2)$. The MAR model further assumes $\varepsilon_t \sim \mathrm{Laplace}(0, \tau)$, whose probability density function is

(2)  $f(\varepsilon; \tau) = \frac{1}{2\tau} \exp\left(-\frac{|\varepsilon|}{\tau}\right),$

where $\tau > 0$ is a scale parameter. The Laplace error assumption combined with the semiparametric structure in Equation (1) yields the following likelihood function for the MAR model:

(3)  $L(\theta, \tau; y) = \prod_{t=p+1}^{T} \frac{1}{2\tau} \exp\left(-\frac{1}{\tau}\Big|y_t - \theta_0 - \sum_{i=1}^{p} \theta_i y_{t-i}\Big|\right),$
which is parametric. The parametric assumption in (2) aids interpretation and enables convenient uncertainty quantification and propagation through a principled Bayesian framework. In addition, the autoregressive structure in the MAR model, resembling the widely used AR model, provides enormous flexibility and potential to complement the rich literature that builds on the AR model.
The use of Laplace distributions is common in the Bayesian quantile regression literature, where the $q$th quantile of the error in Equation (1) is assumed to be 0. For general $q \in (0, 1)$, a working likelihood method adopts the asymmetric Laplace distribution as the error distribution, which has the probability density function

(4)  $f(\varepsilon; \mu, \tau, q) = \frac{q(1-q)}{\tau} \exp\left(-\frac{\rho_q(\varepsilon - \mu)}{\tau}\right), \quad \rho_q(u) = u\{q - I(u < 0)\},$

where $\mu$ is a location parameter and $I(\cdot)$ is the indicator function. The asymmetric Laplace distribution reduces to a Laplace distribution at the median by setting $\mu = 0$ and $q = 1/2$. In view of this intimate connection with Bayesian quantile regression as well as the subsequent Bayesian estimation and prediction, we refer to the MAR model with Laplace errors given by (1) and (2) as Bayesian MAR, or BayesMAR, and use the two terms interchangeably.
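The reduction from the asymmetric Laplace density to a Laplace density at the median can be checked numerically. Below is a minimal Python sketch (the paper's own code is in R; these function names are ours), evaluating both densities with the check function $\rho_q(u) = u\{q - I(u<0)\}$. At $q = 1/2$ the check function becomes $|u|/2$, so the asymmetric Laplace with scale $\tau$ matches a symmetric Laplace with scale $2\tau$.

```python
import math

def laplace_pdf(x, mu=0.0, tau=1.0):
    """Laplace density with location mu and scale tau, as in Equation (2)."""
    return math.exp(-abs(x - mu) / tau) / (2.0 * tau)

def asym_laplace_pdf(x, mu=0.0, tau=1.0, q=0.5):
    """Asymmetric Laplace density with check function rho_q(u) = u*(q - 1{u<0})."""
    u = x - mu
    rho = u * (q - (1.0 if u < 0 else 0.0))
    return q * (1.0 - q) * math.exp(-rho / tau) / tau

# At q = 1/2 the density is symmetric: rho_{1/2}(u) = |u|/2, so it equals
# a Laplace density whose scale is 2*tau.
x = 1.3
assert abs(asym_laplace_pdf(x, q=0.5, tau=1.0) - laplace_pdf(x, tau=2.0)) < 1e-12
```

The scale reparameterization is why setting $q = 1/2$ recovers a Laplace distribution up to a relabeling of $\tau$.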
We use an uninformative prior for $\theta$ and the Jeffreys prior for $\tau$, namely,

(5)  $\pi(\theta, \tau) \propto \frac{1}{\tau}.$

The posterior distribution of $(\theta, \tau)$ is

(6)  $\pi(\theta, \tau \mid y) \propto \tau^{-(T-p)-1} \exp\left(-\frac{1}{\tau} \sum_{t=p+1}^{T} \Big|y_t - \theta_0 - \sum_{i=1}^{p} \theta_i y_{t-i}\Big|\right),$

which is proper (Choi and Hobert, 2013). The regression coefficients are instrumental for time series forecasting, and we derive their marginal posterior distribution by integrating out $\tau$ for efficient sampling:

(7)  $\pi(\theta \mid y) \propto \left(\sum_{t=p+1}^{T} \Big|y_t - \theta_0 - \sum_{i=1}^{p} \theta_i y_{t-i}\Big|\right)^{-(T-p)}.$
The posterior sampling of $\theta$ proceeds by Markov chain Monte Carlo (MCMC) via the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970):

Step 1. At iteration $s$, draw a candidate sample $\theta^*_i = \theta^{(s-1)}_i + c\, u_i$ independently for $i = 0, \ldots, p$, where $\theta^{(s-1)}$ is the value of $\theta$ at iteration $s-1$, $u_i$ follows a Uniform$(-1, 1)$ distribution, and the scalar $c$ controls the step size of each move;

Step 2. Accept $\theta^*$ as $\theta^{(s)}$ with probability $\min\{1, \pi(\theta^* \mid y) / \pi(\theta^{(s-1)} \mid y)\}$; otherwise, set $\theta^{(s)} = \theta^{(s-1)}$.
We tune the parameter $c$ in Step 1 such that the final acceptance rate falls within the range recommended by Gelman et al. (1996). In addition, we have also implemented Gaussian and heavy-tailed Student-t proposals, which are recommended by Gerlach et al. (2011) when studying extreme quantile levels under the CAViaR model. We do not observe empirical advantages of such proposals over a uniform proposal under the MAR model, suggesting that one may have more flexible choices of proposals for median regression. For all experiments, we use 40,000 MCMC samples with 25,000 burn-in iterations, initialize $\theta$ randomly within the unit interval, and use the posterior mean as the Bayes estimate of $\theta$. Trace plots indicate that the MCMC samples converge quickly, mostly within thousands of iterations.
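The sampler above can be sketched in a few lines. The following Python sketch (illustrative only, not the authors' R implementation; `mh_sampler` and its arguments are our own names) targets the marginal posterior $\pi(\theta \mid y) \propto S(\theta)^{-(T-p)}$ from Equation (7), where $S(\theta)$ is the sum of absolute residuals, using a uniform random-walk proposal.

```python
import random, math

def neg_log_marginal_post(theta, y, p=1):
    """-log pi(theta | y) up to a constant, after integrating out tau:
    pi(theta | y) is proportional to S(theta)^{-(T-p)}."""
    T = len(y)
    S = 0.0
    for t in range(p, T):
        pred = theta[0] + sum(theta[i] * y[t - i] for i in range(1, p + 1))
        S += abs(y[t] - pred)
    return (T - p) * math.log(S)

def mh_sampler(y, p=1, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis-Hastings with a Uniform(-step, step) proposal."""
    rng = random.Random(seed)
    theta = [rng.random() for _ in range(p + 1)]  # random start in the unit interval
    cur = neg_log_marginal_post(theta, y, p)
    accepted = 0
    draws = []
    for _ in range(n_iter):
        prop = [th + step * rng.uniform(-1, 1) for th in theta]
        new = neg_log_marginal_post(prop, y, p)
        # Accept with probability min{1, pi(prop|y) / pi(cur|y)}
        if math.log(rng.random()) < cur - new:
            theta, cur = prop, new
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_iter
```

In practice one would tune `step` until the acceptance rate lands in the desired range, discard a burn-in portion of `draws`, and average the rest.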
The order $p$ in MAR($p$) is typically unknown. We address the unknown order in the Bayesian framework by putting a prior on $p$. In practice, we usually can specify a maximum order, as an overly large $p$ hampers interpretability. We endow the order with a uniform prior on $\{1, \ldots, P\}$, with $P$ being the specified maximum order. The posterior distribution of $p$ in the prior support is then

(8)  $\pi(p \mid y) = \frac{\pi(p)\, m(y \mid p)}{\sum_{k=1}^{P} \pi(k)\, m(y \mid k)},$

where the marginal likelihood under order $p$ is

(9)  $m(y \mid p) = \int L(\theta, \tau; y)\, \pi(\theta, \tau)\, d\theta\, d\tau.$
The order can be selected using the maximum a posteriori (MAP) estimate

(10)  $\hat{p} = \arg\max_{p \in \{1, \ldots, P\}} \pi(p \mid y).$
For time series forecasting, a more appealing perspective is to use Bayesian model averaging (BMA) to propagate uncertainties in the model space, i.e.,

(11)  $f(y_{T+h} \mid y) = \sum_{p=1}^{P} f(y_{T+h} \mid y, p)\, \pi(p \mid y),$

where $f(y_{T+h} \mid y, p)$ is the prediction under order $p$. A 1-step ahead predictive density conditional on $p$ is $f(y_{T+1} \mid y, p) = \int f(y_{T+1} \mid y, \theta, \tau)\, \pi(\theta, \tau \mid y)\, d\theta\, d\tau$. In general, for an $h$-step ahead predictive density $f(y_{T+h} \mid y, p)$, we have

(12)  $f(y_{T+h} \mid y, p) = \int f(y_{T+h} \mid y, y_{T+1:T+h-1}, \theta, \tau)\, f(y_{T+1:T+h-1} \mid y, \theta, \tau)\, \pi(\theta, \tau \mid y)\, dy_{T+1:T+h-1}\, d\theta\, d\tau,$

where $y_{T+1:T+h-1} = (y_{T+1}, \ldots, y_{T+h-1})$ denotes the intermediate future observations, which are integrated out.
The main challenge in implementing MAP and BMA lies in the evaluation of $m(y \mid p)$, as the integrand may quickly decay to zero, leading to numerical errors in the numerical integration. We resort to an approximation of $m(y \mid p)$ using the Bayesian information criterion, or BIC (Kass and Raftery, 1995; Neath and Cavanaugh, 2012). In particular, letting $(\hat{\theta}, \hat{\tau})$ be the MAP estimates of $(\theta, \tau)$, the BIC of MAR($p$) is
(13)  $\mathrm{BIC}_p = -2 \log L(\hat{\theta}, \hat{\tau}; y) + (p + 2) \log n,$

where $n$ is the sample size. We approximate $m(y \mid p)$ by $\exp(-\mathrm{BIC}_p / 2)$ up to a multiplicative constant, leading to the aggregated prediction

(14)  $f(y_{T+h} \mid y) \approx \sum_{p=1}^{P} f(y_{T+h} \mid y, p)\, \frac{\exp(-\mathrm{BIC}_p / 2)}{\sum_{k=1}^{P} \exp(-\mathrm{BIC}_k / 2)}.$
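The BIC-based model weights used in the aggregated prediction are straightforward to compute. A minimal Python sketch (our own helper, not from the paper's code); subtracting the smallest BIC before exponentiating avoids underflow for large BIC values:

```python
import math

def bma_weights(bics):
    """Convert per-order BIC values into approximate posterior model weights,
    w_p proportional to exp(-BIC_p / 2), normalized to sum to one."""
    b0 = min(bics)  # shift for numerical stability; cancels in the ratio
    raw = [math.exp(-(b - b0) / 2.0) for b in bics]
    s = sum(raw)
    return [r / s for r in raw]

# Example with made-up BIC values: the order with the smallest BIC dominates.
w = bma_weights([210.4, 198.7, 205.1, 211.9])
assert abs(sum(w) - 1.0) < 1e-9
assert max(w) == w[1]
```

Because the weights decay exponentially in the BIC gap, the average often concentrates almost entirely on the single best order, which matches the behavior reported for BayesMAR-BMA versus BayesMAR-MAP later in the paper.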
It turns out that we can calculate the MAP estimates efficiently. To see this, first rewrite the posterior in Equation (6) as

(15)  $\pi(\theta, \tau \mid y) \propto \tau^{-(T-p)-1} \exp\left(-\frac{S(\theta)}{\tau}\right), \quad S(\theta) = \sum_{t=p+1}^{T} \Big|y_t - \theta_0 - \sum_{i=1}^{p} \theta_i y_{t-i}\Big|.$
For any $\theta$, the posterior distribution in Equation (6) attains its maximum over $\tau$ at

(16)  $\hat{\tau}(\theta) = \frac{S(\theta)}{T - p + 1},$

provided $S(\theta) > 0$. Profiling out $\tau$ shows that the MAP estimate $\hat{\theta}$ minimizes $S(\theta)$; this corresponds to the estimator minimizing the absolute error in median regression, which can be efficiently solved by linear programming (Koenker, 2005). An analytical solution for $\hat{\tau}$ is then available through a Gamma kernel:

(17)  $\pi(\tau \mid \hat{\theta}, y) \propto \tau^{-(T-p)-1} \exp\left(-\frac{S(\hat{\theta})}{\tau}\right),$

(18)  $\hat{\tau} = \frac{S(\hat{\theta})}{T - p + 1}.$
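The fact that minimizing the sum of absolute errors recovers the median can be verified directly in the intercept-only case. The grid search below is a Python illustration of this, purely for intuition (our own code; real implementations use linear programming as noted above):

```python
import statistics

def l1_loss(c, ys):
    """Sum of absolute errors: the check loss at the median level."""
    return sum(abs(y - c) for y in ys)

ys = [2.1, -0.4, 3.7, 0.9, 1.5, 8.2, 1.1]
med = statistics.median(ys)

# Minimize the L1 loss over a fine grid of candidate constants; the minimizer
# coincides with the sample median.
grid = [i / 100 for i in range(-100, 1000)]
best = min(grid, key=lambda c: l1_loss(c, ys))
assert abs(best - med) < 0.02
```

This is the intercept-only special case of the MAP computation: for a MAR($p$) model the same objective is minimized jointly over all coefficients.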
Before substituting $\hat{\theta}$ and $\hat{\tau}$ into Equation (13), we notice that both the likelihood function and the sample size depend on the order chosen. To reconcile the varying sample sizes at different orders, we use the last $T - P$ observations to evaluate the likelihood function for any $p \leq P$. Consequently, $\mathrm{BIC}_p$ is calculated as

(19)  $\mathrm{BIC}_p = 2(T - P) \log(2\hat{\tau}) + \frac{2}{\hat{\tau}} \sum_{t=P+1}^{T} \Big|y_t - \hat{\theta}_0 - \sum_{i=1}^{p} \hat{\theta}_i y_{t-i}\Big| + (p + 2) \log(T - P).$
In this section, we conduct simulations to compare the performance of BayesMAR with mean-based methods, focusing on parameter estimation under various model assumptions. To this end, we choose the AR model and the generalized autoregressive conditional heteroskedasticity (GARCH) model, and defer predictive comparisons and more recent mean-based methods to the real data applications in Section 4.

We generate data according to the model

(24)  $y_t = \theta_0 + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \varepsilon_t,$
where the true coefficients are fixed. We consider two scenarios depending on the distribution of $\varepsilon_t$: Gaussian error, where $\varepsilon_t \sim N(0, \sigma^2)$, and Laplace error, where $\varepsilon_t \sim \mathrm{Laplace}(0, \tau)$, which correspond to the model assumptions of the AR and BayesMAR models, respectively.
For each error assumption, we generate 200 observations and replicate the simulation 100 times. We use maximum likelihood estimation (Gardner et al., 1980) for AR via the 'arima' function in the R package stats. We use AR($p$)-GARCH(1,1) when implementing GARCH, i.e.,

(25)  $y_t = \theta_0 + \sum_{i=1}^{p} \theta_i y_{t-i} + \varepsilon_t,$

(26)  $\varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1),$

(27)  $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2.$
We fit the model using the R package fGarch, where all parameters are estimated by quasi-maximum likelihood (Bollerslev and Wooldridge, 1992). In addition, we implement the quantile autoregression (QAR) method proposed by Koenker and Xiao (2006) using the R package quantreg, to compare its finite sample performance with BayesMAR.
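For readers wanting to reproduce the flavor of this setup, the following Python sketch simulates an order-2 autoregression with either Gaussian or Laplace errors; a Laplace draw is generated as the difference of two exponential draws. The coefficient values used below are illustrative placeholders, not the paper's true parameters, and `simulate_ar2` is our own name.

```python
import random

def simulate_ar2(theta, n=200, error="gaussian", scale=1.0, seed=0):
    """Simulate y_t = theta[0] + theta[1]*y_{t-1} + theta[2]*y_{t-2} + eps_t
    with Gaussian or Laplace innovations of the given scale."""
    rng = random.Random(seed)

    def draw_eps():
        if error == "gaussian":
            return rng.gauss(0.0, scale)
        # Laplace(0, scale): difference of two independent Exp(1/scale) draws
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    y = [0.0, 0.0]  # two pre-sample values to start the recursion
    for _ in range(n):
        y.append(theta[0] + theta[1] * y[-1] + theta[2] * y[-2] + draw_eps())
    return y[2:]

y = simulate_ar2((0.3, 0.5, -0.2), n=200, error="laplace", seed=42)
assert len(y) == 200
```

Repeating such a simulation 100 times per error scenario and fitting each competing model to every replicate mirrors the design described above.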
We assess the estimates of $\theta$ by each method based on the mean squared error (MSE). For a generic parameter $\theta_j$, letting $\hat{\theta}_j^{(r)}$ be its estimate in the $r$th simulation and $R = 100$ the number of replications, the MSE and its standard error are

$\mathrm{MSE} = \frac{1}{R} \sum_{r=1}^{R} \big(\hat{\theta}_j^{(r)} - \theta_j\big)^2, \qquad \mathrm{SE} = \sqrt{\frac{1}{R}\, \widehat{\mathrm{Var}}\Big(\big(\hat{\theta}_j^{(r)} - \theta_j\big)^2\Big)}.$
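The MSE across replications and its Monte Carlo standard error can be computed as follows (Python sketch; the function name is ours):

```python
import math

def mse_and_se(estimates, truth):
    """MSE of replicated estimates and its Monte Carlo standard error:
    the mean and sd/sqrt(R) of the squared errors (theta_hat_r - theta)^2."""
    sq = [(e - truth) ** 2 for e in estimates]
    R = len(sq)
    mse = sum(sq) / R
    var = sum((s - mse) ** 2 for s in sq) / (R - 1)  # sample variance of squared errors
    return mse, math.sqrt(var / R)

mse, se = mse_and_se([0.48, 0.55, 0.52, 0.43, 0.50], truth=0.5)
assert mse >= 0.0 and se >= 0.0
```

The standard error reported alongside each MSE in Table 1 is of exactly this kind: it measures the simulation noise in the MSE estimate itself.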
We first provide the true order to all models and compare their performance. Table 1 reports the MSE of all methods. We can see that all methods benefit from a correctly specified error distribution, supported by the observation that AR and GARCH have the smallest MSEs under Gaussian error, while BayesMAR and QAR have smaller MSEs when data are generated from Laplace distributions. However, BayesMAR appears to suffer less from model misspecification; for example, BayesMAR increases the MSE by 0.13, within standard errors of AR, under Gaussian error, while AR doubles the MSE, beyond standard errors of BayesMAR, under Laplace error. It is reassuring that BayesMAR gives either the same or better MSEs than QAR in all cases, although all differences are within one standard error. This finite sample performance is consistent with the findings in Gerlach et al. (2011) when comparing sampling-based Bayesian approaches with optimization-based counterparts at extreme quantile levels.


Table 1: MSEs (with standard errors) of parameter estimates under Gaussian and Laplace errors.

                        Gaussian                  Laplace
Models            θ0     θ1     θ2          θ0     θ1     θ2
BayesMAR  MSE   1.07   0.56   0.63        0.73   0.27   0.19
          SE    0.13   0.07   0.07        0.11   0.04   0.03
QAR       MSE   1.21   0.62   0.75        0.75   0.27   0.19
          SE    0.15   0.08   0.08        0.11   0.05   0.02
AR        MSE   0.74   0.42   0.50        1.06   0.51   0.44
          SE    0.09   0.05   0.08        0.16   0.06   0.06
GARCH     MSE   0.76   0.41   0.51        1.08   0.50   0.40
          SE    0.09   0.05   0.08        0.16   0.06   0.05

We next investigate the selection of the unknown order in BayesMAR using the BIC approach described in Section 2.3, providing a large upper bound $P$ for the order. Figure 1 plots the distribution of the selected order in both scenarios and suggests that the selected orders almost always concentrate around the oracle value, even when the model is misspecified under Gaussian error. The overall accuracy of selecting the oracle order using MAP is high under both normal and Laplace errors.
In this section, we compare the predictive performance of the proposed methods relative to mean-based methods using various real-world data. We use three economic series from Federal Reserve Economic Data (FRED): the quarterly Producer Price Index for all commodities (FRED: U.S. Bureau of Labor Statistics, 2018), the 3-Month Treasury Bill secondary market rate (FRED: Board of Governors of the Federal Reserve System (US), 2018), and the Unemployment Rate (FRED: Organization for Economic Cooperation and Development, 2018), coded as PPI, TBR, and UR, respectively. Each time series ranges from 1968Q3 to 2018Q2, containing 200 observations. The raw data are plotted in Figure 2, and the lagged data of order one are plotted in Figure 3. These three data sets have distinct patterns: the lagged PPI tends to be more stable before the 2008 crisis, in contrast with the substantial fluctuation since then; the lagged TBR has more dramatic changes in earlier periods than in the latest periods; and the lagged UR tends to present a periodic pattern while containing several extreme values. The complex characteristics present in these real-world data enable comparison between model-based methods when there is no guarantee that any model assumption holds.
In addition to AR and GARCH, we implement selected methods from the dynamic linear model (DLM) literature. In particular, we consider the time-varying vector AR (TV-VAR) model proposed by Nakajima and West (2013), which links time-varying parameters to latent threshold processes and achieves state-of-the-art predictive performance in selected applications. For the length-3 vector $y_t$ stacking the three time series, it expands a DLM by utilizing a latent threshold process:

(28)  $y_t = c_t + B_{1t} y_{t-1} + \cdots + B_{qt} y_{t-q} + e_t, \quad e_t \sim N(0, \Sigma_t),$

(29)  $b_t = \beta_t \odot s_t, \quad s_{jt} = I(|\beta_{jt}| \geq d_j),$

(30)  $\beta_t = \mu + \Phi(\beta_{t-1} - \mu) + \eta_t,$

where $c_t$ is the vector of time-varying intercepts, $B_{it}$ is the matrix of time-varying coefficients at lag $i$, $b_t$ stacks $c_t$ and the $B_{it}$ by rows and by order from 1 to $q$, $d = (d_j)$ is the latent threshold vector, and $\beta_t$ is a latent time-varying parameter vector whose dynamic updates are specified via a standard VAR model. The latent threshold shrinks the time-varying coefficients to zero, leading to dynamic sparsity and improved prediction. We implement two versions of TV-VAR: TV-VAR without latent threshold (NT) and TV-VAR with latent threshold (LT). Both methods are applicable to multivariate time series, and we adapt the OxMetrics code provided by Nakajima and West (2013) to the vector observations consisting of the three time series.
For BayesMAR, we use BayesMAR-BMA to denote the Bayesian model averaging strategy and BayesMAR-MAP when the MAP estimate of the order is used. The orders of AR and BayesMAR-MAP are selected by BIC. GARCH(1,1) uses the same autoregressive order as chosen for AR; although orders other than (1,1) can be used, we observe that GARCH(1,1) consistently leads to the best predictive performance of GARCH in our real data applications. Since there are no immediately available model selection methods for NT or LT, we run the order from 1 to 5 and then choose the optimal results, favoring NT and LT under each criterion.
We use recursive out-of-sample forecasting to assess each method after 2008Q3. In particular, we fit each model with data up to quarter $t$ and conduct an $h$-step ahead prediction for $h = 1, \ldots, 4$. We then move one period ahead and repeat the same procedure until we reach the end of the sample. All methods are applied to the lagged data of order one to remove local trends and to forecast the changes that lead to predictions of $y_t$. We calculate the root MSE (RMSE) and mean absolute error (MAE) over the evaluation period to compare the performance of each method, i.e.,

(31)  $\mathrm{RMSE}_h = \sqrt{\frac{1}{N} \sum_{t} \big(\hat{y}_{t+h} - y_{t+h}\big)^2}, \qquad \mathrm{MAE}_h = \frac{1}{N} \sum_{t} \big|\hat{y}_{t+h} - y_{t+h}\big|,$

where $N$ is the number of forecasts.
We additionally calculate the relative change using BayesMAR-BMA as the reference to ease comparison, that is,

(32)  $\text{Relative change} = \frac{\mathrm{RMSE}_{\text{method}} - \mathrm{RMSE}_{\text{BMA}}}{\mathrm{RMSE}_{\text{BMA}}} \times 100\%,$

and analogously for the MAE.
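The evaluation metrics in (31) and (32) are simple to implement; a Python sketch follows (function names ours):

```python
import math

def rmse(preds, actual):
    """Root mean squared error over paired forecasts and realizations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(preds))

def mae(preds, actual):
    """Mean absolute error over paired forecasts and realizations."""
    return sum(abs(p - a) for p, a in zip(preds, actual)) / len(preds)

def relative_change(metric, metric_ref):
    """Percent change of a method's error metric relative to the reference
    method (BayesMAR-BMA in the paper): 100 * (metric - ref) / ref."""
    return 100.0 * (metric - metric_ref) / metric_ref

actual = [1.0, 2.0, 1.5, 0.5]
preds = [1.2, 1.7, 1.6, 0.4]
# By Jensen's inequality, RMSE is never smaller than MAE on the same errors.
assert rmse(preds, actual) >= mae(preds, actual)
```

In the recursive scheme described above, these metrics would be accumulated over the rolling forecast origins for each horizon $h$ separately.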
Table 2 reports the RMSEs of all methods for the three data sets. The results indicate that BayesMAR outperforms the alternatives in all scenarios. Compared to BayesMAR-BMA, the methods AR, GARCH, NT, and LT increase the RMSE by 10% to 63.6% for TBR, and by 7.1% to 54.8% for UR. Although the gap in the prediction for PPI is smaller (between 0.9% and 5.4%), all competing methods have a larger RMSE than BayesMAR-BMA. GARCH accounts for comprehensive variance structures, and the two time-varying modeling methods, NT and LT, dynamically update regression coefficients; these are arguably powerful methods with considerable complexity, so it is remarkable that the proposed simple BayesMAR leads to favorable and often superior performance using real-world data.


Table 2: RMSEs and relative changes (%) of all methods for 1- to 4-step ahead forecasts.

Producer Price Index
                      RMSE                  Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   3.18  5.94  7.74  9.24
BayesMAR-MAP   3.21  5.94  7.69  9.13      0.8   0.1   0.7   1.1
AR             3.36  6.24  8.06  9.42      5.4   5.1   4.0   1.9
GARCH          3.21  6.05  7.85  9.31      0.9   1.9   1.4   0.8
NT             3.22  6.06  7.87  9.36      1.1   2.0   1.6   1.4
LT             3.27  6.05  8.00  9.69      2.6   1.9   3.2   4.9

Treasury Bill Rate
                      RMSE                  Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   0.91  1.49  2.16  2.94
BayesMAR-MAP   0.90  1.48  2.16  2.93      0.9   0.1   0.1   0.1
AR             1.32  1.77  2.62  3.69     45.7  19.5  21.3  25.7
GARCH          1.40  1.87  2.61  3.44     54.5  26.2  20.6  17.3
NT             1.48  2.30  2.94  3.56     63.6  54.9  36.0  21.2
LT             1.00  2.01  2.91  3.90     10.0  35.1  34.8  32.9

Unemployment Rate
                      RMSE                  Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   3.35  4.85  5.70  6.55
BayesMAR-MAP   3.41  4.93  5.79  6.65      1.9   1.6   1.6   1.5
AR             4.13  6.26  8.16  9.50     23.4  29.0  43.2  44.9
GARCH          4.53  6.26  7.05  7.02     35.3  29.0  23.8   7.1
NT             4.41  6.86  8.76  9.32     32.0  41.4  53.8  42.2
LT             5.18  6.22  8.56  9.34     54.8  28.3  50.2  42.5

Table 3 reports the MAEs of all methods, suggesting observations similar to those from Table 2. The proposed BayesMAR methods give the smallest MAEs in nearly all scenarios, with one exception for the UR when compared to GARCH at 4-step ahead prediction.


Table 3: MAEs and relative changes (%) of all methods for 1- to 4-step ahead forecasts.

Producer Price Index
                      MAE                   Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   2.50  4.44  5.90  7.26
BayesMAR-MAP   2.50  4.44  5.84  7.18      0.2   0.1   1.0   1.1
AR             2.69  4.52  6.04  7.36      7.5   1.7   2.3   1.4
GARCH          2.54  4.59  5.98  7.29      1.8   3.5   1.4   0.4
NT             2.54  4.58  6.04  7.36      1.8   3.3   2.4   1.4
LT             2.60  4.53  6.34  7.66      4.2   2.1   7.5   5.6

Treasury Bill Rate
                      MAE                   Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   0.56  1.05  1.56  2.10
BayesMAR-MAP   0.56  1.05  1.55  2.09      0.6   0.2   0.0   0.0
AR             0.82  1.23  1.85  2.55     45.9  16.7  18.3  21.2
GARCH          0.75  1.13  1.69  2.28     34.5   7.4   8.4   8.8
NT             0.75  1.28  1.75  2.30     34.5  21.5  12.2   9.4
LT             0.63  1.19  1.75  2.40     12.2  12.9  12.3  14.4

Unemployment Rate
                      MAE                   Relative change (%)
Models (h =)     1     2     3     4        1     2     3     4
BayesMAR-BMA   2.15  3.28  3.91  5.12
BayesMAR-MAP   2.23  3.38  4.03  5.29      3.7   3.2   3.2   3.3
AR             2.52  4.02  5.43  6.58     17.5  22.6  39.0  28.5
GARCH          2.71  3.95  4.60  4.79     26.0  20.5  17.8   6.4
NT             3.04  4.38  5.33  5.92     41.4  33.4  36.3  15.6
LT             4.11  4.65  6.59  7.70     91.6  41.9  68.6  50.5
The RMSEs of BayesMAR-BMA and BayesMAR-MAP are close to each other, with at most 1.1% (PPI), 0.9% (TBR), and 1.9% (UR) relative differences, and similar observations hold for the MAEs. This is caused by highly concentrated weights over model orders when using BIC for order selection. In particular, we find that BayesMAR-BMA puts the dominant weight on the order selected by BayesMAR-MAP, leading to minimal differences between the two variants of BayesMAR. The provided R package allows both options.
Figure 3 compares the absolute 1-step predictive errors over the evaluation period given by the three static models, BayesMAR-BMA, AR, and GARCH, which provides insight into the performance of BayesMAR. We choose AR and GARCH here as they build on model structures similar to BayesMAR. We can see that AR yields a large prediction error at the beginning (PPI), which is substantially reduced by GARCH, which incorporates heterogeneous variance structures. It is reassuring that the proposed BayesMAR method, which uses a simple constant variance structure, achieves the same predictive gain (for PPI) or even further improvements (for TBR and UR). At later time points, when the time series stabilizes without unusually large deviations, BayesMAR tends to give performance similar to AR and GARCH. Since BayesMAR is a parametric model bearing the same model structure as AR, these comparisons suggest that further performance gains may be possible by following the rich literature that extends AR to more advanced models such as GARCH and dynamic models, simply altering the error assumption from Gaussian to Laplace.
We plot the 95% confidence/credible intervals of AR and BayesMAR-BMA in Figure 4. For the first two data sets, BayesMAR has narrower intervals covering the true observations, especially for TBR, which has hardly changed in recent periods. Although BayesMAR yields wider intervals for UR, AR has multiple observations either close to the boundary or outside of the confidence intervals (around 2009Q4), suggesting overconfidence.
This article proposes a Bayesian median autoregressive (BayesMAR) model for robust time series forecasting. The proposed method has close connections with time-varying quantile regression. BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models but with the Gaussian error replaced by a Laplace error, leading to a simple, robust, and interpretable modeling strategy for time series forecasting with principled uncertainty quantification through Bayesian model averaging and Bayesian model selection. Real data applications using U.S. macroeconomic data show that BayesMAR leads to favorable and often superior predictive performance compared with the selected state-of-the-art mean-based alternatives under various loss functions.
BayesMAR enjoys practical benefits as a technical tool to introduce robustness and improve predictions. In addition, one key insight in BayesMAR, the autoregressive structure that resembles the widely used AR model, can be used to complement a rich class of methods that build on the AR model. The AR model is arguably one of the most popular methods in time series, serving as the building block for other models such as GARCH and TV-VAR in the DLM literature. The proposed MAR model shows the potential for further performance gains by following the rich literature that extends AR to more advanced models, with Laplace rather than Gaussian error assumptions.
The R code to implement the proposed methods is publicly available at https://github.com/xylimeng/BayesMAR.
References

Akouemo, H. N. and Povinelli, R. J. (2014). Time series outlier detection and imputation. In Proceedings of the IEEE PES General Meeting Conference & Exposition, pages 1-5.

Choi, H. M. and Hobert, J. P. (2013). Analysis of MCMC algorithms for Bayesian linear regression with Laplace errors. Journal of Multivariate Analysis, 117:32-40.

Gardner, G., Harvey, A. C., and Phillips, G. D. A. (1980). An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Journal of the Royal Statistical Society, Series C (Applied Statistics), 29(3):311-322.