
Bayesian Median Autoregression for Robust Time Series Forecasting

01/04/2020
by   Zijian Zeng, et al.

We develop a Bayesian median autoregressive (BayesMAR) model for time series forecasting. The proposed method utilizes time-varying quantile regression at the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. Motivated by a working Laplace likelihood approach in Bayesian quantile regression, BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models, but with the Gaussian error replaced by a Laplace error, leading to a simple, robust, and interpretable modeling strategy for time series forecasting. We estimate model parameters by Markov chain Monte Carlo. Bayesian model averaging (BMA) is used to account for model uncertainty, including the uncertainty in the autoregressive order, in addition to a Bayesian model selection approach. The proposed methods are illustrated using simulations and real data applications. An application to U.S. macroeconomic data forecasting shows that BayesMAR leads to favorable and often superior predictive performance compared to the selected mean-based alternatives under various loss functions. The proposed methods are generic and can be used to complement a rich class of methods that build on AR models.


1 Introduction

Time series forecasting is a long-standing problem in econometrics and statistics, where the overwhelming focus has been on mean-based models (Prado and West, 2010; Hyndman and Athanasopoulos, 2018). Although conditional means are the optimal forecast under the squared error loss, complex characteristics present in real data that violate model assumptions may hamper predictive performance. Flexible nonparametric methods (Ferraty and Vieu, 2006; Fan and Yao, 2008) building on minimal assumptions are an appealing remedy; however, they often sacrifice interpretability and involve large numbers of parameters, which can lead to daunting computation and communication issues. This article is motivated by an attempt to propose a simple, interpretable, and principled strategy that can improve upon mean-based models in real-world time series forecasting.

There is a rich literature on robust time series forecasting, partially motivated by the fact that time series are often measured subject to outliers. Examples include categorizing outliers (Fox, 1972; Akouemo and Povinelli, 2014), adjusting autoregressive (AR) models to offset the effects of outliers (Chen and Liu, 1993a, b), adapting exponential smoothing and Holt-Winters seasonal methods to M-estimation (Croux et al., 2008), and weighting forecasts (Jose and Winkler, 2008), to name a few. As a nonlinear alternative, the median is more robust than the mean. While the application of median-based methods in time series dates back at least to 1974, when John Tukey introduced the running median method (Tukey, 1974), there is surprisingly little work comprehensively investigating the modeling, fitting, and uncertainty quantification of a median-based model in the context of time series forecasting, and in particular, how it compares with state-of-the-art mean-based methods using real data and under various loss functions.

In the quantile regression literature, which encompasses the median as a special case (Koenker and Bassett Jr, 1978), Koenker and Xiao (2006) propose quantile autoregression (QAR) models that depict the conditional distribution of the response more comprehensively at various quantile levels. Engle and Manganelli (2004) proposed the conditional autoregressive value at risk (CAViaR) model for risk management at extreme quantile levels, which has been followed by many others (Giacomini and Komunjer, 2005; Chen and So, 2006; Geweke and Keane, 2007; Gerlach et al., 2011). However, these methods are developed either for general quantile levels or for extreme quantile levels, and have not been applied to time series forecasting. Moreover, QAR and CAViaR are semiparametric approaches, resorting to minimizing the check loss function for estimation and necessitating non-trivial modifications to likelihood-based order selection criteria when the order is unknown.

In this paper, we propose a simple strategy that extends the traditional AR model to a median AR (MAR) model for time series forecasting. The AR model is arguably one of the most popular methods in time series, serving as the building block for other models such as generalized autoregressive conditional heteroskedasticity models (GARCH, Bollerslev, 1986) and time-varying vector autoregressive models (TV-VAR, Primiceri, 2005). The proposed method utilizes time-varying quantile regression but focuses on the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. It relies on parametric assumptions, which aid interpretation and enable convenient uncertainty quantification and propagation through a principled Bayesian framework. Numerical experiments using U.S. macroeconomic data show that this simple MAR approach leads to favorable and often superior predictive performance compared to selected state-of-the-art mean-based methods that are much more complicated in nature; remarkably, the comparison holds not only under the absolute error loss but also under the squared error loss. The proposed methods are generic and can be used to complement any methods that build on AR models by altering the Gaussian error assumption therein.

The MAR model has a close connection with the working asymmetric Laplace likelihood approach in Bayesian quantile regression. The working likelihood formalized by Yu and Moyeed (2001) has recently gained increasing attention (Gerlach et al., 2011; Sriram et al., 2016; Youngman, 2018; Liu et al., 2019). It provides a principled and convenient framework to quantify uncertainty in parameter estimation, eliminating the challenging task of estimating the unknown conditional density functions required by conventional quantile regression for inference (Yang et al., 2016). Although the asymmetric Laplace likelihood is generally not the true data-generating likelihood, a pragmatic view supporting its use is that the maximum a posteriori estimate resembles the usual quantile regression estimates that optimize the check loss function, and, theoretically, the posterior distribution of the parameters concentrates around the value that minimizes the Kullback-Leibler divergence to the true data-generating model (Kleijn and van der Vaart, 2006). Unlike quantile regression, where one may vary the quantile levels, the median is the primary quantile level of interest in time series forecasting. Therefore, the MAR model uses the Laplace distribution as the likelihood, alleviating the concern that the working likelihood is not a valid likelihood when considering multiple quantile levels. This fully parametric model enables routine posterior sampling. We estimate model parameters by Markov chain Monte Carlo, and propose to use a Bayesian model averaging (BMA) approach (Hoeting et al., 1999) to propagate the model uncertainty in the unknown autoregressive order, in addition to a Bayesian model selection strategy.

The rest of the paper is organized as follows. Section 2 introduces the MAR model, estimation, and forecasting procedures. In Section 3 we conduct simulations comparing the proposed approach with competitive methods in terms of parameter estimation. Section 4 presents a variety of applications using real-world economic data. Section 5 concludes the paper.

2 Methods

2.1 Median Autoregressive (MAR) model

We propose a median autoregressive (MAR) model for time series forecasting as an alternative to the widely used mean-based models. Suppose we observe a time series $y = (y_1, \ldots, y_T)$. A MAR model with order $p$, denoted as MAR($p$), first assumes a time-varying quantile regression structure

$$y_t = \beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t, \qquad (1)$$

where $t = p+1, \ldots, T$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^\top$ is a vector of unknown coefficients, and $\epsilon_t$ is a random error with median 0. Model (1) is semiparametric as the error distribution is left unspecified other than the constraint of possessing a zero median. The classic AR model with a given order $p$, denoted as AR($p$), assumes a Gaussian error distribution with mean 0 and standard deviation $\sigma$, i.e., $\epsilon_t \sim N(0, \sigma^2)$. The MAR model further assumes $\epsilon_t \sim \mathrm{Laplace}(0, \tau)$, whose probability density function is

$$f(\epsilon_t \mid \tau) = \frac{1}{2\tau} \exp\left(-\frac{|\epsilon_t|}{\tau}\right), \qquad (2)$$

where $\tau > 0$ is a scale parameter. The Laplace error assumption combined with the semiparametric structure in Equation (1) yields the following likelihood function for the MAR model

$$L(\beta, \tau \mid y) = \prod_{t=p+1}^{T} \frac{1}{2\tau} \exp\left(-\frac{|y_t - \beta_0 - \sum_{k=1}^{p} \beta_k y_{t-k}|}{\tau}\right), \qquad (3)$$

which is parametric. The parametric assumption in (2) aids interpretation and enables convenient uncertainty quantification and propagation through a principled Bayesian framework. In addition, the autoregressive structure in the MAR model, which resembles the widely used AR model, provides enormous flexibility and potential to complement the rich literature that builds on the AR model.

The use of Laplace distributions is common in the Bayesian quantile regression literature, where the $q$-th quantile of the error in Equation (1) is assumed to be 0. For a general quantile level $q \in (0, 1)$, a working likelihood method adopts the asymmetric Laplace distribution as the error distribution, which has the probability density function

$$f(\epsilon \mid \mu, \tau, q) = \frac{q(1-q)}{\tau} \exp\left\{-\frac{(\epsilon - \mu)\left(q - I(\epsilon \le \mu)\right)}{\tau}\right\}, \qquad (4)$$

where $\mu$ is a location parameter and $I(\cdot)$ is the indicator function. The asymmetric Laplace distribution reduces to the Laplace distribution at the median by setting $q = 1/2$ and $\mu = 0$ (up to a rescaling of $\tau$). In view of this intimate connection with Bayesian quantile regression, as well as the subsequent Bayesian estimation and prediction, we refer to the MAR model with Laplace errors given by (1) and (2) as Bayesian MAR, or BayesMAR, and use the two names interchangeably.
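To make the construction concrete, the following is a minimal R sketch of the log-likelihood implied by Equations (1)-(3); the function name, the coefficient layout (intercept first), and the data conventions are ours for illustration and need not match the released BayesMAR package.

    # Sketch of the MAR(p) log-likelihood in Equation (3).
    # beta: (p+1)-vector (intercept, then lag coefficients); tau: Laplace scale.
    mar_loglik <- function(y, beta, tau) {
      p <- length(beta) - 1
      n <- length(y)
      # design row for time t: (1, y_{t-1}, ..., y_{t-p}), for t = p+1, ..., n
      X <- cbind(1, sapply(1:p, function(k) y[(p + 1 - k):(n - k)]))
      r <- y[(p + 1):n] - X %*% beta        # residuals y_t - x_t' beta
      sum(-log(2 * tau) - abs(r) / tau)     # sum of Laplace log-densities
    }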

2.2 Prior specification and posterior sampling at a given order $p$

We use an uninformative prior for $\beta$ and the Jeffreys prior for $\tau$, namely,

$$\pi(\beta, \tau) \propto \frac{1}{\tau}. \qquad (5)$$

The posterior distribution of $(\beta, \tau)$ is

$$\pi(\beta, \tau \mid y) \propto \tau^{-(T-p)-1} \exp\left(-\frac{\sum_{t=p+1}^{T} |y_t - \beta_0 - \sum_{k=1}^{p} \beta_k y_{t-k}|}{\tau}\right), \qquad (6)$$

which is proper (Choi and Hobert, 2013). The regression coefficients $\beta$ are instrumental for time series forecasting, and we derive their marginal posterior distribution by integrating out $\tau$ for efficient sampling:

$$\pi(\beta \mid y) \propto \left(\sum_{t=p+1}^{T} \left|y_t - \beta_0 - \sum_{k=1}^{p} \beta_k y_{t-k}\right|\right)^{-(T-p)}. \qquad (7)$$

The posterior sampling of $\beta$ proceeds by Markov chain Monte Carlo (MCMC) via the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970):

  1. At iteration $i$, draw a candidate sample $\beta^*_k$ from the proposal $\beta^*_k = \beta^{(i-1)}_k + c\,u$ independently for $k = 0, 1, \ldots, p$, where $\beta^{(i-1)}_k$ is the value of $\beta_k$ at iteration $i-1$, $u$ follows a Uniform distribution, and the scalar $c$ controls the step size of each move;

  2. Accept $\beta^*$ as $\beta^{(i)}$ with probability $\min\{1, \pi(\beta^* \mid y)/\pi(\beta^{(i-1)} \mid y)\}$. Otherwise, set $\beta^{(i)} = \beta^{(i-1)}$.

We tune the parameter $c$ in Step 1 such that the final acceptance rate falls in the range recommended by Gelman et al. (1996). In addition, we have also implemented Gaussian and heavy-tailed Student-$t$ proposals, which are recommended by Gerlach et al. (2011) when studying extreme quantile levels under the CAViaR model. We do not observe empirical advantages of such proposals over the uniform proposal under the MAR model, suggesting that one might have more flexible choices of proposals for median regression. For all experiments, we use 40,000 MCMC samples with 25,000 burn-ins, initialize $\beta$ randomly within the unit interval, and use the posterior mean as the Bayes estimate of $\beta$. Trace plots indicate that the MCMC samples converge quickly, mostly within thousands of iterations.
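A compact R sketch of this sampler is given below; it targets the marginal log-posterior of Equation (7), and the componentwise uniform proposal with a joint accept/reject step follows the two steps above. The function names and defaults are our own illustrative choices, not the packaged implementation.

    # Marginal log-posterior of beta, from Equation (7), on the log scale.
    mar_log_post <- function(y, beta) {
      p <- length(beta) - 1; n <- length(y)
      X <- cbind(1, sapply(1:p, function(k) y[(p + 1 - k):(n - k)]))
      -(n - p) * log(sum(abs(y[(p + 1):n] - X %*% beta)))
    }

    # Random-walk MH with a uniform proposal; step plays the role of c.
    mar_mh <- function(y, p, n_iter = 40000, burn = 25000, step = 0.1) {
      beta <- runif(p + 1)                         # random start in (0, 1)
      draws <- matrix(NA_real_, n_iter, p + 1)
      lp <- mar_log_post(y, beta)
      for (i in 1:n_iter) {
        cand <- beta + step * runif(p + 1, -1, 1)  # independent uniform moves
        lp_cand <- mar_log_post(y, cand)
        if (log(runif(1)) < lp_cand - lp) {        # joint accept/reject
          beta <- cand; lp <- lp_cand
        }
        draws[i, ] <- beta
      }
      colMeans(draws[(burn + 1):n_iter, , drop = FALSE])  # posterior mean
    }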

2.3 Order selection and Bayesian model averaging

The order $p$ in MAR($p$) is typically unknown. We address the unknown $p$ in the Bayesian framework by putting a prior on $p$. In practice, we usually can specify a maximum order $\bar{p}$, as otherwise a too large order hampers interpretability. We endow the order with a uniform prior on $\{1, \ldots, \bar{p}\}$, with $\bar{p}$ being the specified maximum order. Then the posterior distribution of $p$ in the prior support is

$$\pi(p \mid y) \propto \pi(y \mid p)\, \pi(p) \qquad (8)$$
$$\propto \int \pi(y \mid \beta, \tau, p)\, \pi(\beta, \tau)\, d\beta\, d\tau. \qquad (9)$$

The order can be selected by using the maximum a posteriori (MAP) estimate

$$\hat{p} = \arg\max_{p \in \{1, \ldots, \bar{p}\}} \pi(p \mid y). \qquad (10)$$

For time series forecasting, a more appealing perspective is to use Bayesian model averaging (BMA) to propagate uncertainties in the model space, i.e.,

$$\hat{y} = \sum_{p=1}^{\bar{p}} \pi(p \mid y)\, \hat{y}^{(p)}, \qquad (11)$$

where $\hat{y}^{(p)}$ is the prediction under order $p$. A 1-step ahead predictive density conditional on $p$ is

$$\pi(y_{T+1} \mid y, p) = \int \pi(y_{T+1} \mid \beta, \tau, y, p)\, \pi(\beta, \tau \mid y, p)\, d\beta\, d\tau.$$

In general, for an $h$-step ahead predictive density $\pi(y_{T+h} \mid y, p)$, we have

$$\pi(y_{T+h} \mid y, p) = \int \pi(y_{T+h} \mid y_{T+1:T+h-1}, \beta, \tau, y, p)\, \pi(y_{T+1:T+h-1}, \beta, \tau \mid y, p)\, dy_{T+1:T+h-1}\, d\beta\, d\tau, \qquad (12)$$

where $y_{T+1:T+h-1} = (y_{T+1}, \ldots, y_{T+h-1})$ collects the intermediate future values, which are integrated out.
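Operationally, a draw-by-draw approximation of these predictive densities iterates the MAR recursion forward. A minimal plug-in sketch in R follows; averaging its output over posterior draws of beta (and adding Laplace innovations when full predictive draws are desired) propagates parameter uncertainty. The function is our illustration, not the packaged implementation.

    # Iterate the conditional-median recursion h steps ahead for one
    # coefficient vector beta = (intercept, lag coefficients).
    mar_forecast <- function(y, beta, h) {
      p <- length(beta) - 1
      path <- y
      for (s in 1:h) {
        x <- c(1, rev(tail(path, p)))       # (1, y_{T+s-1}, ..., y_{T+s-p})
        path <- c(path, sum(beta * x))      # plug-in median forecast
      }
      tail(path, h)                         # the h forecasted values
    }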

The main challenge in implementing MAP and BMA lies in the evaluation of $\pi(y \mid p)$, as the integrand may quickly decay to zero, leading to numerical errors in the integration. We resort to an approximation to $\pi(p \mid y)$ using the Bayesian information criterion, or BIC (Kass and Raftery, 1995; Neath and Cavanaugh, 2012). In particular, letting $(\hat{\beta}, \hat{\tau})$ be the MAP estimates of $(\beta, \tau)$, the BIC of MAR($p$) is

$$\mathrm{BIC}(p) = -2 \log L(\hat{\beta}, \hat{\tau} \mid y) + (p + 2) \log n, \qquad (13)$$

where $n$ is the sample size. We approximate $\pi(p \mid y)$ by $\exp\{-\mathrm{BIC}(p)/2\}$ up to multiplicative constants, leading to aggregated predictions

$$\hat{y} = \sum_{p=1}^{\bar{p}} w_p\, \hat{y}^{(p)}, \qquad w_p = \frac{\exp\{-\mathrm{BIC}(p)/2\}}{\sum_{p'=1}^{\bar{p}} \exp\{-\mathrm{BIC}(p')/2\}}. \qquad (14)$$
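In R, the BIC-based weights in Equation (14) can be computed in a numerically stable way by centering the BIC values before exponentiating; the short sketch below uses our own naming.

    # BIC weights w_p proportional to exp(-BIC(p)/2), p = 1, ..., p_max.
    bma_weights <- function(bic) {
      w <- exp(-(bic - min(bic)) / 2)  # subtract min(bic) to avoid underflow
      w / sum(w)
    }
    # BMA forecast: weighted sum of per-order forecasts, as in Equation (14).
    # y_hat <- sum(bma_weights(bic) * forecasts_by_order)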

It turns out that we can calculate the MAP estimates efficiently. To see this, first note

$$(\hat{\beta}, \hat{\tau}) = \arg\max_{\beta, \tau}\, \pi(\beta, \tau \mid y). \qquad (15)$$

For any $\tau > 0$, the posterior distribution in Equation (6) attains its maximum at

$$\hat{\beta} = \arg\min_{\beta} \sum_{t=p+1}^{T} \left|y_t - \beta_0 - \sum_{k=1}^{p} \beta_k y_{t-k}\right|, \qquad (16)$$

provided the minimizer exists. This corresponds to the estimator minimizing the absolute error in median regression, which can be efficiently solved by linear programming (Koenker, 2005). An analytical solution of $\hat{\tau}$ is available through a Gamma kernel:

$$\hat{\tau} = \arg\max_{\tau}\, \pi(\hat{\beta}, \tau \mid y) \qquad (17)$$
$$= \frac{\sum_{t=p+1}^{T} |y_t - \hat{\beta}_0 - \sum_{k=1}^{p} \hat{\beta}_k y_{t-k}|}{T - p + 1}. \qquad (18)$$

Before substituting $\hat{\beta}$ and $\hat{\tau}$ into Equation (13), we notice that both the likelihood function and the sample size depend on the order chosen. To reconcile the various sample sizes at different orders, we use the last $T - \bar{p}$ samples to evaluate the likelihood function for any $p \le \bar{p}$. Consequently, $\mathrm{BIC}(p)$ is calculated as

$$\mathrm{BIC}(p) = -2 \sum_{t=\bar{p}+1}^{T} \log f\left(y_t - \hat{\beta}_0 - \sum_{k=1}^{p} \hat{\beta}_k y_{t-k} \,\Big|\, \hat{\tau}\right) + (p + 2) \log(T - \bar{p}), \qquad (19)$$

where $f(\cdot \mid \hat{\tau})$ is the Laplace density in Equation (2) evaluated at the fitted residuals.
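Collecting Equations (15)-(18), the MAP computation reduces to a median (least-absolute-deviations) regression plus a closed-form scale estimate. A sketch using the quantreg package follows; the exact normalizing constant in the denominator of tau reflects our derivation from Equation (6) and may differ from the paper's Gamma-kernel expression by a constant.

    library(quantreg)

    # MAP estimates for MAR(p): beta_hat via LAD regression (solved by
    # linear programming inside rq), tau_hat from maximizing Equation (6).
    mar_map <- function(y, p) {
      n <- length(y)
      X <- sapply(1:p, function(k) y[(p + 1 - k):(n - k)])
      fit <- rq(y[(p + 1):n] ~ X, tau = 0.5)   # median regression
      S <- sum(abs(resid(fit)))                # sum of absolute residuals
      list(beta = coef(fit), tau = S / (n - p + 1))
    }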

3 Simulation

In this section, we conduct simulations to compare the performance of BayesMAR with mean-based methods, focusing on parameter estimation under various model assumptions. To this end, we choose the AR model and the generalized autoregressive conditional heteroscedasticity (GARCH) model, and defer predictive comparisons and more recent mean-based methods to the real data application in Section 4.

We generate data according to the model

$$y_t = \theta_1 + \theta_2 y_{t-1} + \theta_3 y_{t-2} + \epsilon_t. \qquad (24)$$

We consider two scenarios depending on the distribution of $\epsilon_t$: Gaussian error where $\epsilon_t \sim N(0, \sigma^2)$ and Laplace error where $\epsilon_t \sim \mathrm{Laplace}(0, \tau)$, which correspond to the model assumptions of the AR and BayesMAR models, respectively.

For each error assumption, we generate 200 observations and replicate such simulation 100 times. We use maximum likelihood estimation (Gardner et al., 1980) for AR via the 'arima' function in the R package stats. We use AR($p$)-GARCH(1,1) when implementing GARCH, i.e.,

$$y_t = \theta_1 + \sum_{k=1}^{p} \theta_{k+1} y_{t-k} + \epsilon_t, \qquad (25)$$
$$\epsilon_t = \sigma_t z_t, \qquad z_t \sim N(0, 1), \qquad (26)$$
$$\sigma_t^2 = \omega + \alpha\, \epsilon_{t-1}^2 + \gamma\, \sigma_{t-1}^2. \qquad (27)$$

We fit the model using the R package fGarch, where all parameters are estimated by quasi-maximum likelihood (Bollerslev and Wooldridge, 1992). In addition, we implement the quantile autoregression (QAR) method proposed by Koenker and Xiao (2006) using the R package quantreg, to compare its finite-sample performance with BayesMAR.
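For reference, the competing fits might be obtained along the following lines in R; the series y and the AR order p = 2 are illustrative placeholders rather than the exact simulation settings.

    library(quantreg)   # QAR at the median
    library(fGarch)     # AR(p)-GARCH(1,1) via quasi-maximum likelihood

    p <- 2                                       # illustrative order
    ar_fit <- arima(y, order = c(p, 0, 0), method = "ML")
    g_fit  <- garchFit(~ arma(2, 0) + garch(1, 1), data = y, trace = FALSE)
    n <- length(y)
    X <- sapply(1:p, function(k) y[(p + 1 - k):(n - k)])
    qar_fit <- rq(y[(p + 1):n] ~ X, tau = 0.5)   # median QAR fit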

We assess the estimates of $(\theta_1, \theta_2, \theta_3)$ by each method based on the mean squared error (MSE). For a generic parameter $\theta$ with true value $\theta^0$, letting $\hat{\theta}^{(s)}$ be the estimate in the $s$-th simulation, $s = 1, \ldots, S$ with $S = 100$, the MSE and its standard error are

$$\mathrm{MSE} = \frac{1}{S} \sum_{s=1}^{S} \left(\hat{\theta}^{(s)} - \theta^0\right)^2, \qquad \mathrm{SE} = \frac{1}{\sqrt{S}}\, \mathrm{sd}\left\{\left(\hat{\theta}^{(s)} - \theta^0\right)^2\right\},$$

where $\mathrm{sd}\{\cdot\}$ denotes the sample standard deviation across the $S$ replications.

We first provide the true order to all models and compare their performances. Table 1 reports the MSEs of all methods. We can see that all methods benefit from a correctly specified error distribution, supported by the observation that AR and GARCH have the smallest MSEs under Gaussian error, while BayesMAR and QAR have smaller MSEs when data are generated from Laplace distributions. However, BayesMAR appears to suffer less from model misspecification; for example, under Gaussian error BayesMAR increases the MSE of $\theta_3$ by only 0.13 relative to AR, within two standard errors of AR, whereas under Laplace error AR roughly doubles the MSE of $\theta_2$ relative to BayesMAR, well beyond the standard errors of BayesMAR. It is reassuring to see that BayesMAR gives either the same or better MSEs than QAR in all cases, although all differences are within one standard error. This finite-sample performance is consistent with the findings in Gerlach et al. (2011) when comparing sampling-based Bayesian approaches with optimization-based counterparts for extreme quantile levels.

 

                     Gaussian error           Laplace error
Models               θ1     θ2     θ3         θ1     θ2     θ3
BayesMAR   MSE      1.07   0.56   0.63       0.73   0.27   0.19
           SE       0.13   0.07   0.07       0.11   0.04   0.03
QAR        MSE      1.21   0.62   0.75       0.75   0.27   0.19
           SE       0.15   0.08   0.08       0.11   0.05   0.02
AR         MSE      0.74   0.42   0.50       1.06   0.51   0.44
           SE       0.09   0.05   0.08       0.16   0.06   0.06
GARCH      MSE      0.76   0.41   0.51       1.08   0.50   0.40
           SE       0.09   0.05   0.08       0.16   0.06   0.05

Table 1: MSEs of the estimates of (θ1, θ2, θ3) under Gaussian error and Laplace error. Standard errors are reported below the MSEs. All summaries have been multiplied by a common scaling factor.

We next investigate the selection of the unknown order in BayesMAR using the BIC approach described in Section 2.3. We provide a large upper bound $\bar{p}$ for the order. Figure 1 plots the distribution of the selected orders in both scenarios and suggests that they almost always concentrate around the oracle value, even when the model is misspecified under Gaussian error. The MAP estimate recovers the oracle order in the overwhelming majority of simulations under both normal and Laplace errors.

Figure 1: Distributions of selected orders using BIC.

4 Real Data Application

In this section, we compare the predictive performances of the proposed methods relative to mean-based methods using various real-world data. We use three economic series from Federal Reserve Economic Data (FRED): the quarterly Producer Price Index for all commodities (FRED: U.S. Bureau of Labor Statistics, 2018), the 3-Month Treasury Bill secondary market rate (FRED: Board of Governors of the Federal Reserve System (US), 2018), and the Unemployment Rate (FRED: Organization for Economic Co-operation and Development, 2018), coded as PPI, TBR, and UR, respectively. Each time series ranges from 1968Q3 to 2018Q2, containing 200 observations. The raw data $y_t$ are plotted in Figure 2, and the lagged data of order one are plotted in Figure 3. These three data sets have distinct patterns: the lagged PPI tends to be more stable before the 2008 financial crisis, contrasted with substantial fluctuation since then; the lagged TBR has more dramatic changes in earlier periods than in the latest periods; the lagged UR tends to present a periodic pattern while containing several extreme values. The complex characteristics present in these real-world data enable comparison between model-based methods when there is no guarantee that any model assumptions hold.

Figure 2: Raw data of PPI, TBR, and UR. The blue line marks 2008Q3, the start of the out-of-sample forecast period.

In addition to AR and GARCH, we implement selected methods from the dynamic linear model (DLM) literature. In particular, we consider the time-varying vector AR (TV-VAR) model proposed by Nakajima and West (2013), which links time-varying parameters to latent threshold processes and achieves state-of-the-art predictive performance in selected applications. For the length-3 vector $y_t$ stacking the three time series, it expands a DLM by utilizing a latent threshold vector $s_t$:

$$y_t = c_t + B_{1t}\, y_{t-1} + \cdots + B_{pt}\, y_{t-p} + \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma_t), \qquad (28)$$
$$b_t = \beta_t \odot s_t, \qquad s_{jt} = I(|\beta_{jt}| \ge d_j), \qquad (29)$$
$$\beta_t = \mu + \Phi(\beta_{t-1} - \mu) + \eta_t, \qquad (30)$$

where $c_t$ is the vector of time-varying intercepts, $B_{kt}$ is the matrix of time-varying coefficients at lag $k$, $b_t$ stacks $c_t$ and $B_{kt}$ by rows and by order from 1 to $p$, $\odot$ denotes elementwise multiplication, and $\beta_t$ is a latent time-varying parameter vector whose dynamic updates are specified by Equation (30) via a standard VAR model. The latent threshold vector $s_t$ shrinks the time-varying coefficients to zero, leading to dynamic sparsity and improved prediction. We implement two versions of TV-VAR: TV-VAR without latent threshold (NT) and TV-VAR with latent threshold (LT). Both methods are applicable to multivariate time series, and we adapt the OxMetrics code provided in the paper (Nakajima and West, 2013) to the vector observations consisting of the three time series.

For BayesMAR, we use BayesMAR-BMA to denote the Bayesian model averaging strategy and BayesMAR-MAP when the MAP estimate of the order is used. The orders of AR and BayesMAR-MAP are selected by BIC. GARCH(1,1) uses the same autoregressive order as chosen for AR; although orders other than (1, 1) can be used, we observe that GARCH(1,1) consistently leads to the best predictive performance of GARCH in our real data application. Since there are no immediately available model selection methods for NT or LT, we run the order from 1 to 5 and then choose the optimal results, to favor NT and LT under each criterion.

We use recursive out-of-sample forecasting to assess each method after $t_0$ = 2008Q3. In particular, we fit each model with data up to quarter $t$ and conduct an $h$-step ahead prediction for $h = 1, \ldots, 4$. We then move one period ahead and repeat the same procedure until we reach the end of the sample. All methods are applied to the lagged data of order one to remove local trends and forecast the changes, which lead to predictions of $y_t$. We calculate the root MSE (RMSE) and the mean absolute error (MAE) from $t_0$ to the end of the sample to compare the performances of each method, i.e.,

$$\mathrm{RMSE}(h) = \sqrt{\frac{1}{n_h} \sum_{t \ge t_0} \left(\hat{y}_{t+h} - y_{t+h}\right)^2}, \qquad \mathrm{MAE}(h) = \frac{1}{n_h} \sum_{t \ge t_0} \left|\hat{y}_{t+h} - y_{t+h}\right|, \qquad (31)$$

where the sums run over forecast origins $t$ and $n_h$ is the number of $h$-step ahead forecasts.

We additionally calculate the relative change using BayesMAR-BMA as the reference to ease comparison, that is,

$$\text{Relative change} = \frac{\mathrm{RMSE}_{\text{method}} - \mathrm{RMSE}_{\text{BayesMAR-BMA}}}{\mathrm{RMSE}_{\text{BayesMAR-BMA}}} \times 100\%, \qquad (32)$$

and similarly for the MAE.
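The evaluation loop can be sketched in a few lines of R for a generic one-step forecaster; $h$-step evaluation proceeds analogously by predicting $h$ periods ahead from each origin. The names are ours.

    # Recursive out-of-sample evaluation: refit on y[1:t] at each origin t
    # and score the 1-step ahead forecast against y[t + 1].
    rolling_eval <- function(y, t0, forecaster) {
      errs <- sapply(t0:(length(y) - 1),
                     function(t) forecaster(y[1:t]) - y[t + 1])
      c(RMSE = sqrt(mean(errs^2)), MAE = mean(abs(errs)))
    }

    # Relative change (Equation (32)) against the BayesMAR-BMA reference.
    rel_change <- function(err, err_ref) 100 * (err - err_ref) / err_ref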

Table 2 reports the RMSEs of all methods for the three data sets. The results indicate that BayesMAR outperforms the alternatives in all scenarios. Compared to BayesMAR-BMA, the methods of AR, GARCH, NT, and LT increase the RMSE by 10% to 63.6% for TBR, and by 7.1% to 54.8% for UR. Although the gap in the prediction for PPI is smaller (between 0.9% and 5.4%), all competing methods have a larger RMSE than BayesMAR-BMA. GARCH accounts for comprehensive variance structures, and the two time-varying modeling methods, NT and LT, dynamically update regression coefficients; these are arguably powerful methods with considerable complexity, so it is remarkable that the proposed simple BayesMAR leads to favorable and often superior performance using real-world data.

 

Producer Price Index
                          RMSE                  Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     3.18  5.94  7.74  9.24        -     -     -     -
BayesMAR-MAP     3.21  5.94  7.69  9.13      0.8   0.1  -0.7  -1.1
AR               3.36  6.24  8.06  9.42      5.4   5.1   4.0   1.9
GARCH            3.21  6.05  7.85  9.31      0.9   1.9   1.4   0.8
NT               3.22  6.06  7.87  9.36      1.1   2.0   1.6   1.4
LT               3.27  6.05  8.00  9.69      2.6   1.9   3.2   4.9

Treasury Bill Rate
                          RMSE                  Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     0.91  1.49  2.16  2.94        -     -     -     -
BayesMAR-MAP     0.90  1.48  2.16  2.93     -0.9  -0.1  -0.1  -0.1
AR               1.32  1.77  2.62  3.69     45.7  19.5  21.3  25.7
GARCH            1.40  1.87  2.61  3.44     54.5  26.2  20.6  17.3
NT               1.48  2.30  2.94  3.56     63.6  54.9  36.0  21.2
LT               1.00  2.01  2.91  3.90     10.0  35.1  34.8  32.9

Unemployment Rate
                          RMSE                  Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     3.35  4.85  5.70  6.55        -     -     -     -
BayesMAR-MAP     3.41  4.93  5.79  6.65      1.9   1.6   1.6   1.5
AR               4.13  6.26  8.16  9.50     23.4  29.0  43.2  44.9
GARCH            4.53  6.26  7.05  7.02     35.3  29.0  23.8   7.1
NT               4.41  6.86  8.76  9.32     32.0  41.4  53.8  42.2
LT               5.18  6.22  8.56  9.34     54.8  28.3  50.2  42.5

Table 2: Forecasting performance for U.S. macroeconomic data after the 2007-2008 financial crisis: RMSEs for h-step ahead prediction, h = 1, ..., 4. The relative change uses BayesMAR-BMA as the baseline.

Table 3 reports the MAEs of all methods, suggesting observations similar to those made from Table 2. The proposed BayesMAR methods give the smallest MAEs in nearly all scenarios, with one exception for UR when compared to GARCH at the 4-step ahead prediction.

 

Producer Price Index
                          MAE                   Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     2.50  4.44  5.90  7.26        -     -     -     -
BayesMAR-MAP     2.50  4.44  5.84  7.18      0.2   0.1  -1.0  -1.1
AR               2.69  4.52  6.04  7.36      7.5   1.7   2.3   1.4
GARCH            2.54  4.59  5.98  7.29      1.8   3.5   1.4   0.4
NT               2.54  4.58  6.04  7.36      1.8   3.3   2.4   1.4
LT               2.60  4.53  6.34  7.66      4.2   2.1   7.5   5.6

Treasury Bill Rate
                          MAE                   Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     0.56  1.05  1.56  2.10        -     -     -     -
BayesMAR-MAP     0.56  1.05  1.55  2.09     -0.6  -0.2  -0.0   0.0
AR               0.82  1.23  1.85  2.55     45.9  16.7  18.3  21.2
GARCH            0.75  1.13  1.69  2.28     34.5   7.4   8.4   8.8
NT               0.75  1.28  1.75  2.30     34.5  21.5  12.2   9.4
LT               0.63  1.19  1.75  2.40     12.2  12.9  12.3  14.4

Unemployment Rate
                          MAE                   Relative change (%)
Models            h=1   h=2   h=3   h=4      h=1   h=2   h=3   h=4
BayesMAR-BMA     2.15  3.28  3.91  5.12        -     -     -     -
BayesMAR-MAP     2.23  3.38  4.03  5.29      3.7   3.2   3.2   3.3
AR               2.52  4.02  5.43  6.58     17.5  22.6  39.0  28.5
GARCH            2.71  3.95  4.60  4.79     26.0  20.5  17.8  -6.4
NT               3.04  4.38  5.33  5.92     41.4  33.4  36.3  15.6
LT               4.11  4.65  6.59  7.70     91.6  41.9  68.6  50.5

Table 3: Forecasting performance for U.S. macroeconomic data after the 2007-2008 financial crisis: MAEs for h-step ahead prediction, h = 1, ..., 4. The relative change uses BayesMAR-BMA as the baseline.

The RMSEs of BayesMAR-BMA and BayesMAR-MAP are close to each other, with at most 1.1% (PPI), 0.9% (TBR), and 1.9% (UR) relative differences, and similar observations hold for the MAEs. This is caused by the highly concentrated weights across model orders when using BIC for order selection. In particular, we find that BayesMAR-BMA puts the dominant share of its weight on the order selected by BayesMAR-MAP, leading to minimal differences between the two variants of BayesMAR. The provided R package allows both options.

Figure 3 compares the absolute 1-step predictive errors at each quarter from 2008Q3 onward given by the three static models, BayesMAR-BMA, AR, and GARCH, which provides insight into the performance of BayesMAR. We choose AR and GARCH here as they build on model structures similar to BayesMAR. We can see that AR yields a large prediction error at the beginning (PPI), which is substantially reduced by GARCH, which incorporates heterogeneous variance structures. It is reassuring that the proposed BayesMAR method, which uses a simple constant variance structure, achieves the same predictive gain (for PPI) or even further improvements (for TBR and UR). At later time points, when the time series stabilize without unusually large deviations, BayesMAR tends to give performances similar to AR and GARCH. Since BayesMAR is a parametric model bearing the same model structure as AR, these comparisons suggest that further performance gains may be possible by following the rich literature that extends AR to more advanced models, such as GARCH and dynamic models, while simply altering the error assumption from Gaussian to Laplace.

Figure 3: 1-step ahead predictive absolute errors at each period from 2008Q3.

We plot the 95% confidence/credible intervals of AR and BayesMAR-BMA in Figure 4. For the first two data sets, BayesMAR has narrower intervals covering the true observations, especially for TBR, which hardly changes in the later periods. Although BayesMAR yields wider intervals for UR, AR has multiple observations either close to the boundary or outside of the confidence intervals (around 2009Q4), suggesting overconfidence.

Figure 4: 1-step ahead 95% confidence interval of AR and 95% credible interval of BayesMAR-BMA. True observations are points marked green, and predictions are marked red. Blue bars are the confidence/credible intervals.

5 Concluding remarks

This article proposes a Bayesian median autoregressive (BayesMAR) model for robust time series forecasting. The proposed method has close connections with time-varying quantile regression. BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models, but with the Gaussian error replaced by a Laplace error, leading to a simple, robust, and interpretable modeling strategy for time series forecasting with principled uncertainty quantification through Bayesian model averaging and Bayesian model selection. Real data applications using U.S. macroeconomic data show that BayesMAR leads to favorable, and often superior, predictive performance compared to the selected state-of-the-art mean-based alternatives under various loss functions.

BayesMAR enjoys practical benefits as a technical tool to introduce robustness and improve predictions. In addition, one key insight of BayesMAR, the autoregressive structure that resembles the widely used AR model, can be used to complement a rich class of methods that build on the AR model. The AR model is arguably one of the most popular methods in time series, serving as the building block for other models such as GARCH and TV-VAR in the DLM literature. The proposed MAR model shows the potential for further performance gains by following the rich literature that extends AR to more advanced models, with Laplace error assumptions rather than Gaussian.

Supplementary Materials

The R code to implement the proposed methods is publicly available at https://github.com/xylimeng/BayesMAR.

References

  • Akouemo and Povinelli (2014) Akouemo, H. N. and Povinelli, R. J. (2014). Time series outlier detection and imputation. In Proceedings of IEEE PES General Meeting | Conference & Exposition, pages 1–5.
  • Bollerslev (1986) Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
  • Bollerslev and Wooldridge (1992) Bollerslev, T. and Wooldridge, J. M. (1992). Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews, 11(2):143–172.
  • Chen and Liu (1993a) Chen, C. and Liu, L.-M. (1993a). Forecasting time series with outliers. Journal of Forecasting, 12(1):13–35.
  • Chen and Liu (1993b) Chen, C. and Liu, L.-M. (1993b). Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association, 88(421):284–297.
  • Chen and So (2006) Chen, C. W. and So, M. K. (2006). On a threshold heteroscedastic model. International Journal of Forecasting, 22(1):73–89.
  • Choi and Hobert (2013) Choi, H. M. and Hobert, J. P. (2013). Analysis of MCMC algorithms for Bayesian linear regression with Laplace errors. Journal of Multivariate Analysis, 117:32–40.
  • Croux et al. (2008) Croux, C., Gelper, S., and Fried, R. (2008). Computational aspects of robust Holt-Winters smoothing based on M-estimation. Applications of Mathematics, 53(3):163.
  • Engle and Manganelli (2004) Engle, R. F. and Manganelli, S. (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4):367–381.
  • Fan and Yao (2008) Fan, J. and Yao, Q. (2008). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer Science & Business Media.
  • Ferraty and Vieu (2006) Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media.
  • Fox (1972) Fox, A. J. (1972). Outliers in time series. Journal of the Royal Statistical Society: Series B (Methodological), 34(3):350–363.
  • FRED: Board of Governors of the Federal Reserve System (US) (2018) FRED: Board of Governors of the Federal Reserve System (US) (2018). 3-Month Treasury Bill: Secondary Market Rate. retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/TB3MS, Aug 09, 2018.
  • FRED: Organization for Economic Co-operation and Development (2018) FRED: Organization for Economic Co-operation and Development (2018). Unemployment Rate: Aged 15-64: All Persons for the United States. retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/LRUN64TTUSQ156N, Aug 09, 2018.
  • FRED: U.S. Bureau of Labor Statistics (2018) FRED: U.S. Bureau of Labor Statistics (2018). Producer Price Index for All Commodities. retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/PPIACO, Aug 09, 2018.
  • Gardner et al. (1980) Gardner, G., Harvey, A. C., and Phillips, G. D. A. (1980). An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3):311–322.
  • Gelman et al. (1996) Gelman, A., Roberts, G. O., and Gilks, W. R. (1996). Efficient metropolis jumping rules. In Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., editors, Bayesian Statistics 5, pages 599–607. Oxford.
  • Gerlach et al. (2011) Gerlach, R. H., Chen, C. W., and Chan, N. Y. C. (2011). Bayesian time-varying quantile forecasting for value-at-risk in financial markets. Journal of Business & Economic Statistics, 29(4):481–492.
  • Geweke and Keane (2007) Geweke, J. and Keane, M. (2007). Smoothly mixing regressions. Journal of Econometrics, 138(1):252–290.
  • Giacomini and Komunjer (2005) Giacomini, R. and Komunjer, I. (2005). Evaluation and combination of conditional quantile forecasts. Journal of Business & Economic Statistics, 23(4):416–431.
  • Hastings (1970) Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109.
  • Hoeting et al. (1999) Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial (with discussion). Statistical Science, 14(4):382–417.
  • Hyndman and Athanasopoulos (2018) Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
  • Jose and Winkler (2008) Jose, V. R. R. and Winkler, R. L. (2008). Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting, 24(1):163–169.
  • Kass and Raftery (1995) Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430):773–795.
  • Kleijn and van der Vaart (2006) Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. The Annals of Statistics, 34(2):837–877.
  • Koenker (2005) Koenker, R. (2005). Quantile regression. Cambridge University Press: Cambridge, UK.
  • Koenker and Bassett Jr (1978) Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, pages 33–50.
  • Koenker and Xiao (2006) Koenker, R. and Xiao, Z. (2006). Quantile autoregression. Journal of the American Statistical Association, 101(475):980–990.
  • Liu et al. (2019) Liu, Y., Li, M., and Morris, J. S. (2019). Function-on-scalar quantile regression with application to mass spectrometry proteomics data. The Annals of Applied Statistics, pages 1–30. To appear, arXiv:1809.00266.
  • Metropolis et al. (1953) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092.
  • Nakajima and West (2013) Nakajima, J. and West, M. (2013). Bayesian analysis of latent threshold dynamic models. Journal of Business & Economic Statistics, 31(2):151–164.
  • Neath and Cavanaugh (2012) Neath, A. A. and Cavanaugh, J. E. (2012). The Bayesian information criterion: background, derivation, and applications. Wiley Interdisciplinary Reviews: Computational Statistics, 4(2):199–203.
  • Prado and West (2010) Prado, R. and West, M. (2010). Time Series: Modeling, Computation, and Inference. Chapman and Hall/CRC.
  • Primiceri (2005) Primiceri, G. E. (2005). Time varying structural vector autoregressions and monetary policy. The Review of Economic Studies, 72(3):821–852.
  • Sriram et al. (2016) Sriram, K., Shi, P., and Ghosh, P. (2016). A Bayesian quantile regression model for insurance company costs data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(1):177–202.
  • Tukey (1974) Tukey, J. W. (1974). Nonlinear (nonsuperposable) methods for smoothing data. CONGRESS RECORD (EASCO), page 673.
  • Yang et al. (2016) Yang, Y., Wang, H. J., and He, X. (2016). Posterior inference in Bayesian quantile regression with asymmetric Laplace likelihood. International Statistical Review, 84(3):327–344.
  • Youngman (2018) Youngman, B. D. (2018). Generalized additive models for exceedances of high thresholds with an application to return level estimation for U.S. wind gusts. Journal of the American Statistical Association, pages 1–31. To appear.
  • Yu and Moyeed (2001) Yu, K. and Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4):437–447.