1 Introduction
In macroeconomics and finance, most models commonly employed to study the transmission of economic shocks or produce predictions assume linearity in their parameters and are fully parametric. One prominent example of a frequently used fully parametric and linear model is the vector autoregression (VAR) that is extensively used in central banks and academia (see Sims, 1980; Doan et al., 1984; Litterman, 1986; Sims and Zha, 1998). In normal times, with macroeconomic relations remaining stable, this linearity assumption might describe the data well. In turbulent times, however, key transmission channels often change and more flexibility could be necessary.
The linearity assumption has been questioned several times in the recent literature. For instance, Granger and Teräsvirta (1993) show that macroeconomic quantities might exhibit nonlinear relations and thus assuming linearity might be overly restrictive. As a solution, researchers increasingly wish to estimate nonlinear models that allow the parameters to change over time and/or the shock variances to be time-varying. These time-varying parameter (TVP) models, however, assume that at each point in time, the relationship between the endogenous and explanatory variables is linear. For recent contributions, see
Primiceri (2005); Kalli and Griffin (2014); Bitto and Frühwirth-Schnatter (2019); Huber et al. (2019); Chan et al. (2020). Another strand of the literature proposes Bayesian nonparametric time series models in order to relax the linearity assumption. In particular, several recent papers (see, e.g. Bassetti et al., 2014; Kalli and Griffin, 2018; Billio et al., 2019) propose novel methods that assume the transition densities to be nonparametric. These techniques feature an infinite dimensional parameter space that is flexibly adjusted to the complexity of the data at hand. For a review of Bayesian nonparametric methods, see Hjort et al. (2010)
. Within the nonparametric paradigm, a number of popular methods make use of machine learning techniques such as boosting (Freund and Schapire, 1997; Friedman, 2001), bagging and random forests (Breiman, 2001), decision trees and Mondrian forests (Roy and Teh, 2009; Lakshminarayanan et al., 2014) and Bayesian tree models (Chipman et al., 2002; Gramacy and Lee, 2008; Chipman et al., 2010; Linero, 2018). In this paper, we focus on Bayesian tree models applied to multivariate time series regressions.

The main contribution of the present paper is to connect the literature on Bayesian additive regression trees (BART, see Chipman et al., 2010)
with the literature on VAR models. BART is a nonparametric regression approach that unveils nonlinear relations between a set of endogenous and explanatory variables without much input needed from the researcher. Intuitively speaking, it models the conditional mean of the regression model by summing over a large number of trees which are, by themselves, constrained through a regularization prior. The resulting individual trees take a particularly simple form and can thus be classified as "weak learners". Each single tree only explains a small fraction of the variation in the response variable, while a large sum of them is capable of describing the data extremely well.
It is precisely this intuition that we build on when we generalize these techniques to a multivariate setting. More precisely, we assume that a potentially high-dimensional vector of endogenous variables is driven by its own lags. As opposed to VAR models, which assume linearity between the explanatory and endogenous variables, we capture these relations using BART. The resulting model, labeled the Bayesian Additive Vector Autoregressive Tree (BAVART) model, is a highly flexible variant that can be used for forecasting and impulse response analysis.
Estimation and inference are carried out using Markov chain Monte Carlo (MCMC) techniques. Since the error covariance matrix is non-diagonal, we propose methods that allow for equation-by-equation estimation of the multivariate model. These techniques imply that model estimation scales well to high dimensions. Another crucial feature of our approach is that it flexibly handles heteroscedasticity, which we control for through a stochastic volatility specification. This feature is crucial for producing precise density forecasts.
We illustrate our BAVART model using macroeconomic data for the Eurozone. In this application, we assess the forecasting accuracy of our method relative to competitors that are commonly used in the literature. Since interest centers not only on one-step-ahead predictive distributions but also on multi-step-ahead forecasts, we provide algorithms to simulate from the relevant predictive distributions. These techniques are then used in a second application where we consider the effects of economic uncertainty on the Eurozone. We show how our model can be used to compute structural generalized impulse response functions (SGIRFs) and find responses that are consistent with established findings in the literature.
The remainder of this paper is organized as follows. Section 2 discusses the BART model in the context of the homoscedastic regression model, while Section 3 extends this method to the VAR case, proposes the BAVART specification and details how we control for heteroscedasticity. Section 4 presents the results of the forecasting and impulse response exercises. Finally, the last section summarizes our findings and concludes the paper.
2 Bayesian Additive Regression Tree Models
In this section, we briefly review the BART approach for regression, which we build on in later sections. For an extensive introduction, see Hill et al. (2020). Let $y = (y_1, \dots, y_T)'$ denote the $T$-dimensional response vector and $X$ be a $T \times K$-dimensional matrix of exogenous variables with $x_t$ being the covariates in time $t$. We assume the following relationship between $y$ and $X$:
$$y_t = f(x_t) + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma^2),$$
where $\sigma^2$ denotes the error variance and $f$ is a (potentially) highly nonlinear function. The BART model is then obtained by approximating $f$ through a sum of $S$ regression trees as follows:
$$f(x_t) \approx \sum_{s=1}^{S} g(x_t \mid \mathcal{T}_s, \mu_s). \qquad (1)$$
In Eq. (1), $g(x_t \mid \mathcal{T}_s, \mu_s)$ corresponds to a single tree model, with $\mathcal{T}_s$ denoting the tree structure associated with the $s$th binary tree and $\mu_s = (\mu_{s,1}, \dots, \mu_{s,b_s})'$ being the vector of terminal node parameters associated with $\mathcal{T}_s$, where $b_s$ is the number of leaves of the tree. The binary trees are constructed by considering splitting rules of the form $\{x_t \in A\}$ or $\{x_t \notin A\}$ with $A$ being a partition set. These rules typically depend on selected columns of $x_t$, denoted as $x_{it}$, and a threshold $c$. The set $A$ is then defined by a unique partition of the predictor space such that $x_{it} \le c$ or $x_{it} > c$.
The step function $g$ is constant over the elements of the partition $\{A_1, \dots, A_{b_s}\}$:
$$g(x_t \mid \mathcal{T}_s, \mu_s) = \mu_{s,i} \quad \text{if } x_t \in A_i.$$
Hence, the set $\{A_1, \dots, A_{b_s}\}$ defines a tree-specific unique partition of the covariate space such that the function $g$ returns a specific value for specific values of $x_t$.
To avoid overfitting, the trees are encouraged to be small (i.e. take a particularly simple form) and the terminal node parameters are shrunk towards zero. If the first tree is a weak learner and fitted in a reasonable way, the corresponding tree structure will be very simple and the elements in $\mu_1$ will be pushed towards zero. This implies that the first tree will explain only a small fraction of the variation in $y$. Subtracting the fit of the first tree from $y$ yields residuals to which the next tree is fitted. This procedure is repeated for a sufficiently large number of trees until the fit of the additive model becomes reasonably good.
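The residual-fitting intuition can be sketched in a few lines of code. The sketch below is not the BART sampler itself (which resamples trees via MCMC under the regularization prior discussed later); it is a deterministic caricature that fits depth-one trees ("stumps") one after another to the running residual, illustrating how many weak learners add up to a good fit. The data and the choice of 50 stumps are purely illustrative.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a single-split regression tree (a stump) to residuals r.
    Returns (threshold, left_mean, right_mean) minimising squared error."""
    best = (None, r.mean(), r.mean(), np.inf)
    for c in np.unique(x)[:-1]:
        left, right = r[x <= c], r[x > c]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[3]:
            best = (c, left.mean(), right.mean(), sse)
    return best[:3]

def stump_predict(x, stump):
    c, mu_left, mu_right = stump
    return np.where(x <= c, mu_left, mu_right)

# Sum of S = 50 weak learners, each fitted to the running residual
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(200)

fit, stumps = np.zeros_like(y), []
for s in range(50):
    stump = fit_stump(x, y - fit)     # fit a stump to the current residual
    fit += stump_predict(x, stump)    # add the weak learner's contribution
    stumps.append(stump)

print(np.mean((y - fit)**2))  # residual variance shrinks as trees accumulate
```

Each individual stump is a crude two-valued step function, yet the sum tracks the smooth sine curve closely, which is exactly the "many weak learners" mechanism described above.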
We illustrate BART using a simple example based on a single regression tree model (i.e. $S = 1$). The case of several regression trees is considered afterwards.
Example 1.
Consider the regression tree in Figure 1. We include two covariates, $x_{1t}$ and $x_{2t}$. In the figure, each rectangle refers to a splitting rule and we henceforth label this a node. At the top node (also called root node or simply root), we have a condition of the form $x_{1t} \le c_1$. If this condition is not true, we arrive at a terminal node and set the fitted value to $\mu_1$. By contrast, if the condition is fulfilled, we move to the next node. The corresponding splitting rule states that if $x_{2t} \le c_2$, we reach a terminal node with value $\mu_2$, whereas if $x_{2t} > c_2$, we set the fitted value to $\mu_3$. Using a single tree thus yields three possible values for the conditional mean, $\mu_1$, $\mu_2$ and $\mu_3$, and implies abrupt shifts between them. If the conditional mean is evolving smoothly over time, using a single tree would thus lead to a rather simplistic conditional mean structure.
Example 2.
Since the tree model in the first example is too simplistic, our second example deals with a more flexible conditional mean structure. Let us consider a sum-of-trees model with $S = 2$ trees and two covariates, depicted in Figure 2. Within a given tree, the same intuition as in Example 1 applies. However, since we now use two trees, we gain more flexibility in modeling the conditional mean of $y_t$. This is because the conditional mean at a given point in time is equal to the sum of the terminal node parameters of the two trees, and the combination of the two parameters depends on the specific set of decision rules across both trees. This allows for a substantially richer mean structure and increased flexibility as opposed to a single tree model.
Figure 2: Regression tree, $s = 1$; Regression tree, $s = 2$.
The second example shows how flexibility is increased by adding more trees and illustrates how BART handles nonlinearities in a flexible manner. In particular, each regression tree is a simple step function, and when we sum the different regression trees, we gain flexibility. The resulting additive model essentially allows for approximating nonlinearities without prior assumptions on the specific form of the nonlinearities.
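To make the sum-of-trees idea concrete, the following sketch hand-codes two small step functions and sums them. The thresholds and terminal node parameters are hypothetical, not taken from Figure 2; the point is only that the sum takes more distinct values than either tree alone.

```python
# Two hypothetical regression trees with made-up thresholds and
# terminal node parameters, mimicking the structure of Example 2.
def tree1(x1: float, x2: float) -> float:
    # splits on x1 at 0.0, then on x2 at 0.5
    if x1 > 0.0:
        return 0.2
    return -0.1 if x2 <= 0.5 else 0.3

def tree2(x1: float, x2: float) -> float:
    # splits on x2 only, at 0.0
    return 0.4 if x2 > 0.0 else -0.2

def conditional_mean(x1: float, x2: float) -> float:
    # BART-style sum of trees: each tree contributes a small step
    return tree1(x1, x2) + tree2(x1, x2)

# Evaluate the sum on a small grid of covariate values
vals = {conditional_mean(a, b) for a in (-1, 1) for b in (-1, 0.25, 1)}
print(sorted(vals))
```

Each tree alone can return at most three (here, three and two) distinct values, but their sum attains five distinct levels on this grid: the partitions of the two trees overlap and refine each other, which is the source of the added flexibility.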
3 BART models for multivariate time series analysis
3.1 Bayesian Additive Vector Autoregressive Tree Models
In this section we generalize the model outlined in the previous section to the multivariate case. Consider an $M$-dimensional vector of endogenous variables $y_t$. Stacking the rows $y_t'$ yields a $T \times M$ matrix $Y$. We assume that $Y$ depends on a $T \times K$ matrix $X$ with each $x_t$ being a $K$-dimensional vector of lagged endogenous variables. The Bayesian Additive Vector Autoregressive Tree (BAVART) model is then given by:
$$y_t = F(x_t) + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \Sigma), \qquad (2)$$
where $F: \mathbb{R}^K \to \mathbb{R}^M$ and $\Sigma$ is an $M \times M$ error variance-covariance matrix. For the moment, we assume that the error variance-covariance matrix $\Sigma$ is time-invariant. In Eq. (2), $F$ is defined in terms of equation-specific functions $f_j$ for $j = 1, \dots, M$: $F(x_t) = (f_1(x_t), \dots, f_M(x_t))'$. Similarly to the standard BART specification, we approximate each $f_j$ through a sum of regression trees:
$$f_j(x_t) \approx \sum_{s=1}^{S} g_j(x_t \mid \mathcal{T}_{js}, \mu_{js}).$$
Here, $g_j$ denotes an equation-specific step function with arguments $\mathcal{T}_{js}$ and $\mu_{js}$. As before, the individual tree structures $\mathcal{T}_{js}$ are associated with a $b_j$-dimensional vector of terminal node coefficients $\mu_{js}$, with $b_j$ denoting the number of leaves per tree in equation $j$.
As stated in Section 2, a regression tree creates a partition of the covariate space into subgroups. Internal nodes of regression trees are equipped with splitting rules that are characterized by some column of $X$, labeled $x_{it}$, and a specific threshold $c$ which can take any value in the range of $x_{it}$. Such a decision rule takes the form $x_{it} \le c$ or $x_{it} > c$ and thus allocates observations according to this condition.
We estimate the BAVART model by exploiting its structural form. Using the structural form of the VAR to speed up computation has been done in several recent papers (see, e.g., Carriero et al., 2019; Huber et al., 2020). In our case, Eq. (2) can be written as follows:
$$A y_t = \tilde{F}(x_t) + \eta_t, \quad \eta_t \sim \mathcal{N}(0, \Omega), \qquad (3)$$
whereby $\tilde{F}$ is defined similarly to $F$, $A$ is an $M \times M$-dimensional lower triangular matrix with unit diagonal, and $\eta_t$ denotes a vector of orthogonal shocks with $\Omega$ being an $M \times M$-dimensional diagonal matrix with the variances on its main diagonal.

Conditional on $A$, this form permits equation-by-equation estimation since the shocks are independent across equations. This leads to enormous computational gains. Notice that the $j$th equation can be written as:
$$y_{jt} = \tilde{f}_j(x_t) - \sum_{i=1}^{j-1} a_{ji}\, y_{it} + \eta_{jt}. \qquad (4)$$
Here, we let $\tilde{f}_j$ denote the sum of regression trees in equation $j$, while $y_{it}$ and $\eta_{jt}$ refer to the elements of $y_t$ and $\eta_t$, respectively, and $a_{ji}$ denotes the $(j, i)$th element of $A$.

Notice that Eq. (4) is a generalized additive model that consists of a nonparametric part $\tilde{f}_j(x_t)$ and a regression part $-\sum_{i=1}^{j-1} a_{ji}\, y_{it}$. For $j = 1$, the model reduces to a standard BART specification without including the contemporaneous values of the other endogenous variables.

The key idea behind this formulation is that, for a sufficiently large number of trees, we approximate nonlinear relations between $y_t$ and its lags while allowing for linear relations between the contemporaneous values of $y_t$. After obtaining estimates of the trees and the free elements of $A$, we can solve Eq. (3) for its reduced-form representation as follows:
$$y_t = A^{-1} \tilde{F}(x_t) + \epsilon_t, \qquad (5)$$
with $F(x_t) = A^{-1} \tilde{F}(x_t)$, $\epsilon_t = A^{-1} \eta_t$ and $\Sigma = A^{-1} \Omega\, (A^{-1})'$.
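As a quick numerical check of the mapping between the structural and reduced form, the snippet below builds a hypothetical unit lower triangular $A$ and diagonal $\Omega$ for $M = 3$ and recovers the reduced-form error covariance $\Sigma = A^{-1}\Omega(A^{-1})'$, which is symmetric and positive definite by construction.

```python
import numpy as np

# Hypothetical M = 3 structural parameters: A is unit lower triangular,
# Omega holds the orthogonal-shock variances on its diagonal.
A = np.array([[1.0,  0.0, 0.0],
              [0.5,  1.0, 0.0],
              [-0.3, 0.2, 1.0]])
Omega = np.diag([0.9, 0.5, 0.2])

# Reduced-form error covariance: Sigma = A^{-1} Omega (A^{-1})'
A_inv = np.linalg.inv(A)
Sigma = A_inv @ Omega @ A_inv.T

# Sigma is a valid (symmetric, positive definite) covariance matrix,
# and its (1,1) element equals the first orthogonal-shock variance
# because the first equation contains no contemporaneous terms.
print(np.allclose(Sigma, Sigma.T), np.all(np.linalg.eigvalsh(Sigma) > 0))
```

Running the triangularization in reverse (factoring a given $\Sigma$ into $A$ and $\Omega$) is exactly what makes the equation-by-equation estimation strategy possible.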
3.2 Allowing for Heteroscedasticity
Up to this point we assumed the error variances to be constant. If the researcher wishes to relax this assumption, several feasible options exist. In a recent paper, Pratola et al. (2019) propose a combination of a standard additive BART model with a multiplicative BART specification to flexibly control for heteroscedasticity. But modeling heteroscedasticity with BART implies that we need some information on how the volatility of economic shocks depends on additional covariates (which potentially differ from $x_t$). As a simple yet flexible solution we adopt a standard stochastic volatility (SV) model. SV models are frequently used in macroeconomics and finance and possess a proven track record in forecasting applications (see, e.g., Clark, 2011; Clark and Ravazzolo, 2015). Our SV specification simply assumes that $\Omega$ is time-varying:
$$\eta_t \sim \mathcal{N}(0, \Omega_t), \quad \Omega_t = \mathrm{diag}\big(e^{h_{1t}}, \dots, e^{h_{Mt}}\big),$$
with the time-varying variances $e^{h_{jt}}$. Their logarithms $h_{jt}$ follow an AR(1) model:
$$h_{jt} = \mu_j + \rho_j (h_{j,t-1} - \mu_j) + \nu_{jt}, \quad \nu_{jt} \sim \mathcal{N}(0, \sigma_{h_j}^2). \qquad (6)$$
Here, $\mu_j$ is the unconditional mean, $\rho_j$ is the persistence parameter and $\sigma_{h_j}^2$ is the error variance of the log-volatility process. In our empirical work, all models feature stochastic volatility of this form.
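A minimal simulation of the log-volatility law of motion in Eq. (6) for a single equation looks as follows; the parameter values ($\mu = -1$, $\rho = 0.95$, $\sigma_h = 0.2$) are illustrative, not estimates.

```python
import numpy as np

def simulate_sv(T, mu=-1.0, rho=0.95, sigma_h=0.2, seed=1):
    """Simulate log-volatilities h_t from the AR(1) in Eq. (6) and
    shocks e_t ~ N(0, exp(h_t)); all parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # initialise from the stationary distribution of the AR(1)
    h[0] = mu + rng.standard_normal() * sigma_h / np.sqrt(1 - rho**2)
    for t in range(1, T):
        h[t] = mu + rho * (h[t - 1] - mu) + sigma_h * rng.standard_normal()
    eps = np.exp(h / 2) * rng.standard_normal(T)
    return h, eps

h, eps = simulate_sv(500)
print(h.mean())  # should fluctuate around the unconditional mean mu = -1
```

The persistence parameter close to one produces the slowly drifting volatility clusters that the density-forecast evaluation below rewards.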
3.3 The Prior and Posterior Simulation
The Bayesian approach calls for the specification of suitable priors on the parameters of the model. Here we mainly follow the different strands of the literature our approach combines and thus only provide a brief summary. In particular, we focus on the priors associated with the trees $\mathcal{T}_{js}$ and the terminal node parameters $\mu_{js}$ for $j = 1, \dots, M$ and $s = 1, \dots, S$. We assume that the hyperparameters across priors are the same for each equation.
For each equation $j$, the joint prior structure is given by:
$$p\big((\mathcal{T}_{j1}, \mu_{j1}), \dots, (\mathcal{T}_{jS}, \mu_{jS}), \Omega\big) = p\big((\mathcal{T}_{j1}, \mu_{j1}), \dots, (\mathcal{T}_{jS}, \mu_{jS})\big)\, p(\Omega).$$
At this level, the prior implies independence between the trees and the remaining model parameters. Chipman et al. (2010) further introduce the following factorization:
$$p\big((\mathcal{T}_{j1}, \mu_{j1}), \dots, (\mathcal{T}_{jS}, \mu_{jS})\big) = \prod_{s=1}^{S} p(\mu_{js} \mid \mathcal{T}_{js})\, p(\mathcal{T}_{js}).$$
This structure indicates that the tree components are independent of each other and of the remaining parameters, while the prior on $\mu_{js}$ depends on the tree structure.
The prior on the tree structure is specified along the lines suggested in Chipman et al. (2010). More precisely, we assume a tree generating stochastic process that follows three recursive steps per tree:

1. Start with the trivial tree that consists of a single terminal node (or only its root), denoted by $\mathcal{T}_{js}$ for each equation $j$ and tree $s$.

2. Split the terminal node with probability
$$p_{\text{split}}(d) = \alpha (1 + d)^{-\beta}, \qquad (7)$$
where $d$ denotes the depth of the tree, and $\alpha \in (0, 1)$ and $\beta \ge 0$ are prior hyperparameters. As the tree grows, $d$ increases while the prior splitting probability decreases, making it less likely that a node spawns children. This implies that the probability that a given node becomes a bottom node increases, and thus a penalty on tree complexity is introduced.

3. If the current node is split, we introduce a splitting rule drawn from a distribution over splitting rules and use this rule to spawn a left and a right child node. In particular, the rule is set such that a given "splitting" covariate is chosen with equal probability. As described above, conditional on a chosen covariate, one needs a threshold. The prior on this threshold is, in the absence of substantial information, assumed to be uniformly distributed over the range of the chosen covariate.

Once we obtain all terminal nodes (i.e. no node is split anymore), we label this the new tree and return to step (2).
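The depth penalty in Eq. (7) is easy to tabulate. Using $\alpha = 0.95$ and $\beta = 2$, the default values suggested by Chipman et al. (2010), the prior probability of a split drops off quickly with depth:

```python
# Prior probability that a node at depth d splits, Eq. (7):
#   p_split(d) = alpha * (1 + d) ** (-beta)
# alpha = 0.95, beta = 2 are the defaults suggested by Chipman et al. (2010).
alpha, beta = 0.95, 2.0

for d in range(4):
    p = alpha * (1 + d) ** (-beta)
    print(f"depth {d}: split probability {p:.3f}")
```

The root splits with probability 0.95, but a node at depth 3 splits with probability below 0.06, so a priori most trees stay very shallow, which is exactly the "weak learner" behavior the model relies on.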
On the terminal node parameters $\mu_{js}$, we use a conjugate Gaussian prior distribution on each element, $\mu_{js,i} \sim \mathcal{N}(0, \sigma_\mu^2)$, with $\sigma_\mu \propto 1/(k\sqrt{S})$. Recall that $S$ is the number of trees and $k$ governs the number of prior standard deviations of the sum-of-trees function between $\min(y_{jt})$ and $\max(y_{jt})$. Hence, larger values of $k$ increasingly push the terminal node parameters towards zero.

For the free elements in $A$, we introduce a Horseshoe prior on each element $a_{ji}$ for every $j$ and $i < j$:
$$a_{ji} \sim \mathcal{N}(0, \lambda_{ji}^2 \tau_j^2), \quad \lambda_{ji} \sim \mathcal{C}^{+}(0, 1), \quad \tau_j \sim \mathcal{C}^{+}(0, 1),$$
where $\mathcal{C}^{+}(0, 1)$ denotes the half-Cauchy distribution. The prior specification on the parameters of the log-volatility equation follows the setup proposed in Kastner and Frühwirth-Schnatter (2014). In detail, we use a zero mean Gaussian prior on the unconditional mean $\mu_j$, a Beta prior on a transformation of the persistence parameter $\rho_j$, and a Gamma prior on the error variance of the log-volatility process $\sigma_{h_j}^2$.

These priors can be combined with the likelihood to yield a joint posterior distribution over the coefficients and latent states in the model. To simulate from this joint posterior distribution we adopt an MCMC algorithm. This algorithm simulates all quantities in an equation-specific manner. Since all steps necessary are standard, we only provide an overview; Appendix A provides more details. Here it suffices to say that we sample all quantities related to the trees (i.e. $\mathcal{T}_{js}$ and $\mu_{js}$ for all $j$ and $s$) using the algorithm outlined in Chipman et al. (2010). The latent states and the coefficients associated with the state equation of the log-volatilities are obtained through the efficient algorithm discussed in Kastner and Frühwirth-Schnatter (2014). Conditional on the trees and log-volatilities, the posterior of the free elements of $A$ is multivariate Gaussian and takes a standard form since the resulting conditional model is a linear regression model with heteroscedastic shocks. Finally, the hyperparameters of the Horseshoe prior are simulated using techniques outlined in Makalic and Schmidt (2015).

4 Empirical Work
4.1 Data Overview
In this section we briefly discuss the dataset adopted. Instead of focusing on US data, we apply our approach to monthly Eurozone data that ranges from January 1999 to January 2019. Our dataset is a selection of time series from the popular Euro Area Real Time Database (EARTD).
For the in-sample results discussed in the next subsection, we consider a medium-scale model that includes six macroeconomic and financial time series. These six time series are augmented by an uncertainty index (abbreviated as Unc) which is estimated using the approach outlined in Jurado et al. (2015). Apart from the uncertainty indicator, we include the growth rate of the Dow Jones Euro Stoxx 50 price index (DJE50), HICP inflation (C_OV), the unemployment rate (UNETO), the 3-month Euribor (EUR3M), industrial production (XCONS) and the yield on Euro area 10-year benchmark government bonds (10Y).
In the forecasting application, we consider three different datasets. A small one includes only HICP inflation, the unemployment rate and the 3-month Euribor; these three variables consequently form our set of focus variables. Moreover, we consider the medium-sized dataset discussed above but without the uncertainty index, as well as a large dataset. The large dataset augments the medium-sized one by adding the real effective exchange rate (ERC_BGR), the M2 money base (M2_V_NC), the one-month forward price of Brent crude oil (OILBR) and the spread between ten- and two-year government bond yields (YY).
All variables are transformed to be approximately stationary. We include one lag of the endogenous variables.[1]

[1] Using a single lag allows for simple inspection of several features of the model. Including more lags is straightforward but, as discussed in the text, lags larger than one only rarely show up in the splitting rules.
4.2 In-sample Results
In this section, we illustrate key features of the BART approach using a subset of the dataset discussed in the previous section.
Table 1: Posterior medians of the number of times each covariate (rows) appears in the splitting rules of each equation (columns).

         Unc  DJE50  EUR3M  C_OV  UNETO  XCONS  10Y
Unc       31     10     11     6      9     14    9
DJE50     11     40      9     9      6     10   10
EUR3M     12     12     45    13      9     11   14
C_OV      11     12     13    55     12     14   12
UNETO     13     10     11     9     54     12   11
XCONS     12     10      9    10     10     35   12
10Y       11     10      5     5      7     11   41
Variable importance is gauged by considering the posterior median of the number of times a given quantity shows up in a splitting rule. These frequencies are shown in Table 1. The columns refer to the different equations and the rows to the corresponding covariates. By simply inspecting the main diagonal elements of the table, we directly observe that the first, own lag of a given variable within some equation appears to be important. This finding holds for all equations and resembles a key result of the literature on Bayesian VARs, namely that the AR(1) term explains most of the variation in the dependent variable. However, it is worth stressing that our model is far from sparse: the lags of other variables also seem to play a role in determining the dynamics of a given endogenous variable.
In principle, we could also use Table 1 to construct a restricted VAR model. This can be achieved by introducing a threshold (such as including only variables that show up a certain minimum number of times). Using this (arbitrary) threshold would lead to a model that assumes uncertainty to depend only on its lagged value, the short-term interest rate, the unemployment rate and industrial production. Similarly, stock returns only depend on lagged stock returns, short-term interest rates and inflation. Short-term interest rates would only depend on inflation and lagged interest rates, while inflation only depends on lagged inflation and the short rate. The unemployment rate is determined exclusively by its own lag and (lagged) inflation, closely resembling a Phillips curve-type relationship. Furthermore, we find that industrial production growth depends on lagged inflation, unemployment, uncertainty and its own lag. Finally, 10-year bond yields depend on lagged bond yields, short-term interest rates, inflation and industrial production. This simple example essentially shows that, conditional on choosing a suitable cutoff value, one can investigate the determinants of each equation in a multivariate model and potentially use these to construct a restricted linear VAR model.
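This thresholding step is straightforward to mechanize. The snippet below applies a cutoff of 12 appearances to the counts in Table 1; this particular cutoff reproduces the restricted specifications described above, but any other value could be chosen.

```python
import numpy as np

# Split-rule counts from Table 1: rows = covariates, columns = equations.
variables = ["Unc", "DJE50", "EUR3M", "C_OV", "UNETO", "XCONS", "10Y"]
counts = np.array([
    [31, 10, 11,  6,  9, 14,  9],
    [11, 40,  9,  9,  6, 10, 10],
    [12, 12, 45, 13,  9, 11, 14],
    [11, 12, 13, 55, 12, 14, 12],
    [13, 10, 11,  9, 54, 12, 11],
    [12, 10,  9, 10, 10, 35, 12],
    [11, 10,  5,  5,  7, 11, 41],
])

# A (hypothetical) threshold of 12 appearances defines the restriction:
threshold = 12
include = counts >= threshold  # True where a lag enters the restricted VAR

for j, eq in enumerate(variables):
    kept = [variables[i] for i in range(len(variables)) if include[i, j]]
    print(f"{eq} equation keeps lags of: {', '.join(kept)}")
```

For instance, the first column then keeps Unc, EUR3M, UNETO and XCONS, matching the restricted uncertainty equation described in the text.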
4.3 Forecasting results
In this section, we investigate whether our combination of BART and VAR models yields better forecasts of short-term interest rates, the unemployment rate and inflation. Before discussing the precise design of the forecast exercise, the question of how to compute the predictive distribution naturally arises. Producing one-step-ahead forecasts is computationally easy since, conditional on the tree structures, splitting values and splitting covariates, it is straightforward to compute a prediction for $y_{T+1}$. More precisely (and with a slight abuse of notation), the one-step-ahead predictive distribution is given by:
$$p(y_{T+1} \mid \mathcal{I}_T) = \int p(y_{T+1} \mid \mathcal{I}_T, \Theta)\, p(\Theta \mid \mathcal{I}_T)\, d\Theta,$$
where $\mathcal{I}_T$ denotes the full history of $y_t$ and $\Theta$ is a generic notation that summarizes all parameters and latent states (i.e. the tree structures, terminal nodes, log-volatilities, etc.). The conditional density $p(y_{T+1} \mid \mathcal{I}_T, \Theta)$ is Gaussian, and its covariance matrix is obtained by using (6) to predict the log-volatilities and using these predictions to form $\Omega_{T+1}$. Higher order forecasts are then computed iteratively by first simulating from the one-step-ahead predictive distribution to obtain a prediction for $y_{T+1}$. This prediction is then used to form $x_{T+2}$, which is consequently used to predict $y_{T+2}$. We repeat this procedure until we have a prediction for $y_{T+H}$, with $H$ denoting the desired forecast horizon.
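The iterative simulation scheme can be sketched as follows. To keep the sketch self-contained, a linear Gaussian map with a known parameter draw stands in for the sampled trees and log-volatilities; in the actual algorithm, the one-step draw would evaluate the trees at the current lag vector and draw the shocks from the predicted volatilities.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_one_step(x_t, theta):
    """One draw from p(y_{t+1} | x_t, theta); here theta is a stand-in
    linear mean plus Gaussian noise instead of the fitted trees."""
    F, chol_Sigma = theta
    return F @ x_t + chol_Sigma @ rng.standard_normal(len(x_t))

def iterate_forecast(y_T, theta, H):
    """Simulate one path y_{T+1}, ..., y_{T+H} by feeding each draw
    back in as the next period's lag (one lag, so x_{t+1} = y_t)."""
    path, x = [], y_T
    for _ in range(H):
        y_next = draw_one_step(x, theta)
        path.append(y_next)
        x = y_next  # the prediction becomes next period's regressor
    return np.array(path)

M, H = 3, 3
theta = (0.5 * np.eye(M), 0.1 * np.eye(M))   # hypothetical parameter draw
paths = np.stack([iterate_forecast(np.ones(M), theta, H) for _ in range(1000)])
print(paths.mean(axis=0)[:, 0])  # roughly 0.5, 0.25, 0.125 for variable 1
```

Averaging such paths over many posterior draws of $\Theta$ would integrate out both parameter and shock uncertainty, which is how the multi-step predictive distributions used below are built.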
The design of the forecasting exercise is the following. Considering a hold-out period at the end of our monthly sample, we use an initial estimation period to produce the $h$-step-ahead predictive distributions for the first hold-out observation. After these are obtained, we expand the estimation sample by one month and repeat the procedure until we reach the final point in our sample. The predictions are then evaluated using relative mean squared forecast errors (MSFEs) and the continuous ranked probability score (CRPS). The MSFEs allow us to gauge the quality of point forecasts, while the CRPS is used to evaluate the quality of the predictive distribution.
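The CRPS can be estimated directly from draws of the predictive distribution via the sample analogue of $\mathrm{CRPS}(y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, with $X, X'$ independent draws from the predictive distribution; a minimal implementation:

```python
import numpy as np

def crps_from_samples(draws, y_obs):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|,
    where X, X' are independent draws from the predictive distribution."""
    draws = np.asarray(draws, dtype=float)
    term1 = np.mean(np.abs(draws - y_obs))
    term2 = 0.5 * np.mean(np.abs(draws[:, None] - draws[None, :]))
    return term1 - term2

rng = np.random.default_rng(3)
draws = rng.normal(0.0, 1.0, size=2000)
# For a standard normal predictive density and y = 0, the CRPS is about 0.23
print(crps_from_samples(draws, 0.0))
```

Lower CRPS is better; the ratios reported below divide each model's average CRPS by that of the Minnesota benchmark, so values below one favor the model in question.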
We compare our BAVART model to several multivariate time series models commonly used in the literature. First, we use a set of VAR models that feature different shrinkage priors. The first is the non-conjugate Minnesota prior that allows for asymmetric treatment of "own" lags (defined as lags of a given endogenous variable in its own equation) and "other" lags (defined as the lags of the other variables in the system). The corresponding shrinkage hyperparameters are estimated using a random walk Metropolis-Hastings step. Since this prior is often used by practitioners, we use it as the baseline model against which we benchmark our models. Second, we use the stochastic search variable selection (SSVS) prior of George et al. (2008). The third prior is the Normal-Gamma (NG) prior proposed in Griffin and Brown (2010) that has been applied to VARs in Huber and Feldkircher (2019). Moreover, we use a TVP-VAR with an NG shrinkage prior on the state innovation variances and the initial state of the system. Finally, all models are estimated using a stochastic volatility specification. Recall that all results are reported relative to the Minnesota VAR.
The results of our forecasting exercise are shown in Table 2. Our forecasting horse race paints a rich picture of relative model performance. We consider models of different sizes, priors, assumptions on the conditional mean, and forecast horizons. For all these specifications, the table summarizes forecasting performance across the three focus variables for point and density forecasts.
Table 2: Forecasting results. Entries are MSFEs relative to the Minnesota VAR benchmark, with relative CRPSs in parentheses.

                 Small (M = 3)                      Medium (M = 7)                     Large (M = 10)
        BAVART     NG      SSVS    NG-TVP   BAVART     NG      SSVS    NG-TVP   BAVART     NG      SSVS    NG-TVP

One-step-ahead
EUR3M    1.021    0.977    0.984    1.014    1.033    0.987    0.985    1.009    1.018    0.977    0.977    1.005
        (0.924)  (0.984)  (0.987)  (0.997)  (0.958)  (0.992)  (0.99)   (0.996)  (0.951)  (0.983)  (0.981)  (0.985)
UNETO    0.905    0.86     0.881    0.725    0.991    0.877    0.937    0.838    0.947    0.891    0.929    0.859
        (1.053)  (0.932)  (0.941)  (0.849)  (1.05)   (0.931)  (0.958)  (0.89)   (1.001)  (0.926)  (0.945)  (0.882)
C_OV     0.922    1.006    1.016    1.108    0.932    0.998    1.002    1.034    0.933    1.009    1.007    1.04
        (0.878)  (1.005)  (1.012)  (1.102)  (0.899)  (0.996)  (0.997)  (1.022)  (0.909)  (1.007)  (1.002)  (1.025)

Three-steps-ahead
EUR3M    0.942    0.974    0.978    0.996    1.255    0.938    0.887    1.043    1.156    0.962    1.006    1.501
        (0.867)  (0.984)  (0.987)  (1.003)  (1.014)  (0.969)  (0.947)  (0.971)  (0.916)  (0.959)  (0.971)  (1.053)
UNETO    0.858    0.939    0.937    0.757    2.321    1.444    1.439    4.592    1.883    1.786    1.769    3.357
        (0.811)  (0.965)  (0.959)  (0.82)   (1.55)   (1.074)  (1.079)  (1.547)  (1.449)  (1.11)   (1.13)   (1.6)
C_OV     0.745    0.998    1.012    1.177    0.731    0.989    0.995    1.144    0.72     1.024    1.02     1.143
        (0.769)  (1)      (1.008)  (1.168)  (0.75)   (0.988)  (0.99)   (1.114)  (0.755)  (1.018)  (1.012)  (1.121)
Starting with the one-step-ahead horizon and point forecasts, we observe that for the interest rate and unemployment, our BAVART setup (irrespective of model size) is dominated by linear VAR models in terms of MSFEs. More precisely, the small-scale SSVS (for interest rates) and TVP-VAR (for unemployment) yield the lowest prediction errors. But notice that most of the ratios are close to one, indicating that the BAVART is never far off and thus highly competitive. Inflation appears to be an exception: across all model sizes considered, we find that our model generally yields the smallest forecast errors. This suggests that controlling for nonlinearities pays off when interest centers on forecasting inflation.
Turning to higher order forecasts provides a relatively similar picture. For longer-run forecasts, we again identify models which improve upon the BAVART setup for interest rate and unemployment forecasts. These improvements, however, are again quite small. One striking difference between the one-step and the three-steps-ahead case is the strong performance of the Minnesota VAR for larger information sets, especially for unemployment. The results for inflation are again consistent with the findings for the one-step-ahead forecasts but appear to be more pronounced. Across model sizes, BAVART produces multi-step-ahead inflation predictions which feature the lowest forecast errors across all models considered.
Considering only point forecasts implies that we do not factor in how well a given model predicts higher order moments of the predictive distribution. We therefore evaluate density forecasts using CRPSs. These are a generalization of the MSFE and assess the quality of the full predictive distribution under scrutiny. Similarly to the one-step-ahead point forecasts, we find that most values are below but close to unity. This suggests that using the BAVART model improves upon the Minnesota benchmark in most cases. When compared to the other models, we can again identify a model that improves upon BAVART for interest rates and unemployment. For inflation, the CRPSs corroborate the findings for the point forecasts: for all information sets, the BAVART model produces the most precise predictive densities.
When we consider three-months-ahead predictive densities, the pattern observed above remains mainly in place. While there are models which yield more precise density forecasts than our proposed setup, these differences are often very small. By contrast, our model dominates when we consider higher-order inflation predictions. In this case, the performance gains vis-à-vis the competing models are appreciable.
To sum up, our forecasting exercise shows that the BAVART model works remarkably well for forecasting Eurozone inflation. For unemployment and interest rates, our proposed specification is slightly outperformed, but both point and density forecasts remain competitive.
4.4 Impulse Responses to an Uncertainty Shock
In this section we briefly illustrate how the BAVART model can be used to carry out structural inference. Since our model is dynamic, we can consider how a shock at time $t$ affects the quantities in the system over time. Because of the highly nonlinear nature of our model, we resort to structural generalized impulse response functions (SGIRFs, see Koop et al., 1996).
Let $\eta_t$ denote the structural shocks and $s_j$ a selection vector that equals one in the $j$th position and zero elsewhere. Hence, $s_j' \eta_t$ yields the $j$th structural shock. The SGIRF to the $j$th structural shock is then defined as the difference between a forecast that assumes $s_j' \eta_t$ takes a given value (while setting the other shocks to zero) and the unconditional forecast (i.e. with $\eta_t = 0$). The posterior distribution of the SGIRFs is computed by repeating this procedure during MCMC sampling.
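The SGIRF computation amounts to differencing a shocked and a baseline forecast path drawn under common conditions. The sketch below uses a linear Gaussian stand-in for the fitted model (the matrices `B` and `A_inv` are hypothetical), so the difference is deterministic and the impact response equals the relevant column of $A^{-1}$; in the nonlinear BAVART case, the paths would be simulated from the trees and averaged over MCMC draws.

```python
import numpy as np

rng = np.random.default_rng(4)
M, H, R = 3, 12, 200

# Hypothetical linear propagation y_{t+1} = B y_t + A_inv eta_t, standing
# in for the fitted BAVART model (B and A are made up for illustration).
B = 0.6 * np.eye(M)
A = np.array([[1.0, 0.0, 0.0], [0.4, 1.0, 0.0], [0.2, 0.3, 1.0]])
A_inv = np.linalg.inv(A)

def simulate_path(eta0, future_shocks):
    """Propagate the system H periods ahead from y_T = 0; only the
    impact-period shock eta0 differs across scenarios."""
    path, y = [], np.zeros(M)
    for h in range(H):
        eta = eta0 if h == 0 else future_shocks[h]
        y = B @ y + A_inv @ eta
        path.append(y.copy())
    return np.array(path)

# SGIRF: average difference between the shocked path (first structural
# shock set to one on impact, others zero) and the baseline path,
# using common future shocks in both scenarios.
girfs = []
for _ in range(R):
    future = rng.standard_normal((H, M))
    girfs.append(simulate_path(np.array([1.0, 0.0, 0.0]), future)
                 - simulate_path(np.zeros(M), future))
girf = np.mean(girfs, axis=0)
print(girf[0])  # impact response: first column of A_inv, i.e. [1, -0.4, -0.08]
```

Repeating the loop for each posterior draw of the model parameters yields the posterior distribution of the SGIRFs, from which the credible intervals shown in Figure 3 are computed.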
Figure 3 shows the dynamic responses to a one standard deviation uncertainty shock. The dashed gray lines represent the 16th and 84th percentiles of the posterior distribution. For brevity, we only report impulse responses that are significant (i.e. the credible intervals do not cover zero) for at least one month across the impulse response horizon.
The responses to an uncertainty shock largely mirror those reported elsewhere in the literature (see, among many others, Bloom, 2009; Jurado et al., 2015). The uncertainty index displays an immediate increase that fades out after the first five months. By contrast, we observe that inflation falls as decreasing demand drives down prices. Notice that prices react with a lag of one month. Since the uncertainty index is ordered first in $y_t$, this is purely driven by our shrinkage prior on the covariance parameters. This sluggish reaction of prices to uncertainty shocks reflects the fact that prices tend to be sticky and that it takes companies some time to adjust due to menu costs.
Turning to stock market reactions shows that equities display a rather short-lived decline that fades out after around five months. Finally, real activity, measured through industrial production, also decreases. Interestingly, we do not observe a rebound in real activity arising from the "wait-and-see" mechanism reported in, e.g., Bloom (2009).
To sum up, this brief structural exercise shows that BART is capable of producing dynamic responses that are consistent with the ones commonly found in the literature.
5 Conclusions
VAR models assume that the lagged dependent variables influence the contemporaneous values in a linear fashion. In this paper, we relax this assumption by blending the literature on Bayesian additive regression trees (BART) with the literature on VARs. The resulting model, labeled the Bayesian additive vector autoregressive tree (BAVART) model, can handle nonlinear relations between the endogenous and the exogenous variables. Our proposed model is, moreover, capable of handling stochastic volatility in the shocks, a feature that is crucial for obtaining precise density forecasts. We illustrate our model using synthetic as well as real data. Using data on the Eurozone, we show that the BAVART model produces precise forecasts and yields reasonable impulse responses to an uncertainty shock.
References

Bassetti, F., Casarin, R., and Leisen, F. (2014). Beta-product dependent Pitman–Yor processes for Bayesian inference. Journal of Econometrics, 180(1):49–72.
Billio, M., Casarin, R., and Rossini, L. (2019). Bayesian nonparametric sparse VAR models. Journal of Econometrics, 212(1):97–115.
Bitto, A. and Frühwirth-Schnatter, S. (2019). Achieving shrinkage in a time-varying parameter model framework. Journal of Econometrics, 210(1):75–97.
Bloom, N. (2009). The impact of uncertainty shocks. Econometrica, 77(3):623–685.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Carriero, A., Clark, T. E., and Marcellino, M. (2019). Large Bayesian vector autoregressions with stochastic volatility and non-conjugate priors. Journal of Econometrics, 212(1):137–154.
Chan, J. C. C., Eisenstat, E., and Strachan, R. W. (2020). Reducing the state space dimension in a large TVP-VAR. Journal of Econometrics.
Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443):935–948.
Chipman, H. A., George, E. I., and McCulloch, R. E. (2002). Bayesian treed models. Machine Learning, 48(1):299–320.
Chipman, H. A., George, E. I., and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Annals of Applied Statistics, 4(1):266–298.
Clark, T. E. (2011). Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility. Journal of Business & Economic Statistics, 29(3):327–341.
Clark, T. E. and Ravazzolo, F. (2015). Macroeconomic forecasting performance under alternative specifications of time-varying volatility. Journal of Applied Econometrics, 30(4):551–575.
Doan, T., Litterman, R., and Sims, C. (1984). Forecasting and conditional projection using realistic prior distributions. Econometric Reviews, 3(1):1–100.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232.
George, E. I., Sun, D., and Ni, S. (2008). Bayesian stochastic search for VAR model restrictions. Journal of Econometrics, 142(1):553–580.
Gramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483):1119–1130.
Granger, C. W. and Terasvirta, T. (1993). Modelling Nonlinear Economic Relationships. OUP Catalogue.
Griffin, J. E. and Brown, P. J. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171–188.
Hill, J., Linero, A., and Murray, J. (2020). Bayesian additive regression trees: A review and look forward. Annual Review of Statistics and Its Application, 7(1):251–278.
Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G. (2010). Bayesian Nonparametrics. Cambridge University Press, Cambridge.
Huber, F. and Feldkircher, M. (2019). Adaptive shrinkage in Bayesian vector autoregressive models. Journal of Business & Economic Statistics, 37(1):27–39.
Huber, F., Kastner, G., and Feldkircher, M. (2019). Should I stay or should I go? A latent threshold approach to large-scale mixture innovation models. Journal of Applied Econometrics, 34(5):621–640.
Huber, F., Koop, G., and Onorante, L. (2020). Inducing sparsity and shrinkage in time-varying parameter models. Journal of Business & Economic Statistics, pages 1–15.
Jurado, K., Ludvigson, S. C., and Ng, S. (2015). Measuring uncertainty. American Economic Review, 105(3):1177–1216.
Kalli, M. and Griffin, J. E. (2014). Time-varying sparsity in dynamic regression models. Journal of Econometrics, 178(2):779–793.
Kalli, M. and Griffin, J. E. (2018). Bayesian nonparametric vector autoregressive models. Journal of Econometrics, 203(2):267–282.
Kastner, G. and Frühwirth-Schnatter, S. (2014). Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis, 76:408–423.
Kastner, G. and Huber, F. (2020). Sparse Bayesian vector autoregressions in huge dimensions. Journal of Forecasting.
Koop, G., Korobilis, D., and Pettenuzzo, D. (2019). Bayesian compressed vector autoregressions. Journal of Econometrics, 210(1):135–154.
Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate models. Journal of Econometrics, 74(1):119–147.
Lakshminarayanan, B., Roy, D. M., and Teh, Y. W. (2014). Mondrian forests: Efficient online random forests. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27, pages 3140–3148. Curran Associates, Inc.
Linero, A. R. (2018). Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association, 113(522):626–636.
Litterman, R. B. (1986). Forecasting with Bayesian vector autoregressions: Five years of experience. Journal of Business & Economic Statistics, 4(1):25–38.
Makalic, E. and Schmidt, D. F. (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1):179–182.
Pratola, M. T., Chipman, H. A., George, E. I., and McCulloch, R. E. (2019). Heteroscedastic BART via multiplicative regression trees. Journal of Computational and Graphical Statistics, pages 1–13.
Primiceri, G. E. (2005). Time varying structural vector autoregressions and monetary policy. The Review of Economic Studies, 72(3):821–852.
Roy, D. M. and Teh, Y. W. (2009). The Mondrian process. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 21, pages 1377–1384. Curran Associates, Inc.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1):1–48.
Sims, C. A. and Zha, T. (1998). Bayesian methods for dynamic multivariate models. International Economic Review, 39(4):949–968.
Appendix A Posterior Approximation of the Trees
In this appendix, we provide details on the MCMC algorithm used to simulate from the joint posterior distribution of the trees and the terminal node parameters. The remaining quantities take well-known forms and are straightforward to simulate using textbook results for the Gaussian linear regression model.
As stated in Section 3.3 and following Carriero et al. (2019), Koop et al. (2019), and Kastner and Huber (2020), we rely on a structural representation of the model that entails equation-by-equation estimation.
We simulate from the conditional posterior of the $j$th tree $T_j$ by conditioning on all trees except the $j$th one. This is achieved by using the Bayesian backfitting strategy discussed in Chipman et al. (2010). More precisely, the likelihood function depends on $T_j$ only through the partial residuals $R_j = y - \sum_{k \neq j} g(X; T_k, \mu_k)$, where $g(X; T_k, \mu_k)$ denotes the fitted values of the $k$th tree with terminal node parameters $\mu_k$.
The algorithm proposed in Chipman et al. (2010) then proceeds by integrating out the terminal node parameters $\mu_j$ and drawing $T_j$ with the Metropolis-Hastings (MH) algorithm detailed in Chipman et al. (1998).
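The backfitting sweep over trees can be sketched as follows. Here `sample_tree` is a placeholder for the full MH tree update, and caching each tree's fitted values makes the partial residuals cheap to form; the function names are illustrative, not the paper's.

```python
import numpy as np

def backfit_pass(y, tree_fits, sample_tree):
    # One Bayesian backfitting sweep: for each tree j, subtract the fits
    # of all *other* trees from y to form the partial residuals, then
    # redraw tree j against those residuals.
    total = sum(tree_fits)
    for j in range(len(tree_fits)):
        partial_resid = y - (total - tree_fits[j])
        new_fit = sample_tree(partial_resid)
        total += new_fit - tree_fits[j]  # keep the running sum in sync
        tree_fits[j] = new_fit
    return tree_fits
```

In the actual sampler, `sample_tree` would draw a new tree structure via MH and then new terminal node parameters, returning the implied fitted values.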
This step is implemented by generating a candidate tree $T_j^*$ from a proposal distribution $q(T_j^* \mid T_j)$ and then accepting the proposed value with probability
$$\alpha(T_j, T_j^*) = \min\left\{1, \ \frac{q(T_j \mid T_j^*)}{q(T_j^* \mid T_j)} \times \frac{p(R_j \mid T_j^*)}{p(R_j \mid T_j)} \times \frac{p(T_j^*)}{p(T_j)}\right\}.$$
The first term is the ratio of the probability of moving from the proposed tree back to the previous one against the probability of moving from the previous tree to the proposed one. The second term is the ratio of the marginal likelihood of the new tree against that of the previous tree, and the last term is the ratio of the prior probabilities of the new and the previous tree.
The proposal distribution features four local steps (Chipman et al., 1998). The first step (grow) splits a terminal node into two new terminal nodes. The second step (prune) collapses two terminal nodes into a single terminal node. The third step (swap) exchanges the splitting rules between two nonterminal nodes, and the final step (change) modifies the splitting rule of a single nonterminal node.
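The relative frequencies with which these four moves are proposed can be tuned; a minimal sketch of the move selection is below, where the weights are illustrative defaults assumed here rather than values taken from the paper.

```python
import random

def propose_move(rng=random):
    # Randomly select one of the four local tree moves. The unequal
    # weights are illustrative (assumed), favoring the change move.
    moves = ["grow", "prune", "swap", "change"]
    weights = [0.25, 0.25, 0.10, 0.40]
    return rng.choices(moves, weights=weights, k=1)[0]
```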
After sampling the tree structure, we can easily simulate from the posterior distribution of the terminal node parameters. Since the prior is conjugate, the resulting posterior is also Gaussian, takes a standard form, and does not explicitly depend on the tree structure but only on the subset of residuals allocated to a specific terminal node.
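As a sketch of this conjugate update in generic notation (not necessarily the paper's parameterization), suppose the prior is $\mu \sim N(0, \tau^2)$ and the residuals allocated to a node are distributed $N(\mu, \sigma^2)$; the draw then reduces to a single Gaussian:

```python
import numpy as np

def draw_terminal_node_mu(resid_in_node, sigma2, tau2, rng=None):
    # Conjugate Gaussian update: with prior mu ~ N(0, tau2) and node
    # residuals ~ N(mu, sigma2), the posterior precision is
    # n / sigma2 + 1 / tau2 and the posterior mean is the shrunken
    # node-level sum of residuals.
    rng = np.random.default_rng(rng)
    n = resid_in_node.size
    post_prec = n / sigma2 + 1.0 / tau2
    post_mean = resid_in_node.sum() / sigma2 / post_prec
    return post_mean + rng.standard_normal() / np.sqrt(post_prec)
```

Note that the tree structure enters only through which residuals land in the node, which is what makes this step cheap even for large trees.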