Large Hybrid Time-Varying Parameter VARs

01/18/2022
by   Joshua C. C. Chan, et al.

Time-varying parameter VARs with stochastic volatility are routinely used for structural analysis and forecasting in settings involving a few endogenous variables. Applying these models to high-dimensional datasets has proved to be challenging due to intensive computations and over-parameterization concerns. We develop an efficient Bayesian sparsification method for a class of models we call hybrid TVP-VARs: VARs with time-varying parameters in some equations but constant coefficients in others. Specifically, for each equation, the new method automatically decides whether the VAR coefficients and contemporaneous relations among variables are constant or time-varying. Using US datasets of various dimensions, we find evidence that the parameters in some, but not all, equations are time-varying. The large hybrid TVP-VAR also forecasts better than many standard benchmarks.

1 Introduction

Time-varying parameter vector autoregressions (TVP-VARs) developed by Cogley and Sargent (2001, 2005) and Primiceri (2005) have become the workhorse models in empirical macroeconomics. These models are flexible and can capture many different forms of structural instabilities and the evolving nonlinear relationships between the dependent variables. Moreover, they often forecast substantially better than their homoskedastic or constant-coefficient counterparts, as shown in papers such as Clark (2011), D’Agostino, Gambetti, and Giannone (2013), Koop and Korobilis (2013), Clark and Ravazzolo (2015) and Cross and Poon (2016). In empirical work, however, their applications are mostly limited to modeling small systems involving only a few variables because of the computational burden and over-parameterization concerns.

On the other hand, large VARs that use richer information have become increasingly popular due to their better forecast performance and more sensible impulse-response analysis, as demonstrated in the influential paper by Banbura, Giannone, and Reichlin (2010). There is now a rapidly expanding literature that uses large VARs for forecasting and structural analysis. Prominent examples include Carriero, Kapetanios, and Marcellino (2009), Koop (2013), Banbura, Giannone, Modugno, and Reichlin (2013), Carriero, Clark, and Marcellino (2015), Ellahie and Ricco (2017) and Morley and Wong (2019). Since there is a large body of empirical evidence that demonstrates the importance of accommodating time-varying structures in small systems, there has been much interest in recent years in building TVP-VARs for large datasets. While there are a few proposals to build large constant-coefficient VARs with stochastic volatility (see, e.g., Carriero, Clark, and Marcellino, 2016, 2019; Kastner and Huber, 2018; Chan, 2020, 2021), the literature on large VARs with time-varying coefficients remains relatively scarce.

We propose a class of models we call hybrid TVP-VARs: VARs in which some equations have time-varying coefficients, whereas the coefficients are constant in others. More precisely, we develop an efficient Bayesian shrinkage and sparsification method that automatically decides, for each equation, (i) whether the VAR coefficients are constant or time-varying, and (ii) whether the parameters of the contemporaneous relations among variables are constant or time-varying. Given the importance of time-varying volatility, all equations feature stochastic volatility. Our framework nests many popular VARs as special cases, ranging from a constant-coefficient VAR with stochastic volatility on one end of the spectrum to the flexible but highly parameterized TVP-VARs of Cogley and Sargent (2005) and Primiceri (2005) on the other end. More importantly, our framework also includes many hybrid TVP-VARs in between these extremes, allowing for a more nuanced approach to modeling the time-varying structures.

To formulate these large hybrid TVP-VARs, we use a reparameterization of the standard TVP-VAR in Primiceri (2005). Specifically, we rewrite the TVP-VAR in the structural form in which the time-varying error covariance matrices are diagonal. Hence, we can treat the structural TVP-VAR as a system of unrelated TVP regressions and estimate them one by one. This reduces the dimension of the problem and can substantially speed up computations. This approach is similar to the equation-by-equation estimation approach in Carriero, Clark, and Marcellino (2019) that is designed for the reduced-form parameterization. But since under our parameterization there is no need to obtain the 'orthogonalized' shocks at each iteration as in Carriero, Clark, and Marcellino (2019), the proposed approach is substantially faster. Moreover, under our parameterization the estimation can be parallelized to further speed up computations. This structural-form parameterization, however, raises the issue of variable ordering, that is, the assumed order of the variables might affect the model estimates compared to a standard reduced-form TVP-VAR. We investigate this issue empirically and find that the variability of the estimates from this structural-form parameterization is comparable to that of the TVP-VAR of Primiceri (2005).

Next, we adapt the non-centered parameterization of the state space model in Frühwirth-Schnatter and Wagner (2010) to our structural TVP-VAR representation. Further, for each equation we introduce two indicator variables: one determines whether the VAR coefficients are time-varying or constant, while the other controls whether the elements of the impact matrix are time-varying or not. Hence, each vector of these indicators, one pair per equation, characterizes a hybrid TVP-VAR with a particular form of time variation. By treating these indicators as parameters to be estimated, we allow the data to determine the appropriate time-varying structures, in contrast to typical setups where time variation is assumed. The proposed approach is therefore not only flexible, in that it includes many state-of-the-art models routinely used in applied work as special cases; it also induces parsimony to ameliorate over-parameterization concerns. This data-driven hybrid TVP-VAR can also be interpreted as a Bayesian model average of hybrid TVP-VARs with different forms of time variation, where the weights are determined by the posterior model probabilities. It follows that forecasts from such a model can be viewed as a forecast combination of a wide variety of hybrid TVP-VARs.

The estimation is done using Markov chain Monte Carlo (MCMC) methods. Hence, in contrast to earlier attempts to build large TVP-VARs, our approach is fully Bayesian and exact: it simulates from the exact posterior distribution. There are, however, a few challenges in the estimation. First, the dimension of the model is large and there are thousands of latent state processes (time-varying coefficients and stochastic volatilities) to simulate. To overcome this challenge, in addition to using the equation-by-equation estimation approach described earlier, we also adopt the precision sampler of Chan and Jeliazkov (2009) to draw both the time-invariant and time-varying VAR coefficients, as well as the stochastic volatilities. In our high-dimensional setting the precision sampler substantially reduces the computational cost compared to conventional Kalman-filter-based smoothers. A second challenge in the estimation is that the indicators and the latent states enter the likelihood multiplicatively. Consequently, it is vital to sample them jointly; otherwise the Markov chain is likely to get stuck. We therefore develop algorithms to sample the indicators and the latent states jointly.

Using US datasets of different dimensions, we find evidence that the VAR coefficients and elements of the impact matrix in some, but not all, equations are time varying. In particular, in a formal Bayesian model comparison exercise, we show that there is overwhelming support for the (data-driven) hybrid TVP-VAR relative to a few standard benchmarks, including a constant-coefficient VAR with stochastic volatility and a full-fledged TVP-VAR in which all the VAR coefficients and error covariances are time varying. We further illustrate the usefulness of the hybrid TVP-VAR with a forecasting exercise that involves 20 US quarterly macroeconomic and financial variables. We show that the proposed model forecasts better than many benchmarks. These results suggest that using a data-driven approach to discover the time-varying structures—rather than imposing either constant coefficients or time-varying parameters—is empirically beneficial.

This paper contributes to the budding literature on developing large TVP-VARs. Earlier papers include Koop and Korobilis (2013, 2018), who propose fast methods to approximate the posterior distributions of large TVP-VARs. Banbura and van Vlodrop (2018) and Götz and Hauzenberger (2018) consider large VARs with only time-varying intercepts. Chan, Eisenstat, and Strachan (2020) model the time-varying coefficients using a factor-like reduced-rank structure, whereas Huber, Koop, and Onorante (2019) develop a method that first shrinks the time-varying coefficients and then sets the small values to zero. As mentioned above, our estimation approach is exact and fully Bayesian, and the modeling framework is more flexible than many of those in earlier papers. There is also a growing literature on alternative, non-likelihood-based approaches. Examples include Giraitis, Kapetanios, and Price (2013) and Petrova (2019), which allow for the estimation of large TVP-VARs without imposing Cholesky-type stochastic volatility and hence avoid the ordering issue. Nevertheless, one main advantage of the likelihood-based approach taken in this paper is that it is flexible and modular. In particular, it is straightforward to incorporate additional useful features into the proposed hybrid model, such as more sophisticated static and dynamic shrinkage priors for VARs (Prüser, 2021; Chan, 2021) or more flexible error distributions to deal with outliers (Carriero, Clark, Marcellino, and Mertens, 2021; Bobeica and Hartwig, 2021).

The rest of the paper is organized as follows. We first introduce the proposed modeling framework in Section 2. In particular, we discuss how we combine a reparameterization of the reduced-form TVP-VAR and the non-centered parameterization of the state space model to develop the hybrid TVP-VARs. We then describe the shrinkage priors and the posterior sampler in Section 3. It is followed by a Monte Carlo study in Section 4 that demonstrates that the proposed methodology works well and can select the correct time-varying or time-invariant structure. The empirical application is discussed in detail in Section 5. Lastly, Section 6 concludes and briefly discusses some future research directions.

2 Hybrid TVP-VARs

We first introduce a class of models we call hybrid time-varying parameter VARs: VARs in which some equations have time-varying coefficients, whereas coefficients in other equations remain constant. To that end, let be an vector of endogenous variables at time . The TVP-VAR of Primiceri (2005) can be reparameterized in the following structural form:

(1)

where is an vector of time-varying intercepts, are VAR coefficient matrices, is an lower triangular matrix with ones on the diagonal and . The law of motion of the VAR coefficients and log-volatilities will be specified below. Since the system in (1) is written in the structural form, the covariance matrix is diagonal by construction. Consequently, we can estimate this recursive system equation by equation without loss of efficiency.
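As a point of reference, a structural-form TVP-VAR of this kind can be written generically as follows (the notation below is ours and purely illustrative, not the paper's own symbols):

    B_{0,t} y_t = b_t + B_{1,t} y_{t-1} + \cdots + B_{p,t} y_{t-p} + \varepsilon_t,
    \varepsilon_t \sim N(0, \Sigma_t), \qquad \Sigma_t = \mathrm{diag}(e^{h_{1,t}}, \ldots, e^{h_{n,t}}),

with y_t the n-dimensional vector of endogenous variables, b_t the time-varying intercepts, B_{0,t} lower triangular with ones on the diagonal, and h_{i,t} the log-volatilities.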

We note that Carriero, Clark, and Marcellino (2019) pioneer a similar equation-by-equation estimation approach for a large reduced-form constant-coefficient VAR with stochastic volatility. The main advantage of the structural-form representation is that it allows us to rewrite the VAR as unrelated regressions, and it leads to a more efficient sampling scheme. The main drawback of this representation, however, is that the implied reduced-form estimates depend on how the variables are ordered in the system. We will investigate the extent to which these estimates depend on the ordering in Section 5.2.

2.1 An Equation-by-Equation Representation

It is convenient to introduce some notation. Let denote the -th element of and let represent the -th row of . Then, is the intercept and VAR coefficients of the -th equation and is of dimension with . Moreover, let denote the free elements in the -th row of the contemporaneous impact matrix for . That is, is of dimension with . Then, the -th equation of the system in (1) can be rewritten as:

where and . Note that depends on the contemporaneous variables . But since the system is triangular (the contemporaneous impact matrix is lower triangular with ones on the diagonal, so the Jacobian of the change of variables is one), when we perform the change of variables from to to obtain the likelihood function, the density function remains Gaussian.

If we let , we can further simplify the -th equation as:

(2)

where is of dimension Hence, we have rewritten the TVP-VAR in (1) as unrelated regressions. Finally, the coefficients and log-volatilities are assumed to evolve as independent random walks:

(3)
(4)
(5)

where the initial conditions and are treated as unknown parameters to be estimated. The system in (2)–(5) specifies a reparameterization of a standard TVP-VAR in which all equations have time-varying parameters and stochastic volatility.
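In illustrative notation (ours, not the paper's), the i-th equation and its laws of motion take the generic form

    y_{i,t} = x_{i,t}' \theta_{i,t} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, e^{h_{i,t}}),
    \theta_{i,t} = \theta_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim N(0, \Omega_i),
    h_{i,t} = h_{i,t-1} + u_{i,t}, \qquad u_{i,t} \sim N(0, \sigma_{h,i}^2),

where x_{i,t} stacks an intercept, the p lags of all variables and the contemporaneous values of the variables in the preceding equations, \theta_{i,t} stacks the corresponding coefficients, and the two blocks of \theta_{i,t} (VAR coefficients and impact-matrix elements) evolve as separate random walks as in (3) and (4).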

Note that the innovations in (3)-(5) are assumed to be independent across equations. This assumption is partly motivated by concerns about the proliferation of correlation parameters, especially when the system is large, if the correlations of the innovations are left unrestricted. In addition, for and , this independence assumption is important for extending the setup later so that we can turn on and off the time variation in both equations. In contrast, it is feasible to allow the innovations to be correlated across equations (with a slight increase in computational cost). In preliminary work we considered such an extension. While the estimation results suggest that the correlation parameters are sizable, this extension leads to only very modest forecast gains (see Appendix D for details). Therefore, in what follows we maintain the independence assumption in (3)-(5) as the baseline.

2.2 The Non-Centered Parameterization

Next, we introduce a framework that allows the model to determine in a data-driven fashion whether the VAR coefficients and the contemporaneous relations among the endogenous variables in each equation are time varying or constant. For that purpose, we adapt the non-centered parameterization of Frühwirth-Schnatter and Wagner (2010) to our hybrid TVP-VARs. More specifically, for we consider the following model:

(6)
(7)
(8)
(9)

where and . Here and are indicator variables that take values of either 0 or 1.
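To fix ideas, a generic non-centered formulation with on/off indicators (again written in our own illustrative notation) looks as follows:

    y_{i,t} = x_{i,t}' ( \beta_i + \gamma_i^{\beta} \, \mathrm{diag}(\omega_i^{\beta}) \tilde{\beta}_{i,t} )
            + w_{i,t}' ( \alpha_i + \gamma_i^{\alpha} \, \mathrm{diag}(\omega_i^{\alpha}) \tilde{\alpha}_{i,t} ) + \varepsilon_{i,t},
    \tilde{\beta}_{i,t} = \tilde{\beta}_{i,t-1} + \tilde{\eta}_{i,t}^{\beta}, \qquad \tilde{\eta}_{i,t}^{\beta} \sim N(0, I), \qquad \tilde{\beta}_{i,0} = 0,
    \tilde{\alpha}_{i,t} = \tilde{\alpha}_{i,t-1} + \tilde{\eta}_{i,t}^{\alpha}, \qquad \tilde{\eta}_{i,t}^{\alpha} \sim N(0, I), \qquad \tilde{\alpha}_{i,0} = 0,

with \gamma_i^{\beta}, \gamma_i^{\alpha} \in \{0, 1\}. Setting \gamma_i^{\beta} = 0 switches off the time variation in the i-th equation's VAR coefficients, which then collapse to the constant vector \beta_i; setting \gamma_i^{\beta} = 1 recovers a random-walk law of motion whose increment standard deviations are given by \omega_i^{\beta}, and analogously for the impact-matrix elements.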

The model in (6)-(9) includes a wide variety of popular VAR specifications. For example, assuming that all indicators take the value of 1, the above model is just a reparameterization of the TVP-VAR in (2)–(5). To see that, define and . Then, when , it is clear that (6) becomes (2). In addition, we have

Hence, and follow the same random walk processes as in (3) and (4), respectively. We have therefore shown that when , the proposed model reduces to a TVP-VAR with stochastic volatility.

For the intermediate case where and , the proposed model reduces to a structural-form reparameterization of the model in Cogley and Sargent (2005), i.e., a TVP-VAR with stochastic volatility but the contemporaneous relations among the endogenous variables are restricted to be constant. In the extreme case where , the proposed model then becomes a constant-coefficient VAR with stochastic volatility—a reparameterization of the specification in Carriero, Clark, and Marcellino (2019). More generally, by allowing the indicators and to take different values, we can have a VAR in which only some equations have time-varying parameters. Note that it is straightforward to include a few additional indicators to allow for more flexible forms of time variation. For example, one can replace with two indicators, say, and , which control the time variation in the elements of that correspond to coefficients on own lags and lags of other variables, respectively. The posterior simulator in Section 3.2 can be modified to handle this case, with a slight increase in computation time.

These indicators are not fixed but are estimated from the data. More precisely, we specify that each follows an independent Bernoulli distribution with success probability ; the indicators on the impact-matrix elements are specified analogously. These success probabilities are in turn treated as parameters to be estimated. In contrast to typical setups where time variation in parameters is assumed (e.g., Cogley and Sargent, 2001, 2005; Primiceri, 2005), here the proposed model puts positive probability on simpler models in which the VAR coefficients and the contemporaneous relations among the variables are constant. The values of the indicators are determined by the data, and these time-varying features are turned on only when they are warranted. The proposed model is therefore not only flexible, in the sense that it includes a wide variety of specifications popular in applied work as special cases; it also induces parsimony to combat over-parameterization concerns.

2.3 An Exploration of the Model Space

The proposed hybrid TVP-VAR can also be viewed as a Bayesian model average of a wide variety of TVP-VARs with different forms of time variation. To see this, let denote the vector of indicator variables with . Note that each value of corresponds to a particular TVP-VAR in which the time variation of the -th equation is characterized by . For example, corresponds to a constant-coefficient VAR with stochastic volatility. Then, the posterior distribution of any model parameters under the proposed model can be represented as the posterior average with respect to , i.e., the posterior model probabilities of the collection of TVP-VARs with different forms of time variation, where denotes the data. For example, the joint distribution of and , the time-varying VAR coefficients and free elements of the contemporaneous impact matrix, can be represented as a weighted average of its conditional distributions given each indicator configuration, with weights given by the corresponding posterior model probabilities.
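In generic notation (ours), this model-averaging representation reads

    p(\beta_{1:T}, \alpha_{1:T} \mid Y) = \sum_{\gamma \in \{0,1\}^{2n}} p(\beta_{1:T}, \alpha_{1:T} \mid Y, \gamma) \, \Pr(\gamma \mid Y),

where the sum runs over all configurations of the indicators and \Pr(\gamma \mid Y) is the posterior probability of the hybrid TVP-VAR indexed by \gamma.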

For a small VAR with variables (and the additional assumption that ), Chan and Eisenstat (2018b) estimate all TVP-VARs and the corresponding posterior model probabilities. For larger , this approach of computing and sampling from for all possible models is clearly infeasible. In contrast, by including the model indicator in the estimation, we simultaneously explore the parameter space and the model space. This latter approach is convenient and computationally feasible for large systems.

It is also instructive to investigate how the value of the model indicator is determined by the data. To fix ideas, suppose we wish to compare two TVP-VARs, represented as and . Let denote the marginal likelihood under model , i.e.,

(10)

where is the collection of model-specific time-invariant parameters and time-varying states (in our setting these parameters and states are common across models and ), is the (complete-data) likelihood and is the prior density. Then, the posterior odds ratio in favor of model against model is given by the product of the prior odds ratio and the ratio of the two marginal likelihoods. It follows that if both models are equally probable a priori, the posterior odds ratio between the two models is equal to the ratio of the two marginal likelihoods, or the Bayes factor. More generally, under the assumption that each TVP-VAR has the same prior probability, the value of the model indicator is determined by the marginal likelihood . That is, if the TVP-VAR represented by a particular value of forecasts the data better (in terms of one-step-ahead density forecasts), that value will have a higher weight.
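In symbols (ours), writing M_j and M_k for two candidate indicator configurations,

    p(Y \mid M_j) = \int p(Y \mid \psi, M_j) \, p(\psi \mid M_j) \, d\psi,
    \frac{\Pr(M_j \mid Y)}{\Pr(M_k \mid Y)} = \frac{\Pr(M_j)}{\Pr(M_k)} \times \frac{p(Y \mid M_j)}{p(Y \mid M_k)},

so that under equal prior model probabilities the posterior odds ratio reduces to the Bayes factor p(Y | M_j) / p(Y | M_k).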

3 Priors and Bayesian Estimation

In this section we first describe in detail the priors on the time-invariant parameters. We then outline the posterior simulator to estimate the model described in (6)–(9)

3.1 Priors

For notational convenience, stack , , and over , and collect , , and over , and similarly define and . Furthermore, let and . In our model, the time-invariant parameters are , , , , , and . Below we give the details of the priors on these time-invariant parameters.

Since , the initial conditions of the VAR coefficients, is high-dimensional when is large, appropriate shrinkage is crucial. We assume a Minnesota-type prior on along the lines in Sims and Zha (1998); see also Doan, Litterman, and Sims (1984), Litterman (1986) and Kadiyala and Karlsson (1997). We refer readers to Koop and Korobilis (2010), Del Negro and Schorfheide (2012) and Karlsson (2013) for a textbook discussion of the Minnesota prior. More specifically, consider , where the prior mean is set to be zero when the variables are in growth rates to induce shrinkage, and the prior covariance matrix is block-diagonal with —here is the prior covariance matrix for . For each we in turn assume it to be diagonal with the -th diagonal element set to be:

where denotes the sample variance of the residuals from regressing on , . Here the prior covariance matrix depends on four hyperparameters that control the degree of shrinkage for different types of coefficients. For simplicity, we set and . These values imply moderate shrinkage for the coefficients on the contemporaneous variables and no shrinkage for the intercepts.
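As an indication of the general structure (a typical Minnesota-type specification written in our own notation, not necessarily the paper's exact formula), the prior variance of the coefficient in equation i on lag l of variable j might be set to

    \kappa_1 / l^2                        for own lags (i = j),
    (\kappa_2 / l^2) \, s_i^2 / s_j^2     for lags of other variables (i \neq j),

with \kappa_3 \, s_i^2 / s_j^2 for the contemporaneous (impact) coefficients and \kappa_4 \, s_i^2 for the intercepts, where s_i^2 is the residual sample variance from an autoregression for variable i. In this notation \kappa_1 and \kappa_2 play the role of the overall shrinkage hyperparameters discussed next, while \kappa_3 and \kappa_4 govern the contemporaneous coefficients and the intercepts.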

The remaining two hyperparameters are and , which control the overall shrinkage strength for coefficients on own lags and those on lags of other variables, respectively. Departing from Sims and Zha (1998), here we allow and to be different, as one might expect that coefficients on lags of other variables would be on average smaller than those on own lags. In fact, Carriero, Clark, and Marcellino (2015) and Chan (2021) find empirical evidence in support of this so-called cross-variable shrinkage. In addition, we treat and as unknown parameters to be estimated rather than fixing them to some subjective values. This is motivated by a few recent papers, such as Carriero, Clark, and Marcellino (2015) and Giannone, Lenza, and Primiceri (2015), which show that by selecting this type of overall shrinkage hyperparameters in a data-based fashion, one can substantially improve the forecast performance of the resulting VAR. In addition, this data-based Minnesota prior is also found to forecast better than many recently introduced adaptive shrinkage priors such as the normal-gamma prior, the Dirichlet-Laplace prior and the horseshoe prior. For example, this is demonstrated in a comprehensive forecasting exercise in Cross, Hou, and Poon (2020).

We assume gamma priors for the hyperparameters and : . We set , and . These values imply that the prior modes are at zero, which provides global shrinkage. The prior means of and are 0.04 and , respectively, which are the fixed values used in Carriero, Clark, and Marcellino (2015). Next, following Frühwirth-Schnatter and Wagner (2010), the square roots of the diagonal elements of are independently distributed as mean-zero normal random variables: . We assume each follows a conventional inverse-gamma prior: . The success probabilities and are assumed to have beta distributions: and . Finally, the elements of the initial condition are assumed to be Gaussian: .

3.2 The Posterior Simulator

We now turn to the estimation of the model in (6)–(9) given the priors described in the previous section. There are a few challenges in the estimation. First, becomes degenerate when , which makes its sampling nonstandard (similarly for ). To sidestep this problem, we will use the parameterization in terms of and . Then, given the posterior draws of , and other parameters, we can recover the posterior draws of and using the definitions and .

Second, since and the indicator enter the likelihood in (6) multiplicatively, it is vital to sample them jointly (similarly for and ); otherwise the Markov chain might get stuck. To see this, consider a simpler sampling scheme in which we simulate given , followed by sampling given . Suppose in the last iteration. Given , does not enter the likelihood and we simply sample it from its state equation. Since the sampled has no relation to the data, the implied time variation in the VAR coefficients would not match the data. Consequently, it is highly likely that the model would prefer no time variation, i.e., . Hence, it is unlikely for the Markov chain to move away from once it is there. It is therefore necessary to sample both and in the same step. In addition, since the pair and enters the likelihood additively, we sample them jointly to further improve efficiency.

Next, define with . Then, one can simulate from the joint posterior distribution using the following posterior sampler that sequentially samples from:

  1. , ;

  2. ;

  3. ,;

  4. , ;

  5. , ;

  6. , ;

  7. .

Step 2 to Step 7 mainly involve standard sampling techniques and we leave the details to Appendix A. Here we focus on the first step.

Step 1. We sample the four blocks of parameters and jointly to improve efficiency. This is done by first drawing the indicators marginally of (but conditional on the other parameters) and then sampling and from their joint conditional distribution. The latter of these two steps is straightforward because, given and , we have a linear Gaussian state space model in . Specifically, we stack the observation equation (6) over :

where ,

Here note that the matrix depends on the indicators . Next, stack the state equations (7)-(8) over :

where is the first difference matrix of dimension . Since is a square matrix with unit determinant, it is invertible. It then follows that

Finally, using standard linear regression results, we have

(11)

where

(12)

Since the precision matrix is a band matrix, one can sample efficiently using the algorithm in Chan and Jeliazkov (2009).
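To make the mechanics concrete, here is a minimal sketch of precision-based sampling (our own illustrative code, not the paper's implementation): it draws from N(K^{-1} b, K^{-1}) for a precision matrix K using a Cholesky factorization and triangular solves, avoiding any explicit matrix inversion.

    import numpy as np

    def precision_sample(K, b, rng):
        # Draw x ~ N(K^{-1} b, K^{-1}) without inverting K.
        # K: (m, m) precision matrix; banded in the TVP-VAR application,
        #    dense here to keep the sketch short.
        # b: (m,) vector such that the mean is K^{-1} b.
        C = np.linalg.cholesky(K)                            # K = C C'
        mu = np.linalg.solve(C.T, np.linalg.solve(C, b))     # mean via two triangular solves
        z = rng.standard_normal(K.shape[0])
        return mu + np.linalg.solve(C.T, z)                  # C'^{-1} z has covariance K^{-1}

    rng = np.random.default_rng(0)
    # x = precision_sample(K, b, rng)

In the application, K inherits a narrow band structure from the random-walk state equations, so banded or sparse Cholesky routines make each draw cost roughly linear in the sample size.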

To sample marginal of , it suffices to compute the four probabilities that and . To that end, note that

where both the conditional likelihood and the prior density are Gaussian. It turns out that the above integral admits an analytical expression. In fact, using a derivation similar to that in Chan and Grant (2016), one can show that

(13)

where and are defined in (12). Then, one can compute the relevant probabilities using the expression in (13). For example, when , and . It follows that

Similarly, we have

where and denote respectively and evaluated at . The probabilities that and can be computed similarly. A draw from this 4-point distribution is standard once we normalize the probabilities. The details of the remaining steps are provided in Appendix A.
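As an illustration of this step (our own sketch with hypothetical function arguments, not the paper's code), given the log integrated likelihoods of the four indicator configurations and the Bernoulli prior probabilities, the draw can proceed as follows:

    import numpy as np

    def draw_indicators(loglik, p_beta, p_alpha, rng):
        # loglik: dict mapping (g_beta, g_alpha) in {0,1}^2 to the log integrated
        #         likelihood of that configuration (latent states integrated out).
        # p_beta, p_alpha: prior success probabilities of the two indicators.
        configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
        logw = np.array([
            loglik[g]
            + (np.log(p_beta) if g[0] else np.log(1.0 - p_beta))
            + (np.log(p_alpha) if g[1] else np.log(1.0 - p_alpha))
            for g in configs
        ])
        w = np.exp(logw - logw.max())    # normalize in log space for numerical stability
        w /= w.sum()
        return configs[rng.choice(4, p=w)]

    rng = np.random.default_rng(0)
    loglik = {(0, 0): -310.2, (0, 1): -305.7, (1, 0): -309.8, (1, 1): -304.1}  # made-up values
    gamma_beta, gamma_alpha = draw_indicators(loglik, p_beta=0.5, p_alpha=0.5, rng=rng)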

4 A Monte Carlo Study

In this section we first conduct a series of simulated experiments to assess how well the posterior sampler works in recovering the time-varying structure in the data generating process. We then document the runtimes of estimating the hybrid TVP-VARs of different dimensions to assess how well the posterior sampler scales to larger systems.

First, we generate 300 datasets from the hybrid VAR in (6)–(9) with variables and sample size or . We set the vector of indicators by repeating the four combinations three times; this allows us to study the effect of different combinations of time-varying patterns as well as their positions in the system. We generate , the initial conditions of the VAR coefficients, stochastically as follows. The intercepts are drawn independently from the uniform distribution on the interval , i.e., . For the VAR coefficients, the diagonal elements of the first VAR coefficient matrix are iid and the off-diagonal elements are from . All other elements of the -th () VAR coefficient matrices are iid . Finally, the elements of are drawn independently from .

If the coefficient is time-varying (i.e., the associated indicator or is 1), it is generated from the state equation (3) or (4) with if is a VAR coefficient and if it is an intercept for . Finally, for the log-volatility processes, we draw and set .
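For intuition, a minimal sketch of this part of the data-generating process (our own code, with made-up values rather than the paper's exact settings) simulates a coefficient path as a random walk when its indicator is on and holds it constant otherwise:

    import numpy as np

    def simulate_coefficient_path(theta0, time_varying, sd_increment, T, rng):
        # Random-walk path if time_varying is True; constant at theta0 otherwise.
        path = np.empty((T, theta0.size))
        path[0] = theta0
        for t in range(1, T):
            step = sd_increment * rng.standard_normal(theta0.size) if time_varying else 0.0
            path[t] = path[t - 1] + step
        return path

    rng = np.random.default_rng(0)
    beta_path = simulate_coefficient_path(np.array([0.5, 0.1]), time_varying=True,
                                          sd_increment=0.02, T=200, rng=rng)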

In the Monte Carlo study we use the priors described in Section 3.1 with the following hyperparameters. The prior means of the initial conditions and are set to be zero and , and the prior covariance matrix of is . The hyperparameter of is set so that the implied prior mean of is if it is associated with a VAR coefficient and for an intercept. Finally, we set the hyperparameters of and to be . These values imply that the prior modes are at 0 and 1, whereas the prior mean is 0.5.

Given a dataset and the priors described above, we estimate the hybrid VAR using the posterior sampler in Section 3.2 and obtain the posterior mode of . We repeat this procedure for all the datasets and compute the frequencies of and being one, . The results are reported in Table 1.

Overall, the posterior sampler works well and is able to recover the true time-varying structure in the simulated data on average. While it is harder to pin down the correct value of compared to , the frequencies of identifying the true value of are still reasonably good. In addition, these results substantially improve when the sample size increases from to . All in all, these Monte Carlo results confirm that the proposed hybrid model can recover salient patterns—such as time-varying conditional means and covariances—in the data.

To further investigate the effect of the beta prior on and , we repeat the Monte Carlo experiments but assume a uniform prior on the unit interval, i.e., a Beta(1,1) distribution. Hence, both the prior means and modes are 0.5. The Monte Carlo results are similar to the baseline case and are reported in Appendix D.

                True indicators        Smaller sample size      Larger sample size
Equation        VAR coef.  Impact      VAR coef.  Impact        VAR coef.  Impact
1               0          0           0.06       0.05          --         --
2               0          1           0.04       0.88          0.05       0.93
3               1          0           0.98       0.25          1.00       0.12
4               1          1           0.98       0.64          1.00       0.75
5               0          0           0.02       0.02          0.02       0.00
6               0          1           0.03       0.96          0.04       0.98
7               1          0           0.97       0.13          1.00       0.03
8               1          1           0.95       0.80          1.00       0.94
9               0          0           0.03       0.00          0.02       0.01
10              0          1           0.04       0.94          0.10       0.99
11              1          0           0.94       0.11          1.00       0.02
12              1          1           0.93       0.88          0.99       0.96
Table 1: Frequencies of the posterior modes of the VAR-coefficient and impact-matrix indicators being one across the 300 datasets.

Next, we document the runtimes of estimating the hybrid TVP-VARs of different sizes to assess how well the posterior sampler scales to higher dimensions. More specifically, Table 2 reports the runtimes (in minutes) to obtain 1,000 posterior draws from the hybrid models with variables and time periods. The posterior sampler is implemented in on a standard desktop with an Intel Core i7-7700 @3.60 GHz processor and 64 GB memory. As a comparison, we also include the corresponding runtimes of fitting the TVP-VAR of Primiceri (2005) using the algorithm in Del Negro and Primiceri (2015). Note that the algorithm in Del Negro and Primiceri (2015) samples all the time-varying VAR coefficients in one block and it tends to be very computationally intensive for larger systems. One potential solution is to develop an equation-by-equation estimation procedure similar to that in Carriero, Chan, Clark, and Marcellino (2021). Since the algorithm is designed for models with a constant contemporaneous impact matrix, extending it to handle the TVP-VAR of Primiceri (2005)—which features a time-varying contemporaneous impact matrix—would be an interesting future research direction.

                        Smaller sample size            Larger sample size
Model                   small n  medium n  large n     small n  medium n  large n
Hybrid TVP-VAR          4        29        94          8        59        188
Primiceri (2005)        12       209       --          25       415       --
Table 2: The runtimes (in minutes) to obtain 1,000 posterior draws from each model with variables and time periods. All VARs have lags.

It is evident from the table that for typical applications with 15-30 variables, the proposed model can be estimated reasonably quickly. In addition, using the recursive representation that admits straightforward equation-by-equation estimation, fitting the proposed model is much faster than estimating the TVP-VAR of Primiceri (2005), even though the former is more flexible.

5 Application: Model Comparison and Forecasting

In this section we fit the proposed model to a large US macroeconomic dataset to demonstrate its usefulness. After describing the dataset in Section 5.1, we first investigate how different variable orderings affect the estimates from the proposed hybrid TVP-VAR relative to the TVP-VAR of Primiceri (2005) in Section 5.2. We then present the full-sample results in Section 5.3. In particular, we conduct a formal Bayesian model comparison exercise to shed light on the time-varying patterns of the model parameters. We then consider a pseudo out-of-sample forecasting exercise in Section 5.4. We show that the forecast performance of the proposed model compares favorably to a range of standard benchmarks.

5.1 Data and Prior Hyperparameters

The US dataset for our empirical application consists of 20 quarterly variables with a sample period from 1959Q1 to 2018Q4. It is sourced from the FRED-QD database at the Federal Reserve Bank of St. Louis as described in McCracken and Ng (2021). Our dataset contains a variety of standard macroeconomic and financial variables, such as Real GDP, industrial production, inflation rates, labor market variables, money supply and interest rates. They are transformed to stationarity, typically to annualized growth rates. The complete list of variables and how they are transformed is given in Appendix C.

We use the priors described in Section 3.1. In particular, since the data are transformed to growth rates, we set the prior mean of to be zero, i.e., . For the prior hyperparameters on and , we set , and . These values imply that the prior means of and are respectively 0.04 and . For the hyperparameters of the initial conditions