1 Introduction
Over the last decade, the availability of large datasets in economics and finance has made it possible to introduce high-dimensional models. In macroeconomics, large datasets help to improve forecasts, while in finance some authors have used them to analyse financial crises, contagion effects and their impact on the real economy. To deal with high-dimensional models, Bayesian nonparametric techniques have become popular in several fields (such as statistics and machine learning), but only a few attempts have been made in econometrics. In particular, the Bayesian nonparametric approach can improve estimation efficiency and prediction accuracy in time series analysis.
Recently, time-varying parameter (TVP) models have provided an interesting alternative for handling multiple change points. For example, Primiceri (2005) uses time-varying structural vector autoregressive (VAR) models to study monetary policy; Dangl and Halling (2012) forecast equity returns by means of TVP models; and Belmonte et al. (2014) study European inflation via a time-varying parameter model. As shown in Primiceri (2005), Del Negro and Primiceri (2015) and Bitto and Frühwirth-Schnatter (2019), the flexibility of TVP models is what makes them well suited to capturing gradual changes. We combine the ideas behind time-varying parameter models and Bayesian nonparametric techniques, which allows us to model complex phenomena in a flexible and efficient manner. Moreover, we provide an innovative Bayesian nonparametric time-varying graphical framework for inference in high-dimensional time series.
In this paper, we allow the coefficients to be sparse, meaning that only a fraction of the time-varying parameters have significant effects, while retaining flexibility in modelling the nonzero coefficients by including temporal dependence in the prior structure. To achieve these goals, we define a shrinkage prior on the VAR coefficients by means of a Bayesian nonparametric (BNP) prior. This distribution is a spike-and-slab prior. For the (parametric) spike distribution, we consider two specifications: a Dirac spike and a “diffuse” spike. For the (nonparametric) slab distribution, we use the well-known Bayesian nonparametric Lasso prior of Billio et al. (2019).
The prior described above groups the time-varying parameter vector autoregressive (TVP-VAR) coefficients into clusters and shrinks the coefficients within a cluster toward a common location. Unlike the Markov-switching approach (Krolzig, 1997) and random walk processes (Primiceri, 2005; Del Negro and Primiceri, 2015), we impose time variation on the distribution of the VAR coefficients. In the literature on time-varying coefficients, the dependence is direct: the VAR coefficients are typically represented through state-space models, in which they are functions of their values at the previous time. We instead introduce an indirect dependence structure, in which the dependence is built through the atoms of the Dirichlet process rather than directly on the coefficients. Thus, we include a Bayesian nonparametric dependent prior specification on the VAR coefficients and the covariance matrix by means of a time-series dependent Dirichlet process (tsDDP) as in Nieto-Barajas et al. (2012).
Following Billio et al. (2019), our hierarchical prior overcomes overparametrization and overfitting issues by clustering the VAR coefficients into groups and by shrinking the coefficients of each group toward a common location. This hierarchical prior makes it possible to simultaneously estimate the (potentially) sparse time-varying causal network structure and cluster the corresponding coefficients. In our BNP-TVP-VAR model, time-varying coefficients allow us to (i) estimate the temporal networks of contemporaneous and causal structures, (ii) identify different sources of time variation, arising from the size of shocks and/or the propagation mechanism, and (iii) accommodate potential nonlinearities.
We also contribute to the literature on financial and macroeconomic contagion (see Billio et al. (2012); Bianchi et al. (2019) and Barigozzi and Brownlees (2019)) through the lens of Granger causality and graph/network representations. Our BNP prior is particularly suited to studying Granger causality in time series; in particular, it allows us to estimate the most significant time-varying dependence relationships between the variables of interest. As explained above, we can extract time-varying graphs by using the posterior random partition induced by the nonparametric (slab) distribution, which clusters the edges into groups.
1.1 Literature
Since their introduction in macroeconomics (see Sims (1980)), vector autoregressive (VAR) models have been extensively used in econometrics and time series statistics. Large VAR models have been used to analyse and forecast high-dimensional macroeconomic data (e.g., McCracken and Ng (2016)) and financial panels (e.g., Barigozzi and Brownlees (2019)). Moreover, in recent years VAR models have been used for studying financial and macroeconomic contagion (e.g., Cogley and Sargent (2005), Stock and Watson (2007), Diebold and Yilmaz (2012) and Bianchi et al. (2019)). VAR models have also been extensively used for assessing the impact and spread of external shocks (i.e., to perform impulse-response analysis), for forecasting, for estimating networks from Granger-causal relationships and for studying systemic risk and financial contagion (e.g., Diebold and Yilmaz (2009), Billio et al. (2012) and Barigozzi and Brownlees (2019)).
Although VAR models are potentially very flexible statistical tools, their large number of parameters and the typically limited length of standard macroeconomic datasets make unrestricted inference daunting as the cross-sectional size increases. This has favoured the use of penalised regression and Bayesian methods for dealing with the problem of overparametrisation. The general idea is to use informative priors to shrink the unrestricted model towards a more parsimonious setting, thereby reducing parameter uncertainty and improving forecast accuracy (see Karlsson (2013) and Koop and Korobilis (2010) for surveys).
In the Bayesian VAR (a.k.a. BVAR) literature, a plethora of prior distributions has been proposed to perform sparse estimation (e.g., see Giannone et al. (2014)). Starting from the well-known Minnesota prior (see Doan et al. (1984), Litterman (1986)), which specifies an objective prior on the coefficient and covariance matrices of a VAR, several parametric approaches have been developed exploiting hierarchical structures and finite mixtures (e.g., Kalli and Griffin (2014), Gefang (2014), Huber and Feldkircher (2019), Kastner and Huber (2018)).
Among the recent contributions dealing with high-dimensional models, we distinguish two approaches: the first attempts to reduce the size of the data to be handled or processed during each step of the inferential algorithm, while the second is concerned with reducing the size of the parameter space. Within the first class, we mention the Bayesian compressed VAR of Koop et al. (2018), who tackled the dimensionality issue by using random projections to compress the data, and the Bayesian composite likelihood approach of Chan et al. (2018). On the other hand, Gefang et al. (2019) and Koop and Korobilis (2018) adopted a variational Bayes approach for performing efficient approximate posterior inference in large parameter spaces. Also, Kastner and Huber (2018) exploited factor models and hierarchical shrinkage priors to provide a parsimonious parametrisation of the covariance matrix which allows for equation-by-equation estimation. Additional contributions for estimating large VAR and VARMA models include Koop and Korobilis (2013), Korobilis (2016) and Chan et al. (2016).
In addition to the cross-sectional dimension, the temporal length of many economic and financial datasets is also steadily increasing. The relations between the variables of interest are typically described by a static matrix of coefficients. This assumption can be relaxed by introducing time variation in the coefficient matrix. The most common approach consists in specifying a process governing the evolution of the parameters of interest. According to the force driving these dynamics, we distinguish observation-driven and parameter-driven time-varying parameter (TVP) models. The first class is mainly represented by generalised autoregressive score models (GAS, see Creal et al. (2013)), while the second one includes Markov switching (e.g., Hamilton (1989), Krolzig (1997)), change point (e.g. Pesaran et al. (2006)) and random walk models (e.g. Del Negro and Primiceri (2015), Primiceri (2005)). These processes are able to describe parameters whose evolution is subject to switching regimes, structural breaks or smooth changes, respectively.
In the Bayesian and frequentist literature, parametric models have been widely studied in combination with different shrinkage methods (such as the Least Absolute Shrinkage and Selection Operator, known as LASSO). In particular, important papers focus on sparse and efficient estimation in high-dimensional datasets. More recently, however, increasing attention has been devoted to the issue of overshrinkage and to the modelling of nonzero coefficients (e.g., Giannone et al. (2018)). Consequently, there is an increasing need for statistical tools capable of flexibly modelling the dynamics described by a VAR process, allowing for sparsity without incurring overshrinkage.

In this paper, we aim to contribute to the growing literature on the use of Bayesian nonparametrics in time series analysis. Bayesian nonparametric techniques are widely used in statistics, machine learning and data analysis as powerful tools for flexible modelling of complex data structures. Only recently has Bayesian nonparametrics gained popularity in econometrics and in economic time series modelling, where it is used to capture observation clustering effects (see e.g. Bassetti et al. (2014), Kalli and Griffin (2018) and Billio et al. (2019)).
To the best of our knowledge, our paper is the first to provide a sparse Bayesian nonparametric VAR model with time-varying coefficients. The proposed two-stage prior specification can easily be extended to other model classes, such as seemingly unrelated regression (SUR) models. We propose a novel Bayesian nonparametric prior structure, which provides a sparse estimate of the coefficient matrix of a VAR model. This representation retains flexibility in the nonzero entries and, most importantly, handles the time variation in the coefficient matrix through the atoms of the Dirichlet process rather than through a state-space representation.
Our approach differs substantially from the existing literature in two respects. First, we consider a spike-and-slab prior distribution for each entry of the coefficient matrix, where the spike has a parametric prior specification by means of a Dirac or diffuse prior, while the slab component has a random nonparametric prior. Second, we impose prior dependence on the coefficients by specifying a Markov process for their random distribution. As a byproduct of the estimation procedure, we are able to extract a time series of dependent Granger-causality graphs. This shows how the BNP-TVP-VAR contributes to the literature on the estimation of time-varying networks from economic and/or financial series.
2 A Bayesian Time-Varying VAR Model
2.1 TVP-VAR models
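Let $N$ be the number of units in a dataset and $\mathbf{y}_t = (y_{1,t},\ldots,y_{N,t})'$ a vector of variables available at time $t$. A time-varying parameter vector autoregressive model of order $p$ (TVP-VAR($p$)) is defined as
$$\mathbf{y}_t = \sum_{j=1}^{p} B_{j,t}\,\mathbf{y}_{t-j} + \boldsymbol{\varepsilon}_t, \tag{1}$$
where $B_{j,t}$ is the $N \times N$ matrix of time-varying coefficients and $t = 1,\ldots,T$ is the time period. We assume that the error terms are i.i.d. $\boldsymbol{\varepsilon}_t \sim \mathcal{N}_N(\mathbf{0}, \Sigma)$ for $t = 1,\ldots,T$. Eq. (1) can be written in the more compact form as
$$\mathbf{y}_t = (I_N \otimes \mathbf{x}_t')\,\boldsymbol{\beta}_t + \boldsymbol{\varepsilon}_t, \tag{2}$$
where we define $\mathbf{x}_t = (\mathbf{y}_{t-1}',\ldots,\mathbf{y}_{t-p}')'$; $B_t = (B_{1,t},\ldots,B_{p,t})$; $\boldsymbol{\beta}_t = \mathrm{vec}(B_t')$; $\otimes$ is the Kronecker product and $\mathrm{vec}(\cdot)$ the column-wise vectorization operator that stacks the columns of a matrix into a column vector.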
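As a quick numerical illustration of the equivalence between the matrix form and the compact (Kronecker) form, the following minimal sketch checks that both give the same $\mathbf{y}_t$. The notation is an assumption made for this example only: `B_t` is the $N \times Np$ matrix stacking the lag coefficient matrices side by side, `x_t` stacks the $p$ lagged vectors, and the coefficient vector is taken to be $\mathrm{vec}(B_t')$. The numerical values are arbitrary.

```python
import numpy as np

def tvp_var_step(B_t, x_t, eps_t):
    """One step of the matrix form y_t = B_t x_t + eps_t."""
    return B_t @ x_t + eps_t

def tvp_var_step_compact(B_t, x_t, eps_t):
    """Same step in compact form y_t = (I_N kron x_t') beta_t + eps_t."""
    N = B_t.shape[0]
    # beta_t = vec(B_t'): stack the columns of B_t' (i.e. the rows of B_t)
    beta_t = B_t.T.reshape(-1, order="F")
    # (I_N kron x_t') is an N x (N * N * p) block-diagonal design matrix
    X_t = np.kron(np.eye(N), x_t.reshape(1, -1))
    return X_t @ beta_t + eps_t

rng = np.random.default_rng(0)
N, p = 3, 2                                   # illustrative sizes
B_t = rng.normal(scale=0.2, size=(N, N * p))  # arbitrary coefficients
x_t = rng.normal(size=N * p)                  # stacked lags
eps_t = rng.normal(size=N)                    # one error draw

y_matrix = tvp_var_step(B_t, x_t, eps_t)
y_compact = tvp_var_step_compact(B_t, x_t, eps_t)
```

The compact form is convenient because it exposes the coefficients as a single vector, which is the object on which an entry-wise prior is placed.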
2.2 Prior specification
Let us consider the problem of defining a flexible prior for a time-varying parameter model. We define the following TVP-VAR($p$) with $p$ equal to $1$ as
$$\mathbf{y}_t = B_t\,\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t. \tag{3}$$
In Eq. (3), the number of parameters of the $N$-dimensional TVP-VAR(1) model is , thus scales quadratically in $N$. In macroeconomic and financial applications, the number of variables of interest ranges from (small size model) to (large model) and even (huge model). This highlights the twofold need for shrinkage estimation methods and, in particular, for sparse estimation of the coefficient matrix. In fact, it is very hard both to provide a meaningful interpretation of a large VAR with a full time-varying coefficient matrix and to design an efficient and computationally feasible algorithm for unrestricted estimation.
Motivated by this fact, we provide a prior distribution, which allows for sparse estimation in a timevarying parameter setting. For each coefficient of the matrix of parameters , we introduce a mixture prior with independent location and scale parameters:
where
is the probability distribution of the coefficients (for example, a Double Exponential, or Laplace, distribution). One of the most successful and widespread approaches in the Bayesian literature consists in the use of (independent) spike-and-slab prior distributions (e.g., Mitchell and Beauchamp (1988), George and McCulloch (1993), Smith and Kohn (1996), George and McCulloch (1997)) for each coefficient . Building on this, we specify a spike-and-slab prior distribution for each , with and , of the form
(4) 
where correspond to the spike and slab distributions, respectively, and is the time-varying mixing probability (i.e., the prior probability of the spike component). In the literature, there are two common choices for the spike: a Dirac mass at zero; and a Normal distribution centred at zero. The Dirac spike is a degenerate distribution that allows for variable selection as a byproduct of the estimation. Instead, the choice of a continuous, diffuse prior (like a Gaussian) allows for shrinkage of the coefficients and is computationally faster, but requires the specification of a threshold in post-processing for the sake of variable selection.

The standard choice for the slab component is a heavy-tailed distribution belonging to the family of Generalised Hyperbolic distributions (e.g., Double Exponential, Cauchy, Student-t), since the aim of this component is to capture potentially large nonzero coefficients. In the case of a Dirac spike, the prior for each coefficient, for and , is given by
(5) 
while for a diffuse (Gaussian) spike we have
(6) 
Example 1.
In Fig. 1 we report an example of a spike-and-slab prior, with a centred Gaussian spike distribution (in blue) and a centred double exponential slab distribution (in red). From the left panel, which shows the two distributions, we can see that the Gaussian accounts for most of the prior mass around zero while the double exponential governs the tails. This is reflected in the plot on the right, which shows that the mixture distribution (with equal weights) has fatter tails than the Gaussian and more mass in a neighbourhood of zero than the double exponential.
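The qualitative behaviour described in Example 1 can be checked numerically. The sketch below evaluates an equal-weight mixture of a centred Gaussian spike and a centred double exponential slab. The spike standard deviation (0.5), the slab scale (2.0) and the evaluation points are illustrative assumptions, not the values used in the figure.

```python
import numpy as np

def gaussian_pdf(x, sd):
    """Density of a zero-mean Normal with standard deviation sd."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def laplace_pdf(x, scale):
    """Density of a zero-mean double exponential with the given scale."""
    return np.exp(-np.abs(x) / scale) / (2.0 * scale)

def spike_slab_pdf(x, pi_spike, spike_sd, slab_scale):
    """Mixture density: pi_spike * spike + (1 - pi_spike) * slab."""
    spike = gaussian_pdf(x, spike_sd)
    slab = laplace_pdf(x, slab_scale)
    return pi_spike * spike + (1.0 - pi_spike) * slab

# Illustrative (assumed) settings
pi_spike, spike_sd, slab_scale = 0.5, 0.5, 2.0

at_zero = spike_slab_pdf(0.0, pi_spike, spike_sd, slab_scale)
in_tail = spike_slab_pdf(5.0, pi_spike, spike_sd, slab_scale)

# Crude Riemann sum to confirm the mixture is a proper density
grid = np.linspace(-60.0, 60.0, 240001)
dx = grid[1] - grid[0]
total_mass = float(np.sum(spike_slab_pdf(grid, pi_spike, spike_sd, slab_scale)) * dx)
```

With these settings the mixture places more density at zero than the slab alone and more density at 5 than the spike alone, which is the pattern the example describes.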
We study the two specifications of the spike component by comparing their performance in extracting time-varying Granger causality networks from time series data.
The literature on time-varying parameter (TVP) models is vast. Some common parametric specifications include the threshold AR (TAR, e.g., Tong and Lim (1980)) and smooth transition AR (STAR, e.g., Teräsvirta (1994)) models, along with their multivariate generalisations, Markov switching processes (e.g., Hamilton (1989), Krolzig (1997)), change point processes (e.g. Pesaran et al. (2006)) and random walk processes (e.g. Del Negro and Primiceri (2015), Primiceri (2005)). The choice of a particular specification has been motivated by the intent to capture a particular feature of the dynamic evolution of the coefficients, such as changing regimes, structural breaks or smooth variations.
Differently from the existing literature on TVP-VAR models, we model the temporal dependence of the autoregressive parameters by assuming that the underlying prior (random) distributions evolve according to a discrete-time Markov process. A standard approach in Bayesian nonparametrics involves the specification of a Dirichlet Process (a.k.a. DP, see Ferguson (1973)) or a Dirichlet Process mixture (a.k.a. DPM, see e.g., Lo (1984)) prior for the distribution of the parameters of interest. The use of DP and related priors for a random probability measure allows for clustering of the variables .
We proceed by introducing the prior temporal dependence between the random measures via the time series Dirichlet Process (tsDDP) of Nieto-Barajas et al. (2012). As argued there, for time series models it is convenient to introduce the dependence through the weights while keeping common locations. In practice, this amounts to a common discretization across the sequence of random measures, whereas the alternative assumption of common weights and dependent locations would lead to a discretization over the probability scale. In contrast to Taddy (2010), who works with equally spaced time points, we can accommodate unequally spaced time points. Moreover, we use a latent binomial process to induce the desired correlation, whereas the stick-breaking random probability measures of Taddy (2010) rely on a beta autoregression on the fractions of the stick-breaking construction by means of two sets of latent variables.
By exploiting the stick-breaking construction of Sethuraman (1994), the time series Dirichlet Process imposes a dependence for the random probability measures
(7) 
where the locations are fixed and the weights vary over time. The dependence is described by a Markov process for each unnormalised stick-breaking weight , with , via auxiliary variables (in the spirit of Pitt et al. (2002), Pitt and Walker (2005)), as follows
(8)  
The hyperparameter tunes the strength of the dependence between and , such that implies , while implies with probability 1 (see Nieto-Barajas et al. (2012)). Note that this construction implies that at each time , the marginal distribution of each random measure is a Dirichlet Process, that is
with total mass parameter and base measure , such that the base measure defines the expectation and the mass parameter is interpreted as the precision parameter.
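To make the Markov dependence on the stick-breaking fractions concrete, here is a minimal simulation sketch. It assumes the latent binomial link described by Nieto-Barajas et al. (2012): an initial Beta(1, mass) fraction, a Binomial auxiliary variable given the current fraction, and a Beta update given the auxiliary variable, which leaves the Beta(1, mass) marginal unchanged. The names `mass`, `m`, `K` (a finite truncation level) and the function itself are assumptions introduced for this illustration, and a single common dependence parameter is used for all times and sticks.

```python
import numpy as np

def ts_stick_breaking(T, K, mass, m, rng):
    """Simulate (T, K) stick-breaking weights with Markov-dependent fractions.

    Assumed construction (latent binomial link):
      v[0, k]               ~ Beta(1, mass)
      z[t, k] | v[t, k]     ~ Binomial(m, v[t, k])
      v[t + 1, k] | z[t, k] ~ Beta(1 + z[t, k], mass + m - z[t, k])
    """
    v = np.empty((T, K))
    v[0] = rng.beta(1.0, mass, size=K)
    for t in range(T - 1):
        z = rng.binomial(m, v[t])                   # auxiliary variables
        v[t + 1] = rng.beta(1.0 + z, mass + m - z)  # next-period fractions
    # w[t, k] = v[t, k] * prod_{l < k} (1 - v[t, l])
    remaining = np.cumprod(1.0 - v, axis=1)
    w = v.copy()
    w[:, 1:] *= remaining[:, :-1]
    return v, w

rng = np.random.default_rng(1)
v_weak, w_weak = ts_stick_breaking(T=5, K=20, mass=1.0, m=10, rng=rng)
v_strong, w_strong = ts_stick_breaking(T=5, K=20, mass=1.0, m=10**6, rng=rng)
```

Setting `m = 0` makes consecutive fractions independent, while a very large `m` makes them nearly identical over time, mirroring the two limiting cases mentioned above. Because of the truncation at `K`, each row of weights sums to slightly less than one.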
Eq. (8) defines the joint distribution of and , and it allows us to define the joint model for as a , where is the sequence of dependence strengths, for and . In Figure 2, we show a single draw of , with total mass parameter equal to ; a normal base measure with zero mean and variance , i.e. ; and total time span .

In order to assess the dependence structure induced by the time series Dirichlet Process, we consider the correlation between two random probability measures and . The following proposition characterises this correlation:
Proposition 2.1 (NietoBarajas et al. (2012)).
Let be measurable. For and any let . Then
where
Remark 1.
The correlation between is larger in regions where the prior mean assigns more probability, meaning that the prior places strongest dependence in most probable regions. Note that strong dependence between does not imply strong dependence between their outcomes .
Throughout, we use the shape-scale parametrisation of the Gamma and Inverse Gamma distributions; the exponential distribution is obtained as a particular case of the Gamma distribution when the shape parameter equals one. We can summarize what we have described above in the following prior structure, for and , as follows
(9)  
(10)  
(11)  
(12)  
(13)  
(14) 
where is either a Dirac mass at or a Normal distribution centered in zero and with variance , which marginally has Inverse Gamma prior distribution . If we marginalize over , we have a Double Exponential slab distribution for each entry of the coefficient matrix. Following the notation of Eq. (4) we have , resulting in
For the covariance matrix , we assume an Inverse Wishart prior distribution that is:
(15) 
where and
are the degrees of freedom and scale hyperparameters, respectively.
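Returning to the slab component, the marginalisation step that yields a Double Exponential distribution rests on the classical scale-mixture-of-normals identity. The worked equation below is a sketch under assumed notation ($\beta$ for a slab coefficient, $\mu$ for its location, $\tau$ for its latent variance and $\lambda$ for a rate-type hyperparameter); it uses a single exponential mixing distribution and does not reproduce the full Gamma hierarchy of the prior.

```latex
% Assumed notation: beta is a slab coefficient, mu its location,
% tau its latent variance, lambda > 0 a rate-type hyperparameter.
\begin{align*}
\beta \mid \mu, \tau &\sim \mathcal{N}(\mu, \tau), \qquad
\tau \sim \mathcal{E}xp\!\left(\text{rate} = \tfrac{\lambda^{2}}{2}\right), \\
p(\beta \mid \mu, \lambda)
 &= \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\tau}}
    \exp\!\left\{-\frac{(\beta-\mu)^{2}}{2\tau}\right\}
    \frac{\lambda^{2}}{2}\exp\!\left\{-\frac{\lambda^{2}\tau}{2}\right\} d\tau
  = \frac{\lambda}{2}\exp\{-\lambda\,|\beta-\mu|\}.
\end{align*}
```

The right-hand side is the density of a Double Exponential distribution with location $\mu$ and scale $1/\lambda$, which is why a conditionally Gaussian specification with a latent variance is sufficient to obtain Lasso-type shrinkage.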
In summary, the observational model in Eq. (2) together with the prior structure in Eqs. (9) to (14) leads to the BNP-TVP-VAR(1) model. Eqs. (9)–(14) represent our hierarchical prior, and in Figure 3 we represent them through a Directed Acyclic Graph (DAG) for the Normal spike specification. The observable and non-observable random variables are indicated by shaded and empty circles, respectively. On the left side we have the prior for , while on the right side we have the hierarchical prior for , with a description of the first and second stage of the hierarchy by means of the shrinking parameters , and .

Sufficient conditions for stationarity of TVP autoregressive models are given for the univariate case in Brandt (1986) (although the proof is also valid in the multivariate setting), while Bougerol and Picard (1992) provide conditions for multivariate regressions where the coefficients are independent and identically distributed. The sufficient conditions given by Brandt (1986) are reported below.
Theorem 2.1 (Brandt (1986)).
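Let $\{(A_t, B_t)\}_{t \in \mathbb{Z}}$ be a strictly stationary ergodic process such that both $E[\log^{+}\|A_0\|]$ and $E[\log^{+}\|B_0\|]$ are finite. Suppose that the top Lyapunov exponent $\gamma$ defined by
$$\gamma = \inf_{t \in \mathbb{N}} E\left[\frac{1}{t+1}\log\|A_0 A_{-1}\cdots A_{-t}\|\right]$$
is strictly negative. Then, for all $t \in \mathbb{Z}$, the series
$$y_t = \sum_{k=0}^{\infty} A_t A_{t-1}\cdots A_{t-k+1} B_{t-k}$$
converges a.s., and the process $\{y_t\}_{t \in \mathbb{Z}}$ is the unique strictly stationary solution of
$$y_t = A_t y_{t-1} + B_t.$$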
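To illustrate the role of the top Lyapunov exponent, the sketch below considers the scalar random-coefficient recursion $y_t = a_t y_{t-1} + b_t$ with i.i.d. coefficients, for which the exponent reduces to $E[\log|a_t|]$. The uniform distribution for `a`, the Gaussian distribution for `b` and the sample size are assumptions chosen purely for illustration; the point is that the exponent can be negative even when $|a_t|$ occasionally exceeds one.

```python
import numpy as np

def top_lyapunov_scalar(a):
    """Sample analogue of E[log|a_t|] for a scalar coefficient path."""
    a = np.asarray(a, dtype=float)
    return float(np.mean(np.log(np.abs(a))))

def simulate_random_coefficient_ar(a, b, y0=0.0):
    """Iterate y_t = a_t * y_{t-1} + b_t starting from y0."""
    y = np.empty(len(a))
    prev = y0
    for t in range(len(a)):
        prev = a[t] * prev + b[t]
        y[t] = prev
    return y

rng = np.random.default_rng(0)
T = 5000
a = rng.uniform(-1.2, 1.2, size=T)  # |a_t| exceeds 1 on some dates
b = rng.normal(size=T)

gamma_hat = top_lyapunov_scalar(a)
gamma_exact = np.log(1.2) - 1.0     # E[log U] for U ~ Uniform(0, 1.2)
y = simulate_random_coefficient_ar(a, b)
```

Here the population value is $\log(1.2) - 1 < 0$, so the sufficient condition holds although the coefficient is not bounded by one in absolute value.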
2.3 Hyperparameter elicitation
Following Nieto-Barajas et al. (2012), we assume , for each and . Higher values of strengthen the dependence between the unnormalised weights ; however, large values may lead the prior to dominate the likelihood, especially when the sample size is small. For this reason they specify a Poisson prior distribution for , truncated on . Given the complexity of our prior specification, we prefer to fix the value of , which is sufficiently small to avoid overweighting of the prior (in our empirical application, the sample size is , while Nieto-Barajas et al. (2012) have ), and then check the robustness of the results to alternative values of . We choose the following values for the hyperparameters:
This choice amounts to assuming a uniform prior on each and a rather uninformative prior on the covariance matrix, . The value of the concentration parameter is set according to standard practice in the Dirichlet Process literature. The hyperparameters imply that for each new component of the Dirichlet Process the prior distribution of is centred at zero with medium-high variance, whereas the prior for has mean . Instead, the values of imply that the prior variance of the (diffuse) spike distribution is , reflecting that this component should account for coefficients not significantly different from zero.
3 Posterior computation
3.1 Sampling method
Since the joint posterior distribution is not tractable and is difficult to sample from directly, Bayesian estimators cannot be obtained analytically. In this paper, we rely on simulation-based inference methods and develop a Gibbs sampling algorithm for approximating the posterior distribution.
In order to deal with the finite mixture provided by the spike-and-slab prior and the infinite mixture given by the DPM, we exploit a data augmentation approach. For each and , we introduce two sets of allocation variables ; a set of stick-breaking variables, ; a set of auxiliary variables (for ) and a set of slice variables, . The first allocation variable, , selects the spike component when it is equal to zero and the slab component when it is equal to one. The second allocation variable, , selects the component of the Dirichlet mixture to which each coefficient is allocated. The sequence of stick-breaking variables defines the mixture weights, whereas the slice variable, , allows us to deal with the infinite number of mixture components by identifying a finite number of stick-breaking variables to be sampled and an upper bound for the allocation variables .
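The way slice variables turn an infinite mixture into a finite computation can be sketched as follows. This is a simplified, single-period illustration in the spirit of the slice sampler of Walker (2007), not the sampler used in the paper: the function names, the vector `v` of stick-breaking fractions and the scalar `u_min` (the smallest slice variable across coefficients) are all assumptions made for the example.

```python
import numpy as np

def sticks_needed(v, u_min):
    """Return the smallest K whose leftover mass is below u_min, and the weights.

    Once 1 - sum_{k <= K} w_k < u_min, no component beyond K can have a
    weight larger than any slice variable, so it can never be selected.
    """
    v = np.asarray(v, dtype=float)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    leftover = 1.0 - np.cumsum(w)
    idx = np.nonzero(leftover < u_min)[0]
    if idx.size == 0:
        raise ValueError("not enough stick-breaking fractions were supplied")
    return int(idx[0]) + 1, w

def candidate_components(w, u):
    """Indices of components a coefficient with slice variable u may join."""
    return np.nonzero(w > u)[0]

v = np.full(5, 0.5)                         # illustrative fractions
K, w = sticks_needed(v, u_min=0.1)          # weights: 0.5, 0.25, 0.125, ...
allowed = candidate_components(w, u=0.2)    # components with weight above 0.2
```

In this toy case only the first two components have weight above 0.2, so a coefficient with that slice value chooses between them, and four sticks are enough to cover every coefficient whose slice variable is at least 0.1.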
Finally, we obtain the following joint posterior distribution
(16) 
where and are the collections of slice variables and stick-breaking components, respectively; and are the auxiliary and latent variables, respectively; and are the allocation variables; are the atoms, where ranges from to the number of allocated DP components; is the vector of VAR coefficients and are the specific probabilities of shrinking coefficients to zero.
We obtain random samples from the posterior distribution by Gibbs sampling. The Gibbs sampler is based on the algorithm of Hatjispyros et al. (2011) and on the slice sampling approach of Walker (2007) and Kalli et al. (2011) for estimating the weights and locations of each random measure . To improve the mixing of the MCMC chain, we introduce some Hamiltonian Monte Carlo steps (see Neal (2011)) instead of drawing directly from the full conditional posterior distributions. Below, we list the iterative steps, which exploit the conditional independence between variables, for , , and :

(1) the slice and stick-breaking variables and are updated along with the auxiliary variable given ;
(2) the latent scale variables are updated given ;
(3) the parameters of the stick-breaking locations are updated given ;
(4) the allocation variables are jointly updated given ;
(5) the VAR coefficients are jointly updated given ;
(6) the covariance matrix is updated given ;
(7) the mixing probability of having sparse coefficients is updated given .
The detailed Gibbs sampler is described in Appendix A and Appendix C.
3.2 Graph extraction
Based on the Gibbs sampler described above, we are able to extract time-varying Granger-causal graphs. In the literature, linkages and networks describing the relationships between variables of interest, such as macroeconomic and financial linkages (e.g. Billio et al. (2012) and Barigozzi and Brownlees (2019)), are often extracted through pairwise Granger causality. This approach can generate spurious causality effects, since it does not condition on the other variables of interest. The main problem lies in the large number of variables relative to the number of observations, which can lead to overparametrization and inefficiency in gauging the causal relationships. Our proposed prior can be used to extract networks and Granger-causal relationships while mitigating overfitting and the curse of dimensionality. Moreover, our prior leads to the extraction of edge-coloured graphs, which allow us to identify stylized facts in financial or macroeconomic networks and to reveal the presence of communities, hubs and linkage heterogeneity.
From the MCMC output of the time-varying coefficient matrix , we are able to extract time-varying Granger-causal graphs. At each time , we use the posterior random partition induced by the nonparametric (slab) distribution to cluster the edges of the graph (i.e., the entries , of the vectorised coefficient matrix ) into groups.
Formally, a graph is a pair , where is a set of nodes and is a set of node pairs, called links or edges. The nodes are labelled and a link/edge is identified by the pair of nodes it connects, . In particular, an edge exists if and only if the time-varying VAR coefficient of the variable in the equation of is not null. In our network analysis, we focus on the adjacency matrix constructed a posteriori from the allocation variables. Its entries take the values 0 and 1 if we apply a threshold, whereas if they are allowed to vary between 0 and 1 we obtain a weighted graph. The purpose is to estimate the most significant time-varying dependence relationships (in terms of Granger causality) between the variables of interest.
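The construction of the adjacency matrix from the sampler output can be sketched as follows. The code assumes that the MCMC output has been arranged as an array of 0/1 slab indicators with shape (draws, T, N, N), where entry (i, j) refers to the coefficient of variable j in the equation of variable i. The array layout, the function names and the 0.5 cutoff are assumptions made for this illustration.

```python
import numpy as np

def inclusion_probabilities(slab_draws):
    """Average 0/1 slab indicators over MCMC draws (weighted adjacency)."""
    return np.asarray(slab_draws, dtype=float).mean(axis=0)

def threshold_adjacency(probs, cutoff=0.5):
    """Binary adjacency: keep an edge when its posterior probability exceeds cutoff."""
    return (probs > cutoff).astype(int)

def edge_list(adjacency_t):
    """Directed edges (j, i) at one time point: variable j enters equation i."""
    rows, cols = np.nonzero(adjacency_t)
    return [(int(j), int(i)) for i, j in zip(rows, cols)]

# Toy output: 4 draws, 1 time point, 2 variables
draws = np.zeros((4, 1, 2, 2), dtype=int)
draws[:, 0, 0, 1] = [1, 1, 1, 0]   # variable 1 in equation 0: included in 3 of 4 draws
draws[:, 0, 1, 0] = [1, 0, 0, 0]   # variable 0 in equation 1: included in 1 of 4 draws

probs = inclusion_probabilities(draws)   # weighted graph, entries in [0, 1]
adjacency = threshold_adjacency(probs)   # thresholded graph, entries in {0, 1}
edges = edge_list(adjacency[0])
```

The unthresholded `probs` array corresponds to the weighted graph, while `adjacency` is the binary graph obtained after applying the cutoff.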
Example 2.
Consider the TVP-VAR(1) model in Eq. (2) and let . Without loss of generality, focus on the coefficient matrices at three consecutive times and . Suppose the posterior estimates of the coefficient matrices and allocation variables , respectively, are as follows
(17) 
The corresponding Granger-causal graphs are given in Fig. 4, where colours have been used to denote the cluster assignment encoded in the matrices and .
Figure 4: Granger-causal graphs at the three consecutive times, panels (a), (b) and (c).

4 Conclusions
We proposed the BNP-TVP-VAR model for sparse, nonparametric inference in time-varying VAR models. The use of spike-and-slab priors with a time-series dependent Dirichlet Process prior for the slab component makes it possible to simultaneously shrink the autoregressive coefficients and flexibly model the time-varying nonzero entries. We applied the proposed methodology using two alternative spike distributions: a Dirac and a Normal distribution. The performance of the resulting models has been compared in terms of: (i) sparse estimation and variable selection, and (ii) clustering structure. Moreover, we showed how the BNP-TVP-VAR model can be used for extracting Granger-causal time-dependent graphs from multivariate time series.
References
 Barigozzi and Brownlees (2019) Barigozzi, M. and Brownlees, C. (2019). Nets: Network estimation for time series. Journal of Applied Econometrics, 34(3):347–364.
 Bassetti et al. (2014) Bassetti, F., Casarin, R., and Leisen, F. (2014). Beta-product dependent Pitman–Yor processes for Bayesian inference. Journal of Econometrics, 180(1):49–72.
 Belmonte et al. (2014) Belmonte, M. A. G., Koop, G., and Korobilis, D. (2014). Hierarchical shrinkage in time-varying parameter models. Journal of Forecasting, 33(1):80–94.
 Bianchi et al. (2019) Bianchi, D., Billio, M., Casarin, R., and Guidolin, M. (2019). Modeling systemic risk with Markov switching graphical SUR models. Journal of Econometrics, 210(1):58–74.
 Billio et al. (2019) Billio, M., Casarin, R., and Rossini, L. (2019). Bayesian nonparametric sparse VAR models. Journal of Econometrics, Forthcoming.
 Billio et al. (2012) Billio, M., Getmansky, M., Lo, A. W., and Pelizzon, L. (2012). Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of Financial Economics, 104(3):535–559.
 Bitto and Frühwirth-Schnatter (2019) Bitto, A. and Frühwirth-Schnatter, S. (2019). Achieving shrinkage in a time-varying parameter model framework. Journal of Econometrics, 210(1):75–97.
 Bougerol and Picard (1992) Bougerol, P. and Picard, N. (1992). Strict stationarity of generalised autoregressive processes. Annals of Probability, 20(1):1714–1730.
 Brandt (1986) Brandt, A. (1986). The stochastic equation with stationary coefficients. Advances in Applied Probability, 18(1):211–220.
 Chan et al. (2018) Chan, J., Eisenstat, E., Hou, C., and Koop, G. (2018). Composite likelihood methods for large Bayesian VARs with stochastic volatility. CAMA Working Paper.
 Chan et al. (2016) Chan, J., Eisenstat, E., and Koop, G. (2016). Large Bayesian VARMAs. Journal of Econometrics, 192(2):374–390.
 Cogley and Sargent (2005) Cogley, T. and Sargent, T. J. (2005). Drifts and volatilities: monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 8(2):262–302.
 Creal et al. (2013) Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28(5):777–795.
 Dangl and Halling (2012) Dangl, T. and Halling, M. (2012). Predictive regressions with time-varying coefficients. Journal of Financial Economics, 106(1):157–181.
 Del Negro and Primiceri (2015) Del Negro, M. and Primiceri, G. E. (2015). Time varying structural vector autoregressions and monetary policy: A corrigendum. Review of Economic Studies, 82:1342–1345.
 Diebold and Yilmaz (2009) Diebold, F. X. and Yilmaz, K. (2009). Measuring financial asset return and volatility spillovers, with application to global equity markets. The Economic Journal, 119(534):158–171.
 Diebold and Yilmaz (2012) Diebold, F. X. and Yilmaz, K. (2012). Better to give than to receive: Predictive directional measurement of volatility spillovers. International Journal of Forecasting, 28(1):57–66.
 Doan et al. (1984) Doan, T., Litterman, R., and Sims, C. (1984). Forecasting and conditional projection using realistic prior distributions. Econometric Reviews, 3(1):1–100.
 Ferguson (1973) Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230.
 Gefang (2014) Gefang, D. (2014). Bayesian doubly adaptive elastic-net Lasso for VAR shrinkage. International Journal of Forecasting, 30(1):1–11.
 Gefang et al. (2019) Gefang, D., Koop, G., and Poon, A. (2019). Variational Bayesian inference in large vector autoregressions with hierarchical shrinkage. CAMA Working Paper.
 George and McCulloch (1993) George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889.
 George and McCulloch (1997) George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7:339–373.
 Giannone et al. (2014) Giannone, D., Lenza, M., and Primiceri, G. E. (2014). Prior selection for vector autoregressions. The Review of Economics and Statistics, 97(2):436–451.
 Giannone et al. (2018) Giannone, D., Lenza, M., and Primiceri, G. E. (2018). Economic predictions with big data: The illusion of sparsity. FRB of New York Staff Report No. 847. Available at SSRN: https://ssrn.com/abstract=3166281.
 Hamilton (1989) Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2):357–384.
 Hatjispyros et al. (2011) Hatjispyros, S. J., Nicoleris, T., and Walker, S. G. (2011). Dependent mixtures of Dirichlet processes. Computational Statistics & Data Analysis, 55(6).
 Huber and Feldkircher (2019) Huber, F. and Feldkircher, M. (2019). Adaptive shrinkage in Bayesian vector autoregressive models. Journal of Business & Economic Statistics, 37(1):27–39.
 Kalli and Griffin (2014) Kalli, M. and Griffin, J. E. (2014). Time-varying sparsity in dynamic regression models. Journal of Econometrics, 178(2):779–793.
 Kalli and Griffin (2018) Kalli, M. and Griffin, J. E. (2018). Bayesian nonparametric vector autoregressive models. Journal of Econometrics, 203(2):267–282.
 Kalli et al. (2011) Kalli, M., Griffin, J. E., and Walker, S. G. (2011). Slice sampling mixture models. Statistics and Computing, 21(1):93–105.
 Karlsson (2013) Karlsson, S. (2013). Forecasting with Bayesian vector autoregression. In Handbook of Economic Forecasting, volume 2, pages 791–897. Elsevier.
 Kastner and Huber (2018) Kastner, G. and Huber, F. (2018). Sparse Bayesian vector autoregressions in huge dimensions. arXiv preprint arXiv:1704.03239.
 Koop and Korobilis (2010) Koop, G. and Korobilis, D. (2010). Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3(4):267–358.
 Koop and Korobilis (2013) Koop, G. and Korobilis, D. (2013). Large time-varying parameter VARs. Journal of Econometrics, 177(2):185–198.
 Koop and Korobilis (2018) Koop, G. and Korobilis, D. (2018). Variational Bayes inference in high-dimensional time-varying parameter models. arXiv preprint arXiv:1809.03031.
 Koop et al. (2018) Koop, G., Korobilis, D., and Pettenuzzo, D. (2018). Bayesian compressed vector autoregressions. Journal of Econometrics.
 Korobilis (2016) Korobilis, D. (2016). Prior selection for panel vector autoregressions. Computational Statistics & Data Analysis, 101:110–120.
 Krolzig (1997) Krolzig, H.-M. (1997). Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Springer-Verlag Berlin Heidelberg.
 Litterman (1986) Litterman, R. B. (1986). Forecasting with Bayesian vector autoregressions – five years of experience. Journal of Business & Economic Statistics, 4(1):25–38.
 Lo (1984) Lo, A. Y. (1984). On a class of Bayesian nonparametric estimates: I. density estimates. The Annals of Statistics, 12(1):351–357.
 McCracken and Ng (2016) McCracken, M. W. and Ng, S. (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business & Economic Statistics, 34(4):574–589.

 Mitchell and Beauchamp (1988) Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032.
 Neal (2011) Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Brooks, S., Gelman, A., Jones, G. L., and Meng, X.-L., editors, Handbook of Markov Chain Monte Carlo, chapter 5. Chapman & Hall/CRC.
 Nieto-Barajas et al. (2012) Nieto-Barajas, L. E., Müller, P., Ji, Y., Lu, Y., and Mills, G. B. (2012). A time-series DDP for functional proteomics profiles. Biometrics, 68:859–868.
 Pesaran et al. (2006) Pesaran, M. H., Pettenuzzo, D., and Timmermann, A. (2006). Forecasting time series subject to multiple structural breaks. The Review of Economic Studies, 73(4):1057–1084.
 Pitt et al. (2002) Pitt, M. K., Chatfield, C., and Walker, S. G. (2002). Constructing first order stationary autoregressive models via latent processes. Scandinavian Journal of Statistics, 29(4):657–663.
 Pitt and Walker (2005) Pitt, M. K. and Walker, S. G. (2005). Constructing stationary time series models using auxiliary variables with applications. Journal of the American Statistical Association, 100(470):554–564.
 Primiceri (2005) Primiceri, G. E. (2005). Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72(3):821–852.
 Sethuraman (1994) Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4(2):639–650.
 Sims (1980) Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1):1–48.
 Smith and Kohn (1996) Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics, 75(2):317–343.
 Stock and Watson (2007) Stock, J. H. and Watson, M. W. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39(1):3–33.
 Taddy (2010) Taddy, M. A. (2010). Autoregressive mixture models for dynamic spatial Poisson processes: Application to tracking intensity of violent crime. Journal of the American Statistical Association, 105(492):1403–1417.
 Teräsvirta (1994) Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models. Journal of the American Statistical Association, 89(425):208–218.
 Tong and Lim (1980) Tong, H. and Lim, K. (1980). Threshold autoregression, limit cycles and cyclical data (with discussion of the paper). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 42(3):245–292.
 Walker (2007) Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics—Simulation and Computation, 36(1):45–54.
Appendix A Posterior distributions: diffuse DE spike
A.1 Posterior for stick-breaking unnormalised weights
Posterior distribution for , for all and all , where is the number of ties. We use the convention , for all .
where
The normalised weights are then computed by . The posterior distribution of the latent auxiliary variables , for all and all is given by
Finally, the auxiliary variable for the slice sampler has posterior distribution given by
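As a rough illustration of the kind of updates described in this subsection, the following Python sketch shows the textbook slice-sampler steps for a plain Dirichlet process mixture, in the spirit of Walker (2007) and Kalli et al. (2011). It is not the sampler of this paper: the Beta update, the concentration parameter `alpha`, the fixed number of components and all function names are assumptions made for illustration only, and the sparse/non-sparse structure of the prior is ignored.

```python
import random


def stick_breaking_weights(v):
    """Map stick fractions v_k in (0, 1) to weights w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for v_k in v:
        weights.append(v_k * remaining)
        remaining *= 1.0 - v_k  # length of stick left after breaking off w_k
    return weights


def sample_sticks(alloc, n_components, alpha, rng):
    """Generic DP update (assumed here): v_k ~ Beta(1 + n_k, alpha + sum_{l>k} n_l)."""
    counts = [0] * n_components
    for d in alloc:
        counts[d] += 1
    sticks, tail = [], sum(counts)
    for n_k in counts:
        tail -= n_k  # number of observations allocated to components after k
        sticks.append(rng.betavariate(1 + n_k, alpha + tail))
    return sticks


def sample_slices(alloc, weights, rng):
    """Slice variables: u_i ~ Uniform(0, w_{d_i}) for each observation i."""
    return [rng.uniform(0.0, weights[d]) for d in alloc]


# Toy run: six observations allocated to components 0, 1 and 2 (hypothetical data).
rng = random.Random(0)
alloc = [0, 0, 1, 2, 0, 1]
v = sample_sticks(alloc, n_components=4, alpha=1.0, rng=rng)
w = stick_breaking_weights(v)
u = sample_slices(alloc, w, rng)
```

In this generic scheme each observation only needs to consider components whose weight exceeds its slice variable, which is what keeps the number of components visited at each iteration finite.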
A.2 Posterior for
Posterior distribution for , for . Define are the location and scale of , respectively, when sparse or nonsparse component is chosen.