We consider the problem of Bayesian inference over the parameters and the latent states in non-Gaussian non-linear state space models, and in particular a factor stochastic volatility model, which is a particular example of a high dimensional state space model. However, we believe that most of the lessons learnt are likely to apply more generally. Joint inference over both the parameters and latent states in such models can be challenging because the likelihood is an integral over the latent state variables. This integral is often analytically intractable and can also be computationally challenging to evaluate numerically when the dimensions of the latent states is high. In an important contribution in the literature, Andrieu et al. (2010) proposed two particle Markov chain Monte Carlo (PMCMC) methods for state space models. The first is the particle marginal Metropolis-Hastings (PMMH) method, where the parameters are generated with the unobserved latent states integrated out. The second is the particle Gibbs (PG) method, which generates the parameters conditional on the states. Both methods run a particle filter algorithm within an MCMC at each iteration. They show that for any finite number of particles, the target density used by these two algorithms has the joint posterior density of the parameters and states as a marginal density. Their work has been extended by Lindsten and Schön (2012), Lindsten et al. (2014), Olsson and Ryden (2011), Mendes et al. (2018), and Gunawan et al. (2018).
An alternative approach to MCMC is the annealed importance sampling (AIS) approach discussed by Neal (2001), which is a sequential importance sampling method where samples are first drawn from an easily-generated distribution and then moved towards the target distribution through Markov kernels. It is a useful method for estimating expectations with respect to the posterior of the parameters when it is possible to evaluate the likelihood pointwise. In general, the AIS method has some advantages compared to MCMC, and more generally, PMCMC approaches: (a) the Markov chain generated by an MCMC sampler can often be trapped in local modes and it is difficult to assess whether the chain has mixed adequately and converged to its invariant density, whereas AIS explores the parameter space more efficiently when the target distribution is multimodal, and assessing convergence to the posterior is much less of an issue. (b) it is often difficult to use MCMC to estimate the marginal likelihood, which is of interest because it is often used to choose between models (Kass and Raftery, 1995; Chib and Jeliazkov, 2001). Using AIS to estimate the likelihood is far more straightforward. (c) MCMC algorithms are not parallelizable in general, whereas it is straightforward to parallelize the AIS algorithm as discussed in Neal (2001), Del Moral et al. (2006), and Duan and Fulop (2015).
Duan and Fulop (2015) extends the annealing/tempering method of Neal (2001), and Del Moral et al. (2006) to time series state space models having intractable likelihoods and call their approach the Density Tempered Sequential Monte Carlo (SMC) algorithm. Their SMC approach includes three steps: reweighting, resampling, and Markov moves which are discussed in Section 2. A critical component of the annealing or density tempering method is the Markov move component that is implemented at every stage of the annealing process. The Markov move component effectively runs a small Markov chain Monte Carlo for each combination of the parameters and latent variables to help diversify the collection of parameters and latent variables so that they are better approximations to the tempered target density at that level or temperature. Duan and Fulop (2015) use a pseudo marginal Metropolis-Hastings (PMMH) approach with the likelihood estimated unbiasedly by the particle filter in the Markov move component. There are two issues with the Markov move based on PMMH. First, it is computationally costly to follow the guidelines of Pitt et al. (2012)
for high dimensional state space models and set the optimal number of particles in the particle filter such that the variance of the log of the estimated likelihood is around; we show this in our empirical example in Section 5. It is also possible to implement the correlated PMMH of Deligiannidis et al. (2017) which is more efficient than standard PMMH for low dimensional state space models. However, given that we consider the high dimensional state space model in Section 5, even the correlated PMMH would get stuck unless a very large number of particles is used to ensure the variance of the log of the estimated likelihood is around 1. Second, it is hard to implement the PMMH Markov move efficiently when the dimension of the parameter space in the model is large because it is difficult to obtain good proposals for the parameters. This is because the first and second derivatives with respect to the parameters can only be estimated, while a random walk proposal is easy to implement but is very inefficient in high dimensions.
Our contribution is to develop and study two flexible annealing approaches that use Markov move steps that are more efficient than the density tempered SMC approach of Duan and Fulop (2015) and can handle a high dimensional parameter space. The first is based on the PG algorithm of Andrieu et al. (2010) and we call it Annealed Importance Sampling for Intractable Likelihood with particle Gibbs (AISIL-PG). The second is based on Hamiltonian Monte Carlo (HMC) algorithm discussed by Neal (2011) and we call it Annealed Importance Sampling for Intractable Likelihood with Hamiltonian Monte Carlo (AISIL-HMC).
The AISIL-PG and AISIL-HMC approaches refine the density-tempered SMC methodology proposed by Duan and Fulop (2015) to allow for more flexible Markov moves and apply to any state space model. In particular, they allow for particle Metropolis within Gibbs moves and Hamiltonian Monte Carlo moves. Such a generalisation is important for two reasons. First, it permits applications to much higher dimensional parameter spaces and much better proposals than random walks; in particular, in some cases we are able to sample from the full conditional distribution using Gibbs steps. Second, using particle Metropolis within Gibbs (PMwG) Markov moves can substantially reduce the number of particles required because it is then unnecessary for the variance of the log of the likelihood estimate to be around 1 as we show in Section 5. The AISIL methods also provide an estimate of the marginal likelihood as a by product. We illustrate the proposed AISIL methods empirically using a sample of daily US stock returns to estimate a univariate stochastic volatility (SV) model and a multivariate high dimensional factor SV model.
The rest of the article is organised as follows. Section 2 outlines the basic state space model and the AISIL approach to such a model. Section 3 discusses in detail the application of the AISIL-PG and AISIL-HMC methods to the univariate SV model. Section 4 introduces the factor SV model, and discusses in detail the application of the AISIL-PG and AISIL-HMC methods to this model. Section 5 presents empirical results for both the univariate SV and for the factor SV models. The article also has an online supplement that contains some further technical empirical results. We use the following notation in both the main paper and the online supplement. Eq. (1), Sec. 1, Alg. 1 and Sampling Scheme 1, etc. refer to the main article, while Eq. (S1), Sec. S1, Alg. S1, and Sampling Scheme S1, etc. refer to the supplement.
2 Flexible Density Tempering
This section first introduces the basic state space model and then discusses the two flexible AISIL approaches.
2.1 The Basic State Space Model
We consider a state space model where the latent states determine the evolution of the system. The initial density of the latent state is and the transition density of given is for . The observations for are linked to the latent states through the observation equation . Denote a dimensional Euclidean space by
. We assume that: (i) the state vectorand where and ; (ii) the parameter vector , where is a subset of .
Our aim is to perform Bayesian inference over the latent states and the parameters conditional on the observations . By using Bayes rule, the joint posterior of the latent states and the parameters is
is the prior density of and is the marginal likelihood of . For notational simplicity we will often write and .
2.2 The Annealing Approach for State Space Models
This section discusses our proposed AISIL estimation method. The main idea of the proposed method is to begin with an easy to sample distribution and propagate a particle cloud through a sequence of tempered target densities , for , to the posterior density of interest which is much harder to sample from directly. The tempered densities are defined as
The tempering sequence is such that . If it is both easy to generate from the densities and , and evaluate them, then we take , and then
We now discuss the general AISIL approach that is summarized by Alg. 1. The initial particle cloud is obtained by generating the from , and giving the particles equal weight, i.e., . The weighted particles (particle cloud) at the st level (or stage) of the annealing is an estimate of . Based on this estimate of , the AISIL algorithm goes through the following steps to obtain an estimate of . The move from to is implemented by reweighting the particles by the ratio of the two unnormalized densities , yielding the following weights,
We follow Del Moral et al. (2012) and choose the tempering sequence adaptively to ensure a sufficient amount of particle diversity by selecting the next value of automatically such that the effective sample size (ESS) stays close to some target value chosen by the user. The effective sample size ESS is used to measure the variability in the , and is defined as . The ESS varies between 1 and M, with a low value of ESS indicating that the weights are concentrated only on a few particles and a value of means that the particles are equally weighted. We achieve an ESS close to the target value by evaluating the ESS over a grid of potential values and select as that value of the value of whose ESS is the closest to . We then resample the particles proportionally to their weights , which means that the resampled particles then have equal weight . This has the effect of eliminating particles with negligible weights and replicating the particles with large weights, so the ESS is now .
Repeatedly reweighting and resampling can seriously reduce the diversity of the particles, leading to particle depletion. The particle cloud may then not be a good representation of . To improve the approximation of the particle cloud to , we carry out Markov move steps for each particle using a Markov kernel which retains as its invariant density. Thus, at each annealing schedule, we run a short MCMC scheme for each of the particles.
Step and Steps 2a-2d of Alg. 1 are standard and apply to any model with slight modification so it is only necessary to discuss the Markov move step in detail for each model. We propose two Markov move algorithms that leave the target density invariant. The first algorithm is based on particle Gibbs (Andrieu et al., 2010) which samples the latent states and is denoted as the AISIL-PG algorithm. The second algorithm is based on the Hamiltonian Monte Carlo proposal (Neal, 2011) to sample the latent states and is denoted as the AISIL-HMC algorithm. By using the results from Del Moral et al. (2006), the AISIL algorithm provides consistent inference for the target density as goes to infinity. See also Beskos et al. (2016)
for recent consistency results for the adaptive algorithm employed here and the associated central limit theorem.
Set and initialise the particle cloud , by generating the from , and giving them equal weight, i.e., , for
While the tempering sequence do
Find adaptively by searching across a grid of to maintain effective sample size (ESS) around some constant .
Compute new weights,
Resample using the weights to obtain .
Let be a Markov kernel having invariant density . For , move each times using the Markov kernel to obtain .
3 Applying the AISIL method to the Univariate Stochastic Volatility Model
This section discusses the application of the AISIL methods to the simple univariate SV model. We introduce this univariate model first as it forms the basis of the factor SV model studied later.
with and . We call the latent volatility process. The unknown parameters are , , and . We place the following prior distributions on the parameters, and assume that the parameters are independent apriori: (a) . (b) We follow Jensen and Maheu (2010) and choose the prior for as inverse Gamma with and . (c) We restrict the persistence parameter to ensure stationarity, and follow Kim et al. (1998) and choose the prior for as , with and .
3.1 AISIL-HMC Markov Moves
This section discusses the application of the AISIL-HMC Markov moves for the univariate stochastic volatility (SV) model in Eq. (2). We first define the appropriate sequence of intermediate target density for ,
3.2 AISIL-PG Markov Moves
This section discusses the application of AISIL-PG Markov moves to a univariate stochastic volatility (SV) model. The main idea of Markov moves based on particle Gibbs is to define a sequence of tempered target densities on an augmented space that includes all of the parameters and all the particles generated by particle filter algorithm. The augmented target density has the as a marginal density for . The particle filter is discussed in Section 3.2.1. The augmented target density is discussed in Section 3.2.2.
3.2.1 Sequential Monte Carlo/Particle Filtering
We first describe the particle filter methods used to obtain sequential approximations to the densities for . The particle filter algorithm consists of recursively producing a set of weighted particles such that the intermediate densities , defined on the sequence of spaces , are approximated by
where is the Dirac delta distribution located at . In more detail, given samples (representing the filtering density at time , we use
to obtain the particles (representing ), by first drawing from a known and easily sampled proposal density function and then computing the unnormalised weights to account for the difference between the target posterior density and the proposal, where
and then normalize .
Chopin (2004) shows that as the number of time periods increases, the normalised weights of the particle system become concentrated on only few particles and eventually the normalised weight of a single particle converges to one. This is known as the ‘weight degeneracy’ problem. One way to reduce the impact of weight degeneracy is to include resampling steps in a particle filter algorithm. A resampling scheme is defined as , where indexes a particle in
and is chosen with probability. Some assumptions on the proposal density and resampling scheme are given in Section S4. In our empirical application in Section 5, we use the bootstrap filter with as a proposal density and systematic resampling.
3.2.2 AISIL-PG Augmented Target Density
This section discusses the appropriate sequence of intermediate augmented target densities , for , for the state space model described in Section 2.1
. It includes all the random variables which are produced by the particle filter method. Let
denote the vector of particles. The joint distribution of the particles given the parametersis
The backward simulation algorithm introduced by Godsill et al. (2004) is used to obtain a sample from the particle approximation of . by sampling the indices sequentially. We denote the selected particles and trajectory by and , respectively. Further, let us denote as the collection of all random variables except the chosen particle trajectory. The AISIL constructs a sequence of tempered densities , , based on the following augmented target density
The next lemma gives the important properties of the target density. Its proof is in Section S3.
(i) The target distribution in Eq. (3) has the marginal distribution
(ii) The conditional distribution is
Let be a partition of the parameter vector into components, where each component may be a vector. Alg. 3 gives the Markov move based on particle Gibbs algorithm.
For , we propose the following Markov steps
For , sample from the proposal density
Accept the proposed values with probability
Sample using the conditional sequential Monte Carlo (CSMC) algorithm given in Alg. S4.
Sample using the backward simulation algorithm given in Alg. S5.
Note that the expression
appearing in given in Eq. (3) is the density under of all the variables that are generated by the particle filter algorithm conditional on a pre-specified path. This is the conditional sequential Monte Carlo algorithm of Andrieu et al. (2010). It is a sequential Monte Carlo algorithm in which a particle , and the associated sequence of ancestral indices are kept unchanged with all other particles and indices resampled and updated. Alg. S4 in Section S6 describes the conditional sequential Monte Carlo algoritm and is the key part of the Markov move steps. Alg. 4 gives the Markov move step based on particle Gibbs for the univariate SV model.
3.3 Estimating the Marginal Likelihood
The marginal likelihood is often used in the Bayesian literature to compare models (Chib and Jeliazkov, 2001). An advantage of the AISIL method is that it offers a natural way to estimate the marginal likelihood. We note that , , so that
Because the particle cloud obtained after iteration approximates , the ratio above is estimated by
giving the marginal likelihood estimate
4 The Multivariate Factor Stochastic Volatility Model
The factor SV model is a parsimonious multivariate stochastic volatility model that is often used to model a vector of stock returns; see, for example, Chib et al. (2006) and Kastner et al. (2017). Suppose that is a vector of daily stock prices and define as the log-return of the stocks. We model as a factor SV model
where is a vector of latent factors (with ), and is a factor loading matrix of unknown parameters. We model the latent factors as and . The time varying variance matrices and depend on unobserved random variables and such that
Each of the log stochastic volatilites and are assumed to follow independent first order autoregressive processes, with
Conditional Independence in the factor SV model
The key to making the estimation of the factor SV model tractable is that given the values of , the factor model in Eq. (5) separates into independent components consisting of univariate SV models for the latent factors with the th ‘observation’ of the th factor univariate SV model and univariate SV models for the idiosyncratic errors with the th ‘observation’ on the th error SV model. Section 4.2 discusses the AISIL-HMC method for the factor SV model and Section 4.3 discusses the AISIL-PG method.
4.2 Application of the AISIL-HMC method to the Multivariate Factor Stochastic Volatility Model
This section discusses the application of the AISIL-HMC method to the multivariate factor SV model described in Section 4.1. We first define the appropriate sequence of intermediate target densities,
We propose this following Markov move steps, with details in Section S2.1.
4.3 Application of the AISIL-PG method to the Multivariate Factor Stochastic Volatility Model
Augmented Intermediate Target Density for the Factor SV model
This section provides an appropriate augmented tempered target density for the factor SV model. This augmented tempered target density includes all the random variables produced by univariate particle filter methods that generate the factor log-volatilities for and the idiosyncratic log-volatilities for , as well as the latent factors and the parameters . We use Eq. (6) to specify the particle filters for idiosyncratic SV log-volatilities for , and equation 7 to specify the univariate particle filters that generate the factor log-volatilities for . We denote the weighted samples at time for the factor log-volatilities by and for the idiosyncratic error log-volatilities. The corresponding proposal densities are , , , and