Bayesian optimisation for fast approximate inference in state-space models with intractable likelihoods
We consider the problem of approximate Bayesian parameter inference in non-linear state-space models with intractable likelihoods. Sequential Monte Carlo with approximate Bayesian computations (SMC-ABC) is one approach to approximate the likelihood in this type of models. However, such approximations can be noisy and computationally costly which hinders efficient implementations using standard methods based on optimisation and Monte Carlo methods. We propose a computationally efficient novel method based on the combination of Gaussian process optimisation and SMC-ABC to create a Laplace approximation of the intractable posterior. We exemplify the proposed algorithm for inference in stochastic volatility models with both synthetic and real-world data as well as for estimating the Value-at-Risk for two portfolios using a copula model. We document speed-ups of between one and two orders of magnitude compared to state-of-the-art algorithms for posterior inference.READ FULL TEXT VIEW PDF
Approximate Bayesian computation (ABC) methods can be used to sample fro...
This paper is on Bayesian inference for parametric statistical models th...
State space models contain time-indexed parameters, called states; some ...
Probabilistic (or Bayesian) modeling and learning offers interesting
We introduce a new method for inference in stochastic epidemic models wh...
Quadrature is the problem of estimating intractable integrals, a problem...
Approximate Bayesian computation (ABC) is a set of techniques for Bayesi...
Bayesian optimisation for fast approximate inference in state-space models with intractable likelihoods
Dynamical modelling of time series data is an essential part of modern econometrics (McNeil et al., 2010). A popular model is the state-space model (SSM) used extensively in both macroeconomics and financial economics to capture e.g. the volatility of log-returns using stochastic volatility (SV) models. Recently, it has been proposed by Stoyanov et al. (2010) and others to model the jumps frequently observed in log-returns using an -stable distribution (Nolan, 2003)
, which allows heavy-tailed and possibly skewed log-returns.
However, inference in models with -stable distributions is challenging and often takes significantly longer time to carry out even on modern computers. This is a hindrance to wide-spread adoption of such models in applied work. One such example application is Value-at-Risk (VaR) modelling for a portfolio of assets, where the volatility of each asset is captured using a SV model with -stable log-returns. Another example is multi-fractal learning models (Calvet and Czellar, 2015) to model asset prices.
We aim to develop an efficient Bayesian approach for parameter inference in SSMs,
where denotes unknown static parameters. Here, and denotes the latent state and the observations at time . It is also possible to include discrete states required e.g. by the multi-fractal learning model.
An important object in Bayesian inference is the parameter posterior distribution,
where denotes the parameter prior distribution and we denote the likelihood (with slight abuse of notation) as . For an SSM, we say that the likelihood (and thereby the posterior) is intractable, when cannot be evaluated point-wise. This intractability can be the result of that lacks an analytical closed-form expression, is defined recursively or is computationally prohibitive to evaluate.
One approach for inference in models with intractable likelihoods is approximate Bayesian computations (ABC; Marin et al., 2012). In our setting, we can make use of an ABC version of sequential Monte Carlo (SMC-ABC; Jasra et al., 2012; Calvet and Czellar, 2015) to estimate the likelihood of a perturbed version of (2). It is then straight-forward to make use of SMC-ABC in many inference algorithms, e.g., particle Metropolis-Hastings (PMH; Andrieu et al., 2010; Dahlin and Schön, 2015), gradient-based optimisation (Yıldırım et al., 2014) and the SPSA algorithm (Spall, 1998; Ehrlich et al., 2015).
However, a problem with the estimates obtained from SMC-ABC is that they often have a large variance and are computationally expensive. This usually results in long run-times (days) of inference algorithms, which makes it difficult to apply ABC efficiently for inference in many important settings. To mitigate this problem, we propose an approach based on Gaussian process optimisation (GPO;Brochu et al., 2010) for extracting a Laplace approximation of the posterior. This is an iterative optimisation method that often requires substantially less posterior evalutations and is more robust to noise compared with many alternatives. GPO operates by constructing a surrogate function that mimics the parameter posterior in analogue with Wood (2010). The resulting surrogate is smooth and computationally cheap to evaluate, which enables the use of standard optimisation methods to extract a Laplace approximation of the surrogate mimicking the true posterior.
The main contribution of this paper is to introduce, develop and numerically study the proposed GPO-SMC-ABC algorithm, which is a combination of SMC-ABC and GPO. The proposed algorithm can be a competitive alternative for inference in any model where the likelihood is computationally costly to evaluate/estimate. We compare the proposed algorithm to PMH and SPSA for inference in SV models using both synthetic and real-world data. We demonstrate that GPO-SMC-ABC: (i) provides good posterior approximations using only a small number of evaluations of the posterior, (ii) reduces the computational time by between one and two orders of magnitude compared with PMH, (iii) exhibits good robustness to the ABC approximation and noise in the estimates. Furthermore, we demonstrate how to make use of the proposed algorithm to estimate the VaR in two different portfolios of assets. In these cases, the proposed algorithm can estimate the VaR in about an hour compared with a day when using PMH. Hence, this can enable routine inference in SSMs with intractable likelihoods, which opens up for investigating their use in real-world applications.
Work related to the proposed algorithm are presented in e.g. Dahlin and Lindsten (2014), Gutmann and Corander (2016) and Meeds and Welling (2014). In the first two works, the authors aim to obtain a maximum likelihood estimate and MAP estimate using GPO, respectively. In the present work, we would like to approximate the entire posterior and not only the parameter that maximises the value of the likelihood or posterior. Moreover, compared with Meeds and Welling (2014), the uncertainty encoded into the surrogate function is utilized to determine the next point in which to sample the log-posterior. Finally, we consider SSMs with intractable likelihoods and therefore employ SMC-ABC algorithms within the GPO algorithm to estimate the log-posterior. SMC-ABC often give estimates with a larger variance, which makes inference more challenging compared with tractable SSMs considered by Dahlin and Lindsten (2014).
We continue with Section 2, where an overview of the proposed algorithm and its components are presented. Sections 3 and 4 discuss the details of these components and the resulting algorithm is presented in Section 5. We conclude the paper with an extensive numerical evaluation in Section 6 on simulated and real-word data, and some final remarks and future work in Section 7.
Our aim is to construct a Laplace approximation of the parameter posterior (2),
denotes the Gaussian distribution with meanand covariance matrix . denotes the estimate of the Hessian of the log-posterior at the posterior mode,
denotes the recorded observations. This can be seen as a Gaussian approximation around the mode of the posterior motivated by the Bernstein-von Mises theorem, which states that the posterior concentrates to a Gaussian distribution centred at the true parameters with the inverse expected information matrix as its covariance as. Note that, even if this is an asymptotic results it can provide reasonable approximations using a finite number of samples as discussed by Panov and Spokoiny (2015).
We encounter two main problems when constructing the Laplace approximation: (i) the optimisation problem in (4) is difficult to solve efficiently and (ii) is typically difficult to estimate with good accuracy. The first problem is due to the high variance in and computational cost of the posterior estimates. An example of this problem is presented in the left part of Figure 1. Here, we present the log-posterior estimates obtained by SMC-ABC over a grid of in (16). The high variance typically results in a slow convergence of parameter inference algorithms such as SPSA and PMH. It is also difficult to speed up the convergence by using gradient information as this requires running computationally expensive particle smoothing algorithms.
Instead, we propose to circumvent these problems by optimising a smooth surrogate function that mimics the log-posterior distribution similar to Wood (2010). We can then obtain a Laplace approximation using standard optimisation methods as the surrogate function is smooth and cheap to evaluate. The surrogate function is obtained by GPO algorithm, which is a specific instance of Bayesian optimisation (Brochu et al., 2010). The surrogate function sequentially is updated using samples from the log-posterior. In this paper, we make use of the predictive distribution of a Gaussian process (GP) as the surrogate function. Hence, we can encode certain prior knowledge regarding the log-posterior into the GP prior, which reduces the number of samples required to explore the log-posterior.
The resulting algorithm iterates three steps. At the th iteration:
compute an approximation of the log-posterior at the parameter denoted using SMC-ABC.
construct a surrogate function by a GP predictive posterior using .
evaluate the acquisition rule to determine .
The GPO algorithm then returns an estimate of the posterior and its uncertainty. There are two major advantages of GPs for estimating the posterior density. First, the uncertainty quantification can be used to develop so-called acquisition rules that explores areas with large uncertainty and exploits the information about the possible locations of the mode of the posterior. This typically results in a rapid convergence of the algorithm, which limits the number of posterior samples required and hence also decreases the computational cost. This cannot be done using a spline model for the posterior. Secondly, the GP can handle noisy function evaluations in a natural manner.
In this section, we discuss how to estimate the log-posterior that is required for carrying out Step (i) of GPO-SMC-ABC. Remember that the main difficulty is that the log-posterior is intractable but is can be estimated using a family of methods known as particle filters. We focus on a specific version of this procedure which is known as the bootstrap particle filter (bPF).
We obtain an estimate of the marginal filtering distribution from the bPF as
where and denotes the particle at time and its normalised weight, respectively. Here, denotes the Dirac measure placed at . As we shall see, the particle system can also be used to obtain estimates of the log-posterior.
The main problem with applying the bPF is that we assume that cannot be evaluated point-wise. To circumvent this, we reformulate the problem using an ABC approach. This is done by augmenting the posterior with an auxiliary variable , which is data simulated from for . The fundamental assumption of ABC is therefore that data generated from should be similar to the observed data if is properly selected.
The augmented posterior can be expressed as
We then assume that can be marginalised with respect to to obtain
when the tolerance parameter is small enough and where denotes the posterior of the perturbed model. Here, we make use of some density with mean and tolerance parameter to compute the distance between the simulated observations and the true observations . A common choice is the Gaussian density .
To make use of this in the bPF, we reformulate the model following Jasra et al. (2012). The observed data is perturbed by
denotes some suitable one-to-one transformation. Moreover, we assume that it is possible to simulate from the model using a transformation of random variables. This corresponds to that we can write, where denotes a function of and . This is an useful construction as it is often possible to generate samples from complicated distributions but not evaluate them point-wise. See Appendix A for how to select and to generate -stable random variables.
By using these two steps, we can rewrite the SSM (1) as
We can now apply a bPF described in Algorithm 1 for this new model, which does not require us to evaluate the point-wise but only that we can simulate . In this paper, we leave the choice of to the user, which can be done using e.g. pilot runs on simulated data. However, it is possible to adapt on-the-fly using the approaches discussed by e.g. Del Moral et al. (2012) or Calvet and Czellar (2015). However in our experience, these methods require a larger value of than un-adapted SMC-ABC to provide log-posterior estimates with a reasonable bias. Therefore, we decided to fix in this paper to obtain computationally efficient algorithms. See Section 7 for some more discussion.
An estimator for the log-posterior of the perturbed model (7) is given by
which use the unnormalised weights from the particle system generated by Algorithm 1.
It is known from Pitt et al. (2012) that the estimator (8) is biased for a finite number of particles but it is consistent and asymptotically Gaussian. Specifically, we have that the error in the log-posterior estimate fulfils a CLT given by
for some unknown variance . As a result, we have an expression for the bias of the estimator given by for a finite number of particles. However, we see that the error is approximately Gaussian for this type of model in the finite sample case by the experimental data presented in the center and right parts of Figure 1.
The log-posterior estimator in (8) is consistent with respect to the perturbed model (7) but not the true model. Dean and Singh (2011) show that the perturbation results in a bias in the parameter estimates (compared with the unperturbed model) that decrease proportional to under some regularity assumptions. Furthermore, the asymptotic Gaussianity of the estimator and a Bernstein-von Mises-type theorem holds for a small enough . This is an important fact to motivate the Laplace approximation as discussed in Section 2. We return to a numerical investigation of these properties in Section 6.1.
In this section, we briefly discuss Steps (ii) and (iii) of GPO-SMC-ABC, where we construct a surrogate function to mimic the log-posterior. See Brochu et al. (2010) for more details.
and can be seen as a generalisation of the multivariate Gaussian distribution to an infinite dimensional setting. A realisation drawn from a GP can therefore be seen as an infinite vector of values or as a function over the real space. To construct the surrogate function, we assume a priori that the log-posterior is distributed according to
Note that this does not correspond to an assumption that the log-posterior of the parameters in the SSM is Gaussian. Here, denotes a GP with mean function and covariance function defined by
The mean function specifies the expected value of the process and the covariance function specifies the dependence between any pair of points on the log-posterior function. The covariance function depends on a set of hyperparameters, such as thelength scale that controls the dependence, see Rasmussen and Williams (2006) for details. From the CLT in (9) and Figure 1, we know that the error in the log-posterior is approximately Gaussian,
where denotes some unknown variance estimated in a later stage of the algorithm. Consequently, we have that the predictive posterior for any test point is given by
where we have introduced the notation for the information available at iteration . The surrogate function of the log-posterior is then given by (12).
The major cost in computing the predictive posterior is incurred by the matrix inversion which is proportional to . Hence, sparse formulations of the GP can be useful to decrease the computation cost for large .
The surrogate function given by the GP predictive distribution gives us the estimate of the log-posterior and its uncertainty. As previously mentioned, this is useful information for creating an acquisition rule to balance exploration and exploitation. We can then determine the next point in which to sample the log-posterior by
where denotes a search space defined by the user. In this paper, we make use of the expected improvement (EI) due to its general good performance in numerical evaluations(Lizotte, 2008). To derive the EI rule, consider the predicted improvement defined as
where denotes the maximum value of for the sampled points . Here, we follow Lizotte (2008) and introduce as a parameter that controls the exploitation/exploration behaviour. Hence for , we have that is the difference between the posterior and the maximum value it assumes in the sampled points. Therefore, it is positive for points where the log-posterior is larger than the current peak and zero for all other points. The EI rule is obtained by computing the expected value of (13) with respect to (12). This results in the acquisition rule given by
where and denotes the density and distribution function of the standard Gaussian distribution, respectively. Here, we jitter to the solution of the optimisation problem by adding Gaussian noise with covariance . In practice, this improves the exploration and increases the accuracy of the obtained parameter estimates. Jittering is also advocated by Bull (2011) and Gutmann and Corander (2016) to increase the convergence rate of GPO.
The optimisation in (14) is possibly non-convex but it is cheap to evaluate the objective function as it only amounts to evaluating the GP predictive posterior in one point. We make use of the the gradient-free dividing rectangles (DIRECT; Jones et al., 1993) to solve (14) over , which is determined from the support of the prior distribution .
GPO-SMC-ABC (Algorithm 2) is obtained by combining SMC-ABC (Algorithm 1) to approximate the log-posterior point-wise and GPO to create a surrogate function that mimics the log-posterior around its mode. In this section, we discuss some user choices and convergence results for the algorithm. See Appendix A for the choices made in this paper.
We initialise Algorithm 2 at Line 1 to find some suitable hyperparameters for the GP prior. The hyperparameters are estimated using initial samples from the log-posterior obtained by Latin hypercube sampling. We then execute Algorithm 1 for each of the sampled parameters to obtain by the analogue of Line 4 in Algorithm 2. After the initialisation of the algorithm, we can update the hyperparameters with Line 5 at every iteration or at some pre-defined interval. Estimating the hyperparameters is computationally costly and it is therefore recommended to re-estimate them only at some fixed interval. The algorithm is usually executed for some pre-defined number of iterations or until the EI is smaller than some , i.e. until satisfies .
From Algorithm 2, we obtain the predictive posterior mean function , which hopefully is an accurate surrogate for the log-posterior. We can then proceed to extract the MAP estimate defined by (4). As the surrogate is smooth and cheap to evaluate, we can carry out the optimisation using standard methods such as the DIRECT algorithm to find the mode. The estimate of the Hessian of the log-posterior can be computed by a finite-difference scheme or by analytically computing the Hessian of when possible. Moreover, it can be obtained using a quasi-Newton algorithm to solve (4).
There are only a limited number of results regarding the convergence properties of the GPO algorithm in the literature. Most of the properties have been studied numerically by benchmarking the GPO algorithm against alternatives on a large number of optimisation problems. However, some theoretical results are discussed by Bull (2011) and Vazquez and Bect (2010). They conclude that GPO using the EI rule samples the log-posterior densely if it is continuous with respect to the GP prior. Also, GPO achieves an optimal convergence rate of the order for the Matérn 5/2 covariance function, where and denote the number of samples and parameters to infer, respectively.
In this section, we provide four illustrations of the properties and advantages of the proposed algorithm. The implementation details are collected in Appendix A and the source code, data and results from runs are available from https://github.com/compops/gpo-abc2015/.
Consider the stochastic volatility model with Gaussian log-returns (GSV) given by
with parameters . Here, the latent log-volatility is assumed to follow a mean-reverting random walk with mean , persistence
and standard deviation of the increments. We generate a single synthetic data set from this model with observations, parameters and initial state .
This model is interesting since it allows us to compare the GPO-SMC-ABC algorithm to an algorithm that uses standard SMC to estimate the log-posterior as we can evaluate in closed-form. We refer to this version of the algorithm as GPO-SMC and it corresponds to the GPO-SMC-ABC algorithm with tolerance parameter .
In the left part of Figure 2, we present the posterior estimates from GPO-SMC (solid curve) and PMH (histogram). We begin by observing a good fit of the Laplace approximation from GPO-SMC to the histogram approximation obtained by PMH; both the location and the spread of the posterior approximations are similar. Using GPO-SMC results in a speed-up of about times () compared with PMH. The main computational cost for both algorithms is incurred by running the particle filter (one run per iteration). The overhead for the proposed algorithm (estimating hyperparameters in the GP and computing the predictive posterior) is negligible compared with the computational cost of running a particle filter.
We now continue with analysing the Laplace approximations obtained by GPO-SMC-ABC. In the right side of Figure 2, we present the posterior approximations obtained by varying the tolerance parameter between and to study the bias and robustness of the approximation. Darker shades of grey indicate a larger tolerance parameter. For and , we see that the approximations are rather poor with a significant bias and bad fit to the spread. However for the other choices of , the approximations from GPO-SMC-ABC converges quickly to be similar to GPO-SMC (solid curve). From these results, we conclude that we have a bias in the posterior approximation when is too small and the approximation tends to grow wider as increases. Hence, a bias-variance trade-off is introduced into the posterior approximation depending on the value of .
We continue by comparing GPO-SMC to SPSA, where the latter is a gradient-free alternative with good convergence properties in many applications. SPSA operates by constructing a finite-difference approximation of the gradient at each iteration after which it takes a step in the gradient direction. Note that SPSA requires two log-posterior estimates at each iteration compared with only one sample in the GPO-SMC. Another possible drawback with SPSA is that it only provides the MAP estimate and no quantification of the posterior uncertainty.
In Figure 3, we compare the MAP parameter estimates of two algorithms as a function of the number of log-posterior estimates. The first samples of the log-posterior are used to estimate the hyperparameters of the GP prior. After this initial phase, GPO-SMC converges quickly to reasonable values of the parameters using about half the number of posterior samples. This results in a speed-up of factor when using the proposed algorithm compared with SPSA.
In the upper part of Figure 4, we present the log-returns of future contracts on coffee during the period between June, 2013 and December, 2014. The data seem to indicate the presence of jumps around the first half of . This is common in financial data and can be modelled in a number of different ways. Here, we consider the stochastic volatility model with -stable log-returns (SV) given by
with parameters and denoting a zero-mean symmetric -stable distribution with stability parameter and scale . The stability parameter determines the tail behaviour of the distribution, see Nolan (2003) for a discussion of the -stable distribution and its properties. The likelihood is in general intractable for this model and therefore only PMH and GPO-SMC-ABC can be applied for this model.
In the middle and lower part of Figure 4, we compare the posterior approximations obtained with GPO-SMC-ABC (solid curves) with independent runs and PMH (histograms). We see that the mixing of PMH is quite poor for this model as the histograms are peaky
. This is a common problem as Markov chains tends to get stuck if the log-posterior estimates are noisy. However, the posterior estimates overlap and seems to give reasonable parameter values for each run of GPO-SMC-ABC. The main difference is in the estimate of, which could be the result of the ABC approximation or problems with observability. Finally, the estimate the log-volatility (black) seems reasonable when compared to the log-returns.
We follow Charpentier (2015) to construct a copula model to capture the dependency structure between prices of oil future contracts. The data that we consider is presented in Figure 5 and consists of weekly log-returns between January 10, 1997 and June 4, 2010 of Brent (produced in the North Sea), Dubai (produced in the Persian Gulf) and Maya (produced in the Gulf of Mexico) oil. We partition the data set into two parts and make use of the first data points for estimating the SV and copula models. The remaining data points are kept for validating the model by back-testing.
Stage 1 (repeated for each asset )
We adopt the commonly used two-stage approach to copula modelling outlined in Algorithm 3, where marginal models are first fitted separately to each of the log-return series, and then combined using a copula to model the dependency structure (Joe, 2005). We make use of the procedure in Section 6.2 to estimate the parameters of an SV model for each type of oil. The filtered residuals (17) are assumed to be independent and identically distributed as by (16) if is known. Stage 1 is carried out independently for each asset and is therefore straight-forward to implemented in a parallel manner. The use of the proposed algorithm decreases the computational time for this stage from hours or days for each asset to about half an hour compared with PMH.
We follow McNeil et al. (2010, p. 231-231) to model the dependency in the residuals. This amounts to applying a probability transform via the empirical CDF on the residuals into the -dimensional hypercube, see right part of Figure 5. The transformed residuals are then combined by a Student’s
-copula are estimated using MAP and matching of moments via Kendall’s, respectively.
Finally, we make use of the copula models and their margins to estimate the VaR for each type of oil. The VaR at confidence is defined by
i.e. the smallest loss such that probability that the loss (the negative log-return) exceeds is no larger then . We adopt a Monte Carlo approach to estimate
by: (i) simulating from the copula, (ii) obtain simulated filtered residuals by applying a quantile transform based on the empirical CDF, (iii) computing the resulting log-returns by the estimated volatility and the inverse of (17). We repeat this times for each asset and then compute the empirical quantile corresponding to for an equally weighted portfolio. However, more advanced portfolio weighting schemes can be implemented.
The resulting VaR-estimate is presented Figure 6, where we compared the SV model with a GSV model (15) estimated using GPO-SMC in a similar manner. We note that the estimates from the two models are quite different especially at the end of the data series. Back-testing on the validation data gives violations for both models and the expected number of violations is . The GPO-SMC-ABC algorithm requires around minutes to infer the model for each of the three assets, where the inference is straight-forward to run on parallel architecture. The corresponding computational time required by the PMH algorithm would be in the order of hours.
We offer a final numerical example to illustrate that the proposed method can be applied to large portfolios as well. The data that we consider is the 30 industrial portfolio provided by Kenneth French. The portfolio consists of monthly log-returns of different industrial sectors during the period September, 1926 to December, 2015. Again, we partition the data into an estimation set and a validation set with and data points, respectively.
We adopt the same procedure as in Section 6.3 and the results are presented in Figure 7. The conclusions are similar with both VaR estimates being quite similar. Back-testing gives and violations for the SV and GSV models, respectively. The expected number of violations is . The computational speed-up with GPO-SMC-ABC compared with PMH is similar to in Section 6.3 as the number of observations per asset are similar.
We have proposed GPO-SMC-ABC, an algorithm to approximate the posterior distribution in SSMs with intractable likelihoods. The illustrations provided in Section 6 indicate that the proposed algorithm is quite accurate and exhibits a substantial decrease in the computational cost. We obtain similar posterior estimates to PMH with a speed-up of one or two orders of magnitude, which reduces computational time from tens of hours to tens of minutes. Moreover, GPO-SMC-ABC is more robust to noise in the log-posterior estimates, which typically results in that the PMH algorithm gets stuck at times and that the SPSA algorithm converges slowly. Overall, this shows that the proposed algorithm is an efficient inference method that makes it possible for practitioners to use models with intractable likelihoods, such as copula models with -stable margins, in applied work.
Future work includes: (i) adopting a sparse presentation of the GP, (ii) developing new acqusition functions, (iii) making use of better tailored covariance functions and (iv) incorperating adaptive approaches for choosing the tolerance level . Some ideas for sequential and sparse representations of GPs are discussed by Huber (2014) and Bijl et al. (2015). It would also be interesting to design new acquisition functions to obtain good estimates of the Hessian of the log-posterior or higher order moments at the same time as the MAP estimate. This could improve the Laplace approximation of the parameter posterior and open up for alternative posterior approximations such as using the skewed Student’s -distribution. Moreover, an interesting approach for tailored GP priors would be to make use of non-stationary covariance functions (Paciorek and Schervish, 2004). This would capture the fact that the log-posterior often falls off rapidly in some parts of the parameter space and is almost flat in other parts.
Finally, it would be beneficial to include some adaptive approach to select in the SMC-ABC algorithm to decrease the number of choices for the user. As previously discussed, we initially implemented the adaptive algorithms proposed in Del Moral et al. (2012) and Calvet and Czellar (2015). However, we ran into problems when using them to approximate the log-posterior values using moderate values of . The estimates exhibited a large bias that resulted in biased estimates of using PMH. Therefore, we choose not to adapt to be able to keep much smaller. It is therefore interesting to explore other adaptive schemes that provide good estimates of the log-posterior using a moderate value of .
This work was supported by: Probabilistic modeling of dynamical systems (Contract number: 621-2013-5524) and CADICS, a Linnaeus Center, both funded by the Swedish Research Council. The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at Linköping University, Sweden. JD would like to thank Fredrik Lindsten, Neil Lawrence and Carl-Henrik Ek for fruitful discussions. Thanks also to Joerg M. Gablonsky, Abraham Lee, Per A. Brodtkorb and the GPy team for making their Python implementations available.
GPO algorithm: We make use of the GPy package (The GPy authors, 2014) for calculating the GP predictive posterior and estimating the GP prior hyperparameters. In this paper, we assume a zero prior mean function and a combination of the bias and the Matérn 5/2 covariance functions. This choice corresponds to a prior for the log-posterior with some non-zero mean and two continuous derivatives. These are reasonable assumptions as this kind of smoothness is assumed in the Laplace approximation. We estimate the hyperparameters using empirical Bayes (EB), i.e. by optimising the marginal likelihood with respect to .
For the acquisition rule in (14), we use and . We initialise the GPO algorithm using samples obtained using Latin hypercube sampling with the implementation written by Abraham Lee available at https://pypi.python.org/pypi/pyDOE. The optimisation problems in Lines 8 and 11 in Algorithm 2 are solved using the DIRECT implementation written by Joerg M. Gablonsky, available from https://pypi.python.org/pypi/DIRECT/. Finally for Line 12, we make use of the Python implementation by Per A. Brodtkorb available at https://pypi.python.org/pypi/Numdifftools.
Section 6.1: We use particles in GPO-SMC and particles with the Gaussian density with tolerance level and in GPO-SMC-ABC to produce the results in Figure 2. We run the GPO algorithms for iterations after the initialisation and re-estimate the hyperparameters of the GP prior every th iteration. The search space for the GPO algorithm is given by , and . We use the following prior densities
where denotes a truncated Gaussian distribution on ,
denotes the Gamma distribution with mean.
For PMH, we make use of the SMC-ABC algorithm with particles to estimate the log-posterior. We initialise PMH in and run the algorithm for iterations (discarding the first as burn-in). The parameter proposal is selected as
which results from an asymptotic rule-of-thumb, see Dahlin and Schön (2015), with an estimate of the posterior covariance from a pilot run.
For SPSA, we make use of particles and follow Spall (1998) to select the hyperparameters as , , , and using pilot runs.
Section 6.2: The real-world data is computed as , where denotes the price of a future contract on coffee obtained from https://www.quandl.com/CHRIS/ICE_KC2. We follow Yıldırım et al. (2014) and apply to stabilise the variance of the likelihood (and gradient estimate). A two step approach is applied to sample from the zero-mean symmetric -stable distribution . Firstly, we sample and . Secondly, we obtain a sample (when ) by applying the transformation
see Nolan (2003) for more information on how to generate -stable random numbers.
We use particles with the Gaussian density with tolerance level in SMC-ABC to estimate the log-posterior. We run the GPO algorithm using the same settings as before but add to the search space and to the prior distributions, where
denotes the Beta distribution. We initialise PMH inand the parameter proposal is selected using a pilot run as
Section 6.3 and 6.4: The data sets are downloaded from http://freakonometrics.free.fr/oil.xls and http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, respectively. We use particles in the SMC and SMC-ABC algorithms and keep the remaining settings as Sections 6.1 and 6.2. For the stock portfolio example, we change the search space of to as the weekly log-returns in this data set can be much larger than the daily log-returns in the other data sets. The MAP estimate of the degrees of freedom in the copula is obtained by a quasi-Newton solver using a uniform prior.
Journal of Machine Learning Research, 12:2879–2904, 2011.
Gradient free parameter estimation for hidden Markov models with intractable likelihoods.Methodology and Computing in Applied Probability, 17(2):315–349, 2015.
Journal of Multivariate Analysis, 94(2):401–419, 2005.
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada, July 2014.