I Introduction
Importance sampling (IS) is a powerful Monte Carlo tool that allows one to learn properties of a distribution (called the target) by utilizing weighted samples drawn from another distribution (called the proposal) [1]. The choice of the proposal distribution in IS algorithms is critical for successful performance. In particular, one issue that can arise in IS when a poorly chosen proposal is used is weight degeneracy, i.e., the problem of only a few samples having significant weights that contribute to the approximation of the target distribution. A high mismatch between the proposal distribution and the target distribution yields a high-variance IS estimator. Alternatively, one can use a method known as multiple importance sampling (MIS), in which samples are drawn from multiple proposal distributions
[2, 3]. MIS methods have an advantage over static IS methods, as they show larger sample diversity, and alternative weighting schemes can be applied to reduce the variance of the IS estimator [4, 5, 6]. Adaptive importance sampling (AIS) methods were introduced in order to adapt the proposal distribution through an iterative learning algorithm [7]. While AIS methods have shown success in many practical applications [8, 9], major challenges become apparent when considering multimodal and highly nonlinear target distributions. Moreover, AIS also suffers from the curse of dimensionality [10], whereby the issue of weight degeneracy intensifies and leads to reduced sample diversity in subsequent iterations of the algorithm.

Some advances have been made to AIS schemes to increase the diversity of the generated samples and limit the effects of weight degeneracy. The use of deterministic mixture weights [3] in the population Monte Carlo scheme (DMPMC) [11] significantly reduces the variance of the IS estimator and maintains the diversity of the proposals through each iteration. Unfortunately, the DMPMC algorithm does not adapt the covariance matrices of the proposal distributions. Alternatively, the nonlinear population Monte Carlo (NPMC) scheme applies a nonlinear transformation of the importance weights, similar to the approaches of [12, 13, 14]. The weight transformation guarantees the representation of more samples for proposal adaptation in the case of weight degeneracy. However, in high-dimensional spaces, the effect of the weight transformation may lead to minuscule adaptation of the proposal. Other AIS methods, such as mixture population Monte Carlo (MPMC) [15], adaptive multiple importance sampling (AMIS) [16], and adaptive population importance sampling (APIS) [17], have also been proposed with some variations. Some of these AIS methods also adapt the covariance matrix, but the stability of the update is not guaranteed.

In this paper, we propose a novel and robust method for adapting a set of proposal distributions, called covariance adaptive importance sampling (CAIS), which tackles the inherent problems posed by high-dimensional targets. Specifically, we condition the covariance adaptation of each proposal on a local measurement of the effective sample size in order to achieve robustness.
II Problem Formulation
We introduce the estimation problem in a Bayesian context, with the goal of computing the posterior distribution $\pi(\mathbf{x}|\mathbf{y})$, where $\mathbf{x} \in \mathbb{R}^{d_x}$ is a $d_x$-dimensional vector of unknown parameters related to a set of available observations $\mathbf{y}$. It follows from Bayes' rule that,

$$\pi(\mathbf{x}|\mathbf{y}) = \frac{\ell(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})}{Z(\mathbf{y})}, \qquad (1)$$

where $\ell(\mathbf{y}|\mathbf{x})$ is the likelihood function of the data, $p(\mathbf{x})$ is a prior distribution of the vector of unknown parameters, and $Z(\mathbf{y})$ is the normalizing constant. The goal is in computing the expectation of some function $f(\mathbf{x})$ w.r.t. $\pi(\mathbf{x}|\mathbf{y})$, such that

$$I(f) = \int f(\mathbf{x})\, \pi(\mathbf{x}|\mathbf{y})\, d\mathbf{x}. \qquad (2)$$

In particular, we may be interested in computing quantities such as the normalizing constant $Z(\mathbf{y})$ or the mean of the posterior distribution. To that end, basic implementations of IS approximate (2) using samples drawn from a proposal distribution $q(\mathbf{x})$ that are weighted properly.
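As a concrete illustration, the self-normalized IS estimator described above can be sketched in a few lines of NumPy. The target, proposal, and sample size below are arbitrary illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D example: the unnormalized target is N(3, 1),
# the proposal is N(0, 2^2); all values here are for illustration only.
def log_target(x):
    return -0.5 * (x - 3.0) ** 2              # log pi(x) up to a constant

mu_q, sigma_q = 0.0, 2.0
M = 100_000
x = rng.normal(mu_q, sigma_q, size=M)         # draw from the proposal q
log_q = -0.5 * ((x - mu_q) / sigma_q) ** 2 - np.log(sigma_q)
log_w = log_target(x) - log_q                 # log importance weights
w = np.exp(log_w - log_w.max())               # subtract max for numerical stability
w_bar = w / w.sum()                           # self-normalized weights

post_mean = np.sum(w_bar * x)                 # IS estimate of the target mean
```

Working with log-weights and subtracting the maximum before exponentiating avoids overflow, which matters once the target is evaluated in high dimensions.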
III Novel Covariance Adaptation Strategy
TABLE I: Basic AIS algorithm

1. Initialization: Select the initial proposal parameters $\boldsymbol{\mu}_1$ and $\boldsymbol{\Sigma}_1$.
2. For $t = 1, \dots, T$:
   a. Draw $M$ samples from the proposal,
      $$\mathbf{x}_t^{(m)} \sim q(\mathbf{x}; \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t), \quad m = 1, \dots, M.$$
   b. Compute the standard importance weights,
      $$w_t^{(m)} = \frac{\pi(\mathbf{x}_t^{(m)}|\mathbf{y})}{q(\mathbf{x}_t^{(m)}; \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t)},$$
      and normalize them, with $\bar{w}_t^{(m)} = w_t^{(m)} \big/ \sum_{j=1}^{M} w_t^{(j)}$.
   c. Adapt the proposal parameters $\boldsymbol{\mu}_{t+1}$ and $\boldsymbol{\Sigma}_{t+1}$.
3. Return the pairs $\{\mathbf{x}_t^{(m)}, \bar{w}_t^{(m)}\}$ for $m = 1, \dots, M$ and $t = 1, \dots, T$.
We assume a family of proposal distributions, $q(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean and covariance matrix, respectively. Table I summarizes a basic AIS framework, where the proposal is adapted over $T$ iterations.
III-A General Formulation
For a multivariate normal distribution, the maximum likelihood (ML) estimate of the covariance matrix for $M$ unweighted samples $\{\mathbf{x}^{(m)}\}_{m=1}^{M}$ with sample mean $\hat{\boldsymbol{\mu}}$ is given by,

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{M} \sum_{m=1}^{M} (\mathbf{x}^{(m)} - \hat{\boldsymbol{\mu}})(\mathbf{x}^{(m)} - \hat{\boldsymbol{\mu}})^{\top}. \qquad (3)$$
It is well known that if $\mathbf{x}^{(m)} \in \mathbb{R}^{d_x}$ and $M \leq d_x$, then $\operatorname{rank}(\hat{\boldsymbol{\Sigma}}) < d_x$ and $\hat{\boldsymbol{\Sigma}}$ cannot be inverted [18]. Analogous to the case of unweighted samples, suppose that we are given a set of weighted samples $\{\mathbf{x}^{(m)}, \bar{w}^{(m)}\}_{m=1}^{M}$. If the weighted mean of the samples is $\hat{\boldsymbol{\mu}}_w = \sum_{m=1}^{M} \bar{w}^{(m)} \mathbf{x}^{(m)}$, then the weighted empirical covariance is given by,

$$\hat{\boldsymbol{\Sigma}}_w = \sum_{m=1}^{M} \bar{w}^{(m)} (\mathbf{x}^{(m)} - \hat{\boldsymbol{\mu}}_w)(\mathbf{x}^{(m)} - \hat{\boldsymbol{\mu}}_w)^{\top}. \qquad (4)$$

Consider the set of indices of samples carrying non-negligible weight, $\mathcal{M} = \{m : \bar{w}^{(m)} > 0\}$. As the weights degenerate, $\bar{w}^{(m)} \to 0$ for most samples and $|\mathcal{M}|$ shrinks. If $|\mathcal{M}| \leq d_x$, then (4) yields a singular matrix. For this reason, adaptation of the covariance matrix in AIS is challenging due to the chance of weight degeneracy of the samples at each iteration.
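The rank argument above is easy to verify numerically. The sketch below (with an arbitrary dimension and sample count chosen for illustration) computes the weighted covariance of eq. (4) and shows that when fewer than $d_x$ samples carry weight, the estimate is singular:

```python
import numpy as np

rng = np.random.default_rng(1)
d, M = 5, 200
x = rng.normal(size=(M, d))                  # M samples in d dimensions

def weighted_cov(x, w_bar):
    """Weighted empirical covariance as in eq. (4)."""
    mu = w_bar @ x                           # weighted mean
    xc = x - mu
    return (xc * w_bar[:, None]).T @ xc      # sum_m w_m (x_m - mu)(x_m - mu)^T

# Healthy, uniform weights: the estimate has full rank d.
w_healthy = np.full(M, 1.0 / M)
rank_healthy = np.linalg.matrix_rank(weighted_cov(x, w_healthy))

# Degenerate weights: only 2 < d samples carry weight, so the centered
# points span a line and the covariance estimate is singular (rank 1).
w_degen = np.zeros(M)
w_degen[:2] = 0.5
rank_degen = np.linalg.matrix_rank(weighted_cov(x, w_degen))
```

With $k$ samples of nonzero weight, the centered points span at most a $(k-1)$-dimensional subspace, so the estimate has rank at most $k - 1$.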
We can characterize the degeneracy of the importance weights in AIS at any iteration through the effective sample size (ESS) [19], which can be approximated as,

$$\widehat{\text{ESS}} = \frac{1}{\sum_{m=1}^{M} \left(\bar{w}^{(m)}\right)^2}. \qquad (5)$$

It is important to note that the approximation in (5) is only valid under certain conditions [20]. Moreover, other approximations for the ESS can be used, such as $1/\max_m \bar{w}^{(m)}$, though for our experiments we use eq. (5). Under the assumption that $\widehat{\text{ESS}} > 1$ at each AIS iteration (i.e., large enough variability in the generated samples), our criterion for robust covariance adaptation rests upon guaranteeing that the ESS at each iteration of the algorithm satisfies some lower-bound condition.
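A minimal implementation of the ESS approximation in (5), together with its two extreme cases (perfectly uniform weights and total degeneracy):

```python
import numpy as np

def ess(w_bar):
    """Approximate effective sample size, eq. (5): 1 / sum of squared
    normalized weights. Ranges from 1 (degenerate) to M (uniform)."""
    return 1.0 / np.sum(w_bar ** 2)

M = 100
w_uniform = np.full(M, 1.0 / M)      # all samples equally weighted -> ESS = M
w_degen = np.zeros(M)
w_degen[0] = 1.0                     # one dominant sample -> ESS = 1
```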
III-B Algorithm Description
In this section, we formalize a novel algorithm, covariance adaptive importance sampling (CAIS), which guarantees robust covariance adaptation for a population of proposal distributions.

Initialization: Initialize the proposal parameters $\{\boldsymbol{\mu}_{n,1}, \boldsymbol{\Sigma}_{n,1}\}$ for $n = 1, \dots, N$ and set $t = 1$.

Generate samples: Draw $M$ samples from each mixand, $\mathbf{x}_{n,t}^{(m)} \sim q(\mathbf{x}; \boldsymbol{\mu}_{n,t}, \boldsymbol{\Sigma}_{n,t})$, for $m = 1, \dots, M$ and $n = 1, \dots, N$.

Compute weights: Compute the standard importance weight of each sample,

$$w_{n,t}^{(m)} = \frac{\pi(\mathbf{x}_{n,t}^{(m)}|\mathbf{y})}{q(\mathbf{x}_{n,t}^{(m)}; \boldsymbol{\mu}_{n,t}, \boldsymbol{\Sigma}_{n,t})}, \qquad (6)$$

and normalize the weights of each mixand so that $\sum_{m=1}^{M} \bar{w}_{n,t}^{(m)} = 1$.

Compute local ESS: For each set of samples and normalized weights, $\{\mathbf{x}_{n,t}^{(m)}, \bar{w}_{n,t}^{(m)}\}_{m=1}^{M}$, compute the local ESS, $\widehat{\text{ESS}}_{n,t}$, by applying eq. (5).

Update mean: Update the mean of each mixand by taking the weighted sample mean,

$$\boldsymbol{\mu}_{n,t+1} = \sum_{m=1}^{M} \bar{w}_{n,t}^{(m)} \mathbf{x}_{n,t}^{(m)}. \qquad (7)$$

Update covariance: For each mixand, if the local ESS exceeds the threshold $\varepsilon_N$, adapt $\boldsymbol{\Sigma}_{n,t+1}$ via the weighted covariance (4) using the standard weights; otherwise, transform the weights with a weight transformation function (see Section III-D) and apply (4) with the transformed weights.
Check stopping condition: If $t = T$, then return $\{\mathbf{x}_{n,t}^{(m)}, \bar{w}_{n,t}^{(m)}\}$ for $n = 1, \dots, N$, $m = 1, \dots, M$, and $t = 1, \dots, T$. Otherwise, set $t = t + 1$ and go to Step 2.
III-C Algorithm Summary
The novel technique adapts both the mean and the covariance matrix for a population of proposal distributions. We emphasize that the untransformed weights are used in order to adapt the mean of each proposal, while the covariance adaptation is conditioned on the local ESS of the samples drawn by each proposal. If the local ESS is above a threshold $\varepsilon_N$, the untransformed weights are used for the covariance matrix adaptation. Otherwise, the importance weights are transformed using a weight transformation function and the covariance is adapted using the transformed weights, yielding a wider and hence more stable proposal.
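The conditional adaptation rule just described can be sketched as follows for a single mixand. The function names, the tempering transform, and the threshold value are illustrative assumptions, not the paper's exact specification:

```python
import numpy as np

def local_ess(w_bar):
    # Approximate ESS, eq. (5)
    return 1.0 / np.sum(w_bar ** 2)

def adapt_proposal(x, w_bar, eps_N, transform):
    """One CAIS-style adaptation step for a single mixand (sketch).
    The mean always uses the raw normalized weights; the covariance
    uses transformed weights only when the local ESS is below eps_N."""
    mu = w_bar @ x                        # mean update, eq. (7)
    if local_ess(w_bar) >= eps_N:
        v = w_bar                         # healthy weights: adapt as usual
    else:
        v = transform(w_bar)              # degenerate: widen via transform
        v = v / v.sum()
    xc = x - v @ x
    cov = (xc * v[:, None]).T @ xc        # weighted covariance, eq. (4)
    return mu, cov

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))
logw = 40.0 * rng.random(50)              # severely skewed toy weights
w_bar = np.exp(logw - logw.max())
w_bar /= w_bar.sum()

temper = lambda w: w ** 0.1               # hypothetical tempering transform
mu, cov = adapt_proposal(x, w_bar, eps_N=10.0, transform=temper)
```

In the degenerate case the transformed weights spread mass over many more samples, so the adapted covariance stays full-rank instead of collapsing.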
III-D Choice of the Weight Transformation Function
III-D1 Weight Clipping
One choice for the transformation is the clipping function utilized in NPMC. NPMC transforms the weights by building a sorted permutation of the unnormalized weights as

$$w^{(i_1)} \geq w^{(i_2)} \geq \cdots \geq w^{(i_M)}, \qquad (8)$$

and then transforming the subset of the $M_T$ greatest weights as

$$\tilde{w}^{(i_k)} = w^{(i_{M_T})}, \quad k = 1, \dots, M_T. \qquad (9)$$

This choice guarantees that the ESS determined by the transformed weights is bounded below by $M_T$. Intuitively, in the case of extreme degeneracy, this can be interpreted as adapting the covariance matrix using the ML estimate of the covariance considering only the $M_T$ highest-weighted samples.
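A sketch of the clipping transformation of eqs. (8)-(9), assuming unnormalized weights stored in a NumPy array (the weight values and `M_T` below are arbitrary illustrations):

```python
import numpy as np

def clip_weights(w, M_T):
    """NPMC-style clipping, eqs. (8)-(9): sort the unnormalized weights
    and set the M_T largest equal to the M_T-th largest."""
    w = np.asarray(w, dtype=float).copy()
    idx = np.argsort(w)[::-1]           # sorted permutation, largest first
    w[idx[:M_T]] = w[idx[M_T - 1]]      # flatten the top M_T weights
    return w

def ess(w):
    w_bar = w / w.sum()
    return 1.0 / np.sum(w_bar ** 2)

rng = np.random.default_rng(3)
w = np.exp(30.0 * rng.random(100))      # severely degenerate toy weights
w_clipped = clip_weights(w, M_T=20)
# ESS of the clipped weights is bounded below by M_T = 20
```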
III-D2 Weight Tempering
While (9) guarantees that the minimum ESS condition is satisfied, it also forces the largest weights to take identical values, eliminating any information about which of these weights are most representative of the target distribution. An alternative choice, which offers more flexibility, is the weight tempering function,

$$\tilde{w}^{(m)} = \left(w^{(m)}\right)^{\gamma}, \qquad (10)$$

where $\gamma \in (0, 1]$ controls the transformation of the importance weights.
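Eq. (10) is a one-liner; the toy weights below illustrate that smaller values of $\gamma$ flatten the weights and hence raise the ESS:

```python
import numpy as np

def temper_weights(w, gamma):
    """Weight tempering, eq. (10): raise unnormalized weights to a power
    gamma in (0, 1]. gamma = 1 recovers the original weights; gamma -> 0
    flattens them toward uniform."""
    return np.asarray(w, dtype=float) ** gamma

def ess(w):
    w_bar = w / w.sum()
    return 1.0 / np.sum(w_bar ** 2)

w = np.array([1e-8, 1e-4, 1.0, 2.0, 1e3])   # skewed toy weights
ess_1 = ess(temper_weights(w, 1.0))          # near 1: heavily degenerate
ess_05 = ess(temper_weights(w, 0.5))         # milder degeneracy
ess_01 = ess(temper_weights(w, 0.1))         # flattest, largest ESS
```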
Note that all strategies transform the importance weights but preserve the position of the samples. More advanced mechanisms could be devised by implementing covariance estimators where the samples are also transformed, following the ideas of [21].
III-E Choice of Parameters
The choice of the parameter $\varepsilon_N$ depends on the dimension of the target, $d_x$. In order to guarantee stability of the covariance update, we must have $\varepsilon_N > d_x$. However, we note that under poor initialization of the proposal distribution, a choice of $\varepsilon_N$ that is large (relative to $M$) will force a strong nonlinear transformation of the importance weights, regardless of the transformation chosen. Consequently, the adapted covariance matrix will not differ from the one from the previous iteration, resulting in slower convergence to the true target.
The weight tempering exponent $\gamma$ should be chosen such that the ESS of the transformed weights is at least $\varepsilon_N$. In order to find the optimal $\gamma$, it is sufficient to employ a combination of a loose grid search and a fine grid search, similar to the approach used in [22]. For a $\gamma$ that is suboptimal (i.e., one that merely satisfies the ESS bound rather than attaining it exactly), the computational complexity of a standard one-dimensional grid search is linear in the number of grid points and samples. For a reasonable choice of grid resolution (not too fine), this is overshadowed by the computational complexity of each covariance matrix update, which is $\mathcal{O}(M d_x^2)$.
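One possible (coarse) realization of this search, assuming tempering as the transformation; the grid resolution, target ESS, and toy weights are illustrative assumptions:

```python
import numpy as np

def ess(w):
    w_bar = w / w.sum()
    return 1.0 / np.sum(w_bar ** 2)

def find_gamma(w, ess_min, grid=None):
    """Coarse 1-D grid search for the mildest tempering exponent gamma
    whose transformed weights (eq. (10)) satisfy the target ESS (sketch)."""
    if grid is None:
        grid = np.linspace(0.01, 1.0, 100)
    for gamma in np.sort(grid)[::-1]:      # try the largest gamma first
        if ess(w ** gamma) >= ess_min:
            return float(gamma)
    return float(np.min(grid))             # fall back to the flattest transform

rng = np.random.default_rng(4)
w = np.exp(40.0 * rng.random(200))         # degenerate raw weights
gamma = find_gamma(w, ess_min=50.0)        # mildest gamma meeting the bound
```

Preferring the largest feasible $\gamma$ keeps the transformation as mild as possible, retaining the most information from the raw weights while still meeting the ESS bound.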
IV Numerical Examples
IV-A Example 1: Unimodal Gaussian Target
We consider a toy example to demonstrate fundamental differences between existing AIS schemes which adapt the covariance matrix and the novel strategy. In particular, the target of interest is a 10-dimensional unimodal Gaussian distribution with a fixed mean and a nondiagonal covariance matrix randomly generated from a Wishart distribution. We use a single proposal distribution for each method, initializing its mean far from that of the target and its covariance matrix as a scaled identity matrix; this is a poor initialization since the proposal is far from the mean of the target distribution. We draw $M$ samples at each of $T$ iterations. For the novel covariance adaptation strategy, we fix $\varepsilon_N$ to a value greater than the dimension $d_x = 10$. The results are averaged over independent Monte Carlo runs.

Results for this example are shown in Figure 1. We tested four different schemes which continuously adapt the covariance matrix after each iteration of the algorithm, including the basic AIS algorithm from Table I, with proposal adaptation using eqs. (4) and (7). In the left-hand plot of Figure 1, we can see that the basic AIS algorithm adapts to a peaky proposal distribution, resulting in limited diversity in the generation of samples due to weight degeneracy. The NPMC algorithm manages to keep the adaptation of the covariance matrix more robust. However, due to the effect of the transformed weights on the adaptation of the location parameter, the proposal distribution remains far from the target. The novel covariance adaptation strategies adapt the best, with the strategy utilizing the weight tempering function performing best.
The center plot shows the sorted eigenvalues of the adapted covariance matrix and compares them to those of the target distribution. The basic AIS method results in a covariance matrix whose eigenvalues are practically zero. The NPMC method manages to keep the eigenvalues nonzero, but fails to adapt to the covariance matrix of the target. The novel strategy works best with either choice of weight transformation function.
The right-hand plot shows the evolution of the Kullback-Leibler (KL) distance between the target and the proposal as a function of the iteration of the algorithm. This plot emphasizes the advantage of utilizing the weight tempering function over the weight clipping function. We can see that with the weight tempering function the KL distance reaches its minimum after only a few iterations, whereas the weight clipping function requires considerably more iterations to reach this value.
IV-B Example 2: High-Dimensional Multimodal Target
TABLE II

Method
NPMC [best]  8.8445  8.0085  5.7241  2.3912  1.8289  5.6683  10.1791 
NPMC [worst]  16.3521  17.1335  14.8301  12.7081  9.6968  11.3779  16.3428 
APIS [best]  2.6422  0.5863  0.3556  3.2842  12.9770  17.9266  22.6413 
APIS [worst]  3.5907  1.1710  0.6811  5.0860  14.3068  18.5037  23.7821 
DMPMC [best]  3.0323  0.9643  1.2454  5.7601  14.1088  17.6142  21.8480 
DMPMC [worst]  15.2424  16.7463  18.1355  15.1963  18.6797  23.4776  34.9426 
CAIS [C1]  4.7118  2.8164  1.2644  0.3865  0.4045  1.5499  5.9985 
CAIS [C2]  6.7016  2.9347  0.4929  0.1780  0.3214  0.7211  4.6918 
CAIS [T1]  1.7740  0.3834  0.2185  0.2378  0.5632  2.1462  8.8714 
CAIS [T2]  0.5793  0.1931  0.0995  0.1362  0.4035  0.9717  7.9077 
We now consider the following multimodal target distribution,

$$\pi(\mathbf{x}) = \frac{1}{k} \sum_{i=1}^{k} \mathcal{N}(\mathbf{x}; \boldsymbol{\nu}_i, \boldsymbol{\Sigma}_i), \qquad (11)$$
where $\boldsymbol{\nu}_i \in \mathbb{R}^{d_x}$, for $i = 1, \dots, k$. The covariance matrices $\boldsymbol{\Sigma}_i$ are nondiagonal and were randomly generated from the Wishart distribution. Our objective is to estimate the mean of the target. The prior mean of each proposal distribution is drawn uniformly within a hypercube; this is considered a good initialization since all modes of the target distribution are contained in this hypercube. We assume a prior covariance matrix for each proposal given by a scaled identity matrix. We test for a constant total number of samples, using $N$ mixands over $T$ iterations, where we draw $M$ samples from each proposal at each iteration. The results are averaged over independent Monte Carlo simulations. The simulation results are summarized in Table II. The novel scheme (CAIS) is compared to three different state-of-the-art AIS algorithms: NPMC, DMPMC, and adaptive population importance sampling (APIS). The results indicate that the novel scheme outperforms all three schemes under their best parameter configurations for each value considered.
V Conclusions
In this work, we addressed the problem of proposal parameter adaptation in adaptive importance sampling methods for high-dimensional scenarios. The newly proposed methods adapt the location (mean) parameters using the standard importance weights, while the adaptation of the scale (covariance matrix) parameters is conditioned on the effective sample size and utilizes transformed importance weights. This robust adaptation improves the performance of AIS and also allows for better reconstruction of the target distribution as a mixture of kernels with different adapted covariances. Simulation results show excellent performance of the novel strategies compared to other state-of-the-art algorithms. Furthermore, the novel adaptation methods have the potential to extend to other adaptive importance sampling methods, improving their performance for higher-dimensional systems.
References
 [1] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, vol. 95, 2004.
 [2] E. Veach and L. Guibas, “Optimally combining sampling techniques for Monte Carlo rendering,” Proceedings of the 22nd Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 419–428, 1995.
 [3] A. Owen and Y. Zhou, “Safe and Effective Importance Sampling,” Journal of the American Statistical Association, vol. 95, no. 449, pp. 135–143, 2000.
 [4] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, “Generalized multiple importance sampling,” arXiv preprint arXiv:1511.03095, 2015.
 [5] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, “Efficient multiple importance sampling estimators,” Signal Processing Letters, IEEE, vol. 22, no. 10, pp. 1757–1761, 2015.
 [6] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, “Heretical multiple importance sampling,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1474–1478, 2016.
 [7] M. Oh and J. O. Berger, “Adaptive importance sampling in Monte Carlo integration,” Journal of Statistical Computation and Simulation, vol. 41, no. 3-4, pp. 143–168, 1992.
 [8] S. K. Au and J. L. Beck, “A new adaptive importance sampling scheme for reliability calculations,” Structural Safety, vol. 21, no. 2, pp. 135–158, 1999.
 [9] O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert, “Population Monte Carlo,” Journal of Computational and Graphical Statistics, vol. 13, no. 4, pp. 907–929, 2004.
 [10] T. Bengtsson, P. Bickel, and B. Li, “Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems,” Probability and Statistics: Essays in Honor of David A. Freedman, vol. 2, pp. 316–334, 2008.
 [11] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, “Improving population Monte Carlo: Alternative weighting and resampling schemes,” Signal Processing, vol. 131, pp. 77–91, 2017.
 [12] E. L. Ionides, “Truncated importance sampling,” Journal of Computational and Graphical Statistics, vol. 17, no. 2, pp. 295–311, 2008.
 [13] E. Koblents and J. Míguez, “A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models,” Statistics and Computing, vol. 25, no. 2, pp. 407–425, 2013.
 [14] A. Vehtari and A. Gelman, “Pareto smoothed importance sampling,” arXiv preprint arXiv:1507.02646, 2015.
 [15] O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert, “Adaptive importance sampling in general mixture classes,” Statistics and Computing, vol. 18, no. 4, pp. 447–459, 2008.
 [16] J. M. Cornuet, J. M. Marin, A. Mira, and C. P. Robert, “Adaptive multiple importance sampling,” Scandinavian Journal of Statistics, vol. 39, no. 4, pp. 798–812, 2012.
 [17] L. Martino, V. Elvira, D. Luengo, and J. Corander, “An Adaptive Population Importance Sampler: Learning from Uncertainty,” IEEE Transactions on Signal Processing, vol. 63, no. 16, pp. 4422–4437, 2015.
 [18] O. Ledoit and M. Wolf, “Honey, I Shrunk the Sample Covariance Matrix,” The Journal of Portfolio Management, vol. 30, no. 4, pp. 110–119, 2004.
 [19] A. Kong, “A note on importance sampling using standardized weights,” University of Chicago, Dept. of Statistics, Tech. Rep, vol. 348, 1992.
 [20] L. Martino, V. Elvira, and F. Louzada, “Effective sample size for importance sampling based on discrepancy measures,” Signal Processing, vol. 131, pp. 386–401, 2017.
 [21] L. Martino, V. Elvira, and G. CampsValls, “Group importance sampling for particle filtering and mcmc,” arXiv preprint arXiv:1704.02771, 2017.
 [22] C. Hsu, C. Chang, and C. Lin, “A Practical Guide to Support Vector Classification,” Technical Report, Department of Computer Science, National Taiwan University, 2003.