1 Introduction
Markov chain Monte Carlo (MCMC) methods are principally used to perform Monte Carlo integration, which has become a standard statistical tool. Specifically, MCMC methods produce samples from a target distribution π by using an ergodic Markov chain with stationary distribution π. Typically, MCMC methods are used when it is difficult to sample from the target distribution directly, e.g. when the normalisation constant is unknown. There are many ways to construct this Markov chain, which has led to many variations of MCMC methods; see, e.g., Brooks et al. (2011).
The classic MCMC method is the Metropolis-Hastings algorithm (Metropolis et al., 1953). At each iteration, the Metropolis-Hastings algorithm updates the entire current state (i.e. all components of the random vector generated at the previous iteration) at once. However, updating individual components, or subsets of components, is also possible. Indeed, this type of component-wise updating was initially proposed in Metropolis et al. (1953), but did not receive much attention at first. In this paper, we focus on updating individual components and refer to the resulting method as the single-component Metropolis-Hastings algorithm.
Another variant of MCMC sampling is the multiple-try method (Liu et al., 2000), where several proposals, or trials, are suggested at each iteration, as opposed to a single proposal. The motivation behind the multiple-try method is that more of the space is explored, at the expense of an increased computational cost (i.e. proposal generation and evaluation of the acceptance criterion).
Recently, in Yang et al. (2019), the authors introduced a component-wise, multiple-try MCMC method with Gaussian proposals for each trial, where each univariate proposal has a different variance. In this work, we introduce a new class of proposal distributions for use in a component-wise, multiple-try MCMC method. These proposals, called Plateau distributions, do not overlap, in order to exploit the multiple-try nature of the method. Indeed, by using proposals that do not overlap for each trial, the Markov chain is forced to explore different parts of the state space. Conversely, using proposals that overlap, e.g. Gaussians with different variances, can lead to an inefficient algorithm, as the trials tend to come from a similar region of the state space.
Using Plateau distributions for the trial distributions leads to better exploration of the space than standard Gaussian proposals. The idea is intuitive, easy to implement, and leads to good results in the sense of exploring the state space. Moreover, using the Plateau proposals leads to a reversible Markov chain with the target distribution as its invariant distribution; see, e.g., Yang et al. (2019).
As is common for proposals used in MCMC methods, the Plateau distributions have parameters that need to be selected and tuned with respect to the target distribution to obtain an effective MCMC algorithm. Here, we propose an adaptive procedure that can be used to automatically tune these parameters as the MCMC progresses. Adaptation of MCMC methods typically entails tuning the parameter(s) of a class of proposal distributions, e.g. the variance parameter of a normal distribution, in order to improve the convergence properties of the Markov chain. For instance, in Haario et al. (2001), a Gaussian proposal distribution is used whose variance (or covariance) is updated using the previously generated states of the Markov chain.
The goal of this work is thus to introduce a general class of proposal distributions, for use in a component-wise multiple-try MCMC method, that leads to good results for a variety of target distributions by combining improved state space exploration with adaptation to the target distribution.
The remainder of this paper is organised as follows. In Section 2.1 a generic component-wise multiple-try algorithm is presented. The novel class of Plateau proposals is introduced and discussed in Section 3. In Section 4 we discuss how to adaptively select the parameters of the Plateau proposals and offer a detailed algorithmic description of the complete method. The performance of our new method is then compared with other MCMC methods in Section 5. Finally, a commentary on improvements and a summary are provided in Section 6.
2 Component-wise Update with Multiple Tries
Let π be a probability density function on R^d, with d ≥ 1. Our main interest is to sample from π; this is the target distribution. We assume that sampling directly from π is difficult or impossible, for example because π may only be known up to a multiplicative constant. In order to sample from π we use MCMC methods. Specifically, given the current state x of the Markov chain, the Metropolis-Hastings algorithm, one of the simplest MCMC methods, proposes a candidate y for the chain's next state by drawing a random variable from a distribution with probability density function (PDF) q(·|x), from which it is easier to sample than from the target distribution. Here q(·|x), the conditional density given the current value x, is what is commonly known as the proposal distribution. For example, the random-walk proposal (Metropolis et al., 1953; Hastings, 1970) uses a multivariate normal distribution, which we will write as N(x, σ²I) with σ > 0 given and I being the identity matrix. Another example is N(x + (σ²/2)∇log π(x), σ²I) with σ > 0 fixed, which is the proposal used in the Metropolis-adjusted Langevin algorithm (Roberts and Tweedie, 1996). The realisation of the Markov chain's next state is then selected by means of an accept-reject procedure such that the resulting Markov chain's stationary distribution is π.
One of the main difficulties when using an MCMC method is the choice of the proposal distribution. In particular, this choice may significantly affect the properties of the MCMC method, including the speed of convergence to equilibrium and its mixing properties (Rosenthal, 2011). Typically, the proposal distribution is selected from some family of well-known distributions, e.g., the family of normal distributions. It is noteworthy that optimal MCMC performance in high dimensions requires selecting the "scale" of the proposal appropriately (e.g. see Roberts and Rosenthal, 2001). This problem of scaling may become increasingly challenging as the number of dimensions increases.
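The accept-reject mechanism described above can be sketched in a few lines. The following is a minimal illustration of one random-walk Metropolis step with a N(x, σ²I) proposal; the function and variable names are our own, not taken from the paper.

```python
import numpy as np

def rw_metropolis_step(x, log_target, sigma, rng):
    """One random-walk Metropolis step with proposal N(x, sigma^2 I).

    `log_target` is the log of the (possibly unnormalised) target density,
    so the normalisation constant is never needed.
    """
    y = x + sigma * rng.standard_normal(x.shape)   # symmetric proposal
    log_alpha = log_target(y) - log_target(x)      # log acceptance ratio
    if np.log(rng.uniform()) < log_alpha:
        return y, True                             # move accepted
    return x, False                                # move rejected

# Example: sample from a standard 2-D Gaussian target.
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * np.dot(x, x)
x = np.zeros(2)
for _ in range(1000):
    x, _ = rw_metropolis_step(x, log_target, sigma=1.0, rng=rng)
```

Because only the ratio of target densities appears, the multiplicative constant of π cancels, which is exactly why MCMC is useful when the normalisation constant is unknown.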
Instead of proposing a single multivariate candidate by updating all components of the current state simultaneously via a (global) proposal distribution, it is also possible to split the state space into its individual components (or small groups of components) and propose candidates for each component (or group of components) independently. This local (or projected) approach is intuitive and computationally efficient, and it reduces the problem of selecting one multidimensional proposal to that of selecting several lower-dimensional proposals that are easier to handle. MCMC methods using these types of updates are called component-wise MCMC methods, and they may use a different proposal distribution for each component of the state (Gilks et al., 1995, Ch. 1). In this work we focus on one-dimensional, component-wise proposals. We emphasise, however, that the ideas that follow are not restricted to one-dimensional components and are in fact general.
Considering independent one-dimensional, component-wise proposals is equivalent to the proposal distribution given x being separable, in the sense that

    q(y|x) = ∏_{i=1}^{d} q_i(y_i | x),    (1)

for any y = (y_1, …, y_d). Here, q_i(·|x), with i = 1, …, d, denotes the one-dimensional proposal density given x used to draw the candidate y_i for component i. That is, the proposed (global) candidate y is obtained by sampling each y_i mutually independently from any other y_j, j ≠ i. Notice that for d = 1 this one-dimensional, component-wise proposal step is identical to the standard MCMC method, and hence it can be considered a natural extension to the multivariate case.
While the component-wise candidate proposal (1) is computationally efficient, by construction it does not account for correlations between the components. Consequently, if π has highly correlated components, the proposed candidates may not be good representatives of the target distribution (i.e. most candidates will be rejected, resulting in a very low acceptance rate). These "uninformed" candidates may therefore lead to poor state space exploration and thus to poor performance of the MCMC method. To remedy these defects, we combine component-wise proposals with multiple tries for each component. Specifically, the multiple-try technique works by proposing many candidates from a proposal distribution, rather than just a single one, amongst which the "best" one is selected. Each trial may be proposed from a different proposal distribution. Thus, in combination with component-wise proposals, we independently generate N possible candidates (i.e., trials) for each component i. Let q_{i,j}(·|x), j = 1, …, N, denote the proposal PDF of the j-th trial for the i-th component given the current state x.
2.1 Generic Procedure of Component-wise Multiple-Try Metropolis
A generic component-wise multiple-try algorithm is now described – the full pseudocode for generating samples from the target distribution is presented in Algorithm 1. Each MCMC step (or iteration) of the algorithm involves drawing multiple trials and then performing an acceptance-rejection step for each component sequentially.
We now describe the intuition behind the steps of the procedure. Suppose that the state of the chain at the beginning of an iteration is x. For the first component (i.e., i = 1), N independent trials, y_{1,1}, …, y_{1,N}, are drawn from q_{1,1}(·|x), …, q_{1,N}(·|x). These trials are then weighted according to

    w_{1,j}(y_{1,j} | x) = π(x[1 ← y_{1,j}]) q_{1,j}(y_{1,j} | x) λ_{1,j}(x, y_{1,j}),    j = 1, …, N,

where x[i ← y] denotes the vector that is identical to x except for its i-th component, which is replaced by y.
The functions λ_{i,j} are nonnegative functions, symmetric in their two arguments, which are selected by the user. Further, it is required that λ_{i,j}(x, y) > 0 whenever q_{i,j}(y|x) > 0. Each trial y_{1,j}, j = 1, …, N, has an associated weight w_{1,j}. A candidate for the first component of the chain's next state is then randomly selected amongst all trials according to these weights. The selected candidate is then accepted or rejected. The remaining components of the chain's state are updated in order in a similar fashion; Algorithm 1 gives detailed pseudocode of the corresponding MCMC method.
Specifically, to propose a candidate for the i-th component of the Markov chain's next state in step 5 of Algorithm 1, each trial y_{i,j} for that component has an associated weight w_{i,j}, defined analogously. As described above, a single trial is then randomly selected with probabilities proportional to these weights. There are a number of choices for the functions λ_{i,j} in the multiple-try literature (Liu et al., 2000). In Yang et al. (2019) the authors suggest using
(2) 
where the parameter value was chosen following a simulation study focusing on large moves in the state space. In simulations, not reported here, we found that this choice performed best in terms of the mean squared error for a variety of target distributions; consequently we use it for the remainder of the paper. We note that it is beyond the scope of this paper to propose a particular form for the class of functions λ_{i,j}, due to the problem-dependent nature of this choice. Instead, we advocate that users perform a trial run with a variety of choices to determine which is best suited to their application and performance metric.
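The component-wise multiple-try step described above can be sketched as follows. This is an illustrative implementation under the simplifying assumption of symmetric trial proposals with λ chosen so that each weight reduces to the target density alone (a standard multiple-try variant; see Liu et al., 2000); the function names and the per-trial uniform proposals in the example are our own, not the paper's.

```python
import numpy as np

def mtm_component_step(x, i, log_target, proposals, rng):
    """One multiple-try update of component i, weighting each trial by the
    target density (valid for symmetric trial proposals).

    `proposals` is a list of N samplers; proposals[j](center) returns a draw
    for the j-th trial of component i, centred at `center`.
    """
    N = len(proposals)

    def weight(value):
        z = x.copy()
        z[i] = value                      # state with component i replaced
        return np.exp(log_target(z))

    # Draw one trial per proposal and weight each by the target density.
    trials = np.array([proposals[j](x[i]) for j in range(N)])
    w = np.array([weight(t) for t in trials])
    if w.sum() == 0:
        return x
    j_star = rng.choice(N, p=w / w.sum())
    y = trials[j_star]

    # Reference points: re-draw from the proposals centred at the candidate,
    # with the j*-th reference replaced by the current value.
    refs = np.array([proposals[j](y) for j in range(N)])
    refs[j_star] = x[i]
    w_ref = np.array([weight(r) for r in refs])

    # Multiple-try acceptance ratio.
    if rng.uniform() < min(1.0, w.sum() / w_ref.sum()):
        z = x.copy()
        z[i] = y
        return z
    return x
```

A full component-wise sweep simply calls this step for i = 1, …, d in turn, as in the inner loop of Algorithm 1.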
3 Non-Overlapping Proposal Distributions
In principle, any family of proposals can be used in Algorithm 1. However, a careful choice of the proposals can lead to a more efficient algorithm.
Recall that q_{i,j}(·|x) is the j-th trial proposal distribution for the i-th component. As mentioned earlier, the motivation for using multiple tries is to explore a larger region of the state space than is achieved by using a single proposal. Therefore, it would not be beneficial to use proposals that are similar. To illustrate this, suppose that for a fixed component we use Gaussian proposals with increasing variances, centred at the current value, which we take to be 0 without loss of generality. The probability density functions of these proposals are presented in Figure 1.
As illustrated in Figure 1, these proposals are very similar. Indeed, 95% of the narrowest proposal's density mass lies within an interval centred around 0 whose half-width is Φ^{-1}(0.975) standard deviations, where Φ^{-1} denotes the inverse of the cumulative distribution function of a standard normal. A substantial fraction of the second proposal's mass lies within this same interval; that is, draws from the first and second proposals will be located in the same region with high probability. Similar arguments hold for the wider Gaussian proposals. Thus draws from these Gaussian proposals will tend to be similar, leading to an inefficient use of the multiple-try technique. To avoid sampling similar values for each trial, we seek densities which do not overlap (or overlap only to a small degree). Specifically, we advocate using proposals of the type illustrated in Figure 2(b).
Each trial proposal for each component combines uniform distributions with exponentially decaying tails; the amount of overlap is thus controlled through how fast the tails decay. Specifically, we first introduce a PDF that is constant on a central interval and has exponentially decaying tails outside it, with the appropriate normalisation constant; this PDF is illustrated in Figure 2(a). For each component i with given value x_i, we then set the PDF of each trial proposal to be a copy of this density shifted to one side of x_i; the trial proposals for one particular choice of the parameters are shown in Figure 2(b). We shall refer to proposals of this type as Plateau proposals, given the shape of their PDFs. One parameter controls the width of the central plateau, centred at the current state x_i; another gives the width of the other plateaus; a further parameter controls the decay of the tails on either side of the inner plateaus; and the outer tails of the outermost (N-th) proposal have their own decay parameter.
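The building block of this construction can be sketched as follows: a density that is flat on an interval of half-width h around a centre, with tails decaying at rate λ, normalised by 2h + 2/λ. The parameterisation and names below are our own illustration, not the paper's exact displayed formula.

```python
import numpy as np

def plateau_logpdf(u, center, half_width, rate):
    """Log-density of a single plateau-with-exponential-tails distribution:
    flat on [center - half_width, center + half_width], tails decaying like
    exp(-rate * distance) outside.  Normalising constant: 2*half_width + 2/rate.
    """
    d = np.abs(u - center) - half_width
    log_unnorm = np.where(d <= 0, 0.0, -rate * np.maximum(d, 0.0))
    return log_unnorm - np.log(2 * half_width + 2.0 / rate)

def sample_plateau(center, half_width, rate, rng):
    """Exact draw: pick the flat region or a tail with the correct
    probabilities, then sample uniformly or exponentially."""
    p_flat = 2 * half_width / (2 * half_width + 2.0 / rate)
    if rng.uniform() < p_flat:
        return center + rng.uniform(-half_width, half_width)
    side = 1 if rng.uniform() < 0.5 else -1
    return center + side * (half_width + rng.exponential(1.0 / rate))
```

Shifting the centre to either side of the current value x_i, with widths as described above, yields a bank of essentially non-overlapping trial proposals.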
To compare with the earlier coverage calculations for the Gaussian proposals: almost all of the density mass of each Plateau proposal lies within its own plateau, and the overlap between neighbouring Plateau proposals is reduced by almost a factor of two compared with the overlapping Gaussian proposals, while the overlap with proposals further out is smaller still. Thus the Plateau proposals overlap less than in the Gaussian example and, further, the extent of the overlap is controlled by the width and decay parameters.
Note that each Plateau proposal distribution has support on the whole real line. This ensures that the support of the target distribution is included within the support of the proposals; in theory, this allows the Markov chain to explore the entire support of the target distribution. In practice, however, by selecting the decay parameter appropriately, the tails of the distributions decay to zero very quickly, making the inner proposals effectively uniform distributions for the purposes of numerical simulation. Therefore, in practice one could elect to sample from these distributions using direct draws from uniform distributions to increase computation speed. However, in all the experiments performed in this paper, we sample from the exact Plateau proposal distributions.
Note that the suggested Plateau proposals are very general; the choice of the proposal parameters allows for diverse sets of proposals. In the remainder of the paper, we set the tail-decay parameters to a common value, so that the tails of the outermost proposal decay at the same rate on both sides.
Using the set of Plateau proposals requires that four parameters be selected. The selected values for these parameters determine the movement of the Markov chain and thus its performance with respect to a particular target distribution. In principle, one could set the values of the parameters in an ad hoc manner, e.g., by a manual search over the parameter space until a certain acceptance rate is achieved. A more practical approach is to automatically tune the parameters as the algorithm runs. In fact, in Section 4 an adaptation procedure that tunes the proposal width parameter is introduced, so that only the three remaining parameters need to be selected by the user. In simulations we found parameter values that resulted in good performance for a variety of target distributions, and these have been used throughout this work.
4 Adaptation of Plateau Proposals
As is typically the case when working with MCMC methods, proposals involving parameters need to be appropriately tuned for the algorithm to be effective. Instead of manually tuning the parameters, an automated method can be used to adapt the proposals as the MCMC procedure runs. These adaptive methods use the information revealed so far to tune the parameters of the proposals. For instance, Haario et al. (2001) propose updating the covariance of a multivariate normal proposal using an empirical estimate of the covariance of the target. We now describe a method to adapt the Plateau proposal distributions defined in Section 3.
In the innermost for-loop of Algorithm 1 (steps 4 to 8) with Plateau proposals, only one trial is selected (step 6). The selected trial is associated with the non-overlapping proposal that generated it. Over the Markov chain's iterations, the frequency with which trials are selected from a particular proposal can be monitored, which offers additional insight into the state space exploration. We advocate a procedure that updates the Plateau proposals to avoid two undesirable scenarios: (i) the innermost proposal is selected too often; and (ii) the outermost proposal is selected too often. Scenario (i) suggests that the proposal distributions are too wide, such that trials are regularly being suggested near the previous state of the chain, i.e., the majority of moves occur within the central plateau around the current state. Conversely, scenario (ii) suggests that the proposal distributions are too narrow, such that trials are regularly being suggested in the "tails", far away from the current position.
We suggest the following adaptation of the Plateau proposals to counteract these scenarios. First, adaptation takes place at regular, predefined intervals. Within each interval, every proposal is selected some number of times; for each proposal, we track the number of times it has been selected up to the current MCMC iteration.
For scenario (i), if the innermost proposal's selection count is too high, then the widths of all the plateaus are halved and the proposals are shifted closer to the current state so as to leave no gaps.
For scenario (ii), if the outermost proposal's selection count is too high, then the plateau widths are doubled and the proposals are shifted away from the current state so as to leave no gaps.
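The halving/doubling rule can be sketched as follows. The threshold-based trigger and the parameter names are our own assumptions, standing in for the paper's precise selection-count criterion.

```python
import numpy as np

def adapt_plateau_widths(counts, w0, w, thresh):
    """Hypothetical adaptation rule: halve the plateau widths when the
    innermost proposal dominates (plateaus too wide), double them when the
    outermost proposal dominates (plateaus too narrow).

    counts[j] is how often trial proposal j was selected since the last
    adaptation; counts[0] / counts[-1] are the inner-/outermost proposal.
    w0 is the central plateau width, w the width of the other plateaus.
    """
    total = counts.sum()
    if total == 0:
        return w0, w
    if counts[0] / total > thresh:        # scenario (i): shrink
        return w0 / 2.0, w / 2.0
    if counts[-1] / total > thresh:       # scenario (ii): widen
        return w0 * 2.0, w * 2.0
    return w0, w                          # balanced: leave unchanged
```

After the widths are updated, the proposal centres are recomputed so that the shifted plateaus tile the axis around the current state with no gaps.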
The proposed adaptation is summarised in Algorithm 2, which can be inserted between steps 2 and 3 of the MCMC Algorithm 1. Note that the adaptation operation is only performed at the end of each adaptation interval, and then only with a probability that decreases with the iteration count. This ensures that the amount of adaptation reduces the longer the algorithm runs, and thus satisfies the diminishing adaptation condition; see, e.g., Roberts and Rosenthal (2007).
4.1 A note on the convergence of adaptive component-wise multiple-try algorithms
The convergence (in total variation distance) of algorithms of the form of Algorithm 1, combined with the adaptation described in Section 4, has been proven in Yang et al. (2019). Convergence is ensured by the algorithm – both the MCMC algorithm and the adaptation procedure – satisfying two conditions: diminishing adaptation and containment. As mentioned earlier, diminishing adaptation is satisfied by adapting with a vanishing probability. For containment to hold, two technical, but not practical, modifications are required – these follow directly from Yang et al. (2019) and are presented as quotations below with altered notation. The first modification is to
“ … choose a very large nonempty compact set and force for all . Specifically, we reject all proposals (but if , then we still accept/reject by the usual rule)… ”
The second modification which is altered for our proposed Plateau distributions is
“ … choose a very large constant and a very small constant and force the proposal width to always be in … ”
The proof then follows Section 3.5 in Yang et al. (2019).
5 Results
In this section, we compare the performance of the proposed adaptive component-wise, multiple-try MCMC method with other MCMC methods in simulations. The adaptive Plateau MCMC (AP) is compared with a Metropolis-Hastings algorithm with Gaussian proposals (MH) and two versions of an adaptive Gaussian MCMC (AG1 and AG2) as introduced in Yang et al. (2019). The difference between AG1 and AG2 is that AG1 uses (2) with the same weight parameter as in AP, while AG2 uses the value suggested in Yang et al. (2019).
The proposal distributions in both AG1 and AG2 are adapted as follows: if the proposal with the largest variance is under- (over-)selected, then its variance is halved (doubled). Conversely, if the proposal with the smallest variance is under- (over-)selected, then its variance is doubled (halved). After either of these two updates, the other variances are adjusted to be equidistant on a log scale. For the complete algorithm with this adaptation scheme see Yang et al. (2019).
For all simulations and methods with multiple tries, we fix the number of trials N. An investigation of the methods' performance for differing values of N is beyond the scope of this paper, and we leave it for future work. However, interesting discussions in that direction already exist; see Martino and Louzada (2017) for example.
The proposal variances in the AG methods are initialised at fixed values, and likewise the Plateau parameter values are initialised at fixed values for the AP method.
The proposal distributions used in the MH method depend on the particular target distribution and the choices used are summarised in Table 2 below. The various target distributions considered in the simulations are now introduced.
5.1 Target Distributions
In order to compare the aforementioned methods, we investigate their performances by applying them to sample from a variety of target distributions.
Mixture of Gaussians
Consider a mixture of two 4dimensional Gaussians
where
and
We refer to this target distribution as .
Banana Distribution
Consider the 8-dimensional "banana-shaped" distribution (Haario et al., 1999), which is defined as follows. Let f be the density of an 8-dimensional zero-mean normal distribution whose covariance is the identity except for a large first-component variance. The density function of the banana distribution with nonlinearity parameter b > 0 is given by f_b = f ∘ φ_b, where the function φ_b shifts the second component by a quadratic function of the first and leaves the remaining components unchanged.
The value of b determines the amount of nonlinearity of f_b. Here, we consider the target distribution f_b for a fixed value of b. Thus, the first two components of this distribution are highly correlated – see Figure 3(a).
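A common form of this construction (which we assume here; the paper's exact constants are not reproduced) composes a zero-mean Gaussian with inflated first-component variance s1 and the bending map φ_b, with the widely used defaults b = 0.03 and s1 = 100:

```python
import numpy as np

def log_banana(x, b=0.03, s1=100.0):
    """Log-density (up to an additive constant) of a banana-shaped target,
    following the usual Haario et al. (1999) construction.  The defaults
    b=0.03, s1=100 are common choices, not necessarily the paper's values.
    """
    y = x.astype(float).copy()
    y[1] = x[1] + b * x[0] ** 2 - s1 * b   # phi_b bends the second coordinate
    var = np.ones_like(y)
    var[0] = s1                            # inflated first-component variance
    return -0.5 * np.sum(y ** 2 / var)
```

Because the bend couples the first two coordinates, component-wise samplers with fixed proposal widths struggle here, which is what makes this a useful stress test.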
Distributions perturbed by oscillations
Another target we consider is a perturbed 2-dimensional Gaussian, whose probability density function is proportional to a Gaussian density modulated by an oscillatory perturbation. Figure 3(c) displays the unnormalised density. Lastly, we also consider a perturbed version of a 1-dimensional bistable distribution. Figure 3(d) displays its PDF, where the normalising constant is approximated by numerical integration.
5.2 Run parameters
Each simulation run of the MCMC methods was independently repeated a number of times, and for each run a burn-in period consisting of a fixed percentage of the MCMC iterations was used. During the burn-in period, AP, AG1 and AG2 were allowed to adapt their proposals. For each repetition, all methods started at the same random initial condition. The number of MCMC steps used for each method is presented in Table 1, and was determined by a trial run. In order to make fair comparisons, the number of MCMC steps performed by the Metropolis-Hastings algorithm is d times larger than for the multiple-try versions, because the multiple-try methods cycle over all d components in each iteration. This ensures that the number of times the target distribution is evaluated by each MCMC method, and hence the computational effort, is approximately the same.
Target                 | Dimension (d) | Adaptive | MH
Mixture of Gaussians   | 4             | 4,000    | 16,000
Banana                 | 8             | 10,000   | 80,000
2D perturbed Gaussian  | 2             | 3,000    | 6,000
1D perturbed bistable  | 1             | 3,000    | 3,000
The proposal distribution in the MH method for each target is presented in Table 2. Note that these proposals are based on the target distribution, which would typically be unknown in practice. Consequently, the MH method can be viewed as being optimally tuned. The particular scaling follows from Gelman et al. (1996).
Target  Proposal Distribution 

5.3 Simulation Results
For each target distribution, we compare the performance of the MCMC methods using measures which we now define. Denote the Markov chain produced by one of the MCMC methods in the r-th independent repetition by X^{(r)}_1, …, X^{(r)}_T, and denote the component-wise variances of the target by σ_1², …, σ_d².
We will use the integrated autocorrelation time (ACT) of the MCMC methods as a measure of performance. The ACT for the chain's i-th component is given by

    τ_i = 1 + 2 ∑_{k≥1} ρ_{i,k},

where ρ_{i,k} denotes the lag-k autocorrelation of the i-th component, provided that the chain is stationary. For every repetition of the MCMC method, the integrated autocorrelation times are estimated from the observed Markov chain component-wise using the initial sequence estimator introduced in Geyer (1992). With a slight abuse of notation, we will denote the resulting chain-based autocorrelation times by τ_i^{(r)}.
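As a sketch of the chain-based estimate, the following implements a variant of the initial-sequence idea: sum the empirical autocorrelations in adjacent pairs and truncate at the first pair whose sum is negative. This is a simplified variant in the spirit of Geyer (1992), not necessarily the exact estimator used in the paper.

```python
import numpy as np

def act_initial_sequence(chain):
    """Integrated autocorrelation time of a 1-D chain, truncating the sum
    of autocorrelations at the first adjacent pair with a negative sum."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Empirical autocovariances gamma_k = (1/n) sum_t x_t x_{t+k}.
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]                  # autocorrelations
    tau = 1.0
    for m in range(1, n // 2):
        pair = rho[2 * m - 1] + rho[2 * m]
        if pair < 0:                      # noise-dominated lags: stop
            break
        tau += 2.0 * pair
    return tau
```

For an i.i.d. sequence the estimate is close to 1; for a strongly autocorrelated chain it grows, directly reducing the effective sample size T/τ.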
Smaller autocorrelation times indicate that consecutive samples have lower correlation. Autocorrelation times are inversely proportional to the effective sample size (Kong, 1992; Kong et al., 1994), which is commonly used as a measure of performance. In fact, the effective sample size is often interpreted as the number of samples that would need to be drawn directly from the target in order to achieve, using independent samples, the same variance as that of an estimator of interest. Higher effective sample sizes, and therefore lower autocorrelation times, are desirable. Another way of interpreting the ACTs is through the accuracy of chain-based Monte Carlo integration: the mean squared error of a Monte Carlo estimator can be expressed as a sum of the component-wise ACTs weighted by the component-wise variances. Consequently, a method with lower ACTs will offer more accurate Monte Carlo integration for the same chain length. It is moreover noteworthy that an MCMC method's ACTs also characterise the asymptotic variance (the so-called time-averaged variance constant in this context) of a Monte Carlo estimator in the central limit theorem for Markov chains (see e.g. Asmussen and Glynn, 2007); lower ACTs lead to smaller time-averaged variance constants. Finally, we mention that in practice the target distribution is intractable and therefore the variances of the components are unknown; however, these variances can be estimated by the initial sequence estimator method.
Another measure of performance is the chain's average squared jump distance (ASJD), which, for the i-th component and repetition r, we define as the average of the squared differences between successive values of the i-th component along the chain.
The average squared jump distance measures the movement of the chain and is also linked with the acceptance rate of the MCMC method. Higher average squared jump distances are desirable, as they indicate larger moves and therefore more exploration of the space. We shall consider the ASJD as a measure of an MCMC method's ability to move around the state space.
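The ASJD can be computed directly from successive differences of the chain; the following is a straightforward sketch of the quantity described above (the function name is our own).

```python
import numpy as np

def asjd(chain):
    """Average squared jump distance per component: the mean of
    (X_{t+1} - X_t)^2 along the chain, computed for each coordinate.

    `chain` has shape (T, d): T iterations of a d-dimensional chain.
    """
    chain = np.asarray(chain, dtype=float)
    return np.mean(np.diff(chain, axis=0) ** 2, axis=0)
```

Rejected moves contribute zero jumps, so the ASJD jointly reflects the acceptance rate and the size of the accepted moves.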
In summary, in the following results we are interested in the ACTs and the ASJDs per component. The distributions of the ACTs and ASJDs over the repetitions will be presented using violin plots (Hintze and Nelson, 1998), which are boxplots with kernel density estimates attached to the sides. We use violin plots as opposed to plain boxplots to give a better illustration of the shape of the distributions. In the violin plots the median is represented by a horizontal line.
The corresponding results of the AP method without the adaptation procedure are presented in Appendix A for the reader’s convenience. These results indicate that the adaptation procedure is necessary for the MCMC method to effectively explore components with large or small variance relative to the chosen width of the Plateau proposals.
5.3.1 Mixture of Gaussians
The results for the 4-dimensional mixture of two Gaussians target are presented in Fig. 4. Fig. 4(a) indicates that the AP method achieves lower ACTs than the other methods for all components (including the MH method, which is not included in this figure due to its very high ACTs). Further, the range of ACT values suggests that the AP method consistently produces MCMC chains with lower ACTs.
In terms of the movement of the MCMC chains, the AP method outperforms the other methods for this target distribution. In fact, the ASJDs presented in Fig. 4(b) show that the AP method moves around the state space in larger jumps than the other methods. Since the AP and AG1 methods use the same weight function, as discussed in Section 2.1, this advantage is due to using the Plateau proposals instead of Gaussian proposals. These results suggest that the AP method is able to move between the two Gaussians of the target density efficiently.
5.3.2 Banana Distribution
The banana distribution is a difficult target to sample from due to the widely differing variances of its components and the correlations between the first two components, a consequence of the unusual banana shape. The ACTs for the methods, presented in Fig. 5(a), show similar results across AP, AG1 and AG2 for the first two components. However, for the remaining components the AP method achieves notably smaller ACTs. The ACT results for the MH method are substantially larger for all components and are thus not included in this figure. As an indication, the median ACTs per component for the MH method are: 580.77, 903.6, 52.09, 52.55, 52.13, 51.77, 52.27, 51.15.
The ASJDs for the methods are shown in Fig. 5(b). The AP method again outperforms the other methods by achieving higher ASJDs for all components. Note the wide range of jump distances produced for the first component when using the Plateau proposals. This suggests that the AP method is able to navigate the banana shape in the first component easily.
5.3.3 2D Perturbed Distribution
For the perturbed 2-dimensional distribution, the perturbations represent local modes where MCMC methods may potentially get stuck. Again, the AP method's ability to move slightly larger distances, as shown in Fig. 6(b), gives it a slight advantage over the other methods. This ability to jump further may explain the lower ACTs for the AP method – see Fig. 6(a).
5.3.4 1D Perturbed Distribution
Similar to the 2-dimensional case, the oscillations in the 1-dimensional perturbed target are potential areas where an MCMC method may get stuck. The ACTs for the AP, AG1 and AG2 methods are presented in Fig. 7(a). The AP method achieves the lowest ACTs; however, there are a few outliers which may indicate runs where the sampler got stuck in the local modes. This may also be the case for the AG1 method. For the MH method, the ACTs (not presented in the figure) are extremely large in comparison to the other methods, with a median of 145.
On average, the AP method also jumps roughly twice the distance of the AG1 and AG2 methods – see Fig. 7(b). Again, there are some outlying ASJDs for the AP method which may indicate repetitions where the MCMC got stuck in local modes.
6 Conclusion
In this paper we have introduced Plateau distributions as a novel class of proposal distributions for the use with componentwise, multipletry Metropolis MCMC. These proposal distributions can be thought of as a combination of uniform distributions, leading to a family of distributions with not overlapping support. The notion of using nonoverlapping proposals in multipletry MCMC methods is intuitive and, in fact, motivated as means to counter the disadvantages (e.g. inefficient proposing of trials) of greatly overlapping proposal distributions such as Gaussians. Moreover, the class of Plateau distributions are simple to implement for use as proposals in MCMC methods and can straightforwardly combined with the simple, yet highly effective, adaptation procedure presented in this paper as well.
We have demonstrated that using the Plateau proposal distributions with the suggested adaptation leads to MCMC methods that perform well for a variety of target distributions. In fact, the results indicate that these methods provide better MCMC chains, in terms of lower autocorrelation times and exploration of the statespace, compared to other adaptive multipletry methods with greatly overlapping Gaussian proposals. Furthermore, the simulation results suggest that the Plateau proposals are able to efficiently sample from target distributions with distance modes, complex shapes, and many nearby modes.
These results, and the simplicity of their design, make the Plateau proposals appealing for general use in component-wise, multiple-try MCMC algorithms. The class of Plateau distributions introduced here is only one type of non-overlapping proposal; further research may investigate other types of non-overlapping proposals, which may have multiple interacting trials (e.g. see Casarin et al. (2013)) and may be asymmetric.
Appendix A On the effects of adapting the Plateau distribution
In this appendix we present the results for the AP method from Section 5 alongside the results for the component-wise, multiple-try MCMC using Plateau proposals without the adaptation procedure. We refer to the non-adapting MCMC method as non-AP. The initial proposal parameters are the same for both the AP and non-AP methods.
Figs 8 to 11 display the ACTs and the ASJDs for the target distributions , , , and . In general, using the adaptation procedure gives ACTs that are lower than or similar to those of the non-AP method, and ASJDs that are higher or similar. An exception is component 3 of , although this effect is negligible since the third component's ACT is an order of magnitude smaller than those of the other components. On the other hand, the most notable improvements in performance when using the adaptation procedure are: component 4 for (Fig. 8), components 3 to 8 for (Fig. 9), components 1 and 2 for (Fig. 10), and component 1 for (Fig. 9). The common feature of these components is that they correspond to low-valued variances. Since the non-AP method does not tune the Plateau proposals, the resulting MCMC cannot scale to components with small variances: its Plateau proposals remain relatively wide, meaning that small-variance components are not explored effectively. The AP method, however, is able to tune the Plateau proposals according to the individual components' marginal variances.
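One simple way to mimic the scaling behaviour described above is to tie each component's proposal width to the empirical marginal standard deviation of the chain so far. The rule and the constant below are illustrative assumptions only, not the paper's actual adaptation procedure.

```python
import numpy as np

def adapt_widths(samples, base_width=2.4):
    """Illustrative adaptation: scale each component's Plateau width by
    the empirical marginal standard deviation of the chain so far, so
    that small-variance components receive correspondingly narrow
    proposals. base_width and the rule itself are assumptions."""
    samples = np.asarray(samples, dtype=float)
    marginal_sd = samples.std(axis=0, ddof=1)
    # Guard against degenerate (zero-variance) components early in the run.
    marginal_sd = np.maximum(marginal_sd, 1e-8)
    return base_width * marginal_sd
```

Under a rule of this kind, a component whose marginal variance is two orders of magnitude smaller than the others automatically receives a proportionally narrower set of Plateau proposals, which is exactly the effect the non-AP method lacks.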
References
 Asmussen and Glynn (2007) Asmussen, S. and Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Springer.
 Brooks et al. (2011) Brooks, S., Gelman, A., Jones, G. L., and Meng, X.L., editors (2011). Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton, FL.
 Casarin et al. (2013) Casarin, R., Craiu, R., and Leisen, F. (2013). Interacting multiple try algorithms with different proposal distributions. Statistics and Computing, 23(2):185–200.
 Gelman et al. (1996) Gelman, A., Roberts, G., and Gilks, W. (1996). Efficient Metropolis jumping rules. Bayesian Statistics, 5:599–608.
 Geyer (1992) Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7(4):473–483.
 Gilks et al. (1995) Gilks, W. R., Richardson, S., and Spiegelhalter, D. (1995). Markov chain Monte Carlo in practice. Chapman and Hall/CRC.
 Haario et al. (1999) Haario, H., Saksman, E., and Tamminen, J. (1999). Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics, 14(3):375–395.
 Haario et al. (2001) Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242.
 Hastings (1970) Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109.
 Hintze and Nelson (1998) Hintze, J. L. and Nelson, R. D. (1998). Violin plots: A box plotdensity trace synergism. The American Statistician, 52(2):181–184.
 Liu et al. (2000) Liu, J. S., Liang, F., and Wong, W. H. (2000). The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449):121–134.
 Kong (1992) Kong, A. (1992). A note on importance sampling using standardized weights. University of Chicago, Dept. of Statistics, Technical Report, 348.

 Kong et al. (1994) Kong, A., Liu, J. S., and Wong, W. H. (1994). Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association, 89(425):278–288.
 Martino and Louzada (2017) Martino, L. and Louzada, F. (2017). Issues in the multiple try Metropolis mixing. Computational Statistics, 32(1):239–252.
 Metropolis et al. (1953) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092.
 Roberts and Rosenthal (2001) Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16(4):351–367.
 Roberts and Rosenthal (2007) Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2):458–475.
 Roberts and Tweedie (1996) Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363.
 Rosenthal (2011) Rosenthal, J. S. (2011). Optimal proposal distributions and adaptive MCMC. In Brooks, S., Gelman, A., Jones, G., and Meng, X.L., editors, Handbook of Markov chain Monte Carlo. CRC press.
 Yang et al. (2019) Yang, J., Levi, E., Craiu, R. V., and Rosenthal, J. S. (2019). Adaptive component-wise multiple-try Metropolis sampling. Journal of Computational and Graphical Statistics, 28(2):276–289.