I Introduction
Markov Chain Monte Carlo (MCMC) is a common tool used in Bayesian inference to draw samples from a probability distribution $\pi(\mathbf{x})$, where $\mathbf{x} \in \mathbb{R}^D$. Applications of MCMC span astronomy [1] to zoology [2]. MCMC involves moving to a state $\mathbf{x}'$ on the distribution at one iteration from a state $\mathbf{x}$ at the previous iteration, with an acceptance probability chosen such that the Markov chain is ergodic (i.e., converges to a stationary distribution) and detailed balance is maintained, such that the stationary distribution of the Markov chain is equal to the target distribution. While many variations of MCMC exist, gradient-based methods such as the Metropolis-Adjusted Langevin Algorithm (MALA) [3] and Hamiltonian Monte Carlo (HMC) [4] have rapidly grown in popularity due to their ability to efficiently explore continuous state spaces. HMC introduces a momentum vector $\mathbf{r}$ to explore states via the numerical integration of Hamiltonian dynamics and requires two parameters to be tuned to sample effectively. These are (i) the step size taken by the numerical integrator, which must be fixed to satisfy detailed balance, and (ii) the number of steps taken between the start and end point of a trajectory. Tuning of the latter has been automated with the advent of the No-U-Turn Sampler (NUTS), first proposed in [5], which calibrates the number of steps taken by stopping a trajectory once the path begins to turn back on itself. As a result of its applicability and efficient operation across a range of distributions, NUTS is used by popular probabilistic programming languages such as Stan [6], PyMC3 [7] and NumPyro [8].

Sequential Monte Carlo (SMC) samplers, first introduced in [9], provide a way of realising estimates based on a population of weighted hypotheses (often referred to as samples or particles) which evolve over iterations. In SMC, samples are mutated by means of a proposal distribution which moves the samples around the target space. While the proposal distribution used in the SMC literature is typically a Gaussian random-walk kernel, this is not a requirement. In this letter we show how the method NUTS uses to explore the target space can be utilised by an SMC sampler. We refer to this new approach as SMC-NUTS. Furthermore, we show how this can be leveraged with a near-optimal L-kernel, thus improving how the SMC sampler functions.
The rest of this letter is structured as follows. In Section II we present how SMC samplers operate and in Section III we show how the proposal for NUTS can be used as the proposal distribution of an SMC sampler. Section IV presents results in the context of two examples. Section V concludes the paper.
II Sequential Monte Carlo Samplers
In this work we consider an SMC sampler that does not target $\pi(\mathbf{x})$ directly, but rather does so over $T$ iterations such that the joint distribution of all previous states is the target:

$\tilde{\pi}(\mathbf{x}_{1:k}) = \pi(\mathbf{x}_k) \prod_{k'=1}^{k-1} L(\mathbf{x}_{k'} \mid \mathbf{x}_{k'+1})$,  (1)

where $L(\mathbf{x}_{k-1} \mid \mathbf{x}_k)$ is the L-kernel, also known as the 'backwards kernel', which is chosen such that:

$\int \tilde{\pi}(\mathbf{x}_{1:k}) \, \mathrm{d}\mathbf{x}_{1:k-1} = \pi(\mathbf{x}_k)$.  (2)
Using importance sampling we can attribute a weight to the $i$-th hypothesis at iteration $k$, $\mathbf{x}_k^i$, which is updated from the previous iteration's weight via:

$w_k^i = w_{k-1}^i \dfrac{\pi(\mathbf{x}_k^i)}{\pi(\mathbf{x}_{k-1}^i)} \dfrac{L(\mathbf{x}_{k-1}^i \mid \mathbf{x}_k^i)}{q(\mathbf{x}_k^i \mid \mathbf{x}_{k-1}^i)}$,  (3)

where $\pi(\mathbf{x}_k^i)$ and $\pi(\mathbf{x}_{k-1}^i)$ are the target evaluated at the sample's new and previous state, respectively, and $q(\mathbf{x}_k^i \mid \mathbf{x}_{k-1}^i)$ is the forwards proposal distribution. The forwards proposal distribution and L-kernel are probability distributions associated with transformations to the state $\mathbf{x}_k$ from the state $\mathbf{x}_{k-1}$, and vice versa. The L-kernel can be any valid probability distribution function and it is sometimes convenient to set $L(\mathbf{x}_{k-1} \mid \mathbf{x}_k) = q(\mathbf{x}_{k-1} \mid \mathbf{x}_k)$. We note, however, that such an approach may be suboptimal and that the appropriate selection of the L-kernel allows the variance of the estimates to be minimised [10].
When degeneracy occurs, where a small subset of samples have relatively high importance weights, a new set of samples is selected from the current set with probability proportional to the normalised weight, $\tilde{w}_k^i = w_k^i / \sum_{j=1}^N w_k^j$, in a process called resampling. This involves selecting $N$ elements, with replacement, from $\{\mathbf{x}_k^i\}_{i=1}^N$ with probability $\tilde{w}_k^i$ into a new vector which then overwrites the old samples, before the weights of the new samples are all set to $1/N$. Resampling is typically set to occur when the effective number of samples, $N_{\mathrm{eff}} = 1 / \sum_{i=1}^N (\tilde{w}_k^i)^2$, falls below some threshold value, usually half the total number of samples.
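As an illustration of this resampling step, a minimal NumPy sketch (the function names are our own, not from the letter) might look as follows:

```python
import numpy as np

def effective_sample_size(log_w):
    """Effective number of samples from unnormalised log-weights."""
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def resample(x, log_w, rng):
    """Multinomial resampling: draw N indices with probability proportional
    to the normalised weights, then reset all log-weights to log(1/N)."""
    N = len(log_w)
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()
    idx = rng.choice(N, size=N, replace=True, p=w)
    return x[idx], np.full(N, -np.log(N))
```

In an SMC iteration one would call `resample` whenever `effective_sample_size(log_w) < N / 2`, matching the threshold described above.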
Values of interest, e.g. expected values with respect to the target distribution, can be realised from the normalised sample weights and, furthermore, it is possible to use previous estimates to improve the current estimate at iteration $k$ using recycling schemes [11].
III The No-U-Turn Sampler as a Proposal Distribution of an SMC Sampler
When used in MCMC, NUTS generates samples via a proposal of the form $q(\mathbf{x}' \mid \mathbf{x}, \mathbf{r})$, where the stochasticity enters through the momentum. In an SMC sampler, to calculate (3), we wish to consider a proposal of the form $q(\mathbf{x}_k \mid \mathbf{x}_{k-1})$. We can address this disparity by considering the numerical integration of the Hamiltonian dynamics to be a non-linear function which transforms samples to a new position by means of the momentum.
HMC and NUTS use a numerical method called Leapfrog to simulate Hamiltonian dynamics and explore the target space. Leapfrog has several useful qualities. Firstly, it is symplectic, i.e. it preserves the geometric structure of the phase space $(\mathbf{x}, \mathbf{r})$, and therefore generates states with high acceptance probability for sufficiently small step sizes. Secondly, it is both reversible and time-symmetric, such that detailed balance is maintained. The Leapfrog method over one step of step size $h$ is as follows:

$\mathbf{r}_{t+h/2} = \mathbf{r}_t - \frac{h}{2} \nabla U(\mathbf{x}_t)$,  (4)

$\mathbf{x}_{t+h} = \mathbf{x}_t + h M^{-1} \mathbf{r}_{t+h/2}$,  (5)

$\mathbf{r}_{t+h} = \mathbf{r}_{t+h/2} - \frac{h}{2} \nabla U(\mathbf{x}_{t+h})$,  (6)

where $U(\mathbf{x})$ is a potential energy function related to the target distribution by $U(\mathbf{x}) = -\log \pi(\mathbf{x})$, and $M$ is a diagonal mass matrix.
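A direct transcription of (4)-(6) into code, assuming the diagonal mass matrix is stored as a vector, could read:

```python
import numpy as np

def leapfrog(x, r, grad_U, h, m_diag):
    """One Leapfrog step, following (4)-(6): half-step in momentum,
    full step in position, half-step in momentum."""
    r = r - 0.5 * h * grad_U(x)   # (4)
    x = x + h * r / m_diag        # (5): M is diagonal, so M^{-1} r is elementwise
    r = r - 0.5 * h * grad_U(x)   # (6)
    return x, r
```

Negating the final momentum and stepping again returns the initial state, which is the reversibility property relied upon in Section III-A below.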
III-A Non-linear transform of the proposal distribution
We wish to evaluate the probability that a random variable $\mathbf{x}_{k-1}$ transforms to a random variable $\mathbf{x}_k$ using a Hamiltonian-based proposal, and vice versa. We write this as $q(\mathbf{x}_k \mid \mathbf{x}_{k-1})$ for the forwards kernel and $L(\mathbf{x}_{k-1} \mid \mathbf{x}_k)$ for the L-kernel.

First we derive an expression for the forwards kernel. We generalise Leapfrog to a single function $f(\cdot,\cdot)$ which transforms a state $\mathbf{x}_{k-1}$ to $\mathbf{x}_k$, i.e. $\mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{r}_{k-1})$. We can rewrite the forwards kernel as:

$q(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = q(f(\mathbf{x}_{k-1}, \mathbf{r}_{k-1}) \mid \mathbf{x}_{k-1})$.  (7)

In a Hamiltonian proposal the momentum term is the stochastic term which changes the value of $\mathbf{x}$. We can therefore write (7) in terms of a random momentum variable by using a change of variables as follows:

$q(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = q(\mathbf{r}_{k-1}) \left| \det \dfrac{\partial \mathbf{x}_k}{\partial \mathbf{r}_{k-1}} \right|^{-1}$  (8)

$\qquad = q(\mathbf{r}_{k-1}) \left| \det \dfrac{\partial f(\mathbf{x}_{k-1}, \mathbf{r}_{k-1})}{\partial \mathbf{r}_{k-1}} \right|^{-1}$.  (9)
To evaluate the L-kernel we utilise the fact that Leapfrog is a reversible integration method, i.e. if we start at a state $\mathbf{x}_{k-1}$ and transform this to $\mathbf{x}_k$, then by applying the negated momentum $-\mathbf{r}_k$ it follows that $\mathbf{x}_{k-1} = f(\mathbf{x}_k, -\mathbf{r}_k)$. Following the same steps used to arrive at (9), except this time starting at $\mathbf{x}_k$ and with velocity $-\mathbf{r}_k$, we arrive at:

$L(\mathbf{x}_{k-1} \mid \mathbf{x}_k) = q(-\mathbf{r}_k) \left| \det \dfrac{\partial f(\mathbf{x}_k, -\mathbf{r}_k)}{\partial (-\mathbf{r}_k)} \right|^{-1}$.  (10)
For each sample in an SMC iteration, to calculate (3) we need the ratio of (10) to (9). As we will now explain, it is straightforward to show that the determinant terms cancel when Leapfrog is used. Writing the updated state in terms of the initial state and momentum, for a single Leapfrog step we find that:

$\dfrac{\partial \mathbf{x}_k}{\partial \mathbf{r}_{k-1}} = h M^{-1}$.  (11)

As Leapfrog is a reversible method, if the momentum is reversed and the step size is equal to that used in the forwards case, we similarly find:

$\dfrac{\partial \mathbf{x}_{k-1}}{\partial (-\mathbf{r}_k)} = h M^{-1}$,  (12)

such that the determinants will cancel when calculating (3). For a reversible method like Leapfrog, the reverse integration will evaluate the same sequence of states that was evaluated when going forwards. As such, when using the proposal from NUTS, the determinants cancel when calculating the second fraction term in (3).
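This cancellation can also be checked numerically. The following sketch (our own construction, assuming a single Leapfrog step and a standard Gaussian target) confirms via finite differences that the Jacobian of the new position with respect to the initial momentum is $h M^{-1}$, as in (11):

```python
import numpy as np

def step_x(x, r, h, m_diag, grad_U):
    """Position after one Leapfrog step, viewed as a function of (x, r)."""
    r_half = r - 0.5 * h * grad_U(x)
    return x + h * r_half / m_diag

grad_U = lambda x: x                      # U(x) = ||x||^2 / 2
h, m_diag = 0.3, np.array([2.0, 0.5])
x0, r0 = np.array([0.4, -1.2]), np.array([0.7, 0.1])

# Central finite differences of the position with respect to the momentum.
eps = 1e-6
J = np.zeros((2, 2))
for j in range(2):
    dr = np.zeros(2)
    dr[j] = eps
    J[:, j] = (step_x(x0, r0 + dr, h, m_diag, grad_U)
               - step_x(x0, r0 - dr, h, m_diag, grad_U)) / (2 * eps)
```

Since the position update is linear in the momentum, the finite-difference Jacobian matches $h M^{-1}$ essentially exactly.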
III-B A near-optimal L-kernel for an SMC sampler with a NUTS proposal
We are free to choose the distribution of the L-kernel. One approach is to take the reverse of the forwards proposal, i.e. we can assume that the negated final momentum $-\mathbf{r}_k$ is sampled from the same distribution as the proposal for the initial momentum:

$L(-\mathbf{r}_k \mid \mathbf{x}_k) = \mathcal{N}(-\mathbf{r}_k; \mathbf{0}, M)$.  (13)
This approach, however, is suboptimal. The optimal L-kernel is that which minimises the variance of the sample estimates. It can be shown that $L^{\mathrm{opt}}(\mathbf{x}_{k-1} \mid \mathbf{x}_k) \propto \eta_{k-1}(\mathbf{x}_{k-1}) \, q(\mathbf{x}_k \mid \mathbf{x}_{k-1})$, where $\eta_{k-1}$ is the distribution of samples at iteration $k-1$ [9] and, as seen in Section III-A, for the Hamiltonian case it follows that the L-kernel can be expressed in terms of the negated momentum, $L(-\mathbf{r}_k \mid \mathbf{x}_k)$. Closed-form expressions for the optimal L-kernel are often intractable. To approximate the optimal L-kernel we follow the approach given in [10], and for the Hamiltonian case take the joint distribution of the negated new momentum and new samples, $(-\mathbf{r}_k, \mathbf{x}_k)$, and form a Gaussian approximation such that:

$\begin{bmatrix} -\mathbf{r}_k \\ \mathbf{x}_k \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_r \\ \boldsymbol{\mu}_x \end{bmatrix}, \begin{bmatrix} \Sigma_{rr} & \Sigma_{rx} \\ \Sigma_{xr} & \Sigma_{xx} \end{bmatrix} \right)$,  (14)

where $\boldsymbol{\mu}_r$ and $\boldsymbol{\mu}_x$ are mean vectors, and $\Sigma_{rr}$, $\Sigma_{rx}$, $\Sigma_{xr}$ and $\Sigma_{xx}$ are block covariance matrices.
We then use the properties of Gaussians to define a near-optimal L-kernel:

$L(-\mathbf{r}_k \mid \mathbf{x}_k) = \mathcal{N}(-\mathbf{r}_k; \boldsymbol{\mu}_{r|x}, \Sigma_{r|x})$,  (15)

where:

$\boldsymbol{\mu}_{r|x} = \boldsymbol{\mu}_r + \Sigma_{rx} \Sigma_{xx}^{-1} (\mathbf{x}_k - \boldsymbol{\mu}_x)$  (16)

and

$\Sigma_{r|x} = \Sigma_{rr} - \Sigma_{rx} \Sigma_{xx}^{-1} \Sigma_{xr}$.  (17)
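In practice, the moments in (14) can be estimated from the particle population itself. A sketch of fitting the joint Gaussian and forming the conditionals (16) and (17) (function and variable names are ours) is:

```python
import numpy as np

def near_optimal_L_params(neg_r, x):
    """Fit the joint Gaussian (14) to paired samples of (-r_k, x_k), then
    return the conditional mean function (16) and covariance (17)."""
    d = x.shape[1]
    joint = np.hstack([neg_r, x])              # one row per particle
    mu = joint.mean(axis=0)
    S = np.cov(joint, rowvar=False)
    mu_r, mu_x = mu[:d], mu[d:]
    S_rr, S_rx = S[:d, :d], S[:d, d:]
    S_xr, S_xx = S[d:, :d], S[d:, d:]
    gain = S_rx @ np.linalg.inv(S_xx)
    cond_mean = lambda x_k: mu_r + gain @ (x_k - mu_x)   # (16)
    cond_cov = S_rr - gain @ S_xr                        # (17)
    return cond_mean, cond_cov
```

The conditional moments are then evaluated once per particle when computing the L-kernel density.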
Returning to the incremental weights (3), updates are found via:

$w_k^i = w_{k-1}^i \dfrac{\pi(\mathbf{x}_k^i)}{\pi(\mathbf{x}_{k-1}^i)} \dfrac{L(-\mathbf{r}_k^i \mid \mathbf{x}_k^i)}{q(\mathbf{r}_{k-1}^i)}$,  (18)

where $\mathbf{x}_{k-1}^i$ and $\mathbf{r}_{k-1}^i$ are the initial position and momentum, and $\mathbf{x}_k^i$ and $-\mathbf{r}_k^i$ are the position and negated momentum after a NUTS iteration. The numerator of the right-hand fraction can be evaluated from either (13) or (15), and the denominator can be evaluated from the initial momentum distribution. Algorithm 1 shows how the near-optimal L-kernel is used within SMC-NUTS for $N$ samples over a total of $T$ iterations. Algorithm 3 in [5] can be used to generate new samples for the NUTS step. For the suboptimal symmetric L-kernel, steps 10 and 11 are replaced by (13). For the resampling step, several methods may be employed; see [13] and references therein for a discussion of resampling in the context of particle filters (which may be applied here).
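To make the structure of such an iteration loop concrete, the following is a heavily simplified sketch using the symmetric L-kernel (13). A single Leapfrog step on an assumed toy Gaussian target stands in for the full NUTS trajectory, so this illustrates only the bookkeeping of (18), not NUTS itself:

```python
import numpy as np

def log_pi(x):                          # toy 2-D Gaussian target (an assumption)
    return -0.5 * np.sum(x ** 2, axis=-1)

def smc_sketch(N=100, T=10, h=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(N, 2)) + 5.0   # initialise away from the target's mass
    log_w = np.full(N, -np.log(N))
    for _ in range(T):
        r0 = rng.normal(size=(N, 2))    # initial momentum ~ N(0, I), M = I
        r = r0 - 0.5 * h * x            # grad U(x) = x for this target
        x_new = x + h * r
        r_new = r - 0.5 * h * x_new
        # Incremental weight (18) with the symmetric L-kernel (13);
        # the Gaussian normalising constants cancel in the ratio.
        log_w += (log_pi(x_new) - log_pi(x)
                  - 0.5 * np.sum(r_new ** 2, -1) + 0.5 * np.sum(r0 ** 2, -1))
        x = x_new
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < N / 2:    # resample on low effective sample size
            idx = rng.choice(N, size=N, p=w)
            x, log_w = x[idx], np.full(N, -np.log(N))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w @ x                        # weighted mean estimate
```

Replacing the single Leapfrog step with a full NUTS trajectory, and (13) with the fitted conditional Gaussian (15), recovers the structure of the full algorithm.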
IV Results
We now demonstrate SMC-NUTS in two example cases. In both examples the mass matrix $M$ is set equal to the identity matrix.
IV-A Penalised Regression with Count Data
We first compare our method against a state-of-the-art SMC sampler with a random-walk proposal, which was used to estimate parameters of a penalised regression model with count data in [14].
This problem makes use of Lasso regression [15], whereby a penalty constraint is placed on the size of the regression coefficients $\boldsymbol{\beta}$. We follow [14] (with associated details from [16]) by using the exponential power distribution bridge framework for our regularising prior:

$p(\boldsymbol{\beta}) \propto \exp\left( -\lambda \sum_j |\beta_j|^{q} \right)$,  (19)

where $\lambda$ is a regularisation parameter and $q = 1$ corresponds to the Lasso penalty. Our aim is to estimate the coefficients $\boldsymbol{\beta}$ used to generate count data. The likelihood is a Poisson distribution, $y_i \sim \mathrm{Poisson}(\mu_i)$, for the $i$-th observation, where:

$\mu_i = \exp\left( \beta_0 + \sum_{j=1}^{11} \beta_j \phi_j(x_i) \right)$.  (20)
To aid comparison with [14], we likewise generate 100 observations with a 12-dimensional vector $\boldsymbol{\beta}^*$ in which a small number of components take non-zero values (following [14]) and all other values are set to zero. The basis functions $\phi_j$ are Gaussian kernels with 11 equispaced centres and all bandwidths set to 0.5. In our case, we run the SMC-NUTS based proposal but, unlike the method in [14], choose not to run the sampler with any form of tempering or step adaption. However, we do use the recycling scheme from [14]. Furthermore, we use the symmetric L-kernel (13) as our L-kernel (i.e. we investigate the benefits of the SMC-NUTS proposal without the use of an approximately optimal L-kernel).
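For reference, the log-posterior of this model can be written down directly. The sketch below is our own, with assumed hyperparameter names `lam` and `q`; the prior's normalising constants, which do not affect sampling, are omitted. It combines the Poisson likelihood (20) with the bridge prior (19) up to an additive constant:

```python
import numpy as np

def gaussian_basis(x, centres, bw=0.5):
    """Design matrix: an intercept column plus Gaussian kernels
    at equispaced centres with bandwidth bw."""
    K = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * bw ** 2))
    return np.hstack([np.ones((len(x), 1)), K])

def log_posterior(beta, Phi, y, lam=1.0, q=1.0):
    """Poisson log-likelihood plus bridge/Lasso log-prior, up to a constant."""
    eta = Phi @ beta                           # log-rate per observation, as in (20)
    log_lik = np.sum(y * eta - np.exp(eta))    # Poisson, dropping log(y!) terms
    log_prior = -lam * np.sum(np.abs(beta) ** q)
    return log_lik + log_prior
```

A gradient of this log-posterior (analytic away from $\beta_j = 0$, or via automatic differentiation) is what the NUTS proposal would consume.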
Table I shows the mean-squared error (MSE) of the estimates of $\boldsymbol{\beta}$ using SMC-NUTS compared to the random-walk approach [16], where results were available, for different numbers of samples $N$ and iterations $T$. The MSE for SMC-NUTS is approximately 13 to 18 times smaller than for SMC with a random-walk proposal for the same number of samples and iterations.
TABLE I: Mean-squared error of the estimated coefficients.

              |        T=100          |        T=200
              | N=25   N=50   N=200   | N=25   N=50   N=200
  Random Walk |  --    5.92   5.17    |  --    5.49   4.9
  NUTS        | 0.51   0.471  0.420   | 0.324  0.306  0.267
It should be noted that the NUTS proposal is more computationally expensive than a random-walk proposal. This is demonstrated in Table II, where we compare the average time taken for SMC with NUTS to complete against SMC with a basic random-walk proposal, i.e. one with no cooling strategy, covariance adaption, or parallel implementation. While the NUTS variant takes longer to complete for an equal number of samples and iterations, with fewer samples and fewer iterations the runtime is more competitive and still achieves substantially better accuracy. This is exemplified in the case of 25 samples for 100 iterations with a NUTS proposal, which was quicker than 200 samples for 200 iterations with a random walk and achieves an almost 10 times smaller mean-squared error. The random-walk algorithm used to generate results for Table I is more complex than the algorithm used to generate Table II and, therefore, our runtime for the random walk is likely to be an underestimate.
TABLE II: Average runtime.

              |        T=100          |        T=200
              | N=25   N=50   N=200   | N=25   N=50   N=200
  Random Walk | 0.235  0.898  0.461   | 0.234  0.461  1.85
  NUTS        | 1.50   3.08   13.40   | 3.05   6.41   24.0
IV-B Multivariate Student-t Distribution
In this example we aim to show how a near-optimal L-kernel can improve the performance of the SMC-NUTS sampler. We compare the simple symmetric L-kernel used in the previous example with a near-optimal L-kernel, and purposefully initialise the samples away from the target's probability mass. For our target distribution we choose a multivariate Student's-t distribution, i.e.:

$\pi(\mathbf{x}) \propto \left( 1 + \dfrac{(\mathbf{x} - \boldsymbol{\mu})^\top (\mathbf{x} - \boldsymbol{\mu})}{\nu} \right)^{-(\nu + D)/2}$,  (21)

where $\nu$ is the number of degrees of freedom (set to 5 in this example). We set the mean vector $\boldsymbol{\mu}$ of the distribution away from the samples' initial positions and run SMC-NUTS for both L-kernel strategies with 200 samples and 50 iterations. The estimation of the mean vector is shown in Fig. 1. It is clear that, through the use of the approximately optimal L-kernel, the SMC sampler is able to find the true values more rapidly than when using the symmetric L-kernel. This shows that it is possible to run the sampler for fewer iterations and achieve better results.
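A log-density for the target in (21), up to its normalising constant, is straightforward to implement; the gradient required by the NUTS proposal could then be obtained analytically or via automatic differentiation. A minimal sketch:

```python
import numpy as np

def log_student_t(x, mu, nu=5.0):
    """Unnormalised log-density of the isotropic multivariate
    Student-t target in (21)."""
    d = len(mu)
    delta = x - mu
    return -0.5 * (nu + d) * np.log1p(delta @ delta / nu)
```

The heavy tails of this density, relative to a Gaussian, are what make it a useful stress test for samplers initialised far from the probability mass.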
V Conclusions
We have shown how the proposal from the No-U-Turn Sampler (NUTS) can be used as a proposal in an SMC sampler. This allows efficient exploration of a wide variety of distributions using gradient information, whilst at the same time retaining the benefits SMC samplers offer in terms of being readily parallelisable. We have also shown that SMC-NUTS can benefit from the use of a near-optimal L-kernel. An SMC sampler utilising NUTS as a proposal distribution is observed to give better results compared to a Gaussian proposal. It has also been demonstrated that it is possible to realise estimates that converge to true values in fewer iterations when this approach is taken. We recommend further work to extend the near-optimal L-kernel to be applicable in a wider class of high-dimensional settings than have been considered to date (e.g. in [10]), such that SMC can be applied in contexts where NUTS is likely to deliver significant benefit.
Acknowledgment
We wish to thank Dr Alexander Phillips for supplying feedback on our implementation of the penalised count data example.
References
 [1] S. Sharma, “Markov Chain Monte Carlo methods for Bayesian data analysis in astronomy”, Annual Review of Astronomy and Astrophysics Vol. 55, pp 213–259, 2017.
 [2] Z. Yang, “The BPP program for species tree estimation and species delimitation”, Current Zoology, Vol. 61, Issue 5, pp. 854–865, 2015.
 [3] G. O. Roberts and O. Stramer, "Langevin diffusions and Metropolis-Hastings algorithms," Methodology and Computing in Applied Probability, vol. 4, pp. 337–357, 2002.
 [4] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid Monte Carlo,” Physics Letters B, Volume 195, Issue 2, 1987.

 [5] M. D. Hoffman and A. Gelman, "The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo", Journal of Machine Learning Research, vol. 15, no. 47, pp. 1593–1623, 2014.
 [6] Stan, https://mc-stan.org/
 [7] PyMC3 https://docs.pymc.io/
 [8] NumPyro http://num.pyro.ai/
 [9] P. Del Moral, A. Doucet, and A. Jasra, “Sequential Monte Carlo samplers,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, 2006.
 [10] P. L. Green, L. J. Devlin, R. E. Moore, R. J. Jackson, J. Li, and S. Maskell “Increasing the efficiency of Sequential Monte Carlo samplers through the use of nearoptimal Lkernels”, Mechanical Systems and Signal Processing, Vol. 162, 2022.
 [11] T. L. T. Nguyen, F. Septier, G. Peters, and Y. Delignon, “Improving SMC sampler estimate by recycling all past simulated particles”, 2014 IEEE Workshop on Statistical Signal Processing (SSP) pp. 117–120
 [12] R. M. Neal, “MCMC using Hamiltonian dynamics”, Handbook of Markov Chain Monte Carlo, Chapman & Hall, 2011.
 [13] A. Varsi, J. Taylor, L. Kekempanos, E. Pyzer-Knapp and S. Maskell, "A fast parallel particle filter for shared memory systems," IEEE Signal Processing Letters, vol. 27, pp. 1570–1574, 2020.
 [14] T. L. T. Nguyen, F. Septier, G. Peters, and Y. Delignon, "Efficient Sequential Monte-Carlo samplers for Bayesian inference", IEEE Transactions on Signal Processing, vol. 64, no. 5, pp. 1305–1319, March 2016.
 [15] R. Tibshirani, "Regression shrinkage and selection via the Lasso", Journal of the Royal Statistical Society: Series B, vol. 58, no. 1, 1996.
 [16] T. L. T. Nguyen, "Sequential Monte Carlo Sampler for Bayesian Inference in Complex Systems", Ph.D. thesis, 2014.