The Euclidean path integral formulation of quantum mechanics 
allows the calculation of observable quantities as expectation values with respect to infinite-dimensional and highly peaked probability distributions. After discretising the theory on a lattice with finite spacing, expectation values are computed with Markov Chain Monte Carlo methods (see for example  for a highly accessible introduction). This approach is elegant and attractive since it can be extended to quantum field theories, where it allows first principles-predictions for strongly interacting theories such as Quantum Chromodynamics (QCD), see e.g. [3, 4]. Ultimately, however, one is interested in the value of observables in the continuum limit of vanishing lattice spacing . Since the cost of the calculation grows with a high power of , efficient Monte Carlo sampling techniques are crucial to obtain precise and accurate numerical predictions. Today state-of-the-art techniques  are routinely used to accelerate the Metropolis-Hastings algorithm [6, 7] and in particular the Hybrid Monte Carlo (HMC) method  has proved to be highly successful in lattice QCD simulations. However, lattice calculations with HMC methods still become prohibitively expensive as the continuum limit is approached. The reasons for this are twofold:
For quantum mechanical problems the cost of generating a single discretised path grows at least in proportion to the number of lattice sites, which increases with if the physical size of the simulation box is kept fixed (for a quantum field theory in dimensions the growth would be even faster with a cost of per configuration).
As the theory approaches a critical point, subsequent states in the Markov chain are increasingly correlated, which requires the generation of more paths to obtain a given number of statistically independent samples.
Furthermore, the law of large numbers dictates that to reduce the statistical (sampling) error below a given tolerance, at least independent samples have to be generated. While in lattice QCD the continuum limit is usually taken by extrapolating simulations at different lattice spacings and fixed tolerance on the statistical error, in the multilevel Monte Carlo literature (see e.g. the classical paper ) it is more common to decrease in proportion to the tolerance on the discretisation error as the lattice spacing is reduced. Reducing the combined statistical and discretisation error in this way would make optimal use of computational resources to obtain a result with a given total error for a specific fine lattice spacing.
The correlation of subsequent samples in the Markov chain is quantified by the integrated autocorrelation time , which grows particularly rapidly for some quantities, such as the topological susceptibility in QCD, where it has been observed that with . This is attributed to “freezing” of the topological charge, and can lead to observable effects. Those can be both direct, since e.g. the mass of the meson receives important contributions from the topological susceptibility in a pure Yang-Mills theory [11, 12], and more indirect due to the coupling of slow modes with large autocorrelation times to other observables. The authors of  further report a milder but still significant growth with for a range of other physically relevant observables. While the rapid growth of the integrated autocorrelation time for the topological susceptibility can be addressed by using open boundary conditions in time [13, 14], this introduces additional complications since it requires lattices with a very large extent in the temporal direction.
To estimate the overall growth in cost of a simulation, as the lattice spacingis reduced, consider a quantum mechanical observable with a discretisation error that is , where values such as are typical. To reduce the discretisation error below a tolerance of and the statistical error below incurs a cost
with standard Monte Carlo (StMC), since . To get an intuitive understanding of this and subsequent complexity estimates it might be instructive to consider the special case , : since the discretisation error decreases quadratically with , reducing this error by a factor of can be achieved by halving the lattice spacing, which in turn doubles the cost for generating a single sample if the physical size of the simulation box is kept fixed; in other words, the cost per sample grows in proportion to
In this paper, it is shown how this explosion in computational cost can be significantly reduced with hierarchical sampling  and Multilevel Monte Carlo (MLMC) [16, 9], which has recently been extended to a Markov chain setting [17, 18]. To generate samples, a hierarchy of coarser lattices with spacings of and corresponding coarse-grained versions of the original theory are constructed. Based on this hierarchy, a recursive implementation of the delayed acceptance method in  is proposed. Starting on the coarsest level, proposals are successively extended by additional modes and screened with a standard Metropolis-Hastings accept/reject step on increasingly finer lattices. At this point is important to stress that the coarse lattices are only used to accelerate sampling and do not introduce any additional bias because ultimately each new sample is accepted or rejected step with the correct, original action on the finest lattice. Since evaluating the action on the coarse lattices is substantially cheaper, the cost of generating a single fine level sample is not substantially higher than if a single-level sampler was used. In fact, when compared to a method such as HMC, it may be smaller since the cost of generating an HMC trajectory can be shifted to the coarsest level where it is substantially shorter. Since on each level proposals are screened with a Metropolis-Hastings accept/reject step, the method samples from the correct distribution on the original lattice with spacing and does not introduce any additional bias, cf. . Due to the convergence of the lattice theories on subsequent levels of the hierarchy with , hierarchical sampling eliminates the growth in autocorrelation time, reducing the computational cost to
MLMC is a variance reduction technique, which uses the fact that the expectation value of an observable (or quantity of interest) on a lattice with spacing can be written as a telescoping sum. For this assume that there is some integer and a constant such that . Further, let be the observable measured on a lattice with spacing . Then
Here, the sums in are taken over independent samples, labelled by the superscript “”. The key observation is that, except for the very coarsest level, MLMC estimates differences of the observable instead of the quantity of interest itself. Provided that theories on subsequent levels can be coupled efficiently and the variance of the difference decreases sufficiently rapidly as the lattice spacing is reduced, significantly lower numbers of samples are sufficient on the finer levels of the grid hierarchy. The majority of the cost can be shifted to the coarser levels , where sampling is substantially cheaper. Due to the exactness of the telescoping sum (i.e. the first equality in Eq. (3)), MLMC does not introduce any additional bias if the individual MC estimators are unbiased. The algorithms described in the paper allow the construction of estimators which have an arbitrarily small bias. In the numerical results presented below the size of this bias is comparable to the discretisation error on the original, fine level lattice. Compared to Eqs. (1) and (2), MLMC further reduces the computational complexity to
see below. Similar estimates have been derived in [9, 17, 18] and it has been demonstrated numerically that MLMC leads to a significant reduction in computational complexity and overall runtime for a range of applications, e.g. in uncertainty quantification (UQ) for sub-surface flow [19, 17], inverse problems  or material simulation .
While this paper focuses on the application of these new methods in quantum mechanics, the ultimate goal is to apply them in -dimensional quantum field theories, such as lattice QCD with and . For , the expected gains are even larger, since the cost to generate a single configuration grows like instead of while the accuracy is still decreasing no faster than . The predicted improvement in computational performance is summarised in the following diagram, generalising Eqs. (1), (2) and (4) to dimensions:
For example, consider the prediction of the topological susceptibility () in lattice QCD () with improved action (). In this case, hierarchical sampling reduces the cost of a Monte Carlo simulation from to and MLMC reduces the computational complexity even further to .
To discuss this further, consider first the relative advantage of MLMC over standard Monte Carlo in the continuum limit for fixed . MLMC only requires the generation of a small number of samples on the finest lattice for small (eventually only one for very small ), whereas the number of configurations that have to be generated with a standard Monte Carlo method is proportional to . As can be seen from the final two lines of Eq. (5), MLMC is a factor of faster than standard Monte Carlo with hierarchical sampling for . This argument holds for general and ; the coefficient depends on the relative cost of generating independent samples on the finest level and the coarser levels. Provided those costs are proportional to the number of unknowns on each level (and the constant of proportionality is independent of ) we expect to lie between 1 and 2.
If is kept fixed as the continuum limit is taken, eventually the statistical error will dominate the discretisation error. To avoid this, one might consider the case where and the combined root mean square error is reduced below some given tolerance . This is the common choice in the multilevel Monte Carlo literature (see e.g. ). In that case, the complexity estimates in Eq. (5) become
In quantum field theories, coarse grained actions are naturally obtained by integrating out high-frequency modes in a renormalisation group (RG) transformation, which results in an effective theory with less degrees of freedom. In practice, the RG transformation can be carried out either non-perturbatively (e.g. through a block spin transformation) or through perturbative matching. The latter would, in fact, be sufficient for MLMC as long as the variance ofdecays sufficiently rapidly since the coarse levels are only used to accelerate sampling on the original, fine level. For asymptotically free theories, such as lattice QCD, Symanzik-improved actions [22, 23] can be constructed by systematically adding suitable terms which are proportional to powers of the lattice spacing and which are multiplied by appropriate, so-called ‘improvement coefficients’. These coefficients can be tuned non-perturbatively [24, 25], or they can be computed using perturbation theory for sufficiently small lattice spacing [22, 23]. In the MLMC approach, the perturbatively calculated improvement coefficients on different levels of the lattice hierarchy are in fact sufficient since the differences of these coefficients between subsequent levels are sufficiently small on fine lattices.
As a proof-of-concept, hierarchical sampling and multilevel Monte Carlo are applied to two problems in quantum mechanics (): a non-symmetric double-well potential and the topological oscillator studied in . The latter case is particularly interesting since it has a topological quantum number, which freezes in the continuum limit (). This results in a rapid growth of the autocorrelation time of the topological susceptibility if standard HMC sampling is used. Hierarchical sampling all but eliminates this growth, resulting in a dramatic reduction in runtime. Furthermore, the coarse-grained theories can be improved using a perturbative matching technique for this problem, which further increases the efficiency of the hierarchical approach. As demonstrated in , for the topological oscillator the so-called ’cluster algorithm’  almost entirely eliminates autocorrelations through long-range spin updates. However, this method can be further accelerated with MLMC, leading to a reduction in computational complexity and in absolute runtime for high resolutions. Similar gains are observed for the non-symmetric double-well potential problem with MLMC.
In summary, the main achievements of this work are:
It is described in detail how algorithms for hierarchical sampling and multilevel Monte Carlo acceleration can be applied to the path integral formulation of quantum mechanics.
It is shown how hierarchical sampling techniques dramatically reduce autocorrelation times.
It is further demonstrated that combining this with MLMC leads to an additional reduction in computational complexity and in the total runtime.
It is explained how perturbative matching can further improve performance for the topological oscillator.
It is stressed again that the additional gains due to MLMC accelerating are expected to be significantly larger in high-dimensional theories, such as lattice QCD (see Eq. (5)). The present paper therefore aims to lay the foundation for further work on extending the described methods to quantum field theories on a lattice.
This paper is organised as follows: after briefly reviewing the literature on related approaches in Section 1.1, the application of hierarchical sampling and multilevel Monte Carlo to the path integral formulation of quantum mechanics is discussed in Section 2. The quantum mechanical model problems that are used in this work are described in Section 3, including the construction of coarse-grained actions for those problems. Numerical results for the non-symmetric double-well potential and the topological oscillator are presented in Section 4, in particular we compare the runtime of all considered algorithms for fixed . Section 5 contains the conclusion and outlines directions for future work. More technical topics, such as a detailed cost analysis of MLMC and a discussion of how the methods can be extended to higher dimensional problems, are relegated to the appendix where we also show results for .
1.1 Relationship to previous work
While hierarchical sampling techniques have been suggested previously (see e.g. [28, 29, 30, 31]), the variance reduction techniques from MLMC significantly improve on this. Eqs. (2) and (4) show that the additional acceleration will lead to a further dramatic reduction in computational complexity. The presented methods are therefore expected to be superior to the approach in , which uses a hierarchical method to initialise the simulation, but not for the Monte Carlo sampling. Earlier work in [30, 31] uses renormalisation group techniques to sample close to the critical point of the Ising model where the theories on the coarser levels become self-similar. Similarly, collective cluster-update algorithms [33, 27, 34]
have been applied to models in solid state physics close to phase transitions (see e.g.). However, the application of all those techniques is limited to spin systems. The approach here applies to general systems and delivers significant additional speedup through multilevel Monte Carlo variance reduction.
2.1 Path integral formulation of quantum mechanics
For completeness and to introduce the discretised path integral for non-experts, we recapitulate the key principles here. The path integral formulation of quantum mechanics  expresses the expectation value of physical observables as the infinite-dimensional sum over all possible configurations or paths , where for all times . In this sum each path is weighted by a complex amplitude , where is the action, the integral over the Lagrangian of the system. This formulation is very elegant since it allows the direct quantisation of any system which can be described by a Lagrangian. In the limit fluctuations around the classical path which minimises the action cancel out, and the Euler-Lagrange equations are recovered. However, for simplicity from now on we will work in atomic units where . To make the evaluation of the path integral tractable, two approximations are made: (1) time is restricted to a finite interval and (2) the time interval is divided into intervals of size , which is known as the lattice spacing. Conditions have to be imposed on the paths at and ; here we use periodic boundary conditions . Each path , which is defined for all times
, is replaced by a vector. For each the quantity approximates the position of the particle at the time . Those two approximations turn the infinite dimensional integral over all paths into an integral over a finite, but high-dimensional domain . Evaluating the integral in Euclidean time converts it to the canonical ensemble average of a statistical system at a finite temperature. More specifically, the expectation value of an observable (commonly known as “Quantity of interest”, QoI, in the UQ literature) which assigns a value to each discrete path can be written as the following ratio
with the -dimensional probability density given by
with normalisation constant . The action is an approximation of the continuum action
where is the Lagrangian.
Physically meaningful predictions, which can be compared to experimental measurements, are obtained by extrapolating to the continuum limit and infinite volume . As is inversely proportional to the lattice spacing, the integrals in Eq. (7) become very high dimensional in the continuum limit. In this paper we do not discuss finite volume errors (due to finite values of ). In other words, we take the continuum limit for finite as the “true” value for any observables studied here.
2.2 Standard Monte Carlo
Since the distribution in Eq. (8) is highly peaked, the expectation value in Eq. (7) is usually computed with importance sampling. For this, the Metropolis-Hastings algorithm [6, 7] is used to iteratively generate a sequence of samples , . The expectation value can then be approximated as the sample average
A single Metropolis-Hastings step for computing , given , is written down in Alg. 1.
Draw uniformly distributed random number.
Input: current sample
Output: new sample
The Markov chain is generated by starting from some , which is either a given vector or drawn at random. Since this is not drawn from the correct distribution, all subsequent samples are distributed according to some distribution with . In practice, the first samples are discarded, and throughout this paper we implicitly assume that is chosen such that for all subsequent samples the error due to the difference between and is much smaller than the discretisation- and sampling- errors.
The law of large numbers states that in the limit of a large number of samples the sample average in Eq. (9) is distributed according to a Gaussian with mean and variance
In this expression is the integrated autocorrelation time defined as
where is an arbitrary point in time. As can be seen from Eq. (10), the number of samples required to reduce the statistical error below a given tolerance grows with , and it is therefore important to reduce the correlation between subsequent samples as far as possible. This can be achieved by carefully choosing the proposal in line 1 of Alg. 1. In lattice QCD with dynamical fermions, Hybrid Monte Carlo  is very popular since it generates global updates. We therefore choose to use this method here, being aware that other algorithms, such as heat bath sampling, might be more efficient for particular applications. We nevertheless believe that HMC is representative, since grows with a large power of the inverse lattice spacing as the continuum limit is approached also with other sampling approaches. The only exception are some problem-specific samplers, such as the cluster algorithm  for the topological oscillator, which we therefore also consider in this work.
2.3 Multilevel Monte Carlo
We now describe hierarchical methods for overcoming the growth in autocorrelations and reducing the variance of the measured observable.
Recall that a path describes the position of the particle at the discrete points with . More formally, define a lattice as the set of points
Paths on this lattice are objects in the domain . We introduce a hierarchy of lattices for , such that lattice has points and a lattice spacing of , i.e.
Here is the original lattice with points and a spacing of . Paths on lattice are represented by vectors in the domain , where obviously .
Note that the lattices are nested, and the points of the lattice are a subset of the points of , namely the points with even indices. A path on a particular level
stores values at the odd and even lattice points, where the latter are also present on the next-coarser lattice. Formally this can be expressed as
such that all can be written as
On each lattice we define an action such that is the original action. In the simplest case the coarse-level actions are obtained by re-discretising the original action with the appropriate lattice spacings, but other choices are possible and will be discussed below. On each level the action induces a probability distribution such that
where is the normalisation constant. The probability distribution on the finest level is identical to defined in Eq. (8
). Further, introduce a conditional probability distributionfor the values at the odd points on level , given the values at the even points on the same level, namely
for all . The action should be some approximation to , such that it is possible to sample from for a given . For the quantum mechanical model problems considered in this work the construction of is described in Sections 3.1.1 and 3.2.1.
We stress that although in this paper we assume that the lattice can be partitioned into sets of mutually independent even and odd sites, the ideas developed here can be generalised to higher dimensions. This is outlined in Appendix A.
Similar to the delayed-acceptance approach in , we next introduce a hierarchical algorithm to efficiently construct a Markov chain on a given level using coarser levels: First we define the two-level Metropolis-Hastings step in Alg. 2. Setting this algorithm assumes that on a given level there is a coarse level proposal distribution which depends on . Based on this, it proposes a new fine-level state which is either accepted and returned as the new state or rejected; in the latter case the previous state is returned as . It was shown in  that this defines a correct Metropolis-Hastings algorithm targeting .
Let be the transition kernel for the process implicitly defined by Alg. 2. The key idea is now to use the algorithm recursively by using as the proposal distribution on level . The process of picking from in the first line of Alg. 2 then corresponds to a recursive call to the same algorithm on the next-coarser level. On the coarsest level () is drawn with the standard Metropolis-Hastings step in Alg. 1 with corresponding transition kernel ; here we always assume that the proposal in this Metropolis Hastings step is generated with a symmetric method such as HMC.
Input: level , current sample , proposal distribution
Output: new sample
Input: level , current sample
Output: new sample
Note that unless the proposals on all levels get accepted. At first sight this seems to imply that the overall acceptance probability of Alg. 3 drops as the number of levels increases, and subsequent samples are highly correlated. However, this turns out not to be the case if the theories on subsequent level converge with : in this case the proposal from the two-level step in Alg. 2 is almost certainly accepted on finer levels. Our numerical experiments confirm this observation.
In practice, it is more convenient to implement Alg. 3 iteratively, starting from the coarsest level. As discussed in Appendix B, the cost of executing Alg. 3 on level can be bounded by a constant times the number of unknowns on this particular level. Observe also that setting in Alg. 3 allows drawing a new sample from the original fine level probability distribution defined in Eq. (8).
Relationship to the literature.
The two-level step in Alg. 2 is closely related to similar algorithms in [15, 17]. If the coarse level sample is drawn with an arbitrary Metropolis-Hastings kernel , then Alg. 2 above is a variant of the delayed-acceptance method in [15, Alg. 1] with proposal distribution and approximation , recalling the notation , .
Multilevel Monte Carlo algorithm
As discussed in the introduction, the multilevel Monte Carlo algorithm computes the quantity of interest on the coarsest level and adds corrections to this by computing the difference of the observable on subsequent levels according to the telescoping sum in Eq. (3). Since those differences have a smaller variance, this allows shifting the cost to the coarser levels where samples can be generated cheaply. The original MLMC algorithm described in  assumes that it is possible to draw independent identically distributed (i.i.d.) samples from a distribution on each level. For the Markov chain Monte Carlo setting considered here this is not possible since subsequent samples in the chain are correlated and, as discussed in , this introduces an additional bias. This bias can be reduced by constructing sequences of samples for each level with Alg. 3 and sampling those sequences with sufficiently large sub-sampling rates . The typical rule in statistics is to use twice the integrated autocorrelation time to achieve (sufficient) independence. In our numerical experiments, we set and observe that the additional bias due to computing the coarse level samples which are only approximately independent is comparable to the discretisation error.
The multilevel Monte Carlo algorithm which we use in this work is presented in Alg. 4 and visualised in Fig. 2. It is similar to the multilevel algorithm in , but with the recursive independent sampler in [17, Alg. 3] replaced by the (suitably sub-sampled) hierarchical delayed-acceptance sampler in our Alg. 3 above. Multilevel Monte Carlo computes
which as unbiased estimator for the expectationin Eq. (3). On each level , the number of samples is chosen to be
where is the effective cost of generating an independent sample (taking into account autocorrelations) and is the variance of the quantity on level , which converges to zero as .
We now discuss how the cost of Alg. 4 increases as the tolerance on the total error is tightened, assuming for simplicity i.i.d. samples on all levels. Let the exact value of the observable in the continuum limit be . The total mean square error of the multilevel Monte Carlo estimator defined in Eq. (15) can be expanded
where the first term in the final line of Eq. (17) is the squared statistical error, whereas the second term is the squared discretisation error. An easy calculation shows that choosing as in Eq. (16) guarantees that
To analyse the complexity we assume that
the discretisation error is of order ,
converges with order for some ,
the integrated autocorrelation times of , and thus also the sub-sampling rates , can be bounded by a constant independent of such that the cost of generating an independent sample does not grow faster than the number of unknowns for all .
As shown in more detail in Appendix B, it is then possible to choose the number of levels such that the discretisation error in Eq. (17) does not exceed . As a consequence, the cost of computing the MLMC estimator in Eq. (15) with a statistical error less than and a discretisation error less than has the following computational complexity:
which is a special case of the well-known estimate in .
However, the samples created by Alg. 4 on each of the levels are generated with a Markov chain and thus only asymptotically distributed according to . As discussed in , the complexity analysis can be modified to address this issue, leading to an additional factor in Eqs. (18) and (19). This seems to be not visible in the numerical results below or in  (at least pre-asymptotically).
3 Quantum mechanical model systems
To demonstrate the performance of the methods discussed in the previous section we consider two non-trivial quantum mechanical problems.
3.1 Non-symmetric double-well potential
The first system describes a particle with mass moving subject to a non-symmetric double-well potential . Fig. 3 shows this potential for the choice of parameters that were used in our numerical experiments, namely , , , .
The corresponding Lagrangian is
where . For a given path the discretised lattice action is
The observable we consider is the average squared displacement
Note that since points on the lattice are correlated with a correlation length which is constant in physical units, the variance of this observable does not go to zero in the continuum limit. In other words, the sampling error is not automatically reduced on finer lattices.
3.1.1 Coarse level action
To construct the action defined in Eq. (14), observe that
where is the marginal distribution of the even points
Here is defined for arbitrary values , as