DeepAI
Log In Sign Up

Quantum-enhanced Markov chain Monte Carlo

03/23/2022
by   David Layden, et al.
ibm
0

Sampling from complicated probability distributions is a hard computational problem arising in many fields, including statistical physics, optimization, and machine learning. Quantum computers have recently been used to sample from complicated distributions that are hard to sample from classically, but which seldom arise in applications. Here we introduce a quantum algorithm to sample from distributions that pose a bottleneck in several applications, which we implement on a superconducting quantum processor. The algorithm performs Markov chain Monte Carlo (MCMC), a popular iterative sampling technique, to sample from the Boltzmann distribution of classical Ising models. In each step, the quantum processor explores the model in superposition to propose a random move, which is then accepted or rejected by a classical computer and returned to the quantum processor, ensuring convergence to the desired Boltzmann distribution. We find that this quantum algorithm converges in fewer iterations than common classical MCMC alternatives on relevant problem instances, both in simulations and experiments. It therefore opens a new path for quantum computers to solve useful–not merely difficult–problems in the near term.

READ FULL TEXT VIEW PDF

page 1

page 2

page 3

page 4

07/26/2019

Analyzing MCMC Output

Markov chain Monte Carlo (MCMC) is a sampling-based method for estimatin...
12/18/2013

On the Challenges of Physical Implementations of RBMs

Restricted Boltzmann machines (RBMs) are powerful machine learning model...
03/22/2018

Frequency violations from random disturbances: an MCMC approach

The frequency stability of power systems is increasingly challenged by v...
05/01/2019

Towards Sampling from Nondirected Probabilistic Graphical models using a D-Wave Quantum Annealer

A D-Wave quantum annealer (QA) having a 2048 qubit lattice, with no miss...
12/16/2022

Estimating truncation effects of quantum bosonic systems using sampling algorithms

To simulate bosons on a qubit- or qudit-based quantum computer, one has ...
12/18/2019

Quantum-inspired annealers as Boltzmann generators for machine learning and statistical physics

Quantum simulators and processors are rapidly improving nowadays, but th...

Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) is the most popular algorithmic technique for sampling from the Boltzmann distribution of Ising models in all of the aforementioned applications. In fact, it was named one of the ten most influential algorithms of the 20th century in science and engineering [10]. It approaches the problem indirectly, without ever computing and in turn the partition function , which is believed to take time. Rather, MCMC performs a sequence of random jumps between spin configurations, starting from a generic one and jumping from to with fixed transition probability in any iteration. Such a process is called a Markov chain. For it to form a useful algorithm, the transitions probabilities must be carefully chosen to make the process converge to ; that is, for the probability of being in after many iterations to approach . Sufficient conditions for such convergence are that the Markov chain be irreducible and aperiodic (technical requirements that will always be met here [11]), and satisfy the detailed balance condition:

(2)

for all [12].

How to efficiently realize a satisfying these conditions? A powerful approach is to decompose each jump into two steps: Step 1 (proposal)- If the current configuration is , propose with some probability . Step 2 (accept/reject)- Compute an appropriate acceptance probability based on and , then move to with that probability (i.e., accept the proposal), otherwise remain at . This gives for all . For a given there are infinitely many choices of that satisfy detailed balance (2). One of the most popular is the Metropolis-Hastings (M-H) acceptance probability [13, 14]:

(3)

Although and cannot be efficiently computed, Eq. (3) depends only on their ratio, which can be evaluated in time since cancels out. This cancellation underpins MCMC, and ensures that can be efficiently computed provided can too. The same is true for the other popular choice of , called the Glauber or Gibbs sampler acceptance probability [11].

The most common way to propose a candidate is by flipping () a uniformly random spin of the current configuration . We call this the local proposal strategy since and are always neighbors in terms of Hamming distance. The resulting acceptance probability can be computed efficiently, and the overall Markov chain converges quickly to for simple model instances at moderate [12]. Various non-local strategies can also be used (sometimes in combination [15]), including the uniform proposal strategy where is picked uniformly at random from , and more complex ones which flip clusters of spins [16, 17, 18, 19]. Broadly, spin glasses at low present a formidable challenge for all of these approaches, manifesting in long autocorrelation, slow convergence, and ultimately long MCMC running times. This challenge is an active area of research due to the many applications in which it arises [20]. The problem: in Eq. (3), which comes from demanding detailed balance (2), is exponentially small in for energy-increasing proposals . Consequently, such proposals are frequently rejected. (And a rejected proposal still counts as an MCMC iteration.) This makes it difficult for Markov chains to explore rugged energy landscapes at low , as they tend to get stuck in local minima for long stretches and to rarely cross large barriers in the landscape.

Quantum algorithm

To alleviate this issue, we introduce an MCMC algorithm which uses a quantum computer to propose moves and a classical computer to accept/reject them. It alternates between two steps: Step 1 (quantum proposal)- If the current configuration is , prepare the computational basis state on the quantum processor, where

refers to an eigenvalue of

. (E.g., if , prepare . We use , and to denote , and

on qubit

respectively.) Then apply a unitary satisfying the symmetry constraint:

(4)

Finally, measure each qubit in the eigenbasis, i.e., the computational basis, denoting the outcome . Step 2 (classical accept/reject)- Compute from Eq. (3) on a classical computer and jump to with this probability, otherwise stay at . While computing and may take exponential (in ) time in general, there is no need to do so: Eq. (3) depends only on their ratio, which equals 1 since due to (4). This cancellation underpins our algorithm, and mirrors that between and . The resulting Markov chain provably converges to the Boltzmann distribution , but is hard to mimic classically, provided it is classically hard to sample the measurement outcomes of . This combination opens the possibility of a useful quantum advantage.

The symmetry requirement (4) ensures convergence to , but does not uniquely specify

. Rather, we pick the quantum step of our algorithm heuristically with the aim of accelerating MCMC convergence in spin glasses at low

, while still satisfying condition (4). We then evaluate the resulting Markov chains through simulations and experiments. Several choices of

are promising, including ones arising from quantum phase estimation and quantum annealing

[11]. However, we focus here on evolution under a time-independent Hamiltonian

(5)

for a time , where

(6)

encodes the classical model instance, (discussed below) generates quantum transitions, and is a parameter controlling the relative weights of both terms. It is convenient to include a normalizing factor in Eq. (5) so that both terms of share a common scale regardless of , which can be arbitrary. ( is the Frobenius norm of a matrix .)

In principle, could comprise arbitrary weighted sums and products of and terms. These produce a symmetric and therefore (although generically), thus satisfying condition (4). If were a dense matrix and , a perturbative analysis shows that non-trivial quantum proposals would exhibit a remarkable combination of features that suggest fast MCMC convergence [11]: Like local proposals, their absolute energy change is typically small, meaning they are likely to be accepted even if . However, like uniform or cluster proposals, and are typically far in Hamming distance, so the resulting Markov chain can move rapidly between distant local energy minima. While a dense would pose experimental difficulties, similar behavior is reported for various spin glasses in Refs. [21, 22, 23] using

(7)

and larger . Inspired by these results, we take Eq. (7) as the definition of here. The normalizing factor in Eq. (5) then takes the simple form

(8)

which—crucially—can be computed in time. Step 1 of our algorithm therefore consists of realizing quenched dynamics of a transverse-field quantum Ising model encoding the classical model instance, which can be efficiently simulated on a quantum computer [24]. Note, however, that our algorithm samples from a classical Boltzmann distribution , not from a quantum Gibbs state as in Refs. [25, 26, 27, 28].

As an initial state evolves under , it becomes delocalized due to and effectively queries the classical energy function from Eq. (1) in quantum superposition through . The measurement outcome is therefore influenced by the entire energy landscape. The details of this quantum evolution depend on the free parameters and . Rather than try to optimize these, we rely on a simple observation: instead of applying the same in every iteration of our algorithm, one could equally well pick at random in each iteration from an ensemble of unitaries, each satisfying condition (4). This amounts to sampling outputs from random circuits, initialized according to the state of a Markov chain. It is this randomized approach that we implement, by picking and uniformly at random in each iteration [11], thus obviating the need to optimize these parameters. Note that we do not invoke adiabatic quantum evolution in any way, despite the familiar form of Eq. (5). Rather, the relative weights of and are held fixed throughout each quantum step.

Performance

c. fits Quantum Mismatched quantum Local Uniform Quantum enhancement factor in : 3.6(1)

Figure 2: Average-case convergence rate simulations. The absolute spectral gap , a measure of MCMC convergence rate, using the M-H acceptance probability (3) with different proposal strategies. All strategies were simulated classically. Lines/markers show the average over 500 random fully-connected Ising model instances for each

; error bands/bars show the standard deviation in

over these instances. Dotted lines are for visibility. a. The slow-down of each strategy at low . For the local proposal strategy, also at high because an eigenvalue of its transition matrix approaches . This artifact can be easily be remedied by using a lazy chain or the Gibbs sampler acceptance probability; the same is not true at low , however [11]. b. Problem size dependence, with least squares exponential fits to the average

, weighted by the standard error of the mean.

c. The resulting fit parameters and the average quantum enhancement exponent, which is the ratio of for the quantum algorithm and the smallest among classical proposal strategies (the local strategy, here). Uncertainties are from the fit covariance matrices. Similar data for different and , model connectivity, and acceptance probabilities is shown in [11].

We now analyze the running time of our quantum algorithm through its convergence rate. MCMC convergence is inherently multifaceted. However, a Markov chain’s convergence rate is often summarized by its absolute spectral gap, , where the extremes of and describe the slowest and fastest possible convergence, respectively. This quantity is found by forming the transition probabilities into a matrix and computing its eigenvalues , all of which satisfy and describe a facet of the chain’s convergence. The absolute spectral gap, , describes the slowest facet. More concretely, it bounds the mixing time , defined as the minimum number of iterations required for the Markov chain’s distribution to get within any of in total variation distance, for any initial distribution, as [12]

(9)

Since is the only quantity on either side of (9) that depends on the proposal strategy , it is a particularly good metric for comparing the convergence rate of our quantum algorithm with that of common classical alternatives. Because depends on the model instance and on , not just on , we analyzed our algorithm’s convergence in two complementary ways: First, we simulated it on a classical computer for many instances to elucidate the average-case . Second, we implemented it experimentally for illustrative instances and analyzed both the convergence rate and the mechanism underlying the speedup observed in simulations. (Note that computing is different—and much more demanding—than simply running MCMC.) We used the M-H acceptance probability (3) throughout, although this choice has little impact on the results [11]. Unlike previously proposed algorithms which prepare a quantum state encoding , ours uses simple, shallow quantum circuits [29, 30, 31, 32, 33, 34, 35]. Its alternating quantum/classical structure means that quantum coherence need only be maintained in each repetition of Step 1, rather than over the whole algorithm [36, 37]. Moreover, it generates numerous samples per run, like a fully classical MCMC algorithm, rather than a single one. These features make it sufficiently well-suited to current quantum processors that we observed a quantum speedup experimentally on up to qubits.

Average-Case Performance

For the first part of the analysis, we generated 500 random spin glass instances on spins by drawing each and

independently from standard normal distributions. We did not explicitly fix any couplings

to zero; the random instances are therefore fully connected as in Fig. 1a. This ensemble is the archetypal Sherrington-Kirkpatrick model [38] (up to a scale factor) with random local fields, where the fields serve to break inversion symmetry and thus increase the complexity. For each instance, we explicitly computed all the transition probabilities and then as a function of for different proposal strategies . We then averaged over the model instances, and repeated this process for . The results describe the average MCMC convergence rate as a function of and . Two illustrative slices are shown in Fig. 2, where and are held fixed in turn. At high , where is nearly uniform, the uniform proposal produces a fast-converging Markov chain with near , as shown in Fig. 2a. However, both the uniform and local proposals suffer a sharp decrease in at lower . This slow-down is much less pronounced for our quantum algorithm, which gives a substantially better on average than either classical alternative for .

For all three proposal strategies, the average scaling of with at fits well to , as shown in Fig. 2b. Our algorithm appears to give an average-case polynomial enhancement in over the local and uniform proposals based on the fitted values of , shown in Fig. 2c. These values depend on , but their ratios suggest a roughly cubic/quartic enhancement at low temperatures regardless of the exact [11]. Finally, to elucidate the source of this average-case quantum enhancement, we also computed for a mismatched quantum proposal strategy, where moves are proposed like in our algorithm, but based on the wrong energy landscape. That is, rather than explore the relevant defined by coefficients and as in Eqs. (1) and (6), the mismatched quantum strategy uses the wrong coefficients and (drawn randomly from the same distribution as and ) in its quantum Hamiltonian and therefore explores the wrong energy landscape in Step 1. The resulting is comparable with that of the uniform proposal in Fig. 2. This suggests that the enhancement in our algorithm indeed arises from exploring the classical energy landscape in quantum superposition to propose moves.

Experimental Implementation

For the second part of the analysis we focus in on individual model instances, for which we implemented our quantum algorithm experimentally and analyzed it in more depth than is feasible for a large number of instances. We generated random model instances and picked illustrative ones whose lowest- configurations include several near-degenerate local minima—a central feature of spin glasses at larger which hampers MCMC at low . We then implemented our quantum algorithm experimentally for these instances on ibmq_mumbai, a 27-qubit superconducting quantum processor, using Qiskit [39]

, an open-source quantum software development platform. (The Ising model

has no relation to the processor’s physical temperature.) We approximated on this device through a randomly-compiled second-order Trotter-Suzuki product formula [40, 24, 41] with up to 48 layers of pulse-efficient 2-qubit gates [42] acting on up to 5 pairs of qubits in parallel [11], and set in the acceptance probability. Unlike in the first part of the analysis, we restricted our focus to model instances where for as in Fig. 1b, in order to match the connectivity of qubits in the quantum processor for this initial demonstration. Simulations show that the average-case for such instances is qualitatively similar to Fig. 2 [11].

Figure 3: Convergence rate experiment. The absolute spectral gap , a measure of MCMC convergence rate, for an illustrative model instance on spins with 1D connectivity. Each proposal strategy is combined with the M-H acceptance probability (3). To infer for the experimental realization of our algorithm, we recorded quantum transitions between computational states with uniformly distributed, to estimate each . For each , we used the full data set to estimate the MCMC transition matrix and in turn form a point estimate of

(solid red line), then a random subsample of the data to compute 99% bootstrap confidence intervals (red error bands).

We focus on an model instance here in which the six lowest- configurations are all local minima, and the two lowest have an energy difference of just . Similar instances with and are analyzed in [11]. For the present instance, closely follows the average in Fig. 2a for the local and uniform proposal strategies, as well as for our quantum algorithm in theory, as shown in Fig. 3. We estimated as a function of for the experimental realization of our quantum algorithm by counting quantum transitions between all computational states and to estimate the MCMC transition matrix, which we then diagonalized. At low , the inferred is smaller than the theoretical value due to experimental imperfections, but still significantly larger than that of either the local or uniform alternative. This constitutes an experimental quantum enhancement in MCMC convergence on current quantum hardware. At high , the experimental is larger than the theoretical value for our quantum algorithm, which we attribute to noise in the quantum processor mimicking the uniform proposal to a degree. We also plot for common MCMC cluster algorithms (those of Swendsen-Wang, Wolff and Houdayer) in [11]. These are substantially more complicated than the local and uniform proposals, both conceptually and practically, but offer almost no enhancement in this setting compared to the simpler classical alternatives, so we do not focus on them here.

To further illustrate the increased convergence rate of our quantum-enhanced MCMC algorithm compared to these classical alternatives, we use it to estimate the average magnetization (with respect to the Boltzmann distribution ) of this same instance. The magnetization of a spin configuration is , and the Boltzmann average magnetization is

(10)

Eq. (10) involves a sum over all configurations, but given samples from , the approximation can be accurate with high probability even if [43]. While sampling from exactly may be infeasible, it is common to approximate by the running average of over MCMC trajectories (of one or several independent chains), and likewise for other average quantities. The quality of this approximation after a fixed number of MCMC iterations reflects the Markov chains’ convergence rate [12].

Figure 4: Magnetization estimate experiment. a. The current magnetization for individual Markov chains after iterations. Each chain illustrates a different proposal strategy with uniformly random initialization. Arrows indicate the magnetization of the ground (G), 1st, 2nd and 3rd excited configurations. b. Convergence of the running average from MCMC trajectories to the true value of for different proposal strategies. For each strategy, the lines and error bands show the mean and standard deviation, respectively, of over 10 independent chains. The inset depicts the same chains over more iterations. We do not use a burn-in period or thinning (i.e., the running average starts at and includes every iteration up to

), as these practices would introduce hyperparameters that complicate the interpretation. Both panels are for the same illustrative

instance at , and use the M-H acceptance probability (3).

We used this approach to estimate at as shown in Fig. 4. At this temperature , and the Boltzmann probabilities of the ground (i.e., lowest-), 1st, 2nd and 3rd excited configurations are approximately 43%, 26%, 19% and 12% respectively. This is therefore sufficiently high that sampling from is not simply an optimization problem (where depends overwhelmingly on the ground configuration), but sufficiently low that is mostly determined by a few low- configurations out of . Efficiently estimating using MCMC therefore involves finding these configurations and—crucially—jumping frequently between them in proportion to their Boltzmann probabilities. The magnetization for illustrative trajectories is shown in Fig. 4a for the local and uniform proposal strategies, and for an experimental implementation of our quantum algorithm. While each Markov chain finds a low- configurations quickly, our quantum algorithm jumps between these configurations noticeably faster than the others. The running average estimate for from 10 independent Markov chains of each type is shown in Fig. 4b. Our quantum algorithm converges to the true value of , with no discernible bias, substantially faster than the two classical alternatives despite experimental imperfections.

Figure 5: Quantum speedup mechanism. a. The classically-simulated probabilities of proposals in our quantum algorithm, represented as a matrix whose columns are independent histograms. Both the initial and proposed configurations are sorted by increasing Ising energy . b. The estimated proposal probabilities for our algorithm’s experimental realization. We estimated each by the number of observed proposals normalized by the number of proposals, using a total of recorded quantum transitions. c. The probability distributions of Hamming distance between current () and proposed () configurations, for a uniformly random current configuration. That of the experiment uses the estimated probabilities from panel b, while the rest were computed exactly. d. The analogous distributions for

of proposed jumps. Each distribution is depicted in full detail through its cumulative distribution function, with no binning. All panels are for the same illustrative

model instance; none depend on or on the choice of acceptance probability.

Finally, we examined the mechanism underlying this observed quantum speedup. Recall that the local proposal strategy typically achieves small by picking from the neighbors of , whereas the uniform proposal strategy typically picks far from at the cost of a larger , as illustrated in Fig. 1c. Our quantum algorithm was motivated by the possibility of achieving the best of both: small , and far from . This combination of features is illustrated in Fig. 1c and borne out in Fig. 5. The proposal probabilities arising in our algorithm for the same model instance are shown in Figs. 5a-b for theory and experiment respectively. Both show the same effect with good qualitative agreement: starting from , our quantum algorithm mostly proposes jumps to configurations for which is small, even though and may be far in Hamming distance. This effect is especially pronounced between the lowest- configurations and also between highest- ones.

To further examine this effect we asked the following: for a uniformly random configuration , what is the probability of proposing a jump for which and are separated by a Hamming distance , or by an energy difference ? The resulting distributions are shown in Figs. 5c-d respectively for the local and uniform proposal strategies, as well as for the theoretical and experimental realizations of our quantum algorithm. The Hamming distance distribution for local proposals is concentrated at (by definition), whereas it is much more evenly spread for both uniform proposals and for our quantum algorithm, as shown in Fig. 5c. Conversely, the distribution of local proposals is more concentrated at small than that of uniform proposals. In theory, the corresponding distribution for our quantum algorithm is even more strongly concentrated at small , as shown in Fig. 5d. The distribution from the experimental realization of our algorithm, however, lies between those of the local and uniform proposal strategies, due to experimental imperfections.

Outlook

Current quantum computers can sample from complicated probability distributions. We proposed and experimentally demonstrated a new quantum algorithm which leverages this ability in order to sample from the low-temperature Boltzmann distribution of classical Ising models, which is useful in many applications—not just complicated. Our algorithm uses relatively simple and shallow quantum circuits, thus enabling a quantum speedup on current hardware despite experimental imperfections. It works by alternating between quantum and classical steps on a shot-by-shot basis, unlike variational quantum algorithms, which typically run a quantum circuit many times in each step [44]. Rather, it uses a quantum computer to propose a random bit-string, which is accepted or rejected by a classical computer. The resulting Markov chain is guaranteed to converge to the desired Boltzmann distribution, even though it may not be possible to efficiently simulate classically. In this sense our algorithm is partially heuristic, like most classical MCMC algorithms: the eventual result is theoretically guaranteed, while fast convergence is established empirically.

Many state-of-the-art MCMC algorithms build upon simpler Markov chains in heuristically-motivated ways; for instance, by running several local M-H chains in parallel at different temperatures and occasionally swapping them [45, 18, 19]. Our quantum algorithm may provide a potent new ingredient for such composite algorithms in the near term. However, there remain ample opportunities for refinements and variations. For instance, a more targeted method of picking the parameters and could further accelerate convergence [46]. Indeed, did not depend on the problem size in our implementation, although in some settings it should grow with if the qubits are all to remain within each others’ light cones. Moreover, different quantum processors with different connectivities, such as quantum annealers, may also be well-suited to implement our algorithm, perhaps without needing to discretize the Hamiltonian dynamics in Step 1.

Our algorithm is remarkably robust against imperfections. It achieves a speedup by proposing good jumps—but not every jump needs to be especially good for the algorithm to work well. (This is why picking and at random works well: good values arise often enough.) For instance, if an error occurs in the quantum processor while a jump is being proposed, the proposal will be accepted/rejected as usual, and the Markov chain can still converge to the target distribution provided such errors do not break the symmetry on average [11]. Rather than produce the wrong result, we found that such errors merely slow the convergence at low by making our algorithm more classical. Indeed, in the limit of fully depolarizing noise, our algorithm reduces to MCMC with a uniform proposal. Our simulations suggest that the quantum speedup increases with the problem size . However, we also expect the quantum noise to increase with , in the absence of error correction, as the number of potential errors grows. The combined effect of these competing factors at larger is currently unknown. It is interesting to note, however, that the cubic/quartic speedup we observed, should it persist at larger , might give a quantum advantage on a modest fault-tolerant quantum computer despite the error correction overhead [34, 47].

Characterizing our algorithm at larger scales will require different methods than those employed here. For instance, a Markov chain’s absolute spectral gap is a broad and unambiguous figure of merit, but it is not feasible to measure for large instances. This not an issue with our quantum algorithm in particular, but rather, with MCMC in general. Instead, a more fruitful approach may be to focus directly on how well our algorithm performs in various applications, such as in simulated annealing (for combinatorial optimization), for estimating thermal averages in many-body physics models, or for training and sampling from (classical) Boltzmann machines for machine learning applications.

Acknowledgements

We thank Easwar Magesan and Kristan Temme for useful discussions which helped shape the project, as well as Edward Chen, Nate Earnest-Noble, Andrew Eddins and Daniel Egger for crucial help with the experiments.

Author Contributions

D.L. led the theory and the experiments. G.M., R.M., M.M. and P.W. contributed to the theory; in particular, D.L. and G.M. independently proposed a variant of this algorithm which uses quantum phase estimation, described in Section V-B of [11]. J.S.K., G.M., R.M., M.M. and S.S. contributed to the design of the experiments, and S.S. also contributed to their implementation. D.L. drafted the manuscript and supplemental material; all authors contributed to revising both.

References