Estimation and uncertainty quantification for the output from quantum simulators

03/07/2019 ∙ by Ryan Bennink, et al. ∙ 0

The problem of estimating certain distributions over {0,1}^d is considered here. The distribution represents a quantum system of d qubits, where there are non-trivial dependencies between the qubits. A maximum entropy approach is adopted to reconstruct the distribution from exact moments or observed empirical moments. The Robbins Monro algorithm is used to solve the intractable maximum entropy problem, by constructing an unbiased estimator of the un-normalized target with a sequential Monte Carlo sampler at each iteration. In the case of empirical moments, this coincides with a maximum likelihood estimator. A Bayesian formulation is also considered in order to quantify posterior uncertainty. Several approaches are proposed in order to tackle this challenging problem, based on recently developed methodologies. In particular, unbiased estimators of the gradient of the log posterior are constructed and used within a provably convergent Langevin-based Markov chain Monte Carlo method. The methods are illustrated on classically simulated output from quantum simulators.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 13

page 14

page 15

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction and motivation

Quantum computing holds a great promise to enable future simulations of quantum systems that would otherwise suffer from the curse of dimensionality when simulated classically. However, we are currently entering the Noisy Intermediate Scale Quantum (NISQ) device era 

[40]. The hallmarks of NISQ era are relatively small (10-50 qubits) quantum computing devices and noisy qubits. Therefore, to succeed with quantum simulation on NISQ quantum computers, it is imperative to understand and fully characterize the effects of noise on the state of a quantum device.

In quantum mechanics the state of a -qubit quantum device is generally described by a density matrix – a -by- complex-valued positive semidefinite matrix with . From the definition, it takes real parameters to fully specify an arbitrary quantum state. If denotes a quantum observable, described by a -by- Hermitian matrix (), then measuring

yields one of its eigenvalues sampled from a distribution determined by

. In particular, the probability of the th eigenvalue is , where is the

th eigenvector in the bra-ket notation. Since there are at most

eigenvalues, repeated measurements of any observable yields information about at most parameters of . To estimate every parameter of the state one must measure at least independent observables. This approach is known as quantum state tomography [37, 24]. Its applicability is limited to small quantum devices due to exponential scaling in the number of measurements and related issues of solving exponentially large systems of equations. Bayesian methods also have been suggested [29, 5, 25, 20] to facilitate uncertainty quantification (UQ) in the context of quantum state tomography. However, the existing Bayesian quantum tomography solutions face similar scalability challenge as the number of parameters to infer still grows exponentially with the size of a quantum device.

Here we consider a fundamentally new approach, which provides a scalable solution to UQ for the output of quantum simulations. We will restrict our attention to a simplified scenario in which the density matrix and observable commute, hence are simultaneously diagonalizable, and the eigenvectors are known. In this way we are able to treat the quantum state, which is represented by the density matrix, as a classical probability distribution, given by the diagonal of the density matrix. Note that the standard quantum state tomography methods would still scale exponentially even for this simplified scenario. To avoid this problem we use the

maximum entropy (MaxEnt) principle [27, 28] to model a complete probability distribution based on a selected subset of observed or exact moments. We then use well-established stochastic simulation algorithms [43, 30, 13, 14] to find the MaxEnt distribution with a cost which is polynomial in . Adopting a MaxEnt ansatz for the likelihood provides a Bayesian posterior which (i) converges to the MaxEnt distribution as the number of observations tends to infinity, and (ii) can be simulated with a cost which is polynomial in using recent stochastic simulation methods [32, 42, 49, 45]. It is the topic of ongoing investigation to generalize the results to density matrices, which will be reported in a future publication. It is hoped that a polynomial cost in can be preserved in this case as well, for example combining the methodology here with that of [22, 8].

Assume that we know the basis in which the output state from a -qubit quantum computer is diagonal. The density matrix in that basis reads,

(1)

where comprise a basis in the Hilbert space of the quantum computer . We also assume the observable that we measure is diagonal in the same basis i.e.

(2)

where are the eigenvalues of and is the probability of observing state a measurement outcome corresponding to the eigenvalue .

Without loss of generality we can interpret the probability distribution as a classical probability density

over the binary random variable

. The mathematical task at hand is then to find consistent with the observed outcomes . A canonical example of such density in our context is the Ising model

(3)

which is parametrized by some matrix .

Given first and second moments of a distribution on , the MaxEnt distribution is the Ising model (3) which satisfies the moment constraints [11] . With finitely many observations, one can replace the moments with observed empirical moments, and the corresponding MaxEnt distribution is the same one which maximizes the likelihood under a MaxEnt ansatz for its parameterized form [34]. Through a Bayesian framework [21], this likelihood can be combined with a prior on the parameters, resulting in a posterior distribution, which quantifies the uncertainty in the distribution given the observations. We will develop and implement novel algorithms for the solution of these problems with a cost which is polynomial in , based on recently developed technologies mentioned above. Now we proceed with a more rigorous mathematical setup, and a summary of the results of the paper.

2. Mathematical Setup and Summary of Results

Let , and denote the state . Consider a probability distribution , parameterized by , with density . The target represents the distribution over dependent qubits given a particular value of , and the objective of this work is to identify . We will denote the un-normalized measure by , with density , so that and , with .

Let , for , and define . It is assumed that, for a particular unknown value of , we are given either

  • exact moments :

    (4)
  • independent and identically distributed (i.i.d.) noisy observations of this process , for .

As mentioned above, the aim in either case is to reconstruct by identifying .

First, consider problem (P2). Assume we seek the probability distribution which maximizes the entropy

(5)

subject to the constraints (4), with replacing . This is in fact a convex optimization problem. Therefore, if there exists a probability distribution which satisfies the constraints (4), then the necessary first order optimality conditions are also sufficient for global optimality [7]. For this particular problem, the first order conditions can be simplified to , where

(6)

and the vector of Lagrange multipliers

is the solution to the system of equations

(7)

This is referred to as the method of maximum entropy (MaxEnt). See Chapter 11.1 of [11] for the derivation. A density which is parametrized in the form (6) is more generally referred to as a MaxEnt ansatz [34].

In general, the system of equations (7) does not have a closed form solution, and it is a challenging problem to solve, particularly for large , as one evaluation of the system would consist of operations. The issue is known as the curse of dimensionality, and precludes the use of standard iterative algorithms for solving the problem. We propose to utilize a Robbins Monro type [43, 30] stochastic iterative algorithm for its solution, together with a sequential Monte Carlo (SMC) sampler [15] unbiased estimator of a system of equations equivalent to (7), , within each iteration of the Robbins Monro. It is well-known that SMC samplers are stable with operations for such targets which can be evaluated pointwise (in ) [3], thus bypassing the curse of dimensionality. That is the first major contribution of the paper.

Now, consider problem (P2), concerned with recovering from the i.i.d. observations . Following the MaxEnt ethos, we choose an appropriate set of functions as above (4), based on our assumptions of the underlying quantum system, and adopt the ansatz (6) for the likelihood of a single observation, given : . For example, one could use the set of first and second moments . The likelihood of given is given by

(8)

The maximizer of (8) is called the maximum likelihood estimator (MLE). It is equivalent to the maximizer of . We have

(9)

where

(10)

The necessary first order optimality condition is vanishing gradient, i.e.

(11)

which is clearly equivalent to to the MaxEnt solution (7), except with empirical moments replacing exact moments. This problem can therefore be solved using the same method described above.

This solution is still somewhat unsettling. Even if we are content with our choice of , we have arrived at a point estimate for inference given a finite set of observations. The scrupulous Bayesian will not allow the story to end here. Indeed for the sake of uncertainty quantification (UQ), we require a posterior distribution over , or equivalently, a posterior distribution over . In order to accomplish this, we complete the picture with a prior on , giving

(12)

where the constant of proportionality depends only upon the observations . The second major contribution of the paper concerns quantifying a posteriori uncertainty using this approach. Due to the fact that the likelihood itself has an intractable normalizing constant , this notoriously difficult problem is referred to as doubly-intractable [33, 35].

If we could at least evaluate a non-negative and unbiased estimator of the likelihood , or equivalently (12), then there are algorithms which can be used, such as pseudo-marginal Markov chain Monte Carlo (MCMC) method [2, 1]. The recent work [31] proposes to construct unbiased but signed estimates of the target (12), and then they use such estimate within pseudo-marginal MCMC by considering the absolute value of the estimate in the acceptance ratio and then storing the sign of the accepted samples in order to correct the ultimate estimator. The work [48] proposes to use alternative unbiased estimators in this same framework. Note that earlier work also appeared which allows one to sample from such a target [33, 35], however these works require being able to sample from for a given value of

. This is possible in some cases, for example the Ising model (or Boltzmann machine)

[41, 9], but not in general. The works [31, 48] are generally applicable.

We will take a new approach to doubly intractable simulation in the present work. A number of recently developed methods require only unbiased estimators of (9). These include the stochastic gradient Langevin dynamics (SGLD) [49], the zig-zag sampler [4], or the bouncy particle sampler [6, 39]. We propose to construct an unbiased estimator of (9) using the debiasing technique from the recent works [32, 42], and then use this unbiased estimator within such MCMC methods. This is the second major contribution of this work. In particular, we constrain our attention here to the SGLD.

The rest of the paper is organized as follows. In Section 3 the SMC sampler is introduced, which will be used as a crucial component within all subsequent algorithms. In particular, it will be shown how to construct consistent estimators of and unbiased estimators of . In Section 4 we introduce the Robbins Monro algorithm, and present the first main result regarding its implementation using the SMC sampler. Section 5 concerns Bayesian posterior UQ in the context of finitely many observations. In Section 6 we simulate data from a toy model, and illustrate the algorithm from Section 4 for computing the MaxEnt solution using both exact moments and observed moments (where it coincides with the MLE), as well as the algorithm from Section 5 for UQ in the case of observations.

3. SMC samplers for estimation of integrals

Here it will be described how to estimate expectations , for bounded functions , and construct non-negative unbiased estimators of , for a given value of , using an auxillary variable technique.

Let be a known probability distribution over a general state space , and let be some probability over for , whose density can be evaluated up to a normalizing constant, but cannot be sampled from directly. Define the unnormalized measure associated to by , with density , so that

(13)

where .

The objective is to approximate expectations for bounded functions

We cannot sample from this distribution, but we can obtain a convergent estimator

using SMC samplers, as described below [15] (see also [26, 36, 10]).

Remark 3.1.

In the context of this paper the target will be considered in each inner iteration of the stochastic iterative algorithm. So . To make the presentation more concrete, let and

(14)

However, note that the choice of intermediate targets

to interpolate between an initial density

, which can be sampled from exactly, and are essentially arbitrary. For example, one could incorporate several constraints per iteration or anneal in between individual constraints.

3.1. Estimating the target

Define

Let denote a Markov kernel such that

Iterate Algorithm 1 and define

(15)

Let . For , repeat the following steps for :

  • Define .

  • Resample. e.g. select , and let .

  • Mutate. Draw .

Algorithm 1 SMC sampler
Assumption 3.1.

There exists some such that for all

Proposition 3.1.

Assume 3.1 holds. Then for , ,

The constant can be made to be independent of if an additional strong mixing condition is assumed on , uniformly in [13].

3.2. Estimating the normalizing constant

Recall , and observe that

Noting that , therefore

This is the normalizing constant of .

Observe that using Algorithm 1 we can construct an estimator by

Recall that for any we have by definition (13). Define

(16)

The following proposition is proven in [13]. The proof is reproduced in Appendix A for completeness, due to its importance to the results of this paper.

Proposition 3.2.

Assume 3.1 holds. Then .

4. Stochastic approximation

Here it will be assumed that we are given values , i.e. Problem (P2). As mentioned in Section 2, the MaxEnt approach entails solving the convex optimization problem (5) (which has domain ) subject to the constraints , and (4). It is necessary and sufficient to solve the equation (7), reproduced below

(17)

As mentioned above, a naive computation of the left-hand side of the above equation for a single value of would incur a cost of , which makes the problem intractable for very small values of or so. It is however possible to obtain a noisy estimate using a stochastic simulation algorithm. Algorithms such as MCMC [23, 18] are designed to sample the state space in steps [44], and provide a consistent estimate of (17) for a given value of .

Returning to (17), observe that solving this equation is equivalent to solving . Assume we can simulate unbiased estimates , i.e. estimates such that . The Robbins Monro algorithm [43] for stochastic fixed point iteration for finding a root of the equation takes the form

(18)

Under technical conditions, and for appropriate choice of such that and , it can be shown that this algorithm converges to a fixed point of (17) [43, 30] without ever solving the equation exactly, even once.

We make the following assumption here, which is sufficient to ensure Assumption 3.1

Assumption 4.1.

There exists a such that for all and

Proposition 4.1.

Assume 4.1. Let be such that . If is chosen so that and , then , where is given by (18) and is the solution to , hence (17).

Consider now running the SMC sampler Algorithm 1 with the choice and to construct the unbiased estimator (16) such that . With this unbiased estimator of , we can use Robbins Monro to solve the fixed point problem for (17), hence the MaxEnt target . The following theorem makes this statement precise.

Theorem 4.1 (Stochastic approximation with SMC).

Assume 4.1 holds, and let be generated by the iteration (18), where is independently generated from an SMC sampler algorithm at each iteration, as defined in (16). If is chosen so that and then , where is a solution of , hence (17).

Proof.

By Proposition 3.2 . The conclusion is an immediate consequence of Proposition 4.1. ∎

Assume the evaluation of is . Then the evaluation of is , and since there will be steps, the total cost of the algorithm will be . Since , one may require up to steps of Robbins Monro. Therefore, a conservative estimate on the cost is . Recall that is the number of constraints. For moments, there are constraints, yielding a conservative estimate on the final cost of . The constant should be rather small however, and this is of course a vast improvement on for even the best linear algorithm for solution with exact evaluation of (17).

5. Uncertainty Quantification

In the previous section we arrive at a single optimal value , and a single MaxEnt target likelihood, i.e. uncertainty is not quantified. Here we venture to quantify uncertainty by putting a prior on , and computing (sampling from) the posterior (12). This leads to a so-called doubly-intractable target, as described above. In this section a new method is introduced to sample from the target. In particular, in subsection 5.1 we introduce the debiasing method mentioned above, and describe how to construct unbiased (signed) estimators of using estimator(s) from the SMC sampler. In subsection 5.2 we introduce the SGLD algorithm and present the second main result of the paper, regarding its implementation using the debiased estimators. The section is concluded with a description in subsection 5.3 of how to separate an explicit error distribution, which is more expensive and is left to future work.

5.1. Debiasing

Some MCMC methods, such as SGLD [49], as well as some recently developed piecewise-deterministic Markov processes (PMDP) [12, 4, 6, 39], require only an unbiased estimator of the gradient of the log-likelihood for implementation. Others, such as pseudo-marginal MCMC, require an unbiased and non-negative estimator of the likelihood itself. It is generally much easier to construct unbiased estimators which are signed [32, 42]. The work [31] shows how one can utilize the pseudo-marginal MCMC even with a signed estimator. However, one may easily argue that it is more natural to use a signed and unbiased estimator within SGLD or one of the PDMP algorithms. We consider the latter strategy here.

Suppose that we have a collection of random variables , where

(19)

and we are interested in estimating , where

(20)

Note that (19) guarantees that . If we draw , where , then it is clear that

(21)

is an unbiased estimator of . Furthermore, if we instead draw , where is defined such that , then one can see that

(22)

is again an unbiased estimator:

(23)

The exchange of the infinite sum and expectation is allowed by (19) and Fubini theorem.

Assumption 5.1.

Let (19) hold and assume the probability is chosen such that .

Proposition 5.1.

For either estimator , let be i.i.d. draws of the estimator (21) or (22), for . Under assumption 5.1, and we have,

with . The expected cost is given by

where is the cost to obtain the realization .

See [32, 42, 31, 46] and references therein for proof and further discussion of such approaches. In particular, this approach will allow us to transform consistent estimators to unbiased estimators for use in the algorithms mentioned above. The recent literature on PDMP is quite extensive, and so we defer consideration of these algorithms to future work and focus on SGLD, which will be described below in generality. First we illustrate how to construct such de-biased estimator of (9).

5.1.1. Debiased SMC sampler estimator

We now describe one way to construct an unbiased estimator of (9) directly, using this de-biasing trick, without multiplying through by . In particular, the single term estimator (21) will be considered.

Let . Replace step (ii) from the SMC samplers algorithm with , i.e. omit the resampling step. This algorithm was named annealed importance sampling (AIS) in [36]. The estimator is an unbiased estimator of , and the individual terms are independent: for and all . This means in particular that if we construct

(24)

from two independent replications, one with total particles and one with total particles, then two estimators have the same distribution; in particular, the same expectation. So we have the following proposition.

Proposition 5.2.

Define the coupled estimator , with particles coming from the same realization of the algorithm described above with particles. Then , with given by (21). The factor here relates to the specific choice of and can be replaced with in general.

Proof.

Following from the explanation above, the equation (20) takes the form , so the result follows from Proposition 5.1. ∎

Observing that in this case and , we can observe that this is the so-called sub-canonical case [42]. That is, if the rate of convergence of were faster (or rate of growth of expected cost slower), then we could choose

in such a way that both asymptotic variance and cost were finite. As this is not the case, we instead choose

, as proposed in [42] (see also [16]), which results in finite variance and expected cost which is finite with high probability.

5.2. Sgld

Given a symmetric positive definite matrix the following SDE keeps invariant [19, 21]

(25)

where for

(26)

For constant , it was proposed in [49] to replace by an unbiased estimator (i.e. ) and approximate the SDE using Euler Maruyama method with stepsize , similar to Robbins Monro, as follows

(27)

For the moment, assume , and , and define the measure corresponding to by . Furthermore, define the empirical measure corresponding to steps of the constant -discretized version of (27) by and define the limiting diffusion (25) sampled at the same points , , by . Then we have

(28)

The discretization error comes from the standard Euler-Maruyama strong convergence rate for SDE with constant diffusion. Define the auto-covariance

where is a sample from (25) at time and . One expects . Therefore the integrated auto-covariance IACT, and the sample error is asymptotically (after burn-in) proportional to IACT. Balancing the error terms, one finds that is optimal, leading to MSE of . It is slightly worse than the MSE of for steps of standard MCMC methods. This simple argument suggests that the optimal choice for is , which is not the same as for SGD. In fact, this result is made rigorous in the work [45, 47] (see also references therein), leading to the following proposition.

Proposition 5.3.

Let , define the measure corresponding to by , choose , and let be generated by (27). Then for suitably regular , one has

(29)

Furthermore, this choice of is asymptotically optimal in terms of cost, and for steps, one has MSE .

Let , and recall (9), so that we have

(30)

Assume that can be computed exactly. Now define an unbiased estimator of (30) using the method of Proposition 5.2, i.e. draw , and let

(31)
Theorem 5.1.

Let , define the measure corresponding to by , choose , and let be generated by (27), with defined as in (31). Then for suitably regular , one has

(32)

Furthermore, this choice of is asymptotically optimal in terms of cost, and for steps, one has MSE .

Proof.

This follows directly from Propositions 5.2 and 5.3. ∎

5.2.1. Reimann manifold method

It is assumed in [38] that Proposition 5.3 holds also for deterministic evaluable

, yielding a Reimann manifold SGLD (RMSGLD), but this is not proven. In any case, based on the above heuristic one would expect a different scaling of

to be optimal, since the strong rate of convergence is reduced to , so balancing terms gives and an MSE of . Nevertheless, here we further postulate that (32) holds indeed for the case in which we have an unbiased estimator and a non-negative unbiased estimator .

Let , and let the target be , so that

where denotes the vector of all s, and

where . Then using the notations of Section 3, (26) takes the form

This can be unbiasedly approximated by

(33)

as can by simply .

This approach would allow direct simulation without de-biasing, since only unbiased estimates of the unnormalized measure are required, which can be constructed directly from SMC samplers, as described in Proposition 3.2. Nonetheless, this is not pursued further here due to the expected higher cost of an optimal implementation. We note that the cost of these algorithms can be improved using the multilevel Monte Carlo method [17], but this is deferred to future work.

5.3. Separating an explicit error distribution

Before concluding the section, we mention an extension of the model considered in the rest of the paper which provides deeper UQ. Suppose that we do not have i.i.d. observations , but in fact we have observations of the form

(34)

where for , and denotes a Bernoulli with probability . In other words, each is independently with probability and otherwise. This model explicitly accounts for error in the observations, i.e. with probability any qubit may have been observed wrong.

We have . The likelihood of a single observation is given by

and so we can see that again we have an integral with respect to . Let . Due to the independence between observations and errors, we have

where we defined .

Therefore

One can define

An unbiased estimator of this function can be constructed from M independent SMC samplers, one for each factor. Despite being massively parallel, this is a significant computational burden, particularly for large . Investigation of efficient algorithms for solving this problem is the topic of ongoing research.

6. Simulations

6.1. Maximum likelihood: exact marginals

Recall the form of , for a fixed from (4.1). Using the machinery from subsection 3.2 we could construct such an unbiased estimator, for any sequence of intermediate targets , and any , provided that . The natural choice is of course to let

(35)

as above, and this choice will be used here. The MCMC kernels are chosen as Metropolis-Hastings kernels with the reversible random walk proposal defined by with probability and otherwise, for . In other words, is a vector of independent Bernoullis with probability , and is hence symmetric, with tunable parameter determining the step-size. We note that a Gibbs sampler can also be used here. Now we let , and iterate (18) until convergence.

In order to test the accuracy of our methodology, we simulate the data as follows. Choose , for and , and let for and otherwise. Now consider the true target

(36)

Similarly, let be defined as for and otherwise, for coefficients . We will identify with vectors in , where . Notice this is of MaxEnt form with , i.e. first and second moments are observed. We choose , let , take MCMC steps with , and define , where , , , and . For the direction of descent is taken as , while for the full estimator (16) is used , so that this is an unbiased estimator of . This construction prevents large fluctuations in