1. Introduction and motivation
Quantum computing holds a great promise to enable future simulations of quantum systems that would otherwise suffer from the curse of dimensionality when simulated classically. However, we are currently entering the Noisy Intermediate Scale Quantum (NISQ) device era
[40]. The hallmarks of NISQ era are relatively small (1050 qubits) quantum computing devices and noisy qubits. Therefore, to succeed with quantum simulation on NISQ quantum computers, it is imperative to understand and fully characterize the effects of noise on the state of a quantum device.In quantum mechanics the state of a qubit quantum device is generally described by a density matrix – a by complexvalued positive semidefinite matrix with . From the definition, it takes real parameters to fully specify an arbitrary quantum state. If denotes a quantum observable, described by a by Hermitian matrix (), then measuring
yields one of its eigenvalues sampled from a distribution determined by
. In particular, the probability of the th eigenvalue is , where is theth eigenvector in the braket notation. Since there are at most
eigenvalues, repeated measurements of any observable yields information about at most parameters of . To estimate every parameter of the state one must measure at least independent observables. This approach is known as quantum state tomography [37, 24]. Its applicability is limited to small quantum devices due to exponential scaling in the number of measurements and related issues of solving exponentially large systems of equations. Bayesian methods also have been suggested [29, 5, 25, 20] to facilitate uncertainty quantification (UQ) in the context of quantum state tomography. However, the existing Bayesian quantum tomography solutions face similar scalability challenge as the number of parameters to infer still grows exponentially with the size of a quantum device.Here we consider a fundamentally new approach, which provides a scalable solution to UQ for the output of quantum simulations. We will restrict our attention to a simplified scenario in which the density matrix and observable commute, hence are simultaneously diagonalizable, and the eigenvectors are known. In this way we are able to treat the quantum state, which is represented by the density matrix, as a classical probability distribution, given by the diagonal of the density matrix. Note that the standard quantum state tomography methods would still scale exponentially even for this simplified scenario. To avoid this problem we use the
maximum entropy (MaxEnt) principle [27, 28] to model a complete probability distribution based on a selected subset of observed or exact moments. We then use wellestablished stochastic simulation algorithms [43, 30, 13, 14] to find the MaxEnt distribution with a cost which is polynomial in . Adopting a MaxEnt ansatz for the likelihood provides a Bayesian posterior which (i) converges to the MaxEnt distribution as the number of observations tends to infinity, and (ii) can be simulated with a cost which is polynomial in using recent stochastic simulation methods [32, 42, 49, 45]. It is the topic of ongoing investigation to generalize the results to density matrices, which will be reported in a future publication. It is hoped that a polynomial cost in can be preserved in this case as well, for example combining the methodology here with that of [22, 8].Assume that we know the basis in which the output state from a qubit quantum computer is diagonal. The density matrix in that basis reads,
(1) 
where comprise a basis in the Hilbert space of the quantum computer . We also assume the observable that we measure is diagonal in the same basis i.e.
(2) 
where are the eigenvalues of and is the probability of observing state a measurement outcome corresponding to the eigenvalue .
Without loss of generality we can interpret the probability distribution as a classical probability density
over the binary random variable
. The mathematical task at hand is then to find consistent with the observed outcomes . A canonical example of such density in our context is the Ising model(3) 
which is parametrized by some matrix .
Given first and second moments of a distribution on , the MaxEnt distribution is the Ising model (3) which satisfies the moment constraints [11] . With finitely many observations, one can replace the moments with observed empirical moments, and the corresponding MaxEnt distribution is the same one which maximizes the likelihood under a MaxEnt ansatz for its parameterized form [34]. Through a Bayesian framework [21], this likelihood can be combined with a prior on the parameters, resulting in a posterior distribution, which quantifies the uncertainty in the distribution given the observations. We will develop and implement novel algorithms for the solution of these problems with a cost which is polynomial in , based on recently developed technologies mentioned above. Now we proceed with a more rigorous mathematical setup, and a summary of the results of the paper.
2. Mathematical Setup and Summary of Results
Let , and denote the state . Consider a probability distribution , parameterized by , with density . The target represents the distribution over dependent qubits given a particular value of , and the objective of this work is to identify . We will denote the unnormalized measure by , with density , so that and , with .
Let , for , and define . It is assumed that, for a particular unknown value of , we are given either

exact moments :
(4)

independent and identically distributed (i.i.d.) noisy observations of this process , for .
As mentioned above, the aim in either case is to reconstruct by identifying .
First, consider problem (P2). Assume we seek the probability distribution which maximizes the entropy
(5) 
subject to the constraints (4), with replacing . This is in fact a convex optimization problem. Therefore, if there exists a probability distribution which satisfies the constraints (4), then the necessary first order optimality conditions are also sufficient for global optimality [7]. For this particular problem, the first order conditions can be simplified to , where
(6) 
and the vector of Lagrange multipliers
is the solution to the system of equations(7) 
This is referred to as the method of maximum entropy (MaxEnt). See Chapter 11.1 of [11] for the derivation. A density which is parametrized in the form (6) is more generally referred to as a MaxEnt ansatz [34].
In general, the system of equations (7) does not have a closed form solution, and it is a challenging problem to solve, particularly for large , as one evaluation of the system would consist of operations. The issue is known as the curse of dimensionality, and precludes the use of standard iterative algorithms for solving the problem. We propose to utilize a Robbins Monro type [43, 30] stochastic iterative algorithm for its solution, together with a sequential Monte Carlo (SMC) sampler [15] unbiased estimator of a system of equations equivalent to (7), , within each iteration of the Robbins Monro. It is wellknown that SMC samplers are stable with operations for such targets which can be evaluated pointwise (in ) [3], thus bypassing the curse of dimensionality. That is the first major contribution of the paper.
Now, consider problem (P2), concerned with recovering from the i.i.d. observations . Following the MaxEnt ethos, we choose an appropriate set of functions as above (4), based on our assumptions of the underlying quantum system, and adopt the ansatz (6) for the likelihood of a single observation, given : . For example, one could use the set of first and second moments . The likelihood of given is given by
(8) 
The maximizer of (8) is called the maximum likelihood estimator (MLE). It is equivalent to the maximizer of . We have
(9) 
where
(10) 
The necessary first order optimality condition is vanishing gradient, i.e.
(11) 
which is clearly equivalent to to the MaxEnt solution (7), except with empirical moments replacing exact moments. This problem can therefore be solved using the same method described above.
This solution is still somewhat unsettling. Even if we are content with our choice of , we have arrived at a point estimate for inference given a finite set of observations. The scrupulous Bayesian will not allow the story to end here. Indeed for the sake of uncertainty quantification (UQ), we require a posterior distribution over , or equivalently, a posterior distribution over . In order to accomplish this, we complete the picture with a prior on , giving
(12) 
where the constant of proportionality depends only upon the observations . The second major contribution of the paper concerns quantifying a posteriori uncertainty using this approach. Due to the fact that the likelihood itself has an intractable normalizing constant , this notoriously difficult problem is referred to as doublyintractable [33, 35].
If we could at least evaluate a nonnegative and unbiased estimator of the likelihood , or equivalently (12), then there are algorithms which can be used, such as pseudomarginal Markov chain Monte Carlo (MCMC) method [2, 1]. The recent work [31] proposes to construct unbiased but signed estimates of the target (12), and then they use such estimate within pseudomarginal MCMC by considering the absolute value of the estimate in the acceptance ratio and then storing the sign of the accepted samples in order to correct the ultimate estimator. The work [48] proposes to use alternative unbiased estimators in this same framework. Note that earlier work also appeared which allows one to sample from such a target [33, 35], however these works require being able to sample from for a given value of
. This is possible in some cases, for example the Ising model (or Boltzmann machine)
[41, 9], but not in general. The works [31, 48] are generally applicable.We will take a new approach to doubly intractable simulation in the present work. A number of recently developed methods require only unbiased estimators of (9). These include the stochastic gradient Langevin dynamics (SGLD) [49], the zigzag sampler [4], or the bouncy particle sampler [6, 39]. We propose to construct an unbiased estimator of (9) using the debiasing technique from the recent works [32, 42], and then use this unbiased estimator within such MCMC methods. This is the second major contribution of this work. In particular, we constrain our attention here to the SGLD.
The rest of the paper is organized as follows. In Section 3 the SMC sampler is introduced, which will be used as a crucial component within all subsequent algorithms. In particular, it will be shown how to construct consistent estimators of and unbiased estimators of . In Section 4 we introduce the Robbins Monro algorithm, and present the first main result regarding its implementation using the SMC sampler. Section 5 concerns Bayesian posterior UQ in the context of finitely many observations. In Section 6 we simulate data from a toy model, and illustrate the algorithm from Section 4 for computing the MaxEnt solution using both exact moments and observed moments (where it coincides with the MLE), as well as the algorithm from Section 5 for UQ in the case of observations.
3. SMC samplers for estimation of integrals
Here it will be described how to estimate expectations , for bounded functions , and construct nonnegative unbiased estimators of , for a given value of , using an auxillary variable technique.
Let be a known probability distribution over a general state space , and let be some probability over for , whose density can be evaluated up to a normalizing constant, but cannot be sampled from directly. Define the unnormalized measure associated to by , with density , so that
(13) 
where .
The objective is to approximate expectations for bounded functions
We cannot sample from this distribution, but we can obtain a convergent estimator
using SMC samplers, as described below [15] (see also [26, 36, 10]).
Remark 3.1.
In the context of this paper the target will be considered in each inner iteration of the stochastic iterative algorithm. So . To make the presentation more concrete, let and
(14) 
However, note that the choice of intermediate targets
to interpolate between an initial density
, which can be sampled from exactly, and are essentially arbitrary. For example, one could incorporate several constraints per iteration or anneal in between individual constraints.3.1. Estimating the target
Define
Let denote a Markov kernel such that
Iterate Algorithm 1 and define
(15) 
Assumption 3.1.
There exists some such that for all
Proposition 3.1.
Assume 3.1 holds. Then for , ,
The constant can be made to be independent of if an additional strong mixing condition is assumed on , uniformly in [13].
3.2. Estimating the normalizing constant
Recall , and observe that
Noting that , therefore
This is the normalizing constant of .
Observe that using Algorithm 1 we can construct an estimator by
Recall that for any we have by definition (13). Define
(16) 
The following proposition is proven in [13]. The proof is reproduced in Appendix A for completeness, due to its importance to the results of this paper.
Proposition 3.2.
Assume 3.1 holds. Then .
4. Stochastic approximation
Here it will be assumed that we are given values , i.e. Problem (P2). As mentioned in Section 2, the MaxEnt approach entails solving the convex optimization problem (5) (which has domain ) subject to the constraints , and (4). It is necessary and sufficient to solve the equation (7), reproduced below
(17) 
As mentioned above, a naive computation of the lefthand side of the above equation for a single value of would incur a cost of , which makes the problem intractable for very small values of or so. It is however possible to obtain a noisy estimate using a stochastic simulation algorithm. Algorithms such as MCMC [23, 18] are designed to sample the state space in steps [44], and provide a consistent estimate of (17) for a given value of .
Returning to (17), observe that solving this equation is equivalent to solving . Assume we can simulate unbiased estimates , i.e. estimates such that . The Robbins Monro algorithm [43] for stochastic fixed point iteration for finding a root of the equation takes the form
(18) 
Under technical conditions, and for appropriate choice of such that and , it can be shown that this algorithm converges to a fixed point of (17) [43, 30] without ever solving the equation exactly, even once.
We make the following assumption here, which is sufficient to ensure Assumption 3.1
Assumption 4.1.
There exists a such that for all and
Proposition 4.1.
Consider now running the SMC sampler Algorithm 1 with the choice and to construct the unbiased estimator (16) such that . With this unbiased estimator of , we can use Robbins Monro to solve the fixed point problem for (17), hence the MaxEnt target . The following theorem makes this statement precise.
Theorem 4.1 (Stochastic approximation with SMC).
Assume the evaluation of is . Then the evaluation of is , and since there will be steps, the total cost of the algorithm will be . Since , one may require up to steps of Robbins Monro. Therefore, a conservative estimate on the cost is . Recall that is the number of constraints. For moments, there are constraints, yielding a conservative estimate on the final cost of . The constant should be rather small however, and this is of course a vast improvement on for even the best linear algorithm for solution with exact evaluation of (17).
5. Uncertainty Quantification
In the previous section we arrive at a single optimal value , and a single MaxEnt target likelihood, i.e. uncertainty is not quantified. Here we venture to quantify uncertainty by putting a prior on , and computing (sampling from) the posterior (12). This leads to a socalled doublyintractable target, as described above. In this section a new method is introduced to sample from the target. In particular, in subsection 5.1 we introduce the debiasing method mentioned above, and describe how to construct unbiased (signed) estimators of using estimator(s) from the SMC sampler. In subsection 5.2 we introduce the SGLD algorithm and present the second main result of the paper, regarding its implementation using the debiased estimators. The section is concluded with a description in subsection 5.3 of how to separate an explicit error distribution, which is more expensive and is left to future work.
5.1. Debiasing
Some MCMC methods, such as SGLD [49], as well as some recently developed piecewisedeterministic Markov processes (PMDP) [12, 4, 6, 39], require only an unbiased estimator of the gradient of the loglikelihood for implementation. Others, such as pseudomarginal MCMC, require an unbiased and nonnegative estimator of the likelihood itself. It is generally much easier to construct unbiased estimators which are signed [32, 42]. The work [31] shows how one can utilize the pseudomarginal MCMC even with a signed estimator. However, one may easily argue that it is more natural to use a signed and unbiased estimator within SGLD or one of the PDMP algorithms. We consider the latter strategy here.
Suppose that we have a collection of random variables , where
(19) 
and we are interested in estimating , where
(20) 
Note that (19) guarantees that . If we draw , where , then it is clear that
(21) 
is an unbiased estimator of . Furthermore, if we instead draw , where is defined such that , then one can see that
(22) 
is again an unbiased estimator:
(23) 
The exchange of the infinite sum and expectation is allowed by (19) and Fubini theorem.
Assumption 5.1.
Let (19) hold and assume the probability is chosen such that .
Proposition 5.1.
See [32, 42, 31, 46] and references therein for proof and further discussion of such approaches. In particular, this approach will allow us to transform consistent estimators to unbiased estimators for use in the algorithms mentioned above. The recent literature on PDMP is quite extensive, and so we defer consideration of these algorithms to future work and focus on SGLD, which will be described below in generality. First we illustrate how to construct such debiased estimator of (9).
5.1.1. Debiased SMC sampler estimator
We now describe one way to construct an unbiased estimator of (9) directly, using this debiasing trick, without multiplying through by . In particular, the single term estimator (21) will be considered.
Let . Replace step (ii) from the SMC samplers algorithm with , i.e. omit the resampling step. This algorithm was named annealed importance sampling (AIS) in [36]. The estimator is an unbiased estimator of , and the individual terms are independent: for and all . This means in particular that if we construct
(24) 
from two independent replications, one with total particles and one with total particles, then two estimators have the same distribution; in particular, the same expectation. So we have the following proposition.
Proposition 5.2.
Define the coupled estimator , with particles coming from the same realization of the algorithm described above with particles. Then , with given by (21). The factor here relates to the specific choice of and can be replaced with in general.
Proof.
Observing that in this case and , we can observe that this is the socalled subcanonical case [42]. That is, if the rate of convergence of were faster (or rate of growth of expected cost slower), then we could choose
in such a way that both asymptotic variance and cost were finite. As this is not the case, we instead choose
, as proposed in [42] (see also [16]), which results in finite variance and expected cost which is finite with high probability.5.2. Sgld
Given a symmetric positive definite matrix the following SDE keeps invariant [19, 21]
(25) 
where for
(26) 
For constant , it was proposed in [49] to replace by an unbiased estimator (i.e. ) and approximate the SDE using Euler Maruyama method with stepsize , similar to Robbins Monro, as follows
(27) 
For the moment, assume , and , and define the measure corresponding to by . Furthermore, define the empirical measure corresponding to steps of the constant discretized version of (27) by and define the limiting diffusion (25) sampled at the same points , , by . Then we have
(28) 
The discretization error comes from the standard EulerMaruyama strong convergence rate for SDE with constant diffusion. Define the autocovariance
where is a sample from (25) at time and . One expects . Therefore the integrated autocovariance IACT, and the sample error is asymptotically (after burnin) proportional to IACT. Balancing the error terms, one finds that is optimal, leading to MSE of . It is slightly worse than the MSE of for steps of standard MCMC methods. This simple argument suggests that the optimal choice for is , which is not the same as for SGD. In fact, this result is made rigorous in the work [45, 47] (see also references therein), leading to the following proposition.
Proposition 5.3.
Let , define the measure corresponding to by , choose , and let be generated by (27). Then for suitably regular , one has
(29) 
Furthermore, this choice of is asymptotically optimal in terms of cost, and for steps, one has MSE .
Let , and recall (9), so that we have
(30) 
Assume that can be computed exactly. Now define an unbiased estimator of (30) using the method of Proposition 5.2, i.e. draw , and let
(31) 
Theorem 5.1.
5.2.1. Reimann manifold method
It is assumed in [38] that Proposition 5.3 holds also for deterministic evaluable
, yielding a Reimann manifold SGLD (RMSGLD), but this is not proven. In any case, based on the above heuristic one would expect a different scaling of
to be optimal, since the strong rate of convergence is reduced to , so balancing terms gives and an MSE of . Nevertheless, here we further postulate that (32) holds indeed for the case in which we have an unbiased estimator and a nonnegative unbiased estimator .Let , and let the target be , so that
where denotes the vector of all s, and
where . Then using the notations of Section 3, (26) takes the form
This can be unbiasedly approximated by
(33) 
as can by simply .
This approach would allow direct simulation without debiasing, since only unbiased estimates of the unnormalized measure are required, which can be constructed directly from SMC samplers, as described in Proposition 3.2. Nonetheless, this is not pursued further here due to the expected higher cost of an optimal implementation. We note that the cost of these algorithms can be improved using the multilevel Monte Carlo method [17], but this is deferred to future work.
5.3. Separating an explicit error distribution
Before concluding the section, we mention an extension of the model considered in the rest of the paper which provides deeper UQ. Suppose that we do not have i.i.d. observations , but in fact we have observations of the form
(34) 
where for , and denotes a Bernoulli with probability . In other words, each is independently with probability and otherwise. This model explicitly accounts for error in the observations, i.e. with probability any qubit may have been observed wrong.
We have . The likelihood of a single observation is given by
and so we can see that again we have an integral with respect to . Let . Due to the independence between observations and errors, we have
where we defined .
Therefore
One can define
An unbiased estimator of this function can be constructed from M independent SMC samplers, one for each factor. Despite being massively parallel, this is a significant computational burden, particularly for large . Investigation of efficient algorithms for solving this problem is the topic of ongoing research.
6. Simulations
6.1. Maximum likelihood: exact marginals
Recall the form of , for a fixed from (4.1). Using the machinery from subsection 3.2 we could construct such an unbiased estimator, for any sequence of intermediate targets , and any , provided that . The natural choice is of course to let
(35) 
as above, and this choice will be used here. The MCMC kernels are chosen as MetropolisHastings kernels with the reversible random walk proposal defined by with probability and otherwise, for . In other words, is a vector of independent Bernoullis with probability , and is hence symmetric, with tunable parameter determining the stepsize. We note that a Gibbs sampler can also be used here. Now we let , and iterate (18) until convergence.
In order to test the accuracy of our methodology, we simulate the data as follows. Choose , for and , and let for and otherwise. Now consider the true target
(36) 
Similarly, let be defined as for and otherwise, for coefficients . We will identify with vectors in , where . Notice this is of MaxEnt form with , i.e. first and second moments are observed. We choose , let , take MCMC steps with , and define , where , , , and . For the direction of descent is taken as , while for the full estimator (16) is used , so that this is an unbiased estimator of . This construction prevents large fluctuations in
Comments
There are no comments yet.