Informationally complete POVM-based shadow tomography

05/12/2021 ∙ by Atithi Acharya, et al.

Recently introduced shadow tomography protocols use classical shadows of quantum states to predict many target functions of an unknown quantum state. Unlike full quantum state tomography, shadow tomography does not insist on accurate recovery of the density matrix for high-rank mixed states. Yet, such a protocol makes multiple accurate predictions with high confidence, based on a moderate number of quantum measurements. One particularly influential algorithm, proposed by Huang, Kueng, and Preskill (arXiv:2002.08953), requires additional circuits for performing certain random unitary transformations. In this paper, we avoid these transformations and instead employ an arbitrary informationally complete POVM, and show that such a procedure can compute k-bit correlation functions for quantum states reliably. We also show that, for this application, we do not need the median of means procedure of Huang et al. Finally, we discuss the contrast between the computation of correlation functions and the fidelity of reconstruction of low-rank density matrices.


I Introduction

Recent advances in quantum information processing often require characterizing quantum states prepared during various stages of a procedure. As a result, the problem of characterizing a quantum state, more specifically a density matrix, from measurements on an ensemble of identical states, known as quantum state tomography (QST), has seen a surge of interest [Torlai_2018, Carrasquilla_2019, Huang_2020]. One of the key challenges is that, for $n$-qubit quantum systems, the density matrix is of size $2^n \times 2^n$. As the number of qubits becomes large, inferring the density matrix from a limited number of measurements becomes difficult.

Can we get away without fully characterizing the quantum state, by instead constructing an approximate classical description that predicts many different functions of the state accurately? Shadow Tomography [aaronson2018shadow] aims to do precisely this, namely, to predict a power-law number of observations in the number of qubits, $n$, from copies of the density matrix $\rho$. This idea was taken further by Huang et al. [Huang_2020], who constructed such a description of low sample complexity via classical shadows ($\hat\rho$), related to states without any entanglement in the appropriate basis, corresponding to each copy of $\rho$.

Quantum measurement requires specifying a set of Positive Operator Valued Measures (POVMs) [Nielsen] which is a generalization of a complete set of projection operators. The work by Huang et al. [Huang_2020] involves measurements via projection operators. Since projection operators are not informationally complete (see Sec. II.1), Huang et al. employ a set of random unitary transformations before taking measurements. In the work that follows, we directly employ a complete or overcomplete POVM system and perform shadow tomography. This simplification also gives us insight into the optimality of prediction over the choice of POVMs.

II Generalized Measurements

A projective measurement is described by an observable, $M$, a Hermitian operator on the state space of the system being observed. The observable has a spectral decomposition, $M = \sum_m m P_m$, where $P_m$ is the projector onto the eigenspace of $M$ with eigenvalue $m$. The possible outcomes of the measurement correspond to the eigenvalues, $m$, of the observable, and the outcome probability is $p(m) = \mathrm{Tr}(P_m \rho)$. Projection Valued Measures (PVMs) are a special case of general measurements, where the measurement operators are Hermitian and orthogonal projectors. A set of Positive Operator Valued Measures (POVMs) [Nielsen], $\{M_a\}$, forms a generalization of PVMs. The index $a$ in the POVM element $M_a$ refers to the measurement outcomes that may occur in the experiment. The probability of the measurement outcome $a$ is given by $p(a) = \mathrm{Tr}(M_a \rho)$, and the post-measurement density matrix can be written as $\rho_a = A_a \rho A_a^\dagger / \mathrm{Tr}(M_a \rho)$, where $A_a$ are the Kraus operators [Nielsen] corresponding to the POVM, with $M_a = A_a^\dagger A_a$. The operators $M_a$ form a complete set of Hermitian non-negative operators. Namely, they satisfy $M_a = M_a^\dagger$, $\langle\psi|M_a|\psi\rangle \ge 0$ for any vector $|\psi\rangle$, and $\sum_a M_a = I$. Such a POVM could be thought of as a partition of unity by non-negative operators.

II.1 Informational completeness

The density matrix ($\rho$) is a Hermitian, unit-trace operator. If we have a $d$-dimensional system, $\rho$ will be a $d \times d$ complex square matrix represented by $d^2 - 1$ real parameters. The operator space for this $d$-dimensional system will, however, be spanned by $d^2$ linearly independent basis operators. Note that PVMs only have projection operators. They are capable of providing only the diagonal elements of $\rho$ in a particular orthonormal basis, leaving out potential entanglement-related information from the off-diagonal elements. Thus, PVMs are examples of POVMs that are informationally undercomplete.

If the number of outcomes $m$ satisfies $m \ge d^2$, and we can form exactly $d^2$ linearly independent operators by linearly combining the set of POVMs, such POVMs will be called informationally complete. However, in the most common terminology, informationally complete actually refers to the minimally complete POVM ($m = d^2$). If we proceed to reconstruct the density matrix for an informationally complete POVM, we can expand $\rho$ as

(1)

If the number of outcomes $m$ equals $d^2$, we have an informationally complete (minimally complete) basis set. However, if $m > d^2$, the POVM forms an informationally overcomplete set [Renes_2004].

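We start out by giving the example of a rather simple overcomplete set in the single-qubit Hilbert space. The Pauli-6 POVM has 6 outcomes, $\{\tfrac13|0\rangle\langle 0|, \tfrac13|1\rangle\langle 1|, \tfrac13|+\rangle\langle +|, \tfrac13|-\rangle\langle -|, \tfrac13|r\rangle\langle r|, \tfrac13|l\rangle\langle l|\}$, where $\{|0\rangle, |1\rangle\}$, $\{|+\rangle, |-\rangle\}$, and $\{|r\rangle, |l\rangle\}$ stand for the eigenbases of the Pauli operators $\sigma_z$, $\sigma_x$, and $\sigma_y$, respectively. Experimentally, it can be implemented directly by first randomly choosing $\sigma_z$, $\sigma_x$, or $\sigma_y$, and then measuring the respective Pauli operator, which justifies the $\tfrac13$ factor. However, other probabilities will also be valid for this example of an overcomplete POVM.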
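
As a quick numerical check of the definitions above, the following sketch builds the Pauli-6 elements as one third of the projectors onto the six Pauli eigenstates, verifies that they form a valid POVM, and counts how many linearly independent operators they span. The helper names `is_valid_povm` and `operator_rank` are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Eigenstates of sigma_z, sigma_x and sigma_y for a single qubit.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)
right = (ket0 + 1j * ket1) / np.sqrt(2)
left = (ket0 - 1j * ket1) / np.sqrt(2)


def projector(ket):
    """Rank-1 projector |ket><ket|."""
    return np.outer(ket, ket.conj())


# Pauli-6: each projector carries the weight 1/3 (uniform choice of basis).
PAULI6 = [projector(k) / 3 for k in (ket0, ket1, plus, minus, right, left)]


def is_valid_povm(elements, tol=1e-12):
    """Check Hermiticity, non-negativity and the completeness relation."""
    dim = elements[0].shape[0]
    hermitian = all(np.allclose(m, m.conj().T, atol=tol) for m in elements)
    positive = all(np.linalg.eigvalsh(m).min() > -tol for m in elements)
    complete = np.allclose(sum(elements), np.eye(dim), atol=tol)
    return bool(hermitian and positive and complete)


def operator_rank(elements):
    """Number of linearly independent operators spanned by the elements."""
    return int(np.linalg.matrix_rank(np.array([m.reshape(-1) for m in elements])))
```

For a qubit ($d = 2$), a rank of $d^2 = 4$ signals informational completeness, whereas a single projective measurement such as $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ only spans 2 operators.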

Now, let us give an example of a minimally complete POVM, the Pauli-4 POVM. As a sanity check for the completeness relation, one can see that its four elements sum to the identity. The experimental procedure will be similar to that of the Pauli-6 POVM, with an additional step where three different outcomes of Pauli-6 are identified as the single fourth element of Pauli-4. Thus, this set contains an element which is not a rank-1 projector.

The third one is the tetrahedral POVM, whose outcomes correspond to sub-normalized rank-1 projectors along four directions in the Bloch sphere that form a tetrahedron. Since the tetrahedron formed is regular, it forms an example of a symmetric informationally complete (SIC) POVM. The experimental implementation of the tetrahedral POVM relies on Neumark’s dilation theorem. The theorem implies that the POVM can be physically realized by coupling the system qubit to an ancillary qubit and performing a von Neumann measurement on the two qubits (see Refs. [Carrasquilla_2019, PhysRevA.86.062107] for explicit constructions).
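
The tetrahedral construction can be illustrated with a short sketch. The particular vertex directions below are an assumption chosen for convenience (any regular tetrahedron inscribed in the Bloch sphere works, and the paper's orientation may differ); each element is written as $\tfrac14(I + \vec s_a \cdot \vec\sigma)$, which is one half of a rank-1 projector.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Assumed orientation: four unit vectors forming a regular tetrahedron.
TETRA_DIRS = np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
) / np.sqrt(3)


def tetrahedral_povm(dirs=TETRA_DIRS):
    """Elements M_a = (I + s_a . sigma) / 4, i.e. half of a Bloch-vector projector."""
    return [(I2 + s[0] * SX + s[1] * SY + s[2] * SZ) / 4 for s in dirs]


def pairwise_overlaps(elements):
    """Matrix of Hilbert-Schmidt overlaps Tr(M_a M_b)."""
    return np.array(
        [[np.real(np.trace(a @ b)) for b in elements] for a in elements]
    )
```

The symmetry of the SIC-POVM shows up as equal off-diagonal overlaps $\mathrm{Tr}(M_a M_b) = 1/12$ for all $a \neq b$, while the four elements sum to the identity.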

III Classical Shadows with POVMs

Aaronson introduced the idea of “pretty good tomography” [Aaronson_2007], with the focus on predicting many observations accurately, based on copies of the density matrix. This idea parallels the “learnability” of quantum states in a Probably Approximately Correct (PAC) sense [PAC]. Proceeding along this line, he later introduced the concept of Shadow Tomography [aaronson2018shadow], where, from copies of the density matrix $\rho$, we want to predict $L$ different linear target functions up to an additive error less than $\epsilon$.

Huang et al. [Huang_2020] build their methods on the idea of Shadow Tomography [aaronson2018shadow]. They repeatedly perform a measurement procedure, i.e. apply a random unitary to rotate the state ($\rho \mapsto U\rho U^\dagger$) and perform a computational-basis measurement. Then, after the measurement, they apply the inverse of $U$ to the resulting computational basis state $|\hat b\rangle$. This procedure collapses $\rho$ to a snapshot $U^\dagger|\hat b\rangle\langle\hat b|U$, producing a quantum channel $\mathcal{M}$, which depends on the ensemble of (random) unitary transformations.

If the collection of unitaries $\mathcal{U}$ is defined to be tomographically complete, namely, if the condition that for each $\sigma \neq \rho$ there exist $U \in \mathcal{U}$ and $b$ such that $\langle b|U\sigma U^\dagger|b\rangle \neq \langle b|U\rho U^\dagger|b\rangle$ is met, then $\mathcal{M}$, viewed as a linear map, has a unique inverse $\mathcal{M}^{-1}$. Huang et al. [Huang_2020] set

$$\hat\rho = \mathcal{M}^{-1}\big(U^\dagger|\hat b\rangle\langle\hat b|U\big) \qquad (2)$$

Although the inverted channel $\mathcal{M}^{-1}$ is not physical (it is not completely positive), one can still apply $\mathcal{M}^{-1}$ to the (classically stored) measurement outcome $U^\dagger|\hat b\rangle\langle\hat b|U$ in a completely classical post-processing step. Even if an individual sample of $\hat\rho$ is not a density matrix, the expectation of the $\hat\rho$’s is the original density matrix $\rho$. One can use this property to get a good prediction of measurements performed on $\rho$.

If, instead of working with the computational basis measurements, we decide to use an informationally complete set of POVMs (Sec. II.1), we can avoid dealing with particular random unitary ensembles. The only thing we need to make sure is that the resulting channel is invertible.

III.1 POVMs for the $n$-qubit system

From single-qubit POVMs $\{M_a\}$ with $m$ outcomes each, we introduce $m^n$ operators by taking tensor products and form POVMs for the $n$-qubit system: $M_{a_1} \otimes \cdots \otimes M_{a_n}$. The outcomes of measurements in this system are of the form $(a_1, \ldots, a_n)$. Now, we discuss how to form shadows from such an observation.

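III.2 A synthetic measurement channel

Let the POVM elements be diagonalised as follows: $M_a = \sum_k \lambda_{a,k}|\psi_{a,k}\rangle\langle\psi_{a,k}|$, with $\lambda_{a,k} \ge 0$ since $M_a \ge 0$. Let $f$ be a strictly monotonic function which will be applied to the eigenvalues of the POVM elements. The function is defined on the non-negative reals, since the eigenvalues are non-negative. The probability of outcome ‘$a$’ is given as

$$p(a) = \mathrm{Tr}(M_a \rho) \qquad (3)$$

Each time we perform a measurement and get an outcome ‘$a$’, we construct a pure output state $|\psi_{a,k}\rangle\langle\psi_{a,k}|$ with probability $f(\lambda_{a,k}) / \sum_{k'} f(\lambda_{a,k'})$. We assume each $f(\lambda_{a,k})$ to be non-zero, guaranteeing that the denominator is positive. Although this is a synthetic channel, we will refer to it as the measurement channel, in analogy with the case where the $M_a$ are projections.

The measurement channel, for a single qubit, can be defined as

$$\mathcal{M}(\rho) = \sum_a \mathrm{Tr}(M_a \rho) \sum_k \frac{f(\lambda_{a,k})}{\sum_{k'} f(\lambda_{a,k'})}\,|\psi_{a,k}\rangle\langle\psi_{a,k}| \qquad (4)$$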

For simplicity, in the following discussion, we consider the case where the highest eigenvalue of each $M_a$ is non-degenerate. The modifications needed for the general case are straightforward. If a particular POVM element $M_a$ is not a rank-one projector and the function $f$ is very steeply increasing, then the overwhelmingly likely output is $|\psi_a\rangle\langle\psi_a|$, where $|\psi_a\rangle$ is the eigenvector corresponding to the highest eigenvalue of $M_a$. This situation arises for a one-parameter family of functions in the limit of a large parameter. In that limit, as we perform a measurement, the output is $|\psi_a\rangle\langle\psi_a|$ (snapshots) with probability $\mathrm{Tr}(M_a \rho)$. The measurement channel can be defined using a convex combination of the snapshots as

$$\mathcal{M}(\rho) = \sum_a \mathrm{Tr}(M_a \rho)\,|\psi_a\rangle\langle\psi_a| \qquad (5)$$

In a more general scheme, like the one mentioned in the beginning of the subsection, $|\psi_a\rangle$ is a random vector chosen according to a probability distribution. For example, in the current scheme, if the largest eigenvalue of $M_a$ is degenerate, we choose any one of the corresponding eigenvectors with equal probability.

In the formalism developed in [Huang_2020], the channel and its inversion were related to the ensemble of (random) unitary transformations (e.g. the Clifford unitary ensemble). The condition of tomographic completeness depended on the existence of a unitary transformation in the chosen ensemble that distinguishes different density matrices [Huang_2020]. With our reformulation of the measurement channel, however, we only need to use an informationally complete set of POVMs (e.g. Pauli-6, see Sec. II).

In the example of a single qubit measured using the 6 projectors coming from the 3 Pauli matrices, i.e. the Pauli-6 POVM, the channel and its inverse can be explicitly computed. Similar to the classical shadows built out of random Pauli measurements [Huang_2020], we get a depolarizing channel, i.e. a channel that contracts a pure state (lying on the surface of the Bloch sphere) towards the ‘center’ of the Bloch sphere, namely, the maximally mixed state $I/2$. The inverse (a non-physical map) can also be computed, and it can map a point inside the Bloch ball to the outside.
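
The statement above can be checked numerically. The sketch below assumes the rank-1 form of the measurement channel, $\mathcal{M}(\rho) = \sum_a \mathrm{Tr}(M_a\rho)\,|\psi_a\rangle\langle\psi_a|$ with $M_a = \tfrac13|\psi_a\rangle\langle\psi_a|$, for which a short calculation gives $\mathcal{M}(\rho) = (\rho + \mathrm{Tr}(\rho)\,I)/3$ and hence $\mathcal{M}^{-1}(X) = 3X - \mathrm{Tr}(X)\,I$. Function names are illustrative.

```python
import numpy as np


def pauli6_states():
    """The six Pauli eigenstates |0>, |1>, |+>, |->, |r>, |l>."""
    s = 1 / np.sqrt(2)
    return [
        np.array(v, dtype=complex)
        for v in ([1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s])
    ]


def measurement_channel(rho):
    """M(rho) = sum_a Tr(M_a rho) |psi_a><psi_a| with M_a = |psi_a><psi_a| / 3."""
    out = np.zeros((2, 2), dtype=complex)
    for psi in pauli6_states():
        proj = np.outer(psi, psi.conj())
        prob = np.real(np.trace(proj @ rho)) / 3
        out += prob * proj
    return out


def inverse_channel(x):
    """Inverse of the depolarizing map above: X -> 3 X - Tr(X) I."""
    return 3 * x - np.trace(x) * np.eye(2)
```

Applying `inverse_channel` to a pure snapshot such as $|0\rangle\langle 0|$ gives $\mathrm{diag}(2, -1)$, which has unit trace but a negative eigenvalue: a point outside the Bloch ball.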

Figure 1: The convex region in the figure is the set of admissible density matrices. We schematically describe the process of forming classical shadows from copies of $\rho$. For the $i$-th observation with outcome ‘$a$’, the inverse of the channel, $\mathcal{M}^{-1}$, acts on the projectors to construct the shadow $\hat\rho_i$. The sample mean of the shadows cast by $\rho$, i.e. $\frac{1}{N}\sum_i \hat\rho_i$, fluctuates around the true $\rho$ and could be outside the convex region. However, while measuring $k$-local observables [Huang_2020], the convergence of the sample averages to the true expected values can be guaranteed with the number of samples given in Theorem 1.

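Multi-qubit system: For local measurements (not necessarily the depolarizing channel), the inverse channel for the $n$-qubit system can be written as

$$\mathcal{M}_n^{-1} = \bigotimes_{j=1}^{n} \mathcal{M}_1^{-1} \qquad (6)$$

where $\mathcal{M}_1^{-1}$ is the single-qubit inverse channel. We can now reformulate the shadows with our overcomplete POVM set and its corresponding channel. For instance, when we work with the Pauli-6 POVM and observe the outcome $(a_1, \ldots, a_n)$, we will get

$$\hat\rho = \bigotimes_{j=1}^{n} \mathcal{M}_1^{-1}\big(|\psi_{a_j}\rangle\langle\psi_{a_j}|\big) = \bigotimes_{j=1}^{n} \big(3|\psi_{a_j}\rangle\langle\psi_{a_j}| - I\big) \qquad (7)$$

where $\mathcal{M}_1^{-1}(X) = 3X - \mathrm{Tr}(X)\,I$ (see Sec. A.2). Note that the $2^n \times 2^n$ matrix $\hat\rho$ need not be constructed explicitly. We just need to store the outcome $a_j$ for each qubit $j$.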

Since the inverted channel is not physical (it is not completely positive), the $\hat\rho$ in Eq. (7) need not be physical. In other words, there is no guarantee that the output of the inverse channel is positive semidefinite. See Fig. 1 for a schematic description. We recover the true density matrix only in expectation. However, if the shadow matrix is forced to be positive semidefinite, we can see how observations such as fidelity change (see Sec. IV.1).

III.3 Noisy shadow

Earlier, we defined our measurement channel, Eq. (5). However, we can also let each of our qubits pass through a previously characterized noise channel and then take the measurements [koh2020classical]. The combined channel is given by

(8)

We used an informationally complete set of POVMs to ensure that the measurement channel was invertible. As long as the noise channel itself is invertible, the combined channel is also invertible. We will work with an $n$-qubit noise channel of the product form $\mathcal{E}^{\otimes n}$. Thus, we can still write the inverse of the new noisy measurement channel for the $n$-qubit system in terms of the single-qubit inverse shadow channel:

(9)

If we choose an amplitude damping channel with damping parameter , one of the Kraus operator representations can be given as

(10)

where , .

The inverse of the noisy shadow channel is given in Eq. (24). Its action on and is given as , and . See Sec.A.2 for a general description on the inversion of a noisy shadow channel. Here, we will construct the shadows (noisy) with the following definition:

(11)

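III.4 Predicting linear functions with classical shadows

Using the statistical properties of a single shadow, we can predict linear functions in the unknown state $\rho$ as

$$o = \mathrm{Tr}(O\rho) = \mathbb{E}\big[\mathrm{Tr}(O\hat\rho)\big] \qquad (12)$$

In practice, using an array of shadows (i.e. $N$ snapshots), we can estimate the expectation $\mathrm{Tr}(O\rho)$. Given an array of $N$ independent classical snapshots (each defined as in Eq. (7)):

$$\mathsf{S}(\rho; N) = \{\hat\rho_1, \ldots, \hat\rho_N\} \qquad (13)$$

The sample mean is $\hat o = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Tr}(O\hat\rho_i)$. This sample mean will fluctuate around the true prediction, with $\mathbb{E}[\hat o] = \mathrm{Tr}(O\rho)$.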
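
To make the estimator concrete, here is a minimal simulation sketch for the Pauli-6 case. It samples outcomes from the exact outcome distribution of a small state (feasible only for a few qubits, since it enumerates all $6^n$ outcomes) and evaluates $\mathrm{Tr}(O\hat\rho_i)$ for a tensor-product observable qubit by qubit, so the full shadow matrix is never built. The function names and the brute-force sampling strategy are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np
from itertools import product

S = 1 / np.sqrt(2)
STATES = [
    np.array(v, dtype=complex)
    for v in ([1, 0], [0, 1], [S, S], [S, -S], [S, 1j * S], [S, -1j * S])
]
PROJ = [np.outer(v, v.conj()) for v in STATES]
# Single-qubit inverse channel applied to each snapshot: 3|psi><psi| - I.
INV = [3 * p - np.eye(2) for p in PROJ]


def sample_outcomes(rho, n_qubits, n_samples, rng):
    """Draw Pauli-6 outcome tuples (a_1, ..., a_n) from p(a) = Tr(M_a rho)."""
    labels = list(product(range(6), repeat=n_qubits))
    probs = []
    for lab in labels:
        elem = np.array([[1.0 + 0j]])
        for a in lab:
            elem = np.kron(elem, PROJ[a] / 3)
        probs.append(np.real(np.trace(elem @ rho)))
    probs = np.clip(probs, 0, None)
    probs = probs / probs.sum()
    idx = rng.choice(len(labels), size=n_samples, p=probs)
    return [labels[i] for i in idx]


def estimate_product_observable(outcomes, local_ops):
    """Sample mean of Tr(O rho_hat_i) for O = O_1 (x) ... (x) O_n."""
    vals = []
    for lab in outcomes:
        v = 1.0
        for a, op in zip(lab, local_ops):
            v *= np.real(np.trace(op @ INV[a]))
        vals.append(v)
    return float(np.mean(vals))
```

For a two-qubit Bell state $(|00\rangle + |11\rangle)/\sqrt2$, passing `local_ops = [Z, Z]` should give an estimate that fluctuates around the exact value $\langle \sigma_z \otimes \sigma_z\rangle = 1$.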

III.5 The algorithm and the guarantee of performance

We want to predict the expected value of multiple $k$-local observables based on shadows using the two algorithms below.

Input: A set of informationally complete POVMs $\{M_a\}$ with $m$ outcomes, $\rho^{\otimes N}$ (N copies of the unknown density matrix)
Compute the measurement channel $\mathcal{M}$ and its inverse $\mathcal{M}^{-1}$ for the chosen informationally complete POVM (see Sec. A.2);
for $i = 1, \ldots, N$ do
     Perform measurements using the POVM elements to get outcomes $(a_1, \ldots, a_n)$;
     Construct the shadow $\hat\rho_i$ (see Sec. III.2 and Appendix A for the general version)
end for
Output: the array of shadows $\{\hat\rho_1, \ldots, \hat\rho_N\}$
Algorithm 1 Generating Shadows with POVMs

Input: A POVM set, N copies of the unknown density matrix $\rho$, L different $k$-local Pauli observables $O_1, \ldots, O_L$, and error parameters $\epsilon, \delta$
Find bounds on the local observables (see Sec. A.3 for details);
Using Algorithm 1, collect shadows;
Compute the means $\hat o_l = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Tr}(O_l\hat\rho_i)$
Output: the estimates $\hat o_1, \ldots, \hat o_L$
Algorithm 2 Predicting many properties using mean as an estimate

The existence of the bound is guaranteed by the following theorem.

Theorem 1.

With $N = O\big(B^2 \log(L/\delta)/\epsilon^2\big)$ samples of $\rho$, we can predict $L$ different linear target functions up to additive error $\epsilon$ with maximum failure probability $\delta$.

The constant bound $B$ will depend on the measurement channel (which depends on the choice of POVM) and on the operator set. The important thing is that $B$ is bounded for so-called $k$-local operators, as defined in [Huang_2020].

For instance, if we choose Pauli-6, the single-shadow estimates for a $k$-local Pauli observable are bounded as $-3^k \le \mathrm{Tr}(O\hat\rho) \le 3^k$, in which case $B$ grows as $3^k$ (see Sec. A.3). See Sec. III.2 for the construction of the measurement channel. The details of the proof are provided in the appendix section on sample complexity (Sec. A.3).

IV Numerical Results

For many quantum systems in Condensed Matter Physics, one of the objects of interest is the two-point correlation function. Two-point correlators can be efficiently estimated using classical shadows based on the Pauli-6 POVM. The predictions of two-point functions for GHZ states with varying degrees of noise are shown in Fig. 2.

Figure 2: Prediction of two-point correlations for noisy GHZ target states using classical shadows for Pauli-6, with a 1-standard deviation band. The standard deviations are estimated over ten independent runs, each with a fixed number of samples. The noise parameter, representing the local depolarizing noise strength, is described in Eq. (14).

We can write the action of the single-qubit depolarizing noise [Nielsen] on an arbitrary $\rho$ written in the Bloch sphere representation:

(14)

Applying this channel to every qubit, we generate a noisy GHZ state [greenberger2007going] from a pure one. The expected two-point correlations vary with the noise parameter.
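
The dependence on the noise strength can be illustrated with a small exact calculation. The sketch below assumes the convention $\rho \mapsto (1-p)\rho + p\,I/2$ for the single-qubit depolarizing channel, which may differ from the parametrization in Eq. (14); under this assumption each Bloch vector component shrinks by $(1-p)$, so $\langle\sigma_z^{(i)}\sigma_z^{(j)}\rangle$ for a GHZ state becomes $(1-p)^2$. All names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def embed(op, qubit, n):
    """Single-qubit operator acting on `qubit` of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for j in range(n):
        out = np.kron(out, op if j == qubit else I2)
    return out


def ghz_density(n):
    """Density matrix of the pure n-qubit GHZ state."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = psi[-1] = 1 / np.sqrt(2)
    return np.outer(psi, psi.conj())


def depolarize_all(rho, n, p):
    """Apply rho -> (1 - p) rho + p I/2 on every qubit (assumed convention).

    Uses the identity (rho + X rho X + Y rho Y + Z rho Z) / 4 = I/2 (x) Tr_q(rho).
    """
    for q in range(n):
        paulis = [embed(P, q, n) for P in (X, Y, Z)]
        twirl = sum(P @ rho @ P.conj().T for P in paulis)
        rho = (1 - 3 * p / 4) * rho + (p / 4) * twirl
    return rho


def zz_correlation(rho, i, j, n):
    """Exact two-point function <Z_i Z_j>."""
    return float(np.real(np.trace(embed(Z, i, n) @ embed(Z, j, n) @ rho)))
```
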
While predicting multiple two-point or $k$-point correlations, we monitor the maximum error among all the observables. This measure of error is expected to go down with an increasing number of samples. This scaling, as seen in Fig. 3, gives us some idea of the appropriateness of a POVM set for a particular task.

(a) GHZ
(b) Spin down
Figure 3: Maximum error in two-point correlators. (a) Scaling of the maximum error among all two-point correlations in a 30-qubit pure GHZ state, plotted against the number of samples for different choices of POVM: Pauli-6, Pauli-4 and tetrahedral. (b) Scaling of the maximum error for the all-spin-down state with the Pauli-4 and Pauli-6 POVMs. Pauli-4 ensures a much better scaling. See Sec. A.3 for details.

IV.0.1 1D Transverse Field Ising Model

We take the antiferromagnetic transverse field Ising model in 1D, with the sign of the coupling in Eq. (15) chosen accordingly:

(15)

The quantum critical point will be exhibited by the power-law decay of the correlations. See Fig. 4 for results in the three regimes: critical, ordered and paramagnetic. The exact numerical correlations are plotted using the matrix product representations of the ground states [Orus_2014]. In [Carrasquilla_2019] and [Luchnikov_2019], POVM-based measurements, followed by a neural-network-centric approach for constructing the ground state and computing the resulting two-point correlations, were presented for the same system.

Figure 4: Two-point functions for ground states of the antiferromagnetic 1D transverse field Ising model (TFIM) using Pauli-6 POVM based shadows and the true value, as computed using matrix product states. The correlations are plotted against the lattice separation. The lattice size is 30 and the number of samples used is 5000. (a) Critical antiferromagnetic 1D TFIM, showing signatures of power-law correlation. (b) Ordered state, where correlations saturate with increasing lattice separation. (c) The paramagnetic state, displaying exponential decay of the correlations.

IV.0.2 1D Disordered Heisenberg Model

The Hamiltonian for the 1D disordered Heisenberg model is given by,

(16)

The properties of spin-1/2 antiferromagnetic chains with various types of random exchange coupling have been studied in exact decimation renormalization-group (strong-disorder) schemes, some of which involve generalizations or modifications of the scheme introduced by Dasgupta and Ma [1980PhRvB..22.1305D]. The numerical studies done by R.N. Bhatt and P.A. Lee [Bhatt_Lee] indicate that the system could be in a random-singlet phase. In such a phase, each spin is paired with another spin that may be far away on the lattice. We perform exact diagonalization, obtain the ground state and then compute two-point quantum correlations. The 2D plot of the correlation matrix also informs us about the locations of the singlet formations in the chain. We can also reconstruct this behavior of the ground state corresponding to one particular disorder realization of the XXZ-Heisenberg model, Eq. (16), with a sufficient number of shadows. See Fig. 5, where the singlet formations are indicated by the schematics drawn on the axes of the matrix visualization plots and the results from the two methods are compared.

(a) Exact Diagonalization
(b) Reconstruction with Shadows
Figure 5: Two-point functions for ground states of disordered 1D Heisenberg spin chain with length=10 and open boundary conditions. (a) Exact diagonalization results. (b) Results from using Pauli-6 POVM based shadows using 5000 samples.

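IV.1 Exploring quantum fidelity

In our approach to constructing shadows using local POVMs, we ensure the prediction of local observables. However, we can also explore non-local observables such as fidelity. Using the sample mean as an estimator, we can construct a hypothesis state ($\hat\rho_N$):

$$\hat\rho_N = \frac{1}{N}\sum_{i=1}^{N}\hat\rho_i \qquad (17)$$

When our target state is pure, we can rewrite quantum fidelity as a linear prediction with our target observable given as $O = |\psi\rangle\langle\psi|$. Starting from the definition of quantum fidelity, i.e. $F(\rho, \sigma) = \big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\big)^2$, and using $\sqrt{\rho} = \rho = |\psi\rangle\langle\psi|$, we get $F = \big(\mathrm{Tr}\sqrt{|\psi\rangle\langle\psi|\sigma|\psi\rangle\langle\psi|}\big)^2$. Further simplification of the fidelity gives us $F = \langle\psi|\sigma|\psi\rangle$, which we evaluate with $\sigma = \hat\rho_N$.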

Figure 6: Quantum fidelity predicted for the pure GHZ state using the sample mean of shadows constructed on a fixed number of samples. The shaded regions are the standard deviation over ten independent runs. The inset shows the scaling of the variance of fidelity, which grows exponentially with the number of qubits.

The measure $\langle\psi|\hat\rho_N|\psi\rangle$ is equivalent to fidelity only when the latter is defined, i.e. when $\hat\rho_N \ge 0$. That property is likely to hold only when the number of samples is large. We expect the estimate to fluctuate around its mean value 1, as seen in Fig. 6, even when the typical $\hat\rho_N$ is not a physical state, meaning it is not positive semidefinite. Also, the fluctuation around this mean keeps growing exponentially with the number of qubits (see Fig. 6). This growth cannot be dealt with even by the median of means (MoM) procedure [Huang_2020] within the shadow formalism. Numerical computations using MoM also show no advantage over sample means here.

Hence, we need a procedure to find the ‘closest’ physical state to $\hat\rho_N$. The unit-trace condition ensures that once $\hat\rho_N$ has negative eigenvalues, the positive eigenvalues must sum to more than 1 to compensate for them. Thus, we cannot just throw away the negative eigenvalues, as would be done for projecting a Hermitian matrix onto the space of positive semidefinite matrices.

We define the convex set of physical states to be $\mathcal{C} = \{\rho : \rho \ge 0,\ \mathrm{Tr}\,\rho = 1\}$. Our nonlinear projection to $\mathcal{C}$ is

(18)

We achieve this by diagonalizing $\hat\rho_N$ and projecting its eigenvalues onto the canonical simplex $\Delta_d$, using the recipe from Ref. [proj_wang], while leaving the eigenvectors untouched. Here, $d = 2^n$, where $n$ is the total number of qubits. The projected state is a biased estimator. We can hope that the price paid by accepting some bias comes with the benefit of reduced variance. This expectation seems to be borne out in Fig. 7. However, as the number of qubits increases, the bias itself reduces fidelity. To compensate for this effect, we need larger sample sizes ($N$). Fig. 7 shows all these trends.
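
The projection step can be sketched as follows. The eigenvalue projection uses the standard sort-and-threshold algorithm for the Euclidean projection onto the probability simplex, which is our reading of the recipe in Ref. [proj_wang]; the function names are illustrative assumptions.

```python
import numpy as np


def project_to_simplex(y):
    """Euclidean projection of a real vector onto {x : x >= 0, sum(x) = 1}."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                 # eigenvalues in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, len(y) + 1)
    cond = u + (1 - css) / j > 0
    k = j[cond][-1]                      # largest index satisfying the condition
    shift = (1 - css[k - 1]) / k
    return np.maximum(y + shift, 0)


def project_to_density_matrix(h):
    """Project a Hermitian matrix onto the set of density matrices.

    Eigenvectors are kept; only the spectrum is moved onto the simplex.
    """
    h = (h + h.conj().T) / 2             # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(h)
    new_vals = project_to_simplex(vals)
    return (vecs * new_vals) @ vecs.conj().T
```

As a small usage example, the single-qubit Pauli-6 shadow $3|0\rangle\langle 0| - I = \mathrm{diag}(2, -1)$ has spectrum $(2, -1)$, which is mapped to $(1, 0)$, i.e. the physical state $|0\rangle\langle 0|$.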

Figure 7: Quantum fidelity of the projected shadows (projected onto the set of physical, positive semidefinite states) with the noiseless GHZ state. As we increase the number of samples, the quantum fidelity improves. The shaded regions indicate 1-standard deviation bands, estimated over ten independent runs.

V Discussions

We provide an approach to predict expectations of local observables without having to apply random unitary transformations, which sometimes require complex circuits of their own and can become a practical bottleneck. We show that this can instead be done using an informationally complete set of POVMs. As illustrations, we show faithful reconstruction of properties of low-energy states coming from different many-body Hamiltonians relevant to near-term applications of quantum devices. When we have additional information about the possible noise channels, we also adapt the shadow channel as a composition of the noise channel and the measurement channel. The invertibility becomes straightforward in the proposed framework. We also comment on why the mean as an estimator is sufficient throughout our discussion. As long as we are dealing with local observables, we can provide efficient sample complexity using Hoeffding’s inequality directly.

We provided instances where the choice of POVM impacts the sample complexity for predicting 2-point correlators in certain quantum states for a fixed maximum error. We noted that different POVMs work better for different states. It is an exciting endeavour to understand which sets of POVMs would be ideal for different classes of quantum states and observables.

As an exploration, we attempt to reconstruct fidelity using the locally built shadows and show that we cannot benefit from the median of means as an estimator, since the variance of fidelity becomes exponential in the number of qubits. Additionally, when presented with few samples, we raise the issue of the estimate being unphysical, i.e. not positive semidefinite, and then provide a projection technique, similar to [Struchalin_2021], to estimate fidelity. Unfortunately, the estimator no longer remains unbiased. Addressing this issue would require methods to deal with non-local observables.

We did not provide an effective analog of the global Clifford unitary transformation-based method in [Huang_2020]. There has been work which provides description of global alternatives using stabilizer states [Struchalin_2021]. Whether there can be a scheme based on such states that is competitive with the classical shadows method [Huang_2020] remains to be seen.

The use of generalized measurement to unambiguously discriminate non-orthogonal states with lower failure probability is well known [Barnett:09, Nielsen, Chefles_2000]. Efficient prediction of expectations of local observables combined with the generalized measurement scheme to obtain the shadows can be used as an optimal framework in the discrimination of non-orthogonal states. In the future, it is a promising direction of exploration.

Acknowledgement

We would like to thank Shagesh Sridharan, James Stokes, and Miles Stoudenmire for insightful discussions.

Appendix A Appendix

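A.1 The Measurement Channel for Pauli-6

We can take the simple rank-1 Pauli-6 POVMs to see the action of a measurement channel:

$$\mathcal{M}(\rho) = \sum_a \mathrm{Tr}(M_a \rho)\,|\psi_a\rangle\langle\psi_a| \qquad (19)$$

where we use the Bloch representation $\rho = \tfrac12(I + r_x\sigma_x + r_y\sigma_y + r_z\sigma_z)$.

The first two POVM elements of Pauli-6, $\tfrac13|0\rangle\langle 0|$ and $\tfrac13|1\rangle\langle 1|$, only get contributions from $I$ and $\sigma_z$, generating

$$\frac13\Big[\frac{1 + r_z}{2}|0\rangle\langle 0| + \frac{1 - r_z}{2}|1\rangle\langle 1|\Big].$$

Using $|0\rangle\langle 0| = \tfrac12(I + \sigma_z)$ and $|1\rangle\langle 1| = \tfrac12(I - \sigma_z)$, this expression becomes $\tfrac16(I + r_z\sigma_z)$. Following similar steps for the pairs $\{|+\rangle, |-\rangle\}$ and $\{|r\rangle, |l\rangle\}$, we get that

$$\mathcal{M}(\rho) = \frac16\big(3I + r_x\sigma_x + r_y\sigma_y + r_z\sigma_z\big) = \frac13(\rho + I),$$

making $\mathcal{M}$ a depolarizing channel.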

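A.2 Inverse of the measurement channel

Given any single-qubit channel, the inverse can be easily computed using the Bloch-sphere representation. We can write any single-qubit state as $\rho = \tfrac12(I + \vec r\cdot\vec\sigma)$. An arbitrary trace-preserving quantum operation $\mathcal{E}$ maps it to another unit-trace operator $\mathcal{E}(\rho) = \tfrac12(I + \vec r\,'\cdot\vec\sigma)$. The map is equivalent to

$$\vec r \mapsto \vec r\,' = T\vec r + \vec c \qquad (20)$$

The components of the displacement ($\vec c$) are given as $c_j = \tfrac12\mathrm{Tr}\big(\sigma_j\,\mathcal{E}(I)\big)$. The affine map between the Bloch sphere and itself is governed by the real $3 \times 3$ matrix $T_{jk} = \tfrac12\mathrm{Tr}\big(\sigma_j\,\mathcal{E}(\sigma_k)\big)$, and its meaning is understood better by doing a singular value decomposition, i.e. $T = U D V^{T}$, where $U$ and $V$ are orthogonal matrices. The singular values capture the deformation of the Bloch sphere about its principal axes. A superoperator acting on the vector $(1, \vec r)$ can be defined as

$$\begin{pmatrix} 1 & 0 \\ \vec c & T \end{pmatrix} \qquad (21)$$

Computing the inverse of the channel is equivalent to writing $\vec r$ in terms of $\vec r\,'$, i.e. computing $\vec r = T^{-1}(\vec r\,' - \vec c)$.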
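
The Bloch-sphere inversion can be sketched in a few lines. The code below extracts $T$ and $\vec c$ from any single-qubit channel supplied as a Python function and inverts the affine map on a unit-trace input. As a test case it uses the standard textbook Kraus form of amplitude damping with parameter $\gamma$, which is an assumption here and may differ in convention from Eq. (10); all function names are illustrative.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def affine_map(channel):
    """Return (T, c) with r' = T r + c for a trace-preserving qubit channel."""
    c = np.array([0.5 * np.real(np.trace(s @ channel(np.eye(2)))) for s in PAULIS])
    T = np.array(
        [[0.5 * np.real(np.trace(sj @ channel(sk))) for sk in PAULIS] for sj in PAULIS]
    )
    return T, c


def bloch_vector(rho):
    return np.array([np.real(np.trace(s @ rho)) for s in PAULIS])


def from_bloch(r):
    return 0.5 * (np.eye(2) + sum(ri * s for ri, s in zip(r, PAULIS)))


def invert_channel(channel, rho_out):
    """Undo the channel on a unit-trace Hermitian matrix: r = T^{-1}(r' - c)."""
    T, c = affine_map(channel)
    r_in = np.linalg.solve(T, bloch_vector(rho_out) - c)
    return from_bloch(r_in)


def amplitude_damping(gamma):
    """Assumed textbook Kraus form: K0 = diag(1, sqrt(1-g)), K1 = sqrt(g)|0><1|."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return lambda x: k0 @ x @ k0.conj().T + k1 @ x @ k1.conj().T
```

For amplitude damping this yields $T = \mathrm{diag}(\sqrt{1-\gamma}, \sqrt{1-\gamma}, 1-\gamma)$ and $\vec c = (0, 0, \gamma)$, so the inverse exists for any $\gamma < 1$.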

In the main text, the Pauli-6 measurement channel ($\mathcal{M}$) turns out to be a depolarizing channel, and its inverse, acting on a single qubit, is given as $\mathcal{M}^{-1}(X) = 3X - \mathrm{Tr}(X)\,I$.

We take a more general example following our definition of a measurement channel:

If a particular POVM element is not rank one, the snapshot can be taken as the eigenvector corresponding to the highest eigenvalue of that element. For Pauli-4, all elements except the fourth are rank-1. Since the fourth element is rank-2, when it is the outcome we take the eigenvector corresponding to its larger eigenvalue instead of the one corresponding to its smaller eigenvalue. Rewriting Eq. (A.2), we get

(22)

The inverse of the channel can be written as

(23)

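When we are working with a known noise channel $\mathcal{E}$ applied before the measurement, the inverse of the combined channel $\mathcal{M}\circ\mathcal{E}$ is given as $\mathcal{E}^{-1}\circ\mathcal{M}^{-1}$. If we choose an amplitude damping channel with a damping parameter $\gamma$, the inverse can be given as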

(24)

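A.3 Sample complexity

A.3.1 Variance of the Estimate for a Single Observable

Given an array of $N$ independent, classical snapshots (each defined as in Eq. (7)):

$$\mathsf{S}(\rho; N) = \{\hat\rho_1, \ldots, \hat\rho_N\} \qquad (25)$$

The sample mean is $\hat o = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Tr}(O\hat\rho_i)$. The bound on the probability of deviation of the sample mean is given by Chebyshev’s inequality:

$$\Pr\big(|\hat o - \mathrm{Tr}(O\rho)| \ge \epsilon\big) \le \frac{\mathrm{Var}\big[\mathrm{Tr}(O\hat\rho_i)\big]}{N\epsilon^2} \qquad (26)$$

where $\rho$ is the true density matrix. Fluctuations of $\hat o$ around this desired expectation are controlled by the variance. However, since the classical shadows are unit trace by construction, the variance depends only on the traceless part of the observable, i.e. $O_0 = O - \frac{\mathrm{Tr}(O)}{2^n} I$. The minimum number of samples needed to assure a maximum failure probability ($\delta$) using Eq. (26) is

$$N \ge \frac{\mathrm{Var}\big[\mathrm{Tr}(O\hat\rho_i)\big]}{\epsilon^2\delta} \qquad (27)$$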

A.3.2 Dependence on POVM

Given a measurement channel and an observable, we can bound the variance of its estimator, using familiar maneuvers with superoperators [Huang_2020],

We broadly define a $k$-local Pauli observable as an operator which acts nontrivially only on $k$ qubits. Traceless local operators can be expressed as linear combinations of tensor products of identity matrices and $k$ or fewer Pauli matrices. Hence, we need to focus only on a special class of $k$-local operators. Denoting by $\sigma^{(j)}$ one of the Pauli matrices acting on the $j$th qubit, we focus on tensor products like $\sigma^{(1)} \otimes \cdots \otimes \sigma^{(k)} \otimes I \otimes \cdots \otimes I$, where, without loss of generality, we assume that the operator acts non-trivially on only the first $k$ qubits.

For Pauli-6 POVM, the inverse of the measurement channel is a self-adjoint map, and thus one can verify its action as:

where denotes a Pauli matrix and . Given a -local observable, we can further compute the bound on variance:

Now, we take up Pauli-4 POVM. One can verify the action of as:

where denotes a Pauli matrix. Using the fact that is a trace preserving map, one can say its adjoint has to be unital . Given a -local Pauli-observable, one can again compute the bound on variance:

Clearly, the above bound on variance is dependent on the state , unlike the bound we obtained using Pauli-6 POVM. Since is a density matrix and the operator is a PSD operator, one gets the minimum value for the bound when is of the form:

(28)

where is the projector into the eigenvector corresponding to the lowest eigenvalue of the operator , and is a valid density matrix in the Hilbert space of qubits on which the -local Pauli-observable acts trivially. For the above , it is simple to verify that the value of the variance bound is 1 (independent of ). Thus, for example, if the unknown state is the all spin down state, then Pauli-4 POVM works better than Pauli-6 POVM in predicting two-point correlators , since the variance is higher in the latter.

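A.3.3 Improved Bound Using Hoeffding’s Inequality

Furthermore, we can use Hoeffding’s inequality to provide theoretical bounds when we are dealing with a $k$-local Pauli observable, since we are working with bounded random variables. If $|X_i| \le B$ for all $i$, where $X_i = \mathrm{Tr}(O\hat\rho_i)$, we can write,

$$\Pr\big(|\hat o - \mathrm{Tr}(O\rho)| \ge \epsilon\big) \le 2\exp\Big(-\frac{N\epsilon^2}{2B^2}\Big) \qquad (29)$$

The minimum number of samples needed to assure a maximum failure probability ($\delta$) using Eq. (29) is

$$N \ge \frac{2B^2}{\epsilon^2}\log\frac{2}{\delta} \qquad (30)$$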

The bound $B$ depends on the locality of the observable and the maximum eigenvalue of the inverse channel acting on the observable. The bound on the random variable can be found as the range of the Rayleigh quotient of the inverse of the measurement channel, acting on the observable, over all possible states. For instance, if we choose Pauli-6, the values for a $k$-local Pauli observable can be shown to lie within $[-3^k, 3^k]$. Using the action of the inverse channel, one can verify that for Pauli-4 the value of the random variable belongs to a finite set for any Pauli matrix. Thus, the random variable is contained in a range which is exponential in the locality $k$ rather than in the number of qubits.

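A.3.4 The Guarantee of Performance for Multiple Observables

Suppose we have $L$ different $k$-local Pauli observables, with the sample mean corresponding to the observable $O_l$ defined as $\hat o_l = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Tr}(O_l\hat\rho_i)$. If $|\mathrm{Tr}(O_l\hat\rho_i)| \le B$ for all $i$ and $l$, we can combine the union bound with Hoeffding’s inequality to write

$$\Pr\Big(\max_{l}|\hat o_l - \mathrm{Tr}(O_l\rho)| \ge \epsilon\Big) \le 2L\exp\Big(-\frac{N\epsilon^2}{2B^2}\Big) \qquad (31)$$

The minimum number of samples needed to assure a maximum failure probability ($\delta$) among all the observables using Eq. (31) is

$$N \ge \frac{2B^2}{\epsilon^2}\log\frac{2L}{\delta} \qquad (32)$$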

The scaling is logarithmic in the number of observables $L$, instead of the linear behavior we get using Chebyshev’s inequality. We do not need to use the MoM procedure [Huang_2020], which would have been necessary if we were dealing with estimate distributions with long tails (unlike the bounded estimates for $k$-local Pauli observables).
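
The difference between the two scalings can be made concrete with a small calculator. It uses the textbook forms of the two inequalities combined with a union bound: Hoeffding for variables bounded by $|X_i| \le B$, giving $N \ge 2B^2\log(2L/\delta)/\epsilon^2$, and Chebyshev, giving $N \ge L\,\mathrm{Var}/(\epsilon^2\delta)$. The constants are those of the standard inequalities and may differ from the paper's; the function names are illustrative.

```python
import math


def hoeffding_samples(bound, eps, delta, n_observables=1):
    """Samples so that all n_observables means are eps-accurate w.p. >= 1 - delta.

    Assumes each single-shadow estimate lies in [-bound, bound].
    """
    return math.ceil(2 * bound**2 * math.log(2 * n_observables / delta) / eps**2)


def chebyshev_samples(variance, eps, delta, n_observables=1):
    """Same guarantee from Chebyshev's inequality plus a union bound."""
    return math.ceil(n_observables * variance / (eps**2 * delta))


def pauli6_bound(k):
    """Magnitude bound 3**k on Tr(O rho_hat) for a k-local Pauli observable."""
    return 3**k
```

For example, with two-point correlators ($k = 2$), going from 100 to 10,000 observables multiplies the Chebyshev count by 100, while the Hoeffding count grows by less than a factor of 2.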

References