# Adaptive experimental design for one-qubit state estimation with finite data based on a statistical update criterion

We consider 1-qubit mixed quantum state estimation by adaptively updating measurements according to previously obtained outcomes and measurement settings. Updates are determined by the average-variance-optimality (A-optimality) criterion, known in the classical theory of experimental design and applied here to quantum state estimation. In general, A-optimization is a nonlinear minimization problem; however, we find an analytic solution for 1-qubit state estimation using projective measurements, reducing computational effort. We compare numerically two adaptive and two nonadaptive schemes for finite data sets and show that the A-optimality criterion gives more precise estimates than standard quantum tomography.

## I Introduction

For successful experimental implementation of any quantum protocol, the quantum states and operations involved must be confirmed to be sufficiently close to their theoretical targets. One way to obtain such a confirmation is to perform another experiment and from the obtained data make an estimate of the quantum operator involved. Statistically, this is a constrained multi-parameter estimation problem (the quantum estimation problem), where we assume we are given a finite number of identical copies of a quantum state or operation, we perform measurements whose mathematical description is assumed to be known, and from the outcome statistics we make our estimate. Due to the probabilistic behavior of the measurement outcomes and the finiteness of the number of measurement trials, there always exist statistical errors in any quantum estimate. The size of the error depends on the choice of measurements and the estimation procedure. In statistics, the former is called an experimental design, while the latter is called an estimator. It is, therefore, a key aim of both classical and quantum estimation theory to find a combination of experimental design and estimator which gives us more precise estimation results using fewer measurement trials.

A standard combination in quantum information experiments is that of quantum tomography and a maximum likelihood estimator. Although the term “quantum tomography” can be used in several different contexts, we use it to mean an experimental design in which an independently and identically prepared set of measurements is used throughout the entire experiment Paris and Řeháček (2004). The performance of different choices for the set of tomographic measurements has been studied in, for example, de Burgh et al. (2008); Nunn et al. (2010). This of course raises the question of the performance of adaptive experimental designs, in which the measurements performed from trial to trial are not independent, and are chosen according to previous measurement settings and the outcomes obtained. Clearly, adaptive experimental designs are a superset of the nonadaptive ones, and as such can potentially achieve higher performance.

Adaptive designs are characterized by the way in which measurements are related from trial to trial, referred to as an update criterion. Previously proposed update criteria include those based on asymptotic statistical estimation theory (Fisher information) Nagaoka (1988, 2005); Fujiwara (2006), direct calculations of the estimates expected to be obtained in the next measurement Fischer et al. (2000); Happ and Freyberger (2008), mutually unbiased bases Happ and Freyberger (2011), as well as Bayesian estimators and Shannon entropy Fischer et al. (2000); Fischer and Freyberger (2000); Huszár and Houlsby (2011). Theoretical investigations report that some of the proposed update criteria give more precise estimates than nonadaptive quantum tomography, and an experimental implementation of the update criterion proposed in Fischer et al. (2000) in an ion trap system has been performed Hannemann et al. (2002). If $N$ denotes the number of measurement trials and $N$ is sufficiently large, it is known in 1-qubit state estimation that the expectation value of infidelity averaged over states, a measure of the estimation error, can decrease at best as $O(N^{-3/4})$ in a nonadaptive experiment Bagan et al. (2004), compared to $O(N^{-1})$ in adaptive experiments Bagan et al. (2006a). Most of the proposed update criteria, however, have high computational cost that makes real experiments infeasible. In this paper, we propose an adaptive experimental design whose average expected infidelity decreases as $O(N^{-1})$ and whose update criterion, known as average-variance optimality (A-optimality) in classical statistics, has low computational cost for 1-qubit state estimation.

The paper is structured as follows. In Sec. II we lay out the notation and terminology used throughout this paper by explaining basic concepts in adaptive experimental design, statistical parameter estimation, and A-optimality criteria. We also give a brief review of some of the update criteria proposed in the literature. In Sec. III we give the explicit form of the analytic solution of the A-optimal update criterion (the derivation is given in the Appendix). This analytic solution reduces the computational cost of updating measurements, and using it we compare several estimation schemes numerically, showing that our proposal is more precise than standard quantum tomography. In Sec. IV, we discuss the feasibility of implementing the proposed scheme experimentally. A summary appears in Sec. V.

## II Preliminaries

### II.1 Notation and terminology

We will adopt terms from the statistical literature, since they afford us the precision we need to properly discuss details of estimation schemes that can sometimes be subtle. In this subsection we will introduce a formalism for quantum estimation using that terminology, and apply it in a survey of several existing update criteria in Sec. II.5.

#### II.1.1 Model selection

In statistical estimation theory, a statistical model is defined as a set of probability distributions, and we assume that the true probability distribution of interest is included in the set. In the quantum case, a probability distribution is determined by the state of the system and the action of the measurement on the system. Let $\mathcal{H}$ be a Hilbert space with finite dimension $d$ and $\mathcal{S}(\mathcal{H})$ be the set of all density matrices acting on that Hilbert space. Suppose we know that the object we are trying to estimate lies in a subset $\mathcal{O} \subseteq \mathcal{S}(\mathcal{H})$, that is, the true density matrix $\rho$ is included in $\mathcal{O}$. For example, when we know that the true state is pure, $\mathcal{O}$ is the set of all pure states. In this paper, we consider mixed state estimation, and we assume that in our finite measurement trials we prepare $N$ identical copies of an unknown state $\rho \in \mathcal{O}$.

#### II.1.2 Experimental design

A probability distribution of outcomes in quantum measurement requires not only a density matrix, but also a positive operator valued measure (POVM), $\Pi = \{\Pi_x\}_{x \in \mathcal{X}}$, where $\mathcal{X}$ is the set of outcomes. When the measurement is characterized by a POVM $\Pi$ and the measured quantum state is characterized by a density matrix $\rho$, the probability distribution of the outcomes is given by Born’s rule $p(x; \Pi | \rho) = \mathrm{Tr}[\Pi_x \rho]$, where $\mathrm{Tr}$ denotes the trace operation with respect to $\mathcal{H}$ (note that in the next subsection a different trace operation, represented as $\mathrm{tr}$, is introduced).

We consider sequential measurements, as opposed to collective measurements, on $N$ copies of $\rho$. We will index measurement trials using subscripts $n$, and sequences using superscripts. Thus, for some symbol $A$, $A_n$ is its value taken at the $n$-th trial, while $A^n$ is the sequence $\{A_1, \ldots, A_n\}$. We will also try to use calligraphic fonts for supersets. Adaptivity in our sense means that the POVM performed at the $n$-th trial can depend on the outcomes and POVMs of all previous trials.

The measurement class $\mathcal{M}_n$ is the set of POVMs which are available at the $n$-th trial. We choose the $n$-th POVM, $\Pi_n = \{\Pi_{n,x}\}_{x \in \mathcal{X}_n}$, from $\mathcal{M}_n$, where $\mathcal{X}_n$ denotes the set of measurement outcomes for the $n$-th trial. When it is independent of the trial, as is usually the case, we omit the index, using $\mathcal{M}$ for the measurement class and $\mathcal{X}$ for the outcome set. Let $x^n = \{x_1, \ldots, x_n\}$ denote the sequence of outcomes obtained up to the $n$-th trial, where $x_i \in \mathcal{X}_i$. We will denote the pair of measurement performed and outcome obtained by $D_n = (\Pi_n, x_n)$, and refer to it as the data for trial $n$. The sequence of data up to trial $n$ is thus $D^n = \{D_1, \ldots, D_n\}$. After the $n$-th measurement, we choose the next, $(n+1)$-th, POVM according to the previously obtained data. Let $u_n$ denote the map from the data to the next measurement, that is, $u_n : \mathcal{D}^{n-1} \to \mathcal{M}_n$, $\Pi_n = u_n(D^{n-1})$. We call $u_n$ the measurement update criterion for the $n$-th trial and $u^N = \{u_1, \ldots, u_N\}$ the measurement update rule. Note that $u_1$ is a map from the empty data set $\emptyset$ to $\mathcal{M}_1$ and corresponds to the choice of the first measurement.

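#### II.1.3 Estimator

An estimator $\rho^{\mathrm{est}} = \{\rho^{\mathrm{est}}_1, \ldots, \rho^{\mathrm{est}}_N\}$ is a set of maps from the data to the model space, so that $\rho^{\mathrm{est}}_n : \mathcal{D}^n \to \mathcal{O}$. The estimated density matrix $\rho^{\mathrm{est}}_n(D^n)$ is called the $n$-th estimate. We will often omit the data dependency. In this paper we use a maximum likelihood estimator defined as

$$\rho^{\mathrm{ML}}_n := \operatorname*{argmax}_{\sigma \in \mathcal{O}} \, p(D^n | \sigma), \tag{1}$$

where

$$p(D^n | \sigma) := \mathrm{Tr}\left[ \left( \Pi_{1,x_1} \otimes \Pi_{2,x_2} \otimes \cdots \otimes \Pi_{n,x_n} \right) \sigma^{\otimes n} \right]. \tag{2}$$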

A quintuplet specifies an estimation scheme. A sketch of the procedure for a generic adaptive quantum estimation scheme is given in Fig. 1.
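The generic loop described above (choose a measurement from the data so far, measure, record the data, estimate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the measurement is the 1-qubit projective measurement along a unit vector used later in the paper, and the estimator shown is a simple per-axis mean standing in for the maximum likelihood estimator of Eq. (1).

```python
import numpy as np

def measure(a, s_true, rng):
    """Simulate one projective measurement along unit vector a; returns +1 or -1."""
    p_plus = 0.5 * (1.0 + float(np.dot(a, s_true)))
    return 1 if rng.random() < p_plus else -1

def run_scheme(s_true, n_trials, update_rule, estimator, rng):
    """Generic loop: Pi_{n+1} = u_{n+1}(D^n), then record D_{n+1} = (Pi_{n+1}, x_{n+1})."""
    data = []  # D^n stored as a list of (direction, outcome) pairs
    for _ in range(n_trials):
        a = update_rule(data)          # the update criterion sees all previous data
        x = measure(a, s_true, rng)
        data.append((a, x))
    return data, estimator(data)

def xyz_rule(data):
    """Nonadaptive example rule: ignores the outcomes and cycles through x, y, z."""
    return np.eye(3)[len(data) % 3]

def axis_mean_estimator(data):
    """Placeholder estimator (not the MLE): mean outcome per coordinate axis."""
    est = np.zeros(3)
    for j in range(3):
        xs = [x for a, x in data if a[j] == 1.0]
        if xs:
            est[j] = np.mean(xs)
    return est
```

An adaptive scheme differs only in `update_rule`, which would then actually use the outcomes stored in `data`.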

#### II.1.4 Evaluation

In order to evaluate the precision of estimates of the true density matrix, we introduce a loss function $\Delta$ (sometimes called a cost function). A loss function $\Delta$ is a map from $\mathcal{O} \times \mathcal{O}$ to $\mathbb{R}$ such that (i) $\Delta(\rho, \sigma) \ge 0$ for all $\rho, \sigma \in \mathcal{O}$ and (ii) $\Delta(\rho, \rho) = 0$ for all $\rho \in \mathcal{O}$. For example, the trace-distance and the infidelity (one minus the fidelity) are loss functions for density matrices. The outcomes of quantum measurements are random variables, and the value of the loss function between an estimate and the true density matrix is also a random variable. Thus, in order to evaluate the precision of the estimator (not the estimate) for the true density matrix, we use the statistical expectation value of the loss function, called an expected loss (sometimes called a risk function). The explicit form is given by

$$\bar{\Delta}_N(u^N, \rho^{\mathrm{est}} | \rho) := \sum_{D^N \in \mathcal{D}^N} p(D^N | \rho) \, \Delta\!\left( \rho^{\mathrm{est}}_N(D^N), \rho \right). \tag{3}$$

The value of the expected loss depends on the choice of the estimator as well as the true density matrix. The latter is of course unknown in an experiment, and there are at least two approaches to eliminate its dependence, namely the average and the maximal (or worst case) expected loss, given explicitly by

$$\bar{\Delta}^{\mathrm{ave}}_N(u^N, \rho^{\mathrm{est}}) := \int_{\rho \in \mathcal{O}} d\mu(\rho) \, \bar{\Delta}_N(u^N, \rho^{\mathrm{est}} | \rho), \tag{4}$$

$$\bar{\Delta}^{\mathrm{max}}_N(u^N, \rho^{\mathrm{est}}) := \max_{\rho \in \mathcal{O}} \bar{\Delta}_N(u^N, \rho^{\mathrm{est}} | \rho), \tag{5}$$

where $\mu$ is a probability measure on $\mathcal{O}$. The task in this paper is to find a combination of a measurement update rule and estimator with average expected loss as small as possible.

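### II.2 A generalized Cramér-Rao inequality

The A-optimality criterion is a measurement update criterion based on the asymptotic theory of statistical parameter estimation Watanabe et al. (2005); Pukelsheim (2006). In this subsection we introduce a few basic results of the asymptotic theory. First let us parametrize the state space $\mathcal{O}$. Any density matrix on a $d$-dimensional Hilbert space can be parametrized by $d^2 - 1$ real numbers, $s \in \mathbb{R}^{d^2 - 1}$, i.e. $\rho = \rho(s)$. In the $d = 2$ case, we take $\rho(s) = \frac{1}{2}(\mathbb{1} + s \cdot \sigma)$, where $\sigma = (\sigma_x, \sigma_y, \sigma_z)^T$ are the Pauli matrices, and $s$ is called the Bloch vector. The estimation of $\rho$ is equivalent to the estimation of $s$, and we let $s^{\mathrm{est}}$ denote the estimator. Estimates of a density matrix and of a Bloch vector are related as $\rho^{\mathrm{est}}_n = \rho(s^{\mathrm{est}}_n)$.

For any estimator $s^{\mathrm{est}}$, any number of measurement trials $N$, and any positive semidefinite matrix $H(s)$, the inequality

$$\sum_{D^N \in \mathcal{D}^N} p(D^N | s) \left[ s^{\mathrm{est}}_N(D^N) - s \right]^T H(s) \left[ s^{\mathrm{est}}_N(D^N) - s \right] \ge \mathrm{tr}\left[ H(s) \, G_N(u^N, s^{\mathrm{est}}, s)^T F_N(u^N, s)^{-1} G_N(u^N, s^{\mathrm{est}}, s) \right] \tag{6}$$

holds, where

$$p(D^N | s) := p(D^N | \rho(s)), \tag{7}$$

$$G_N(u^N, s^{\mathrm{est}}, s) := \nabla_s \sum_{D^N \in \mathcal{D}^N} p(D^N | s) \, s^{\mathrm{est}}_N(D^N)^T, \tag{8}$$

$$F_N(u^N, s) := \sum_{D^N \in \mathcal{D}^N} \frac{\nabla_s p(D^N | s) \, \nabla_s^T p(D^N | s)}{p(D^N | s)}, \tag{9}$$

and $\mathrm{tr}$ denotes the trace operation with respect to the parameter space. Eq. (6) is a known generalization of the Cramér-Rao inequality Rao (2002), and we give a simple proof in Appendix B. $F_N(u^N, s)$ is a positive semidefinite matrix called the Fisher matrix of the probability distribution $\{p(D^N | s)\}_{D^N \in \mathcal{D}^N}$.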

If the estimate converges to the true parameter, i.e., $s^{\mathrm{est}}_N \to s$ as $N \to \infty$ with probability 1, the LHS of Eq. (6) converges to 0 and therefore the RHS should converge to 0. In this case, if we assume the exchangeability of the limit and derivative, the matrix $G_N$ converges to the identity matrix $I$, and the quantity defined as

$$K_N(u^N, s) := \mathrm{tr}\left[ H(s) F_N(u^N, s)^{-1} \right] \tag{10}$$

converges to $0$. This can be interpreted as a lower bound of the weighted (by $H$) mean squared error when $N$ is sufficiently large. It is known that under certain regularity conditions, a maximum likelihood estimator achieves the equality of Eq. (6) asymptotically. For a given $N$, it would be wise to choose a measurement update rule which makes the value of $K_N$ as small as possible. This is the guiding principle of the A-optimality criterion.
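As a small worked illustration of Eq. (10), the sketch below evaluates $K_N$ for a nonadaptive 1-qubit design in which the $N$ trials are split evenly over the x, y, and z axes. For independent trials the Fisher matrix is additive, so $F_N$ is a sum of single-trial matrices. The single-trial matrix used here anticipates the closed form of Eq. (36); the function names and the even split are assumptions made for the example.

```python
import numpy as np

def fisher_projective(a, s):
    """Single-trial Fisher matrix for a projective measurement along unit vector a."""
    a = np.asarray(a, dtype=float)
    return np.outer(a, a) / (1.0 - np.dot(a, s) ** 2)

def crb_xyz(s, n_trials, H):
    """K_N = tr[H F_N^{-1}] with n_trials split evenly over the x, y, z axes."""
    F = sum((n_trials / 3.0) * fisher_projective(e, s) for e in np.eye(3))
    return float(np.trace(H @ np.linalg.inv(F)))

# Weight matrix of the squared Hilbert-Schmidt distance, H = I/4.
H_hs = 0.25 * np.eye(3)
print(crb_xyz(np.zeros(3), 300, H_hs))
```

For this design the bound has the closed form $\frac{3}{4N}\sum_j (1 - s_j^2)$, which equals $0.0075$ for the completely mixed state with $N = 300$.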

### II.3 A-optimality criteria

We move on to the explanation of the procedure of A-optimality. The “A” stands for “average-variance” Pukelsheim (2006). According to the asymptotic theory of statistical parameter estimation described in the previous subsection, we wish to minimize the value of $K_N$. Suppose that we have performed $n$ trials and obtained the data sequence $D^n$. We would like to choose the POVM minimizing $K_{n+1}(u^{n+1}, s)$ in $\mathcal{M}_{n+1}$ as the next, $(n+1)$-th, measurement. When we consider minimizing this function, there are two problems, and to avoid them we introduce two approximations. The first problem is that the minimized function depends on the true parameter $s$. Of course the true parameter is unknown in parameter estimation problems, and we must use an estimate in the update criterion, $\hat{s}^{\mathrm{est}}_n$, instead. The measurement update estimator $\hat{s}^{\mathrm{est}}$ is not necessarily the same as $s^{\mathrm{est}}$. The second problem is that, unlike the independent and identically distributed (i.i.d.) measurement case, calculation of the Fisher matrix in the adaptive case requires summing over an exponential amount of data, and is computationally intensive. To avoid this problem, we approximate the sum over all possible measurements by that over only those measurements that have been performed:

$$F_{n+1}(u^{n+1}, s) \approx \tilde{F}_{n+1}(u^{n+1}, s | D^n) := \sum_{i=1}^{n+1} F(\Pi_i, s), \tag{11}$$

where

$$F(\Pi_i, s) := \sum_{x_i \in \mathcal{X}_i} \frac{\nabla_s p(x_i; \Pi_i | s) \, \nabla_s^T p(x_i; \Pi_i | s)}{p(x_i; \Pi_i | s)}, \tag{12}$$

$$\Pi_i = u_i(D^{i-1}), \quad i = 1, \ldots, n+1. \tag{13}$$

The matrix $F(\Pi_i, s)$ is the Fisher matrix for the $i$-th measurement probability distribution $\{p(x_i; \Pi_i | s)\}_{x_i \in \mathcal{X}_i}$, and $\tilde{F}_{n+1}(u^{n+1}, s | D^n)$ is the sum of the Fisher matrices from the first to the $(n+1)$-th trial. Instead of minimizing $K_{n+1}(u^{n+1}, s)$, we consider the minimization of

$$\tilde{K}_{n+1}(u^{n+1}, s | D^n) := \mathrm{tr}\left[ H(s) \tilde{F}_{n+1}(u^{n+1}, s | D^n)^{-1} \right]. \tag{14}$$

It is known that the convergence of the conditional Fisher matrix $\tilde{F}_N$ to the unconditional one $F_N$ is part of a sufficient condition for the convergence of a maximum likelihood estimator Hall and Heyde (1980), and this justifies the use of this second approximation. We explain the relationship between the conditional and unconditional Fisher matrices with respect to the estimator’s convergence in Appendix C. After making these two approximations, we define the A-optimality criterion as

$$\Pi^{\mathrm{A\text{-}opt}}_{n+1} := u^{\mathrm{A\text{-}opt}}_{n+1}(D^n) = \operatorname*{argmin}_{\Pi_{n+1} \in \mathcal{M}_{n+1}} \mathrm{tr}\left[ H(\hat{s}^{\mathrm{est}}_n) \, \tilde{F}_{n+1}(u^{n+1}, \hat{s}^{\mathrm{est}}_n | D^n)^{-1} \right]. \tag{15}$$

Finding $\Pi^{\mathrm{A\text{-}opt}}_{n+1}$ is a nonlinear minimization problem with high computational cost in general. In this paper, we derive the analytic solution of Eq. (15) in the 1-qubit case, reducing the computational cost significantly.
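To make the cost of the generic problem concrete, the following sketch minimizes the objective of Eq. (15) by brute force for 1-qubit projective measurements, evaluating it on many random unit vectors. It is an illustrative baseline only, not the method of this paper: the function names, the number of candidates, and the random-search strategy are all assumptions, and the single-trial Fisher matrix is the closed form derived later in Eq. (36).

```python
import numpy as np

def fisher_projective(a, s):
    """Single-trial Fisher matrix for a projective measurement along unit vector a."""
    return np.outer(a, a) / (1.0 - np.dot(a, s) ** 2)

def a_opt_objective(a, F_prev, s_hat, H):
    """tr[H (F_prev + F(a, s_hat))^{-1}], the quantity minimized in Eq. (15)."""
    F_next = F_prev + fisher_projective(a, s_hat)
    return float(np.trace(H @ np.linalg.inv(F_next)))

def a_opt_random_search(F_prev, s_hat, H, n_candidates=5000, seed=0):
    """Approximate minimizer: best of n_candidates uniformly random directions."""
    rng = np.random.default_rng(seed)
    cands = rng.normal(size=(n_candidates, 3))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    values = [a_opt_objective(a, F_prev, s_hat, H) for a in cands]
    best = int(np.argmin(values))
    return cands[best], values[best]

# Example: much information along x and y, little along z, estimate at the origin.
a_best, k_best = a_opt_random_search(np.diag([10.0, 10.0, 1.0]),
                                     np.zeros(3), 0.25 * np.eye(3))
```

In this example the search returns a direction close to the z axis, the least-measured one. Every update costs thousands of matrix inversions here, which is the overhead an analytic solution removes.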

### II.4 Estimation setting

We consider a 1-qubit mixed state estimation problem, so that $d = 2$. We identify the Bloch parameter space with the open unit ball $\{s \in \mathbb{R}^3 : \|s\| < 1\}$, where we restrict the true state space to be strictly the interior in order to avoid the possible divergence of the Fisher matrix. Suppose that we can choose any rank-1 projective measurement in each trial. Let $\Pi(a)$ denote the POVM corresponding to the projective measurement onto the $a$-axis ($a \in \mathbb{R}^3$, $\|a\| = 1$), whose elements can be represented as

$$\Pi_{\pm}(a) = \frac{1}{2}\left( \mathbb{1} \pm a \cdot \sigma \right). \tag{16}$$

This is the Bloch parametrization of projective measurements. We identify the set of parameters $\mathcal{A} = \{a \in \mathbb{R}^3 : \|a\| = 1\}$ with the measurement class $\mathcal{M}$.

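For our loss functions, we use both the squared Hilbert-Schmidt distance and the infidelity Bagan et al. (2004):

$$\Delta^{\mathrm{HS}}(s, s') := \frac{1}{2} \mathrm{Tr}\left[ \left( \rho(s) - \rho(s') \right)^2 \right] \tag{17}$$

$$= \frac{1}{4} \| s - s' \|^2, \tag{18}$$

$$\Delta^{\mathrm{IF}}(s, s') := 1 - \mathrm{Tr}\left[ \sqrt{ \sqrt{\rho(s)} \, \rho(s') \sqrt{\rho(s)} } \right]^2 \tag{19}$$

$$= \frac{1}{2}\left( 1 - s \cdot s' - \sqrt{1 - \|s\|^2} \sqrt{1 - \|s'\|^2} \right). \tag{20}$$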
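The Bloch-vector forms in Eqs. (18) and (20) can be checked numerically against the matrix definitions in Eqs. (17) and (19). The sketch below is one way to do so; the helper names are hypothetical, and the matrix square root is computed by eigendecomposition, which assumes a Hermitian positive semidefinite input.

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def rho(s):
    """Density matrix rho(s) = (1 + s.sigma)/2 for a Bloch vector s."""
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", np.asarray(s, dtype=complex), PAULI))

def loss_hs(s, t):
    """Squared Hilbert-Schmidt distance in Bloch form, Eq. (18)."""
    d = np.asarray(s, float) - np.asarray(t, float)
    return 0.25 * float(d @ d)

def loss_if(s, t):
    """Infidelity in Bloch form, Eq. (20)."""
    s, t = np.asarray(s, float), np.asarray(t, float)
    root = np.sqrt(max(0.0, 1 - s @ s)) * np.sqrt(max(0.0, 1 - t @ t))
    return 0.5 * (1.0 - float(s @ t) - root)

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def loss_if_matrix(s, t):
    """Infidelity from the matrix definition, Eq. (19)."""
    r = psd_sqrt(rho(s))
    return 1.0 - np.trace(psd_sqrt(r @ rho(t) @ r)).real ** 2
```

In an estimation loop only `loss_hs` and `loss_if` are needed; the matrix version is there purely as a cross-check.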

We note that the Hilbert-Schmidt distance coincides with the trace distance in a 1-qubit system. The asymptotic behavior of the average expected fidelity is known in the 1-qubit state estimation case Bagan et al. (2004, 2006b, 2006a). The measure used for calculating this average is the Bures distribution. If we limit our available measurements to be sequential and independent (i.e., nonadaptive), the average expected infidelity behaves at best as $O(N^{-3/4})$ Bagan et al. (2004, 2006b). On the other hand, if we are allowed to use adaptive, separable, or collective measurements, it can behave as $O(N^{-1})$ Bagan et al. (2006a). In Bagan et al. (2004, 2006b, 2006a), the coefficient of the dominant term in the asymptotic limit is also derived.

In Sec. III.2.1, we show numerical results. A maximum likelihood estimator is used, and it is shown that the average expected infidelity of an A-optimal scheme behaves as $O(N^{-1})$, illustrating that the A-optimality criterion is indeed making use of adaptation to outperform nonadaptive schemes.

### II.5 Survey of some other update criteria

We briefly review some of the other adaptive measurement update criteria proposed in the literature, using our terminology and notation introduced in the previous subsections.

#### II.5.1 Two-step adaptation criterion

Before explaining update criteria that are performed at each and every trial, such as A-optimality, we briefly review a simpler update criterion. The two-step adaptation criterion requires the measurement update only once during a measurement sequence. We have

$$u_{n+1}(D^n) = \begin{cases} \Pi^{\mathrm{1st}} & \text{if } n < N_1, \\ \Pi^{\mathrm{2nd}}(D^{N_1}) & \text{if } n \ge N_1. \end{cases} \tag{21}$$

Thus, for all trials up to and including trial $N_1$ a fixed POVM $\Pi^{\mathrm{1st}}$ is performed, and an estimate is calculated from the obtained data. Using that data we choose a new POVM $\Pi^{\mathrm{2nd}}$ for the remaining $N - N_1$ copies. In Hayashi and Matsumoto (1998); Hayashi and Matsumoto (2005); Gill and Massar (2000); Bagan et al. (2006a), two-step adaptation criteria are used to prove mathematically an asymptotic bound for weighted mean squared errors in 1-qubit state estimation. In Řeháček et al. (2004); Petz et al. (2007), some numerical results are shown for a few two-step adaptation schemes.

#### II.5.2 N88 criterion

In Nagaoka (1988, 2005); Fujiwara (2006), an update criterion based on the Cramér-Rao inequality is proposed. The update criterion is given by

$$u_{n+1}(D^n) = \operatorname*{argmin}_{\Pi \in \mathcal{M}_{n+1}} \mathrm{tr}\left[ H(\hat{s}^{\mathrm{est}}_n) \, F(\Pi, \hat{s}^{\mathrm{est}}_n)^{-1} \right]. \tag{22}$$

The difference from the A-optimality criterion is that in Eq. (22) the Fisher information matrix used in the update does not take into account all measurements, but only the $(n+1)$-th measurement. The advantage of course is that this reduces the computational cost of updates. The disadvantage is that when $\mathcal{M}$ consists of informationally incomplete POVMs, as is the case in most experiments, the estimates cannot converge to the true state. As explained in Sec. II.4, in this paper $\mathcal{M}$ is restricted to rank-1 projective measurements, and in this setting Eq. (22) does not work well: the Fisher matrix of a single projective measurement has rank 1 and is therefore not invertible.

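#### II.5.3 FKF00 criteria

In Fischer et al. (2000), two update criteria are proposed.

1. The first criterion is based on the Shannon entropy of the estimated measurement probability distribution, and is given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( - \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi | \hat{\rho}^{\mathrm{est}}_n(D^n)) \ln p(x; \Pi | \hat{\rho}^{\mathrm{est}}_n(D^n)) \right). \tag{23}$$

2. The second criterion uses a third state estimator $\hat{\hat{\rho}}^{\mathrm{est}}$ such that

$$\left( u_{n+1}(D^n), \hat{\hat{\rho}}^{\mathrm{est}}_n(D^n) \right) = \operatorname*{argmax}_{(\Pi, \sigma) \in \mathcal{M}_{n+1} \times \mathcal{O}} \left( \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi | \hat{\rho}^{\mathrm{est}}_n(D^n)) \, \Delta\!\left( \hat{\rho}^{\mathrm{est}}_{n+1}(D^{n+1}), \sigma \right) \right). \tag{24}$$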

Numerical simulation is performed for the case where is the set of 1-qubit pure states and is the set of projective measurements, while is a biased maximum likelihood estimator, is a Bayesian estimator up to . Average (not expected) infidelity is used as the evaluation function.

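#### II.5.4 HF08 criterion

In Happ and Freyberger (2008), an update criterion given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( \int_{\mathcal{O}} d\rho \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi | \rho) \, \Delta\!\left( \hat{\rho}^{\mathrm{est}}_{n+1}(D^{n+1}), \rho \right) \right), \tag{25}$$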

is proposed. A numerical simulation is performed in Happ and Freyberger (2008), where the setting is that is the set of 1-qubit pure states, is a set of parity measurements using an ancilla system, and and are maximum likelihood estimators. The behavior of the average expected fidelity is numerically analyzed up to .

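#### II.5.5 HF11 criterion

An update criterion proposed in Happ and Freyberger (2011) is given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( - \sum_{x \in \mathcal{X}_{n+1}} \sum_{i=1}^{n} \mathrm{Tr}\left[ \Pi_{i,x_i} \Pi_x \right] \ln \mathrm{Tr}\left[ \Pi_{i,x_i} \Pi_x \right] \right), \tag{26}$$

and the estimator is defined as

$$\rho^{\mathrm{est}}_n(D^n) = \operatorname*{argmax}_{\rho \in \mathcal{O}} \mathrm{Tr}\left[ \rho \, \bar{\rho}(D^n) \right], \tag{27}$$

$$\bar{\rho}(D^n) = \frac{1}{n} \sum_{i=1}^{n} \Pi_{i,x_i}. \tag{28}$$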

In the numerical simulations, the estimation setting is such that is the set of pure states on -dimensional Hilbert space , and is the set of projective measurements on . Numerical simulations of average expected fidelity are shown for and , all up to .

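#### II.5.6 FF00 criterion

In Fischer and Freyberger (2000), an update criterion based on Bayesian estimation and Shannon entropy is proposed. Let $P(\rho)$ denote a prior distribution on $\mathcal{O}$. The update criterion is

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( \sum_{x \in \mathcal{X}_{n+1}} p_{\mathrm{ave}}(x; \Pi | D^n) \int_{\mathcal{O}} d\rho \, P(\rho | D^{n+1}) \ln \frac{P(\rho | D^{n+1})}{P(\rho | D^n)} \right) \tag{29}$$

$$= \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( - \int_{\rho \in \mathcal{O}} d\rho \, P(\rho | D^n) \ln P(\rho | D^n) + \sum_{x \in \mathcal{X}_{n+1}} p_{\mathrm{ave}}(x; \Pi | D^n) \int_{\mathcal{O}} d\rho \, P(\rho | D^{n+1}) \ln P(\rho | D^{n+1}) \right), \tag{30}$$

where

$$p_{\mathrm{ave}}(x; \Pi | D^n) := \int_{\mathcal{O}} d\rho \, P(\rho | D^n) \, p(x; \Pi | \rho), \tag{31}$$

$$P(\rho | D^n) := \frac{P(\rho) \, p(D^n | \rho)}{\int_{\mathcal{O}} d\sigma \, P(\sigma) \, p(D^n | \sigma)}. \tag{32}$$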

In Fischer and Freyberger (2000), the case in which is the set of 1-qubit mixed states and is the set of projective measurements is numerically analyzed up to . The evaluation function used is the average (not expected) infidelity.

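#### II.5.7 HH11 criterion

In Huszár and Houlsby (2011), an update criterion given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \left( - \sum_{x \in \mathcal{X}_{n+1}} p_{\mathrm{ave}}(x; \Pi | D^n) \ln p_{\mathrm{ave}}(x; \Pi | D^n) + \int_{\mathcal{O}} d\rho \, P(\rho | D^n) \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi | \rho) \ln p(x; \Pi | \rho) \right), \tag{33}$$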

is proposed, where Eqs. (31) and (32) have been used. From a simple calculation, we can see that the criteria defined in Eq. (30) and in Eq. (33) are equivalent. This criterion involves an integration which requires high computational cost. In Huszár and Houlsby (2011), a special technique for calculating the integral, called a sequential importance sampling method, is used in order to reduce that computational cost. The authors performed numerical simulation for the case in which is the set of 1-qubit mixed states and are projective measurements up to . They also considered the case in which is the set of 2-qubit states and are a set of mutually unbiased bases, a set of pairwise Pauli measurements, and a set of separable measurements up to . The evaluation function is the average expected infidelity, and it is shown that their scheme is more precise than standard quantum tomography. In Sec. III.2.1, we point out that our numerical results for 1-qubit show that A-optimality gives even more precise estimates than those given by Eq. (33), at least from to .

## III Results and analysis

As explained in Sec. II.4, we consider the A-optimality criterion for 1-qubit state estimation using projective measurements. In Sec. III.1 we give the analytic solution, and in Sec. III.2 we show the results of numerical simulations.

### III.1 Analytic solution for A-optimality in 1-qubit state estimation

First, we give the explicit form of the Fisher matrix for projective measurements. The probability distribution for the rank-1 projective measurement $\Pi(a)$ is given by

$$p(\pm; a | s) = \frac{1}{2}\left( 1 \pm s \cdot a \right), \tag{34}$$

and the Fisher matrix is

$$F(a, s) = \frac{\nabla_s p(+; a | s) \, \nabla_s^T p(+; a | s)}{p(+; a | s)} + \frac{\nabla_s p(-; a | s) \, \nabla_s^T p(-; a | s)}{p(-; a | s)} \tag{35}$$

$$= \frac{a a^T}{1 - (a \cdot s)^2}. \tag{36}$$
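The step from Eq. (35) to Eq. (36) uses $\nabla_s p(\pm; a | s) = \pm a / 2$, so the two outcome terms combine as $\frac{a a^T}{4} \left[ \frac{2}{1 + a \cdot s} + \frac{2}{1 - a \cdot s} \right]$. The following sketch, with hypothetical function names, evaluates both forms so that the identity can be checked numerically.

```python
import numpy as np

def fisher_from_definition(a, s):
    """Eq. (35): sum over outcomes of grad p grad p^T / p, with p = (1 +- s.a)/2."""
    a, s = np.asarray(a, float), np.asarray(s, float)
    F = np.zeros((3, 3))
    for sign in (+1, -1):
        p = 0.5 * (1.0 + sign * np.dot(s, a))
        grad = 0.5 * sign * a            # gradient of p with respect to s
        F += np.outer(grad, grad) / p
    return F

def fisher_closed_form(a, s):
    """Eq. (36): a a^T / (1 - (a.s)^2)."""
    a, s = np.asarray(a, float), np.asarray(s, float)
    return np.outer(a, a) / (1.0 - np.dot(a, s) ** 2)
```

Both functions return a rank-1 matrix, which is why at least three linearly independent measurement directions are needed before the summed Fisher matrix becomes invertible.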

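In this case, Eq. (15) is rewritten in the Bloch vector representation as

$$a^{\mathrm{A\text{-}opt}}_{n+1} := \operatorname*{argmin}_{a \in \mathcal{A}} \mathrm{tr}\left[ H(\hat{s}^{\mathrm{est}}_n) \left\{ \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n) + F(a, \hat{s}^{\mathrm{est}}_n) \right\}^{-1} \right]. \tag{37}$$

We present the analytic solution of Eq. (37) in the form of the following theorem.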

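###### Theorem 1

Given a sequence of data $D^n$, the $n$-th estimate $\hat{s}^{\mathrm{est}}_n$, and a real positive matrix $H$, the A-optimal POVM Bloch vector is given by

$$a^{\mathrm{A\text{-}opt}}_{n+1} = \frac{B_n \, e_{\min}(C_n)}{\| B_n \, e_{\min}(C_n) \|}, \tag{38}$$

where

$$B_n = \sqrt{ \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n) \, H(\hat{s}^{\mathrm{est}}_n)^{-1} \, \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n) }, \tag{39}$$

$$C_n = B_n \left( I - \hat{s}^{\mathrm{est}}_n \hat{s}^{\mathrm{est}\,T}_n + \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n)^{-1} \right) B_n, \tag{40}$$

$e_{\min}(C_n)$ is the eigenvector of the matrix $C_n$ corresponding to the minimal eigenvalue, and $I$ is the identity in the parameter space.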

We give the proof of Theorem 1 in Appendix A.
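A direct transcription of Eqs. (38), (39), and (40) into NumPy might look as follows. This is a sketch under assumptions rather than the authors' code: the function names are hypothetical, the accumulated Fisher matrix `F_prev` is assumed to be symmetric and invertible, and `H` is assumed symmetric positive definite so that the matrix square root can be taken by eigendecomposition.

```python
import numpy as np

def sym_sqrt(m):
    """Square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(w)) @ v.T

def a_opt_direction(F_prev, s_hat, H):
    """Eqs. (38)-(40): a = B e_min(C) / ||B e_min(C)||."""
    s_hat = np.asarray(s_hat, float)
    B = sym_sqrt(F_prev @ np.linalg.inv(H) @ F_prev)
    C = B @ (np.eye(3) - np.outer(s_hat, s_hat) + np.linalg.inv(F_prev)) @ B
    w, v = np.linalg.eigh(C)          # eigenvalues are returned in ascending order
    a = B @ v[:, 0]                   # v[:, 0] is the minimal-eigenvalue eigenvector
    return a / np.linalg.norm(a)

def objective(a, F_prev, s_hat, H):
    """The quantity minimized in Eq. (37), used here only for checking."""
    F_next = F_prev + np.outer(a, a) / (1.0 - np.dot(a, s_hat) ** 2)
    return float(np.trace(H @ np.linalg.inv(F_next)))
```

One update then costs a few 3-by-3 eigendecompositions and inversions, independent of the number of trials. Comparing `objective` at the returned direction with its value at randomly drawn unit vectors is a simple sanity check of the transcription.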

In Eq. (40), the inverse of the matrix appears. In the proof of Theorem 1, the invertibility of is assumed. The invertibility of is equivalent to the condition that is a basis of . When we choose the second and third measurements, and are not invertible. Thus the update scheme does not apply to these steps, and the choices are arbitrary. One simple choice is to perform -, -, and -projective measurements at the first, second and third trials respectively, and this can be shown to satisfy Theorem 1 as follows. The choice of the first measurement is always arbitrary, and we choose , a -projective measurement. Then for any true Bloch vector the rank of is , and if we interpret the inverse matrix in Eq. (40) as a generalized inverse matrix, is a rank matrix with minimal eigenvalues . The supports of , , and are the span of . Therefore is an arbitrary vector in the -dimensional space spanned by and , and we choose . Then using the same logic, the third measurement is fixed to .

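From the explicit formulae of the squared Hilbert-Schmidt distance and infidelity in Eqs. (18) and (20), we have

$$\Delta^{\mathrm{HS}}(s, s') = (s' - s)^T \, \frac{1}{4} I \, (s' - s), \tag{41}$$

$$\Delta^{\mathrm{IF}}(s, s') = (s' - s)^T \, \frac{1}{4}\left( I + \frac{s s^T}{1 - \|s\|^2} \right) (s' - s) + O(\|s' - s\|^3). \tag{42}$$

Therefore when we use the squared Hilbert-Schmidt distance as our loss function, $H$ is proportional to the identity. Substituting this into Eqs. (38), (39), and (40), and dropping overall scalar factors, which affect neither the eigenvectors of $C_n$ nor the normalized vector in Eq. (38), we obtain

$$B_n = \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n), \tag{43}$$

$$C_n = \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n) \left( I - \hat{s}^{\mathrm{est}}_n \hat{s}^{\mathrm{est}\,T}_n \right) \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n) + \tilde{F}_n(a^n, \hat{s}^{\mathrm{est}}_n | D^n), \tag{44}$$

and we do not need to explicitly calculate the inverse or square root matrices for A-optimality. On the other hand, when our loss function is the infidelity, we must use the weight matrix read off from Eq. (42), $H(s) = \frac{1}{4}\left( I + \frac{s s^T}{1 - \|s\|^2} \right)$, evaluated at $s = \hat{s}^{\mathrm{est}}_n$.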

### III.2 Numerical simulation

We performed Monte Carlo simulations of the following four experimental designs, described in detail below: the A-optimal adaptive scheme for the squared Hilbert-Schmidt distance, the same for infidelity, XYZ repetition, and uniformly random selection.

A-optimality for the squared Hilbert-Schmidt distance is the adaptive scheme defined by Eq. (37) with $H$ taken from Eq. (41). Similarly, A-optimality for the infidelity is that with $H$ taken from Eq. (42). As explained in the previous subsection, the choice of measurement Bloch vectors at the first and second trials is arbitrary; we choose $a_1 = (1, 0, 0)^T$ and $a_2 = (0, 1, 0)^T$, i.e., at the first trial we perform the projective measurement of $\sigma_x$, and that of $\sigma_y$ at the second; the third trial is automatically the projective measurement of $\sigma_z$, corresponding to $a_3 = (0, 0, 1)^T$. The XYZ repetition scheme is nonadaptive, in which we repeat the measurements of $\sigma_x$, $\sigma_y$, and $\sigma_z$, corresponding to standard quantum state tomography. Uniformly random selection is also nonadaptive, where at each trial we choose the next measurement direction randomly on the Bloch surface, according to the SO(3) Haar measure. For consistency with the other three schemes, we fix the first, second and third measurements to be the projective measurements of $\sigma_x$, $\sigma_y$, $\sigma_z$, respectively, and randomly select directions from the fourth trial on.
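The two nonadaptive designs are simple to generate. The sketch below, with hypothetical function names, produces the measurement directions for XYZ repetition and for uniformly random selection, and simulates outcomes using Eq. (34). Normalizing a standard Gaussian vector is used as one standard way to obtain a rotation-invariant direction on the unit sphere; whether the original simulations used the same sampler is not stated in the text.

```python
import numpy as np

def xyz_directions(n_trials):
    """XYZ repetition: x, y, z, x, y, z, ..."""
    return np.eye(3)[np.arange(n_trials) % 3]

def urs_directions(n_trials, rng):
    """First three trials fixed to x, y, z; the rest uniform on the unit sphere."""
    dirs = np.eye(3)[np.arange(min(n_trials, 3))]
    if n_trials > 3:
        g = rng.normal(size=(n_trials - 3, 3))   # normalized Gaussians are uniform on the sphere
        dirs = np.vstack([dirs, g / np.linalg.norm(g, axis=1, keepdims=True)])
    return dirs

def simulate_outcomes(dirs, s_true, rng):
    """Draw x_i = +1 with probability (1 + a_i . s)/2, otherwise -1."""
    p_plus = 0.5 * (1.0 + dirs @ s_true)
    return np.where(rng.random(len(dirs)) < p_plus, 1, -1)
```

Because neither function looks at previous outcomes, both designs can be generated for all trials in advance, unlike the adaptive schemes.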

We choose a maximum likelihood estimator in all four experimental designs. It is known that the estimators minimizing the average expected squared Hilbert-Schmidt distance and the average expected infidelity are Bayesian estimators Bagan et al. (2004); Blume-Kohout (2010), but the integrations necessary for Bayesian estimation take too much computation time. For the two A-optimality criteria, we choose both the real and the dummy estimators to be maximum likelihood, $s^{\mathrm{est}} = \hat{s}^{\mathrm{est}} = s^{\mathrm{ML}}$. We used a Newton-Raphson method to solve the (log-)likelihood equation, with the completely mixed state as the initial point of the iteration. When a search point left the Bloch sphere during the procedure, we chose the previous point (inside the sphere) as the estimate.
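The estimation procedure just described can be sketched as follows for projective measurements with outcomes $x_i = \pm 1$ along directions $a_i$, for which the log-likelihood is $\sum_i \ln \frac{1}{2}(1 + x_i \, a_i \cdot s)$. The function name, iteration cap, and tolerance are assumptions, and the sketch presumes the directions span $\mathbb{R}^3$ so that the Hessian is invertible.

```python
import numpy as np

def mle_newton(dirs, outcomes, n_iter=50, tol=1e-10):
    """Newton-Raphson on the log-likelihood, started at the completely mixed state s = 0."""
    dirs = np.asarray(dirs, float)
    x = np.asarray(outcomes, float)
    s = np.zeros(3)
    for _ in range(n_iter):
        denom = 1.0 + x * (dirs @ s)                 # equals 2 p(x_i; a_i | s) for each trial
        grad = dirs.T @ (x / denom)                  # gradient of the log-likelihood
        hess = -(dirs.T * (1.0 / denom ** 2)) @ dirs # Hessian (negative definite)
        step = np.linalg.solve(hess, -grad)
        s_new = s + step
        if s_new @ s_new >= 1.0:                     # left the Bloch ball: keep previous point
            return s
        if np.linalg.norm(step) < tol:
            return s_new
        s = s_new
    return s
```

For data taken only along the x, y, and z axes the likelihood factorizes, and the first Newton step already lands on the per-axis outcome means, which is the maximum likelihood estimate whenever that point lies inside the Bloch ball.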

In the following subsections, we show the plots for two loss functions: the squared Hilbert-Schmidt distance $\Delta^{\mathrm{HS}}$ and the infidelity $\Delta^{\mathrm{IF}}$. The average expected losses are shown in Sec. III.2.1, and pointwise expected losses are shown in Sec. III.2.2. In both subsections, the line styles are fixed as follows: solid (black) line for A-optimality for the squared Hilbert-Schmidt distance (AHS), dashed (red) line for A-optimality for the infidelity (AIF), chain (blue) line for XYZ repetition (XYZ), and dotted (green) line for uniformly random selection (URS).

#### III.2.1 Average expected losses

We analyse the average behaviour of the estimation errors over the Bloch sphere. The integration for averaging is approximated by a Monte Carlo routine, and the statistical expectation is approximated by an arithmetic mean using pseudo-random numbers.
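A Monte Carlo average over true states needs a sampler for the chosen measure. The sketch below makes two assumptions that are not spelled out in the text: the Bures distribution is taken in its standard 1-qubit form, with density proportional to $1/\sqrt{1 - \|s\|^2}$ in the Bloch ball, and the Euclidean distribution is taken to be uniform in the Bloch ball. Function names are hypothetical. The Bures sampler uses the fact that the first three coordinates of a point uniform on the unit 3-sphere have exactly this density.

```python
import numpy as np

def sample_bures(n_states, rng):
    """Bloch vectors with density proportional to 1/sqrt(1 - |s|^2) in the unit ball."""
    g = rng.normal(size=(n_states, 4))
    g /= np.linalg.norm(g, axis=1, keepdims=True)    # uniform on the unit 3-sphere
    return g[:, :3]                                  # project out one coordinate

def sample_euclid(n_states, rng):
    """Bloch vectors uniform in the unit ball."""
    g = rng.normal(size=(n_states, 3))
    g /= np.linalg.norm(g, axis=1, keepdims=True)    # uniform direction
    return g * rng.random((n_states, 1)) ** (1.0 / 3.0)  # radius with density 3 r^2

def average_expected_loss(sampler, expected_loss, n_states, rng):
    """Monte Carlo approximation of Eq. (4): mean of a pointwise expected loss."""
    return float(np.mean([expected_loss(s) for s in sampler(n_states, rng)]))
```

Here `expected_loss` stands for any function returning the pointwise expected loss of Eq. (3) at a given true Bloch vector, itself typically estimated by repeating the simulated experiment many times.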

Figure 2 shows the average expected loss functions against the number of trials $N$ (the horizontal and vertical axes are both logarithmic scale): (HS-Bures) the squared Hilbert-Schmidt distance integrated via the Bures distribution, (HS-Euclid) the same integrated via the Euclidean distribution, (IF-Bures) the infidelity integrated via the Bures distribution, and (IF-Euclid) the infidelity integrated via the Euclidean distribution. Fig. 2 (HS-Bures) and (HS-Euclid) show that the estimation errors of the four experimental designs are almost equivalent from the viewpoint of the squared Hilbert-Schmidt distance. As depicted in (HS-Bures), the estimation errors of the two A-optimality schemes are slightly larger than those of the two nonadaptive schemes; as we show in the next subsection (pointwise analysis), this gap decreases as $N$ becomes larger. On the other hand, Fig. 2 (IF-Bures) and (IF-Euclid) show a clear gap between the adaptive and nonadaptive schemes. The gradients of the curves begin to differ from around , and as depicted in (IF-Bures), the gradients of XYZ and URS are approximately $-3/4$. This means that the average expected infidelity behaves as $O(N^{-3/4})$ and is consistent with the result of the asymptotic analysis presented in Bagan et al. (2004). On the other hand, the gradients of AHS and AIF are steeper than the nonadaptive limit $-3/4$, indicating that AHS and AIF make good use of adaptive resources. Around the gradient of AIF is almost $-1$, which is the bound for adaptive experimental designs Bagan et al. (2006a).

Let us compare the estimation errors of A-optimality and the HH11 criterion explained in Sec. II.5.7. From Fig. 2 (IF-Bures), the average expected infidelity of AHS and AIF are and at . On the other hand, the corresponding amount for the HH11 criterion can be estimated roughly from Fig. 2 (a) in Huszár and Houlsby (2011) to be . This implies that for 1-qubit state estimation, the average expected infidelity of the A-optimality criterion is about half that of Eq. (33), at least around .

#### III.2.2 Pointwise expected losses

Next, we analyse the behaviour of the estimation errors at several true Bloch vectors . Figure 3 shows the pointwise expected loss functions against the number of trials (the horizontal and vertical axes are both logarithmic scale): (HS-P1), (HS-P2), and (HS-P3) are plots of the expected squared Hilbert-Schmidt distances for given by , and (IF-P1), (IF-P2), and (IF-P3) are the expected infidelities for the same three true states, respectively.

As depicted in (HS-P1) and (IF-P1), the estimation errors of all four schemes are almost equivalent for the completely mixed state, . As the Bloch radius becomes larger, the differences between the four schemes become clearer. Figure 3 (HS-P2) and (HS-P3) are the plots of the expected squared Hilbert-Schmidt distances at a high purity point, . In the region of to around , the squared Hilbert-Schmidt error of the two adaptive schemes is larger than that of the two nonadaptive schemes. In particular, the error of AHS is larger than that of AIF. This might seem strange, but in the region of , the error of AHS becomes smaller than that of AIF, and indeed it eventually becomes the smallest of the four schemes. We believe that there are two reasons for A-optimality’s large error when the number of trials is small. First, the A-optimality criterion is based on an asymptotic theory of statistical estimation. When the number of measurement trials is small, the Cramér-Rao bound is not necessarily suitable for characterizing estimation errors. Second, it uses a dummy estimator in the measurement update. When the number of trials is small, this estimator does not give a good estimate, and thus the choice of the next measurements can be unreliable. Of course, when the number of trials becomes sufficiently large, both of these problems are alleviated.

The gap between the estimation errors of adaptive and nonadaptive schemes becomes smaller as the number of trials becomes larger in (HS-P2) and (HS-P3), while it grows in (IF-P2) and (IF-P3). Only the XYZ scheme changes dramatically between (IF-P2) and (IF-P3); the other three schemes do not, because AHS, AIF, and URS are invariant under rotation of the true Bloch vector (for a very small number of trials there are differences, because the first three measurements are fixed to -projective measurements and are not rotationally invariant). Figure 3 (IF-P2) is the case in which the directions of the measurement and the true Bloch vector are matched (to ). In this case, XYZ is the best scheme, exhibiting the smallest estimation error. Around , the estimation error of AIF becomes as small as that of XYZ. That of AHS is smaller than that of URS, but larger than those of the other two schemes. We believe that this is because the Hessian matrix selected for the update routine is unsuitable for the loss function in (IF-P2) (and (IF-P3)). Figure 3 (IF-P3) is the case in which the directions of the measurement and the true Bloch vector are the most discrepant (for a fixed purity). In this case, the estimation errors of XYZ and URS are almost the same and behave as , and those of the adaptive schemes are smaller than those of the nonadaptive ones (this behavior of expected infidelity for i.i.d. measurements is discussed in Bagan et al. (2006a); de Burgh et al. (2008), and a detailed analysis will appear in Sugiyama et al. ). When we consider the whole Bloch sphere, the cases in which the direction of the XYZ measurements and the Bloch vector are matched are few, and therefore the average expected infidelities of AHS and AIF are smaller than those of XYZ and URS. This also indicates that the adaptive schemes have better worst-case performance (lower , Eq. (5)) than the nonadaptive schemes.
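The direction dependence of the XYZ scheme can be reproduced qualitatively with a small Monte Carlo sketch. The code below is an illustration under stated assumptions: the estimator is linear inversion followed by a rescaling onto the Bloch ball (not necessarily the estimator used in the paper), the Bloch radius 0.99, the 3000 trials, and the 200 runs are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np

def infidelity(s, t):
    """Qubit infidelity 1 - F in terms of Bloch vectors s and t."""
    rs = max(0.0, 1.0 - float(s @ s))
    rt = max(0.0, 1.0 - float(t @ t))
    return 1.0 - 0.5 * (1.0 + float(s @ t) + np.sqrt(rs * rt))

def simulate_xyz(s_true, n_total, rng):
    """One run of the XYZ scheme: n_total / 3 trials on each of the x, y, z axes."""
    n_axis = n_total // 3
    p_plus = 0.5 * (1.0 + s_true)             # Born rule for the "+" outcome on each axis
    counts = rng.binomial(n_axis, p_plus)     # number of "+" outcomes per axis
    est = 2.0 * counts / n_axis - 1.0         # linear-inversion estimate (assumption)
    norm = np.linalg.norm(est)
    return est / norm if norm > 1.0 else est  # rescale unphysical estimates onto the ball

def mean_infidelity(s_true, n_total, runs, seed=0):
    """Monte Carlo average of the infidelity over independent runs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([infidelity(s_true, simulate_xyz(s_true, n_total, rng))
                          for _ in range(runs)]))

r = 0.99                                       # illustrative high-purity Bloch radius
aligned = np.array([0.0, 0.0, r])              # true state along a measurement axis
discrepant = r * np.ones(3) / np.sqrt(3.0)     # true state far from all three axes
err_aligned = mean_infidelity(aligned, 3000, 200)
err_discrepant = mean_infidelity(discrepant, 3000, 200)
```

Under these assumptions, `err_aligned` comes out smaller than `err_discrepant`, in line with the contrast between (IF-P2) and (IF-P3).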

#### III.2.3 Purity dependence

Figure 4 shows the purity dependence of the average expected infidelity at . The average is taken over all directions and for each Bloch radius . It indicates that the average expected infidelities of the two adaptive schemes are smaller than those of the two nonadaptive schemes. The appearance of peaks for XYZ and URS is discussed in Appendix D.
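An average over directions at a fixed Bloch radius can be approximated by sampling directions uniformly on the sphere. The following sketch uses normalized Gaussian vectors for this purpose; the function names, the sample size, and the seed are illustrative assumptions, and `loss_at` stands for any routine returning an expected loss at a given true Bloch vector.

```python
import numpy as np

def sample_bloch_vectors(radius, n_samples, rng):
    """Bloch vectors of fixed length with directions uniform on the sphere."""
    v = rng.normal(size=(n_samples, 3))             # isotropic Gaussian vectors
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # normalize to unit length
    return radius * v

def direction_average(loss_at, radius, n_samples=500, seed=0):
    """Average loss_at(s) over uniformly sampled directions at the given radius."""
    rng = np.random.default_rng(seed)
    states = sample_bloch_vectors(radius, n_samples, rng)
    return float(np.mean([loss_at(s) for s in states]))
```

Repeating `direction_average` over a grid of radii gives a purity-dependence curve of the kind plotted in Figure 4.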

#### III.2.4 Measurement sequences

Figure 5 is a plot of the measurement Bloch vectors at (left column), (middle column), and (right column) for runs. The true state is ; the upper three subplots show AHS and the lower three show AIF. Figure 5 shows that the measurement Bloch vectors are clustered around the true state, with some interesting behaviour at . In (AHS-10000), the measurement directions are clustered very narrowly at the true state and also around the great circle that it defines. In (AIF-10000), on the other hand, the directions are clustered widely around the true state. This is due to the difference between the loss functions employed in the update routine, namely the squared Hilbert-Schmidt distance in the former and the infidelity in the latter. We note that for a completely mixed true state, the measurement Bloch vectors are distributed randomly on the Bloch sphere for large .

## IV Discussion

### IV.1 Implementation

There are two main issues when considering the practical implementation of an adaptive scheme, namely the ease with which measurement updates can be made in the apparatus, and the time required to compute those updates. In quantum optics, projective measurements and single qubit rotations are standard tools in quantum information processing experiments. Figure 6 illustrates a simple implementation example for a one photon polarization system. In this setting, the first issue is not a problem, although in general it will of course depend on the experimental state of the art.

### IV.2 Generalization to higher dimensional systems

In order to compare the performance of the A-optimality criterion to the other update schemes, we have considered 1-qubit states as the estimation objective. Current and future quantum information processing is concerned with higher dimensional estimation objectives, not only states but also processes. In 1-qubit state estimation, we can reduce the computational cost for A-optimality by using the analytic solution of Theorem 1, but as we see in Appendix A, the techniques used to derive that solution depend on the properties of 1-qubit states and projective measurements. A-optimality in higher dimensional systems will need a new solution, or must deal with the increasing complexity of the nonlinear minimization problem. One possible approach is to place constraints on the measurement class . Instead of considering a continuous set of measurement candidates, we could consider a discrete set. One expects that the resulting discrete minimization problem would be much simpler. If the number of discrete measurement candidates is too small, however, the estimation error could be worse than that of standard quantum tomography. The trade-off between the reduction in computational cost and the (probable) increase in estimation error introduced by such a discrete minimization is an open problem.
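To make the discrete variant concrete, the following sketch selects the next measurement from a finite candidate list by minimizing a weighted trace of the inverse Fisher matrix, shown here for the 1-qubit case. It is an illustration under stated assumptions, not the update rule of Theorem 1: the Fisher matrix is the classical one for a two-outcome projective measurement with Bloch direction m at the current estimate s, namely m mᵀ / (1 − (m·s)²), the accumulated matrix is assumed invertible, |m·s| < 1 is assumed, and all function names are hypothetical.

```python
import numpy as np

def fisher_projective(m, s):
    """Classical Fisher matrix of a projective measurement along unit vector m at Bloch vector s."""
    m = np.asarray(m, dtype=float)
    c = float(m @ np.asarray(s, dtype=float))
    return np.outer(m, m) / (1.0 - c * c)   # assumes |m.s| < 1

def pick_next_measurement(candidates, s_est, fisher_acc, hessian):
    """Index of the candidate minimizing tr[H (F_acc + F(m))^{-1}]."""
    scores = [np.trace(hessian @ np.linalg.inv(fisher_acc + fisher_projective(m, s_est)))
              for m in candidates]
    return int(np.argmin(scores))

# Toy example: plenty of information about x and y, little about z.
axes = np.eye(3)                              # candidate directions x, y, z
fisher_acc = np.diag([10.0, 10.0, 1.0])       # hypothetical accumulated Fisher matrix
best = pick_next_measurement(axes, np.zeros(3), fisher_acc, np.eye(3))
```

In this toy example the z direction is selected, since it is the least-informed direction. Each update costs one small matrix inversion per candidate, so the cost grows linearly with the size of the candidate set.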

## V Summary

In this paper, we considered adaptive experimental design and applied a measurement update method known in statistics as the A-optimality criterion to 1-qubit mixed state estimation using arbitrary rank-1 projective measurements. We derived an analytic solution of the A-optimality update procedure in this case, reducing the complexity of measurement updates considerably. Our analytic solution is applicable to any case in which the loss function can be approximated by a quadratic function to lowest order. We performed Monte Carlo simulations of the adaptive schemes and of two nonadaptive schemes in order to compare the behaviour of estimation errors for a finite number of measurement trials. We compared the average and pointwise expected squared Hilbert-Schmidt distance and infidelity of the following four measurement update criteria: A-optimality for the squared Hilbert-Schmidt distance (AHS), A-optimality for the infidelity (AIF), repetition of three orthogonal projective measurements (XYZ), and uniformly random selection of projective measurements (URS). The numerical results showed that, with respect to expected infidelity, AHS and AIF give more precise estimates than URS and XYZ, the latter of which corresponds to standard quantum tomography.

###### Acknowledgements.
T.S. would like to thank Fuyuhiko Tanaka for helpful discussion on mathematical statistics and Terumasa Tadano for useful advice on numerical simulation. This work was supported by JSPS Research Fellowships for Young Scientists (22-7564) and Project for Developing Innovation Systems of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

## Appendix A Proof of Theorem 1

We give the proof of Theorem 1. First, we introduce a lemma about matrix inverses.

###### Lemma 1

Horn and Johnson (1985