Robustness of Adaptive Quantum-Enhanced Phase Estimation

09/14/2018 ∙ by Pantita Palittapongarnpim, et al. ∙ 0

As all physical adaptive quantum-enhanced metrology schemes operate under noisy conditions with only partially understood noise characteristics, so a practical control policy must be robust even for unknown noise. We aim to devise a test to evaluate the robustness of AQEM policies and assess the resource used by the policies. The robustness test is performed on adaptive phase estimation by simulating the scheme under four phase noise models corresponding to the normal-distribution noise, the random telegraph noise, the skew-normal-distribution noise, and the log-normal-distribution noise. The control policies are devised either by a reinforcement-learning algorithm in the same noise condition, albeit ignorant of its properties, or a Bayesian-based feedback method that assumes no noise. Our robustness test and resource comparison can be used to determining the efficacy and selecting a suitable policy.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Quantum-enhanced metrology (QEM) employs a quantum state of  particles as a resource to estimate an unknown parameter  with the goal of attaining imprecision


that asymptotically surpasses the standard quantum limit (SQL)  Giovannetti et al. (2004, 2011) but saturates at or below the Heisenberg limit (HL)  Braunstein et al. (1996); Tóth and Apellaniz (2014). Our expression (1) differs from usual proportionality expressions in the literature, e.g., by explicitly recognizing the relevance of lower-order terms Tóth and Apellaniz (2014) through the use of the big-O notation Vrajitoru and Knight (2014a).

QEM is vital for high-precision applications, such as gravitational wave detection Hollenhorst (1979); Caves et al. (1980); Caves (1981), atomic clocks Bollinger et al. (1996); Borregaard and Sørensen (2013), and magnetometry Tóth and Apellaniz (2014); Danilin et al. (2018) whose systems are operating at the limit of their power tolerance. Some schemes consider ideal measurements that typically involve measuring multiple particles simultaneously Zhang and Duan (2014); Matthews et al. (2016), whereas adaptive QEM (AQEM) focuses on single-particle measurements augmented by feedback such that the SQL is beat and the HL is approached Wiseman (1995); Armen et al. (2002); Berry and Wiseman (2000).

AQEM Performance critically depends on policy choice Rosolia et al. (2018), which can be obtained by optimizing a known mathematical model Roy et al. (2015); Wiseman and Killip (1997, 1998) or by using reinforcement-learning algorithms Hentschel and Sanders (2010); Lovett et al. (2013); Palittapongarnpim et al. (2017a). Whereas policies from these methods are resistant to known noise models Roy et al. (2015), whether they are robust against unknown noise is yet unstudied but critical property of a QEM scheme as noise can destroy the entanglement advantage and restore the SQL Escher et al. (2011); Demkowicz-Dobrzanski et al. (2012). Our aim is to test the robustness of AQEM policies in the presence of noise with unknown properties.

Our test focuses on adaptive interferometric phase estimation, whose policies have been devised using Bayesian techniques Berry and Wiseman (2000); Berry et al. (2001) and by reinforcement learning Hentschel and Sanders (2010); Lovett et al. (2013); Palittapongarnpim et al. (2017a)

. The Bayesian technique computes feedback based on a trusted, noiseless quantum model, whereas reinforcement learning devises the AQEM policies for feedback based on trial and error and is ignorant of the quantum-dynamical nature but employs heuristics to shrink the search space. Here both methods are applied to adaptive phase estimation including phase noise, which could arise from path-length fluctuation in the interferometer 

Bobroff (1987); Sinclair et al. (2014).

Typically, noise is assumed to be normal as a result of the central limit theorem 

Severini (2005a). Periodicity of phase makes the normal distribution problematic unless the noise is small compared to radians, which we assume here; technically, we would use the wrapped-up normal distribution Breitenberger (1963). As our aim is to test robustness for unknown noise, we consider three other noise distributions for our test: random telegraph Newman (1968), skew-normal Azzalini and Capitanio (2014) and log-normal Limpert et al. (2001) noise. The random telegraph noise simulates a discrete noise process. Skew-normal and log-normal distributions represent asymmetric noise, which serve as distinct generalizations of the normal distribution. Both distributions are used to simulate noise in detectors and electronics Pourahmadi (2007); Jouini (2011); Rezaie and Eidsvik (2014).

For AQEM, we seek an efficient procedure that beats the SQL, and we choose the policy that requires the least resource to run. We assess the policy-generating procedure according to the complexity of its time cost Vrajitoru and Knight (2014b), which is evaluated by the scaling in the number of operations with the number of particles

. Here we consider two policy-design procedure, namely, a reinforcement-learning algorithm and a method based on Bayesian inference, resulting one policy designed by each method. To determine which policy is superior, we compare the complexity in space and time cost 

Vrajitoru and Knight (2014a). Thus, we are able to assess and compare the costs for generating policies and determine the best policy.

Through our analysis, we find that both Bayesian-feedback and reinforcement-learning policies are robust in the face of unknown noise. Specifically, the Bayesian method yields AQEM that approaches HL and outperforms reinforcement-learning policies for most noise models. This performance superiority is due to the Bayesian method memorizing the measurement history through complete knowledge of the quantum state. Storing the entire model in the computer yields better scaling but leads to higher space and operational time costs compared to the reinforcement-learning policy.

Disparity between the space costs suggest that the two approaches use different amounts of information to execute the feedback control, and a fair comparison should be made instead when they are using the same space cost. In principle, the reinforcement-learning policy can be generalized to attain the HL as long as the design cost, which scales in high-polynomial, is practical.

Ii Background

In this section, we present essential background knowledge for assessing robustness of adaptive quantum-enhanced phase estimation. In this subsection, we cover four key notions as subsections. In §II.1 we discuss QEM including AQEM. Subsequently, in §II.2, we address a particular case of AQEM, namely, adaptive phase estimation. As noise is an important in practical AQEM, in §II.3 we discuss noise models that we use to evaluate policy robustness. Finally, in §II.4

, we review regression analysis and model selection techniques that are required for analyzing whether the SQL has been surpassed.

ii.1 Quantum-enhanced metrology

In this subsection, we explain QEM strategies, mainly collective non-adaptive Giovannetti et al. (2011) and adaptive measurements Palittpongarnpim et al. (2016), and lower imprecision bounds attained by using the classical and quantum resources. Specifically, we focus on strategies that employ finite  input states. Whereas the collective non-adaptive strategy is useful to calculate lower bounds for imprecision, adaptive strategies offer the simpler alternative of individual-particle measurements that can achieve imprecision scaling close to these theoretical bounds. Here we focus on single-parameter estimation; multi-parameters QEM Helstrom (1969); Humphreys et al. (2013); Yue et al. (2014) would be the subject of future study.

ii.1.1 Non-adaptive and adaptive strategies

Here we describe collective non-adaptive and adaptive QEM strategies, which are two of many types of QEM schemes. Whereas we focus on these two techniques, there are others, such as the sequential technique Higgins et al. (2007); Demkowicz-Dobrzański and Maccone (2014) and the ancilla-assisted techniques Hotta et al. (2005); Sbroscia et al. (2018), which we do not cover.

Collective strategy.

A collective non-adaptive scheme utilizes  -level (typically, for standard two-level atoms or two-path interferometry) particles prepared in a collective state


which is the space of positive-definite, trace-class, self-adjoint linear operators acting on a tensor product of 

copies of a -dimensional Hilbert space  Landsman (2017). The system, upon which metrology is performed, is represented as a quantum channel (completely-positive trace-preserving map) , acting on any , with  the single-unknown parameter of the channel Holevo (2012). In the special case of an isolated system without noise, decoherence or loss, the channel is represented by a unitary transformation Yurke et al. (1986)


for  a unitary operator acting on .

After the particles exit the system, they are measured and this measurement is described a positive-operator-valued measures Hayashi (2006); Wiseman and Milburn (2009), which are positive semidefinite operators


assuming the measurement outcomes

is a finite set. This outcome is random with probability


The measurement  is repeated multiple times to sample the distribution (5) sufficiently well to get good estimates, and then is then inferred from these samples.

Bundle and individual-particle measurement.

Instead of collective measurement, we can consider measuring subsets of particles, which we call bundles, and, at the extreme limit, which is of interest here, the individual-measurement case of measuring a single particle at a time. Mathematically, we split the particles into  bundles of  particles where  Palittpongarnpim et al. (2016) so the Hilbert space can be expressed as


In this case, both  and  act on . For localized measurements on each bundle, the POVM is


with outcomes from this tensor-product POVM being concatenations of  length  strings of -dimensional digits,




measured form the  bundle.

In one extreme case, each bundle contains only one particle, which leads to and . The string of outcomes becomes


The POVM is


which is a tensor product of  qudit POVMs.

For two-level particles, the state (2) is simplified to


and the POVM simplifies from (11) to


The outcome (10) is simplified to


which is an -bit string. Henceforth, we restrict to the (two-level system), (single-particle-per-bundle case) for simplicity and without loss of generality.

Adaptive strategy.

The adaptive strategy involves incorporating quantum feedback control James and Nurdin (2015) such that the system operation depends on both the unknown parameter  and a control parameter 

, for some degree of freedom, on the

bundle. We assume that incorporating a control preserves the system acting as a channel and thus write the channel acting on the bundle as . Measurement of the bundle leads to an update of the control parameter to  for the next bundle.

The control-parameter update is determined by a policy


which uses the string of outcomes  and the most recent control parameter  to obtain the next control parameter . This procedure continues until reaching the final particle, i.e., the particle, at which point the estimate of the unknown parameter is


Therefore, the adaptive strategy can be used for single-shot measurement; i.e., inferring  from one instance of the measurement procedure.

ii.1.2 Imprecision limits

Imprecision of the estimate (16) is denoted  (1). Assuming the measurement is optimal and that the quantum channel is noiseless Caves et al. (1980); Schreppler et al. (2014), imprecision lower bounds are calculated for classical and quantum resources. These lower bounds are the SQL and HL, respectively. In a noisy system, which pertains in practice, a QEM scheme is unlikely to saturate the bound. Despite the presence of noise, the SQL is still the bound that must be surpassed to claim QEM, which requires using quantum resources. Here we review these limits as benchmarks for both the robustness test and to compare AQEM policies.

Standard quantum limit.

The SQL is the imprecision lower bound if classical resources are used, which means that the input state (12) is separable, i.e., unentangled Kołodyński and Demkowicz-Dobrzański (2013). The simplest case of such a separable state is a tensor product of  independent particles Giovannetti et al. (2004),


Interacting this state to the quantum channel leads to the output state


Measuring this state (18) according to the POVM (13

) leads to output governed by the probability distribution,


which is independent and identically distributed (iid) for . Calculating the imprecision, such as through the central limit theorem, leads to because of this iid condition Maccone (2013). The scaling also holds for the imprecision lower bound, which is calculated from (18) using the Cramér-Rao lower bound Kay (1993), and is irrespective of the quantum channel.

Heisenberg limit.

If quantum resources are employed, e.g., squeezing Braunstein (2005) or entanglement Wootters (1998), the SQL can be surpassed Caves (1981); Tóth and Apellaniz (2014). The lower bound to using the quantum resource can be computed from the quantum version of the Cramér-Rao lower bound Helstrom (1969), which depends on the input state and the quantum channel Jacobs (2014). Therefore, unlike the SQL, the HL is specific to the QEM scheme Zwierz et al. (2012). In the case of interferometric phase estimation, the HL is known to be  Bondurant and Shapiro (1984), although this limit can only be attained through the use of optimal measurement. As an optimal POVM could be infeasible, adaptive phase-estimation schemes provides an attractive alternative to achieving close to this lower bound Wiseman et al. (2009).

ii.2 Adaptive phase estimation

Phase estimation underlies many QEM applications Caves (1981); Yurke et al. (1986); Giovannetti et al. (2004) and thus is widely used for devising quantum-enhanced techniques, including several AQEM schemes Wiseman et al. (2009); Hentschel and Sanders (2010); Roy et al. (2015). Here we explain the interferometric adaptive phase-estimation scheme controlled by Bayesian feedback Berry and Wiseman (2000); Berry et al. (2001) or reinforcement-learning policies Hentschel and Sanders (2011a); Lovett et al. (2013); Palittapongarnpim et al. (2017a), which we compare in terms of robustness and resource consumption for control.

ii.2.1 Adaptive interferometric phase-estimation procedure

One method of estimating phase is to use an interferometer, which infers phase shifts from the interference between two or more modes Armstrong (1966). In particular, we use an adaptive phase-estimation scheme based on a Mach-Zehnder interferometer, which has two modes and therefore we are looking at the case of  representing the modes. The mathematics of Mach-Zehnder interferometry applies to other forms of SU(2) interferometry, such as Ramsey, Sagnac and Michelson interferometry Yurke et al. (1986); Hariharan and Sanders (1996); Giovannetti et al. (2004). In this subsubsection, we present the input state, adaptive channel, detection, feedback, inference and imprecision.

Input state.

For non-adaptive quantum interferometry with collective measurement, the unitary interferometric transformation is in the Lie group SU with irrep (Casimir-invariant label) . For adaptive quantum interferometry or individual measurements, the interferometric unitary transformation is SU for  particles and two paths. However, the two descriptions converge if the input state is permutationally symmetric; technically, Schur-Weyl duality dictates that the applicable transformation is SU(2) with irrep  Hentschel and Sanders (2011b).

Notationally, modes are labelled by


which conveys which of the two paths, such as input or output port or intra-interferometric path, pertains. Thus, the state  refers to photon being in path . The multiphoton basis is the tensor-product state


For the Hamming weight, i.e., sum of bits, of , the permutationally-symmetric basis is


for  the total number of particles in mode .

The sine state serves as a loss-tolerant symmetric state that minimizes phase-estimation imprecision Summy and Pegg (1990); Wiseman and Killip (1997, 1998), and is expressed as Berry and Wiseman (2000); Berry et al. (2001)


for the Wigner- function Mishchenko (2014). This state also has the advantage of being robust against photon loss Hentschel and Sanders (2011b), which is a desirable property for practical QEM and so is used in the adaptive phase estimation procedures.

Adaptive channel.

The particles in the sine state (II.2.1) are divided into single-particles bundles ( case), each of which passes through the Mach-Zehnder interferometer. For a noiseless interferometry, the quantum channel for one photon is


for  a Pauli matrix Sanders and Milburn (1995). Therefore, the channel is


acting on the state space (12).

In a physical implementation of an interferometer, mechanical disturbances, air-pressure changes and the thermal fluctuations induce optical-path fluctuations. These effects randomize the phase difference


according to prior distribution . The quantum channel is thus Hentschel and Sanders (2011a)


where  is the state after the  photon is measured.

Detection, feedback, and inference.

After the  photon passes through the interferometer, the photon is detected by one of the single-photon detectors positioned outside the output ports. The information of about the exit port is , which is given to a controller. The controller then uses this information to compute  from the policy  before the next photon arrives. The procedure of simulating the injection of the next photon from the sine state (II.2.1) followed by action of the channel (25) and then measurement at the output ports is repeated until all photons are consumed, and the estimate is inferred from , assuming no loss of photons.


Imprecision of the estimate (1

) is related to the Holevo variance 

Wiseman and Killip (1997)


by the sharpness function


which quantifies the width of a distribution over a periodic variable. The sharpness (29), hence the Holevo variance (28), is estimated by repeatedly simulating  times Hentschel and Sanders (2010) for uniformly randomly chosen  in each run with  the unknown (noiseless) interferometric phase shift. In the simulation  the mode (most frequent value) of the unimodal prior distribution . The result  (29) is itself sampled from a distribution  with a mean


Consequently, the Holevo variance (28), and thus the imprecision of the estimate, is only estimated through this procedure.

ii.2.2 Policy generation

Whether the adaptive estimation scheme is capable of breaking the SQL and reaching HL depends on the feedback policy; in general, the feedback policy does not come close to the HL although can surpass the SQL, which is our success criterion. We compare two set of policies, one designed based on Bayes’s theorem Berry and Wiseman (2000) and the other by reinforcement learning Hentschel and Sanders (2010); Palittapongarnpim et al. (2017a). The most significant difference between these two methods is that the Bayesian feedback computes  based on a trusted model of how the quantum state evolves, whereas the reinforcement-learning algorithm here uses only the measurement outcomes  to decide the value of . A model-free approach that is used in the reinforcement-learning algorithm should, in principle, be robust to changes in the estimation scheme although whether the policies can preform better than the model-based approach is unknown.

Bayesian feedback.

The idea behind the Bayesian feedback is that every outcome from the measurement improves our knowledge of the the value of  Berry and Wiseman (2000); Berry et al. (2001). This is mathematically captured in Bayes’s theorem Riley et al. (2006a), where the initial knowledge of  is represented by a uniform prior, indicating that  can be any value in  with equally probability.

For each particle exiting the interferometer, the probability for the particle being detected in output port  can be computed by assuming a perfect input state and known quantum dynamics such as the unitary evolution (25). The prior for  is then updated using Bayes’s theorem. The squared width of this prior is quantified by  (28) in each measurement step; the optimal  that minimizes width for the next particle is then computed from the model.

Although the Bayesian method gets close to the HL Berry et al. (2001), the problem with Bayesian inference subject to a trusted model is that the policy might not be robust to variations of input state and to noise in the system. This mismatch is known to be a major concern in designing controllers Leigh (2004); Wang et al. (2016) and one way to address this concern is to adopt a design-and-feedback approach that does not use models but the data from the physical setup Hou and Wang (2013).

Reinforcement learning.

Reinforcement learning is used to devise a policy  by trial and error Sutton and Barto (2017); Sutton et al. (1992), and the learning algorithm iteratively optimizes  by testing the policy based on evaluating the outcome of the specified control task. For noisy adaptive phase estimation, policy evaluation is based on average sharpness (30Hentschel and Sanders (2011a), which averages out noise. In practice, due to the high cost of sampling  (30) very few runs are performed to obtain .

Reinforcement learning does not employ a model but rather uses the control outcomes to learn Degris et al. (2012); however, we currently train based on simulations that employ a model, but training could and should ultimately be performed under conditions of deployment such as in the laboratory or in the field. For reinforcement learning, feedback is assumed to follow a logarithmic-search heuristic Markovian update rule Hentschel and Sanders (2010)


with the phase-adjustment vector


optimized during the training stage by a scalable, noise-resistant reinforcement-learning algorithm such that average sharpness (30) is maximized Palittapongarnpim et al. (2017a). This update rule (31) corresponds to turning the “phase knob” up or down by a fixed amount , after the photon, subject only to the previous outcome and ignoring the full measurement history.

Reinforcement learning incurs a time cost to generate a policy . The time cost for generating the policy is quantified by loop analysis of the learning algorithm, assuming the algorithm is run on a single processor Lovett et al. (2013). The scaling of time cost with respect to the particle number determines the complexity, and in practice this scaling is polynomial so the degree of the polynomial convey the complexity for generating the policy.

ii.3 Models of phase noise

In this subsection, we explain the choices of the phase-noise model for the robustness test. This noise is simulated turning 

into a random variable that has a unimodal probability distribution with the peak at 

. The mode  is assumed to be the unknown parameter to be estimated. For the test,  follows one of these four distributions: normal, three-stage random telegraph, skew-normal or log-normal distribution. We summarize the relationship between the noise parameters and the variance and skewness Severini (2005b) as we use both in selecting the parameters for the robustness test and the variance in particular to quantify the noise level.

ii.3.1 Normal-distribution noise

Normal-distribution noise is important for testing robustness of the learning algorithm because the normal distribution is especially prominent due to the central-limit theorem, which states that the average of a random variable has a normal distribution Severini (2005a). Due to the prevalence of the normal distribution, assuming normal-distribution noise model is common Kirkup (2012). The normal distribution


is parametrized by the mean 

and standard deviation 

. As this distribution is symmetric, skewness  is identically zero and thus the mode is at , and the variance is .

In our simulations, we set  so the only free parameter is , which is bounded above by  as otherwise the width would exceed the domain of . This value of  is, of course, too high to expect any measurement scheme to operate with reasonable imprecision, and so the robustness test is conducted lower .

ii.3.2 Random telegraph noise

Random telegraph noise Newman (1968) is a discrete distribution that, for each time step, randomly switches between two values, one being the correct and the other an erroneous value. Whereas this noise is most relevant to digital electronics as it simulate a bit-flip error, it can be use to simulate other digitized noise, such as salt-and-pepper noise in image processing Boncelet (2005).

We modify two-stage random telegraph noise to have three stages,


The probability of switching to an erroneous value is , and  is the distance between the true and erroneous values leading to


with the last relation following from the symmetry of the distribution.

Unimodality of the distribution implies that . Furthemore, to avoid overlap with another distribution over the phase domain. To comply with both constraints and being able to raise the noise level to at least  so the result can be compared to that of other distributions, we fix  for the test and vary only .

ii.3.3 Skew-normal-distribution noise

The skew-normal distribution Azzalini and Capitanio (2014) is modified from a normal distribution by multiplying with a function whose skewness parameter is . Skew-normal noise is a class of noise that includes normal-distribution noise as a limiting case. Although this distribution is not widely used as noise, it arises in simulations of noise for filters and detectors Pourahmadi (2007); Rezaie and Eidsvik (2014).

The skew-normal distribution is


for  the error function Riley et al. (2006b). Skewness of the distribution is


and the variance is


The mode, however, does not have a closed form although it remains close to as increases. For the simulation, we assume the mode is .

ii.3.4 Log-normal-distribution noise

Log-normal Limpert et al. (2001) noise has a heavy-tailed skewed distribution that provides another approach to generalizing the normal distribution and is employed in the study of networks Kish et al. (2015); Kai et al. (1987) and electronics Jouini (2011). In this case, the logarithm of the random variable is said to have a normal distribution, leading to the distribution


with mode and variance


respectively, and skewness

As this distribution is defined for , we first generate a random number within the compact phase domain given  and  and then apply the shift


so that the mode of the distribution is centred at  (40).

ii.4 Regression analysis

The imprecision  and  are asymptotically power-law related (1). However, when the system is noisy, this relationship fails for low , with the actual bound on  depending on the noise model. We employ regression analysis to select the subset of  at high  that scales as  and estimate the corresponding  by building piecewise functions and selecting the best candidate to represent the data. In this subsection, we explain our regression-analysis procedure for fitting a model given a set of data.

ii.4.1 Fitting the model

Regression analysis aims to determine the mathematical relationship between dependent ( here) and independent variables (here Montgomery et al. (2012a). The process of building this mathematical model begins with selecting a function  based on the knowledge of the mechanism and observations of the trends Chatterjee and Simonoff (2013a). The function is only a best guess as the discerned trend could be subjective.

After a function is selected, the function is then fitted to the data by finding the parameters that minimize the error between the predicted and the data  Chatterjee and Hadi (2012). The method we employ is the least-squaress estimation Montgomery et al. (2012b)

, used in linear regression to calculate the variables by constraining the gradients to zero and solving the resulting system of linear equations. We choose this method as we fit linear and piecewise linear equations to log-log plot of

and .

ii.4.2 Consistency of the fit

As the fitted function is only an educated guess, the fitting result must be examined for inconsistencies with respect to the model’s assumptions Chatterjee and Hadi (2012). An alternative function can then be proposed, fitted, and compared to the previous function in order to find one that best represents the data. Deciding on the best model from the set is done using statistical criteria that either estimate the goodness of the fit to the data or between two models fitted to the same data Montgomery et al. (2012c). The model that is consistently shown to fit well according to each of the criteria is then selected to represent the data.

Common linear-regression criteria include

  • the coefficient of determination


    adjusted to


    with  the number of data points in the fit.

  • the corrected Akaike Information Criteria , which quantities information lost due to the discrepancy between the model function and the true function , and is ‘corrected’ to avoid overfitting,

  • the -test, which assesses a full model (maximum 

    ), as the null hypothesis, vs a reduced model (reduction from the full model) as the alternative hypothesis 

    Chatterjee and Simonoff (2013b, a), and

  • Mallows’s  Chatterjee and Simonoff (2013a); Montgomery et al. (2012c) (but we use  rather than the traditional  for the number of parameters), which estimates the mean-square prediction error Montgomery et al. (2012c) to compare a reduced model to the full model, where the reduced model with the smallest  close to is chosen.

Each of these criteria is designed to penalize functions with many parameters  to avoid overfitting the data Chatterjee and Simonoff (2013a).

Iii Approach

In this section, we devise a test to determine whether quantum-enhanced precision is feasible in the presence of unknown noise. We then assess whether power-law scaling of phase imprecision vs particle number  is valid asymptotically and establish a method to determine this power . Finally, we define the resource for generating and implementing the control policies in terms of the scaling of the space and time cost with .

iii.1 Robustness test

The robustness of AQEM policies is determined by testing the policies in the presence of noise whose model is not recognized by the policies and the method that generates the policies, although reinforcement-learning policies are learned in trainings that include the noise. Here we define the test for adaptive phase estimation, including phase noise from §II.3. We specify the domain of  for simulating the phase estimation schemes to obtain  in noisy conditions. The noise parameters are variance  and skewness II.3), but here we fix  for the asymptotic distributions, and we obtain the robustness-test threshold in terms of , which is the maximum for each noise model such that the SQL is violated.

Varying .

To ascertain the asymptotic value for , we simulate adaptive phase estimation for


as  computed from this domain is sufficient to show power-law relationship at high . Furthermore, increasing  further requires changing double-precision arithmetic to quadruple-precision arithmetic to generate and manipulate the sine state without rounding error. Consequently, this increase in precision leads to a fifteen-fold increase in run-time at , which is a large expense for generating a single data point. Therefore, we do not attempt to verify the robustness beyond this 100 particles.


We fix skewness  to a single value for all runs and only vary  because  is the dominant term in our noise models and  has a small effect Palittapongarnpim et al. (2017b). We fix the skewness for the asymmetric distribution to


which is sufficiently large to distinguish between the various noise models; otherwise all noise looks Gaussian. This value of  (45) corresponds to for the skew-normal distribution where we are able to observe its effect on  when compared to symmetric noise distributions. This same level of skewness corresponds to in the log-normal distribution.

Robustness threshold.

Our policy is robust if the SQL-breaking condition  is satisfied for all four noise models in §II.3. As discussed in ¶III.1, we fix , and we ignore higher cumulants; thus the policy robustness threshold is in terms of , i.e., the maximum  such that holds for all four noise models. This optimization problem is hard so we adopt a simpler characterization procedure instead to get insight into the robustness threshold. Our approach is to run the simulations for  for symmetric noise and  for asymmetric noise, and we do not push beyond to keep below an imprecision width of . We use these data to determine whether AQEM policies pass the robustness test.

iii.2 Determining asymptotic power-law scaling

To ascertain the robustness of AQEM policies, the asymptotic  is estimated from a subset of at sufficiently high , and determining this subset is done by fitting piecewise linear equations to a log-log plot of vs . In this subsection, we introduce five piecewise functions that are constructed from observations regarding the trend of vs . We then explain the method of finding the break points between segments in the piecewise function and fitting the functions to the data. Using the criteria §II.4, we create a majority-vote method for selecting the function that best represents the data and thus  from the last segment of the fit is used to estimate the asymptotic scaling.

iii.2.1 Piecewise models.

The trend in vs  differs under noisy conditions, and here we describe the trends we have observed that lead to piecewise linear functions. We construct five such functions, containing 1 to 3 segments that are then fitted to vs .

When the interferometer is noiseless, the relationship appears to be a power-law captured in a linear equation, although the accept-reject criterion in the reinforcement-learning algorithm can lead to a different  for . Once the noise level becomes high, typically , the relationship does not appear to be linear for low

. Therefore, we include segmented models that fits linear interpolation to the data in the first segment.

Combining these observations, we construct five piecewise-linear models that can potentially represent as a piecewise function of . The models have 1 to 3 segments, each segment connected at the break points determined by the fitting methods. Three of the models are one, two, and three linear models, whereas the other two are a two-segment model, where the first segment (low ) is a linear interpolation, and a three-segment model where the first segment is a linear interpolation and the second and third segments are linear.

iii.2.2 Fitting method

The method for fitting the linear equations that we use is the least-squares method Montgomery et al. (2012b). However, because the functions in §III.2.1 are segmented, we include a step to optimize the break points depending on the specific function.

The full model for the regression analysis is the three-segment linear function, which is fitted using linear-square method and the segments determined by a heuristic global optimization algorithm Jekel (2017). The two-segment linear function is also fitted using the same least-square method although a brute-force search is used to find the break point starting from .

The method for finding the break points for models with interpolation are different as the linear interpolation leads to a small residual. Thus, optimizing using the least-squares method can lead to a single segment of linear interpolation. For this reason, we first find the stop point for the first segment by fixing the latter segments to a single linear line and search for break point that results in a large decrease in sum square error. As for the single-segment linear model, we use a standard library to fit to the data.

iii.2.3 Fitting figures of merit and model selection

After the functions are fitted to , the criteria , , the -value, and Mallows’s II.4.2) are calculated for each of the function and the fits are visually inspected. These criteria are used to select the function that best fits the data. Here we explain how the best function is chosen.

After the functions are fitted and the criteria are calculated, each fit is visually presented and inspected to ascertain that the the segmentation fits the pattern. If correction are unnecessary, the functions are then ranked for each of the criteria. Note that we do not perform the full -test as we discovered that reduced models typically fail the test even though there is no discernible difference when compared to the full model. However, the -value can still be used to quantify the difference between using the full and reduced model, so we use the -value to rank the functions instead of conducting a pass-fail test.

After the functions are ranked, the function that is voted as best by most of the criteria is chosen to represent the data. In the case where the full model, the three-segment linear function, is voted according to the criteria, the value of  from the last segment and the subset of where this value is computed is compared to the next alternative function to determine whether the function overfits the data. If  from the two functions differs more than 0.001, then the full model is chosen; otherwise, the alternative function is chosen. The limit we use here is specified based on the precision used in this paper and can be changed based on the desired precision of .

iii.3 Resource complexity

To compare and select between policies and methods of generating policies, we determine the complexity of designing and implementing a policy using the loop-analysis method in algorithm analysis. We begin this subsection by explaining the time complexities of generating policies. We then explain what the controller does and the requisite resources, quantified by the space and time complexity for executing a policy with  particles.

iii.3.1 Design complexity

When an optimization or a learning algorithm is used to generate a policy, there is a time cost associated with the use of the algorithm. The scaling of the upper bound of this cost is called the design complexity. Here we explain the assumptions behind the calculation of this complexity.

We assume that this task is performed on a simulation of the AQEM task, as is common practice in policy generation in quantum control Khaneja et al. (2005), and, therefore, the time cost includes the cost of simulating the AQEM task. We assume that only a single processor is used for the purpose of comparing policies generation methods, although this cost can be reduced by parallelizing the learning or the optimization on multiple CPUs.

The design complexity for the reinforcement-learning policies in §II.2.2 is shown to be through loop analysis Lovett et al. (2013), and this complexity does not change when noise is included. For policies that are devised through analytical optimization, such as the Bayesian feedback, this cost is zero as no algorithm is used.

iii.3.2 Controller complexities

In this subsection, we explain the resource complexity required to use AQEM policies, quantified by the scaling of the space and time costs with the number of particles  Vrajitoru and Knight (2014a). We begin by explaining the connection between an AQEM policy and an algorithm by viewing the controller as a computer, allowing us to use the method of algorithm analysis to calculate the complexities Vrajitoru and Knight (2014b). We then define the space and time cost for implementing policies and how these costs are calculated.


The controller holds a policy  (15) and uses this policy to execute decisions based on feedback from measurement of where the particles are detected. To execute this task the controller requires computer memory and sufficient time to affect the policy, which we use to determine the scaling of resource cost.

The controller is essentially a computer that receives input from detectors and transmits a control signal to an actuator that shifts the interferometric phase. In this perspective, the policy  is represented as a computer algorithm expressed as a computer program. The candidate policy can be generated by various means including Bayesian feedback ¶II.2.2 and reinforcement learning ¶II.2.2. Space and time costs are discussed in the next two subsubsections.

Space complexity

We determine the upper bound for space cost, which is the worst-case amount of memory used by an algorithm reported as a big O function of the size of the problem Vrajitoru and Knight (2014a). As  is executed by an algorithm, this worst-case, or maximum-size, memory corresponds to how much space is required to hold the critical information required to execute the feedback.

For , the computer’s space cost for memory depends on the type of policy. For Bayesian feedback, the size of the stored policy is , as shown in §IV.3 and specifically in Table 2, and the size of the stored policy for reinforcement learning is  Lovett et al. (2013). This linear scaling for policies obtained by reinforcement learning is due to the generalized-logarithmic-search heuristic, leading to the size  phase-adjustment vector (32). The information used by the policy are the policies parameters, where there are for an AQEM scheme that uses particles.

Time complexity

Time complexity is the scaling for the upper bound in time cost for implementing a single shot of AQEM. This cost is calculated by assuming the time a particle takes to pass through an interferometer is constant, mimicking the physical implementation of the control procedure, and use loop analysis, which counts the number of loops that perform operations, which we assume all take the same constant time Vrajitoru and Knight (2014b). For a shot of AQEM task using particles, is computed times, corresponding to each particle passing through the interferometer and being detected. For each particle, there can be loops nested in the computation of according to the policy . The complexity is reported as the scaling of this time cost with .

We determine the implementation cost for reinforcement-learning policies by recognizing that the update of according to Eq. (31) is constant in time. Therefore, the adaptive phase estimation procedure consisted of only one loop over the number of particles, and the implementation complexity is . Bayesian feedback, on the other hand, has nested loops for updating the quantum state, which is in time complexity for every computation of . Hence, the implementation complexity is .

Iv Results

In this section, we report results for the robustness test and the resources based on sampling from simulated adaptive phase estimation. We present and compare  (28) obtained by reinforcement learning discussed in ¶II.2.2 and by Bayesian feedback discussed in ¶II.2.2. Our analysis considers all four types of noise discussed in §II.3. We report values of  (1) from the regression procedure discussed in §III.2 and resource complexities discussed in §III.3 for both the reinforcement-learning policies and the Bayesian feedback.

iv.1 Variance vs number of particles

In this subsection, we present results for  as a function of number  of particles. Specifically, we present plots of  vs  from 4 to 100 particles, which is enough to determine scaling as discussed in §III.3. Both cases of using reinforcement-learning policies and Bayesian feedback are presented as log-log plots and compared to the SQL, with these plots obtained by computing from simulations of noiseless phase estimation using a product state following the notation of Eq. (22). The HL is generated based on the intercept of the SQL data using the scaling of to provide a benchmark.

iv.1.1 Reinforcement learning

Here we present the log-log plots of vs  from adaptive phase estimations using reinforcement-learning policies, as shown in Fig. 5.

(a) normal-distribution noise
(b) random telegraph noise
(c) skew-normal-distribution noise
(d) log-normal-distribution noise
Figure 5: Logarithmic plots of Holevo variance from simulation of adaptive phase estimation. The policies are generated using reinforcement learning implemented in the specified noise condition, namely, (a) normal-distribution noise, (b) random telegraph noise, (c), skew-normal-distribution noise, and (d) log-normal-distribution noise. The plot for the normal-distribution noise also includes the data from the noiseless simulation (brown side-facing triangle) and its linear fit (green solid). The blue circles are data when , the red triangles when , the green squares when , the brown plus when , the brown crosses when , and the purple diamonds when . The lines shown are the piecewise linear fits of the data whose scaling is reported. The solid black line if the HL and the dashed purple line in the SQL generated from noiseless adaptive phase estimation.

Subfigures 5(a–d) present  when normal-distribution noise, random telegraph noise, skew-normal noise, log-normal noise are included, respectively.

Figure 5(a) also includes  from noiseless interferometry. This locus appears as a straight line in the plot, indicating a power-law relationship between  and . As the noise variance  increases, this power-law relationship breaks into two parts, clearly visible in the  vs  plot from . This trend also appears at  as the the model selection procedure III.2.3 selects the two-segment model for this data set. The observation that the power-law relationship fails when noise is included is also evident in Figs. 5(b–d). In these cases, the plots are fit to two- or three-segment linear equations as  appears to have a bump at low  as  increases.

The increase in phase noise  also results in an increase in the intercepts of  power-law lines; however, the rate of change appears to depend on the noise model. The difference can be seen in Fig. 5(a–b), both include symmetric noise distributions but with different spacing of the intercepts. The same observation holds for Fig. 5(c–d), which are from asymmetric distributions. Comparing the four plots shows that the intercepts appear to increase slower for asymmetric distributions than the symmetric distributions, being close to 1 for in the former and for the latter.

iv.1.2 Bayesian feedback

Log-log plots of as a function of , shown in Fig. 10, are computed from simulations of adaptive phase estimation controlled by Bayesian feedback.

(a) normal-distribution noise
(b) random telegraph noise
(c) skew-normal-distribution noise
(d) log-normal-distribution noise
Figure 10: Logarithmic plots of Holevo variance from simulation of adaptive phase estimations that use Bayesian feedback method. The simulation includes from the four noise models, namely, (a) normal-distribution noise, (b) random telegraph noise (c), skew-normal-distribution noise, and (d) log-normal-distribution noise. The plot for the normal-distribution noise also includes the data from the noiseless simulation (brown side-facing triangle) and its linear fit (green solid). The blue circles are data when , the red triangles when , the green squares when , the brown plus when , the brown crosses when , and the purple diamonds when . The lines shown are the piecewise linear fit of the data whose scaling is reported. The solid black line if the HL and the dashed purple line in the SQL generated from noiseless adaptive phase estimation.

Figures 10(a–d) present  in the presence of normal-distribution noise, random telegraph noise, skew-normal noise, and log-normal noise respectively.

Similar to Fig. 5, the trend of  vs  in Fig. 10 shows that the power-law relationship also breaks into parts. Instead of a bump, from Bayesian feedback exhibits noise for low . For this reason, the model-selection procedure III.2.3 favours model with linear interpolation in the first segment. The subsequent segment appears straight in the log-log plots, although some, such as the  in Fig. 10(c), shows a break into two linear segments.

The intercepts of the vs plots increase with the increase of , and the observation of the changes are similar to when reinforcement-learning policies are used (§IV.1.1). The asymmetric noise show slow increase in intercept when compared to symmetric noise, and the rate of change depends on the noise model.

iv.2 Power-law scaling

In this subsection, we present values of , summarized in Table. 1, that are estimated by fitting  plots in §IV.1.

SQL 1 1
HL 2 2
No noise 1.459 0.9998 1.957 0.9993
1 0 0.9999 0.9985
Normal 2 0 0.9999
3 0 0.9992 0.9997
4 0 0.9948
1 0 0.9999 0.9991
Random telegraph 2 0 0.9997 0.9967
3 0 0.9993 0.9892
1 0.8509 0.9999
Skew-normal 3 0.8509 0.9999 0.9987
5 0.8509 0.9998 0.9927
7 0.8509 0.9996 0.9964
1 0.8509 0.9999
Log-normal 3 0.8509 0.9998 0.9919
5 0.8509 0.9997 0.9961
7 0.8509 0.9994 0.7965
Table 1: Power-law scaling from adaptive phase estimation in noisy condition using reinforcement-learning policies and Bayesian feedback .

These ’s are from the last segment of the selected piecewise linear models (§III.2.1), which changes with the increase in . We also include (43) to show the goodness of fit.

The power-law scaling for reinforcement-learning policy shows a decrease as the noise level increases, starting from the noiseless phase estimation at =1.459. The reinforcement-learning policies fail to deliver when for the symmetric noise distributions. This limit increases with asymmetric noise models to in log-normal noise. The skew-normal noise only shows a scaling at approaches the SQL but does not breach it at all.

Similar trends are observed for the Bayesian feedback. The scaling from noiseless interferometer closely approximates the HL at and approaches SQL when for normal-distribution noise. This limit drops to when random telegraph noise is included. This limit also appears at for log-normal noise, whereas the same noise-level only lead to approaching SQL when skew-normal noise is present. These trend, aside from the case of normal-distributed noise, is the same as the trend for the reinforcement-learning policies.

The goodness-of-fit for these fits are reported in term of , where indicates a perfect fit. The values of the goodness for delivered by reinforcement-learning policies and for Bayesian feedback except for when a log-normal noise of is present. Overall, the models chosen using the method in §III.2.3 provide good fits to the data and the reinforcement-learning policies always deliver fits with .

iv.3 Bounds on time and space costs

The result from the calculation of space and time complexities for both generating and implementing the reinforcement-learning policies and the Bayesian feedback is shown in Table 2.

Complexity RL BF
Design time
Policy space
Implementation time
Table 2: Upper bound in policy space and time cost of the policy from reinforcement-learning algorithm (RL) and the Bayesian feedback (BF).

Here we compare how these results.

The time complexity for generating policies, called here the design time, is of high polynomial when reinforcement-learning algorithm is used. Bayesian feedback, which is designed through an analytical process, incurs no time cost for the design . When the implementation time is compared, the time complexity of Bayesian feedback is two order above the reinforcement-learning policies. The space complexity, which shows the memory used to store feedback information, is also larger for Bayesian feedback than are reinforcement-learning policies by an polynomial degree, specifically, going from linear scaling in the reinforcement-learning policy to quadratic for Bayesian feedback.

V Discussion

In this section, we discuss the robustness and its threshold for AQEM policies and possible reasons behind the high level of noise-resistant. We explain the difference between the attained by reinforcement-learning policies and Bayesian feedback using the space complexity of the policies. We opt for policies and methods for generating policies based on the resource complexity of the generated policies.

v.1 Robustness of AQEM policies

In this subsection, we discuss robustness for the AQEM policies and the robustness threshold based on  in Table 1. Here we discuss the robustness threshold and propose explanations for this high threshold.

Both the reinforcement-learning policies and the Bayesian feedback are able to deliver for all four noise models until when the scaling obtained in the present of random telegraph noise fails to exceed the SQL. For noise models that are asymmetric, quantum-enhanced precision is observed up to at the skewness of . This high level of robustness is unexpected, and especially so for the Bayesian feedback as the dynamic of the interferometer no longer matches the noiseless assumption.

One possible reason behind this high robustness threshold is the sine state (II.2.1), which is already known to be robust against loss Hentschel and Sanders (2011b). The structure of the sine state may also contribute to robustness against phase noise as well, although AQEM policies also play a role in the robustness of the AQEM scheme.

The effect of the feedback policy is highlighted by the threshold for normal-distribution noise, where the threshold for reinforcement-learning policies is at as opposed to the Bayesian feedback at . This result, however, does not imply that robustness against noise of unknown distribution is improved as the thresholds for all other noise models are the same for both the reinforcement-learning policies and the Bayesian feedback.

v.2 Space cost and power-law scalings

Table 1 shows that the power-law scaling delivered by Bayesian feedback is consistently superior to those delivered by the reinforcement-learning policies before the robustness threshold is reached. Here we use the space complexity of the policies in Table 2 to explain the reason behind this difference.

Table 2 shows that the Bayesian feedback has a space cost that scales a polynomial degree higher than for the reinforcement-learning policies, which indicates that the Bayesian feedback utilizes more information and hence is more complex than the reinforcement-learning policies. By using a quantum-state model, Bayesian feedback effectively uses the history of measurement outcomes to determine instead of the current outcome , which is the approach used by the reinforcement-learning policies in (31). As such, reinforcement-learning policies are restricted by the generalized-logarithm-search strategy (§II.2.2) and so cannot deliver a value of  that approaches the HL. Improvement of reinforcement-learning policies can be done by changing the update rule so that the policy uses a part of the measurement history.

v.3 Choosing an AQEM policy

In this subsection, we explain how the space and implementation time complexity (§III.3.2) can be used to decide between competing policies and method of generating the policies. In particular, we discuss choosing between the reinforcement-learning policies and the Bayesian feedback.

The consideration of the space and time complexity of the policies comes after ascertaining that the candidate policies are able to deliver the target performance. In the case of AQEM, the target is to attain , which both the reinforcement-learning policies and the Bayesian feedback are able to deliver. Both methods also have the same robustness threshold against phase noise of unknown distribution. Based on these comparison, both policies appears equally suitable.

When comparing space and implementation complexity (§IV.3), reinforcement-learning policies shows an advantage as the scaling of both costs are linearly bound, whereas the Bayesian feedback is quadratic in space complexity and cubic in time complexity. For this reason, we favour the reinforcement-learning policies for robust adaptive phase estimation.

We do not consider the design complexity, which is used in the generation of the policies, as the scaling of the cost can be improved by parallelizing the training. This cost may be of interest when the learning occurs in a physical setup where parallelizing is not possible and one shot of the experiment is expensive. In this case, there maybe an upper bound in number of experiments and hence time that can be invested in training a policy, and the design complexity can be used to infer the method that generate a policy within this bound.

Vi Conclusion

We have tested AQEM policy robustness based on Bayesian feedback and reinforcement learning and compared the resource complexities for implementing the policies. We find that both the reinforcement-learning policies and Bayesian feedback are robust against phase noise up to noise level of . Although the imprecision scaling delivered by Bayesian feedback is superior to the policies from reinforcement learning, the latter policies use less resource than does Bayesian feedback with respect to both space and time complexity.

Although we develop a robustness test and method of comparing policies based on resource complexity with AQEM procedures in mind, these ideas can be applied to other cases in QEM and quantum control. Robustness is a desirable property of any QEM schemes and the test presented here can be adapted to quantify robustness of non-adaptive procedures and investigate the role of the input state to the robustness of QEM generally. Quantifying the resource used by control policies can be used to show efficacy of the policies not only in AQEM but in other quantum control tasks, and the comparison of the resource complexity can be used to select a policy that is most efficient in accomplishing a task.

The computational work is enabled by the support of WestGrid ( through Compute Canada Calcul Canada ( and by NSERC and AITF.


  • Giovannetti et al. (2004) V. Giovannetti, S. Lloyd,  and L. Maccone, Science 306, 1330 (2004).
  • Giovannetti et al. (2011) V. Giovannetti, S. Lloyd,  and L. Maccone, Nat. Photon. 5, 222 (2011).
  • Braunstein et al. (1996) S. L. Braunstein, C. M. Caves,  and G. J. Milburn, Ann. Phys. (N. Y.) 247, 135 (1996).
  • Tóth and Apellaniz (2014) G. Tóth and I. Apellaniz, J. Phys. A: Math. Theor. 47, 424006 (2014).
  • Vrajitoru and Knight (2014a) D. Vrajitoru and W. Knight, “Introduction,” in Practical Analysis of Algorithms (Springer, Cham, Switzerland, 2014) Chap. 1, pp. 1–7.
  • Hollenhorst (1979) J. N. Hollenhorst, Phys. Rev. D 19, 1669 (1979).
  • Caves et al. (1980) C. M. Caves, K. S. Thorne, R. W. P. Drever, V. D. Sandberg,  and M. Zimmermann, Rev. Mod. Phys. 52, 341 (1980).
  • Caves (1981) C. M. Caves, Phys. Rev. D 23, 1693 (1981).
  • Bollinger et al. (1996) J. J. . Bollinger, W. M. Itano, D. J. Wineland,  and D. J. Heinzen, Phys. Rev. A 54, R4649 (1996).
  • Borregaard and Sørensen (2013) J. Borregaard and A. S. Sørensen, Phys. Rev. Lett. 111, 090801 (2013).
  • Danilin et al. (2018) S. Danilin, A. V. Lebedev, A. Vepsäläinen, G. B. Lesovik, G. Blatter,  and G. S. Paraoanu, npj Quantum Inf. 4, 29 (2018).
  • Zhang and Duan (2014) Z. Zhang and L. M. Duan, New J. Phys. 16, 103037 (2014).
  • Matthews et al. (2016) J. C. F. Matthews, X.-Q. Zhou, H. Cable, P. J. Shadbolt, D. J. Saunders, G. A. Durkin, G. J. Pryde,  and J. L. O’Brien, Npj Quantum Inf. 2, 16023 (2016).
  • Wiseman (1995) H. M. Wiseman, Phys. Rev. Lett. 75, 4587 (1995).
  • Armen et al. (2002) M. A. Armen, J. K. Au, J. K. Stockton, A. C. Doherty,  and H. Mabuchi, Phys. Rev. Lett. 89, 133602 (2002).
  • Berry and Wiseman (2000) D. W. Berry and H. M. Wiseman, Phys. Rev. Lett. 85, 5098 (2000).
  • Rosolia et al. (2018) U. Rosolia, X. Zhang,  and F. Borrelli, Annual Review of Control, Robotics, and Autonomous Systems 1, 259 (2018).
  • Roy et al. (2015) S. Roy, I. R. Petersen,  and E. H. Huntington, New J. Phys. 17, 063020 (2015).
  • Wiseman and Killip (1997) H. M. Wiseman and R. Killip, Phys. Rev. A 56, 944 (1997).
  • Wiseman and Killip (1998) H. M. Wiseman and R. Killip, Phys. Rev. A 57, 2169 (1998).
  • Hentschel and Sanders (2010) A. Hentschel and B. C. Sanders, Phys. Rev. Lett. 104, 063603 (2010).
  • Lovett et al. (2013) N. B. Lovett, C. Crosnier, M. Perarnau-Llobet,  and B. C. Sanders, Phys. Rev. Lett. 110, 220501 (2013).
  • Palittapongarnpim et al. (2017a) P. Palittapongarnpim, P. Wittek, E. Zahedinejad, S. Vedaie,  and B. C. Sanders, Neurocomputing 268, 116 (2017a).
  • Escher et al. (2011) B. M. Escher, R. L. de Matos Filho,  and L. Davidovich, Nat. Phys. 7, 406 (2011).
  • Demkowicz-Dobrzanski et al. (2012) R. Demkowicz-Dobrzanski, J. Kołodynski,  and M. Guţă, Nat. Commun. 3, 1063 (2012).
  • Berry et al. (2001) D. W. Berry, H. M. Wiseman,  and J. K. Breslin, Phys. Rev. A 63, 053804 (2001).
  • Bobroff (1987) N. Bobroff, Appl. Opt. 26, 2676 (1987).
  • Sinclair et al. (2014) L. C. Sinclair, F. R. Giorgetta, W. C. Swann, E. Baumann, I. Coddington,  and N. R. Newbury, Phys. Rev. A 89, 023805 (2014).
  • Severini (2005a) T. A. Severini, “Central limit theorems,” in Elements of Distribution Theory, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2005) Chap. 12, pp. 365–399.
  • Breitenberger (1963) E. Breitenberger, Biometrika 50, 81 (1963).
  • Newman (1968) D. S. Newman, Ann. Math. Stat. 39, 890 (1968).
  • Azzalini and Capitanio (2014) A. Azzalini and A. Capitanio, in The Skew-Normal and Related Families, Institute of Mathematical Statistics Monograph, edited by D. R. Cox, A. Agresti, B. Hambly, S. Holmes,  and X.-L. Meng (Cambridge University Press, Cambridge, UK, 2014) Chap. 2, pp. 24 – 56.
  • Limpert et al. (2001) E. Limpert, W. A. Stahel,  and M. Abbt, BioScience 51, 341 (2001).
  • Pourahmadi (2007) M. Pourahmadi, Commun. Stat. Theory Methods 36, 1803 (2007).
  • Jouini (2011) W. Jouini, IEEE Signal Process. Lett. 18, 423 (2011).
  • Rezaie and Eidsvik (2014) J. Rezaie and J. Eidsvik, Comput. Stat. Data Anal 75, 1 (2014).
  • Vrajitoru and Knight (2014b) D. Vrajitoru and W. Knight, “Deterministic analysis of algorithms,” in Practical Analysis of Algorithms (Springer, Cham, Switzerland, 2014) Chap. 5, pp. 169–293.
  • Palittpongarnpim et al. (2016) P. Palittpongarnpim, P. Wittek,  and B. C. Sanders, in Proc. SPIE Quantum Communications and Quantum Imaging XIV, Vol. 9980 (SPIE, Bellingham, WA, 2016) pp. 99800H–99800H–11.
  • Helstrom (1969) C. W. Helstrom, J. Stat. Phys. 1, 231 (1969).
  • Humphreys et al. (2013) P. C. Humphreys, M. Barbieri, A. Datta,  and I. A. Walmsley, Phys. Rev. Lett. 111, 070403 (2013).
  • Yue et al. (2014) J.-D. Yue, Y.-R. Zhang,  and H. Fan, Sci. Rep. 4, 5933 (2014).
  • Higgins et al. (2007) B. L. Higgins, D. W. Berry, S. D. Bartlett, H. M. Wiseman,  and G. J. Pryde, Nature 450, 393 (2007).
  • Demkowicz-Dobrzański and Maccone (2014) R. Demkowicz-Dobrzański and L. Maccone, Phys. Rev. Lett. 113, 250801 (2014).
  • Hotta et al. (2005) M. Hotta, T. Karasawa,  and M. Ozawa, Phys. Rev. A 72, 052334 (2005).
  • Sbroscia et al. (2018) M. Sbroscia, I. Gianani, L. Mancino, E. Roccia, Z. Huang, L. Maccone, C. Macchiavello,  and M. Barbieri, Phys. Rev. A 97, 032305 (2018).
  • Landsman (2017) K. Landsman, “Quantum physics on a general hilbert space,” in Foundations of Quantum Theory: From Classical Concepts to Operator Algebras (Springer International, Cham, Switzerland, 2017) Chap. 4, pp. 103–123.
  • Holevo (2012) A. S. Holevo, in Quantum Systems, Channels, Information: A Mathematical Introduction, De Gruyter Studies in Mathematical Physics, Vol. 16 (Walter de Gruyter, Berlin, Germany, 2012) Chap. Quantum evolutions and channels, pp. 103–131.
  • Yurke et al. (1986) B. Yurke, S. L. McCall,  and J. R. Klauder, Phys. Rev. A 33, 4033 (1986).
  • Hayashi (2006) M. Hayashi, “Mathematical formulation of quantum systems,” in Quantum Information: An Introduction (Springer, Berlin, Germany, 2006) pp. 9–25.
  • Wiseman and Milburn (2009) H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge University Press, Cambridge, MA, 2009).
  • James and Nurdin (2015) M. R. James and H. I. Nurdin, in 2015 IEEE Conference on Control Applications (CCA), IEEE (IEEE CCA, Sydney, NSW, 2015) pp. 1–12.
  • Schreppler et al. (2014) S. Schreppler, N. Spethmann, N. Brahms, T. Botter, M. Barrios,  and D. M. Stamper-Kurn, Science 344, 1486 (2014).
  • Kołodyński and Demkowicz-Dobrzański (2013) J. Kołodyński and R. Demkowicz-Dobrzański, New J. Phys. 15, 073043 (2013).
  • Maccone (2013) L. Maccone, Phys. Rev. A 88, 042109 (2013).
  • Kay (1993) S. M. Kay, “Cramer-Rao lower bound,” in Fundamentals of Statistical Signal Processing: Estimation Theory, edited by A. V. Oppenheimer (Prentice-Hall, Upper Saddle River, NJ, 1993) Chap. 3.
  • Braunstein (2005) S. L. Braunstein, Phys. Rev. A 71, 055801 (2005).
  • Wootters (1998) W. K. Wootters, Philos. Trans. Royal Soc. A: Math. Phys. Eng. Sci. 356, 1717 (1998).
  • Jacobs (2014) K. Jacobs, “Metrology,” in Qauntum Measurement Theory and its Applications (Cambridge University Press, Cambridge, UK, 2014) Chap. 6, pp. 303–322.
  • Zwierz et al. (2012) M. Zwierz, C. A. Pérez-Delgado,  and P. Kok, Phys. Rev. A 85, 042112 (2012).
  • Bondurant and Shapiro (1984) R. S. Bondurant and J. H. Shapiro, Phys. Rev. D 30, 2548 (1984).
  • Wiseman et al. (2009) H. M. Wiseman, D. W. Berry, S. D. Bartlett, B. L. Higgins,  and G. J. Pryde, IEEE J. Sel. Top. Quantum Electron. 15, 1661 (2009).
  • Hentschel and Sanders (2011a) A. Hentschel and B. C. Sanders, Phys. Rev. Lett. 107, 233601 (2011a).
  • Armstrong (1966) J. A. Armstrong, J. Opt. Soc. Am. 56, 1024 (1966).
  • Hariharan and Sanders (1996) P. Hariharan and B. C. Sanders, in Progress in Optics, Vol. XXXVI, edited by E. Wolf (Elsevier, Amsterdam, Netherlands, 1996) Chap. 2, pp. 49–128.
  • Hentschel and Sanders (2011b) A. Hentschel and B. C. Sanders, J. Phys. A: Math. Theor. 44, 115301 (2011b).
  • Summy and Pegg (1990) G. Summy and D. Pegg, Opt. Commun. 77, 75 (1990).
  • Mishchenko (2014) M. I. Mishchenko, “Wigner d-functions,” in Electromagnetic Scattering by Particles and Particle Groups: An Introduction (Cambridge University Press, 2014) Chap. Appendix F, pp. 385–389.
  • Sanders and Milburn (1995) B. C. Sanders and G. J. Milburn, Phys. Rev. Lett. 75, 2944 (1995).
  • Riley et al. (2006a) K. F. Riley, M. P. Hobson,  and B. S. J., “Probability,”  (Cambridge University Press, Cambridge, UK, 2006) Chap. 30, pp. 1119–1220, 3rd ed.
  • Leigh (2004) J. R. Leigh, “Mathematical modelling,” in Control Theory (2nd Edition) (Institution of Engineering and Technology, London, UK, 2004) Chap. 6, pp. 61–81, 2nd ed.
  • Wang et al. (2016) S. Wang, J. M. Simkoff, M. Baldea, L. H. Chiang, I. Castillo, R. Bindlish,  and D. B. Stanley, in 11th IFAC Symposium on Dynamics and Control of Process Systems Including Biosystems 2016, Vol. 49 (Elsevier, Amsterdam, Netherlands, 2016) pp. 25–30.
  • Hou and Wang (2013) Z.-S. Hou and Z. Wang, Information Sciences 235, 3 (2013).
  • Sutton and Barto (2017) R. S. Sutton and A. G. Barto, “Introduction,” in Reinforcement Learning: An Introduction

    , Adaptive Computation and Machine Learning (MIT, Massachusetts, 2017) Chap. 1, pp. 3–24, 2nd ed.

  • Sutton et al. (1992) R. S. Sutton, A. G. Barto,  and R. J. Williams, IEEE Control Systems 12, 19 (1992).
  • Degris et al. (2012) T. Degris, P. M. Pilarski,  and R. S. Sutton, in 2012 American Control Conference (ACC), IEEE (IEEE, Montreal, Canada, 2012) pp. 2177–2182.
  • Severini (2005b)

    T. A. Severini, “Moments and cumulants,” in 

    Elements of Distribution Theory, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2005) Chap. 4, pp. 94–131.
  • Kirkup (2012) L. Kirkup, “Data distributions i,” in Data Analysis for Physical Scientists: Featuring Excel® (Cambridge University Press, Cambridge, UK, 2012) Chap. 3, pp. 90–145, 2nd ed.
  • Boncelet (2005) C. Boncelet, in Handbook of Image and Video Processing, edited by A. Bovik (Elsevier, Amsterdam, Netherlands, 2005) pp. 397–409.
  • Riley et al. (2006b) K. F. Riley, M. P. Hobson,  and B. S. J., “Special functions,”  (Cambridge University Press, Cambridge, UK, 2006) Chap. 18, pp. 577–647, 3rd ed.
  • Kish et al. (2015) E. A. Kish, C.-G. Granqvist, A. Dér,  and L. B. Kish, Cogn. Neurodyn. 9, 459 (2015).
  • Kai et al. (1987) S. Kai, S. Higaki, M. Imasaki,  and H. Furukawa, Phys. Rev. A 35, 374 (1987).
  • Montgomery et al. (2012a) D. C. Montgomery, E. A. Peck,  and G. G. Vining, “Introduction,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 1, pp. 1–11, 5th ed.
  • Chatterjee and Simonoff (2013a) S. Chatterjee and J. S. Simonoff, “Model building,” in Handbook of Regression Analysis (John Wiley & Sons, Hoboken, NJ, 2013) Chap. 2, pp. 23–52.
  • Chatterjee and Hadi (2012) S. Chatterjee and A. S. Hadi, “Introduction,” in Regression Analysis by Example, Wiley Series in Probablity and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 1, pp. 1–24, 5th ed.
  • Montgomery et al. (2012b) D. C. Montgomery, E. A. Peck,  and G. G. Vining, “Simple linear regression,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 2, pp. 12–66, 5th ed.
  • Montgomery et al. (2012c) D. C. Montgomery, E. A. Peck,  and G. G. Vining, “Variable selection and model building,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 10, pp. 327–371, 5th ed.
  • Chatterjee and Simonoff (2013b) S. Chatterjee and J. S. Simonoff, “Multiple linear regression,” in Handbook of Regression Analysis (John Wiley & Sons, Hoboken, NJ, 2013) Chap. 1, pp. 3–22.
  • Palittapongarnpim et al. (2017b) P. Palittapongarnpim, P. Wittek,  and B. C. Sanders, in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE (IEEE Systems, Man, and Cybernetics Society, Banff, AB, 2017) pp. 294–299.
  • Jekel (2017) C. Jekel, “Fitting a piecewise linear function to data,” (2017).
  • Khaneja et al. (2005) N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen,  and S. J. Glaser, J. Magn. Reson. 172, 296 (2005).