I Introduction
Quantum-enhanced metrology (QEM) employs a quantum state of particles as a resource to estimate an unknown parameter with the goal of attaining imprecision
(1) 
that asymptotically surpasses the standard quantum limit (SQL) Giovannetti et al. (2004, 2011) but saturates at or below the Heisenberg limit (HL) Braunstein et al. (1996); Tóth and Apellaniz (2014). Our expression (1) differs from usual proportionality expressions in the literature, e.g., by explicitly recognizing the relevance of lower-order terms Tóth and Apellaniz (2014) through the use of big-O notation Vrajitoru and Knight (2014a).
QEM is vital for high-precision applications, such as gravitational-wave detection Hollenhorst (1979); Caves et al. (1980); Caves (1981), atomic clocks Bollinger et al. (1996); Borregaard and Sørensen (2013), and magnetometry Tóth and Apellaniz (2014); Danilin et al. (2018), in systems operating at the limit of their power tolerance. Some schemes consider ideal measurements that typically involve measuring multiple particles simultaneously Zhang and Duan (2014); Matthews et al. (2016), whereas adaptive QEM (AQEM) focuses on single-particle measurements augmented by feedback such that the SQL is beaten and the HL is approached Wiseman (1995); Armen et al. (2002); Berry and Wiseman (2000).
AQEM performance critically depends on the choice of policy Rosolia et al. (2018), which can be obtained by optimizing a known mathematical model Roy et al. (2015); Wiseman and Killip (1997, 1998) or by using reinforcement-learning algorithms Hentschel and Sanders (2010); Lovett et al. (2013); Palittapongarnpim et al. (2017a). Whereas policies from these methods are resistant to known noise models Roy et al. (2015), whether they are robust against unknown noise is a yet-unstudied but critical property of a QEM scheme, as noise can destroy the entanglement advantage and restore the SQL Escher et al. (2011); Demkowicz-Dobrzanski et al. (2012). Our aim is to test the robustness of AQEM policies in the presence of noise with unknown properties.
Our test focuses on adaptive interferometric phase estimation, whose policies have been devised using Bayesian techniques Berry and Wiseman (2000); Berry et al. (2001) and by reinforcement learning Hentschel and Sanders (2010); Lovett et al. (2013); Palittapongarnpim et al. (2017a). The Bayesian technique computes feedback based on a trusted, noiseless quantum model, whereas reinforcement learning devises the AQEM feedback policies by trial and error: it is ignorant of the quantum-dynamical nature of the system but employs heuristics to shrink the search space. Here both methods are applied to adaptive phase estimation including phase noise, which could arise from path-length fluctuation in the interferometer
Bobroff (1987); Sinclair et al. (2014). Typically, noise is assumed to be normal as a result of the central limit theorem
Severini (2005a). Periodicity of phase makes the normal distribution problematic unless the noise is small compared to the width of the phase domain, which we assume here; technically, we would use the wrapped normal distribution Breitenberger (1963). As our aim is to test robustness for unknown noise, we consider three other noise distributions for our test: random telegraph Newman (1968), skew-normal Azzalini and Capitanio (2014) and lognormal Limpert et al. (2001) noise. The random telegraph noise simulates a discrete noise process. Skew-normal and lognormal distributions represent asymmetric noise and serve as distinct generalizations of the normal distribution. Both distributions are used to simulate noise in detectors and electronics Pourahmadi (2007); Jouini (2011); Rezaie and Eidsvik (2014). For AQEM, we seek an efficient procedure that beats the SQL, and we choose the policy that requires the least resource to run. We assess the policy-generating procedure according to the complexity of its time cost Vrajitoru and Knight (2014b), which is evaluated by the scaling of the number of operations with the number of particles. Here we consider two policy-design procedures, namely, a reinforcement-learning algorithm and a method based on Bayesian inference, resulting in one policy designed by each method. To determine which policy is superior, we compare the complexity in space and time cost
Vrajitoru and Knight (2014a). Thus, we are able to assess and compare the costs for generating policies and determine the best policy. Through our analysis, we find that both Bayesian-feedback and reinforcement-learning policies are robust in the face of unknown noise. Specifically, the Bayesian method yields an AQEM scheme that approaches the HL and outperforms reinforcement-learning policies for most noise models. This performance superiority is due to the Bayesian method memorizing the measurement history through complete knowledge of the quantum state. Storing the entire model in the computer yields better scaling but leads to higher space and operational time costs compared to the reinforcement-learning policy.
Disparity between the space costs suggests that the two approaches use different amounts of information to execute the feedback control, and a fair comparison should instead be made when they use the same space cost. In principle, the reinforcement-learning policy can be generalized to attain the HL as long as the design cost, which scales as a high-degree polynomial, is practical.
II Background
In this section, we present essential background knowledge for assessing robustness of adaptive quantum-enhanced phase estimation. We cover four key notions as subsections. In §II.1 we discuss QEM, including AQEM. Subsequently, in §II.2, we address a particular case of AQEM, namely, adaptive phase estimation. As noise is an important consideration in practical AQEM, in §II.3 we discuss noise models that we use to evaluate policy robustness. Finally, in §II.4, we review regression analysis and model-selection techniques that are required for analyzing whether the SQL has been surpassed.
II.1 Quantum-enhanced metrology
In this subsection, we explain QEM strategies, mainly collective nonadaptive Giovannetti et al. (2011) and adaptive measurements Palittapongarnpim et al. (2016), and the lower imprecision bounds attained by using classical and quantum resources. Specifically, we focus on strategies that employ finite input states. Whereas the collective nonadaptive strategy is useful for calculating lower bounds on imprecision, adaptive strategies offer the simpler alternative of individual-particle measurements that can achieve imprecision scaling close to these theoretical bounds. Here we focus on single-parameter estimation; multiparameter QEM Helstrom (1969); Humphreys et al. (2013); Yue et al. (2014) would be the subject of future study.
II.1.1 Nonadaptive and adaptive strategies
Here we describe collective nonadaptive and adaptive QEM strategies, which are two of many types of QEM schemes. Whereas we focus on these two techniques, there are others, such as the sequential technique Higgins et al. (2007); Demkowicz-Dobrzański and Maccone (2014) and the ancilla-assisted techniques Hotta et al. (2005); Sbroscia et al. (2018), which we do not cover.
Collective strategy.
A collective nonadaptive scheme utilizes d-level (typically, d = 2 for standard two-level atoms or two-path interferometry) particles prepared in a collective state
(2) 
which is the space of positive-definite, trace-class, self-adjoint linear operators acting on a tensor product of
copies of a d-dimensional Hilbert space Landsman (2017). The system, upon which metrology is performed, is represented as a quantum channel (a completely-positive trace-preserving map) acting on any such state, with the single unknown parameter of the channel Holevo (2012). In the special case of an isolated system without noise, decoherence or loss, the channel is represented by a unitary transformation Yurke et al. (1986)
(3) 
for a unitary operator acting on .
After the particles exit the system, they are measured; this measurement is described by a positive-operator-valued measure (POVM) Hayashi (2006); Wiseman and Milburn (2009), whose elements are positive-semidefinite operators
(4) 
assuming the set of measurement outcomes
is finite. Each outcome is random with probability
(5) 
The measurement is repeated multiple times to sample the distribution (5) sufficiently well to obtain good estimates, and the parameter is then inferred from these samples.
Bundle and individual-particle measurement.
Instead of collective measurement, we can consider measuring subsets of particles, which we call bundles, and, at the extreme limit of interest here, the individual-measurement case of measuring a single particle at a time. Mathematically, we split the particles into bundles Palittapongarnpim et al. (2016) so the Hilbert space can be expressed as
(6) 
In this case, both the state and the channel act on this tensor-product space. For localized measurements on each bundle, the POVM is
(7) 
with outcomes from this tensor-product POVM being concatenations of strings of d-dimensional digits,
(8) 
where
(9) 
is the outcome measured from the bundle.
In one extreme case, each bundle contains only one particle, so the number of bundles equals the number of particles. The string of outcomes becomes
(10) 
The POVM is
(11) 
which is a tensor product of qudit POVMs.
Adaptive strategy.
The adaptive strategy involves incorporating quantum feedback control James and Nurdin (2015) such that the system operation depends on both the unknown parameter and a control parameter, for some degree of freedom, on each bundle. We assume that incorporating a control preserves the system acting as a channel. Measurement of each bundle leads to an update of the control parameter for the next bundle. The control-parameter update is determined by a policy
(15) 
which uses the string of outcomes and the most recent control parameter to obtain the next control parameter. This procedure continues until reaching the final particle, at which point the estimate of the unknown parameter is
(16) 
Therefore, the adaptive strategy can be used for single-shot measurement, i.e., inferring the parameter from one instance of the measurement procedure.
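The per-particle feedback loop described above can be sketched in code. This is a minimal sketch of the calling convention only; the function names, the policy, and the detection model below are hypothetical placeholders, not the paper's implementation.

```python
def adaptive_run(policy, simulate_detection, n_particles, theta0=0.0):
    """Single-shot adaptive estimation loop (a sketch).

    `policy` maps (latest outcome, current control) -> next control;
    `simulate_detection` returns the random outcome u in {0, 1} for the
    current control. Both are hypothetical placeholders.
    """
    theta = theta0
    outcomes = []
    for _ in range(n_particles):
        u = simulate_detection(theta)  # detect one particle
        outcomes.append(u)
        theta = policy(u, theta)       # update the control parameter
    # after the final particle, the control value serves as the estimate
    return theta, outcomes
```

A trivial policy that nudges the control up or down by 0.1 depending on the outcome illustrates the convention; any real policy would be supplied by one of the design methods discussed below.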
II.1.2 Imprecision limits
Imprecision of the estimate (16) is denoted by (1). Assuming the measurement is optimal and the quantum channel is noiseless Caves et al. (1980); Schreppler et al. (2014), imprecision lower bounds can be calculated for classical and quantum resources. These lower bounds are the SQL and HL, respectively. In a noisy system, which pertains in practice, a QEM scheme is unlikely to saturate the bound. Despite the presence of noise, the SQL is still the bound that must be surpassed, by using quantum resources, to claim QEM. Here we review these limits as benchmarks both for the robustness test and for comparing AQEM policies.
Standard quantum limit.
The SQL is the imprecision lower bound if classical resources are used, which means that the input state (12) is separable, i.e., unentangled Kołodyński and Demkowicz-Dobrzański (2013). The simplest case of such a separable state is a tensor product of independent particles Giovannetti et al. (2004),
(17) 
Passing this state through the quantum channel leads to the output state
(18) 
Measuring this state (18) according to the POVM (13) leads to outcomes governed by the probability distribution,
(19) 
which is independent and identically distributed (iid) over the particles. Calculating the imprecision, such as through the central limit theorem, leads to imprecision scaling as the inverse square root of the particle number because of this iid condition Maccone (2013). The same scaling holds for the imprecision lower bound, which is calculated from (18) using the Cramér-Rao lower bound Kay (1993), and is irrespective of the quantum channel.
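The iid argument can be made explicit in one line: for the sample mean of N iid outcomes, the variance shrinks as 1/N, so the imprecision obeys

```latex
\Delta\tilde{\varphi}\;\propto\;
\sqrt{\operatorname{Var}\!\Bigl(\frac{1}{N}\sum_{k=1}^{N}X_k\Bigr)}
=\sqrt{\frac{\operatorname{Var}(X_1)}{N}}
\in O\!\bigl(N^{-1/2}\bigr),
```

which is the SQL scaling.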
Heisenberg limit.
If quantum resources are employed, e.g., squeezing Braunstein (2005) or entanglement Wootters (1998), the SQL can be surpassed Caves (1981); Tóth and Apellaniz (2014). The lower bound when using quantum resources can be computed from the quantum version of the Cramér-Rao lower bound Helstrom (1969), which depends on the input state and the quantum channel Jacobs (2014). Therefore, unlike the SQL, the HL is specific to the QEM scheme Zwierz et al. (2012). In the case of interferometric phase estimation, the HL is known to scale inversely with the particle number Bondurant and Shapiro (1984), although this limit can only be attained through the use of optimal measurement. As an optimal POVM could be infeasible, adaptive phase-estimation schemes provide an attractive alternative for achieving close to this lower bound Wiseman et al. (2009).
II.2 Adaptive phase estimation
Phase estimation underlies many QEM applications Caves (1981); Yurke et al. (1986); Giovannetti et al. (2004) and thus is widely used for devising quantum-enhanced techniques, including several AQEM schemes Wiseman et al. (2009); Hentschel and Sanders (2010); Roy et al. (2015). Here we explain the interferometric adaptive phase-estimation scheme controlled by Bayesian feedback Berry and Wiseman (2000); Berry et al. (2001) or reinforcement-learning policies Hentschel and Sanders (2011a); Lovett et al. (2013); Palittapongarnpim et al. (2017a), which we compare in terms of robustness and resource consumption for control.
II.2.1 Adaptive interferometric phase-estimation procedure
One method of estimating phase is to use an interferometer, which infers phase shifts from the interference between two or more modes Armstrong (1966). In particular, we use an adaptive phase-estimation scheme based on a Mach-Zehnder interferometer, which has two modes, so we consider the two-mode case. The mathematics of Mach-Zehnder interferometry applies to other forms of SU(2) interferometry, such as Ramsey, Sagnac and Michelson interferometry Yurke et al. (1986); Hariharan and Sanders (1996); Giovannetti et al. (2004). In this subsubsection, we present the input state, adaptive channel, detection, feedback, inference and imprecision.
Input state.
For nonadaptive quantum interferometry with collective measurement, the unitary interferometric transformation is in the Lie group SU(2) with the irrep (Casimir-invariant label) set by the particle number. For adaptive quantum interferometry with individual measurements, the interferometric unitary transformation acts separately on each particle and the two paths. However, the two descriptions converge if the input state is permutationally symmetric; technically, Schur-Weyl duality dictates that the applicable transformation is SU(2) with the symmetric irrep Hentschel and Sanders (2011b).
Notationally, modes are labelled by
(20) 
which conveys which of the two paths, such as input or output port or intra-interferometric path, pertains. Thus, each single-photon basis state refers to the photon being in one of the two paths. The multiphoton basis is the tensor-product state
(21) 
For the Hamming weight, i.e., the sum of bits, of the multiphoton label, the permutationally-symmetric basis is
(22) 
for the total number of particles in a given mode.
The sine state serves as a loss-tolerant symmetric state that minimizes phase-estimation imprecision Summy and Pegg (1990); Wiseman and Killip (1997, 1998), and is expressed as Berry and Wiseman (2000); Berry et al. (2001)
(23) 
for the Wigner function Mishchenko (2014). This state also has the advantage of being robust against photon loss Hentschel and Sanders (2011b), which is a desirable property for practical QEM, and so is used in the adaptive phase-estimation procedures.
Adaptive channel.
The particles in the sine state (II.2.1) are divided into single-particle bundles, each of which passes through the Mach-Zehnder interferometer. For noiseless interferometry, the quantum channel for one photon is
(24) 
for a Pauli matrix Sanders and Milburn (1995). Therefore, the channel is
(25) 
acting on the state space (12).
In a physical implementation of an interferometer, mechanical disturbances, air-pressure changes and thermal fluctuations induce optical-path fluctuations. These effects randomize the phase difference
(26) 
according to a prior distribution. The quantum channel is thus Hentschel and Sanders (2011a)
(27) 
for the state after the photon is measured.
Detection, feedback, and inference.
After a photon passes through the interferometer, it is detected by one of the single-photon detectors positioned outside the output ports. The information about the exit port is given to a controller. The controller then uses this information to compute the next feedback phase from the policy before the next photon arrives. The procedure of simulating the injection of the next photon from the sine state (II.2.1), followed by action of the channel (25) and then measurement at the output ports, is repeated until all photons are consumed, and the estimate is inferred from the final control parameter, assuming no loss of photons.
Imprecision.
Imprecision of the estimate (1) is related to the Holevo variance Wiseman and Killip (1997)
(28) 
by the sharpness function
(29) 
which quantifies the width of a distribution over a periodic variable. The sharpness (29), hence the Holevo variance (28), is estimated by repeatedly simulating the estimation procedure Hentschel and Sanders (2010), with the unknown (noiseless) interferometric phase shift chosen uniformly at random in each run. In the simulation, the unknown phase is the mode (most frequent value) of the unimodal prior distribution. The result (29) is itself sampled from a distribution with a mean
(30) 
Consequently, the Holevo variance (28), and thus the imprecision of the estimate, is only estimated through this procedure.
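As an illustration of this estimation procedure, the sharpness and Holevo variance can be computed from a sample of estimation errors. This is a minimal sketch assuming the standard definitions, sharpness S as the modulus of the mean phasor of the errors and Holevo variance V_H = S^(-2) - 1; the function name is illustrative.

```python
import cmath

def sharpness_and_holevo(errors):
    """Estimate the sharpness S = |<exp(i * error)>| and the Holevo
    variance V_H = S**(-2) - 1 from a list of estimation errors
    (estimate minus true phase, in radians)."""
    s = abs(sum(cmath.exp(1j * e) for e in errors) / len(errors))
    return s, s ** -2 - 1
```

Zero errors give S = 1 and V_H = 0; symmetric errors of plus or minus epsilon give V_H = tan(epsilon)**2, which reduces to the ordinary variance for small epsilon.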
II.2.2 Policy generation
Whether the adaptive estimation scheme is capable of breaking the SQL and reaching the HL depends on the feedback policy; in general, the feedback policy does not come close to the HL although it can surpass the SQL, which is our success criterion. We compare two sets of policies, one designed based on Bayes's theorem Berry and Wiseman (2000) and the other by reinforcement learning Hentschel and Sanders (2010); Palittapongarnpim et al. (2017a). The most significant difference between these two methods is that the Bayesian feedback is computed from a trusted model of how the quantum state evolves, whereas the reinforcement-learning algorithm here uses only the measurement outcomes to decide the feedback value. The model-free approach used in the reinforcement-learning algorithm should, in principle, be robust to changes in the estimation scheme, although whether such policies can perform better than the model-based approach is unknown.
Bayesian feedback.
The idea behind Bayesian feedback is that every measurement outcome improves our knowledge of the value of the unknown phase Berry and Wiseman (2000); Berry et al. (2001). This is mathematically captured in Bayes's theorem Riley et al. (2006a), where the initial knowledge of the phase is represented by a uniform prior, indicating that it can take any value in the phase domain with equal probability.
For each particle exiting the interferometer, the probability of the particle being detected in a given output port can be computed by assuming a perfect input state and known quantum dynamics, such as the unitary evolution (25). The prior for the phase is then updated using Bayes's theorem. The squared width of this prior is quantified by (28) at each measurement step; the optimal control that minimizes this width for the next particle is then computed from the model.
Although the Bayesian method gets close to the HL Berry et al. (2001), the problem with Bayesian inference subject to a trusted model is that the policy might not be robust to variations of the input state and to noise in the system. Such model mismatch is known to be a major concern in designing controllers Leigh (2004); Wang et al. (2016), and one way to address this concern is to adopt a design-and-feedback approach that does not use models but rather the data from the physical setup Hou and Wang (2013).
Reinforcement learning.
Reinforcement learning is used to devise a policy by trial and error Sutton and Barto (2017); Sutton et al. (1992): the learning algorithm iteratively optimizes the policy by testing it and evaluating the outcome of the specified control task. For noisy adaptive phase estimation, policy evaluation is based on the average sharpness (30) Hentschel and Sanders (2011a), which averages out the noise. In practice, due to the high cost of sampling (30), very few runs are performed to estimate it.
Reinforcement learning does not employ a model but rather uses the control outcomes to learn Degris et al. (2012); however, we currently train based on simulations that employ a model, although training could, and ultimately should, be performed under conditions of deployment, such as in the laboratory or in the field. For reinforcement learning, feedback is assumed to follow a logarithmic-search-heuristic Markovian update rule Hentschel and Sanders (2010)
(31) 
with the phaseadjustment vector
(32) 
optimized during the training stage by a scalable, noise-resistant reinforcement-learning algorithm such that the average sharpness (30) is maximized Palittapongarnpim et al. (2017a). This update rule (31) corresponds to turning the “phase knob” up or down by a fixed amount after each photon, subject only to the previous outcome and ignoring the full measurement history.
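A sketch of one step of the Markovian update rule (31): the sign of the increment depends only on the latest outcome. The sign convention and names below are one common choice, labeled here as assumptions rather than the paper's exact form.

```python
import math

def update_phase(phi, u, delta):
    """One Markovian step of the 'phase knob' rule: subtract
    (-1)**u * delta, so outcome u = 0 turns the knob down and u = 1
    turns it up, then wrap onto [0, 2*pi)."""
    return (phi - (-1) ** u * delta) % (2 * math.pi)
```

In training, only the vector of increments (one `delta` per photon) is optimized; the rule itself stays fixed, which is what keeps the policy-search space small.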
Reinforcement learning incurs a time cost to generate a policy. The time cost for generating the policy is quantified by loop analysis of the learning algorithm, assuming the algorithm is run on a single processor Lovett et al. (2013). The scaling of time cost with respect to the particle number determines the complexity; in practice, this scaling is polynomial, so the degree of the polynomial conveys the complexity of generating the policy.
II.3 Models of phase noise
In this subsection, we explain the choices of phase-noise model for the robustness test. This noise is simulated by turning the interferometric phase into a random variable with a unimodal probability distribution peaked at the true phase. The mode is assumed to be the unknown parameter to be estimated. For the test, the phase follows one of these four distributions: normal, three-stage random telegraph, skew-normal or lognormal. We summarize the relationship between the noise parameters and the variance and skewness Severini (2005b), as we use both in selecting the parameters for the robustness test, and the variance in particular to quantify the noise level.
II.3.1 Normal-distribution noise
Normal-distribution noise is important for testing robustness of the learning algorithm because the normal distribution is especially prominent due to the central-limit theorem, which states that the average of many independent random variables is asymptotically normal Severini (2005a). Due to the prevalence of the normal distribution, assuming a normal-distribution noise model is common Kirkup (2012). The normal distribution
(33) 
is parametrized by the mean and standard deviation. As this distribution is symmetric, the skewness is identically zero and thus the mode coincides with the mean. In our simulations, we set the mean to the true phase, so the only free parameter is the standard deviation, which is bounded above, as otherwise the width would exceed the domain of the phase. This upper bound is, of course, too high to expect any measurement scheme to operate with reasonable imprecision, and so the robustness test is conducted at lower noise levels.
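Because phase is periodic, samples of normal noise can be mapped onto the phase circle, following the wrapped-normal convention mentioned in the Introduction. A minimal sampling sketch (names illustrative):

```python
import math
import random

def wrapped_normal_sample(mu, sigma, rng=random):
    """Draw one sample from a normal distribution with mean mu and
    standard deviation sigma, then wrap it onto [-pi, pi), giving a
    wrapped-normal phase sample."""
    return (rng.gauss(mu, sigma) + math.pi) % (2 * math.pi) - math.pi
```

For sigma small compared to the phase domain, wrapping almost never triggers and the sample is effectively normal, which is the regime assumed in the test.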
II.3.2 Random telegraph noise
Random telegraph noise Newman (1968) is a discrete distribution that, at each time step, randomly switches between two values, one being the correct value and the other an erroneous value. Whereas this noise is most relevant to digital electronics, as it simulates a bit-flip error, it can be used to simulate other digitized noise, such as salt-and-pepper noise in image processing Boncelet (2005).
We modify twostage random telegraph noise to have three stages,
(34) 
The distribution is parametrized by the probability of switching to an erroneous value and by the distance between the true and erroneous values, leading to
(35) 
with the last relation following from the symmetry of the distribution.
Unimodality of the distribution bounds the switching probability. Furthermore, the offset is bounded to avoid overlap with another period of the phase domain. To comply with both constraints while still being able to raise the noise level enough that the result can be compared to that of the other distributions, we fix the offset for the test and vary only the switching probability.
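A sampling sketch of the three-stage telegraph noise. Here `p` denotes the switching probability and `delta` the offset, with each erroneous value taken with probability p/2, so that the variance is p * delta**2; these names and the symmetric split are our reading of (34)-(35), not notation from the paper.

```python
import random

def telegraph_sample(phi, p, delta, rng=random):
    """Three-stage random telegraph noise: return the true value phi
    with probability 1 - p, and phi - delta or phi + delta with
    probability p / 2 each (symmetric erroneous values)."""
    r = rng.random()
    if r < p / 2:
        return phi - delta
    if r < p:
        return phi + delta
    return phi
```

Setting p = 0 recovers noiseless operation, which makes this model convenient for sweeping the noise level with a single parameter.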
II.3.3 Skew-normal-distribution noise
The skew-normal distribution Azzalini and Capitanio (2014) is obtained from a normal distribution by multiplying it by a skewing function controlled by a shape parameter. Skew-normal noise is a class of noise that includes normal-distribution noise as a limiting case. Although this distribution is not widely used as a noise model, it arises in simulations of noise for filters and detectors Pourahmadi (2007); Rezaie and Eidsvik (2014).
The skewnormal distribution is
(36) 
for the error function Riley et al. (2006b). Skewness of the distribution is
(37) 
and the variance is
(38) 
The mode, however, does not have a closed form, although it remains close to the location parameter as the shape parameter increases. For the simulation, we assume the mode is at the true phase.
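The skewness (37) has a standard closed form in terms of the shape parameter, which can be evaluated directly; the function below states that textbook formula (the paper's notation is not reproduced here).

```python
import math

def skewnormal_skewness(alpha):
    """Skewness of the skew-normal distribution with shape parameter
    alpha: zero at alpha = 0 (the normal limit) and bounded by about
    0.995 as alpha -> infinity."""
    d = alpha / math.sqrt(1 + alpha ** 2)
    m = d * math.sqrt(2 / math.pi)  # mean of the standardized variate
    return (4 - math.pi) / 2 * m ** 3 / (1 - m ** 2) ** 1.5
```

The bounded range of this function is why the test's fixed skewness value must be chosen within what the skew-normal family can realize.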
II.3.4 Lognormal-distribution noise
Lognormal noise Limpert et al. (2001) has a heavy-tailed skewed distribution that provides another approach to generalizing the normal distribution; it is employed in the study of networks Kish et al. (2015); Kai et al. (1987) and electronics Jouini (2011). In this case, the logarithm of the random variable has a normal distribution, leading to the distribution
(39) 
with mode and variance
(40) 
respectively, and skewness
As this distribution is defined only for positive values, we first generate a random number within the compact phase domain given the distribution parameters and then apply the shift
(41) 
so that the mode of the distribution (40) is centred at the true phase.
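The shift (41) can be sketched as follows, using the textbook mode exp(mu - sigma**2) of the lognormal distribution; the parameter names are illustrative.

```python
import math
import random

def lognormal_phase_noise(phi, mu, sigma, rng=random):
    """Sample lognormal noise with parameters (mu, sigma) and shift it
    so that the mode of the shifted distribution sits at the true
    phase phi."""
    mode = math.exp(mu - sigma ** 2)
    return phi + rng.lognormvariate(mu, sigma) - mode
```

Because the lognormal tail extends only to one side of the mode, this construction yields the asymmetric, heavy-tailed phase noise used in the test.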
II.4 Regression analysis
The imprecision and the particle number are asymptotically related by a power law (1). However, when the system is noisy, this relationship fails at low particle number, with the actual bound on the imprecision depending on the noise model. We employ regression analysis to select the subset of data at high particle number that scales as a power law and to estimate the corresponding power, by building piecewise functions and selecting the best candidate to represent the data. In this subsection, we explain our regression-analysis procedure for fitting a model to a given set of data.
II.4.1 Fitting the model
Regression analysis aims to determine the mathematical relationship between dependent and independent variables, here the imprecision and the particle number, respectively Montgomery et al. (2012a). The process of building this mathematical model begins with selecting a function based on knowledge of the mechanism and observations of the trends Chatterjee and Simonoff (2013a). The function is only a best guess, as the discerned trend could be subjective.
After a function is selected, it is fitted to the data by finding the parameters that minimize the error between the prediction and the data Chatterjee and Hadi (2012). The method we employ is least-squares estimation Montgomery et al. (2012b), used in linear regression to calculate the parameters by constraining the gradients to zero and solving the resulting system of linear equations. We choose this method as we fit linear and piecewise-linear equations to the log-log plot of imprecision versus particle number.
II.4.2 Consistency of the fit
As the fitted function is only an educated guess, the fitting result must be examined for inconsistencies with respect to the model's assumptions Chatterjee and Hadi (2012). An alternative function can then be proposed, fitted, and compared to the previous function in order to find the one that best represents the data. Deciding on the best model from the set is done using statistical criteria that either estimate the goodness of fit to the data or compare two models fitted to the same data Montgomery et al. (2012c). The model that is consistently shown to fit well according to these criteria is then selected to represent the data.
Common linear-regression criteria include

- the coefficient of determination
(42) adjusted to
(43) with the number of data points in the fit;

- the corrected Akaike information criterion (AICc), which quantifies the information lost due to the discrepancy between the model function and the true function, and is 'corrected' to avoid overfitting;

- the F test, which assesses a full model, as the null hypothesis, vs a reduced model (a reduction from the full model) as the alternative hypothesis Chatterjee and Simonoff (2013b, a); and

- Mallows's Cp Chatterjee and Simonoff (2013a); Montgomery et al. (2012c) (but we count parameters slightly differently from the traditional convention), which estimates the mean-square prediction error Montgomery et al. (2012c) to compare a reduced model to the full model, where the reduced model with the smallest Cp close to the number of parameters is chosen.

Each of these criteria is designed to penalize functions with many parameters to avoid overfitting the data Chatterjee and Simonoff (2013a).
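Two of the criteria above are easy to state concretely. The functions below follow the textbook conventions for the adjusted coefficient of determination and a common least-squares form of the AICc; the paper's exact parameter-counting convention is not shown in the text, so treat these as standard-form sketches.

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R**2 for n data points and k model parameters
    (excluding the intercept): penalizes extra parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def aicc(sse, n, k):
    """Corrected AIC for a least-squares fit with sum of squared
    errors `sse`: AIC = n*ln(sse/n) + 2k plus the small-sample
    correction term 2k(k+1)/(n-k-1)."""
    return n * math.log(sse / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
```

Lower AICc is better, and adjusted R**2 decreases when an added parameter does not improve the fit enough, which is the penalizing behavior described above.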
III Approach
In this section, we devise a test to determine whether quantum-enhanced precision is feasible in the presence of unknown noise. We then assess whether power-law scaling of phase imprecision vs particle number is valid asymptotically and establish a method to determine this power. Finally, we define the resources for generating and implementing the control policies in terms of the scaling of the space and time costs with particle number.
III.1 Robustness test
The robustness of AQEM policies is determined by testing the policies in the presence of noise whose model is not recognized by the policies or by the method that generates them, although reinforcement-learning policies are learned in training that includes the noise. Here we define the test for adaptive phase estimation, including phase noise from §II.3. We specify the domain of particle number for simulating the phase-estimation schemes to obtain the imprecision in noisy conditions. The noise parameters are variance and skewness (§II.3), but here we fix the skewness for the asymmetric distributions, and we obtain the robustness-test threshold in terms of the maximum variance for each noise model such that the SQL is violated.
Varying the particle number.
To ascertain the asymptotic value of the power, we simulate adaptive phase estimation for
(44) 
as the power computed from this domain is sufficient to show the power-law relationship at high particle number. Furthermore, increasing the particle number further requires changing double-precision arithmetic to quadruple-precision arithmetic to generate and manipulate the sine state without rounding error. This increase in precision leads to a fifteenfold increase in runtime at the upper end of the domain, which is a large expense for generating a single data point. Therefore, we do not attempt to verify robustness beyond 100 particles.
Skewness.
We fix skewness to a single value for all runs and vary only the variance, because the variance is the dominant term in our noise models and skewness has a small effect Palittapongarnpim et al. (2017b). We fix the skewness for the asymmetric distributions to
(45) 
which is sufficiently large to distinguish between the various noise models; otherwise, all noise looks Gaussian. This value (45) corresponds to a shape parameter of the skew-normal distribution at which we are able to observe its effect on the imprecision when compared to symmetric noise distributions. The same level of skewness fixes the corresponding parameter of the lognormal distribution.
Robustness threshold.
Our policy is robust if the SQL-breaking condition is satisfied for all four noise models in §II.3. As discussed in §III.1, we fix the skewness and ignore higher cumulants; thus the policy-robustness threshold is expressed in terms of the maximum variance such that SQL-beating holds for all four noise models. This optimization problem is hard, so we adopt a simpler characterization procedure instead to gain insight into the robustness threshold. Our approach is to run the simulations over a range of variances for symmetric and asymmetric noise, and we do not push the variance higher in order to keep the imprecision width bounded. We use these data to determine whether AQEM policies pass the robustness test.
III.2 Determining asymptotic power-law scaling
To ascertain the robustness of AQEM policies, the asymptotic power is estimated from a subset of the data at sufficiently high particle number, and determining this subset is done by fitting piecewise-linear equations to a log-log plot of imprecision vs particle number. In this subsection, we introduce five piecewise functions that are constructed from observations regarding the trend of imprecision vs particle number. We then explain the method of finding the break points between segments in the piecewise function and fitting the functions to the data. Using the criteria of §II.4, we create a majority-vote method for selecting the function that best represents the data; the slope of the last segment of that fit is used to estimate the asymptotic scaling power.
III.2.1 Piecewise models.
The trend in vs differs under noisy conditions, and here we describe the trends we have observed, which lead to piecewise linear functions. We construct five such functions, containing 1 to 3 segments, that are then fitted to vs .
When the interferometer is noiseless, the relationship appears to be a power law captured by a linear equation, although the accept-reject criterion in the reinforcement-learning algorithm can lead to a different for . Once the noise level becomes high, typically , the relationship does not appear to be linear at low . Therefore, we include segmented models that fit a linear interpolation to the data in the first segment.
Combining these observations, we construct five piecewise-linear models that can potentially represent as a piecewise function of . Three of the models consist of one, two, and three linear segments, respectively. The other two are a two-segment model whose first segment (low ) is a linear interpolation, and a three-segment model whose first segment is a linear interpolation and whose second and third segments are linear. The segments are connected at break points determined by the fitting method.
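As a minimal illustration of the model family (the function name and parameters are ours, not the paper's), a continuous two-segment linear function can be evaluated as follows; the second segment is anchored at the value of the first segment at the break point so the curve has no jump:

```python
import numpy as np

def two_segment_linear(x, x_break, intercept, slope1, slope2):
    """Continuous two-segment linear function: the second segment
    starts where the first one ends, so the curve has no jump."""
    x = np.asarray(x, dtype=float)
    y1 = intercept + slope1 * x
    # value at the break point anchors the second segment
    y_break = intercept + slope1 * x_break
    y2 = y_break + slope2 * (x - x_break)
    return np.where(x <= x_break, y1, y2)
```

The three-segment variant follows the same pattern, with each additional segment anchored at the previous break point.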
III.2.2 Fitting method
We fit the linear equations using the least-squares method Montgomery et al. (2012b). However, because the functions in §III.2.1 are segmented, we include a step that optimizes the break points for each specific function.
The full model for the regression analysis is the three-segment linear function, which is fitted using the least-squares method with the break points determined by a heuristic global-optimization algorithm Jekel (2017). The two-segment linear function is also fitted by least squares, although a brute-force search starting from is used to find the break point.
The method for finding the break points of models with an interpolated first segment is different, as the linear interpolation leads to a small residual; optimizing by the least-squares method alone can thus collapse the fit to a single segment of linear interpolation. For this reason, we first find the end point of the first segment by fixing the latter segments to a single line and searching for the break point that yields a large decrease in the sum of squared errors. For the single-segment linear model, we use a standard library to fit the line to the data.
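The brute-force break-point search for the two-segment model can be sketched as below. The function name and the minimum-points guard are our assumptions, and this sketch fits the two segments independently rather than enforcing continuity at the break point:

```python
import numpy as np

def fit_two_segment(x, y, min_pts=3):
    """Brute-force search over candidate break indices for a
    two-segment least-squares fit; returns (break index, total SSE)."""
    best = (None, np.inf)
    for k in range(min_pts, len(x) - min_pts):
        sse = 0.0
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            # ordinary least squares on [x, 1] for each segment
            A = np.vstack([xs, np.ones_like(xs)]).T
            coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
            sse += np.sum((A @ coef - ys) ** 2)
        if sse < best[1]:
            best = (k, sse)
    return best
```

For data with a sharp kink and little noise, the search recovers the break near the true kink; with real data, the heuristic global optimizer mentioned above plays the same role for the three-segment model.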
III.2.3 Fitting figures of merit and model selection
After the functions are fitted to , the criteria , , the value, and Mallows's (§II.4.2) are calculated for each function, and the fits are visually inspected. These criteria are used to select the function that best fits the data; here we explain how the best function is chosen.
After the functions are fitted and the criteria calculated, each fit is visually inspected to ascertain that the segmentation matches the pattern in the data. If corrections are unnecessary, the functions are then ranked on each of the criteria. Note that we do not perform the full test, as we discovered that reduced models typically fail the test even when there is no discernible difference from the full model. However, the value can still be used to quantify the difference between the full and reduced models, so we rank the functions by the value instead of conducting a pass-fail test.
After the functions are ranked, the function voted best by the most criteria is chosen to represent the data. In the case where the full model, the three-segment linear function, is voted best according to the criteria, the value of from its last segment, and the subset of over which that value is computed, are compared to those of the next-best function to determine whether the full model overfits the data. If from the two functions differs by more than 0.001, the full model is chosen; otherwise, the alternative function is chosen. This limit is based on the precision used in this paper and can be changed to suit the desired precision of .
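The majority-vote step can be sketched as follows. This toy version assumes every criterion has been normalized so that higher is better, which glosses over the differing conventions of the actual criteria (e.g., Mallows's statistic is judged by closeness to the number of parameters); the function name and data layout are ours:

```python
from collections import Counter

def majority_vote(fits):
    """fits maps model name -> {criterion: score}; every criterion
    is assumed normalized so higher is better. The model that wins
    the most criteria is selected."""
    votes = Counter()
    criteria = next(iter(fits.values())).keys()
    for c in criteria:
        # each criterion casts one vote for the model it ranks first
        winner = max(fits, key=lambda m: fits[m][c])
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```

The overfitting check described above would then compare the winner against the runner-up before the final choice is made.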
III.3 Resource complexity
To compare and select between policies and methods of generating policies, we determine the complexity of designing and implementing a policy using the loop-analysis method from algorithm analysis. We begin this subsection by explaining the time complexity of generating policies. We then explain what the controller does and the requisite resources, quantified by the space and time complexity of executing a policy with particles.
III.3.1 Design complexity
When an optimization or a learning algorithm is used to generate a policy, there is a time cost associated with the use of the algorithm. The scaling of the upper bound of this cost is called the design complexity. Here we explain the assumptions behind the calculation of this complexity.
We assume that this task is performed on a simulation of the AQEM task, as is common practice in policy generation in quantum control Khaneja et al. (2005); therefore, the time cost includes the cost of simulating the AQEM task. For the purpose of comparing policy-generation methods, we assume that only a single processor is used, although this cost can be reduced by parallelizing the learning or optimization over multiple CPUs.
The design complexity of the reinforcement-learning policies in §II.2.2 is shown by loop analysis to be Lovett et al. (2013), and this complexity does not change when noise is included. For policies devised through analytical optimization, such as Bayesian feedback, this cost is zero, as no algorithm is run.
III.3.2 Controller complexities
In this subsection, we explain the resource complexity required to use AQEM policies, quantified by the scaling of the space and time costs with the number of particles Vrajitoru and Knight (2014a). We begin by explaining the connection between an AQEM policy and an algorithm by viewing the controller as a computer, which allows us to use algorithm analysis to calculate the complexities Vrajitoru and Knight (2014b). We then define the space and time costs of implementing policies and explain how these costs are calculated.
Controller
The controller holds a policy (15) and uses it to execute decisions based on feedback from measurements of where the particles are detected. To execute this task, the controller requires computer memory and sufficient time to effect the policy, which we use to determine the scaling of the resource cost.
The controller is essentially a computer that receives input from detectors and transmits a control signal to an actuator that shifts the interferometric phase. In this perspective, the policy is represented as a computer algorithm expressed as a computer program. The candidate policy can be generated by various means including Bayesian feedback ¶II.2.2 and reinforcement learning ¶II.2.2. Space and time costs are discussed in the next two subsubsections.
Space complexity
We determine the upper bound on the space cost, which is the worst-case amount of memory used by an algorithm, reported as a big-O function of the problem size Vrajitoru and Knight (2014a). As the policy is executed by an algorithm, this worst-case, or maximum-size, memory corresponds to the space required to hold the critical information needed to execute the feedback.
For , the computer’s space cost for memory depends on the type of policy. For Bayesian feedback, the size of the stored policy is , as shown in §IV.3 and specifically in Table 2, and the size of the stored policy for reinforcement learning is Lovett et al. (2013). The linear scaling for policies obtained by reinforcement learning is due to the generalized-logarithmic-search heuristic, which leads to the phase-adjustment vector (32) of size . The information used by the policy comprises the policy parameters, of which there are for an AQEM scheme that uses particles.
Time complexity
Time complexity is the scaling of the upper bound on the time cost of implementing a single shot of AQEM. We calculate this cost by assuming that the time a particle takes to pass through the interferometer is constant, mimicking the physical implementation of the control procedure, and by using loop analysis, which counts the number of loops performing operations, all of which we assume take the same constant time Vrajitoru and Knight (2014b). For one shot of the AQEM task using particles, is computed times, corresponding to each particle passing through the interferometer and being detected. For each particle, there can be loops nested within the computation of according to the policy . The complexity is reported as the scaling of this time cost with .
We determine the implementation cost of the reinforcement-learning policies by recognizing that the update of according to Eq. (31) is constant in time. Therefore, the adaptive phase-estimation procedure consists of only one loop over the number of particles, and the implementation complexity is . Bayesian feedback, on the other hand, has nested loops for updating the quantum state, which is in time complexity for every computation of . Hence, the implementation complexity is .
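The loop analysis above can be made concrete with toy operation counters; the function names are ours, and the triple nesting for Bayesian feedback encodes the text's assumption that each per-particle state update itself costs two nested loops over the particle number:

```python
def rl_ops(N):
    """One constant-time phase update per particle -> O(N)."""
    ops = 0
    for _ in range(N):      # each particle traverses the interferometer
        ops += 1            # constant-time policy lookup and update
    return ops

def bayes_ops(N):
    """A state update costing ~N^2 per particle -> O(N^3) per shot."""
    ops = 0
    for _ in range(N):          # loop over particles
        for _ in range(N):      # nested loops of the state update
            for _ in range(N):
                ops += 1
    return ops
```

Doubling the particle number doubles the reinforcement-learning cost but multiplies the Bayesian-feedback cost eightfold, which is the two-polynomial-order gap reported in Table 2.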
IV Results
In this section, we report results of the robustness test and the resource analysis, based on sampling from simulated adaptive phase estimation. We present and compare (28) obtained by reinforcement learning, discussed in ¶II.2.2, and by Bayesian feedback, discussed in ¶II.2.2. Our analysis considers all four types of noise discussed in §II.3. We report values of (1) from the regression procedure of §III.2 and the resource complexities of §III.3 for both the reinforcement-learning policies and Bayesian feedback.
IV.1 Variance vs number of particles
In this subsection, we present results for as a function of the number of particles. Specifically, we present plots of vs from 4 to 100 particles, which is enough to determine the scaling, as discussed in §III.3. Results for both the reinforcement-learning policies and Bayesian feedback are presented as log-log plots and compared to the SQL, which is obtained by computing from simulations of noiseless phase estimation using a product state following the notation of Eq. (22). The HL benchmark is generated from the intercept of the SQL data using the scaling of .
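A sketch of how the two benchmark loci might be drawn on the log-log plot; the function is ours, and the slopes of -1 and -2 follow from the SQL and HL scaling exponents of 1 and 2 listed in Table 1, assuming the variance falls as a power of the particle number:

```python
import numpy as np

def benchmark_lines(logN, sql_intercept):
    """Benchmark loci on a log-log plot: the SQL line has slope -1
    and the HL line slope -2, both anchored at the intercept
    estimated from the noiseless product-state data."""
    logN = np.asarray(logN, dtype=float)
    sql = sql_intercept - 1.0 * logN
    hl = sql_intercept - 2.0 * logN
    return sql, hl
```

Anchoring both lines at the same intercept makes the HL a benchmark relative to the simulated SQL rather than an absolute bound.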
IV.1.1 Reinforcement learning
Here we present the log-log plots of vs from adaptive phase estimation using the reinforcement-learning policies, as shown in Fig. 5.
Subfigures 5(a–d) present when normal-distribution noise, random telegraph noise, skew-normal noise, and log-normal noise are included, respectively.
Figure 5(a) also includes from noiseless interferometry. This locus appears as a straight line in the plot, indicating a power-law relationship between and . As the noise variance increases, this power-law relationship breaks into two parts, clearly visible in the vs plot for . The trend also appears at , as the model-selection procedure (§III.2.3) selects the two-segment model for that data set. The failure of the power-law relationship when noise is included is also evident in Figs. 5(b–d). In these cases, the plots are fitted to two- or three-segment linear equations, as develops a bump at low as increases.
The increase in phase noise also increases the intercepts of the power-law lines; however, the rate of change appears to depend on the noise model. The difference can be seen in Figs. 5(a–b), both of which include symmetric noise distributions but with different spacings of the intercepts. The same observation holds for Figs. 5(c–d), which are from asymmetric distributions. Comparing the four plots shows that the intercepts increase more slowly for the asymmetric distributions than for the symmetric distributions, being close to 1 for in the former and for the latter.
IV.1.2 Bayesian feedback
Log-log plots of as a function of , shown in Fig. 10, are computed from simulations of adaptive phase estimation controlled by Bayesian feedback.
Figures 10(a–d) present in the presence of normal-distribution noise, random telegraph noise, skew-normal noise, and log-normal noise, respectively.
As in Fig. 5, the trend of vs in Fig. 10 shows the power-law relationship breaking into parts. Instead of a bump, from Bayesian feedback exhibits noise at low . For this reason, the model-selection procedure (§III.2.3) favours models with a linear interpolation in the first segment. The subsequent segment appears straight in the log-log plots, although some, such as in Fig. 10(c), show a break into two linear segments.
The intercepts of the vs plots increase with , and the changes are similar to those observed when the reinforcement-learning policies are used (§IV.1.1). The asymmetric noise models show a slower increase in the intercepts than the symmetric models, and the rate of change depends on the noise model.
IV.2 Power-law scaling
In this subsection, we present values of , summarized in Table 1, estimated by fitting the plots in §IV.1.
SQL  1  1  
HL  2  2  
No noise  1.459  0.9998  1.957  0.9993  
1  0  0.9999  0.9985  
Normal  2  0  0.9999  –  –  
3  0  0.9992  0.9997  
4  0  –  –  0.9948  
1  0  0.9999  0.9991  
Random telegraph  2  0  0.9997  0.9967  
3  0  0.9993  0.9892  
1  0.8509  0.9999  –  –  
Skewnormal  3  0.8509  0.9999  0.9987  
5  0.8509  0.9998  0.9927  
7  0.8509  0.9996  0.9964  
1  0.8509  0.9999  –  –  
Lognormal  3  0.8509  0.9998  0.9919  
5  0.8509  0.9997  0.9961  
7  0.8509  0.9994  0.7965 
These values of are taken from the last segment of the selected piecewise linear models (§III.2.1), which change with increasing . We also include (43) to show the goodness of fit.
The power-law scaling for the reinforcement-learning policies decreases as the noise level increases, starting from = 1.459 for noiseless phase estimation. The reinforcement-learning policies fail to deliver when for the symmetric noise distributions. This limit increases for the asymmetric noise models, to in log-normal noise. For skew-normal noise, the scaling at approaches the SQL but does not breach it.
Similar trends are observed for Bayesian feedback. The scaling for the noiseless interferometer closely approximates the HL at and approaches the SQL when for normal-distribution noise. This limit drops to when random telegraph noise is included. The limit also appears at for log-normal noise, whereas the same noise level only leads to approaching the SQL when skew-normal noise is present. This trend, aside from the case of normal-distribution noise, is the same as for the reinforcement-learning policies.
The goodness of fit is reported in terms of , where indicates a perfect fit. The goodness values for delivered by the reinforcement-learning policies and for Bayesian feedback are high, except when a log-normal noise of is present. Overall, the models chosen using the method of §III.2.3 provide good fits to the data, and the reinforcement-learning policies always deliver fits with .
IV.3 Bounds on time and space costs
The results of the space- and time-complexity calculations, for both generating and implementing the reinforcement-learning policies and the Bayesian feedback, are shown in Table 2.
Complexity  RL  BF 

Design time  –  
Policy space  
Implementation time 
Here we compare these results.
The time complexity of generating policies, here called the design time, is a high-degree polynomial when the reinforcement-learning algorithm is used. Bayesian feedback, which is designed through an analytical process, incurs no design time cost. Comparing implementation times, the time complexity of Bayesian feedback is two orders above that of the reinforcement-learning policies. The space complexity, which reflects the memory used to store feedback information, is also larger for Bayesian feedback than for the reinforcement-learning policies by one polynomial degree, going from linear scaling for the reinforcement-learning policies to quadratic for Bayesian feedback.
V Discussion
In this section, we discuss the robustness of the AQEM policies, its threshold, and possible reasons behind the high level of noise resistance. We explain the difference between the attained by the reinforcement-learning policies and by Bayesian feedback using the space complexity of the policies. We then choose between policies, and between methods of generating them, based on the resource complexity of the generated policies.
V.1 Robustness of AQEM policies
In this subsection, we discuss the robustness of the AQEM policies and the robustness threshold based on the values of in Table 1, and we propose explanations for the high threshold.
Both the reinforcement-learning policies and Bayesian feedback deliver for all four noise models until , when the scaling obtained in the presence of random telegraph noise fails to exceed the SQL. For the asymmetric noise models, quantum-enhanced precision is observed up to at a skewness of . This high level of robustness is unexpected, especially for Bayesian feedback, as the dynamics of the interferometer no longer match the noiseless assumption.
One possible reason behind this high robustness threshold is the sine state (§II.2.1), which is already known to be robust against loss Hentschel and Sanders (2011b). The structure of the sine state may contribute to robustness against phase noise as well, although the AQEM policies also play a role in the robustness of the scheme.
The effect of the feedback policy is highlighted by the threshold for normal-distribution noise, where the threshold for the reinforcement-learning policies is at , as opposed to at for Bayesian feedback. This result, however, does not imply improved robustness against noise of unknown distribution, as the thresholds for all other noise models are the same for the reinforcement-learning policies and Bayesian feedback.
V.2 Space cost and power-law scalings
Table 1 shows that the power-law scalings delivered by Bayesian feedback are consistently superior to those delivered by the reinforcement-learning policies before the robustness threshold is reached. Here we use the space complexity of the policies in Table 2 to explain this difference.
Table 2 shows that Bayesian feedback has a space cost that scales a polynomial degree higher than that of the reinforcement-learning policies, indicating that Bayesian feedback utilizes more information and hence is more complex. By using a quantum-state model, Bayesian feedback effectively uses the history of measurement outcomes to determine , instead of only the current outcome , which is the approach of the reinforcement-learning policies in (31). As such, the reinforcement-learning policies are restricted by the generalized-logarithmic-search strategy (§II.2.2) and so cannot deliver a value of that approaches the HL. The reinforcement-learning policies could be improved by changing the update rule so that the policy uses part of the measurement history.
V.3 Choosing an AQEM policy
In this subsection, we explain how the space and implementation-time complexities (§III.3.2) can be used to decide between competing policies and between methods of generating them. In particular, we discuss choosing between the reinforcement-learning policies and Bayesian feedback.
The space and time complexities of the policies are considered only after ascertaining that the candidate policies deliver the target performance. In the case of AQEM, the target is to attain , which both the reinforcement-learning policies and Bayesian feedback are able to deliver. Both methods also have the same robustness threshold against phase noise of unknown distribution. Based on these comparisons, the two approaches appear equally suitable.
When the space and implementation complexities are compared (§IV.3), the reinforcement-learning policies show an advantage, as both costs are linearly bounded, whereas Bayesian feedback is quadratic in space complexity and cubic in time complexity. For this reason, we favour the reinforcement-learning policies for robust adaptive phase estimation.
We do not consider the design complexity, which applies to the generation of the policies, as the scaling of this cost can be improved by parallelizing the training. This cost may be of interest when the learning occurs in a physical setup where parallelization is not possible and one shot of the experiment is expensive. In that case, there may be an upper bound on the number of experiments, and hence on the time, that can be invested in training a policy, and the design complexity can be used to infer which method can generate a policy within this bound.
VI Conclusion
We have tested the robustness of AQEM policies based on Bayesian feedback and on reinforcement learning and compared the resource complexities of implementing the policies. We find that both the reinforcement-learning policies and Bayesian feedback are robust against phase noise up to a noise level of . Although the imprecision scaling delivered by Bayesian feedback is superior to that of the reinforcement-learning policies, the latter use fewer resources than Bayesian feedback in both space and time complexity.
Although we developed the robustness test and the complexity-based comparison of policies with AQEM procedures in mind, these ideas can be applied to other cases in QEM and quantum control. Robustness is a desirable property of any QEM scheme, and the test presented here can be adapted to quantify the robustness of non-adaptive procedures and to investigate the role of the input state in the robustness of QEM generally. Quantifying the resources used by control policies can demonstrate their efficacy not only in AQEM but also in other quantum-control tasks, and comparing resource complexities can be used to select the policy that accomplishes a task most efficiently.
Acknowledgements.
The computational work is enabled by the support of WestGrid (www.westgrid.ca) through Compute Canada Calcul Canada (www.computecanada.ca) and by NSERC and AITF.
References
 Giovannetti et al. (2004) V. Giovannetti, S. Lloyd, and L. Maccone, Science 306, 1330 (2004).
 Giovannetti et al. (2011) V. Giovannetti, S. Lloyd, and L. Maccone, Nat. Photon. 5, 222 (2011).
 Braunstein et al. (1996) S. L. Braunstein, C. M. Caves, and G. J. Milburn, Ann. Phys. (N. Y.) 247, 135 (1996).
 Tóth and Apellaniz (2014) G. Tóth and I. Apellaniz, J. Phys. A: Math. Theor. 47, 424006 (2014).
 Vrajitoru and Knight (2014a) D. Vrajitoru and W. Knight, “Introduction,” in Practical Analysis of Algorithms (Springer, Cham, Switzerland, 2014) Chap. 1, pp. 1–7.
 Hollenhorst (1979) J. N. Hollenhorst, Phys. Rev. D 19, 1669 (1979).
 Caves et al. (1980) C. M. Caves, K. S. Thorne, R. W. P. Drever, V. D. Sandberg, and M. Zimmermann, Rev. Mod. Phys. 52, 341 (1980).
 Caves (1981) C. M. Caves, Phys. Rev. D 23, 1693 (1981).
 Bollinger et al. (1996) J. J. Bollinger, W. M. Itano, D. J. Wineland, and D. J. Heinzen, Phys. Rev. A 54, R4649 (1996).
 Borregaard and Sørensen (2013) J. Borregaard and A. S. Sørensen, Phys. Rev. Lett. 111, 090801 (2013).
 Danilin et al. (2018) S. Danilin, A. V. Lebedev, A. Vepsäläinen, G. B. Lesovik, G. Blatter, and G. S. Paraoanu, npj Quantum Inf. 4, 29 (2018).
 Zhang and Duan (2014) Z. Zhang and L. M. Duan, New J. Phys. 16, 103037 (2014).
 Matthews et al. (2016) J. C. F. Matthews, X.Q. Zhou, H. Cable, P. J. Shadbolt, D. J. Saunders, G. A. Durkin, G. J. Pryde, and J. L. O’Brien, Npj Quantum Inf. 2, 16023 (2016).
 Wiseman (1995) H. M. Wiseman, Phys. Rev. Lett. 75, 4587 (1995).
 Armen et al. (2002) M. A. Armen, J. K. Au, J. K. Stockton, A. C. Doherty, and H. Mabuchi, Phys. Rev. Lett. 89, 133602 (2002).
 Berry and Wiseman (2000) D. W. Berry and H. M. Wiseman, Phys. Rev. Lett. 85, 5098 (2000).
 Rosolia et al. (2018) U. Rosolia, X. Zhang, and F. Borrelli, Annual Review of Control, Robotics, and Autonomous Systems 1, 259 (2018).
 Roy et al. (2015) S. Roy, I. R. Petersen, and E. H. Huntington, New J. Phys. 17, 063020 (2015).
 Wiseman and Killip (1997) H. M. Wiseman and R. Killip, Phys. Rev. A 56, 944 (1997).
 Wiseman and Killip (1998) H. M. Wiseman and R. Killip, Phys. Rev. A 57, 2169 (1998).
 Hentschel and Sanders (2010) A. Hentschel and B. C. Sanders, Phys. Rev. Lett. 104, 063603 (2010).
 Lovett et al. (2013) N. B. Lovett, C. Crosnier, M. PerarnauLlobet, and B. C. Sanders, Phys. Rev. Lett. 110, 220501 (2013).
 Palittapongarnpim et al. (2017a) P. Palittapongarnpim, P. Wittek, E. Zahedinejad, S. Vedaie, and B. C. Sanders, Neurocomputing 268, 116 (2017a).
 Escher et al. (2011) B. M. Escher, R. L. de Matos Filho, and L. Davidovich, Nat. Phys. 7, 406 (2011).
 DemkowiczDobrzanski et al. (2012) R. DemkowiczDobrzanski, J. Kołodynski, and M. Guţă, Nat. Commun. 3, 1063 (2012).
 Berry et al. (2001) D. W. Berry, H. M. Wiseman, and J. K. Breslin, Phys. Rev. A 63, 053804 (2001).
 Bobroff (1987) N. Bobroff, Appl. Opt. 26, 2676 (1987).
 Sinclair et al. (2014) L. C. Sinclair, F. R. Giorgetta, W. C. Swann, E. Baumann, I. Coddington, and N. R. Newbury, Phys. Rev. A 89, 023805 (2014).
 Severini (2005a) T. A. Severini, “Central limit theorems,” in Elements of Distribution Theory, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2005) Chap. 12, pp. 365–399.
 Breitenberger (1963) E. Breitenberger, Biometrika 50, 81 (1963).
 Newman (1968) D. S. Newman, Ann. Math. Stat. 39, 890 (1968).
 Azzalini and Capitanio (2014) A. Azzalini and A. Capitanio, in The SkewNormal and Related Families, Institute of Mathematical Statistics Monograph, edited by D. R. Cox, A. Agresti, B. Hambly, S. Holmes, and X.L. Meng (Cambridge University Press, Cambridge, UK, 2014) Chap. 2, pp. 24 – 56.
 Limpert et al. (2001) E. Limpert, W. A. Stahel, and M. Abbt, BioScience 51, 341 (2001).
 Pourahmadi (2007) M. Pourahmadi, Commun. Stat. Theory Methods 36, 1803 (2007).
 Jouini (2011) W. Jouini, IEEE Signal Process. Lett. 18, 423 (2011).
 Rezaie and Eidsvik (2014) J. Rezaie and J. Eidsvik, Comput. Stat. Data Anal 75, 1 (2014).
 Vrajitoru and Knight (2014b) D. Vrajitoru and W. Knight, “Deterministic analysis of algorithms,” in Practical Analysis of Algorithms (Springer, Cham, Switzerland, 2014) Chap. 5, pp. 169–293.
 Palittpongarnpim et al. (2016) P. Palittpongarnpim, P. Wittek, and B. C. Sanders, in Proc. SPIE Quantum Communications and Quantum Imaging XIV, Vol. 9980 (SPIE, Bellingham, WA, 2016) pp. 99800H–99800H–11.
 Helstrom (1969) C. W. Helstrom, J. Stat. Phys. 1, 231 (1969).
 Humphreys et al. (2013) P. C. Humphreys, M. Barbieri, A. Datta, and I. A. Walmsley, Phys. Rev. Lett. 111, 070403 (2013).
 Yue et al. (2014) J.D. Yue, Y.R. Zhang, and H. Fan, Sci. Rep. 4, 5933 (2014).
 Higgins et al. (2007) B. L. Higgins, D. W. Berry, S. D. Bartlett, H. M. Wiseman, and G. J. Pryde, Nature 450, 393 (2007).
 DemkowiczDobrzański and Maccone (2014) R. DemkowiczDobrzański and L. Maccone, Phys. Rev. Lett. 113, 250801 (2014).
 Hotta et al. (2005) M. Hotta, T. Karasawa, and M. Ozawa, Phys. Rev. A 72, 052334 (2005).
 Sbroscia et al. (2018) M. Sbroscia, I. Gianani, L. Mancino, E. Roccia, Z. Huang, L. Maccone, C. Macchiavello, and M. Barbieri, Phys. Rev. A 97, 032305 (2018).
 Landsman (2017) K. Landsman, “Quantum physics on a general hilbert space,” in Foundations of Quantum Theory: From Classical Concepts to Operator Algebras (Springer International, Cham, Switzerland, 2017) Chap. 4, pp. 103–123.
 Holevo (2012) A. S. Holevo, in Quantum Systems, Channels, Information: A Mathematical Introduction, De Gruyter Studies in Mathematical Physics, Vol. 16 (Walter de Gruyter, Berlin, Germany, 2012) Chap. Quantum evolutions and channels, pp. 103–131.
 Yurke et al. (1986) B. Yurke, S. L. McCall, and J. R. Klauder, Phys. Rev. A 33, 4033 (1986).
 Hayashi (2006) M. Hayashi, “Mathematical formulation of quantum systems,” in Quantum Information: An Introduction (Springer, Berlin, Germany, 2006) pp. 9–25.
 Wiseman and Milburn (2009) H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge University Press, Cambridge, MA, 2009).
 James and Nurdin (2015) M. R. James and H. I. Nurdin, in 2015 IEEE Conference on Control Applications (CCA), IEEE (IEEE CCA, Sydney, NSW, 2015) pp. 1–12.
 Schreppler et al. (2014) S. Schreppler, N. Spethmann, N. Brahms, T. Botter, M. Barrios, and D. M. StamperKurn, Science 344, 1486 (2014).
 Kołodyński and DemkowiczDobrzański (2013) J. Kołodyński and R. DemkowiczDobrzański, New J. Phys. 15, 073043 (2013).
 Maccone (2013) L. Maccone, Phys. Rev. A 88, 042109 (2013).
 Kay (1993) S. M. Kay, “CramerRao lower bound,” in Fundamentals of Statistical Signal Processing: Estimation Theory, edited by A. V. Oppenheimer (PrenticeHall, Upper Saddle River, NJ, 1993) Chap. 3.
 Braunstein (2005) S. L. Braunstein, Phys. Rev. A 71, 055801 (2005).
 Wootters (1998) W. K. Wootters, Philos. Trans. Royal Soc. A: Math. Phys. Eng. Sci. 356, 1717 (1998).
 Jacobs (2014) K. Jacobs, “Metrology,” in Quantum Measurement Theory and its Applications (Cambridge University Press, Cambridge, UK, 2014) Chap. 6, pp. 303–322.
 Zwierz et al. (2012) M. Zwierz, C. A. PérezDelgado, and P. Kok, Phys. Rev. A 85, 042112 (2012).
 Bondurant and Shapiro (1984) R. S. Bondurant and J. H. Shapiro, Phys. Rev. D 30, 2548 (1984).
 Wiseman et al. (2009) H. M. Wiseman, D. W. Berry, S. D. Bartlett, B. L. Higgins, and G. J. Pryde, IEEE J. Sel. Top. Quantum Electron. 15, 1661 (2009).
 Hentschel and Sanders (2011a) A. Hentschel and B. C. Sanders, Phys. Rev. Lett. 107, 233601 (2011a).
 Armstrong (1966) J. A. Armstrong, J. Opt. Soc. Am. 56, 1024 (1966).
 Hariharan and Sanders (1996) P. Hariharan and B. C. Sanders, in Progress in Optics, Vol. XXXVI, edited by E. Wolf (Elsevier, Amsterdam, Netherlands, 1996) Chap. 2, pp. 49–128.
 Hentschel and Sanders (2011b) A. Hentschel and B. C. Sanders, J. Phys. A: Math. Theor. 44, 115301 (2011b).
 Summy and Pegg (1990) G. Summy and D. Pegg, Opt. Commun. 77, 75 (1990).
 Mishchenko (2014) M. I. Mishchenko, “Wigner dfunctions,” in Electromagnetic Scattering by Particles and Particle Groups: An Introduction (Cambridge University Press, 2014) Chap. Appendix F, pp. 385–389.
 Sanders and Milburn (1995) B. C. Sanders and G. J. Milburn, Phys. Rev. Lett. 75, 2944 (1995).
 Riley et al. (2006a) K. F. Riley, M. P. Hobson, and S. J. Bence, “Probability,” (Cambridge University Press, Cambridge, UK, 2006) Chap. 30, pp. 1119–1220, 3rd ed.
 Leigh (2004) J. R. Leigh, “Mathematical modelling,” in Control Theory (2nd Edition) (Institution of Engineering and Technology, London, UK, 2004) Chap. 6, pp. 61–81, 2nd ed.
 Wang et al. (2016) S. Wang, J. M. Simkoff, M. Baldea, L. H. Chiang, I. Castillo, R. Bindlish, and D. B. Stanley, in 11th IFAC Symposium on Dynamics and Control of Process Systems Including Biosystems 2016, Vol. 49 (Elsevier, Amsterdam, Netherlands, 2016) pp. 25–30.
 Hou and Wang (2013) Z.S. Hou and Z. Wang, Information Sciences 235, 3 (2013).

Sutton and Barto (2017)
R. S. Sutton and A. G. Barto, “Introduction,” in Reinforcement Learning: An Introduction
, Adaptive Computation and Machine Learning (MIT, Massachusetts, 2017) Chap. 1, pp. 3–24, 2nd ed.
 Sutton et al. (1992) R. S. Sutton, A. G. Barto, and R. J. Williams, IEEE Control Systems 12, 19 (1992).
 Degris et al. (2012) T. Degris, P. M. Pilarski, and R. S. Sutton, in 2012 American Control Conference (ACC), IEEE (IEEE, Montreal, Canada, 2012) pp. 2177–2182.

Severini (2005b)
T. A. Severini, “Moments and cumulants,” in
Elements of Distribution Theory, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2005) Chap. 4, pp. 94–131.
 Kirkup (2012) L. Kirkup, “Data distributions i,” in Data Analysis for Physical Scientists: Featuring Excel® (Cambridge University Press, Cambridge, UK, 2012) Chap. 3, pp. 90–145, 2nd ed.
 Boncelet (2005) C. Boncelet, in Handbook of Image and Video Processing, edited by A. Bovik (Elsevier, Amsterdam, Netherlands, 2005) pp. 397–409.
 Riley et al. (2006b) K. F. Riley, M. P. Hobson, and S. J. Bence, “Special functions,” (Cambridge University Press, Cambridge, UK, 2006) Chap. 18, pp. 577–647, 3rd ed.
 Kish et al. (2015) E. A. Kish, C.-G. Granqvist, A. Dér, and L. B. Kish, Cogn. Neurodyn. 9, 459 (2015).
 Kai et al. (1987) S. Kai, S. Higaki, M. Imasaki, and H. Furukawa, Phys. Rev. A 35, 374 (1987).
 Montgomery et al. (2012a) D. C. Montgomery, E. A. Peck, and G. G. Vining, “Introduction,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 1, pp. 1–11, 5th ed.
 Chatterjee and Simonoff (2013a) S. Chatterjee and J. S. Simonoff, “Model building,” in Handbook of Regression Analysis (John Wiley & Sons, Hoboken, NJ, 2013) Chap. 2, pp. 23–52.
 Chatterjee and Hadi (2012) S. Chatterjee and A. S. Hadi, “Introduction,” in Regression Analysis by Example, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 1, pp. 1–24, 5th ed.
 Montgomery et al. (2012b) D. C. Montgomery, E. A. Peck, and G. G. Vining, “Simple linear regression,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 2, pp. 12–66, 5th ed.
 Montgomery et al. (2012c) D. C. Montgomery, E. A. Peck, and G. G. Vining, “Variable selection and model building,” in Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics (John Wiley & Sons, Hoboken, NJ, 2012) Chap. 10, pp. 327–371, 5th ed.
 Chatterjee and Simonoff (2013b) S. Chatterjee and J. S. Simonoff, “Multiple linear regression,” in Handbook of Regression Analysis (John Wiley & Sons, Hoboken, NJ, 2013) Chap. 1, pp. 3–22.
 Palittapongarnpim et al. (2017b) P. Palittapongarnpim, P. Wittek, and B. C. Sanders, in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE (IEEE Systems, Man, and Cybernetics Society, Banff, AB, 2017) pp. 294–299.
 Jekel (2017) C. Jekel, “Fitting a piecewise linear function to data,” https://jekel.me/2017/Fit-a-piecewise-linear-function-to-data/ (2017).
 Khaneja et al. (2005) N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, J. Magn. Reson. 172, 296 (2005).