Repository for the Control of Stochastic Quantum Dynamics with Differentiable Programming paper.
Controlling stochastic dynamics of a quantum system is an indispensable task in fields such as quantum information processing and metrology. Yet, there is no general ready-made approach to design efficient control strategies. Here, we propose a framework for the automated design of control schemes based on differentiable programming (∂P). We apply this approach to state preparation and stabilization of a qubit subjected to homodyne detection. To this end, we formulate the control task as an optimization problem where the loss function quantifies the distance from the target state and we employ neural networks (NNs) as controllers. The system's time evolution is governed by a stochastic differential equation (SDE). To implement efficient training, we backpropagate the gradient information from the loss function through the SDE solver using adjoint sensitivity methods. As a first example, we feed the quantum state to the controller and focus on different methods to obtain gradients. As a second example, we directly feed the homodyne detection signal to the controller. The instantaneous value of the homodyne current contains only very limited information on the actual state of the system, covered in unavoidable photon-number fluctuations. Despite the resulting poor signal-to-noise ratio, we can train our controller to prepare and stabilize the qubit to a target state with a mean fidelity around 85 solutions found by the NN to a hand-crafted control strategy.READ FULL TEXT VIEW PDF
We propose two numerical methods for the optimal control of McKean-Vlaso...
For complex real-world systems, designing controllers are a difficult ta...
Quantum control is valuable for various quantum technologies such as
In this paper, we propose a framework to repair a pre-trained feed-forwa...
Time-reversal symmetry, which requires that the dynamics of a system sho...
Use of Control theory within process control industries has changed rapi...
Robust control design for quantum systems has been recognized as a key t...
Repository for the Control of Stochastic Quantum Dynamics with Differentiable Programming paper.
The ability to precisely prepare and manipulate quantum degrees of freedom is a prerequisite of most applications of quantum mechanics in sensing, computation, simulation and general information processing. Many relevant tasks in this area can be formulated as optimal control problems, and therefore,quantum control is a rich and very active research field, see [1, 2] for two recent textbooks that provide theoretical background and  for a recent review of important issues.
A typical goal of quantum control is to find a sequence of operations or parameter values (e.g. external field amplitudes) such that the quantum system under consideration is maintained in a certain target state or evolves in a desired fashion subject to additional boundary conditions (e.g. most rapid evolution for a prescribed maximal strength of the control fields). In the case of feedback control the control sequence is determined based on a signal coming from the system [3, 4]. The control task and its boundary conditions are typically specified by a loss function. To cast the optimal control problem into an optimization task, one then introduces a parametric ansatz for feedback schemes, also known as controllers, and explores the parameter space to minimize the loss function.
Reinforcement learning (RL) [5, 6] has been proposed as a suitable framework to develop control strategies. In this framework, the controller (or agent), optimizes its strategy (the policy) based on the loss function (the rewards) obtained from repetitive interactions with the system to be controlled. In the context of quantum physics, RL has proven useful, e.g. for reducing the error rate in quantum gate synthesis , for autonomous preparation of Floquet-engineered states , and for optimal manipulation of many-qubit systems . RL is a black-box setting, i.e. the agent has no prior knowledge on the structure of the system it interacts with, and has to develop its policy (explore the space of control parameters) only relying on its past interactions with the system. This makes RL very versatile but requires a large number of training episodes to find a good strategy.
Optimal control design is rarely done on an (unknown) system in situ. Instead one trains the controller on a physical model of the real system. This means that rather than learning how to interact with an unknown environment, one actually starts with a lot of prior scientific knowledge about the system, namely its precise dynamical model. In the simplest cases, one can even solve the optimal control problem for this model analytically. Generally, using prior information leads to more data-efficient training [10, 11]. In the context of the present paper, we use the physical model of the system to efficiently compute the loss function’s gradients with respect to the parameters of the controller ansatz. Naturally, having access to the gradients of the loss function can streamline the optimal control design tremendously.
In the case of quantum control the model consist of a quantum state space and a parametric equation of motion. The dynamics of the system can be solved for fixed values of the parameters of the model and of the controller. Usually, this is done numerically. Then, the most naive way to obtain the loss function’s gradients is to solve the dynamics for a set of parameter values in the neighborhood of each point and use finite difference quotients. Yet, this method is unfeasible if the number of parameters is large, it suffers from floating-point errors, and it may be numerically unstable. To circumvent these issues, automatic differentiation (AD) has been proposed as another approach to calculate gradients numerically [12, 13], a paradigm also known as differentiable programming (P) [14, 15]. By backpropagating the loss function’s sensitivity through the numerical simulation, one can compute the gradients with similar time complexity as solving the system’s dynamics . Recently, these techniques have been merged with deep neural networks (NNs) as ansatz for the controllers [10, 17, 18]
. This is possible because the training of deep NNs is based on stochastic gradient descent through a large parameter space and becomes efficient when used in conjunction withP-compatible solvers for the system’s dynamics.
In this work, we develop such a physics-informed reinforcement learning framework based on P and NNs to control the stochastic dynamics of a quantum system under continuous monitoring [2, 19]. Continuous measurements, such as photon counting and homodyne detection, allow one to gain information on the random evolution of a dissipative quantum system 
. This information can be used to estimate the state of the quantum system[20, 21, 22], to implement feedback protocols [23, 24, 25, 26, 27], to generate nonclassical states [28, 29, 30, 31], and to implement teleportation protocols [32, 33]. Continuous homodyne detection can be realized experimentally in the microwave [34, 35] and optical regime [20, 36]. The time evolution of a monitored quantum system is described by quantum trajectories, which are solutions of differential equations driven by a Lévy process.
To illustrate our framework, we focus on a qubit subjected to continuous homodyne detection [34, 37] described by a stochastic Schrödinger equation. We engineer a controller which provides a control scheme to fulfil a given state preparation task based on the measured homodyne current. This situation extends an earlier study , where it has been demonstrated that
P can be efficiently used for quantum control in the context of (unitary) closed-system dynamics, i.e. when the dynamics follows an ordinary differential equation. The stochastic nature of the problem analyzed here renders the control task more challenging because the controller must adapt to the random evolution of the quantum state in each trajectory. Moreover, the instantaneous value of the homodyne current does not determine the actual state of the qubit. It is correlated only to the projection of the state onto the-axis and this signal is hidden in the noise which dominates the measured homodyne current [38, 39]. Thus, the information about the state of the qubit at a given time must be filtered out from the time series of measurement results.
This paper is organized as follows: In Section 2, we describe the proposed setup of a qubit in a leaky cavity subjected to homodyne detection and derive the stochastic Schrödinger equation that describes its dynamics. We then discuss two ways to use the record of the homodyne detection signal in a feedback scheme to engineer a drive that can be applied to the qubit to perform a desired control task, e.g. state preparation or stabilization. We also introduce the concept of adjoint sensitivity methods that we use to efficiently compute gradients with respect to solutions of SDEs. Section 3 describes the first feedback scheme in detail: Here, we assume that the controller has direct access to the quantum state, e.g. through an appropriate filtering procedure applied to the measurement records. We compare three strategies, viz. a hand-crafted control scheme, one in which a neural network continuously updates the control drive based on the knowledge of the state, and a numerically less demanding one with a piecewise-constant control drive. Section 4 presents the second feedback scheme where directly the measurement record of the homodyne current is fed to the NN representing the controller. In this setup, the NN must first learn how to filter the data to obtain information on the state of the system. Only after that it can propose an efficient control strategy. We conclude in Section 5 and discuss potential future applications.
We consider a driven two-level system with states and . In a rotating frame, its Hamiltonian reads
where are Pauli matrices, is the Rabi frequency of the drive laser, and is the detuning between the qubit and the laser. The qubit can spontaneously decay into a photon field via the interaction Hamiltonian
where is the decay rate. The field operators and satisfy the commutation relation . We assume that the field is initially in the vacuum state, . Physical examples of such a system are a two-level atom inside a leaky single-mode cavity that can be adiabatically eliminated, or an artificial atom, e.g. a superconducting qubit, coupled to a waveguide. The radiation emitted from the two-level system is monitored with a continuous homodyne measurement, as depicted in Fig. 1(a).
where denotes an unnormalized qubit state. The instantaneous value of the measured homodyne current
is a random variable satisfying
Here, is the expectation value of at time , and
is a stochastic white-noise term satisfying
, which stems from the shot noise of the local oscillator. Heuristically,can be considered as the derivative of a stochastic Wiener increment, , such that the contribution of the noise to the current integrated over a short time interval is described by a Wiener process,
The ensemble averages of the stochastic Wiener increment satisfy and . Several remarks are in order to better understand the dynamics of the system.
First, Eq. (5) implies that the value of the current over a short interval contains only very little information about the state of the qubit. This can be seen from the (heuristically stated) vanishing signal-to-noise ratio
If was a constant signal, one could simply integrate the current over a time interval longer than to increase the signal-to-noise ratio. However, this is not possible because relaxation will change the state of the two-level system on the time scale . Thus, the low signal-to-noise ratio is an intrinsic feature of this homodyne detection scheme. This is illustrated in Figs. 1(c) and (d), where we show a simulation of the homodyne current together with the respective value for a single quantum trajectory.
Second, Eq. (3) shows that the infinitesimal time evolution and thus the quantum trajectory is fully determined by the record of the measured homodyne current and the values of the applied drive
, which are vectors containing the respective values ofand from the start time until the time . In A.4, we derive a closed-form expression of the operator , which gives the mapping
between the states of the qubit at times and . The operator can be interpreted as a filter determining the state of the qubit at time from the values of the measured homodyne current and the applied drive.
Finally, Eq. (3) does not preserve the norm of the state , as indicated by the tilde. This is no problem if one integrates Eq. (3) numerically, since one can renormalize the state after each time step. For analytical calculations, it is useful to add some correction terms (see A.5 or [2, 19]) such that the norm of the state is preserved up to second order in ,
Here, the nonlinear drift and diffusion terms are
Note that Eq. (8) is a stochastic Schrödinger equation in the Itô form with multiplicative scalar noise.
In our control protocol, the results of the homodyne detection measurement determine the drive to be applied to the qubit. We will consider two different schemes which are sketched in Fig. 1(b).
In the first scheme, discussed in Section 3, we filter the homodyne signal to extract the system’s exact state at time . Subsequently, the controller receives this state as an input and determines the drive to be applied next. Equations (7) and (55) give an explicit filtering procedure to determine the state of the qubit at time from the records of homodyne measurements and the drive , which are known in the experiment. Since we train the agent on simulated trajectories, we know the system’s state at each time step. Therefore, we can skip the filter in Fig. 1(b) and directly feed back the solution of the SDE solver at each time step to our controller. In other words we assume a perfect filtering at any time. Thus, this situation corresponds to the feedback used in Ref.  but the deterministic evolution is replaced by an SDE. We will use this control scheme in Section 3 to test different backpropagation methods and to compare different control strategies.
In the second scheme, discussed in Section 4, the controller obtains at time the homodyne current record measured over some time interval . Now, the NN forming the controller must simultaneously learn how to filter the signal from the noise and to predict the next action . Such an implementation of the control protocol based only on is a challenging task because the signal of the system quadrature is hidden in the noise as discussed in Section 2.1.
The learning scheme to control the stochastic dynamics of the continuously monitored qubit based on P consists of three building blocks as sketched in Fig. 2: a parametrized controller , which will be formed by a NN, a model of the dynamics, expressed as an SDE, and a loss function.
At the beginning of each run, we initialize the system in an arbitrary state on the Bloch sphere,
where and are the excited and ground states of the qubit in the -basis, respectively. To ensure that the controller will perform optimally for any initial state, we sample the angles and uniformly from their intervals and , respectively.
Depending on the chosen control scheme, cf. Fig. 1(b), the controller receives as an input either the quantum state (Section 3) or the last homodyne detection record in form of a vector gathered over some time interval (Section 4). Moreover, in Section 4 the controller also receives a vector of the last control actions applied prior to the time .
The controller then maps this input to the next value of the drive, . Given and , we use the Runge-Kutta Milstein solver from the StochasticDiffEq.jl package [40, 41, 42] to calculate the next state according to the SDE (8). This loop between control agent and SDE solver is iterated for all time steps. We store the quantum states and the drive values at uniformly spaced time steps to be able to evaluate the loss function. Hereafter, the set is called checkpoints.
where the terms encode case-specific objectives of the optimization process, see, e.g. Ref. . Their relative importance can be controlled by the weights . We choose the weights
empirically but, if necessary, they could also be tuned by means of hyperparameter optimization techniques. To enforce a large fidelity with respect to the target state over the whole control interval, we include into the loss function the average infidelity of the checkpoints with respect to the target state ,
This form of the loss function also leads to a time-optimal performance of the controller. To focus on specific time intervals, like the last few steps for example, the sum in Eq. (13) can be straightforwardly adjusted. We will consider the target state in the following. In addition to , we include the term
in the loss function to favor smaller amplitudes of the drive . In Section 4, we will find that this term is important to suppress the collapse of the NN towards a strategy where constant maximal pulses are applied during the training.
At this stage, we have calculated a quantum trajectory and evaluated its value of the loss function. In the next step, we must update the control strategy to decrease the value of the loss function. The derivative of the loss function with respect to the parameters of the neural network provides a meaningful update rule towards a better control strategy. Thus, we need to calculate the gradient . This can be computed efficiently using sensitivity methods for SDEs discussed next.
The loss function , defined in Eq. (12), is a scalar function which explicitly depends on the checkpoints and implicitly on the parameters of the controller. In contrast to score-function estimators [44, 45, 46], such as the REINFORCE algorithm , we will incorporate the physical model into the gradient computation as sketched in Fig. 2.
Automatic differentiation (AD) is a powerful tool to evaluate gradients of numeric functions at machine precision . A key concept to AD is the computational graph , also known as Wengert trace, which is a directed acyclic graph that represents the sequence of elementary operations that a computer program applies to its input values to calculate its output values. The nodes of the computational graph are the elementary computational steps of the program called primitives. The outcome of each primitive operation, called an intermediate variable, is evaluated in the forward pass through the graph.
In forward-mode AD, one associates with each intermediate variable the value of the derivative
with respect to a parameter of interest. The derivatives are calculated together with the associated intermediate values in the forward pass, i.e. the gradient is pushed forward through the graph. This procedure must be repeated for each parameter , therefore, forward-mode AD scales poorly in computation time with increasing number of parameters .
In contrast, reverse-mode AD traverses the computation graph backwards from the loss function to the parameters by defining an adjoint process:
which is the sensitivity of the loss function with respect to changes in the intermediate variable . Reverse-mode AD is very efficient in terms of the number of input parameters because one needs just a single backward pass after the forward pass to obtain the gradient with respect to all parameters . Thus, we always implement the AD of the NN in reverse mode. However, reverse-mode AD might be very memory expensive because all intermediate variables from the forward pass need to be stored for the backward pass. Therefore, reverse-mode AD scales poorly in memory if the number of steps and parameters of the SDE solver increases. Whether forward-mode or reverse-mode AD is more efficient to calculate gradients of the loss function depends on the specific details of the control loop shown in Fig. 2.
In fact, it is possible to combine different AD methods on different parts of the computational graph as we illustrate now. In Secs. 3.3 and 4, we will use Algorithm 1, where the drive is piecewise constant, i.e. a constant value of is applied over successive time steps between two checkpoints. In this case, the memory consumption of the reverse-mode AD of the NN is moderate since the number of parameters grows only with the number of checkpoints rather than with the number of time steps. In contrast, in the SDE solver the evaluation of the time evolution between two checkpoints and only depends on the single parameter , which makes forward-mode AD very efficient . Therefore, we nest both methods and use an inner forward-mode AD through the SDE solver and an outer reverse-mode AD for the remaining parts, i.e. the NN and the computation of the loss function.
Restricting oneself to a piecewise-constant control drive, however, prevents the controller from reacting instantaneously to changes of the state. To implement a fast control loop, the controller has to be placed in the drift term of the SDE (8). Thus, the parameters entering the solver will be the NN parameters , see Algorithm 2. The continuous adjoint sensitivity method [51, 52, 53] circumvents the resulting memory issues by introducing a new primitive for the whole SDE in the backward pass of the code. This new primitive is determined by the solution of another SDE problem, the adjoint SDE. Formally, defining the adjoint process as
the adjoint SDE problem in the Itô sense satisfies the differential equation
with the initial condition
where the standard conversion factor in Eq. (18) accounts for the required transformation from the Itô to the Stratonovich sense and vice versa . The gradients of the loss function with respect to are then determined by the integration of
with the initial condition . This continuous adjoint sensitivity method will be used in Section 3.2. Further details regarding the adjoint method and our Julia implementation [55, 56] within the SciML ecosystem [11, 16, 40] are discussed in C.2.
We first investigate the scenario in which the controller maps the quantum state to a new control parameter, . From a learning perspective, this is a major simplification because the NN does not need to learn how to filter the homodyne current to determine the state . From a practical perspective of controller design, this approach assumes that there is already a filter module, which allows one to predict the state of the system from the measurement record and past control actions. We discuss in Section 2.1 and A.4 the implementation of such a filter in our case of a qubit subjected to homodyne detection. Note that if the initial state of the qubit is unknown, only a mixed state can be obtained if the detection record is too short. Nevertheless, it is always possible to obtain a pure state estimate and recover our scenario, e.g. by projecting onto the surface of the Bloch sphere. Alternatively, one can straightforwardly generalize our approach to the case of a stochastic quantum master equation describing the evolution of the system’s density matrix in the presence of homodyne detection, cf. A.5.
For all numerical experiments we fix the parameters of the physical model as and .
The control operator in Eq. (1) induces a rotation about the -axis. Therefore, a simple but very intuitive strategy to move the state upwards in each time step is to compute the expectation value and to choose the direction of rotation depending on its sign. Specifically, if (or ), the controller should rotate (counter-) clockwise about the -axis, as summarized in Algorithm 3.
Figure 3(a,b) shows the fidelity and the associated drive for this control scheme. Though conceptually straightforward, this hand-crafted control function is very efficient. The mean fidelity over the whole control interval is . We visualize the control strategy in Fig. 3(c), where we show the applied drive as a function of the state on the Bloch sphere. The Bloch sphere [with spherical coordinates , see also Eq. (11)] is mapped onto the tangential plane at , described by polar coordinates , using a stereographic projection
Apparently, the stereographic projection maps the south pole to and the north pole to . Throughout this paper we truncate the value of to an interval , so we plot the applied drive values on the states of the southern hemisphere, cf. Fig. 3(c,d).
Now we apply a NN as the controller. We use a controller which changes in every time step of the solver based on the current state . This implements a high-frequency feedback loop but renders discrete AD methods very inefficient since all parameters of the NN contribute to the feedback loop. Consequently, we use the continuous adjoint sensitivity methods described in Section 2.4 to calculate gradients of the loss function.
Figure 4(a) shows the smooth evolution of the loss function throughout the training of the (fully-connected) neural network, which converges to a configuration able to reach quickly fidelities with mean value of about with modest standard deviation, see Fig. 4(b). The mean fidelity over the whole control interval is . Figure 4(c) illustrates the applied drive during this time evolution. The inset of Fig. 4(a) visualizes the control strategy on the Bloch sphere. The same is depicted in Fig. 3(d) using the stereographic projection, see Eq. (21). The controlled evolution of the initial state , corresponding to the black lines in Fig. 4(b,c), is marked by the yellow points. These points show how the controller first transfers the state from the south pole to the north pole region and then stabilizes it in vicinity of the target state .
Now we reduce the control frequency and assume that the controller changes the action only every time steps. This case is crucial in many physical situations, where the control loop is not fast enough to follow high-frequency changes in the physical system. In practice this means that we evolve the state for substeps between the checkpoints with fixed value of . The resulting piecewise-constant control scheme allows us to use the discrete forward-mode adjoint sensitivity method through the SDE solver combined with an outer reverse-mode AD through the rest of the control loop, as described in Section 2.4. Although we restricted the rate at which can change, we find that the NN converges to a similar learning behavior with similarly large fidelities as in Section 3.2, see Fig. 4(e). The mean fidelity over the whole control interval is .
Based on the results from a set of 256 trajectories, all three control strategies perform nearly equally well in terms of their average fidelities. The piecewise-constant controller slightly outperforms the other two approaches by having the smallest relative dispersion of the mean fidelity , which we attribute to the larger number of training epochs and the larger NN, see Table 1.
The hand-crafted strategy achieves a large average fidelity, but generates sudden jumps in the drive , as shown in Fig. 3(b). Such a drive is experimentally hardly feasible. We find that NNs with moderate depths provide a smooth mapping between the input states and the drive, see Figs. 3(c) vs. 3(d), while keeping high fidelity in the control interval. The signals generated by these protocols are experimentally more accessible, see Figs. 4(d) and (f). When necessary, specific terms can be added to in Eq. (12) to strengthen various requirements on the controller’s performance (e.g. the smoothness of the drive or bounds on the power input) as discussed in Section 2.2. This is not the case for the hand-crafted strategies where an efficient implementation of these constraints might be impossible. Furthermore, in some cases, e.g. the mountain car problem, it is not straightforward to come up with any hand-crafted strategy to start with. In contrast, the RL and P approach is easily adjustable to different physical systems.
In this section, we will construct a controller which directly obtains the (noisy) measurement record of the homodyne current and determines the optimal control field in each time interval . We consider a controller formed by a slightly augmented NN architecture with fully connected layers (see Fig. 6). The acquisition of input data for the NN consists of the following steps:
The controller generates a piecewise-constant drive between two checkpoints at times and , as described in Section 3.3. In the time window , we integrate the SDE (8) using substeps of length as sketched in Fig. 5(a). We label these substeps with . The input for the NN to predict the next is a vector where is the length of the time interval over which we gather the data. Experimentally, can be interpreted as the detection time window of the photodetectors. According to Eqs. (4) and (5), the homodyne measurement in the th substep is
The first term corresponds to the quadrature signal measured over the detection window , which we assume to be approximately constant on time scales of the order of . The second term is the Wiener increment during the th substep. Note that the quantity is dimensionless and solely represents the number of measured detected photons, as discussed in A.
In addition, we provide the NN with the information on the last control parameters . This equips the NN with a “memory” of its own actions, such that it can take into account how a sequence of the last control drive amplitudes affected the performance. We empirically choose such that the length of the input vector corresponds to of the length of .
Given these inputs, the task for the controller is to provide an optimal mapping . Figure 5(b) shows the evolution of the loss function during the learning phase as a function of the training epochs. Note that there are two distinct plateaus. First, after a couple of hundred epochs, the NN develops a general strategy consisting of the application of a periodic drive to prevent the qubit from decaying to the ground state. Figures 5(c,d) show examples of the performance of the NN at epoch to illustrate this phase of the training. This strategy is state-independent, yet it manages to keep the mean fidelity of the simulated trajectories around over the whole control interval. Around epoch , after some transition phase where oscillates substantially, the NN starts to provide state-sensitive control fields. The final performance of the controller in Fig. 5(e,f) reaches the mean fidelity over the whole control interval. During the last 50 time steps in the control interval, which we specifically focus on during the training using the adjusted loss function term of the type (13) for these steps, the average fidelity is .
At this stage, we infer that the NN has learnt how to extract the signal from the noisy data before it attains a similar learning strategy as seen in Section 3. In contrast to the filter mentioned in Section 2.1 and A.4, this universal approach based on P will also work if some parameters of the model were a priori unknown. For example, if the detuning was unknown, one could train the controller on an ensemble of randomly chosen parameters , such that it learns how to deal with the general situation of arbitrary detuning. In this case a straightforward filtering of the signals to obtain the state is unfeasible, as this would require to solve the filter for all possible values of model parameters, cf. Eq. (55). Other options for signal filtering would be (recursive) backward filtering methods [57, 58]
or recurrent neural networks because their structure allows them to capture temporal correlations in the data. Note that such filter methods are compatible with the two-step control approach described in Section 3.
In this work we proposed a framework based on differentiable programming (P) to automatically design feedback control protocols for stochastic quantum dynamics. As a test bed, we used a qubit subjected to homodyne detection, whose dynamics is given by a stochastic Schrödinger equation. Note, however, that our method can straightforwardly be applied to different physical systems and that it can be generalized to the case of stochastic quantum master equations.
In Section 3, we demonstrated that a controller, formed by a NN, can be trained to prepare and stabilize a target state starting from an arbitrary initial state of the system when the NN obtains the full knowledge of the instantaneous state at any time. The method generates a smooth drive while maintaining a high fidelity with the target state over the whole control interval. Additional constraints on the performance of the controller can be implemented by adding further terms to the loss function. This makes the P approach more versatile than tailoring control functions manually, which requires a unique approach for every new system and can even be infeasible for large quantum systems.
The key feature of our P framework is to include an explicit model of the system’s dynamics into the computation of the gradients of the loss function with respect to the parameters of the NN, i.e. the controller. Specifically, in Section 3.2
, we employed the recently developed continuous adjoint sensitivity method for the gradient estimation through an SDE solver, which is memory efficient and, thus, allows us to study a high-frequency controller. F.S. implemented these new continuous adjoint sensitivity methods in the DiffEqSensitivity.jl package within the open-source SciML ecosystem.
In Section 4, we showed that the feedback control scheme can be based directly on providing the NN with a record of homodyne current measurements without the need to filter the information on the actual state beforehand. Therefore, the NN must first learn how to filter the input data (with poor signal-to-noise ratio) before it can predict optimal state-dependent values of the control drive. Ultimately, the trained NN was able to reach fidelities above in a target time interval for random initial states.
In future studies, the optimization of the loss function based on stochastic trajectories using adjoint sensitivity methods could be compared to alternative approaches. First, the solution to the stochastic optimal control problem in the specific case of Markovian feedback (as in Section 3) is a Hamilton-Jacobi-Bellman equation [11, 54]
. The solution of this partial differential equation, with same dimension as the original SDE, may directly give the optimal drive. However, solving this partial differential equation with a mesh-based technique is computationally demanding and mesh-free methods, e.g. based on NNs also require a (potentially costly) training procedure . Second, the expected values of the loss functions could be optimized by leveraging the Koopman expectation for direct computation of expected values from stochastic and uncertain models 
. Additionally, one could approach this control problem by using an SDE moment expansion to generate ordinary differential equations for the moments and apply a closure relationship. Additional research is required to ascertain the efficiency of these approaches in comparison to our method.
The results reported in this paper imply that
P is a powerful tool for the automated design of quantum control protocols. Further experimental needs, e.g. finite time lag between the measurement and the applied drive, finite-temperature effects, or imperfect homodyne detection, can be incorporated straightforwardly into this method. Thus, our work introduces a new perspective on how prior physical knowledge can be encoded into machine learning tools to construct a universal control framework. Besides the control application demonstrated here, theP paradigm can be also adopted to solve other inverse problems such as estimating model parameters from experimental data [63, 64, 65]. An interesting perspective for future work is to extend our framework to control-assisted quantum sensing and metrology.
We would like to thank Niels Lörch, Eliska Greplova, Moritz Schauer, and Chris Rackauckas for helpful discussions. We acknowledge financial support from the Swiss National Science Foundation (SNSF) and the NCCR Quantum Science and Technology. Parts of the computations were performed at sciCORE (scicore.unibas.ch) scientific computing core facility at University of Basel.
Consider a two-level atom interacting with a free photon. In the case of discrete photon modes, the interaction is given by the usual Hamiltonian with and being the bosonic creation and annihilation operators, respectively, fulfilling the commutation relation , and being a coupling constant with dimension of energy. In contrast to the notation used in the main text, in this appendix, we will mark operators by a hat to distinguish them from scalars. This Hamiltonian results from the dipole interaction written in the form and application of the rotating wave approximation. However, when addressing a decay to the continuum, the photons are represented by quantum field operators with commutation relation , thus having units of square root of energy (or, equivalently, inverse square root of time). The interaction Hamiltonian takes the form
where is the decay rate, again, with dimension of energy like .
The field propagator in the interaction picture can be formally written as , where is the time-ordering operator. Expanding for short times , we find
Here, the second-order terms give an important contribution. Higher-order terms only contribute as and can be safely neglected, and we also assume that is short on the timescale of internal qubit dynamics. Assuming the free field is originally in the vacuum state we can write
is a properly normalized state of the field, and denotes the excited qubit state.
The simplest way to monitor the qubit state
is then to continuously measure the intensity of the outgoing field. This defines a Poissonian process, as the probability to detect a photon in a time intervalreads
and the probability of no detection is respectively. Thus, we obtain the standard unraveling for the spontaneous decay process
An alternative to photon-counting detection is to mix the outgoing light with coherent light from a local-oscillator laser on a beamsplitter, and to measure the intensities at both output ports of the beamsplitter. As a warm-up we first consider the situation where the intensity of the local-oscillator field is comparable to the emitted light. It is given by the expansion of a coherent state using only vacuum and single-photon states
where the single-photon state of the -mode is defined identically to the mode . Mixing the two states on a 50:50 beamsplitter gives
Introducing the photocurrent variable , which measures the intensity difference of the two outputs, we see that there are three possibilities, . The respective probabilities of the three outcomes are given by the norms of the three different terms in Eq. (29),
The three possible states of the system after the measurement are proportional to
By setting (i.e. no mixing with a local-oscillator field), we recover the previous case where the qubit simply has a chance to decay
and . The only difference is that the emitted photon is randomly split between the two detectors, with no information about the state of the qubit contained in the sign of .
Finally, we consider the situation treated in the main text. The standard homodyne measurement requires mixing the signal with a strong coherent local-oscillator field
Recall that here the Fock states are defined with the creation operator , describing the field impinging on the lower port of the beamsplitter during the interval , see Fig. 1(a). For the modes and to match, the local-oscillator laser has to be in resonance with the drive laser because, in the lab frame, the field picks up the phase of rotating at the frequency of the drive laser.
When the modes match, the 50:50 beamsplitter transformation takes the form
In particular, this implies , i.e. a coherent state of the mode is split into two coherent states of half the intensity when mixed with the vacuum state of the mode on a 50:50 beamsplitter. Using the last two expressions, we can write down the overall state of the qubit, the spontaneously emitted radiation and the homodyne laser field after the beamsplitter. To do so, let us compute
which is the operator mapping the state of the qubit at time to the state of the qubit and the two detected modes at time . We can further simplify this expression by noting that
where is the photon number operator. Labeling the photon number operator for the mode as , we can then rewrite Eq. (35) in a very intuitive form
Again, directly gives us the state of the qubit and the detected modes, while
gives the conditional state of the qubit together with the probability to detect and photons respectively. In the measurement process, superpositions of states with different photon numbers collapse such that the state after the post-measurement is a classical statistical mixture of the possible outcomes,
One easily sees from Eq. (37) that , i.e. only the difference of the photon counts reveals information on the state of the qubit, while the sum only describes the shot noise of the local oscillator. It is therefore sufficient to keep the difference and discard the sum, which defines the state
In a slight abuse of notation we can formally introduce the joint quantum state of the qubit and the count difference as with
, which gives rise to the same post-measurement state. Here,
where is the modified Bessel function of the first kind.
Next, we use that the distribution of on the right-hand side of Eq. (42
) is well approximated by the normal distributionin the limit . Therefore, we can replace the state of an integer-valued in Eq. (41) with
of a continuously valued and normally distributed , and we also introduce a continuously valued operator . In the regime of interest the actual value of does not play a role. To get rid of it, recall that is the intensity of the local-oscillator laser in the time window , so it is more convenient to work with the laser power , which is independent of the choice of the time window . Then, we can rescale the photon count difference to
to get rid of the laser intensity. Equation (41) now reads
where the initial state of the rescaled can be obtained form Eq. (43) and satisfies
This also allows us to define the homodyne current of the main text , as the photon count difference per time. At this point note that Eq. (45) already implies Eq. (3). Let us now show how the measured value of is distributed. The marginal distribution of after the measurement reads