The central question in control theory is how to regulate the behavior of an evolving system, whose state is perturbed by a disturbance, by dynamically adjusting a control action. Traditionally, this question has been studied in two distinct settings: in the $H_2$ setting, we assume that the disturbance is generated by a stochastic process and seek to select the control so as to minimize the expected control cost, whereas in the $H_\infty$ setting we assume the noise is selected adversarially and instead seek to minimize the worst-case control cost.
Both $H_2$ and $H_\infty$ controllers suffer from an obvious drawback: they are designed with respect to a specific class of disturbances, and if the true disturbances fall outside of this class, the performance of the controller may be poor. Indeed, the loss in performance can be arbitrarily large if the disturbances are chosen carefully.
This observation naturally motivates the design of adaptive controllers, which dynamically adjust their control strategy as they sequentially observe the disturbances instead of blindly following a prescribed strategy. This problem has attracted much recent attention in the online learning community (e.g. [2, 15, 7]), mostly from the perspective of regret minimization. In this framework, the online controller is designed to minimize regret against the best controller selected in hindsight out of some class of controllers; the comparator class is often taken to be the class of state-feedback controllers or the class of disturbance-action controllers introduced in . The resulting controllers are adaptive in the sense that they seek to minimize cost irrespective of how the disturbances are generated.
In this paper, we take a somewhat different approach to the design of adaptive controllers. Instead of designing an online controller to minimize regret against the best controller selected in hindsight from some specific class, we focus on designing an online controller which minimizes regret against a clairvoyant noncausal controller, which knows the full sequence of disturbances in advance. The cost incurred by this noncausal controller is a lower bound on the cost incurred by any controller, since the noncausal controller selects the globally optimal sequence of control actions with full knowledge of the disturbances. This formulation of regret minimization is much more general: instead of imposing a priori some finite-dimensional, parametric structure on the controller we learn (e.g. state-feedback policies, disturbance-action policies, etc.), which may or may not be appropriate for the given control task, we compete with the globally optimal clairvoyant controller, with no artificial constraints. We ask: how closely can an online controller approximate the performance of a clairvoyant, noncausal controller? We design a new regret-optimal controller which approximates the performance of the optimal clairvoyant controller as closely as possible, and bound the worst-case difference in control costs between these two controllers (the regret) in terms of the energy of the disturbances.
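To make the lower-bound property concrete, the following sketch (a scalar system with parameters of our own choosing, not taken from the paper) computes the clairvoyant noncausal cost by solving the quadratic program induced by a known disturbance sequence, and compares it against a simple causal state-feedback law:

```python
import numpy as np

def lqr_cost(x, u, q=1.0, r=1.0):
    # quadratic cost: q * (sum of squared states) + r * (sum of squared controls)
    return q * np.sum(x**2) + r * np.sum(u**2)

def rollout(a, u, w):
    # simulate x_{t+1} = a*x_t + u_t + w_t starting from x_0 = 0
    T = len(w)
    x = np.zeros(T + 1)
    for t in range(T):
        x[t + 1] = a * x[t] + u[t] + w[t]
    return x

def noncausal_cost(a, w, q=1.0, r=1.0):
    # With w known in advance, x = F(u + w) where F[i, j] = a^(i-j) for j <= i
    # (row i gives x_{i+1}), so the cost is a quadratic in u with a closed-form
    # minimizer given by the normal equations.
    T = len(w)
    F = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            F[i, j] = a ** (i - j)
    H = q * F.T @ F
    u = np.linalg.solve(H + r * np.eye(T), -H @ w)
    return lqr_cost(rollout(a, u, w), u, q, r)

def causal_cost(a, w, k, q=1.0, r=1.0):
    # simple causal feedback u_t = -k * x_t
    T = len(w)
    x, u = np.zeros(T + 1), np.zeros(T)
    for t in range(T):
        u[t] = -k * x[t]
        x[t + 1] = a * x[t] + u[t] + w[t]
    return lqr_cost(x, u, q, r)

rng = np.random.default_rng(0)
w = rng.standard_normal(50)
print(noncausal_cost(0.9, w), causal_cost(0.9, w, 0.5))
```

The clairvoyant cost never exceeds that of any causal policy, which is what makes it a meaningful benchmark for regret.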
We also consider online estimation (filtering) in dynamical systems from the perspective of regret minimization. Given a choice between a smoothed estimator and a filter, it is natural to prefer the smoothed estimator, since the smoothed estimator has access to noncausal measurements and can hence potentially outperform the filter. However, in many real-world settings we have only causal access to measurements and must generate estimates online using a filter. We ask: how closely can a filter approximate the performance of a smoothed estimator? We design a new regret-optimal filter which approximates the optimal smoothed estimator as closely as possible, and bound the worst-case difference in estimation error between these two estimators (the regret) in terms of the energy of the disturbances.
Our approach to regret minimization in estimation and control is similar in outlook to a series of works in online learning (e.g. [17, 16, 18, 11, 26]), which seek to design online learning algorithms that compete with a globally optimal, dynamic sequence of actions instead of the best fixed action selected in hindsight from some class (e.g. the best fixed arm in the Multi-Armed Bandit problem). This notion of regret is called "dynamic regret" and is natural when the reward-generating process encountered by the online algorithm varies over time. Unlike these prior works, we consider online optimization in settings with dynamics. This setting is considerably more challenging to analyze through the lens of regret, because the dynamics serve to couple costs across rounds; the estimates or control actions generated by a learning algorithm in one round affect the costs incurred in all subsequent rounds, making counterfactual analysis difficult.
1.1 Contributions of this paper
In this paper, we consider finite-horizon estimation and control in linear dynamical systems from the perspective of regret minimization. We obtain two main results. First, we derive a new causal estimator (a filter) which minimizes regret against a noncausal estimator which receives the full sequence of observations at once (i.e. the optimal smoothing estimator). This filter is a drop-in replacement for standard filters such as the Kalman filter and the $H_\infty$ filter. We present a state-space model of the regret-optimal filter by constructing a new linear system such that the $H_\infty$-optimal filter in the new system is the regret-optimal filter in the original system; the state dimension of the new system is a constant multiple of that of the original system. Second, we derive a new causal controller which minimizes regret against a noncausal controller which receives the full sequence of disturbances in advance; this controller can be used in place of standard $H_2$ and $H_\infty$ controllers. Given a linear control system and a corresponding cost functional, we show how to construct a new linear system and a new cost functional such that the $H_\infty$-optimal controller in the new system minimizes regret against the noncausal controller in the original system. Our results easily extend to settings where the controller has access to predictions of future disturbances, or only affects the system dynamics after a fixed delay (Section 8). We also present tight data-dependent regret bounds for both the regret-optimal filter and the regret-optimal controller in terms of the energy of the disturbances. Our results can be viewed as extending traditional $H_\infty$ estimation and control, which focuses on minimizing worst-case cost, to minimizing worst-case regret.
We next consider nonlinear systems and describe how to obtain regret-optimal filters and controllers via iterative linearization; these schemes can be viewed as regret-optimal analogs of the classic Model Predictive Control (MPC) and Extended Kalman Filter (EKF) algorithms. We present numerical experiments which show that our regret-optimal algorithms can significantly outperform standard $H_2$ and $H_\infty$ algorithms across a wide variety of input disturbances. All of the algorithms we obtain are computationally efficient and run in time linear in the time horizon.
1.2 Related work
Regret minimization in control has attracted much recent attention across several distinct settings. A series of papers (e.g. [1, 5, 4, 20]) consider a model where a linear system with unknown dynamics is perturbed by stochastic disturbances; an online learner picks control actions with the goal of minimizing regret against the optimal stabilizing controller. In the "non-stochastic control" setting, the learner knows the dynamics, but the disturbance may be generated adversarially; the controller seeks to minimize regret against the class of disturbance-action policies. Sublinear regret bounds were obtained in this setting and subsequently improved in a series of follow-up works.
We emphasize that all of these works focus on minimizing regret against a fixed controller from some parametric class of control policies (policy regret), while we focus on competing against the clairvoyant noncausal controller; similar problems were also studied in, e.g., [9, 12, 23]. Regret minimization has also been studied in LTI, infinite-horizon estimation and control, and in finite-horizon measurement-feedback control. Gradient-based algorithms with low dynamic regret against the class of disturbance-action policies were obtained in [13, 25].
We consider estimation and control in linear time-varying dynamical systems over a finite horizon $T$. An estimator is causal if its estimate at time $t$ depends only on the observations up to time $t$; otherwise we say the estimator is noncausal. Causal estimation is often called filtering; noncausal estimation is often called smoothing. Similarly, a controller is causal if the control action it selects at time $t$ depends only on the disturbances up to time $t$; otherwise the controller is noncausal. A strictly causal estimator or controller is defined analogously, except that its output at time $t$ depends only on the inputs up to time $t-1$. For simplicity we focus on causal estimation and control, but we emphasize that our results easily extend to the strictly causal setting as well; we describe how to adjust our proofs in the Appendix. We often think of estimators or controllers as being represented by linear operators mapping disturbances to estimates or control actions; in this framework, a (strictly) causal estimator or controller is precisely one whose associated operator is (strictly) lower-triangular.
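This correspondence between causality and triangularity can be checked directly; the sketch below (toy moving-average maps of our own choosing) recovers the matrix of a linear map by probing it with unit impulses:

```python
def operator_matrix(apply_map, T):
    # Recover the matrix of a linear map on length-T signals by probing
    # it with unit impulses; column j is the response to e_j.
    cols = []
    for j in range(T):
        e = [0.0] * T
        e[j] = 1.0
        cols.append(apply_map(e))
    # entry (i, j) is the effect of input at time j on the output at time i
    return [[cols[j][i] for j in range(T)] for i in range(T)]

def causal_map(w):
    # y_t = w_t + 0.5 * w_{t-1}: depends only on inputs up to time t
    return [w[t] + (0.5 * w[t - 1] if t > 0 else 0.0) for t in range(len(w))]

def strictly_causal_map(w):
    # y_t = w_{t-1}: depends only on inputs up to time t-1
    return [(w[t - 1] if t > 0 else 0.0) for t in range(len(w))]

M = operator_matrix(causal_map, 4)
S = operator_matrix(strictly_causal_map, 4)
lower = all(M[i][j] == 0.0 for i in range(4) for j in range(4) if j > i)
strict = all(S[i][j] == 0.0 for i in range(4) for j in range(4) if j >= i)
print(lower, strict)  # True True
```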
In the estimation setting, a system with state $x_t$ evolves according to the dynamics
$$x_{t+1} = A_t x_t + B_t w_t,$$
where $w_t$ is an unknown external disturbance. We assume for simplicity that the initial state is $x_0 = 0$, though it is easy to extend our results to an arbitrary initial state. At each timestep, we receive a noisy linear observation of the state:
$$y_t = C_t x_t + v_t,$$
where $C_t$ is a measurement matrix and $v_t$ is an unknown measurement disturbance. The matrix $C_t$ is potentially sparse or low-rank, so that $y_t$ contains only limited information about the underlying state. Our goal is to estimate the target signal
$$s_t = L_t x_t,$$
where $L_t$ is a given matrix. We formulate estimation as an optimization problem, where we seek to design an estimator which produces estimates $\hat{s}_t$ minimizing the error
$$\sum_{t=0}^{T} \|\hat{s}_t - s_t\|^2.$$
We assume that the system matrices are known, so the only uncertainty in the evolution of the system comes from the process and measurement disturbances.
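A minimal simulation of this setup, with illustrative matrices of our own choosing, verifies that the state recursion agrees with its closed-form convolution representation:

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # state transition (our choice)
B = np.array([[1.0], [0.5]])             # disturbance-to-state map (our choice)
C = np.array([[1.0, 0.0]])               # measurement matrix (our choice)

def simulate(A, B, C, w, v):
    # x_{t+1} = A x_t + B w_t,  y_t = C x_t + v_t,  x_0 = 0
    n, T = A.shape[0], len(w)
    x = np.zeros((T + 1, n))
    y = np.zeros(T)
    for t in range(T):
        y[t] = (C @ x[t])[0] + v[t]
        x[t + 1] = A @ x[t] + B.ravel() * w[t]
    return x, y

def closed_form_state(A, B, w, t):
    # x_t = sum_{j < t} A^(t-1-j) B w_j
    xt = np.zeros(A.shape[0])
    for j in range(t):
        xt += (np.linalg.matrix_power(A, t - 1 - j) @ B).ravel() * w[j]
    return xt

rng = np.random.default_rng(0)
w, v = rng.standard_normal(20), rng.standard_normal(20)
x, y = simulate(A, B, C, w, v)
print(np.allclose(x[20], closed_form_state(A, B, w, 20)))  # True
```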
In the control setting, we consider a Linear Quadratic Regulator model where a system evolves according to the linear dynamics
$$x_{t+1} = A_t x_t + B_t u_t + w_t.$$
Here $x_t$ is a state variable we seek to regulate, $u_t$ is a control variable which we can dynamically adjust to influence the evolution of the system, and $w_t$ is an external disturbance. We assume for simplicity that the initial state is $x_0 = 0$, though it is easy to extend our results to an arbitrary initial state. We formulate the problem of controlling the system as an optimization problem, where the goal is to select the control actions $u_t$ so as to minimize the quadratic cost
$$\sum_{t=0}^{T} \left( x_t^\top Q_t x_t + u_t^\top R_t u_t \right),$$
where $Q_t \succeq 0$ and $R_t \succ 0$ for all $t$. We assume that the dynamics and costs are known, so the only uncertainty in the evolution of the system comes from the disturbance $w_t$.
As is standard in the input-output approach to estimation and control, we encode estimators and controllers as linear transfer operators mapping the disturbances to the signals we wish to regulate. Let $w = (w_0, \ldots, w_T)$ denote the stacked sequence of disturbances, and define the stacked signals $v$, $s$, $u$, and $x$ analogously. We define the energy in a signal $w$ to be
$$\|w\|^2 = \sum_{t=0}^{T} \|w_t\|^2.$$
In the estimation setting, the state can be written as $x = Fw$, where $F$ is an appropriately defined causal operator depending on the system matrices. Similarly, (3) can be written as
$$s = Lx,$$
where the operator $L$ is block-diagonal with entries $L_0, \ldots, L_T$, and the error (4) is simply $\|\hat{s} - s\|^2$.
In the control setting, we wish to regulate the state signal $x$, while simultaneously minimizing the energy in the control signal. The dynamics (5) are captured by the relation
$$x = Fu + Gw,$$
where $F$ and $G$ are strictly causal operators encoding the dynamics (5), and the LQR cost (6) can be written as
$$x^\top \mathcal{Q} x + u^\top \mathcal{R} u,$$
where $\mathcal{Q}$ and $\mathcal{R}$ are block-diagonal with entries $Q_t$ and $R_t$.
We note that this may involve re-parameterizing the original dynamics (5); we refer the reader to  for more background on transfer operators and the input-output approach to estimation and control.
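As a sanity check on this operator viewpoint, the following sketch (scalar dynamics and cost weights of our own choosing) confirms that the simulated LQR cost agrees with the quadratic form obtained from the stacked causal operators:

```python
import numpy as np

a, b, q, r, T = 0.9, 0.7, 1.0, 0.5, 30  # scalar system and weights (our choice)

def build_F_G(a, b, T):
    # x = F u + G w for x_{t+1} = a x_t + b u_t + w_t with x_0 = 0;
    # row i gives x_{i+1}, so both operators are lower-triangular.
    F, G = np.zeros((T, T)), np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            F[i, j] = (a ** (i - j)) * b
            G[i, j] = a ** (i - j)
    return F, G

def simulated_cost(u, w):
    # roll the recursion forward and accumulate q*x^2 + r*u^2
    x, c = 0.0, 0.0
    for t in range(T):
        c += r * u[t] ** 2
        x = a * x + b * u[t] + w[t]
        c += q * x ** 2
    return c

def operator_cost(u, w):
    # same cost, written as a quadratic form in the stacked signals
    F, G = build_F_G(a, b, T)
    x = F @ u + G @ w
    return q * x @ x + r * u @ u

rng = np.random.default_rng(0)
u, w = rng.standard_normal(T), rng.standard_normal(T)
print(np.isclose(simulated_cost(u, w), operator_cost(u, w)))  # True
```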
2.1 Robust estimation and control
Our results rely heavily on techniques from robust estimation and control; in particular, we show that the problem of finding a regret-optimal estimator can be reduced to an $H_\infty$ estimation problem. Similarly, the problem of finding a regret-optimal controller can be reduced to an $H_\infty$ control problem. In this section, we review the formulation of the $H_\infty$ estimation and control problems, along with their state-space solutions.
Problem 1 ($H_\infty$-optimal estimation).
Find a causal estimator that minimizes
$$\sup_{w, v} \frac{\|\hat{s} - s\|^2}{\|w\|^2 + \|v\|^2}.$$
This problem has the natural interpretation of minimizing the worst-case gain from the energy in the disturbances $w$ and $v$ to the error of the estimator. In general, it is not known how to derive a closed-form solution for the $H_\infty$-optimal estimator, so it is common to consider a relaxation:
Problem 2 (Suboptimal $H_\infty$ estimation at level $\gamma$).
Given $\gamma > 0$, find a causal estimator such that
$$\|\hat{s} - s\|^2 \leq \gamma^2 \left( \|w\|^2 + \|v\|^2 \right)$$
for all disturbances $w$ and $v$, or determine that no such estimator exists. This problem can also be expressed in terms of transfer operators; an equivalent formulation is to find a causal estimator such that
where is the unique causal operator such that
This problem has a well-known state-space solution:
Theorem 1 (Theorem 4.2.1 and Lemma 4.2.9 in ).
Given $\gamma > 0$, a causal estimator at level $\gamma$ exists if and only if the matrices
have the same inertia, where is the solution of the Riccati recursion
In this case, one possible estimator is given by
where is recursively computed as
where we initialize and define
We emphasize that an $H_\infty$-optimal estimator is easily obtained from a solution of the suboptimal estimation problem by bisection on $\gamma$.
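The bisection step is generic: given any feasibility oracle that is monotone in $\gamma$, the optimal level can be located to arbitrary precision. A minimal sketch with a hypothetical stand-in oracle (in practice the oracle would be the inertia test of Theorem 1):

```python
def bisect_gamma(feasible, lo, hi, tol=1e-9):
    # Assumes feasibility is monotone in gamma: infeasible below the
    # optimal level, feasible above it. Shrinks [lo, hi] to within tol.
    assert not feasible(lo) and feasible(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Stand-in oracle: a suboptimal estimator exists iff gamma >= 2.5 (hypothetical).
gamma_star = bisect_gamma(lambda g: g >= 2.5, 0.0, 10.0)
print(round(gamma_star, 6))
```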
Problem 3 ($H_\infty$-optimal control).
Find a causal controller that minimizes
$$\sup_{w} \frac{\mathrm{cost}(w)}{\|w\|^2},$$
where $\mathrm{cost}(w)$ denotes the control cost incurred on the disturbance $w$.
This problem has the natural interpretation of minimizing the worst-case gain from the energy in the disturbance to the cost incurred by the controller. In general, it is not known how to derive a closed-form solution for the $H_\infty$-optimal controller, so it is common to consider a relaxation:
Problem 4 (Suboptimal $H_\infty$ control at level $\gamma$).
Given $\gamma > 0$, find a causal controller whose control cost satisfies
$$\mathrm{cost}(w) \leq \gamma^2 \|w\|^2$$
for all disturbances $w$, or determine that no such controller exists.
This problem has a well-known state-space solution:
Theorem 2 (Theorem 9.5.1 in ).
Given $\gamma > 0$, an $H_\infty$ controller at level $\gamma$ exists if and only if
for all , where
we define to be the solution of the backwards-time Riccati equation
with initialization , and we define
In this case, the suboptimal controller has the form
As in the estimation setting, an $H_\infty$-optimal controller is easily obtained from the solution of the suboptimal control problem by bisection on $\gamma$.
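To illustrate the structure of such recursions, the following sketch implements a scalar full-information $H_\infty$ Riccati recursion in one standard form (a simplification of our own choosing, not the exact recursion of Theorem 2); taking $\gamma \to \infty$ recovers the LQR (i.e. $H_2$) recursion:

```python
def hinf_riccati(a, b, q, r, gamma, T):
    # Backward recursion P_t = q + a^2 P_{t+1} / (1 + (b^2/r - 1/gamma^2) P_{t+1}),
    # a scalar full-information form; gamma -> infinity recovers the LQR recursion.
    P = [0.0] * (T + 1)   # terminal condition P_T = 0 (our choice)
    for t in range(T - 1, -1, -1):
        denom = 1.0 + (b * b / r - 1.0 / gamma ** 2) * P[t + 1]
        if denom <= 0:
            return None   # level gamma is infeasible for this horizon
        P[t] = q + a * a * P[t + 1] / denom
    return P

def lqr_riccati(a, b, q, r, T):
    # Standard scalar LQR backward recursion (no gamma term)
    P = [0.0] * (T + 1)
    for t in range(T - 1, -1, -1):
        P[t] = q + a * a * P[t + 1] * r / (r + b * b * P[t + 1])
    return P

P_inf = hinf_riccati(0.95, 1.0, 1.0, 1.0, gamma=1e6, T=50)
P_lqr = lqr_riccati(0.95, 1.0, 1.0, 1.0, T=50)
print(abs(P_inf[0] - P_lqr[0]) < 1e-6)  # large gamma recovers LQR
```

For small $\gamma$ the recursion loses feasibility, mirroring the existence condition in the theorem.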
2.2 The noncausal estimator and the noncausal controller
We briefly state a few facts about the clairvoyant noncausal estimator and the clairvoyant noncausal controller (sometimes called the offline optimal controller). The optimal noncausal estimate of the target signal given the full sequence of observations is
and the corresponding error is
The optimal noncausal control action in response to the input disturbance is given by
and the corresponding cost incurred by the noncausal controller is
2.3 Regret-optimal estimation and control
In this paper, instead of minimizing the worst-case cost as in traditional $H_\infty$ estimation and control, our goal is to design estimators and controllers that minimize the worst-case regret. The regret of a causal estimator on a given pair of input disturbances is the difference between the cost it incurs and the cost that a clairvoyant noncausal estimator would incur. In light of (8), the regret is
Similarly, the regret of a causal controller on an input disturbance is the difference between the cost it incurs and the cost that a clairvoyant noncausal controller would incur:
where $u$ is the sequence of control actions selected by the causal controller. It is natural to formulate regret minimization analogously to the problems considered in Section 2.1:
Problem 5 (Regret-optimal estimation).
Find a causal estimator that minimizes
$$\sup_{w, v} \frac{\mathrm{regret}(w, v)}{\|w\|^2 + \|v\|^2}.$$
This problem has the natural interpretation of minimizing the worst-case gain from the energy in the disturbances to the regret incurred by the estimator.
Problem 6 (Regret-suboptimal estimation at level $\gamma$).
Given $\gamma > 0$, find a causal estimator such that
$$\mathrm{regret}(w, v) \leq \gamma^2 \left( \|w\|^2 + \|v\|^2 \right)$$
for all disturbances $w$ and $v$, or determine that no such estimator exists.
Problem 7 (Regret-optimal control).
Find a causal controller that minimizes
$$\sup_{w} \frac{\mathrm{regret}(w)}{\|w\|^2}.$$
This problem has the natural interpretation of minimizing the worst-case gain from the energy in the disturbance to the regret incurred by the controller. As in the $H_\infty$ setting, we consider the relaxation:
Problem 8 (Regret-suboptimal control at level $\gamma$).
Given a performance level $\gamma > 0$, find a causal controller such that
$$\mathrm{regret}(w) \leq \gamma^2 \|w\|^2$$
for all disturbances $w$, or determine that no such policy exists.
We emphasize that, as in the $H_\infty$ setting, if we can solve the regret-suboptimal estimation and control problems, we can easily recover solutions to the regret-optimal estimation and control problems by bisection.
3 Regret-optimal estimation
We seek to design a causal estimator (a filter) such that the error residual has the smallest possible energy relative to the error residual associated with the noncausal estimator. Our approach is to reduce the regret-suboptimal estimation problem (Problem 6) to the suboptimal $H_\infty$ estimation problem (Problem 2).
For any operator , the corresponding transfer operator
is given by the corresponding matrix. Recall that the optimal noncausal estimator is
and the regret-suboptimal estimation problem at level $\gamma$ is to find a causal estimator such that
or to determine whether no such estimator exists. Condition (12) can be neatly expressed in terms of transfer operators as
Define the unitary operator
where we define causal operators such that
Notice that for every estimator , we have
and in particular, we have
Since is unitary, condition (13) is equivalent to
Expanding the right-hand side and using the Schur complement, we see that this condition is equivalent to
where is the unique causal operator such that
This condition is equivalent to
Left-multiplying by and right-multiplying by , we obtain
Adding to both sides we obtain
Let be the unique causal operator such that
Left-multiplying (15) by and right-multiplying by , we obtain
Comparing with Problem 2, we see that condition (17) is in the form (7) under an appropriate identification of the operators. The main technical challenge in deriving the regret-suboptimal estimator in state-space form is to obtain the factorization (16); with this in hand, we obtain the following theorem.
Theorem 3 (Regret-optimal filter). The regret-optimal filter is given by
where the state estimates can be recursively computed as
and we initialize . We define
and is the solution of the Riccati recursion
where the recursion is suitably initialized, and the remaining matrices are defined in (19) and (20). The regret-optimal filter is the regret-suboptimal filter at level $\gamma^\star$, where $\gamma^\star$ is the smallest value of $\gamma$ such that the two matrices above have the same inertia. The regret incurred by the regret-optimal filter is at most $(\gamma^\star)^2 \left( \|w\|^2 + \|v\|^2 \right)$.
See the Appendix.
4 Regret-optimal Control
We now turn to the problem of deriving the regret-optimal controller. Our approach is to reduce the regret-suboptimal control problem (Problem 8) to the suboptimal $H_\infty$ control problem (Problem 4); once the regret-suboptimal controller is found, the regret-optimal controller can easily be obtained by bisection on $\gamma$. Recall that the regret-suboptimal problem at level $\gamma$ is to find, if possible, a causal control policy such that for all disturbances $w$,
where . Since is strictly positive definite for all
, there exists a unique causal, invertible matrix such that
After an appropriate change of variables, the regret-suboptimal problem (18) takes the form of finding, if possible, a causal controller such that for all disturbances,
Comparing with Problem 4, we see that this is a suboptimal $H_\infty$ control problem at level $\gamma$ in the new system.
The main technical challenge in deriving a state-space model for the regret-suboptimal controller is to obtain an explicit factorization of the relevant operator; once this factorization is in hand, it is straightforward to recover a state-space description of the regret-optimal controller using the state-space description of the $H_\infty$ controller (Theorem 2). We obtain the following theorem:
Theorem 4 (Regret-suboptimal controller). The regret-suboptimal controller at level $\gamma$ is given by
where we define
and the state variables evolve according to the dynamics
where we initialize . We define to be the solution of the forwards Riccati recursion
where we initialize , and define to be the solutions of the backwards Riccati recursions
where we initialize .
The regret-optimal causal controller is the regret-suboptimal controller at level $\gamma^\star$, where $\gamma^\star$ is the smallest value of $\gamma$ such that
for all $t$. Furthermore, the regret incurred by the regret-optimal controller is at most $(\gamma^\star)^2 \|w\|^2$.
See the Appendix. We note that our results easily extend to settings where the controller has access to predictions of future disturbances, or only affects the system dynamics after a fixed delay; we refer the reader to Section 8 for details.
5 Numerical Experiments
We benchmark our regret-optimal controller in the classic inverted pendulum model. This system has two scalar states, the angular position $\theta$ and the angular velocity $\omega$, and a single scalar control input $u$. The states evolve according to the nonlinear evolution equation
where $w$ is an external disturbance and the remaining coefficients are physical parameters. Although these dynamics are nonlinear, we can benchmark the regret-optimal controller against the $H_2$-optimal, $H_\infty$-optimal, and clairvoyant noncausal controllers using Model Predictive Control (MPC). In the MPC framework, we iteratively linearize the model dynamics around the current state, compute the optimal control signal in the linearized system, and then update the state in the original nonlinear system using this control signal. In our experiments we assume that units are scaled so that all physical parameters are 1. We fix a discretization step and sample the costs at 100 evenly spaced times. We initialize both the angular position and the angular velocity to zero. In our first experiment, the disturbance
is drawn from a standard Gaussian distribution (Figure 1). The $H_2$-optimal controller incurs the lowest cost; this is unsurprising, since the $H_2$ controller is designed to minimize the expected cost when the disturbances are stochastic. We note that the regret-optimal controller closely tracks the performance of the $H_2$ controller, and significantly outperforms the $H_\infty$ controller. In our second experiment, the disturbance is a sawtooth signal (Figure 2); the regret-optimal controller achieves an order of magnitude less cost than the $H_2$-optimal controller and also outperforms the $H_\infty$ controller. In our third experiment, the disturbance is a sinusoidal signal; we consider two different frequencies (Figures 3 and 4). In both settings, the regret-optimal controller closely tracks the performance of the clairvoyant noncausal controller and achieves two orders of magnitude better performance than the standard controllers.
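The linearization step at the heart of this MPC scheme can be sketched as follows (a hypothetical Euler discretization with all physical parameters set to 1; the Jacobian is validated against finite differences):

```python
import math

dt = 0.01  # discretization step (our choice)

def step(theta, omega, u, w):
    # Euler-discretized pendulum with unit physical parameters:
    # theta' = omega, omega' = sin(theta) + u + w
    return theta + dt * omega, omega + dt * (math.sin(theta) + u + w)

def jacobian(theta, omega):
    # Linearization of the step map with respect to the state
    return [[1.0, dt],
            [dt * math.cos(theta), 1.0]]

# check the Jacobian against finite differences at a test point
theta, omega, eps = 0.7, -0.2, 1e-6
J = jacobian(theta, omega)
base = step(theta, omega, 0, 0)
col0 = [(s1 - s0) / eps for s0, s1 in zip(base, step(theta + eps, omega, 0, 0))]
col1 = [(s1 - s0) / eps for s0, s1 in zip(base, step(theta, omega + eps, 0, 0))]
ok = all(abs(J[i][0] - col0[i]) < 1e-4 and abs(J[i][1] - col1[i]) < 1e-4 for i in range(2))
print(ok)  # True
```

In the full MPC loop, this linearized model would be refreshed around the current state at every timestep before computing the control action.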
We next benchmark the regret-optimal filter. We consider frequency modulation, a classic setting in communications theory where a message is passed through an integrator to phase-modulate a carrier signal. In this problem, there are two states: the message and its integrated phase. The dynamics are linear and time-invariant, and are given by
The observations are nonlinear in the state:
While the observation model is nonlinear, we can apply the regret-optimal filter by iteratively linearizing the dynamics around the current estimate, computing the regret-optimal filter in the linearized system, and forming a new estimate using this filter; this is the same approach used in the classic Extended Kalman Filter (EKF) algorithm. We assume that both states are initialized to zero and fix the disturbance covariances in our EKF implementation. As in our control experiments, we fix a discretization step and sample the costs at 100 evenly spaced times. We benchmark the regret-optimal filter against the EKF across a variety of input disturbances. In our first experiment, the disturbances are both selected i.i.d. from a standard Gaussian distribution (Figure 5). We see that the EKF outperforms the regret-optimal filter; this is unsurprising, since the EKF is tuned for stochastic noise. In our second set of experiments, we generate sinusoidal disturbances; we see that the regret-optimal filter incurs roughly half the squared estimation error of the EKF (Figure 6). We next consider a more complex set of sinusoidal input disturbances, and obtain similar results (Figure 7).
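The linearized measurement update underlying this scheme can be sketched on a toy scalar model (observation $h(x) = \sin(x)$ and noise parameters of our own choosing, not the paper's FM system):

```python
import math

def ekf_update(x_hat, P, y, R_noise):
    # Linearize h(x) = sin(x) around the current estimate, then apply
    # the standard Kalman measurement update with Jacobian H = cos(x_hat).
    H = math.cos(x_hat)
    S = H * P * H + R_noise            # innovation variance
    K = P * H / S                      # Kalman gain
    x_new = x_hat + K * (y - math.sin(x_hat))
    P_new = (1.0 - K * H) * P
    return x_new, P_new

x_true = 0.4
y = math.sin(x_true)                   # noiseless observation, for illustration
x_hat, P = 0.0, 1.0
for _ in range(5):
    x_hat, P = ekf_update(x_hat, P, y, R_noise=0.01)
print(abs(x_hat - x_true) < 0.01)      # repeated relinearization approaches truth
```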
Together, our experiments show that by minimizing regret against the noncausal estimators and controllers, our regret-optimal algorithms are able to adapt to many different kinds of input disturbances.
6 Conclusion
We propose regret against a clairvoyant noncausal policy as a criterion for estimator and controller design, and show that regret-optimal estimators and controllers can be found by extending $H_\infty$ estimation and control to minimize regret instead of just cost. We give a complete characterization of regret-optimal estimators and controllers in state-space form, allowing efficient implementations whose computational cost scales linearly in the time horizon. We also give tight bounds on the regret incurred by our algorithms in terms of the energy of the disturbances. Numerical benchmarks in nonlinear systems show that our regret-optimal algorithms are able to adapt to many different kinds of disturbances and can often outperform standard $H_2$-optimal and $H_\infty$-optimal algorithms.
We identify several promising directions for future research. First, it is natural to consider minimizing regret against a controller which at each timestep has access to a limited window of predictions of future disturbances, instead of the full sequence of disturbances; such a controller is more easily implemented in real-world systems, and the resulting regret-minimizing controller may demonstrate better performance. Second, it would be interesting to use our regret-minimization techniques to design decentralized controllers with access to local information which compete against a centralized controller with global information; such a distributed regret-optimal controller may have applications in network optimization and control.
7.1 Proof of Theorem 3
A state-space model for is given by
so a state-space model for is
Let be such that and , and define . Notice that . Suppose we can find a causal operator such that where
is a random variable with mean zero such that. Then necessarily . Using the backwards-time Kalman filter, we obtain a state-space model for :