1 Introduction
Recent high-profile successes, and the resulting hype, in machine learning and reinforcement learning are generating renewed interest in adaptive control and system identification, which have their own decades-long histories [aastrom2013adaptive, ljung1998system]. Classical work on adaptive control and system identification largely focused on asymptotics, including stability, consistency, and asymptotic variance. Emerging research at the intersection of learning and control shifts the focus to non-asymptotic statistical analyses, including regret and sample efficiency of various adaptive control and learning algorithms [abbasi2011regret, dean2018regret, Dean2019].

In both classical and emerging work on learning and control, robustness has been a key issue. The classical focus on asymptotics led to a strong emphasis on certainty-equivalent adaptive control, in which the inevitable uncertainty in model estimates is ignored during control design; unsurprisingly, this can lead to a serious lack of robustness. It remains poorly understood how best to interface asymptotic and non-asymptotic uncertainty descriptions of model estimates with robust control design methods. One of the main difficulties is a mismatch in uncertainty descriptions: system identification almost universally uses stochastic data models, whereas robust control traditionally uses set-based descriptions and worst-case design. (A notable exception is set membership identification [milanese1991optimal], which maintains a set of models that could have produced the data within some error bound and interfaces somewhat more naturally with certain traditional robust control design methods.) This mismatch can lead to unnecessary conservatism when the assumed model uncertainty sets are poorly aligned with the inherent statistical model uncertainty.
The contributions of the present work are as follows:

We propose a robust adaptive control algorithm where the model uncertainty description and robust control design method both use stochastic uncertainty representations.

We show via numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.
The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. This approach provides a natural interface between two widely used and highly effective methods from statistics and optimal control theory (namely, bootstrap sample variance and LQR). It is known that certainty-equivalent adaptive control can achieve asymptotic optimality and statistical efficiency at an order-optimal rate [chen2012identification, kumar2015stochastic, mania2019certainty]. However, neither of these properties implies anything about non-asymptotic optimality or robustness.
2 Problem Formulation
We consider adaptive control of the discrete-time linear dynamical system $x_{t+1} = A x_t + B u_t + w_t$, where $x_t \in \mathbb{R}^n$ is the system state, $u_t \in \mathbb{R}^m$ is the control input, and $w_t \in \mathbb{R}^n$ is i.i.d. process noise with zero mean and covariance matrix $W$. The system matrices $(A, B)$ are assumed unknown, so an adaptive controller is to be designed based only on the state-input trajectory data $x_0, u_0, \ldots, x_t, u_t$ observed up to each time $t$. We consider the linear quadratic adaptive optimal control problem
(1) $\quad \underset{\pi \in \Pi}{\text{minimize}} \;\; \mathbb{E} \sum_{t=0}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t - c^* \right)$
where $Q \succeq 0$ and $R \succ 0$ are cost matrices, and the optimization is over (measurable) history-dependent feedback policies $\pi$ with $u_t = \pi_t(x_0, u_0, \ldots, x_t)$. The constant $c^*$ in the stage costs represents the optimal infinite-horizon average steady-state cost when the system matrices are known, which results from a static linear state feedback $u_t = K^* x_t$, whose gain matrix $K^*$ can be computed via several known methods, including value iteration, policy iteration, and semidefinite programming. This constant gives the stage cost an interpretation as regret.
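As an illustration of this baseline, the sketch below computes the optimal gain by value iteration on the standard Riccati equation and the optimal average cost as $\operatorname{tr}(P W)$. This is a minimal sketch in Python with NumPy; the scalar system and cost numbers are assumed for the example and are not taken from this paper.

```python
import numpy as np

def lqr(A, B, Q, R, iters=1000):
    """Standard (certainty-equivalent) LQR: value iteration on the Riccati
    equation, returning the cost matrix P and gain K with u_t = K x_t."""
    P = Q.copy()
    for _ in range(iters):
        # K = -(R + B'PB)^{-1} B'PA, then Riccati update P <- Q + A'PA + A'PB K
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return P, K

# Toy scalar system (assumed numbers): the optimal infinite-horizon average
# stage cost under additive noise with covariance W is c* = trace(P W).
A, B = np.array([[1.2]]), np.array([[1.0]])
Q, R, W = np.eye(1), np.eye(1), 0.1 * np.eye(1)
P, K = lqr(A, B, Q, R)
c_star = np.trace(P @ W)
```

For this toy instance the closed loop $A + B K$ is stable and $c^*$ is the stationary average cost that the regret in (1) is measured against.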
The finite-horizon objective in (1) emphasizes the non-asymptotic performance of the adaptive controller. This stands in contrast to the majority of classical work on adaptive control, which tends to focus on asymptotic performance and stability. Note that regret is a random variable that ideally should be small, and there are various ways to measure its size, including expected regret, regret variance, and measures of regret risk, such as value at risk or conditional value at risk.
It has long been known that this problem can be solved in principle by redefining the state as the (infinite-dimensional) joint conditional distribution over the original state and the unknown model parameters and applying dynamic programming [bellman1961adaptive]. However, this approach is intractable even for the most trivial instances. Since computing the optimal policy exactly appears to be intractable, we instead aim to design a computationally implementable controller with good performance and robustness properties, i.e., one that achieves both small expected regret and small regret risk. In particular, our algorithm accounts for uncertainty in various directions of the model parameter space by modeling it as multiplicative noise, in contrast to the isotropic robustness afforded by the system-level synthesis approach in [dean2018regret, Dean2019]. We compare with a certainty-equivalent adaptive controller, where uncertainty is ignored and a controller is designed as if the point model estimates were exact.
3 Preliminaries: Multiplicative Noise LQR
Here we will represent parameter uncertainty in model estimates stochastically using covariance matrices estimated from bootstrap resampling of finite data records. This representation interfaces quite naturally with a variant of the linear quadratic regulator that incorporates multiplicative noise, which has a long history in control theory but is far less widely known than its additive noise counterpart [Wonham1967, gravell2019learning]. Consider the linear quadratic regulator problem with dynamics perturbed by multiplicative noise
(2)
$$\begin{aligned}
&\underset{\pi \in \Pi}{\text{minimize}} && \mathbb{E}_{x_0, \{\bar{A}_t\}, \{\bar{B}_t\}} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t \right), \\
&\text{subject to} && x_{t+1} = (A + \bar{A}_t)\, x_t + (B + \bar{B}_t)\, u_t,
\end{aligned}$$
where $\bar{A}_t$ and $\bar{B}_t$ are i.i.d. zero-mean random matrices with a joint covariance structure over their entries governed by the covariance matrices $\Sigma_A$ and $\Sigma_B$, which quantify uncertainty in the nominal system matrices and can be estimated from data using bootstrap methods.
Just as in additive-noise LQR problems, dynamic programming can be used to show that the optimal policy is linear state feedback $u_t = K x_t$. Given the problem data $(A, B, Q, R, \Sigma_A, \Sigma_B)$, the optimal quadratic cost matrix $P$ is given by the solution of the generalized Riccati equation
$$P = Q + A^\top P A + \sum_{i=1}^{n^2} \alpha_i A_i^\top P A_i - A^\top P B \left( R + B^\top P B + \sum_{j=1}^{nm} \beta_j B_j^\top P B_j \right)^{-1} B^\top P A,$$
and the associated optimal gain matrix is
$$K = -\left( R + B^\top P B + \sum_{j=1}^{nm} \beta_j B_j^\top P B_j \right)^{-1} B^\top P A,$$
where $\{\alpha_i, A_i\}$ and $\{\beta_j, B_j\}$ are the eigenvalues and reshaped eigenvectors of $\Sigma_A$ and $\Sigma_B$, respectively.
and , respectively. The solutions are denoted . The optimal cost and policy can be computed via value iteration, policy iteration, or semidefinite programming ([ElGhaoui1995, gravell2019learning]). Note that like traditional robust control but unlike additive noise LQR, the optimal cost matrix and control gain depend explicitly on the model uncertainty.4 Robust Adaptive Control via Bootstrapping and Multiplicative Noise
Our robust adaptive control algorithm is summarized in Figure LABEL:fig:block_diagram and Algorithm 1. The algorithm has three main components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal LQR with multiplicative noise.
4.1 Least Squares Estimation for the Nominal Model
The first component of the algorithm is a standard least-squares estimator for the unknown system matrices from state-input trajectory data. In particular, at time $t$, from the data $\{x_s\}_{s=0}^{t}$ and $\{u_s\}_{s=0}^{t-1}$, we form the estimate
(2) $\quad (\hat{A}_t, \hat{B}_t) = \underset{A,\, B}{\arg\min} \sum_{s=0}^{t-1} \left\| x_{s+1} - A x_s - B u_s \right\|_2^2$
More explicitly, defining the data matrices $X_t = [x_1 \; \cdots \; x_t]$ and $Z_t = [z_0 \; \cdots \; z_{t-1}]$ with $z_s = [x_s^\top \; u_s^\top]^\top$, the least-squares estimate can be written as

(3) $\quad [\hat{A}_t \;\; \hat{B}_t] = X_t Z_t^\top \left( Z_t Z_t^\top \right)^{-1}$
A nondegenerate model estimate is obtained only when $Z_t Z_t^\top$ is invertible, so learning is divided into a pure exploration phase until a user-specified time and subsequently an exploration-exploitation phase, where the estimated model is used to design a control policy. The exploration component of the input signal is i.i.d. Gaussian noise with a user-specified covariance matrix; for any positive-definite choice, the least-squares estimator is well known to be consistent under our modeling assumptions. The exploration noise is designed to fade out with the bootstrap-estimated model uncertainty, which yields asymptotic optimality. Under mild assumptions, the least-squares estimator can be implemented recursively to significantly simplify the repeated computation in (3) as data arrive (e.g., [simon2006optimal]).
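The estimation step can be sketched as follows. This is a minimal illustration of (2)-(3) in Python with NumPy, not the paper's code: it rolls out an assumed toy scalar system under Gaussian exploration inputs, stacks the regressors $z_s = [x_s^\top \; u_s^\top]^\top$, and solves the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true, B_true = np.array([[0.9]]), np.array([[1.0]])  # assumed toy system
n, m, T = 1, 1, 500

# Roll out the true system under exploratory (standard Gaussian) inputs.
X = np.zeros((n, T + 1))
U = rng.standard_normal((m, T))
for t in range(T):
    w = 0.1 * rng.standard_normal(n)            # additive process noise
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t] + w

# Stack regressors z_s = [x_s; u_s] and targets x_{s+1}, then solve
# [A_hat B_hat] = X' Z^T (Z Z^T)^{-1}, the least-squares solution.
Z = np.vstack([X[:, :T], U])                    # (n+m) x T
Xp = X[:, 1:T + 1]                              # n x T
Theta = Xp @ Z.T @ np.linalg.inv(Z @ Z.T)       # = [A_hat  B_hat]
A_hat, B_hat = Theta[:, :n], Theta[:, n:]
```

With enough exploratory data the estimate concentrates around the true matrices, consistent with the consistency property noted above.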
4.2 Bootstrap Resampling to Quantify Non-Asymptotic Model Uncertainty
There are inevitably errors in the least-squares estimate obtained from any finite data record, due to the process noise affecting the system dynamics. Because of dependence in the time-series data, it is unfortunately not straightforward to characterize non-asymptotic uncertainty in the least-squares estimate analytically using standard statistical techniques. Therefore, to quantify non-asymptotic uncertainty in the model estimate, we propose a time-series bootstrap resampling procedure. There are three broad classes of bootstrap methods for time series [hardle2003bootstrap]: (1) parametric bootstrap; (2) semiparametric bootstrap with resampled residuals; and (3) nonparametric bootstrap with block resampling. In parametric and semiparametric methods, bootstrap data are simulated from the nominal model with the process noise sampled i.i.d. with replacement, either from an assumed distribution or from residuals calculated with the nominal model; dependence in the data is preserved by construction. In nonparametric methods, overlapping blocks of consecutive data are resampled from the original data to preserve dependence. For definiteness, a semiparametric bootstrap with resampled residuals for the least-squares estimator discussed above is summarized in Algorithm 2.
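A minimal sketch of such a semiparametric residual bootstrap follows, in the spirit of Algorithm 2 but not its verbatim implementation; the function name, toy system numbers, and the use of the true parameters as the nominal estimate are all assumptions made for the example.

```python
import numpy as np

def residual_bootstrap(A_hat, B_hat, X, U, n_boot=100, rng=None):
    """Semiparametric bootstrap: re-simulate trajectories from the nominal
    model with residuals resampled with replacement, re-fit least squares on
    each synthetic trajectory, and return the sample covariance of the
    vectorized model estimates [A_hat B_hat]."""
    rng = np.random.default_rng() if rng is None else rng
    T = U.shape[1]
    resid = X[:, 1:] - A_hat @ X[:, :T] - B_hat @ U   # nominal-model residuals
    thetas = []
    for _ in range(n_boot):
        Xb = np.zeros_like(X)
        Xb[:, 0] = X[:, 0]
        idx = rng.integers(T, size=T)                 # resample residuals w/ replacement
        for t in range(T):
            Xb[:, t + 1] = A_hat @ Xb[:, t] + B_hat @ U[:, t] + resid[:, idx[t]]
        Z = np.vstack([Xb[:, :T], U])
        Th = Xb[:, 1:] @ Z.T @ np.linalg.inv(Z @ Z.T) # re-fit least squares
        thetas.append(Th.ravel())
    return np.cov(np.array(thetas), rowvar=False)

# Tiny scalar demo (assumed numbers, not the paper's experiment).
rng = np.random.default_rng(1)
T = 200
X = np.zeros((1, T + 1))
U = rng.standard_normal((1, T))
for t in range(T):
    X[:, t + 1] = 0.8 * X[:, t] + 1.0 * U[:, t] + 0.1 * rng.standard_normal(1)
Sigma = residual_bootstrap(np.array([[0.8]]), np.array([[1.0]]), X, U,
                           n_boot=50, rng=rng)
```

The returned matrix is the bootstrap estimate of the covariance of the model estimate, which plays the role of $\Sigma_A$ and $\Sigma_B$ in the multiplicative-noise design below.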
4.3 Multiplicative Noise LQR
The least-squares estimator and bootstrap provide both a nominal estimate of the system model and an estimate of the covariance of the nominal model error. These quantities provide precisely the input data needed to compute an optimal policy for the LQR problem with multiplicative noise from the generalized Riccati equation. This policy is known to provide robustness to uncertainties in the parameters of the nominal model [bernstein1986robust]. Furthermore, the uncertainty description used in this control design method is richly structured and derived directly from the finite available data.
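For illustration, the generalized Riccati equation can be solved by value iteration as sketched below. This is a hypothetical minimal implementation in Python with NumPy, with assumed toy numbers; `(alphas, A_dirs)` and `(betas, B_dirs)` stand for the eigenvalues and reshaped eigenvectors of the bootstrap covariance matrices.

```python
import numpy as np

def mult_noise_lqr(A, B, Q, R, alphas, A_dirs, betas, B_dirs,
                   iters=2000, tol=1e-10):
    """Value iteration on the generalized Riccati equation for LQR with
    multiplicative noise; returns the cost matrix P and gain K."""
    P = Q.copy()
    for _ in range(iters):
        GA = A.T @ P @ A + sum(a * Ai.T @ P @ Ai for a, Ai in zip(alphas, A_dirs))
        GB = R + B.T @ P @ B + sum(b * Bj.T @ P @ Bj for b, Bj in zip(betas, B_dirs))
        P_next = Q + GA - A.T @ P @ B @ np.linalg.solve(GB, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    GB = R + B.T @ P @ B + sum(b * Bj.T @ P @ Bj for b, Bj in zip(betas, B_dirs))
    K = -np.linalg.solve(GB, B.T @ P @ A)
    return P, K

# Scalar example (assumed numbers): nonzero multiplicative noise variances
# yield a more cautious gain than the certainty-equivalent design.
A, B, Q, R = 1.2 * np.eye(1), np.eye(1), np.eye(1), np.eye(1)
P, K = mult_noise_lqr(A, B, Q, R, [0.1], [np.eye(1)], [0.1], [np.eye(1)])
```

When the noise variances are set to zero this reduces to the standard Riccati iteration, recovering certainty-equivalent LQR.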
We introduce a parameter that provides a fixed scaling of the model uncertainty. A scaling of zero corresponds to certainty-equivalent adaptive control, and as the scaling increases, more weight is placed on uncertainty in the nominal model. Existence of a solution to the generalized Riccati equation depends not just on stabilizability of the nominal system estimate, but also on mean-square stabilizability of the multiplicative noise system. When the multiplicative noise variances are too large, it may be impossible to stabilize the system in the mean-square sense. In this case, we scale down the model variances at each time step as necessary to compute a mean-square stabilizing control gain via bisection; see Algorithm 3. In particular, we verify that the system with the specified scaling is mean-square stabilizable by checking whether the generalized Riccati equation admits a positive semidefinite solution; if not, we find the largest feasible scaling via bisection (e.g., [burden1978numerical]).
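The bisection step can be sketched as follows. This is a simplified illustration of the idea behind Algorithm 3, not its actual implementation; the feasibility oracle passed in is hypothetical and stands in for a check that the generalized Riccati equation admits a positive semidefinite solution at a given scaling.

```python
def ms_stabilizing_scale(is_feasible, gamma_max=1.0, tol=1e-2):
    """Bisection for the largest uncertainty scaling in [0, gamma_max] at
    which the generalized Riccati equation still admits a mean-square
    stabilizing solution. `is_feasible(gamma)` reports solvability at that
    scaling; scaling 0 (certainty equivalence) is always feasible."""
    if is_feasible(gamma_max):
        return gamma_max            # full uncertainty weight is feasible
    lo, hi = 0.0, gamma_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid                # keep the largest known-feasible scaling
        else:
            hi = mid
    return lo

# Hypothetical feasibility oracle for illustration only: solvable iff the
# scaled uncertainty stays below a fixed mean-square stabilizability margin.
gamma = ms_stabilizing_scale(lambda g: 1.2 * g < 0.5)
```

The returned scaling is then used to shrink the multiplicative noise variances before solving for the control gain at that time step.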
5 Numerical Experiments
For brevity, we abbreviate "certainty-equivalent" as "CE" and "robustness via multiplicative noise" as "RMN". To evaluate the performance of the proposed RMN algorithm relative to CE control, we performed Monte Carlo sampling to estimate the distribution of several key quantities: instantaneous regret, model error, and multiplicative noise variances.
The instantaneous regret is heavy-tailed in the sense that the effect of outliers with non-negligible probability is significant: some exceptionally poor sequences of model estimates induce extremely high costs relative to the median. For this reason, to facilitate the most direct comparison of CE and the proposed RMN approach, we train the models using both control schemes on identical offline training data. This way, the effect of outlier model estimates is applied uniformly to both algorithms, since at each time the algorithms face exactly the same model estimates. In an online adaptive control setting, where the training data are the actual state trajectories experienced under adaptive control, a direct comparison of the approaches is more difficult. While we observe qualitatively similar benefits of the proposed approach, our future work will study this setting with a greater number of Monte Carlo samples to reduce the effect of outliers.
The training data are generated by initializing the state at the origin, applying random controls drawn from a standard Gaussian distribution (zero mean, identity covariance), and simulating the evolution of the state with the additive process noise specified by the problem data. The resulting training data are a set of state trajectories and input trajectories. Model and uncertainty estimates are generated at each time $t$ according to Algorithm 1 using training data only up to time $t$. The optimal cost is calculated empirically by averaging, over all Monte Carlo samples, the cost incurred by trajectories under optimal control for each additive-noise realization, i.e.,
$$c^*_t = \frac{1}{N_s} \sum_{k=1}^{N_s} \left( x_t^{*,(k)\top} Q\, x_t^{*,(k)} + u_t^{*,(k)\top} R\, u_t^{*,(k)} \right), \quad \text{where} \quad x_{t+1}^{*,(k)} = A x_t^{*,(k)} + B u_t^{*,(k)} + w_t^{(k)}, \quad u_t^{*,(k)} = K^* x_t^{*,(k)}.$$
The empirical cost under adaptive control is calculated similarly, without averaging over Monte Carlo samples. Instantaneous regret is calculated by subtracting the empirical cost under optimal control from the empirical cost under adaptive control for each scheme.
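The Monte Carlo averaging of the optimal stage cost can be sketched as below. This is a minimal illustration in Python with NumPy; the function name, toy system, gain, and sample counts are assumptions for the example, not the paper's experiment parameters.

```python
import numpy as np

def optimal_stage_costs(A, B, K_star, Q, R, W, T, n_samples, seed=0):
    """Monte Carlo estimate of the optimal stage cost c*_t: average the stage
    costs of closed-loop rollouts x_{t+1} = A x_t + B u_t + w_t, u_t = K* x_t,
    over n_samples independent additive-noise realizations."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    c = np.zeros(T)
    for _ in range(n_samples):
        x = np.zeros(n)                          # state initialized at the origin
        for t in range(T):
            u = K_star @ x
            c[t] += x @ Q @ x + u @ R @ u        # accumulate stage cost at time t
            x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    return c / n_samples

# Toy scalar instance (assumed numbers): the costs start at zero (state at the
# origin) and settle near the stationary average cost.
c = optimal_stage_costs(np.array([[0.9]]), np.array([[1.0]]), np.array([[-0.5]]),
                        np.eye(1), np.eye(1), 0.01 * np.eye(1),
                        T=30, n_samples=200)
```

Subtracting these per-time averages from the single-rollout cost of each adaptive scheme gives the instantaneous regret described above.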
We evaluated the CE and RMN algorithms on a scalar system with true system and cost parameters , , , , . The level of additive process noise is significant enough that an appreciable number of model estimates remain poor for many time steps; this is necessary to observe a difference between CE and RMN control. We simulated the system over a time horizon of steps. We drew independent Monte Carlo samples and bootstrap samples at each time step for uncertainty estimation. We used unity scaling of the multiplicative noise () and a tolerance of for the bisection to find the largest scaling of the multiplicative noise variance in the multiplicative noise LQR algorithm. We used an exploration time of , which ensures the least-squares estimate is nondegenerate. The figures have x-axis limits truncated to .
In Figure LABEL:fig:plot_instant_regret we plot statistics of instantaneous regret using CE control and using RMN control. We are chiefly interested in performance in terms of expected regret and upper quantiles, which correspond to regret risk. Figure LABEL:fig:plot_instant_regret demonstrates that the multiplicative noise control achieves much lower instantaneous regret in terms of both the mean and upper quantiles. In particular, the performance of the multiplicative noise control is many orders of magnitude better between the start and , , for the 99.9th, 99th, and 95th quantiles, respectively. After several time steps the model estimates improve and the uncertainty estimates become sufficiently small that the difference between CE and RMN control becomes insignificant. The heaviness of the regret tails is shown by the massive difference between the mean and the median.

In Figure LABEL:fig:plot_ABerr we plot statistics of the nominal model estimate errors, which are applicable to both control schemes. This shows that the least-squares estimator provides models of increasing accuracy as time goes on, as expected. In Figure LABEL:fig:plot_alphabeta we plot statistics of the multiplicative noise variances using RMN control. This shows that the multiplicative noise variances accurately reflect the true model error, i.e., the bootstrap model uncertainty estimator gives reasonable estimates.
The benefits of RMN control over CE control in general obviously cannot be inferred from this single instance. Indeed, other preliminary numerical results on higher-dimensional examples indicate that for some problem data (, , , , ) the benefits of RMN control, and how best to select the algorithm parameters, are unclear, especially in initial stages when there is very high uncertainty in the nominal model estimates. However, there are at least some systems, like the one shown here, that are controlled with significantly lower risk using RMN control. We expect that by explicitly incorporating model uncertainty into the adaptive control design, it should be possible to realize the observed robustness benefits more broadly, which motivates further theoretical study.
Code that implements the algorithms of this paper and generates the reported results is available at
https://github.com/TSummersLab/robustadaptivecontrolmultinoise.
6 Conclusions
We proposed a robust adaptive control algorithm that combines bootstrap estimation of model-estimate covariances with a non-conventional multiplicative noise LQR robust control method. Ongoing and future work will focus on providing finite-time theoretical performance guarantees using tools from high-dimensional statistics, finding algorithm parameters that ensure uniform improvements over certainty-equivalent control for any system, and implementing model uncertainty estimates using recursive least squares to alleviate computational burden.