Robust Learning-Based Control via Bootstrapped Multiplicative Noise

02/24/2020 · by Benjamin Gravell, et al. · The University of Texas at Dallas

Despite decades of research and recent progress in adaptive control and reinforcement learning, there remains a fundamental lack of understanding in designing controllers that provide robustness to inherent non-asymptotic uncertainties arising from models estimated with finite, noisy data. We propose a robust adaptive control algorithm that explicitly incorporates such non-asymptotic uncertainties into the control design. The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. We show through numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.


1 Introduction

Recent high-profile successes and the resulting hype in machine learning and reinforcement learning are generating renewed interest in adaptive control and system identification, which have their own decades-long histories [aastrom2013adaptive, ljung1998system]. Classical work on adaptive control and system identification largely focused on asymptotics, including stability, consistency, and asymptotic variance. Emerging research at the intersection of learning and control shifts focus to non-asymptotic statistical analyses, including regret and sample efficiency in various adaptive control and learning algorithms [abbasi2011regret, dean2018regret, Dean2019].

In both classical and emerging work in learning and control, robustness has been a key issue. The classical focus on asymptotics led to a strong emphasis on certainty-equivalent adaptive control, where the inevitable uncertainty in model estimates is ignored for control design, which unsurprisingly can lead to a serious lack of robustness. It remains poorly understood how best to interface both asymptotic and non-asymptotic uncertainty descriptions of model estimates with robust control design methods. One of the main difficulties is a mismatch in uncertainty descriptions: system identification almost universally uses stochastic data models (a notable exception is set membership identification [milanese1991optimal], which maintains a set of models that could have produced the data within some error bound and interfaces somewhat more naturally with certain traditional robust control design methods), whereas robust control traditionally uses set-based descriptions and worst-case design. This can lead to unnecessary conservatism when the assumed model uncertainty sets are poorly aligned with the inherent statistical model uncertainty.

The contributions of the present work are as follows:

  1. We propose a robust adaptive control algorithm where the model uncertainty description and robust control design method both use stochastic uncertainty representations.

  2. We show via numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.

The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. This approach provides a natural interface between two widely used and highly effective methods from statistics and optimal control theory (namely, bootstrap sample variance and LQR). It is known that certainty equivalent adaptive control can achieve asymptotic optimality and statistical efficiency with an order-optimal rate [chen2012identification, kumar2015stochastic, mania2019certainty]. However, neither of these results implies anything about non-asymptotic optimality or robustness.

2 Problem Formulation

We consider adaptive control of the discrete-time linear dynamical system $x_{t+1} = A x_t + B u_t + w_t$, where $x_t \in \mathbb{R}^n$ is the system state, $u_t \in \mathbb{R}^m$ is the control input, and $w_t$ is i.i.d. process noise with zero mean and covariance matrix $W$. The system matrices $(A, B)$ are assumed unknown, so an adaptive controller is to be designed based only on state-input trajectory data $\{x_s, u_s\}$, $s = 0, 1, \ldots, t$. We consider the linear quadratic adaptive optimal control problem

$$\underset{\pi \in \Pi}{\text{minimize}} \quad \mathbb{E} \left[ \sum_{t=0}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t - c^* \right) \right], \tag{1}$$

where $Q$ and $R$ are cost matrices, and the optimization is over (measurable) history-dependent feedback policies $\pi = \{\pi_t\}$ with $u_t = \pi_t(x_0, u_0, \ldots, x_t)$. The constant $c^*$ in the stage costs represents the optimal infinite-horizon average steady-state cost when the system matrices are known, which results from a static linear state feedback $u_t = K^* x_t$, whose gain matrix $K^*$ can be computed via several known methods, including value iteration, policy iteration, and semidefinite programming. This constant gives the stage cost an interpretation as regret.

The finite-horizon objective in (1) emphasizes the non-asymptotic performance of the adaptive controller. This stands in contrast to the majority of classical work on adaptive control, which tends to focus on asymptotic performance and stability. Note that regret is a random variable that ideally should be small, and there are various ways to measure its size, including expected regret, regret variance, and measures of regret risk, such as value at risk or conditional value at risk.

It has long been known that this problem can be solved in principle by redefining the state as the (infinite-dimensional) joint conditional distribution over the original state and unknown model parameters and applying dynamic programming [bellman1961adaptive]. However, this approach is intractable even for the most trivial instances. Since computing the optimal policy exactly appears to be intractable, we instead aim to design a computationally implementable controller with good performance and robustness properties, i.e., one that achieves both small expected regret and small regret risk. In particular, our algorithm accounts for uncertainty in various directions by modeling it as multiplicative noise, in contrast to the isotropic robustness afforded by the system-level synthesis in [dean2018regret, Dean2019]. We compare with a certainty equivalent adaptive controller, where uncertainty is ignored and a controller is designed as if the point model estimates were exact.

3 Preliminaries: Multiplicative Noise LQR

Here we will represent parameter uncertainty in model estimates stochastically using covariance matrices estimated from bootstrap resampling of finite data records. This representation interfaces quite naturally with a variant of the linear quadratic regulator that incorporates multiplicative noise, which has a long history in control theory but is far less widely known than its additive noise counterpart [Wonham1967, gravell2019learning]. Consider the linear quadratic regulator problem with dynamics perturbed by multiplicative noise

$$\begin{aligned}
\underset{\pi \in \Pi}{\text{minimize}} \quad & \mathbb{E}_{x_0, \{\bar{A}_t\}, \{\bar{B}_t\}} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t \right), \\
\text{subject to} \quad & x_{t+1} = (A + \bar{A}_t) x_t + (B + \bar{B}_t) u_t,
\end{aligned} \tag{2}$$

where $\bar{A}_t$ and $\bar{B}_t$ are i.i.d. zero-mean random matrices with a joint covariance structure over their entries governed by the covariance matrices $\Sigma_A$ and $\Sigma_B$, which quantify uncertainty in the nominal system matrices and can be estimated from data using bootstrap methods.

Just as in additive noise LQR problems, dynamic programming can be used to show that the optimal policy is linear state feedback $u_t = K x_t$. Given the problem data $(A, B, \Sigma_A, \Sigma_B, Q, R)$, the optimal quadratic cost matrix $P$ is given by the solution of the generalized Riccati equation

$$P = Q + A^\top P A + \sum_{i=1}^{n^2} \alpha_i A_i^\top P A_i - A^\top P B \left( R + B^\top P B + \sum_{j=1}^{nm} \beta_j B_j^\top P B_j \right)^{-1} B^\top P A, \tag{3}$$

and the associated optimal gain matrix is $K = -\left( R + B^\top P B + \sum_{j=1}^{nm} \beta_j B_j^\top P B_j \right)^{-1} B^\top P A$, where $\{\alpha_i, A_i\}$ and $\{\beta_j, B_j\}$ are the eigenvalues and reshaped eigenvectors of $\Sigma_A$ and $\Sigma_B$, respectively. The optimal cost and policy can be computed via value iteration, policy iteration, or semidefinite programming [ElGhaoui1995, gravell2019learning]. Note that, like traditional robust control but unlike additive noise LQR, the optimal cost matrix and control gain depend explicitly on the model uncertainty.
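To make the computation concrete, the following is a minimal sketch (not the authors' implementation) of solving the generalized Riccati equation (3) by value iteration in Python with NumPy. The decomposition of $\Sigma_A$ and $\Sigma_B$ into eigenvalue/reshaped-eigenvector pairs follows the description above; treating a failure of value iteration to converge as a sign that mean-square stabilization fails is our own simplification.

```python
import numpy as np

def mult_noise_riccati(A, B, SigmaA, SigmaB, Q, R, iters=5000, tol=1e-9):
    """Solve the generalized Riccati equation (3) by value iteration."""
    n, m = B.shape
    # Eigenvalues and reshaped eigenvectors of the covariance matrices.
    # The reshape uses row-major vec(), which must match the convention
    # used when the covariances were estimated (see the bootstrap sketch).
    alphas, VA = np.linalg.eigh(SigmaA)
    betas, VB = np.linalg.eigh(SigmaB)
    As = [VA[:, i].reshape(n, n) for i in range(n * n)]
    Bs = [VB[:, j].reshape(n, m) for j in range(n * m)]
    P = Q.copy()
    for _ in range(iters):
        # Multiplicative-noise correction terms in (3).
        SA = sum(a * Ai.T @ P @ Ai for a, Ai in zip(alphas, As))
        SB = sum(b * Bj.T @ P @ Bj for b, Bj in zip(betas, Bs))
        G = R + B.T @ P @ B + SB
        P_next = Q + A.T @ P @ A + SA \
                 - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    else:
        raise RuntimeError("value iteration did not converge; the system "
                           "may not be mean-square stabilizable")
    # Recompute the gain at the converged fixed point.
    SB = sum(b * Bj.T @ P @ Bj for b, Bj in zip(betas, Bs))
    K = -np.linalg.solve(R + B.T @ P @ B + SB, B.T @ P @ A)
    return P, K
```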

4 Robust Adaptive Control via Bootstrapping and Multiplicative Noise

Our robust adaptive control algorithm is summarized in Figure 1 and Algorithm 1. The algorithm has three main components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal LQR with multiplicative noise.


Figure 1: Block diagram of our robust adaptive control algorithm.
0:  exploration time $T_0$, input excitation covariance $\Sigma_u$, number of bootstrap resamples $N_b$, model uncertainty scaling parameter $\gamma$, cost matrices $Q$, $R$
1:  Initialize the state $x_0$
2:  for $t = 0, 1, 2, \ldots$ do
3:     if $t < T_0$ then
4:        Draw pure exploration input $u_t \sim \mathcal{N}(0, \Sigma_u)$
5:        Apply $u_t$ and observe $x_{t+1}$
6:     else
7:        Estimate the nominal model $(\hat{A}_t, \hat{B}_t)$ by least squares (5)
8:        Compute residuals of the data under the nominal model
9:        Estimate model covariances $(\hat{\Sigma}_A, \hat{\Sigma}_B)$ by Algorithm 2
10:       Compute the gain $K_t$ by Algorithm 3 with scaling $\gamma$
11:       Draw exploration noise $e_t$ whose magnitude fades with the estimated model uncertainty
12:       Apply $u_t = K_t x_t + e_t$ and observe $x_{t+1}$
13:    end if
14:  end for
Algorithm 1 Robust Adaptive Control
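The following is a schematic sketch of the loop in Algorithm 1, assuming a hypothetical `env` object whose `reset` and `step` methods implement the true dynamics $x_{t+1} = A x_t + B u_t + w_t$. The helper functions `least_squares_estimate`, `bootstrap_covariances`, and `mult_noise_lqr` are the sketches given in the following subsections, and the specific rule used to fade the exploration noise with the bootstrap uncertainty is a placeholder of our own, not a prescription of the paper.

```python
import numpy as np

def robust_adaptive_control(env, T, T0, Sigma_u, Nb, gamma, Q, R, rng):
    """Schematic version of Algorithm 1 (helper names are our own)."""
    m = Sigma_u.shape[0]
    xs, us = [env.reset()], []            # x_0 and the input history
    for t in range(T):
        if t < T0:
            # Pure exploration phase.
            u = rng.multivariate_normal(np.zeros(m), Sigma_u)
        else:
            # (1) Nominal model by least squares (Sec. 4.1).
            Ahat, Bhat = least_squares_estimate(xs, us)
            # (2) Bootstrap covariances of the estimate (Algorithm 2).
            SigmaA, SigmaB = bootstrap_covariances(xs, us, Ahat, Bhat, Nb, rng)
            # (3) Robust gain from multiplicative noise LQR (Algorithm 3).
            P, K = mult_noise_lqr(Ahat, Bhat, gamma * SigmaA,
                                  gamma * SigmaB, Q, R)
            # Exploration noise fading with the bootstrap uncertainty
            # (placeholder rule).
            fade = np.sqrt(np.trace(SigmaA) + np.trace(SigmaB))
            u = K @ xs[-1] + fade * rng.multivariate_normal(np.zeros(m), Sigma_u)
        us.append(u)
        xs.append(env.step(u))            # x_{t+1} = A x_t + B u_t + w_t
    return xs, us
```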

4.1 Least Squares Estimation for the Nominal Model

The first component of the algorithm is a standard least-squares estimator for the unknown system matrices from state-input trajectory data. In particular, at time $t$, from data $\{x_s, u_s\}_{s=0}^{t}$, we form the estimate

$$(\hat{A}_t, \hat{B}_t) = \underset{A, B}{\arg\min} \sum_{s=0}^{t-1} \left\| x_{s+1} - A x_s - B u_s \right\|_2^2. \tag{4}$$

More explicitly, defining the data matrices $X_t = \begin{bmatrix} x_1 & \cdots & x_t \end{bmatrix}$ and $Z_t = \begin{bmatrix} x_0 & \cdots & x_{t-1} \\ u_0 & \cdots & u_{t-1} \end{bmatrix}$, the least squares estimate can be written as

$$\begin{bmatrix} \hat{A}_t & \hat{B}_t \end{bmatrix} = X_t Z_t^\top \left( Z_t Z_t^\top \right)^{-1}. \tag{5}$$

A non-degenerate model estimate is obtained only when $Z_t Z_t^\top$ is invertible, so learning is divided into a pure exploration phase until a user-specified time $T_0$ and subsequently an exploration-exploitation phase, where the estimated model is used to design a control policy. The exploration component of the input signal is i.i.d. Gaussian noise with user-specified covariance matrix $\Sigma_u$; it is well known that for any positive definite $\Sigma_u$ the least-squares estimator is consistent under our modeling assumptions. The exploration noise is designed to fade out with the bootstrap-estimated model uncertainty, which yields asymptotic optimality. Under mild assumptions, the least-squares estimator can be implemented recursively to significantly simplify the repeated computation in (5) as data arrives (e.g., [simon2006optimal]).
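A minimal sketch of the batch estimator (5), assuming the state list `xs` holds $x_0, \ldots, x_t$ and the input list `us` holds $u_0, \ldots, u_{t-1}$: stack the regressors $z_s = [x_s; u_s]$ and targets $x_{s+1}$, then solve the resulting linear least-squares problem. Using `lstsq` rather than an explicit inverse also degrades gracefully before $Z_t Z_t^\top$ becomes invertible.

```python
import numpy as np

def least_squares_estimate(xs, us):
    """Batch least squares (5): returns (Ahat, Bhat)."""
    n = xs[0].shape[0]
    X = np.stack(xs[1:], axis=1)                       # X_t = [x_1 ... x_t]
    Z = np.vstack([np.stack(xs[:-1], axis=1),          # x_0 ... x_{t-1}
                   np.stack(us, axis=1)])              # u_0 ... u_{t-1}
    # Solve Z^T Theta^T = X^T for Theta = [Ahat Bhat].
    ThetaT, *_ = np.linalg.lstsq(Z.T, X.T, rcond=None)
    Theta = ThetaT.T
    return Theta[:, :n], Theta[:, n:]
```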

4.2 Bootstrap Resampling to Quantify Non-Asymptotic Model Uncertainty

There are inevitably errors in the least-squares estimate obtained from any finite data record, due to the process noise affecting the system dynamics. Because of dependence in the time series data, it is unfortunately not straightforward to analytically characterize non-asymptotic uncertainty in the least-squares estimate using standard statistical techniques. Therefore, to quantify non-asymptotic uncertainty in the model estimate, we propose a time series bootstrap resampling procedure. There are three broad classes of bootstrap methods for time series [hardle2003bootstrap]: (1) parametric bootstrap; (2) semi-parametric bootstrap with resampled residuals; (3) non-parametric bootstrap with block resampling. In parametric and semi-parametric methods, bootstrap data are simulated from the nominal model with the process noise sampled i.i.d. with replacement, either from an assumed distribution or from residuals calculated with the nominal model; dependence in the data is preserved by construction. In non-parametric methods, overlapping time blocks of consecutive data are sampled from the original data to preserve dependence. For definiteness, a semi-parametric bootstrap with resampled residuals for the least-squares estimator discussed above is summarized in Algorithm 2.

0:  trajectory data $\{x_s, u_s\}_{s=0}^{t}$, nominal model estimate $(\hat{A}, \hat{B})$, residuals $\hat{w}_s = x_{s+1} - \hat{A} x_s - \hat{B} u_s$, $s = 0, \ldots, t-1$, number of bootstrap resamples $N_b$
1:  for $k = 1, \ldots, N_b$ do
2:     Generate data $\hat{x}_{s+1}^{(k)} = \hat{A} \hat{x}_s^{(k)} + \hat{B} u_s + \hat{w}_s^{(k)}$, $s = 0, \ldots, t-1$, where $\{\hat{w}_s^{(k)}\}$ is an i.i.d. resample with replacement from the residuals $\{\hat{w}_s\}$
3:     Compute $(\hat{A}^{(k)}, \hat{B}^{(k)})$ by least squares (5) on the resampled data
4:  end for
5:  Output: bootstrap sample covariance $\hat{\Sigma}_A$ of $\{\hat{A}^{(k)}\}$ and bootstrap sample covariance $\hat{\Sigma}_B$ of $\{\hat{B}^{(k)}\}$
Algorithm 2 Semi-Parametric Bootstrap
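A minimal sketch of Algorithm 2, reusing `least_squares_estimate` from above: resample residuals with replacement, resimulate a trajectory under the nominal model with the recorded inputs, re-estimate, and take sample covariances of the vectorized estimation errors. The row-major `flatten` here must match the reshape convention in the Riccati solver sketch; the helper name and signature are our own.

```python
import numpy as np

def bootstrap_covariances(xs, us, Ahat, Bhat, Nb, rng):
    """Semi-parametric bootstrap covariances of the model estimate."""
    t = len(us)
    resid = [xs[s + 1] - Ahat @ xs[s] - Bhat @ us[s] for s in range(t)]
    dA, dB = [], []
    for _ in range(Nb):
        # Resimulate under the nominal model with resampled residuals.
        xb = [xs[0]]
        for s in range(t):
            w = resid[rng.integers(t)]    # i.i.d. resample with replacement
            xb.append(Ahat @ xb[-1] + Bhat @ us[s] + w)
        Ak, Bk = least_squares_estimate(xb, us)
        dA.append((Ak - Ahat).flatten())  # row-major vec() of the error
        dB.append((Bk - Bhat).flatten())
    # Sample covariances over the Nb resamples (requires Nb >= 2).
    SigmaA = np.cov(np.stack(dA, axis=1))   # shape (n^2, n^2)
    SigmaB = np.cov(np.stack(dB, axis=1))   # shape (nm, nm)
    return SigmaA, SigmaB
```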

4.3 Multiplicative Noise LQR

The least squares estimator and bootstrap provide both a nominal estimate of the system model and an estimate of the covariance of the nominal model error. These quantities provide precisely the input data needed to compute an optimal policy for the LQR problem with multiplicative noise from the generalized Riccati equation (3). This policy is known to provide robustness to uncertainties in the parameters of the nominal model [bernstein1986robust]. Furthermore, the uncertainty in the nominal model estimate used in this control design method is richly structured and derived directly from the finite available data.

We introduce a parameter $\gamma \ge 0$ which provides a fixed scaling of the model uncertainty. Note that $\gamma = 0$ corresponds to certainty equivalent adaptive control, and as $\gamma$ increases, more weight is placed on uncertainty in the nominal model. Existence of a solution to the generalized Riccati equation (3) depends not just on stabilizability of the nominal system $(\hat{A}, \hat{B})$, but also on mean-square stabilizability of the multiplicative noise system. When the multiplicative noise variances are too large, it may be impossible to stabilize the system in the mean-square sense. In this case, we scale down the model variances at each time step as necessary to compute a mean-square stabilizing control gain via bisection; see Algorithm 3.

In particular, we verify that the system with the specified $\gamma$ is mean-square stabilizable by checking whether the generalized Riccati equation (3) admits a positive semidefinite solution; if not, we find the largest feasible scaling of the variances via bisection (e.g., [burden1978numerical]).

0:  nominal model matrices $\hat{A}$, $\hat{B}$, cost matrices $Q$, $R$, multiplicative noise scaling $\gamma$ and covariances $\hat{\Sigma}_A$, $\hat{\Sigma}_B$, bisection tolerance $\epsilon$
1:  Find the largest scaling $c \in (0, 1]$ via bisection (to tolerance $\epsilon$) such that the generalized Riccati equation (3) with covariances $c \gamma \hat{\Sigma}_A$, $c \gamma \hat{\Sigma}_B$ admits a feasible solution $P \succeq 0$
2:  Output: cost matrix $P$, gain matrix $K$
Algorithm 3 Multiplicative Noise LQR
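A minimal sketch of Algorithm 3, wrapping the Riccati solver from Section 3: bisect on a scaling $c \in (0, 1]$ of the (already $\gamma$-scaled) covariances, treating a convergence failure of value iteration as infeasibility. The fallback to the certainty equivalent gain at $c = 0$ and the choice of exceptions caught are our own simplifications.

```python
import numpy as np

def mult_noise_lqr(Ahat, Bhat, SigmaA, SigmaB, Q, R, tol=1e-3):
    """Algorithm 3: largest feasible covariance scaling via bisection."""
    def try_solve(c):
        try:
            return mult_noise_riccati(Ahat, Bhat, c * SigmaA, c * SigmaB, Q, R)
        except (RuntimeError, np.linalg.LinAlgError):
            return None                  # treat non-convergence as infeasible

    sol = try_solve(1.0)
    if sol is not None:
        return sol                       # full specified uncertainty is feasible
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if try_solve(mid) is not None:
            lo = mid                     # feasible: push the scaling up
        else:
            hi = mid                     # infeasible: back off
    if lo > 0.0:
        return try_solve(lo)
    # Fall back to the certainty equivalent design (c = 0).
    return mult_noise_riccati(Ahat, Bhat, 0.0 * SigmaA, 0.0 * SigmaB, Q, R)
```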

5 Numerical Experiments

For brevity, we abbreviate “certainty-equivalent” as “CE” and “robustness via multiplicative noise” as “RMN”. To evaluate the performance of the proposed RMN algorithm relative to CE control, we performed Monte Carlo sampling to estimate the distribution of several key quantities: instantaneous regret, model error, and multiplicative noise variances.

The instantaneous regret is heavy-tailed in the sense that the effect of outliers with non-negligible probability is significant: some exceptionally poor sequences of model estimates induce extremely high costs relative to the median. For this reason, to facilitate the most direct comparison of CE and the proposed RMN approach, we train the models using both control schemes on identical offline training data. This way the effect of outlier model estimates is applied uniformly to both algorithms, since at each time the algorithms are faced with exactly the same model estimates. In an online adaptive control setting, where the training data are the actual state trajectories experienced under adaptive control, a direct comparison of the approaches is more difficult. While we observe qualitatively similar benefits of the proposed approach, our future work will study this setting with a greater number of Monte Carlo samples to reduce the effect of outliers.

The training data are generated by initializing the state at the origin, applying random controls distributed according to a standard Gaussian distribution (zero-mean, identity-covariance), and simulating the evolution of the state with the additive process noise specified by the problem data. The resulting training data are a set of state trajectories $\{x_t^{(k)}\}$ and input trajectories $\{u_t^{(k)}\}$. Model and uncertainty estimates are generated at each time $t$ according to Algorithm 1 using training data only up to time $t$. The optimal cost is empirically calculated by averaging, over all Monte Carlo samples, the cost incurred by trajectories under optimal control for each additive noise realization, i.e.,

$$c_t^* = \frac{1}{N_s} \sum_{k=1}^{N_s} \left( x_t^{*,(k)\top} Q x_t^{*,(k)} + u_t^{*,(k)\top} R u_t^{*,(k)} \right),$$

where

$$x_{t+1}^{*,(k)} = A x_t^{*,(k)} + B u_t^{*,(k)} + w_t^{(k)}, \qquad u_t^{*,(k)} = K^* x_t^{*,(k)}.$$

The empirical cost under adaptive control is calculated similarly, without averaging over Monte Carlo samples. Instantaneous regret is calculated by subtracting the empirical cost under optimal control from the empirical cost under adaptive control for each scheme, i.e., $r_t = c_t - c_t^*$.
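As a concrete illustration of this bookkeeping, the following sketch (our own, with hypothetical container layouts) computes the empirical optimal stage cost by Monte Carlo averaging over optimal trajectories sharing the same noise realizations, and the instantaneous regret of an adaptive trajectory against that baseline.

```python
import numpy as np

def stage_cost(x, u, Q, R):
    """Quadratic stage cost x'Qx + u'Ru."""
    return x @ Q @ x + u @ R @ u

def instantaneous_regret(x_adapt, u_adapt, opt_trajs, Q, R, t):
    """Regret at time t: adaptive stage cost minus the empirical optimum.

    opt_trajs is a list of (x_star, u_star) trajectory pairs simulated
    under u = K* x with the same noise realizations w_t^{(k)}.
    """
    c_star = np.mean([stage_cost(xk[t], uk[t], Q, R) for xk, uk in opt_trajs])
    return stage_cost(x_adapt[t], u_adapt[t], Q, R) - c_star
```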

We evaluated the CE and RMN algorithms on a scalar system. The level of additive process noise is significant enough that an appreciable number of model estimates remain poor for many time steps; this is necessary to observe a difference between CE and RMN control. We simulated the system over a finite time horizon, drawing independent Monte Carlo samples and bootstrap resamples at each time step for uncertainty estimation. We used unity scaling of the multiplicative noise ($\gamma = 1$) and a fixed tolerance for the bisection used to find the largest feasible scaling of the multiplicative noise variance in the multiplicative noise LQR algorithm. We used an exploration time $T_0$ long enough to ensure the least-squares estimate is non-degenerate. The figures have truncated x-axis limits.

In Figure 2 we plot statistics of instantaneous regret using CE control and using RMN control. We are chiefly interested in performance in terms of expected regret and upper quantiles, which correspond to regret risk. Figure 2 demonstrates that the multiplicative noise control achieves much lower instantaneous regret in terms of both the mean and upper quantiles. In particular, the performance of the multiplicative noise control is many orders of magnitude better over the initial time steps for the 99.9th, 99th, and 95th quantiles. After several time steps the model estimates improve and uncertainty estimates become sufficiently small that the difference between CE and RMN control becomes insignificant. The heaviness of the regret tails is shown by the massive difference between the mean and the median.

In Figure 3 we plot statistics of the nominal model estimate errors, which are applicable to both control schemes. This shows that the least-squares estimator provides models of increasing accuracy as time goes on, as expected. In Figure 4 we plot statistics of the multiplicative noise variances using RMN control. This shows that the multiplicative noise variances accurately reflect the true model error, i.e., the bootstrap model uncertainty estimator gives reasonable estimates.

The benefits of RMN control over CE control cannot, of course, be inferred more generally from this single instance. Indeed, other preliminary numerical results on higher dimensional examples indicate that on some problem data the benefits of RMN control, and how best to select the algorithm parameters, are unclear, especially in initial stages when there is very high uncertainty in the nominal model estimates. However, there are at least some systems, like the one shown here, that are controlled with significantly lower risk using RMN control. We expect that by explicitly incorporating model uncertainty into the adaptive control design, it should be possible to realize the observed robustness benefits more broadly, which motivates further theoretical study. Code which implements the algorithms of this paper and generates the reported results is available at https://github.com/TSummersLab/robust-adaptive-control-multinoise.


Figure 2: Instantaneous regret vs time for the example system using certainty-equivalent (a) and multiplicative noise (b) control.

6 Conclusions

We proposed a robust adaptive control algorithm that uses the bootstrap to estimate the covariance of model estimates and a non-conventional multiplicative noise LQR method for robust control design. Ongoing and future work will focus on providing finite-time theoretical performance guarantees using tools from high-dimensional statistics, finding algorithm parameters that ensure uniform improvements over certainty-equivalent control for any system, and implementing the model uncertainty estimates using recursive least squares to alleviate computational burden.


Figure 3: Absolute error in estimated (a) A and (b) B matrices vs time for the example system.


Figure 4: (a) State-dependent and (b) control-dependent multiplicative noise variances vs time for the example system using multiplicative noise control.
Figure 5: Scaling of the multiplicative noise parameter vs time for the example system using multiplicative noise control.

References