## 1 Introduction

In model-based control theory, nominal system models are used to rigorously generate optimal policies and provide stability guarantees. In the presence of model uncertainties and measurement noise, however, model-based controllers may perform poorly or even become unstable. Data-driven control design techniques [heess2017emergence, lillicrap2015continuous] address this issue by removing the need for a system model and directly learning viable policies through repeated interactions with the real system. Nevertheless, stability and robustness analysis under these techniques is limited to the visited portion of the state space. Moreover, to avoid hardware damage, training may require conservative state constraints, which in turn limit the capabilities of the learning technique [ARGALL2009469, beaudoin2021structured].

Bayesian learning (BL) offers a way to exploit the learning framework of data-driven techniques while addressing the uncertainties that affect model-based techniques. A common approach is Bayesian model learning, which infers a stochastic dynamical model from a series of interactions with the real system and then designs a controller from that model [sadigh2015safe, shen2022online, pmlr-v54-linderman17a]. Other frameworks go one step further and use BL to simultaneously quantify model uncertainties and learn viable policies while imposing Lyapunov stability constraints [fan2020bayesian]. Such techniques have become common in safety-critical systems, such as space exploration vehicles and drones, which must interact and make decisions in uncertain environments with a low risk of damage.

BL also has attractive robustness properties: it provides a principled way to incorporate prior information when learning probability distributions over the model parameters, while accounting for the uncertainty in the learning process [murphy2012machine]. The results in [cardelli2019robustness] and [cardelli2019statistical] show that, in contrast to deterministic models whose parameters are given by point estimates, Bayesian models are more robust against input and parameter perturbations. Moreover, Bayesian adversarial learning techniques intentionally add uncertainties to account for the potential effects of adversaries in the learning process [wicker2021bayesian]. Such techniques have outperformed point estimates in accommodating adversaries and in robust performance [ye2018bayesian].

In this work, we bypass the need to learn a stochastic dynamical model and apply BL directly to the control search task. The goal is to learn robust controllers under system parameter and measurement uncertainties. We provide a comparative study of the robustness properties of deterministic and Bayesian solutions to the optimal control problem. The study examines the error sensitivity of the deterministic solution, which inherently assumes perfect knowledge of the system parameters and measured states. We also explore the advantages of incorporating system parameter and measurement uncertainties into the learning of a stochastic optimal controller via Bayesian learning.

## 2 Theoretical Justification for the Robustness Properties of Bayesian Learning

In this section, we demonstrate the improved robustness properties of Bayesian learning over point estimates of a policy. We start with an open-loop unstable, one-dimensional affine control system, around which we close the loop with a linear controller whose sole parameter $k$ is to be determined optimally. We solve and analyze two stochastic optimal control problems: (i) optimal control of the nominal system subject only to parameter uncertainty, where the sole system parameter $a$ is Gaussian distributed, and (ii) optimal control of the nominal system subject to both parameter and measurement uncertainties, where the control signal is corrupted by imperfect measurement of the state $x$.

We compare the optimal controller found by assuming perfect knowledge of the system parameter with one obtained from a posterior probability distribution that accounts for system uncertainties. We show that as the uncertainty grows, the two solutions can be far apart, and that marginalizing over the posterior distribution produces more robust controllers, that is, controllers that stabilize a wider range of system parameters. Our results also show that the risk of instability increases drastically as the measurement error increases, further favoring the use of Bayesian learning to infer a posterior distribution before making control decisions.

### 2.1 Optimal Control under Parameter Uncertainty

Let us consider the first-order scalar control system, whose system parameter is uncertain:

$$\dot{x}(t) = a\,x(t) + u(t) \tag{1}$$

We assume that $a \sim \mathcal{N}(\hat{a}, \sigma_a^2)$, where $\hat{a}$ designates our best prior point estimate of the system parameter and $\sigma_a$ quantifies the uncertainty in our knowledge of it. The controller $u = kx$ is linear in the state, and its only parameter $k$ is to be determined through optimization. Without loss of generality, we take the initial condition $x(0) = 1$ (the cost scales with $x(0)^2$, so the optimal $k$ does not depend on it). The performance index to be optimized for determining the best control parameter is

$$J = \int_0^T \left( q\,x^2(t) + r\,u^2(t) \right) dt \tag{2}$$

where $T$ is the control horizon and $q > 0$ and $r > 0$ are design parameters. Solving the closed-loop system (1) with $u = kx$ gives $x(t) = e^{(a+k)t}$, which we plug into the performance index (2) along with $u = kx$. Performing the integration over time and letting $T \to \infty$, under the assumption $a + k < 0$, yields the infinite-horizon cost functional

$$J(k) = -\frac{q + r k^2}{2(a + k)} \tag{3}$$

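The optimal control parameter is the appropriate root of $\partial J / \partial k = 0$, which reduces to the quadratic $r k^2 + 2 a r k - q = 0$; the root that satisfies the stability requirement $a + k < 0$ is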

$$k^* = -a - \sqrt{a^2 + \frac{q}{r}} \tag{4}$$
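As a sanity check on (3) and (4), the short Python sketch below evaluates the infinite-horizon cost for a given gain and compares the closed-form stabilizing root against a brute-force grid search. The values $a = 1$, $q = r = 1$ are hypothetical choices for illustration, not the constants used in this paper, and the function names are illustrative only.

```python
import math

def infinite_horizon_cost(k, a, q, r):
    """Cost (3) for u = k x with x(0) = 1; finite only when a + k < 0."""
    if a + k >= 0:
        return math.inf
    return -(q + r * k**2) / (2.0 * (a + k))

def optimal_gain(a, q, r):
    """Stabilizing root (4) of r k^2 + 2 a r k - q = 0."""
    return -a - math.sqrt(a**2 + q / r)

def optimal_cost(a, q, r):
    """Optimal cost obtained by substituting (4) into (3): r (a + sqrt(a^2 + q/r))."""
    return r * (a + math.sqrt(a**2 + q / r))

# Hypothetical values (not from the paper): open-loop unstable a = 1, q = r = 1.
a, q, r = 1.0, 1.0, 1.0
k_star = optimal_gain(a, q, r)

# Brute-force check: no gain on a fine grid around k_star does better.
grid = [k_star + 0.001 * i for i in range(-500, 501)]
k_grid = min(grid, key=lambda k: infinite_horizon_cost(k, a, q, r))
print(k_star, k_grid)
print(infinite_horizon_cost(k_star, a, q, r), optimal_cost(a, q, r))
```

The grid search is deliberately naive; it only confirms that the root selected by the stability condition is the minimizer of (3).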

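Inverting (4) gives $a = \dfrac{q - r k^{*2}}{2 r k^*}$, a monotone map from $k^* \in (-\infty, 0)$ onto $a \in \mathbb{R}$. Together with $a \sim \mathcal{N}(\hat{a}, \sigma_a^2)$, the change-of-variables formula implies that the optimal control parameter has the probability density function

$$p(k^*) = \mathcal{N}\!\left(\frac{q - r k^{*2}}{2 r k^*};\, \hat{a}, \sigma_a^2\right)\left(\frac{1}{2} + \frac{q}{2 r k^{*2}}\right), \qquad k^* < 0,$$

where $\mathcal{N}(\,\cdot\,; \hat{a}, \sigma_a^2)$ is the Gaussian probability density function with mean $\hat{a}$ and variance $\sigma_a^2$. We can further eliminate the control parameter from the expression for the optimal cost by substituting for $k$ from (4) into (3), yielding

$$J^*(a) = r\left(a + \sqrt{a^2 + \frac{q}{r}}\right).$$

Hence, the distribution of the optimal cost induced by the uncertainty in the system parameter is

$$p(J^*) = \mathcal{N}\!\left(\frac{J^{*2} - q r}{2 r J^*};\, \hat{a}, \sigma_a^2\right)\left(\frac{1}{2r} + \frac{q}{2 J^{*2}}\right), \qquad J^* > 0.$$

Notice that the distributions of both the optimal control parameter and the optimal cost belong to the exponential family but are not Gaussian.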
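The change-of-variables densities of the optimal gain and the optimal cost can be checked numerically. The sketch below codes both densities directly, verifies that each integrates to one, and compares a tail probability of $k^*$ against Monte Carlo samples of $a$ pushed through (4). The prior $\hat a = 1$, $\sigma_a = 0.5$ and the weights $q = r = 1$ are hypothetical illustration values, not the paper's.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def gain_pdf(k, a_hat, sigma_a, q, r):
    """Density of k* = -a - sqrt(a^2 + q/r) when a ~ N(a_hat, sigma_a^2); support k < 0."""
    if k >= 0:
        return 0.0
    a = (q - r * k**2) / (2.0 * r * k)        # inverse of (4)
    jacobian = 0.5 + q / (2.0 * r * k**2)     # |da/dk|
    return gaussian_pdf(a, a_hat, sigma_a) * jacobian

def cost_pdf(J, a_hat, sigma_a, q, r):
    """Density of J* = r (a + sqrt(a^2 + q/r)); support J > 0."""
    if J <= 0:
        return 0.0
    a = (J**2 - q * r) / (2.0 * r * J)        # inverse of J*(a)
    jacobian = 1.0 / (2.0 * r) + q / (2.0 * J**2)
    return gaussian_pdf(a, a_hat, sigma_a) * jacobian

# Hypothetical prior and weights, for illustration only.
a_hat, sigma_a, q, r = 1.0, 0.5, 1.0, 1.0

# Both densities should integrate to one (midpoint rule on a wide interval).
h = 1e-3
mass_k = sum(gain_pdf(-20.0 + (i + 0.5) * h, a_hat, sigma_a, q, r) for i in range(20000)) * h
mass_J = sum(cost_pdf((i + 0.5) * h, a_hat, sigma_a, q, r) for i in range(20000)) * h

# Monte Carlo cross-check of P(k* < -3) against the density.
random.seed(0)
samples = [-a - math.sqrt(a**2 + q / r) for a in (random.gauss(a_hat, sigma_a) for _ in range(100000))]
p_mc = sum(s < -3.0 for s in samples) / len(samples)
p_pdf = sum(gain_pdf(-20.0 + (i + 0.5) * h, a_hat, sigma_a, q, r) for i in range(17000)) * h
print(mass_k, mass_J, p_mc, p_pdf)
```

With these assumed values, $k^* < -3$ corresponds to $a > 4/3$, so both estimates should be close to the Gaussian tail probability $P(Z > 2/3) \approx 0.2525$.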

To derive some quantitative results, let us assign numerical values to the parameters $q$ and $r$ that define the cost function, to our best guess $\hat{a}$ of the system parameter, and to its standard deviation $\sigma_a$. The optimal control parameter derived for this system under the assumption that its model is known perfectly is $k^*(\hat{a})$, i.e., (4) evaluated at $a = \hat{a}$, with the corresponding estimated cost $J^*(\hat{a})$. This deterministic performance estimate turns out to be overconfident when uncertainties in the system parameter are present. If the prior knowledge of the distribution of the system parameter is used instead, the expected control parameter $\mathbb{E}[k^*]$ and the expected cost $\mathbb{E}[J^*]$ differ from these point estimates in a predictable direction: since $J^*(a) = r\big(a + \sqrt{a^2 + q/r}\big)$ is strictly convex in $a$ and $k^*(a) = -J^*(a)/r$, Jensen's inequality gives $\mathbb{E}[J^*] > J^*(\hat{a})$ and $\mathbb{E}[k^*] < k^*(\hat{a})$ for any $\sigma_a > 0$. The controller from the deterministic training/optimization is therefore not only overconfident about its performance but also less robust against modeling errors: the closed loop $\dot{x} = (a + k)x$ is stable if and only if $a < -k$, so the larger-magnitude gain from Bayesian learning yields a closed-loop stable system for a wider range of values of $a$.

Finally, Figure 1 shows the optimal control parameter distribution when the system parameter is normally distributed with mean $\hat{a}$ and standard deviation $\sigma_a$. The figure also marks the mean of the optimal control distribution with a black arrow and the optimal control parameter that a deterministic approach would yield in red. We notice that Bayesian learning, which yields the full distribution of the optimal control parameter, is more concerned about system stability because of the uncertainty in the parameter $a$, something the deterministic training cannot reason about.
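To make the overconfidence argument concrete, the following sketch computes the point-estimate gain $k^*(\hat a)$ and the prior-averaged gain $\mathbb E[k^*]$ by numerical quadrature, together with the implied costs and the prior probability that each gain stabilizes the closed loop. It relies on the identity $J^*(a) = -r\,k^*(a)$, which follows from (4) and $J^*(a) = r\big(a + \sqrt{a^2 + q/r}\big)$. The numbers $\hat a = 1$, $\sigma_a = 0.5$, $q = r = 1$ are hypothetical and only illustrate the direction of the inequalities, not the values reported in the paper.

```python
import math

def optimal_gain(a, q, r):
    return -a - math.sqrt(a**2 + q / r)       # stabilizing root (4)

def expected_under_prior(f, a_hat, sigma_a, n=4001, width=8.0):
    """E[f(a)] for a ~ N(a_hat, sigma_a^2) via a midpoint rule on +-width standard deviations."""
    lo = a_hat - width * sigma_a
    h = 2.0 * width * sigma_a / n
    total = 0.0
    for i in range(n):
        a = lo + (i + 0.5) * h
        w = math.exp(-0.5 * ((a - a_hat) / sigma_a) ** 2) / (sigma_a * math.sqrt(2.0 * math.pi))
        total += f(a) * w * h
    return total

def prob_stable(k, a_hat, sigma_a):
    """Prior probability that a + k < 0, i.e. that the closed loop with gain k is stable."""
    z = (-k - a_hat) / sigma_a
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values for illustration only (not the paper's constants).
a_hat, sigma_a, q, r = 1.0, 0.5, 1.0, 1.0

k_det = optimal_gain(a_hat, q, r)             # point-estimate ("deterministic") gain
J_det = -r * k_det                            # J*(a_hat) = -r k*(a_hat)
k_bayes = expected_under_prior(lambda a: optimal_gain(a, q, r), a_hat, sigma_a)
J_bayes = -r * k_bayes                        # E[J*] = -r E[k*]

print(f"k_det = {k_det:.4f}, P(stable) = {prob_stable(k_det, a_hat, sigma_a):.5f}, cost {J_det:.4f}")
print(f"E[k*] = {k_bayes:.4f}, P(stable) = {prob_stable(k_bayes, a_hat, sigma_a):.5f}, cost {J_bayes:.4f}")
```

Under these assumptions the prior-averaged gain has the larger magnitude, a higher prior probability of stabilizing the plant, and a larger (more honest) expected cost than the point estimate predicts.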

### 2.2 Optimal Control under Parameter Uncertainty and Measurement Noise

Consider the scenario in which the system (1) is also subject to measurement errors; that is, the measurement $y$ of the state is probabilistic and distributed according to the Gaussian $\mathcal{N}(x, \sigma_m^2)$. Since the controller $u = ky$ uses this measurement to determine its action, the closed-loop system has to be modelled as a stochastic differential equation (SDE), given by

$$dx_t = (a + k)\,x_t\,dt + k\,\sigma_m\,dW_t \tag{5}$$

where $W_t$ denotes the Wiener process [evans2012introduction]. The initial state $x_0$ is assumed deterministic and is set to unity for simplicity. The unique solution to this SDE is given by

$$x_t = e^{(a+k)t} + k\,\sigma_m \int_0^t e^{(a+k)(t-s)}\,dW_s \tag{6}$$
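The solution (6) implies $\mathbb E[x_t \mid a] = e^{(a+k)t}$ and, by the Itô isometry, $\operatorname{Var}[x_t \mid a] = k^2\sigma_m^2\,\big(e^{2(a+k)t} - 1\big)/\big(2(a+k)\big)$. The sketch below checks these moments against an Euler-Maruyama simulation of (5). The parameter values, step count, and path count are hypothetical choices for illustration.

```python
import math
import random

def simulate_terminal_states(a, k, sigma_m, t_end, n_steps, n_paths, seed=0):
    """Euler-Maruyama for dx = (a + k) x dt + k sigma_m dW with x(0) = 1; returns samples of x(t_end)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    lam = a + k
    finals = []
    for _ in range(n_paths):
        x = 1.0
        for _ in range(n_steps):
            # Brownian increment dW ~ N(0, dt).
            x += lam * x * dt + k * sigma_m * rng.gauss(0.0, math.sqrt(dt))
        finals.append(x)
    return finals

# Hypothetical parameters (illustration only): stable closed loop with a + k = -1.5.
a, k, sigma_m, t_end = 1.0, -2.5, 0.2, 1.0
xs = simulate_terminal_states(a, k, sigma_m, t_end, n_steps=200, n_paths=4000)

lam = a + k
mean_exact = math.exp(lam * t_end)                                              # mean from (6)
var_exact = (k * sigma_m) ** 2 * (math.exp(2 * lam * t_end) - 1) / (2 * lam)    # Ito isometry on (6)
mean_mc = sum(xs) / len(xs)
var_mc = sum((x - mean_mc) ** 2 for x in xs) / (len(xs) - 1)
print(mean_exact, mean_mc, var_exact, var_mc)
```

The residual mismatch comes from Monte Carlo error and the first-order time discretization; both shrink as the path count grows and the step size decreases.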

###### Lemma 1

The conditional expectation of the performance index (2) given the system parameter $a$ is

###### Proof 1

It is easily shown that this quantity is positive for all . Furthermore, it blows up as the horizon is extended to infinity. This is not surprising since a nonzero measurement noise causes the state to oscillate around the origin, rather than asymptotically converging to it, incurring nonzero cost all the while.
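For reference, one way to obtain a closed form for this conditional expectation is sketched below. It rests on an assumption not stated above: the control penalty in (2) is evaluated on $u = kx_t$ (penalizing the noisy input $ky$ directly would involve the square of white noise, whose expectation is not finite), so the running cost is $(q + rk^2)\,x_t^2$. Squaring (6), taking expectations with the Itô isometry, and integrating over $[0, T]$ gives, with $\lambda = a + k \neq 0$,

$$
\begin{aligned}
\mathbb{E}[x_t^2 \mid a] &= e^{2\lambda t} + k^2\sigma_m^2\,\frac{e^{2\lambda t}-1}{2\lambda},\\
\mathbb{E}[J \mid a] &= (q + r k^2)\int_0^T \mathbb{E}[x_t^2 \mid a]\,dt
= (q + r k^2)\left[\left(1 + \frac{k^2\sigma_m^2}{2\lambda}\right)\frac{e^{2\lambda T}-1}{2\lambda} - \frac{k^2\sigma_m^2\,T}{2\lambda}\right].
\end{aligned}
$$

Under this assumption, for $\lambda < 0$ the last term grows linearly in $T$, which is the divergence described above, and setting $\sigma_m = 0$ and letting $T \to \infty$ recovers (3).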

We have kept the system parameter $a$ constant in this analysis so far. Uncertainty over this variable can be incorporated by taking a further expectation, $\mathbb{E}_{a}\big[\mathbb{E}[J \mid a]\big]$, of the conditional expected cost over $a$, which must be computed numerically because it does not admit a closed-form expression.

We can then minimize this expected cost over the control parameter $k$ to study the effects of both kinds of uncertainties on the optimal controller. Such a study is provided in Figures 2 and 3, where we plot the optimal control parameter and the minimal expected cost as functions of the standard deviations of the measurement noise, $\sigma_m$, and of the system parameter, $\sigma_a$. The constants we used to generate the data are given by and . Our first observation is that the magnitude of the optimal control parameter is an increasing function of the system parameter uncertainty and a decreasing function of the measurement uncertainty. Our second observation is that if the measurement noise is small, the optimal control parameter is insensitive to the system parameter uncertainty as long as this uncertainty is small. The optimal cost shares this insensitivity over an even wider range of values of $\sigma_a$. In a similar vein, if the uncertainty in the system parameter is large, the optimal control parameter is insensitive to the magnitude of the measurement noise; the optimal cost, however, remains sensitive to it.
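The double expectation and the minimization over $k$ described above can be sketched as follows. The sketch assumes the running cost $(q + rk^2)\,x_t^2$ (the control penalty evaluated on $kx_t$), for which the Itô isometry applied to (6) gives $\mathbb E[J \mid a]$ in closed form; the outer expectation over a Gaussian $a$ uses a midpoint rule, and $k$ is chosen by grid search. All constants ($\hat a = 1$, $\sigma_a = 0.2$, $q = r = 1$, $T = 5$), the search interval, and the function names are hypothetical, so the output only illustrates the qualitative trend, not the paper's figures.

```python
import math

def conditional_cost(k, a, q, r, sigma_m, T):
    """E[J | a] under the assumed running cost (q + r k^2) x_t^2 with x_0 = 1."""
    lam = a + k
    s2 = (k * sigma_m) ** 2
    if abs(lam) < 1e-9:                                  # limit lam -> 0: E[x_t^2] = 1 + s2 t
        return (q + r * k**2) * (T + 0.5 * s2 * T**2)
    growth = math.expm1(2.0 * lam * T) / (2.0 * lam)     # integral of exp(2 lam t) over [0, T]
    return (q + r * k**2) * ((1.0 + s2 / (2.0 * lam)) * growth - s2 * T / (2.0 * lam))

def expected_cost(k, a_hat, sigma_a, q, r, sigma_m, T, n=401, width=6.0):
    """E_a[E[J | a]] for a ~ N(a_hat, sigma_a^2): midpoint rule on a_hat +- width * sigma_a."""
    h = 2.0 * width * sigma_a / n
    total = 0.0
    for i in range(n):
        a = a_hat - width * sigma_a + (i + 0.5) * h
        w = math.exp(-0.5 * ((a - a_hat) / sigma_a) ** 2) / (sigma_a * math.sqrt(2.0 * math.pi))
        total += conditional_cost(k, a, q, r, sigma_m, T) * w * h
    return total

def optimal_gain(a_hat, sigma_a, q, r, sigma_m, T):
    """Minimize the expected cost over a grid of gains k in [-6, -0.5] with spacing 0.02."""
    grid = [-6.0 + 0.02 * i for i in range(276)]
    return min(grid, key=lambda k: expected_cost(k, a_hat, sigma_a, q, r, sigma_m, T))

# Hypothetical constants (not the paper's): a_hat = 1, sigma_a = 0.2, q = r = 1, T = 5.
gains = {s: optimal_gain(1.0, 0.2, 1.0, 1.0, s, 5.0) for s in (0.0, 0.5, 1.0)}
print(gains)
```

With these assumed values, the gain without measurement noise lands near the noise-free optimum (4) at $\hat a$, and its magnitude shrinks once measurement noise is added, in line with the first observation above. Sweeping `sigma_a` instead of `sigma_m` in the same loop gives the other axis of the study; larger `sigma_a` would need a wider quadrature window, because the unstable tail of the prior dominates the finite-horizon cost.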

#### Key Takeaways

The analyses in this section point to several advantages of employing Bayesian learning to find the optimal control parameter $k$. Recall that the optimal control parameter and cost derived for this system under the assumption of a perfectly known model without measurement noise are $k^*(\hat{a})$ from (4) and $J^*(\hat{a}) = r\big(\hat{a} + \sqrt{\hat{a}^2 + q/r}\big)$. This deterministic performance estimate is greatly overconfident when uncertainties in the system parameter and measurement are present. For example, if and then the expected cost with this controller parameter is, in fact, and the overconfidence is an increasing function of both kinds of uncertainties.

The deterministic optimal controller does yield a control parameter that stabilizes the system for a range of values of the system parameter $a$: the closed loop $\dot{x} = \big(a + k^*(\hat{a})\big)x$ is stable whenever $a < \hat{a} + \sqrt{\hat{a}^2 + q/r}$. In the numerical example, even if our best belief $\hat{a}$ is erroneous by , this controller will stabilize the system, assuming no measurement noise.

However, consider the situation in which we are fairly certain about the system parameter. Then, in the presence of measurement noise, this control parameter induces a much larger expected cost than the optimal control parameter that Bayesian learning yields, which has a smaller magnitude (i.e., ). On the flip side, if the system parameter uncertainty is large (e.g. ), Bayesian learning again accounts for this fact and yields a more robust control parameter. In this case, the situation is reversed: the Bayesian control parameter has the larger magnitude and therefore stabilizes the system for a wider range of true values of the system parameter than the deterministic one does. When both measurement and system parameter uncertainties are present, Bayesian learning strikes the trade-off between these competing uncertainties and yields a control parameter that results in a much better expected cost than deterministic optimization does.

Finally, Figure 4 shows the Laplace (Gaussian) approximation to the optimal control parameter distribution when the measurement noise has standard deviation $\sigma_m$ and the system parameter follows a truncated normal distribution with mean $\hat{a}$, standard deviation $\sigma_a$, and lower and upper bounds . The figure also marks the mean of the optimal control distribution with a black arrow and the optimal control parameter that a deterministic approach would yield in red. Again, Bayesian learning, which yields the full distribution of the optimal control parameter, is more concerned about system stability because of the uncertainty in the parameter $a$, something the deterministic training cannot reason about.
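As a generic illustration of the Laplace step (not a reproduction of Figure 4, which concerns the noisy case with a truncated prior), the sketch below applies it to the simpler noise-free density $p(k^*)$ of Section 2.1: it locates the mode of $\log p(k^*)$ by grid search and uses the curvature there, estimated by a central finite difference, as the negative inverse variance of the approximating Gaussian. The prior $\hat a = 1$, $\sigma_a = 0.2$, the weights $q = r = 1$, and the helper names are hypothetical.

```python
import math

def log_gain_density(k, a_hat, sigma_a, q, r):
    """log p(k*) from the change of variables in Section 2.1; valid for k < 0."""
    a = (q - r * k**2) / (2.0 * r * k)                  # inverse of (4)
    log_gauss = -0.5 * ((a - a_hat) / sigma_a) ** 2 - math.log(sigma_a * math.sqrt(2.0 * math.pi))
    return log_gauss + math.log(0.5 + q / (2.0 * r * k**2))   # plus log |da/dk|

def laplace_approximation(logpdf, lo, hi, step=1e-3):
    """Gaussian (Laplace) approximation: mode by grid search, variance = -1 / (d^2 log p / dk^2)."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    mode = max(grid, key=logpdf)
    h = step
    curvature = (logpdf(mode + h) - 2.0 * logpdf(mode) + logpdf(mode - h)) / h**2
    return mode, math.sqrt(-1.0 / curvature)

# Hypothetical prior and weights, for illustration only.
a_hat, sigma_a, q, r = 1.0, 0.2, 1.0, 1.0
mode, std = laplace_approximation(lambda k: log_gain_density(k, a_hat, sigma_a, q, r), -6.0, -0.5)
k_det = -a_hat - math.sqrt(a_hat**2 + q / r)            # point-estimate gain from (4)
print(mode, std, k_det)
```

For a narrow prior the approximation is tight: its mode sits close to the point-estimate gain $k^*(\hat a)$, and its standard deviation is close to $|dk^*/da|\,\sigma_a$ evaluated at $a = \hat a$. Wider priors skew $p(k^*)$ toward large-magnitude gains, which a single Gaussian captures only roughly.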

## 3 Conclusion

We solved the optimal control problem in both the deterministic and the Bayesian settings. The optimal control parameter changes sharply as a function of the uncertainties in the system parameters and measurements. This encourages the use of the Bayesian approach, which takes this variation into account and produces controllers that yield closed-loop stable systems for a wider range of system parameters. These findings motivate Bayesian solutions to control search problems and pave the way for integrating Bayesian learning into data-driven control design techniques.