Parameter estimation in complex dynamical systems is a challenging task. In the autonomous sector, accurate parameter estimation is crucial for reliable operation and control in many safety-critical applications, e.g., search and rescue, transport delivery [autonomous_applications1], [autonomous_applications2]. Model uncertainties and inaccuracies lead to undesirable behavior in such systems with the potential of endangering health and lives. The problem is further compounded due to the evolving nature of dynamical systems. Regressor based methods, framed as inverse problems, are often used for estimating unknown parameters [parameter_inverse_aster2018]
. However, they require a lot of data and are typically limited only to offline settings. Classical approaches in control theory typically use a model reference and estimate unknown parameters online using adaptation laws as ordinary differential equations[adaptive_control_aastrom2013]. However, parameter drift is a key bottleneck in such techniques. In this paper, we propose a novel method of estimating parameters online based on a Bayesian optimization framework using Gaussian processes (GPs).
Parameter estimation is ubiquitous in science. A popular approach for estimating unknown parameters relies on inverse modeling [parameter_inverse_aster2018]. Generally, in these methods a fitting optimization is performed based on numerical data for estimating unknown parameters. Inverse modeling techniques were used to estimate soil parameters as a multiobjective optimization problem [param_soil_mertens2006] and for parameters determining flexible hydraulic functions [param_hydraulic_durner1999]. However, these methods require a lot of data (which can be expensive), result in poor fitting without sufficient data, do not guarantee any global optima, or suffer from numerical instability. Moreover, techniques are limited to the offline setting which hinders their applicability for safety-critical systems.
Classical methods in control theory estimate parameters online using a reference model. With a nominal model of the plant running in the background, either controller gains can be updated (direct method) or plant parameters can be estimated first followed by updating the control laws (indirect method) [adaptive_control_aastrom2013]. A composite approach combines both the methods giving better plant and controller estimates [adaptive_quadrotor_dydek2012]. A problem arises when adaptation gains need to be very fast and high to deal with uncertainties. This leads to high frequency oscillations in the system amplifying noise or disturbances. adaptive control deals with fast adaptation by decoupling robustness and performance using a low-pass filter [l1_adaptive_hovakimyan2010]. A key challenge well known in adaptive control is parameter drift during estimation [adaptive_control_failures_anderson2005]. To address this challenge, persistency of excitation (PE) is required. Dynamical regressor extension and mixing was proposed for estimating parameters with convergence without requiring PE [param_dynamic_regressor_aranovskiy2016]
. However, this has many tunable degrees of freedom for ensuring convergence under strict conditions, such as, non-square integrable determinant (for linear regressors) and monotonicity (for nonlinear regressors).
With recent advances in machine learning, optimization algorithms have been developed that require no manual tuning. Bayesian optimization (BO) finds the global optima for a given cost function with minimal evaluations without making any gradient assumptions[tutorial_bo_brochu2010]
. It models the underlying cost function as a GP, and exploits its posterior variance for informative samples to reach the global optima. Another major advantage is that it explicitly models noise, which is pervasive in any practical system, without compromising the ability to reach the global optima. BO is most popularly used for hyperparameter tuning of learning algorithms[practical_bo_ml_snoek2012]. BO also found its applicability in dynamical settings for gait optimization of legged robots [bo_gait_lizotte2007], and controller optimization in snake-robot [bo_snake_matthew2011] and quadrotor [bo_quadrotor_berkenkamp2016] with very few evaluations. However, only controller gains were optimized instead of identifying unknown parameters which may contain important diagnostic information. For instance, estimating an unexpected change in mass allows not only controller modification but also necessary changes in path planning if desired, versus simply optimizing the controller gain while being agnostic to the root cause of the change. As far as we know, BO was used for estimating parameters only in arterial haemodynamic setting [bo_haemodynamics_paris2016]; but restricted to offline case and systems which are not safety-critical.
Our key contributions
are the following. First, we perform parameter estimation online for safety-critical systems using GPs. The proposed paradigm leverages Bayesian statistics for training correlated surrogates of the underlying cost function while modeling noise to reach the global optima. To the best of our knowledge, BO was not used for parameter estimation online for safety-critical systems. Second, we model noise while reaching the global optima with very few evaluations without making gradient approximations. This addresses realistic scenarios since noise is pervasive and computing the gradient is always not possible. Third, the parameters estimated from the optimization routine are iteratively used to reconfigure the controller for a dynamically evolving system. This eliminates parameter drifting, since convergence is achieved iteratively to the global optima, while not having to invoke any PE condition.
Ii Background Preliminaries
In this section, we provide important preliminaries surrounding GP regression and BO that is used throughout the paper.
Ii-a Gaussian Process Regression (GPR)
We are interested in learning an underlying latent function , for which we assume noisy observations given by, , where . Given a set of
training points, with input vectors, and scalar noisy observations , we compose the dataset: , where and . A GP places a distribution on the function
, treating it as random variables associated with different values of
, any finite number of which produces a consistent joint Gaussian distribution[Rasmussen2003_gpml].
A GP can be fully specified by its mean function and covariance function . The latter is also called the kernel, measuring similarity between any two inputs . GPs can be used to predict the function value, , for an arbitrary query point , by conditioning on previous observations. The posterior predictive mean and variance are then given by [Rasmussen2003_gpml]:
where is the covariance between the input points in and query point , has entries , is the covariance matrix between pairs of input points, and
is the identity matrix. The hyperparameters of a GP depend on the kernel choice and can be problem-dependent. We refer the reader to[Rasmussen2003_gpml] for a review of different kernels. The values of the hyperparameters that best suit the particular dataset can be derived by maximizing the log marginal likelihood using quasi-Newton based methods , e.g., L-BFGS [Rasmussen2003_gpml].
Ii-B Bayesian Optimization (BO)
Now consider another unknown function , whose global optima we are interested in. The objective of BO is to find the global optima of such an unknown function. The function could represent the cost function or the performance measure of a system which could be expensive to evaluate. Since is unknown, finding its global optima is an impossible task. Hence, several assumptions are in order.
The objective function is typically considered to be a black box function; we may not have its analytical expression or derivatives. Bounds are placed on the possible search space to find the optima. Evaluation of the function is restricted to querying at a sample and observing a (noisy) estimate. With these assumptions in mind, a surrogate predictor replaces the expensive function (see II-A
) . GPs are popular surrogate predictors among several candidates, such as neural networks or radial basis functions, due to its Bayesian non-parametric nature. GPs also offer a flexible prior over function, robust analytical tractability, and high probabilistic guarantees[Rasmussen2003_gpml].
In general, BO models the unknown function using the GP posterior mean given by (1). GP’s probabilistic structure enables an efficient search strategy for the next sampling point by exploiting its predictive variance (2
). This gives rise to what is called an acquisition function. By efficiently searching over the space using the acquisition function, exploration-exploitation trade-off is addressed to find the global optima. There are several acquisition functions to choose from, e.g., probability of improvement (PI), expected improvement (EI), lower-confidence bound (LCB). We refer the reader to[tutorial_bo_brochu2010] for detailed analysis of different acquisition functions and their tradeoffs. Figure 1 demonstrates the sequential nature of BO for finding the global minima of an unknown function using EI acquisition function.
Iii Parameter Estimation using Gaussian Process
Here, we describe our proposed strategy for estimating unknown parameters for a dynamical system using GPs and BO.
Iii-a Bayesian Optimization Formulation
We consider a nonlinear, continuous-time system,
where is the state, is the control input, and are the system parameters at time . We assume a nominal control policy exists, where are the controller gains and are the nominal system parameters at time . is designed to drive to the zero-equilibrium stable point under nominal conditions. Ideally, is designed to be nominally close to . However, can change, thereby degrading the controller performance since it depends on . Hence, the controller’s must be updated to better approximate .
The goal is to estimate the time-varying and reconfigure the nominal controller’s with the new estimate online. To learn the unknown estimate , we construct a model of the dynamical system to match a target response . This translates to the following optimization problem,
in some suitable norm, where are the estimated parameters. Here is an unknown and complex nonlinear map representing the error response surface. To alleviate these complexities, we use GPs as surrogate predictors to construct approximating the true unknown surface . We incorporate knowledge of the dynamics and noise as seen in (4) for more accurate posterior constructions of the response surface using GPs. Here, represents the noisy measurements of the system dynamics .
For a complex dynamical system, it is not feasible to perform numerous optimization steps requiring immense data. This is because the system is already deployed and reaching the feasible solution with minimal evaluations is crucial. Hence, the sampling strategy needs to be efficient and exploitative in its approach to reach the global optima for a time-varying dynamical system.
Iii-B Gaussian Process Modeling
Now that the minimization objective has been formulated, we next look at the design of the GP surrogate and acquisition function. The surrogate agent is responsible for modeling the underlying unknown function , whereas the acquisition function is responsible for guiding the sampling strategy at new query points.
Since GPs act as an intermediate agent to model in (4), a kernel function is required for describing GPs. In this work, we choose from the family of Matérn kernels. Owing to its flexible parameterization and simplicity, this family of kernels is commonly employed in the literature [Rasmussen2003_gpml]. An attractive property of the Matérn kernel is in its ability to model a wide spectrum of responses ranging from infinitely differentiable functions to rough Ornstein-Uhlenbeck paths. The Matérn kernel is given by,
where and are any two samples, is process variance, is the Euclidean distance between the samples weighted by which is a diagonal matrix with elements, , corresponding to separate length scales for each . Hence, the kernel used is the automatic relevance determination Matérn kernel.
Next, the acquisition function is to be determined which is key in navigating the sampling strategy for finding the global minima in (4). The construction of the acquisition function uses the GP posterior prediction (1-2) parameterized by a kernel of choice (5). The acquisition function then informatively samples new query points in order to balance the exploration-exploitation divide, i.e., searching globally unexplored places and minimizing uncertainty versus exploiting a local search expecting the global optima to reside nearby. In this work, we focus on the expected improvement (EI) function which is one of the most commonly used acquisition functions addressing the exploration-exploitation mindset. The improvement function originally proposed in [expected_improvement_mockus1994] is,
where is the next evaluation, is the best posterior mean from GP on past points or the best observation so far. Intuitively, this evaluates to a positive improvement when the prediction is higher than the current best value, else it is set to zero. By maximizing the improvement function, the next sample’s location is then found,
where represents all the observations collected thus far. During exploration phase, the function would choose points where the surrogate variance is large. During exploitation phase, the function would choose points where the surrogate mean is large.
Iii-C Controller Reconfiguration
The primary objective is to estimate unknown parameters in an online fashion and update the controller for a dynamical system. While BO reaches the global optima in few evaluations, which is very attractive for our interests, waiting for it to converge to the global optima overrules the online requirement. Hence, it is imperative to update the controller iteratively while the optimizer converges to the global optima. The nominal controller then takes on the form to keep the system operating under nominal conditions. Here, the nominal operating behavior can be quantified by bounding below an error threshold given by, , where is a desired reference trajectory, and are the states of the system. If the threshold is exceeded, then the optimizer would solve for finding the unknown estimates.
We construct the termination criteria of the optimizer in terms of number of evaluations, . Due to the explorative-exploitative approach to reach the global optima, before evaluations are completed, it is possible to return parameter estimates that are critical to the system since the controller is reconfigured at every step. Therefore, this may allude one to posing the following question: As new estimates returned from the optimizer are used to reconfigure the controller, would the system remain safe?
This problem is reasonably alleviated given fast computations since each evaluation time will be faster (several ) than the physical time constant of the system. Since the EI function samples for locations with the improvement criteria, the controller progressively moves to better parameter estimates. Hence, it does not use (potentially) unsafe parameters for a duration longer than the physical time constant of the system. Finally, after steps, the controller eventually commits to the best parameter estimates, rendering the system to a better condition than the nominal case.
Iii-D Algorithm Overview
We summarize our proposed framework of finding unknown parameters online for safety-critical systems as shown in Algorithm 1. The algorithm is initialized with a domain for unknown parameters, reference trajectory for a time period, and error threshold with maximum iteration budget for initiating or terminating the optimizer.
The proposed framework’s main actions happen from lines . As seen from the algorithm, if the trajectory error does not exceed an error threshold, then the nominal controller designed on line suffices. If the threshold is exceeded, then the GP posterior is used to efficiently find the globally optimized unknown parameters iteratively (lines ).
An initial dataset is constructed using random parameters and their responses (lines ). Using the GP mean and variance (1-2) in line , the EI function (6) is used to query the next parameter estimate with the highest improvement (line ). While (6) is also an optimization problem, it only relies on the GP model and does not require any evaluations on the real dynamical system. This corresponds to cheap computations and can be optimized quickly. The new parameter estimate is then used to reconfigure the controller in line . Noisy observations are embedded while observing the system states (lines and ). An interation budget of helps tune the number of function evaluations to reach the global optima.
Iv Simulation Results
Here, we aim to demonstrate the effectiveness of our method through two systems. In the first scenario, we study the case of an actuated planar pendulum with changing parameters, with the intent of providing a pedagogical example highlighting the main aspects of our work. In the second scenario, we take a more realistic example of a safety-critical quadrotor. The quadrotor is an ideal candidate for validating the efficacy of our method due to its inherently underactuated structure and highly nonlinear dynamics evolving in the tangent bundle to the Special Euclidean group. As the parameters change in the quadrotor, it easily deviates from the reference trajectory, or worse, performs erratically and becomes unstable. Hence, it is important to accurately estimate the unknown parameters as quickly as possible for the quadrotor to remain in stable flight operation.
Iv-a Actuated Planar Pendulum
Consider a planar pendulum with friction that has a single degree of actuation. It has two states, which are the angular position and velocity of the pendulum respectively. The dynamics is given by,
where is the state of the system, is the pendulum mass, is the length of the rod, is the friction coefficient, and is the control input. Feedback linearization is used to design the nominal controller as shown below,
where , , and are positive gains. To enable a meaningful visualization of the process, we confine ourselves to the study of a two-dimensional unknown parameter space. Let us consider that the pendulum’s mass and length change at an unknown time. The cost function using knowledge of the dynamics and resulting in the GP surrogate’s response surface is given by,
where is the observed dynamics and are the design parameters. The nominal mass and length are changed from to and to at . The reference is set as . The domains set are for mass and for length, , and . The controller in (9) is iteratively modified with new parameters, at every time step until evaluations are reached.
As seen in Figure 2, just under evaluations, the unknown parameters are estimated very close to the true minima. The deviation can be attributed due to modeling of noise, since the GP surrogates try to estimate the underlying true function, and is not a true representation. Combinations of masses and lengths are observed in an explorative manner to find the global estimate. Since the evaluations happen quickly, the controller progressively commits to better estimates and leads to a more stable tracking performance as seen in Figure 3. The controller is given parametric combinations far away from the true value for small time steps during the exploration phase. Despite this, the reconfigured controller tracks the reference due to progressively better reconfiguration unlike the nominal case.
Iv-B 3D Quadrotor
Here, we do a more thorough analysis of our proposed solution for a highly nonlinear and complex safety-critical system. We first present the dynamics and design of the nominal controller for the quadrotor. We then validate our parameter estimation framework in a four-dimensional setting where the mass and inertia matrix of the quadrotor are varying with time. This is a particularly challenging setting since a quadrotor is a highly unstable system and is very susceptible to parametric changes. Hence, accurate online parameter estimation is imperative. The simulation was done in MATLAB 2019.
We consider the complete dynamics of a quadrotor evolving in a coordinate-free framework. This framework uses a geometric representation for its attitude given by a rotation matrix in Special Orthogonal group, . The quadrotor has control inputs; thrust
and moments. The dynamical equations of motion are given by [quadrotor_geometric_lee2010]:
where is quadrotor velocity in inertial frame, is mass of the quadrotor, is gravity, is inertia matrix of the quadrotor, , and is body-frame angular velocity. For a complete mathematical treatment for the nominal controller, see [quadrotor_geometric_lee2010]. Here, we simply present the equations for nominal and :
where are positive definite gains, , , , and . The quadrotor’s inertial position and velocity are and . The desired position, velocity, orientation, and angular acceleration are and respectively.
is the skew-symmetric operator satisfying,, and is its inverse, i.e., .
References are sinusoids where position reference is and desired yaw is , for . Nominal parameters are , , gains , , , . The unknown time-varying mass and inertia are given by,
where and are the affected mass and inertia parameters respectively, is the inertial axis offset, , and . The goal is to estimate varying parameters for both slowly-varying and transient changes, since this demonstrates if parameter estimation is robust to different classes of changes. We assume inertial parameters form a diagonal matrix, hence the optimizer solves for an unknown inertial value along each principal axis. The response surface is constructed as follows,
) is incorporated along with Gaussian white noise,, in the surrogate GP modeling. We choose , , along with measurement noise where .
|Time||Optimization||True values||Estimated values||Mass||Inertial||Prediction|
|Instance ||Method||m||J||J||J||m||J||J||J||Error||Error||Time |
We first look at trajectory tracking performance. Applying the framework outlined in III-D, parameters are estimated online using GPs and the controller in (IV-B) is reconfigured iteratively. Estimation performance is compared against benchmark optimizers such as interior-point method (IPM) and sequential-quadratic programming (SQP). As seen in Figure 4, MSE for tracking is the least when reconfigured with parameters estimated using our methodology. Low MSE along and is due to transient changes in the mass which has a higher pronounced affect along . Our framework is robust to modeling of noise while finding the global optima and this gives a competing edge over other methods. It also drastically improves tracking performance as seen in Figure 5. Due to limited space, we present the performance along -dimension only since it incurs the highest MSE. This further validates our approach’s efficacy in safety-critical applications with improved control performance.
Next, we look at the parameter estimation accuracy using BO and other solvers as tabulated in Table I. Each solver had domain bounds of for mass and for each inertial axis. IPM and SQP use the nominal parameters as their initial starting points. We look at five time instances, one instance from each time interval in (13), for evaluating the estimation accuracy, given as norm error, and prediction time. In all instances, parameters estimated using BO have considerably lesser norm error than other solvers. Even in the case when other solvers have a lower prediction error, BO has marginally close prediction errors (, ). An important distinction of our approach over other solvers is robustness to noise and estimating parameters for step changes. IPM and SQP at times get stuck at local minima and is unable to estimate these transient changes. SQP is the fastest as expected due to its quasi-newton Hessian approximations while BO has the highest prediction time due to training of GPs and hyperparameter estimation.
Summarizing our work, we have proposed a novel framework using high probabilistic guarantees from GPs to estimate parameters online for safety-critical systems. Using a data-driven approach, BO allows for an efficient search strategy over a response surface to find the global optima. Leveraging the GP posterior mean and variance, the optimizer navigates towards the global optima with an exploration-exploitation mindset. Our formulation does not require gradient approximations and has the competitive edge of incorporating noisy observations in its pursuit to finding the global optima. Finally, we study two cases:- an actuated planar pendulum, investigating a two-dimensional parametric setting highlighting the main aspects of our work, followed by a safety-critical quadrotor system. For the quadrotor, we estimate parameters in a four-dimensional setting with slowly-varying and transient changes to the mass and inertia. We reconfigure the quadrotor controller online with estimated parameters and compare with other benchmark solvers using interior-point methods and hessian approximations. Due to better parameter estimation, the tracking MSE was the least using BO versus other solvers.
Although, our approach has many advantages, we would like to highlight some closing remarks. BO is limited to low-dimensions with increasing estimation time in higher dimensions. This is an active area of research to enable faster tractability using BO. One could leverage techniques such as adaptive mesh search for BO in higher dimensions to query samples quicker [bads_acerbi2017]. In this work, we focused on presenting the main idea for online parameter estimation using GPs.