Learning-based control introduces new approaches to controller synthesis that draw on large-scale datasets of accumulated system knowledge. This allows previously labor-intensive tasks, such as system modeling and controller tuning, to eventually be fully automated. Deep reinforcement learning, for example, has produced prominent results, with applications including the control of humanoid robots in complex environments [1] and playing Atari arcade video games [2].
Despite the advances in research-driven applications, these results often cannot be transferred to industrial systems that are safety-critical, i.e. systems that must be guaranteed to operate within a given range of physical and safety constraints. This is due to the often complex inner workings of learning-based methods, which render their systematic analysis difficult.
By introducing a model predictive safety certification (MPSC) mechanism for any learning-based controller, we aim to bridge this gap for linear systems with additive uncertainties that can, e.g., result from a belief representation of an unknown nonlinear system. The proposed MPSC scheme assesses the safety of a proposed learning-based input in real time by searching for a safe backup trajectory for the next time step, i.e. a feasible trajectory towards a known safe set. Allowing the MPSC scheme to modify the potentially unsafe learning-based input, if necessary, provides safety for all future times. The result can be seen as a 'safety filter', since it only filters out proposed inputs that drive the system out of what we call the safe set. The resulting online optimization problem can be efficiently solved in real time using established model predictive control (MPC) solvers. Partially unknown larger-scale systems can therefore be efficiently enhanced with safety certificates during learning.
Contributions: We consider linear systems with additive disturbances, described in Section II, that encode the current, possibly data-driven, belief about a safety-critical system to which a potentially unsafe learning-based controller should be applied. A model predictive safety certification scheme is proposed in Section III, which allows for enhancing any learning-based controller with safety guarantees (even human inputs can be enhanced by the safety certification scheme, which relates, e.g., to the concept of electronic stability control in automotive engineering). The concept of the proposed scheme is comparable to the safety frameworks presented in [3, 4] in that it provides an implicit safe set together with a safe backup controller that can be applied if the proposed learning input would cause the system to leave the safe set. A distinctive advantage compared to existing methods is that the MPSC scheme can build on any system behavior that is known to be safe, i.e. a known set of safe system states can easily be incorporated in our scheme such that it only analyzes safety outside of the provided safe set.
The approach relies on scalable offline computations and online optimization of a robust MPC problem at every sampling time, which can be performed using available real-time capable solvers that can deal with large-scale systems (see, e.g., [5]). While we relate the required assumptions and design steps to tube-based MPC in Section IV, we present an automated, parametrization-free, and data-driven design procedure that is tailored to the context of learning the system dynamics. The design procedure and MPSC scheme are illustrated in Section V using numerical examples.
Related work: Making the relevant class of safety-critical systems accessible to learning-based control methods has gained significant attention in recent years; see, e.g., the survey [6] on the construction of safety certificates for learning-based control methods subject to different notions of safety, including state and input constraints. A promising direction that emerged from recent research focuses on what is called a 'safety framework' [3, 7, 4, 8], which consists of a safe set in the state space and a safety controller. While the system state is contained in the safe set, any feasible input (including learning-based controllers) can be applied to the system. However, if such an input would cause the system to leave the safe set, the safety controller interferes. Since this strategy is compatible with any learning-based control algorithm, it serves as a universal safety certification concept. The techniques proposed in [3, 7] are based on a differential game formulation that results in solving a min-max optimal control problem, which can provide the largest possible safe set but offers very limited scalability. The approach described in [4] uses convex approximation techniques that scale well to larger-scale systems at the cost of a potentially conservative safe set. While these results explicitly consider nonlinear systems, we focus on linear model approximations, allowing for various improvements. We introduce a new mechanism for generating the safe set and controller using ideas related to tube-based MPC, which enables scalability with respect to the state dimension while being less conservative than, e.g., [4].
There is a methodological similarity to learning-based MPC approaches, as proposed, e.g., in [9], or more recently in [10] considering nonlinear Gaussian process models. While such methods are limited to an MPC strategy based on the learned system model, this paper provides a concept that can enhance any learning-based controller with safety guarantees. This allows, e.g., for maximizing black-box reward functions (where the reward of a sequence of actions is only available through measurements) for complex tasks, see, e.g., [11], which would not be possible within an MPC framework, or for focusing on exploration in order to collect informative data about the system, as described in Section V.
Notation: The set of symmetric matrices of dimension $n$ is $\mathbb{S}^n$, and the set of positive (semi-)definite matrices is $\mathbb{S}^n_{++}$ ($\mathbb{S}^n_{+}$). The set of integers in the interval $[a,b]\subset\mathbb{R}$ is $\mathbb{I}_{[a,b]}$, and the set of integers in the interval $[a,\infty)$ is $\mathbb{I}_{\geq a}$. The Minkowski sum of two sets $\mathcal{A},\mathcal{B}\subset\mathbb{R}^n$ is denoted by $\mathcal{A}\oplus\mathcal{B}$ and the Pontryagin set difference by $\mathcal{A}\ominus\mathcal{B}$. The $i$-th row and $j$-th column of a matrix $A$ are denoted by $[A]_{i,\cdot}$ and $[A]_{\cdot,j}$.
II Problem description
We consider dynamical systems that can be described by linear systems with additive disturbances of the form
\[
x(k+1) = A x(k) + B u(k) + w(k) \tag{1}
\]
with initial condition $x(0) = x_0$ and where $w(k) \in \mathcal{W} \subset \mathbb{R}^n$ with $\mathcal{W}$ a compact set. The system is subject to polytopic state constraints $\mathcal{X} = \{x \in \mathbb{R}^n : A_x x \leq b_x\}$ and polytopic input constraints $\mathcal{U} = \{u \in \mathbb{R}^m : A_u u \leq b_u\}$. We assume that the origin is contained in $\mathcal{X}$ and $\mathcal{U}$, that $(A,B)$ is stabilizable, and that the system state is fully observable. Note that the system class (1) allows for modeling nonlinear time-varying systems $x(k+1) = f(k, x(k), u(k))$ if $f(k,x,u) - (Ax + Bu) \in \mathcal{W}$ for all $k$, $x \in \mathcal{X}$, and $u \in \mathcal{U}$.
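As a minimal numerical sketch of the system class (1), the snippet below simulates a double integrator under box-bounded additive disturbances. The matrices, the disturbance bound, and the helper names are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance of (1): double-integrator matrices and disturbance bound
# are example values, not taken from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
w_max = 0.01  # W = {w : ||w||_inf <= w_max} is a compact set

def step(x, u):
    """One step of x(k+1) = A x(k) + B u(k) + w(k) with w(k) drawn from W."""
    w = rng.uniform(-w_max, w_max, size=2)
    return A @ x + B[:, 0] * u + w

def in_box(x, bound):
    """Membership test for a box constraint {x : ||x||_inf <= bound}."""
    return bool(np.all(np.abs(x) <= bound))

# Roll out the uncontrolled system; the small disturbance keeps the state bounded
# over this short horizon.
x = np.zeros(2)
for _ in range(50):
    x = step(x, 0.0)
```

This only illustrates the disturbance model; the certification machinery of Section III operates on top of such a simulation or on the real plant.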
We aim to provide a safety certificate for arbitrary control signals in terms of a safe set and a safe control law. Given the system description (1) and a potentially unsafe learning-based controller $\pi_L$, we search for a set of states $\mathcal{S}$ for which we know a feasible backup control strategy such that input and state constraints will be fulfilled for all future times. Therefore, $\pi_L$ can be applied as long as it does not cause the system to leave $\mathcal{S}$ or violate input constraints. Otherwise, a safety controller is allowed to modify the learning input based on the backup controller in order to keep the system safe. Formally, this is captured by the following definition of a safe set and controller.
A set $\mathcal{S} \subseteq \mathcal{X}$ is called a safe set for system (1) if a safe backup control law $\pi_{\mathcal{S}}$ with $\pi_{\mathcal{S}}(x) \in \mathcal{U}$ is available such that for an arbitrary (learning-based) policy $\pi_L$, the application of the safety control law
\[
u(k) = \begin{cases} \pi_L(x(k)), & \text{if } \{A x(k) + B \pi_L(x(k))\} \oplus \mathcal{W} \subseteq \mathcal{S}, \\ \pi_{\mathcal{S}}(x(k)), & \text{otherwise}, \end{cases}
\]
guarantees that the system state is contained in $\mathcal{S}$ for all $k \in \mathbb{I}_{\geq 0}$ if $x(0) \in \mathcal{S}$.
While this framework is conceptually similar to those in [3, 7, 4, 8], we do not require the safe set to be robust controlled invariant as in [4, Definition II.4], [7, Definition 2], [8, Section 2.2] or [3, Section II.A]. The presented approach is thereby capable of enlarging any given safe set, and can be combined with any of the previously proposed methods.
III Model predictive safety certification
The starting point for the derivation of the proposed safety concept is Definition II.1. The essential requirement is that in the safe set $\mathcal{S}$, we always need to know a feasible backup controller that ensures constraint satisfaction in the face of uncertainty for all future times. The idea for constructing such a controller is based on MPC [12]. Given the current system state, we calculate a safe, finite-horizon backup controller towards some conservative target set $\mathcal{S}_f$, which is known to be a safe set and therefore provides 'infinite safety' after applying the finite-time controller.
The concept is illustrated in Figure 1. Consider a current system state $x(k)$, together with a proposed learning input $u_L(k)$. In order to analyze the safety of $u_L(k)$, we test whether it leads to a state $x(k+1)$ for which we can construct a safe backup controller in the form of a feasible input sequence that drives the system to the safe terminal set in a given finite number of steps. If the test is successful, $x(k+1)$ is a safe state and $u_L(k)$ can be applied. At the next time step, we repeat the calculation for $x(k+1)$ and $u_L(k+1)$. If it is successful, we can again apply the learning input; otherwise, we simply use the previously calculated backup controller from time $k$. This strategy yields a safe set that is defined by the feasible set of the corresponding optimization problem for planning a trajectory towards the target set.
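This certification loop can be sketched as a toy 'safety filter' for the disturbance-free scalar system $x(k+1) = x(k) + u(k)$, where the feasibility of a backup plan reduces to a simple reachability test. All constants and helper names (`X_MAX`, `backup_feasible`, etc.) are illustrative and not part of the paper.

```python
import numpy as np

# Toy, disturbance-free sketch of the MPSC safety filter for x(k+1) = x(k) + u(k).
# Constraints, horizon, and terminal set are illustrative example values.
X_MAX, U_MAX, N, TERM = 1.0, 0.2, 5, 0.1

def backup_feasible(x_next):
    """Can some input sequence with |u| <= U_MAX drive x_next into the terminal
    set |x| <= TERM within N steps while keeping |x| <= X_MAX? For this scalar
    system the intermediate states lie between x_next and 0, so a distance
    check suffices."""
    return abs(x_next) <= X_MAX and abs(x_next) <= TERM + N * U_MAX

def backup_input(x):
    """One step of the backup law: move toward the terminal set at maximal rate."""
    return float(np.clip(-x, -U_MAX, U_MAX))

def safety_filter(x, u_learn):
    """Apply u_learn if the successor state still admits a safe backup plan,
    otherwise fall back to the backup controller."""
    u = float(np.clip(u_learn, -U_MAX, U_MAX))
    if backup_feasible(x + u):
        return u, True              # learning input certified
    return backup_input(x), False   # learning input overridden

x, overridden = 0.0, 0
for _ in range(100):
    u, ok = safety_filter(x, u_learn=0.2)  # the 'learner' always pushes right
    overridden += (not ok)
    x = x + u
```

Despite the learner constantly pushing toward the constraint boundary, the filtered closed loop never leaves $|x| \le 1$; the filter intervenes only near the boundary, mirroring the behavior described above.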
As the true system dynamics model is often unknown in the context of learning-based control, we employ mechanisms from tube-based MPC to design a safe backup controller for uncertain system dynamics of the form (1).
III-A Model predictive safety certification scheme
Similarly to tube-based MPC, see, e.g., [12], a nominal backup trajectory is computed such that a stabilizing auxiliary controller is able to track it for the real system within a 'tube' towards the safe terminal set. We first define the main components and assumptions of the tube-based MPC controller, in order to then introduce the model predictive safety certification (MPSC) scheme, consisting of the MPSC problem and the proposed safety controller. Define with $\bar x(k) \in \mathbb{R}^n$ and $\bar u(k) \in \mathbb{R}^m$ the nominal system states and inputs, as well as the nominal dynamics
\[
\bar x(k+1) = A \bar x(k) + B \bar u(k) \tag{2}
\]
with initial condition $\bar x(0) = x(0)$. Denote by $e(k) = x(k) - \bar x(k)$ the error (deviation) between the system state (1) and the nominal system state (2). The controller is then defined by augmenting the nominal input with an auxiliary feedback on the error, i.e.
\[
u(k) = \bar u(k) + K \left( x(k) - \bar x(k) \right), \tag{3}
\]
which keeps the real system state close to the nominal system state if $K$ is chosen such that it robustly stabilizes the error with dynamics
\[
e(k+1) = (A + BK)\, e(k) + w(k), \tag{4}
\]
resulting from application of (3) to the real system.
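The interplay of (1)-(4) can be simulated directly. The sketch below rolls out the real and nominal systems under the tube controller (3) and records the tracking error; `A`, `B`, `K`, and the disturbance level are hand-picked example values chosen so that $A + BK$ is stable (eigenvalues 0.9 and 0.5), not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = np.array([[-5.0, -6.0]])   # stabilizing auxiliary feedback (hand-tuned example)
A_cl = A + B @ K               # error dynamics matrix in (4)

x = np.zeros(2)       # real system state, eq. (1)
x_bar = np.zeros(2)   # nominal system state, eq. (2)

errors = []
for _ in range(200):
    u_bar = 0.0                                # nominal input (kept at zero here)
    u = u_bar + (K @ (x - x_bar)).item()       # tube controller (3)
    w = rng.uniform(-0.01, 0.01, size=2)       # additive disturbance w(k) in W
    x = A @ x + B[:, 0] * u + w                # real system (1)
    x_bar = A @ x_bar + B[:, 0] * u_bar        # nominal system (2)
    errors.append(np.linalg.norm(x - x_bar))   # ||e(k)|| evolving under (4)

max_error = max(errors)  # empirical estimate of the tube radius
```

The recorded `max_error` stays small despite the persistent disturbance, which is exactly the boundedness of $e(k)$ that the RPI set of Definition III.2 formalizes.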
There exists a linear state feedback matrix $K$ that yields a stable error system (4).
Stability of the autonomous error dynamics (4) implies the existence of a corresponding robust positively invariant set according to the following definition.
A set $\mathcal{E} \subset \mathbb{R}^n$ is a robust positively invariant (RPI) set for the error dynamics (4) if
\[
(A + BK)\,\mathcal{E} \oplus \mathcal{W} \subseteq \mathcal{E}.
\]
In order to guarantee $x(k) \in \mathcal{X}$ and $u(k) \in \mathcal{U}$ under application of (3), the state and input constraints are tightened for the nominal system (2), as described, e.g., in [12], to $\bar{\mathcal{X}} = \mathcal{X} \ominus \mathcal{E}$ and $\bar{\mathcal{U}} = \mathcal{U} \ominus K\mathcal{E}$. There exist various methods in the literature that can be used to calculate a controller $K$ and the corresponding RPI set $\mathcal{E}$ according to Definition III.2, see, e.g., [13, 14].
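For axis-aligned boxes, the Pontryagin difference used in this tightening reduces to shrinking the bounds componentwise. The helper and the numbers below are an illustrative sketch under that box assumption, not the paper's design procedure.

```python
import numpy as np

def pontryagin_diff_box(lo_a, hi_a, lo_b, hi_b):
    """A ⊖ B for boxes A = [lo_a, hi_a], B = [lo_b, hi_b] (componentwise):
    A ⊖ B = {x : x + b in A for all b in B} = [lo_a - lo_b, hi_a - hi_b]."""
    lo = np.asarray(lo_a) - np.asarray(lo_b)
    hi = np.asarray(hi_a) - np.asarray(hi_b)
    if np.any(lo > hi):
        raise ValueError("empty tightened set: B is too large for A")
    return lo, hi

# Example: state constraints |x_i| <= 1 tightened by a box over-approximation of
# the RPI set with |e_i| <= 0.15 (illustrative numbers).
x_lo, x_hi = pontryagin_diff_box([-1.0, -1.0], [1.0, 1.0],
                                 [-0.15, -0.15], [0.15, 0.15])
```

The resulting box $[-0.85, 0.85]^2$ is the tightened nominal constraint set: any nominal state inside it guarantees that the real state, which deviates by at most the tube, still satisfies the original constraints.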
Different from standard tube-based MPC, model predictive safety certification (MPSC) uses a terminal set $\mathcal{S}_f$ that is only required to itself be a safe set according to Definition II.1, which is conceptually similar to the safe terminal set used in [17]. This not only allows for enlarging any potentially conservative initial safe set, but also for recursively improving the safe set, as will be shown in Section IV-B.
There exists a safe set $\mathcal{S}_f$ and a safe control law $\pi_f$ according to Definition II.1 such that $\mathcal{S}_f \oplus \mathcal{E} \subseteq \mathcal{X}$.
Based on these components, the proposed safe backup controller utilizes the following MPSC problem for a given measured state $x(k)$ and proposed learning input $u_L(k)$:
\[
\begin{array}{llr}
\displaystyle\min_{\bar x_{\cdot|k},\, \bar u_{\cdot|k}} & \left\| u_L(k) - u_{0|k} \right\| & \text{(5a)} \\
\text{s.t.} & \bar x_{i+1|k} = A \bar x_{i|k} + B \bar u_{i|k}, \quad i \in \mathbb{I}_{[0,N-1]}, & \text{(5b)} \\
& \bar x_{i|k} \in \bar{\mathcal{X}}, \ \bar u_{i|k} \in \bar{\mathcal{U}}, \quad i \in \mathbb{I}_{[0,N-1]}, & \text{(5c)} \\
& \bar x_{N|k} \in \mathcal{S}_f, & \text{(5d)} \\
& x(k) \in \bar x_{0|k} \oplus \mathcal{E}, & \text{(5e)} \\
& u_{0|k} = \bar u_{0|k} + K \left( x(k) - \bar x_{0|k} \right), & \text{(5f)}
\end{array}
\]
where we denote the planning horizon by $N$ and the predicted nominal system states and inputs by $\bar x_{i|k}$ and $\bar u_{i|k}$. Let the feasible set of (5) be denoted by
\[
\mathcal{X}_N = \left\{ x \in \mathbb{R}^n : \exists\, \bar x_{\cdot},\, \bar u_{\cdot} \text{ satisfying (5b)-(5f)} \right\}. \tag{6}
\]
Problem (5) introduces the auxiliary variable $u_{0|k}$, which includes the auxiliary feedback (5f), ensuring safety of the control input $u_{0|k}$, as we will show in the proof of Theorem III.5. The cost (5a) is chosen such that, if possible, $u_{0|k}$ is equal to $u_L(k)$, in which case the safety of $u_L(k)$ is certified. The controller resulting from solving (5) in a receding horizon fashion is given by
\[
u(k) = \bar u^*_{0|k} + K \left( x(k) - \bar x^*_{0|k} \right) = u^*_{0|k}, \tag{7}
\]
where $\bar x^*_{\cdot|k}$, $\bar u^*_{\cdot|k}$ is the optimal solution of (5) at state $x(k)$.
It is important to note that (5) may not be recursively feasible for general safe sets $\mathcal{S}_f$. This is due to the fact that the terminal safe set is itself not necessarily invariant or a subset of the feasible set.
To this end, we propose Algorithm 1, which implements a safety controller based on (5). If (5) is feasible, we can always directly apply $u^*_{0|k}$ (Algorithm 1, line 4). If (5) becomes infeasible at a subsequent time, then via (5) we know a finite-time safe backup controller towards $\mathcal{S}_f$ using (3), obtained from the trajectory computed at the last feasible time step (Algorithm 1, line 9); compare also with Figure 1. By Assumption III.3, we can extend this finite-time backup controller after $N$ steps with $\pi_f$ (Algorithm 1, line 11) in order to obtain a safe backup controller that will satisfy the constraints at all times in the future. In the case that $x(0) \in \mathcal{S}_f$, (5) can be initially infeasible. This case can easily be treated by directly applying $\pi_f$ (Algorithm 1, lines 1 and 11), which ensures safety for all future times. Formalization of the above yields our main result.
Proof: If $x(0) \in \mathcal{S}_f$, the terminal safety controller $\pi_f$ is applied, since the backup controller is initialized to $\pi_f$ (Algorithm 1, line 1), which keeps the system safe for all times. We show that $\mathcal{X}_N$ is a safe set by first investigating the case that (5) is feasible for all $k \in \mathbb{I}_{\geq 0}$ and then extending the analysis to cases in which (5) is infeasible for arbitrarily many time steps.

Let $x(0) \in \mathcal{X}_N$ and let (5) be feasible for all $k \in \mathbb{I}_{\geq 0}$ (Algorithm 1, line 4), i.e. $u(k) = u^*_{0|k}$. Condition (5e) implies by Assumption III.1 that $x(k) - \bar x^*_{0|k} \in \mathcal{E}$, and therefore that $x(k) \in \bar x^*_{0|k} \oplus \mathcal{E}$, which implies by the tightened constraints on the nominal state (5c) that $x(k) \in \mathcal{X}$. Therefore, $\mathcal{X}_N$ is a safe set under the safe backup controller in (7).

Now, consider an arbitrary time $\bar k$ at which (5) was feasible for the last time, i.e. (5) is infeasible for all $\bar k + i$ with $i \in \mathbb{I}_{\geq 1}$. Because of (5e) and (5f), we have that $x(\bar k + i) \in \bar x^*_{i|\bar k} \oplus \mathcal{E}$, and therefore Assumption III.3 together with (5d) allows for explicitly stating a safe backup control law based on (3) that keeps the system within the constraints for all future times:
\[
u(\bar k + i) = \begin{cases} \bar u^*_{i|\bar k} + K \left( x(\bar k + i) - \bar x^*_{i|\bar k} \right), & i \in \mathbb{I}_{[1, N-1]}, \\ \pi_f \left( x(\bar k + i) \right), & i \in \mathbb{I}_{\geq N}. \end{cases}
\]
The last remaining case follows from the observation that, by (5d), the state reaches the terminal safe set $\mathcal{S}_f$, for which we know the safe control law $\pi_f$ for all future times by Assumption III.3. Once a feasible solution of (5) is found again, the counter is set to zero. Consequently, we have investigated all possible cases in Algorithm 1 and proved that it will always provide a safe control input if $x(0) \in \mathcal{X}_N \cup \mathcal{S}_f$, showing the result. ∎
III-B A recursively feasible MPSC scheme
By modifying Assumption III.3 and requiring the terminal safe set to be invariant for the nominal system, which is the standard assumption in tube-based MPC, we obtain recursive feasibility of (5) and can thus directly apply the time-invariant control law (7) to system (1) without the need for Algorithm 1. In other words, (7) directly becomes the safety controller according to Definition II.1.
There exists a set $\mathcal{S}_f \subseteq \bar{\mathcal{X}}$ and a corresponding control law $\pi_f$ such that for all $\bar x \in \mathcal{S}_f$: $\pi_f(\bar x) \in \bar{\mathcal{U}}$ and $A \bar x + B \pi_f(\bar x) \in \mathcal{S}_f$.
We begin by showing recursive feasibility under (7). Let (5) be feasible at time $k$. It follows that $x(k+1) \in \bar x^*_{1|k} \oplus \mathcal{E}$ because of (5e) and (5f). From here, recursive feasibility follows as in standard tube-based MPC by induction, see, e.g., [12]. Along the lines of the proof of Theorem III.5, recursive feasibility implies that $\mathcal{X}_N$ is a safe set. ∎
IV Design of $\mathcal{E}$ and $\mathcal{S}_f$ from data
The proposed MPSC scheme is based on two main design components: the robust positively invariant set $\mathcal{E}$, which determines the tube, and the terminal safe set $\mathcal{S}_f$.
While $\mathcal{S}_f$ can be chosen more generally according to Assumption III.3, we note from Theorem III.7 that we can in principle also use the same design methods proposed for linear tube-based MPC. The computation of the robust invariant set and the nominal terminal set has been widely studied in the literature, see, e.g., [12, 14] and references therein.
This section presents a different option for the approximation of a tube and safe terminal set that is tailored to the learning context and aims at a minimal amount of tuning 'by hand'. We propose to infer the robust positively invariant set $\mathcal{E}$ either directly from data or from a probabilistic model via scenario-based optimization. Second, starting from any terminal safe set, e.g. the trivial choice $\mathcal{S}_f = \{0\}$, we show how to enlarge this terminal set iteratively by utilizing feasible solutions of (5) over time.
IV-A Scenario-based calculation of $\mathcal{E}$ from data
Let $\{w^{(i)}\}_{i \in \mathbb{I}_{[1,N_s]}}$ be a set of so-called 'scenarios', either sampled from a probabilistic belief about the system dynamics (1) or collected from measurements. We restrict ourselves to ellipsoidal robust positively invariant sets $\mathcal{E} = \{e \in \mathbb{R}^n : e^\top P e \leq 1\}$ with $P \in \mathbb{S}^n_{++}$, in order to enable scalability of the resulting design optimization problems to larger-scale systems. The corresponding robust scenario-based design problem for the computation of the set $\mathcal{E}$ is given by the scenario program (8), in which the invariance condition of Definition III.2 is imposed for each sampled scenario. Problem (8) defines a robust positively invariant set for the error system (4) if its invariance condition (8b) is enforced for all $w \in \mathcal{W}$, see, e.g., [14]. The objective (8a) is chosen such that a possibly small RPI set is obtained, which by definition of $\bar{\mathcal{X}}$ and $\bar{\mathcal{U}}$ increases the size of the feasible region of (5), and therefore the size of the safe set. A stabilizing linear state feedback matrix $K$ according to Assumption III.1 needs to be chosen beforehand, e.g. using LQR design methods.
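A lightweight sanity check in the spirit of the scenario approach can be sketched as follows: verify by sampling that a candidate ellipsoid is robustly positively invariant for the closed-loop error dynamics. `A_cl`, the ellipsoid radius, and the disturbance level are illustrative values, and this sampling test complements rather than replaces solving the design problem (8).

```python
import numpy as np

rng = np.random.default_rng(2)

# Candidate ellipsoid E = {e : e^T P e <= 1} and error dynamics e+ = A_cl e + w.
# All numbers below are illustrative, not the paper's.
A_cl = np.array([[0.5, 0.1],
                 [0.0, 0.4]])   # stable closed-loop error dynamics A + B K
r = 0.25
P = np.eye(2) / r**2            # E is the Euclidean ball of radius r

def in_ellipsoid(e):
    return float(e @ P @ e) <= 1.0 + 1e-9

# By linearity, it suffices to test boundary points of E against disturbance
# scenarios: if A_cl e + w stays in E for boundary e, interior points map inside.
violations = 0
for _ in range(2000):
    theta = rng.uniform(0.0, 2.0 * np.pi)
    e = r * np.array([np.cos(theta), np.sin(theta)])  # boundary point of E
    w = rng.uniform(-0.05, 0.05, size=2)              # disturbance scenario
    if not in_ellipsoid(A_cl @ e + w):
        violations += 1
```

With these values, no sampled scenario escapes the ellipsoid, consistent with the invariance condition of Definition III.2 holding robustly for this candidate.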
IV-B Iterative enlargement of the terminal safe set
In this section, we show how to enlarge the terminal safe set based on previously calculated solutions of (5), which is conceptually similar to the data-based terminal set proposed in [17]. Note that a larger terminal set according to Assumption III.3 or Assumption III.6 typically also leads to a larger feasible set $\mathcal{X}_N$, and therefore to a larger overall safe set according to Theorems III.5 and III.7.
The main idea is to define the enlarged terminal set as the convex hull of the initial terminal set and the nominal states of successfully solved instances of (5) for measured system states:
\[
\mathcal{S}_f^{+} = \operatorname{conv} \left( \mathcal{S}_f \cup \left\{ \bar x^*_{i|k} : i \in \mathbb{I}_{[0,N]},\ k \in \mathcal{K} \right\} \right), \tag{10}
\]
where $\mathcal{K}$ denotes the set of time steps at which (5) was successfully solved.
If $\mathcal{S}_f$ is convex, convexity of the new terminal set (10) is ensured and we can iteratively enlarge the initial terminal set.
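In one dimension, the convex hull is simply an interval, so the enlargement in (10) can be sketched in a few lines; the certified trajectories below are made-up example data, not results from the paper.

```python
# 1-D illustration of the iterative terminal-set enlargement (10): the terminal
# set is grown to the convex hull (here: interval hull) of the initial terminal
# set and the nominal states of previously certified backup trajectories.
term_lo, term_hi = -0.1, 0.1          # initial terminal safe set [-0.1, 0.1]
certified_trajectories = [
    [0.05, 0.3, 0.55],                # nominal backup trajectory 1 (example data)
    [-0.4, -0.2, 0.0],                # nominal backup trajectory 2 (example data)
]
for traj in certified_trajectories:
    term_lo = min(term_lo, min(traj))
    term_hi = max(term_hi, max(traj))
```

Each certified trajectory can only grow the terminal set, which in turn enlarges the feasible set of (5) and hence the overall safe set.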
In order to provide a result similar to Theorem III.7, consider the enlarged terminal set (10) together with the corresponding safe control law as defined above.
Follows similarly to the proof of Theorem IV.2. ∎
Theorem IV.2 also provides an explicit approximation of the safe set given by (10), which is generally only implicitly defined. Such a representation can be used to 'inform' the learning-based controller about the safety boundary, e.g. in the form of a feature using a barrier function, in order to avoid chattering behavior.
V Application to numerical examples
We consider the problem of safely acquiring information about the partially unknown dynamics of a discretized mass-spring-damper system. An approximate model is assumed to be given, whose mass, spring, and damper parameters have a 20% error with respect to the true parameters. We use the results from Section IV in order to calculate $\mathcal{E}$ without first deriving a suitable representation (1), i.e. a suitable $\mathcal{W}$. Using the approximate model and LQR design, we choose the feedback matrix $K$. Based on uniformly sampled measurements from the real (but unknown) system, we generate the robust scenario design problem (8). Solving (8) yields an ellipsoid $\mathcal{E}$ that fulfills (8b) for all possible disturbances with high probability according to Proposition IV.1. For the MPSC scheme, we use a planning horizon $N$ and the terminal safe set as described in Remark IV.3.
As the learning signal, we use an excitation input with the goal of generating informative measurements about the system dynamics.
A closed-loop simulation under application of Algorithm 1 is illustrated in Figure 2, together with the corresponding safe set. As desired, the safety controller modifies the proposed input signal only as the system state approaches a neighborhood of the safe set boundary where the next state would leave the safe set (indicated in red). The pure learning-based trajectory (dotted line in Figure 2), in contrast, would have violated the state constraints already within the first few time steps.
Using a similar configuration, we now iteratively enlarge the safe set based on the nominal state trajectories calculated at each time step by following Corollary IV.4. Samples of the nominal and overall terminal set at different time steps are shown in Figure 3. After a moderate number of time steps, a significant portion of the state space is already covered by the safe terminal set.
VI Conclusion
This paper has addressed the problem of safe learning-based control by means of a model predictive safety certification scheme. The proposed scheme allows for enhancing any potentially unsafe learning-based control strategy with safety guarantees and can be combined with any known safe set. By relying on robust MPC methods, the presented concept is amenable to application to large-scale systems, with offline computational complexity similar to, e.g., ellipsoidal safe set approximations [4]. Using a parametrization-free, scenario-based design procedure, it was illustrated how the design steps can be performed based on available data, and how the conservatism of the MPSC scheme can be reduced over time by making use of generated closed-loop data.
-  J. Merel, Y. Tassa, S. Srinivasan, J. Lemmon, Z. Wang, G. Wayne, and N. Heess, “Learning human behaviors from motion capture by adversarial imitation,” arXiv preprint arXiv:1707.02201, 2017.
-  V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
-  A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, “Reachability-based safe learning with gaussian processes,” in 53rd IEEE Conference on Decision and Control, Dec 2014, pp. 1424–1431.
-  K. P. Wabersich and M. N. Zeilinger, “Scalable synthesis of safety certificates from data with application to learning-based control,” arXiv preprint arXiv:1711.11417, 2017.
-  A. Domahidi, A. U. Zgraggen, M. N. Zeilinger, M. Morari, and C. N. Jones, “Efficient interior point methods for multistage problems arising in receding horizon control,” in 51st IEEE Conference on Decision and Control (CDC), Dec 2012, pp. 668–674.
-  J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, pp. 1437–1480, 2015.
-  J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,” arXiv preprint arXiv:1705.01292, 2017.
-  R. B. Larsen, A. Carron, and M. N. Zeilinger, “Safe learning for distributed systems with bounded uncertainties,” 20th IFAC World Congress, vol. 50, no. 1, pp. 2536 – 2542, 2017.
-  A. Aswani, H. Gonzalez, S. S. Sastry, and C. Tomlin, “Provably safe and robust learning-based model predictive control,” Automatica, vol. 49, no. 5, pp. 1216–1226, 2013.
-  T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration and reinforcement learning,” arXiv preprint arXiv:1803.08287, 2018.
-  H. Mania, A. Guy, and B. Recht, “Simple random search provides a competitive approach to reinforcement learning,” arXiv preprint arXiv:1803.07055, 2018.
-  J. B. Rawlings and D. Q. Mayne, Model predictive control: Theory and design. Nob Hill Pub., 2009.
-  S. V. Rakovic, E. C. Kerrigan, K. I. Kouramas, and D. Q. Mayne, “Invariant approximations of the minimal robust positively invariant set,” IEEE Transactions on Automatic Control, vol. 50, no. 3, pp. 406–410, 2005.
-  F. Blanchini, “Set invariance in control,” Automatica, vol. 35, no. 11, pp. 1747 – 1767, 1999.
-  M. C. Campi and S. Garatti, “The exact feasibility of randomized solutions of uncertain convex programs,” SIAM Journal on Optimization, vol. 19, no. 3, pp. 1211–1230, 2008.
-  G. C. Calafiore and M. C. Campi, “The scenario approach to robust control design,” IEEE Transactions on Automatic Control, vol. 51, no. 5, pp. 742–753, 2006.
-  U. Rosolia and F. Borrelli, “Learning model predictive control for iterative tasks: a computationally efficient approach for linear system,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 3142–3147, 2017.
-  S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
-  L. Ljung, “System identification,” in Signal analysis and prediction. Springer, 1998, pp. 163–173.