# A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability

The goal of this paper is to understand the impact of learning on control synthesis from a Lyapunov function perspective. In particular, rather than consider uncertainties in the full system dynamics, we employ Control Lyapunov Functions (CLFs) as low-dimensional projections. To understand and characterize the uncertainty that these projected dynamics introduce in the system, we introduce a new notion: Projection to State Stability (PSS). PSS can be viewed as a variant of Input to State Stability defined on projected dynamics, and enables characterizing robustness of a CLF with respect to the data used to learn system uncertainties. We use PSS to bound uncertainty in affine control, and demonstrate that a practical episodic learning approach can use PSS to characterize uncertainty in the CLF for robust control synthesis.

## I Introduction

Properly characterizing uncertainty is a key aspect of robust control [35]. With the increasing use of learning for dynamics modelling and control synthesis [6, 11, 9, 12, 4, 31, 25], it is correspondingly important to develop new tools to reason about the interplay between learning and robust control.

In this paper, we focus on the interplay between learning and robustness for control synthesis using Control Lyapunov Functions (CLFs) [5, 19]. The use of CLFs has seen multiple applications in recent years [20, 15, 24], and one of their primary benefits is to enable control objectives to be represented in a low-dimensional form that can be integrated with optimization methods to yield optimal controllers [3]. This low-dimensional form is also appealing from a learning perspective, as learning is typically more tractable in lower-dimensional spaces [32, 34, 31].

The practical design of CLFs remains challenging. In many cases, extensive tuning upon deployment is necessary [20], and even with this tuning the system is often not able to track a desired state or trajectory perfectly. Other approaches, such as those based on adaptive control [18], can adaptively learn a CLF but are restricted to learning over specific classes of model uncertainty.

We thus build upon ideas in robust control in order to guarantee performance in the presence of model mis-specification. The idea of robust CLFs is not new (cf. [14, 13]), but existing analyses focus on the full-dimensional state dynamics, which can be burdensome for learning.

In this paper, we make two main contributions. First, we propose a novel characterization called Projection to State Stability (PSS), which is a variant of the well-studied Input to State Stability (ISS) property [26, 29, 28, 33, 27], but defined on projected dynamics rather than the original state dynamics. Like ISS, PSS provides a tool to characterize tracking error in terms of the magnitude of the disturbance or uncertainty. Unlike ISS, PSS can characterize dynamic uncertainty directly in the derivative of a CLF, thus allowing a low dimensional representation of the uncertainty. In our second contribution, we demonstrate the practicality of PSS by incorporating it into an episodic learning algorithm.

Our paper is organized as follows. Section II reviews CLFs and ISS. Section III defines Projection to State Stability (PSS) and shows how PSS enables constructing bounds on the state of a system that depend on a projected disturbance. Section IV defines a broad class of model uncertainty for affine control systems, evaluates how this uncertainty impacts the Lyapunov derivative, and demonstrates how to restrict this uncertainty with data to determine if a system is PSS. Section V discusses how episodic learning can be used to improve PSS guarantees in practice, and presents simulation results with an uncertain inverted pendulum model.

## II Preliminaries

This section provides a review of Control Lyapunov Functions (CLFs) and Input to State Stability (ISS). These tools will be used in Section III to define Projection to State Stability. This section concludes with a brief discussion of how these definitions must be modified to hold over a restriction of the domain.

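Consider a state space $\mathcal{X} \subseteq \mathbb{R}^n$ and a control input space $\mathcal{U} \subseteq \mathbb{R}^m$. Assume that $\mathcal{X}$ is path-connected and that $0 \in \mathcal{X}$. Consider a system governed by:

$$\dot{x} = f(x, u), \tag{1}$$

for state $x \in \mathcal{X}$ and its derivative $\dot{x}$, control input $u \in \mathcal{U}$, and dynamics $f : \mathcal{X} \times \mathcal{U} \to \mathbb{R}^n$. In this paper we assume $f$ is locally Lipschitz continuous. The following definitions, taken from [17], are useful in analyzing stability of (1).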

###### Definition 1 (Class K Function).

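A continuous function $\alpha : [0, a) \to \mathbb{R}_+$, with $a > 0$, is class $\mathcal{K}$, denoted $\alpha \in \mathcal{K}$, if it is monotonically (strictly) increasing and satisfies $\alpha(0) = 0$. If the domain of $\alpha$ is all of $\mathbb{R}_+$ and $\lim_{r \to \infty} \alpha(r) = \infty$, then $\alpha$ is termed radially unbounded and class $\mathcal{K}_\infty$.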

###### Definition 2 (Class KL Function).

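A continuous function $\beta : [0, a) \times \mathbb{R}_+ \to \mathbb{R}_+$, with $a > 0$, is class $\mathcal{KL}$, denoted $\beta \in \mathcal{KL}$, if the function $r \mapsto \beta(r, s)$ is class $\mathcal{K}$ for all $s \in \mathbb{R}_+$, and the function $s \mapsto \beta(r, s)$ is monotonically non-increasing with $\beta(r, s) \to 0$ as $s \to \infty$ for all $r \in [0, a)$.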

We note that the strictly increasing nature of Class $\mathcal{K}$ functions permits an inverse Class $\mathcal{K}$ function $\alpha^{-1}$. We also note that the composition of Class $\mathcal{K}$ ($\mathcal{K}_\infty$) functions is itself a Class $\mathcal{K}$ ($\mathcal{K}_\infty$) function. Given these definitions, we define Control Lyapunov Functions (CLFs) as in [5], [19].

###### Definition 3 (Control Lyapunov Function).

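A continuously differentiable function $V : \mathcal{X} \to \mathbb{R}_+$ is a CLF for (1) on $\mathcal{X}$ if there exist $\underline{\alpha}, \overline{\alpha}, \alpha \in \mathcal{K}_\infty$ such that:

$$\begin{aligned} \underline{\alpha}(\|x\|) \le V(x) &\le \overline{\alpha}(\|x\|), \\ \inf_{u \in \mathcal{U}} \dot{V}(x, u) &\le -\alpha(\|x\|), \end{aligned} \tag{2}$$

for all $x \in \mathcal{X}$.

If there exists a CLF for a system, then a state-feedback controller $k : \mathcal{X} \to \mathcal{U}$ can be selected such that $0$ is a globally asymptotically stable equilibrium point. In particular, for all $x \in \mathcal{X}$, $k(x)$ should be chosen such that $\dot{V}(x, k(x)) \le -\alpha(\|x\|)$. We note that $\underline{\alpha}, \overline{\alpha}, \alpha$ only need to be Class $\mathcal{K}$ for this definition, but we extend them to $\mathcal{K}_\infty$ to simplify later analysis.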

To accommodate disturbances or uncertainties, we consider a disturbance space $\mathcal{D}$, and a modified system:

$$\dot{x} = f(x, u, d), \tag{3}$$

for disturbance $d \in \mathcal{D}$ and dynamics $f : \mathcal{X} \times \mathcal{U} \times \mathcal{D} \to \mathbb{R}^n$. We again assume $f$ is locally Lipschitz continuous. The disturbance may be time-varying, state-dependent, and/or input-dependent. We assume that the disturbance is bounded for almost all times (essentially bounded in time). This leads to the definition of ISS and ISS-CLFs as formulated in [26], [29].

###### Definition 4 (Input to State Stability).

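Given a state-feedback controller $k$, a system is Input to State Stable (ISS) if there exist $\beta \in \mathcal{KL}$ and $\gamma \in \mathcal{K}_\infty$ such that the solution to (3) satisfies:

$$\|x(t)\| \le \beta(\|x(0)\|, t) + \gamma\Big(\sup_{\tau \ge 0} \|d(\tau)\|\Big), \tag{4}$$

for all $t \ge 0$.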

###### Definition 5 (Input to State Stable Control Lyapunov Function).

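A continuously differentiable function $V : \mathcal{X} \to \mathbb{R}_+$ is an Input to State Stable Control Lyapunov Function (ISS-CLF) for (3) on $\mathcal{X}$ if there exist $\underline{\alpha}, \overline{\alpha}, \alpha, \rho \in \mathcal{K}_\infty$ such that:

$$\begin{aligned} \underline{\alpha}(\|x\|) \le V(x) &\le \overline{\alpha}(\|x\|), \\ \|x\| \ge \rho(\|d\|) \implies \inf_{u \in \mathcal{U}} \dot{V}(x, u, d) &\le -\alpha(\|x\|), \end{aligned} \tag{5}$$

for all $x \in \mathcal{X}$ and $d \in \mathcal{D}$.

As with CLFs, if there exists an ISS-CLF for a system, then a state-feedback controller $k$ can be chosen such that the system is ISS. If the disturbance is input-dependent, it is additionally required that $k$ induces essentially bounded disturbances in time.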
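The ISS bound in (4) can be checked numerically on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes the scalar system $\dot{x} = u + d$ with the hypothetical controller $k(x) = -x$, for which $\beta(r, t) = r e^{-t}$ and $\gamma(r) = r$ are valid comparison functions. The names `simulate` and `iss_bound`, the disturbance signal, and the step size are all assumptions made for this illustration.

```python
import math

def simulate(x0, disturbance, dt=1e-3, horizon=10.0):
    """Forward-Euler rollout of x' = k(x) + d(t) with k(x) = -x; returns (t, x) samples."""
    x, samples = x0, []
    for n in range(int(round(horizon / dt)) + 1):
        t = n * dt
        samples.append((t, x))
        x += dt * (-x + disturbance(t))
    return samples

def iss_bound(x0, t, d_sup):
    """Right-hand side of (4) with beta(r, t) = r * exp(-t) and gamma(r) = r."""
    return abs(x0) * math.exp(-t) + d_sup

d_sup = 0.5  # essential supremum of the (assumed) disturbance signal
samples = simulate(2.0, lambda t: d_sup * math.sin(3.0 * t))

# The trajectory should stay below the ISS envelope at every sample.
bound_holds = all(abs(x) <= iss_bound(2.0, t, d_sup) + 1e-9 for t, x in samples)
print(bound_holds)
```

The transient term decays with time while the disturbance term sets the size of the residual set, which is the qualitative behavior that PSS later reproduces with a projected disturbance.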

The condition on the Lyapunov function derivative in (2) or (5) may not be satisfied on the entire state space $\mathcal{X}$. In particular, it may only be satisfied on a subset $\mathcal{C} \subset \mathcal{X}$. The system may leave $\mathcal{C}$ during its evolution, implying the desired derivative condition may no longer be satisfiable. We therefore consider the following definition and lemma.

###### Definition 6 (Forward Invariance).

Consider the system governed by (1). A subset $\mathcal{C} \subseteq \mathcal{X}$ is forward invariant if there exists a state-feedback controller $k$ such that $x(0) \in \mathcal{C}$ implies $x(t) \in \mathcal{C}$ for all $t \ge 0$.

The definition of forward invariance applies to systems governed by (3), with disturbances appropriately restricted to subsets of if the disturbances are modeled as state-dependent and/or input-dependent. If , we may restrict Definitions 3 and 5 to a forward invariant subset with , provided such a subset exists.

###### Lemma 1.

A sublevel set of an ISS-CLF is a forward invariant set, provided for all and appropriately restricted .

###### Proof.

The condition on the Lyapunov derivative in (5) implies the existence of a state-feedback controller satisfying for all and appropriately restricted . Let for any . If , then for all by Nagumo’s Theorem [23], [1]. Thus, if , then for all . ∎

## III Projection to State Stability

Input to State Stability (ISS) requires a bound on the state in terms of the norm of the disturbance as it appears in the state dynamics (see Definition 4 in Section II). This requirement does not easily permit analysis of input-to-state behavior when the disturbance is more easily described by its impact on a Lyapunov function derivative. This limitation motivates Projection to State Stability (PSS), which instead relies on a bound on the state in terms of a projection of the disturbance.

###### Definition 7 (Dynamic Projection).

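A continuously differentiable function $\Pi$ defined on $\mathcal{X}$ is a dynamic projection if there exist $\underline{\sigma}, \overline{\sigma} \in \mathcal{K}_\infty$ satisfying:

$$\underline{\sigma}(\|x\|) \le \|\Pi(x)\| \le \overline{\sigma}(\|x\|), \tag{6}$$

for all $x \in \mathcal{X}$.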

Let , and let for all . Consider the system governed by (3). The associated projected system is governed by the dynamics:

$$\dot{y} = D\Pi(x) f(x, u, 0) + \underbrace{D\Pi(x)\big(f(x, u, d) - f(x, u, 0)\big)}_{\delta}, \tag{7}$$

where $D\Pi$ denotes the Jacobian of $\Pi$, and $\delta$ is implicitly a function of $x$, $u$, and $d$. For the following definitions, we assume $\delta$ is essentially bounded in time.

We are now ready to state our main definition. The key difference between PSS and ISS (Definition 4) is the use of the projected disturbance $\delta$ in (7) rather than the native disturbance $d$.

###### Definition 8 (Projection to State Stability).

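Given a state-feedback controller $k$, a system is Projection to State Stable (PSS) with respect to the projection $\Pi$ if there exist $\beta \in \mathcal{KL}$ and $\gamma \in \mathcal{K}_\infty$ such that the solution to (3) satisfies:

$$\|x(t)\| \le \beta(\|x(0)\|, t) + \gamma\Big(\sup_{\tau \ge 0} \|\delta(\tau)\|\Big), \tag{8}$$

for all $t \ge 0$, with $\delta$ as defined in (7).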

###### Remark 1.

If is an inclusion map with , and the system can be specified as:

 f(x,u,d)=f(x,u,0)+d, (9)

then PSS is equivalent to ISS.

Similarly, we can also construct a Lyapunov function that certifies a system is PSS with respect to a projection.

###### Definition 9 (Projection to State Stable Control Lyapunov Function).

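A continuously differentiable function $W$ is a Projection to State Stable Control Lyapunov Function (PSS-CLF) for (7) on $\mathcal{X}$ if there exist $\underline{\alpha}, \overline{\alpha}, \alpha, \rho \in \mathcal{K}_\infty$ satisfying:

$$\begin{aligned} \underline{\alpha}(\|\Pi(x)\|) \le W(\Pi(x)) &\le \overline{\alpha}(\|\Pi(x)\|), \\ \|\Pi(x)\| \ge \rho(\|\delta\|) \implies \inf_{u \in \mathcal{U}} \dot{W}(x, u, \delta) &\le -\alpha(\|\Pi(x)\|), \end{aligned} \tag{10}$$

for all $x \in \mathcal{X}$.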

As with ISS-CLFs, this definition can be restricted to a forward invariant set containing . We now show that a PSS-CLF certifies a system is PSS.

###### Theorem 1.

If the system governed by (7) has a PSS-CLF, then the system governed by (3) is PSS with respect to the projection $\Pi$.

###### Proof.

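The bounds in (10) can be weakened to:

$$W(\Pi(x)) \ge \overline{\alpha} \circ \rho(\|\delta\|) \implies \inf_{u \in \mathcal{U}} \dot{W}(x, u, \delta) \le -\alpha \circ \overline{\alpha}^{-1}(W(\Pi(x))). \tag{11}$$

That is, if (10) holds, then (11) holds. Therefore, a choice of state-feedback controller $k$ exists such that the system governed by (7) is Input to State Stable (ISS) with $\delta$ viewed as a disturbance. This implies that there exist $\beta \in \mathcal{KL}$ and $\gamma \in \mathcal{K}_\infty$ such that:

$$\|\Pi(x(t))\| \le \beta(\|\Pi(x(0))\|, t) + \gamma\Big(\sup_{\tau \ge 0} \|\delta(\tau)\|\Big), \tag{12}$$

for all $t \ge 0$. Since $\Pi$ satisfies (6) we have:

$$\|x(t)\| \le \underline{\sigma}^{-1}\Big(\beta(\overline{\sigma}(\|x(0)\|), t) + \gamma\Big(\sup_{\tau \ge 0} \|\delta(\tau)\|\Big)\Big). \tag{13}$$

Finally, define $\beta'$ and $\gamma'$ as:

$$\beta'(r, s) = \underline{\sigma}^{-1}(2\beta(\overline{\sigma}(r), s)), \tag{14}$$

$$\gamma'(r) = \underline{\sigma}^{-1}(2\gamma(r)). \tag{15}$$

From the weak form of the triangle inequality presented in [26], [16], which gives $\underline{\sigma}^{-1}(p + q) \le \underline{\sigma}^{-1}(2p) + \underline{\sigma}^{-1}(2q)$ for $p, q \ge 0$, it follows that:

$$\|x(t)\| \le \beta'(\|x(0)\|, t) + \gamma'\Big(\sup_{\tau \ge 0} \|\delta(\tau)\|\Big). \tag{16}$$

∎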

We next show that a CLF $V$ for the undisturbed dynamics of a system can be viewed as a projection, thus yielding a PSS-CLF that certifies PSS with respect to $V$.

###### Corollary 1.

Suppose $V$ is a CLF on $\mathcal{X}$ for the undisturbed system $\dot{x} = f(x, u, 0)$. Then the disturbed system governed by (3) is PSS with respect to the projection $V$.

###### Proof.

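With the projection $\Pi = V$ we have that:

$$\delta = \nabla V(x)^\top \big(f(x, u, d) - f(x, u, 0)\big), \tag{17}$$

where $\nabla V$ is the gradient of the Lyapunov function. The projected system is governed by:

$$\dot{V}(x, u, \delta) = \nabla V(x)^\top f(x, u, 0) + \delta. \tag{18}$$

Since $V$ is a CLF, there exists a state-feedback controller $k$ satisfying:

$$\dot{V}(x, k(x), 0) \le -\alpha(\|x\|), \tag{19}$$

for all $x \in \mathcal{X}$. Let $\alpha_p, \alpha_q \in \mathcal{K}_\infty$ satisfy $\alpha = \alpha_p + \alpha_q$. Then:

$$\dot{V}(x, k(x), \delta) \le -\alpha(\|x\|) + \delta \le -\alpha_p(\|x\|) - \alpha_q(\|x\|) + |\delta|. \tag{20}$$

Therefore:

$$\|x\| \ge \alpha_q^{-1}(|\delta|) \implies \dot{V}(x, k(x), \delta) \le -\alpha_p(\|x\|). \tag{21}$$

Since $V$ is a CLF we may weaken the bounds as in the proof of Theorem 1 to:

$$V(x) \ge \overline{\alpha} \circ \alpha_q^{-1}(|\delta|) \implies \dot{V}(x, k(x), \delta) \le -\alpha_p \circ \overline{\alpha}^{-1}(V(x)), \tag{22}$$

noting that $\overline{\alpha} \circ \alpha_q^{-1}$ and $\alpha_p \circ \overline{\alpha}^{-1}$ are class $\mathcal{K}_\infty$. It follows from Definition 9 that the identity map on $\mathbb{R}_+$ is a PSS-CLF for (18). Therefore, the system (3) is PSS with respect to the projection $V$ by Theorem 1. ∎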
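The implication in (21) can be verified on a small example. The following sketch is an illustration under assumptions that do not come from the paper: the scalar closed-loop system $\dot{x} = -x + d$, the CLF $V(x) = x^2/2$ (so $\alpha(r) = r^2$), and the even split $\alpha_p(r) = \alpha_q(r) = r^2/2$. All function names are hypothetical.

```python
import math

def projected_disturbance(x, d):
    """delta from (17) for V(x) = x^2 / 2: grad V(x) * (f(x, k(x), d) - f(x, k(x), 0)) = x * d."""
    return x * d

def v_dot(x, d):
    """Equation (18): nominal decay -x^2 plus the projected disturbance."""
    return -x * x + projected_disturbance(x, d)

def alpha_p(r):
    return 0.5 * r * r

def alpha_q_inv(s):
    # Inverse of alpha_q(r) = r^2 / 2 on the nonnegative reals.
    return math.sqrt(2.0 * s)

def implication_21_holds(x, d, tol=1e-9):
    """Check (21): |x| >= alpha_q^{-1}(|delta|) implies V_dot <= -alpha_p(|x|)."""
    delta = projected_disturbance(x, d)
    if abs(x) >= alpha_q_inv(abs(delta)):
        return v_dot(x, d) <= -alpha_p(abs(x)) + tol
    return True  # premise is false, so there is nothing to check

grid = [0.1 * i for i in range(-30, 31)]
all_hold = all(implication_21_holds(x, d) for x in grid for d in grid)
print(all_hold)
```

Note that the disturbance enters the check only through the scalar $\delta$, regardless of the dimension of $d$; this is the low-dimensional view that PSS is meant to provide.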

## IV Uncertainty Modeling & Analysis

In this section we consider a structured form of uncertainty present in affine control systems. We analyze the impact of this uncertainty on a Lyapunov function derivative, and on the PSS behavior of the system.

### IV-A Uncertain Affine Systems

We consider affine control systems of the form:

 ˙x=f(x)+g(x)u, (23)

with drift dynamics $f : \mathcal{X} \to \mathbb{R}^n$ and actuation matrix $g : \mathcal{X} \to \mathbb{R}^{n \times m}$. If $f$ and $g$ are unknown, we may consider an estimated model of the system:

$$\dot{x} = \hat{f}(x) + \hat{g}(x) u, \tag{24}$$

where $\hat{f}$ and $\hat{g}$ are estimates of $f$ and $g$, respectively. In this case, (23) can be expressed as:

$$\dot{x} = \hat{f}(x) + \hat{g}(x) u + \underbrace{\underbrace{(g(x) - \hat{g}(x))}_{A(x)} u + \underbrace{f(x) - \hat{f}(x)}_{\mathbf{b}(x)}}_{d}, \tag{25}$$

obtaining a representation of the dynamics as in (9). Note that the disturbance is explicitly characterized as time-invariant, state-dependent, and input-dependent, with potentially unknown $A(x)$ and $\mathbf{b}(x)$ for all $x \in \mathcal{X}$.

As discussed in [2], [31], CLFs may be constructively formed for affine systems under proper assumptions regarding relative degree and unbounded control. Furthermore, if the true system satisfies the relative degree properties of the estimated model, then the CLF found for the estimated system can be used for the true system.

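Assume $f$, $g$, $\hat{f}$, and $\hat{g}$ are Lipschitz continuous (implying $A$ and $\mathbf{b}$ are Lipschitz continuous), and let $V$ be a CLF candidate for (24). The time derivative of $V$ is given by:

$$\dot{V}(x, u, d) = \underbrace{(\hat{f}(x) + \hat{g}(x) u)^\top \nabla V(x)}_{\hat{\dot{V}}(x, u)} + \big(\underbrace{A(x)^\top \nabla V(x)}_{a(x)}\big)^\top u + \underbrace{\mathbf{b}(x)^\top \nabla V(x)}_{b(x)}, \tag{26}$$

for all $x \in \mathcal{X}$ and $u \in \mathcal{U}$. Here $\mathbf{b}(x)$ denotes the vector-valued drift residual from (25), while $b(x)$ denotes its scalar projection onto the Lyapunov gradient. As proposed in [31], we may wish to reduce the estimation error by improving $\hat{\dot{V}}$ with estimates of $a$ and $b$. Given continuous estimators $\hat{a}$ and $\hat{b}$, (26) may be reformulated as:

$$\dot{V}(x, u, d) = \underbrace{(\hat{f}(x) + \hat{g}(x) u)^\top \nabla V(x) + \hat{a}(x)^\top u + \hat{b}(x)}_{\hat{\dot{V}}(x, u)} + \big(\underbrace{A(x)^\top \nabla V(x) - \hat{a}(x)}_{a(x)}\big)^\top u + \underbrace{\mathbf{b}(x)^\top \nabla V(x) - \hat{b}(x)}_{b(x)}, \tag{27}$$

for all $x \in \mathcal{X}$ and $u \in \mathcal{U}$.

Both formulations decompose $\dot{V}$ into an estimated component, $\hat{\dot{V}}$, and a residual component. In (26) the residual terms $a$ and $b$ capture the effect of the unmodeled dynamics on the Lyapunov function derivative. In (27) the residual terms reflect the error in estimating this effect. Additionally, viewing $V$ as a projection results in $\delta = a(x)^\top u + b(x)$.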
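The decomposition into an estimated derivative plus residual terms $a(x)^\top u + b(x)$ can be made concrete for a single-input system. The sketch below assumes a point-mass inverted pendulum, $\ddot{\theta} = (g/l)\sin\theta + u/(ml^2)$, and a quadratic CLF candidate $V(x) = x^\top P x$. The parameter values, the matrix `P`, and all function names are hypothetical choices for illustration, and no learned estimators $\hat{a}$, $\hat{b}$ are included (the model-only formulation).

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def drift(x, length):
    theta, theta_dot = x
    return (theta_dot, (G / length) * math.sin(theta))

def actuation(mass, length):
    return (0.0, 1.0 / (mass * length ** 2))

def grad_V(x, P):
    # V(x) = x^T P x with symmetric P, so grad V(x) = 2 P x.
    return (2 * (P[0][0] * x[0] + P[0][1] * x[1]),
            2 * (P[1][0] * x[0] + P[1][1] * x[1]))

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def lyapunov_terms(x, u, P, true_params, est_params):
    """Return (true V_dot, estimated V_dot, a(x), b(x)) for a scalar input u."""
    (m, l), (m_hat, l_hat) = true_params, est_params
    gV = grad_V(x, P)
    f, g = drift(x, l), actuation(m, l)
    f_hat, g_hat = drift(x, l_hat), actuation(m_hat, l_hat)
    A = (g[0] - g_hat[0], g[1] - g_hat[1])       # actuation residual A(x)
    b_vec = (f[0] - f_hat[0], f[1] - f_hat[1])   # drift residual (vector)
    a = dot(A, gV)                               # a(x) = A(x)^T grad V(x)
    b = dot(b_vec, gV)                           # b(x) = (drift residual)^T grad V(x)
    v_dot_hat = dot(f_hat, gV) + dot(g_hat, gV) * u
    v_dot_true = dot(f, gV) + dot(g, gV) * u
    return v_dot_true, v_dot_hat, a, b

P = [[1.0, 0.25], [0.25, 0.5]]
v_true, v_hat, a, b = lyapunov_terms((0.3, -0.2), 1.5, P, (0.25, 0.5), (0.2, 0.6))
print(v_true, v_hat + a * 1.5 + b)  # the two values agree
```

Only the two scalars `a` and `b` need to be learned at each state here, rather than the full residual dynamics.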

### IV-B Projection to State Stability via Uncertainty Functions

If knowledge of what values $a(x)$ and $b(x)$ can assume is available, the impact on the Lyapunov derivative can be constrained in a manner permitting PSS analysis of a system. Therefore, we define a function characterizing the possible uncertainties at a given state.

###### Definition 10 (Uncertainty Function).

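Let $\mathcal{P}(\mathbb{R}^m \times \mathbb{R})$ denote the set of all subsets of $\mathbb{R}^m \times \mathbb{R}$. An uncertainty function for (26) or (27) is a function $\Delta : \mathcal{X} \to \mathcal{P}(\mathbb{R}^m \times \mathbb{R})$ with $\Delta(x)$ bounded and satisfying $(a(x), b(x)) \in \Delta(x)$ for all $x \in \mathcal{X}$.

For a given $x$, we refer to $\Delta(x)$ as an uncertainty set. Suppose there exists a valid uncertainty function $\Delta$ for (26) or (27). Then $\dot{V}$ satisfies:

$$\dot{V}(x, u, \delta) \le \hat{\dot{V}}(x, u) + \sup_{(a, b) \in \Delta(x)} (a^\top u + b), \tag{28}$$

for all $x \in \mathcal{X}$ and $u \in \mathcal{U}$. One major challenge is to define a $\Delta$ that is non-vacuous and thus practically relevant. From this point forward we limit our attention to a subset of the state space and make a critical assumption regarding the estimate $\hat{\dot{V}}$ for a CLF $V$.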

###### Assumption 1.

Let $V$ be a CLF for the system governed by (24) on a subset $\mathcal{C} \subseteq \mathcal{X}$ with $0 \in \mathcal{C}$. We assume that:

$$\inf_{u \in \mathcal{U}} \hat{\dot{V}}(x, u) \le -\alpha(\|x\|), \tag{29}$$

for all $x \in \mathcal{C}$. If $\hat{\dot{V}}$ is specified as in (26), then this assumption is satisfied by definition. If $\hat{\dot{V}}$ is specified as in (27), then this assumption states that the addition of the estimators $\hat{a}$ and $\hat{b}$ does not make it impossible to choose a control input such that (29) is satisfied.

If the estimated and true system satisfy the same relative degree property, then this assumption amounts to the addition of the estimates $\hat{a}$ and $\hat{b}$ not violating the relative degree property.

###### Assumption 2.

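Let $A$ and $\mathbf{b}$ denote the actuation and drift residuals defined in (25), and let $\mathcal{C}$ be defined as in Assumption 1. We assume $A$ and $\mathbf{b}$ are bounded on $\mathcal{C}$.

If $\mathcal{C}$ is compact, this assumption is automatically satisfied since $A$ and $\mathbf{b}$ are assumed continuous. Under Assumption 1, the set of admissible control inputs $\mathcal{U}(x)$:

$$\mathcal{U}(x) = \{u \in \mathcal{U} : \hat{\dot{V}}(x, u) \le -\alpha(\|x\|)\}, \tag{30}$$

is non-empty, for all $x \in \mathcal{C}$. Then the CLF $V$ satisfies:

$$\begin{aligned} \underline{\alpha}(\|x\|) \le V(x) &\le \overline{\alpha}(\|x\|), \\ \inf_{u \in \mathcal{U}(x)} \Big[\dot{V}(x, u, \delta) - \sup_{(a, b) \in \Delta(x)} (a^\top u + b)\Big] &\le -\alpha(\|x\|), \end{aligned} \tag{31}$$

for all $x \in \mathcal{C}$. We now develop sufficient conditions on the uncertainty function $\Delta$ that certify (25) as PSS with respect to the CLF $V$ (with $V$ interpreted as a projection).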

###### Theorem 2 (Sufficient Conditions for PSS in Affine Control Systems).

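Consider the system in (25), and a CLF $V$ for (24) with estimated time-derivative $\hat{\dot{V}}$ as defined in (26) or (27), satisfying Assumption 1. Let $\Delta$ be an uncertainty function and let $k$ be a state-feedback controller satisfying $k(x) \in \mathcal{U}(x)$ for all $x \in \mathcal{C}$, with $\mathcal{U}(x)$ defined as in (30). Suppose there exist $\alpha_p, \alpha_q \in \mathcal{K}_\infty$ with $\alpha = \alpha_p + \alpha_q$ and a sublevel set $\Omega$ of $V$ satisfying:

$$\|x\| \ge \sup_{(a, b) \in \Delta(x)} \alpha_q^{-1}(a^\top k(x) + b), \tag{32}$$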

for all . Then the system governed by (25) is PSS with respect to the projection on .

###### Proof.

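First, note that:

$$\dot{V}(x, k(x), \delta) - \sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b) \le -\alpha(\|x\|) = -\alpha_p(\|x\|) - \alpha_q(\|x\|), \tag{33}$$

for all $x \in \mathcal{C}$. At every state where (32) holds, the fact that $\alpha_q$ is monotonically increasing gives:

$$\alpha_q(\|x\|) \ge \sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b). \tag{34}$$

Substituting (34) into (33), it follows that: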

 ˙V(x,k(x),δ)≤−αp(∥x∥), (35)

for all . This means is forward invariant, with a proof similar to that of Lemma 1. Since is a CLF for (24), Corollary 1 can be restricted to ; that is, the system is PSS with respect to the projection on . ∎

We may want to study a particular set of interest over which the impact of the uncertainty can be bounded. For , let be the open ball around of radius , typically used to define a ball contained in in the subsequent analysis.

###### Corollary 2.

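Suppose there is a set $\mathcal{E} \subseteq \mathcal{X}$ and a scalar $\mu$ satisfying:

$$\sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b) \le \mu, \tag{36}$$

for all $x \in \mathcal{E}$. If there exists a sublevel set $\Omega$ of $V$ such that:

$$B_{\alpha_q^{-1}(\mu)} \subseteq \Omega \subseteq \mathcal{C} \cap \mathcal{E}, \tag{37}$$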

then the system is PSS with respect to the (CLF) projection on , and the smallest sublevel set of containing is asymptotically stable.

###### Proof.

First, note that:

 ∥x∥≥α−1q(μ)≥sup(a,b)∈Δ(x)α−1q(a⊤k(x)+b), (38)

for all , and the system is PSS on by Theorem 2. The smallest sublevel set of containing is asymptotically stable since:

$$\|x\| \ge \alpha_q^{-1}(\mu) \implies \dot{V}(x, k(x), \delta) \le -\alpha_p(\|x\|). \tag{39}$$

∎

Improving the uncertainty set (e.g., reducing uncertainty using learning) directly leads to larger sets for a given bound, or tighter bounds on a given set. We state this formally in the next result.

###### Corollary 3 (Uncertainty Function Improvement).

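Consider uncertainty functions $\Delta$ and $\Delta'$, as well as $\mathcal{E}$ and $\mu$ as defined in Corollary 2.

• Fix $\mu$ and let $\mathcal{E}_\mu$ be defined as:

$$\mathcal{E}_\mu = \Big\{x \in \mathcal{X} : \sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b) \le \mu\Big\}. \tag{40}$$
• Fix $\mathcal{E}$ and let $\mu_{\mathcal{E}}$ be defined as:

$$\mu_{\mathcal{E}} = \sup_{x \in \mathcal{E}} \, \sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b). \tag{41}$$

Suppose $\Delta'(x) \subseteq \Delta(x)$ for all $x \in \mathcal{X}$. Then the set $\mathcal{E}'_\mu$ and scalar $\mu'_{\mathcal{E}}$ associated with $\Delta'$ satisfy $\mathcal{E}_\mu \subseteq \mathcal{E}'_\mu$ and $\mu'_{\mathcal{E}} \le \mu_{\mathcal{E}}$.

###### Proof.

Since $\Delta'(x) \subseteq \Delta(x)$, the supremum over the smaller set cannot exceed the supremum over the larger set:

$$\sup_{(a, b) \in \Delta'(x)} (a^\top k(x) + b) \le \sup_{(a, b) \in \Delta(x)} (a^\top k(x) + b). \tag{42}$$

Both claims follow by applying (42) in (40) and (41). ∎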

### IV-C Uncertainty Function Construction

We now provide a constructive method for creating an uncertainty function from a dataset of state and control values generated by a system. Assume the residuals $A$ and $\mathbf{b}$ from (25) are Lipschitz continuous with constants $L_A$ and $L_b$, respectively. Additionally, assume that $A$ and $\mathbf{b}$ are bounded on $\mathcal{C}$ by constants $\|A\|_\infty$ and $\|\mathbf{b}\|_\infty$, respectively. Consider a dataset $D$ consisting of $N$ data-measurement pairs $((x_i, u_i), \dot{V}_i)$. Such measurements of $\dot{V}$ can be obtained through numerical differentiation of computed values of $V$. For notational convenience, let $D_0$ denote the set of state-input pairs $(x_i, u_i)$ appearing in $D$.

###### Proposition 1.

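Given a dataset $D$, an uncertainty function $\Delta$ can be constructed as:

$$\Delta(x) = \big\{(a, b) \in \mathbb{R}^m \times \mathbb{R} : \pm(a^\top u' + b) \le \epsilon(x, x', u') \text{ for all } (x', u') \in D_0\big\}, \tag{43}$$

for all $x \in \mathcal{C}$, where $\epsilon$ is continuous.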

###### Remark 2.

For all $x$, $\Delta(x)$ is a closed, symmetric polyhedron and is bounded given sufficiently diverse control inputs in the dataset. In this case, $\Delta(x)$ is a compact, convex set. The supremum present in Theorem 2 and Corollary 2 becomes a linear program (LP) and can be efficiently solved.
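For a single-input system ($m = 1$), the uncertainty set is a polygon in the $(a, b)$ plane, so the LP can be illustrated without a solver by enumerating vertices. The sketch below is a brute-force illustration under that assumption, not the method used in the paper; for larger $m$ a general LP solver would be the natural choice. The function name and the example data are hypothetical.

```python
from itertools import combinations, product

def sup_over_uncertainty_set(data, k_x, tol=1e-9):
    """Maximize a * k_x + b over {(a, b) : |a * u_i + b| <= eps_i for all i}.

    `data` is a list of (u_i, eps_i) pairs for a scalar input. Returns None when
    no vertex exists, i.e. when the inputs are not diverse enough to bound the set.
    """
    best = None
    for (u_i, e_i), (u_j, e_j) in combinations(data, 2):
        if abs(u_i - u_j) < tol:
            continue  # parallel constraint boundaries never meet in a vertex
        for s_i, s_j in product((-1.0, 1.0), repeat=2):
            # Intersection of a*u_i + b = s_i*e_i and a*u_j + b = s_j*e_j.
            a = (s_i * e_i - s_j * e_j) / (u_i - u_j)
            b = s_i * e_i - a * u_i
            if all(abs(a * u + b) <= e + tol for u, e in data):
                value = a * k_x + b
                best = value if best is None else max(best, value)
    return best

coarse = sup_over_uncertainty_set([(-1.0, 1.0), (1.0, 1.0)], k_x=0.5)
refined = sup_over_uncertainty_set([(-1.0, 1.0), (1.0, 1.0), (0.0, 0.5)], k_x=0.5)
print(coarse, refined)
```

Adding a data point with a tighter bound shrinks the polygon and lowers the supremum, which is the mechanism behind Corollary 3. A single distinct input value leaves the set unbounded, matching the requirement of sufficiently diverse control inputs.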

###### Proof of Proposition 1.

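Define the observed error $\ell$ as:

$$\ell(x, u) = \big|\dot{V}(x, u, \delta) - \hat{\dot{V}}(x, u)\big|, \tag{44}$$

for all $x \in \mathcal{X}$ and $u \in \mathcal{U}$. Consider a test point $x$ and a data point $(x', u') \in D_0$. Note that $\ell(x', u')$ satisfies:

$$\begin{aligned} \ell(x', u') &= |a(x')^\top u' + b(x')| \\ &= |a(x)^\top u' + b(x) + (a(x') - a(x))^\top u' + b(x') - b(x)| \\ &\ge |a(x)^\top u' + b(x)| - \|a(x') - a(x)\|_2 \|u'\|_2 - |b(x') - b(x)|, \end{aligned} \tag{45}$$

where the inequality follows from the reverse triangle inequality, triangle inequality, and Cauchy-Schwarz inequality.

For simplicity we proceed with the construction assuming the estimated Lyapunov function derivative is specified as in (26). The resulting bound will be modified to include estimators as in (27). Note that:

$$\begin{aligned} \|a(x') - a(x)\|_2 &= \|A(x')^\top \nabla V(x') - A(x)^\top \nabla V(x)\|_2 \\ &= \|(A(x') - A(x))^\top \nabla V(x') + A(x)^\top (\nabla V(x') - \nabla V(x))\|_2 \\ &\le L_A \|x' - x\|_2 \|\nabla V(x')\|_2 + \|A\|_\infty \|\nabla V(x') - \nabla V(x)\|_2, \end{aligned} \tag{46}$$

where the inequality follows from the triangle inequality, submultiplicativity of matrix norms, and the Lipschitz and boundedness assumptions for $A$. Since it is also true that:

$$\|a(x') - a(x)\|_2 = \|A(x')^\top (\nabla V(x') - \nabla V(x)) + (A(x') - A(x))^\top \nabla V(x)\|_2, \tag{47}$$

then the following bound holds:

$$\|a(x') - a(x)\|_2 \le L_A \|x' - x\|_2 \|\nabla V(x)\|_2 + \|A\|_\infty \|\nabla V(x') - \nabla V(x)\|_2. \tag{48}$$

Let $\epsilon_L(x, x') = \|x' - x\|_2 \min\{\|\nabla V(x)\|_2, \|\nabla V(x')\|_2\}$, and let $\epsilon_\infty(x, x') = \|\nabla V(x') - \nabla V(x)\|_2$. Observe that $\epsilon_L$ and $\epsilon_\infty$ are continuous functions. Next, combining (46) and (48), note that:

$$\|a(x') - a(x)\|_2 \le \epsilon_L(x, x') L_A + \epsilon_\infty(x, x') \|A\|_\infty. \tag{49}$$

Similarly,

$$|b(x') - b(x)| \le \epsilon_L(x, x') L_b + \epsilon_\infty(x, x') \|\mathbf{b}\|_\infty. \tag{50}$$

Therefore, substituting (49) and (50) into (45),

$$|a(x)^\top u' + b(x)| \le \ell(x', u') + \epsilon_L(x, x') \big(L_A \|u'\|_2 + L_b\big) + \epsilon_\infty(x, x') \big(\|A\|_\infty \|u'\|_2 + \|\mathbf{b}\|_\infty\big). \tag{51}$$

While $\epsilon_L$ and $\epsilon_\infty$ decrease as the test point approaches data points, without estimators as in (27), the observed loss term can remain large. By including such estimators, the observed loss term may be reduced, but the bound must be modified with the following additional continuous function:

$$\epsilon_H(x, x', u') = \big|(\hat{a}(x) - \hat{a}(x'))^\top u' + \hat{b}(x) - \hat{b}(x')\big|, \tag{52}$$

which accounts for potential error in the estimation at the test point. $\epsilon$ is then specified as the total upper bound. ∎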
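The bound constructed in this proof can be evaluated pointwise for each test point and data point. The sketch below assumes the estimator-free case, so the term (52) is omitted, and takes $\epsilon_L(x, x') = \|x' - x\|_2 \min\{\|\nabla V(x)\|_2, \|\nabla V(x')\|_2\}$ and $\epsilon_\infty(x, x') = \|\nabla V(x') - \nabla V(x)\|_2$, which is the reading suggested by comparing (46) and (48). The function name, argument names, and the numerical constants are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def epsilon_bound(x, x_data, u_data, loss, grad_V, L_A, L_b, A_inf, b_inf):
    """Upper bound on |a(x)^T u' + b(x)| from one data point (x', u') with observed loss."""
    gx, gd = grad_V(x), grad_V(x_data)
    dist = norm([p - q for p, q in zip(x, x_data)])
    eps_L = dist * min(norm(gx), norm(gd))            # Lipschitz-driven term
    eps_inf = norm([p - q for p, q in zip(gd, gx)])   # gradient-mismatch term
    u_norm = norm(u_data)
    return loss + eps_L * (L_A * u_norm + L_b) + eps_inf * (A_inf * u_norm + b_inf)

grad = lambda x: [2.0 * c for c in x]  # gradient of the assumed V(x) = ||x||^2
constants = dict(L_A=1.0, L_b=1.0, A_inf=0.5, b_inf=0.5)

at_data = epsilon_bound([0.5, 0.5], [0.5, 0.5], [1.0], 0.1, grad, **constants)
away = epsilon_bound([1.0, 0.0], [0.0, 0.0], [1.0], 0.1, grad, **constants)
print(at_data, away)
```

At a data point the bound collapses to the observed loss, and it grows as the test point moves away, which is why the observed loss and the coverage of the dataset both matter for the size of the uncertainty set.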

###### Corollary 4.

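The uncertainty set $\Delta(x)$ as specified in (43) is continuous in $x$ with respect to the Hausdorff metric.

###### Proof.

Note that the inequality constraints in (43) can be rewritten as:

$$\Delta(x) = \Big\{(a, b) \in \mathbb{R}^m \times \mathbb{R} : \Xi \begin{bmatrix} a \\ b \end{bmatrix} \preceq \xi(x)\Big\}, \tag{53}$$

where the rows of $\Xi$ are $\pm[u'^\top \; 1]$ for each $(x', u') \in D_0$, the corresponding entries of $\xi(x)$ are $\epsilon(x, x', u')$, and $\preceq$ denotes elementwise inequality. The function $\xi$ is continuous since $\epsilon$ is continuous; therefore, the results established in [8] show that the point-to-set map $x \mapsto \Delta(x)$ is continuous. ∎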

## V Integration With Learning

We now explore the practical interplay between learning and systematic improvement of PSS properties, in particular by decreasing the upper bound in (51). By decreasing this bound, the uncertainty set in (43) can be made smaller, which in turn can enlarge the region of the state space over which PSS properties can be certified and/or reduce the degradation of the guarantee (see Corollary 3). As discussed previously, using PSS rather than ISS enables lower-dimensional learning objectives and upper bounds which can be efficiently evaluated during and after learning.

Learning also offers direct ways to decrease the upper bound in (51). As discussed in Section IV, estimators can be used to reduce the observed loss in (44), which appears directly in the upper bound. We can use supervised learning to train such estimators. One complication is that, using baseline controllers, it may not be possible to collect data in regions where we wish to certify PSS properties. As the distances between a point of interest and previously collected data grow, the distance-dependent terms $\epsilon_L$ and $\epsilon_\infty$ in the bound can grow larger, weakening uncertainty bounds at the point of interest. By refining the baseline controller using learned models, the system may be controlled towards these regions of interest.

### V-A Episodic Learning Framework

We demonstrate the practicality of PSS by incorporating it into an episodic learning framework based on learning CLF time derivatives [31]. Controller improvement is achieved by alternating between executing a controller to gather data and refining estimates of residual uncertainty. As data collection and learning progresses, the size of uncertainty sets decreases, enabling stronger PSS certifications for the system.

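We briefly describe the DaCLyF (Dataset Aggregation for Control Lyapunov Functions) learning approach from [31]. Let $\mathcal{H}_a$ and $\mathcal{H}_b$ be nonlinear estimator classes, and let $\mathcal{H}$ be the class of estimators of the form:

$$\hat{\dot{W}}(x, u) = \hat{\dot{V}}(x, u) + \hat{a}(x)^\top u + \hat{b}(x), \tag{54}$$

for all $x \in \mathcal{X}$ and $u \in \mathcal{U}$, given estimators $\hat{a} \in \mathcal{H}_a$ and $\hat{b} \in \mathcal{H}_b$. Equipped with a loss function and a dataset $D$ obtained from experiments with a baseline controller, we approximately solve the Empirical Risk Minimization (ERM) problem over the class $\mathcal{H}$ to update the estimate of the Lyapunov function derivative. Then, the controller is updated with an augmenting controller of the form:

$$\begin{aligned} u'(x) = \operatorname*{argmin}_{u' \in \mathbb{R}^m} \quad & \frac{1}{2} \begin{bmatrix} u(x) \\ u' \end{bmatrix}^\top P \begin{bmatrix} u(x) \\ u' \end{bmatrix} + q^\top \begin{bmatrix} u(x) \\ u' \end{bmatrix} + r \\ \text{s.t.} \quad & \hat{\dot{V}}(x, u(x) + u') \le -\alpha(\|x\|), \\ & u(x) + u' \in \mathcal{U}, \end{aligned} \tag{55}$$

for all $x \in \mathcal{X}$, with $P \in \mathbb{S}^{2m}_+$, $q \in \mathbb{R}^{2m}$, $r \in \mathbb{R}$, and $\hat{\dot{V}}$ denoting the updated estimator. Here $u(x)$ is the baseline control input and $\mathbb{S}^{2m}_+$ denotes the set of positive semidefinite matrices of size $2m \times 2m$. The augmenting controller, weighted by a trust factor, is additively incorporated with the baseline controller. As more data is collected, the trust factor is increased. This entire procedure is outlined in Algorithm 1.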
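As a rough illustration of the augmenting step, the sketch below solves a simplified special case of (55) in closed form: the cost is taken to be $\frac{1}{2}\|u'\|_2^2$, the input constraint $u(x) + u' \in \mathcal{U}$ is ignored, and the estimated Lyapunov derivative is assumed affine in the input, $\hat{\dot{V}}(x, u) = c_0 + c_1^\top u$. These simplifications, and the names `augmenting_input` and `blended_input`, are assumptions for illustration and do not reproduce the general QP or Algorithm 1.

```python
def augmenting_input(c0, c1, u_base, alpha_val):
    """Minimum-norm u' such that c0 + c1 . (u_base + u') <= -alpha_val."""
    slack = -alpha_val - c0 - sum(c * u for c, u in zip(c1, u_base))
    if slack >= 0.0:
        return [0.0] * len(u_base)  # baseline input already satisfies the constraint
    c1_sq = sum(c * c for c in c1)
    if c1_sq == 0.0:
        raise ValueError("infeasible: the estimate does not depend on the input")
    # Project the origin onto the constraint boundary c1 . u' = slack.
    return [slack * c / c1_sq for c in c1]

def blended_input(u_base, u_aug, trust):
    """Baseline input plus the augmenting input weighted by a trust factor in [0, 1]."""
    return [ub + trust * ua for ub, ua in zip(u_base, u_aug)]

u_aug = augmenting_input(c0=1.0, c1=[2.0], u_base=[0.0], alpha_val=1.0)
u_half = blended_input([0.0], u_aug, trust=0.5)
print(u_aug, u_half)
```

With full trust the blended input satisfies the estimated decrease condition with equality in this example; with partial trust it only moves part of the way there, which reflects a cautious use of the learned estimate early in training.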

The estimator class can include a wide variety of nonlinear functions. Should the estimator classes for $\hat{a}$ and $\hat{b}$ consist of Lipschitz continuous functions, the upper bound (52) can be weakened further using the associated Lipschitz constants to permit further analysis of the uncertainty function specified in (43). Importantly, this motivates the use of spectrally normalized deep neural networks [21], [7] for this estimation problem.

### V-B Simulation Results

In this section we apply Algorithm 1 to an inverted pendulum model with parametric uncertainty. The pendulum is modeled as a massless rod with torque input at a fixed base. The true mass and the length are perturbed by up to 30% of their estimated values. The baseline controller is a linear proportional-derivative (PD) controller that tracks angle and angle rate trajectories. The estimators are chosen from the class of two-layer neural networks with 200 hidden units and ReLU nonlinearities, mapping concatenated state and Lyapunov function gradients to $\hat{a}(x)$ and $\hat{b}(x)$. The trust factors are chosen in a sigmoid fashion. Naive exploratory control is introduced as in [31], with perturbations chosen uniformly at random, independently in each coordinate, and scaled by 25% of the norm of the current control input.
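The trust-factor schedule and exploratory perturbation described above could be sketched as follows. The exact sigmoid parameterization and the sampling interval are not given in the text, so the `steepness` parameter, the centering at the middle episode, and the interval $[-1, 1]$ are assumptions made for illustration; the function names are hypothetical.

```python
import math
import random

def trust_factor(episode, num_episodes, steepness=10.0):
    """Sigmoid schedule rising from near 0 to near 1 across episodes (assumed form)."""
    progress = episode / max(num_episodes - 1, 1)
    return 1.0 / (1.0 + math.exp(-steepness * (progress - 0.5)))

def explore(u, rng, scale=0.25):
    """Perturb each coordinate with uniform noise scaled by `scale` times the norm of u."""
    magnitude = scale * math.sqrt(sum(c * c for c in u))
    return [c + magnitude * rng.uniform(-1.0, 1.0) for c in u]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
schedule = [trust_factor(e, 10) for e in range(10)]
perturbed = explore([3.0, 4.0], rng)
print(schedule[0], schedule[-1], perturbed)
```

Scaling the noise by the norm of the current input keeps exploration small when the controller is applying little effort, for example near the desired trajectory.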

A comparison of the baseline controller and final augmented controller demonstrating improved tracking performance is shown in Fig. 1. A comparison of PSS bounds for the model-based QP controller and the final augmented controller is shown in Fig. 2, with observed trajectories superimposed. The QP controller is unable to keep the system in regions in which the bound is shown to be small. On the other hand, the augmented controller keeps the system close to the desired trajectory, consistently near training data. The bounds are small along the observed trajectory, in comparison.

## VI Conclusion

We presented a novel low-dimensional view of stability for uncertain systems and a method of evaluating PSS behavior using experimental data. This method constructs a bound on disturbances to a CLF derivative, and can be integrated with a machine learning framework to improve PSS behavior. Finally, we validate this procedure on a simulated system.

Future work includes incorporating the upper bounds into online learning settings and developing optimal exploration strategies. Quantifying the impact of learning on PSS provides an objective for deciding how to collect data, also known as the exploration problem in learning literature [22, 6, 10, 9, 30]. In particular, reductions of the uncertainty bound may be used to formulate regret in online learning settings or reward in imitation and reinforcement learning settings. Additional future work includes extending the notion of PSS to Control Safety and Barrier Functions, more thoroughly studying the benefits of learning low-dimensional representations of the dynamics versus the full-order dynamics, and utilizing PSS to augment controller synthesis for complex real-world robotic systems.

## References

• [1] Ralph Abraham, Jerrold E Marsden, and Tudor Ratiu. Manifolds, tensor analysis, and applications, volume 75. Springer Science & Business Media, 2012.
• [2] Aaron D Ames, Kevin Galloway, Koushil Sreenath, and Jessy W Grizzle. Rapidly exponentially stabilizing control lyapunov functions and hybrid zero dynamics. IEEE Transactions on Automatic Control, 59(4):876–891, 2014.
• [3] Aaron D Ames and Matthew Powell. Towards the unification of locomotion and manipulation through control lyapunov functions and quadratic programs. In Control of Cyber-Physical Systems, pages 219–240. Springer, 2013.
• [4] Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J Zico Kolter. Differentiable mpc for end-to-end planning and control. In Advances in Neural Information Processing Systems, pages 8299–8310, 2018.
• [5] Zvi Artstein. Stabilization with relaxed controls. Nonlinear Analysis: Theory, Methods & Applications, 7(11):1163–1173, 1983.
• [6] Anil Aswani, Humberto Gonzalez, S Shankar Sastry, and Claire Tomlin. Provably safe and robust learning-based model predictive control. Automatica, 49(5):1216–1226, 2013.
• [7] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249, 2017.
• [8] Robert G Batson. Combinatorial behavior of extreme points of perturbed polyhedra. Journal of mathematical analysis and applications, 127(1):130–139, 1987.
• [9] Felix Berkenkamp, Riccardo Moriconi, Angela P Schoellig, and Andreas Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with gaussian processes. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 4661–4666. IEEE, 2016.
• [10] Felix Berkenkamp, Angela P Schoellig, and Andreas Krause. Safe controller optimization for quadrotors with gaussian processes. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 491–496. IEEE, 2016.
• [11] Girish Chowdhary, Hassan A Kingravi, Jonathan P How, and Patricio A Vela. Bayesian nonparametric adaptive control using gaussian processes. IEEE transactions on neural networks and learning systems, 26(3):537–550, 2015.
• [12] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4192–4201, 2018.
• [13] Randy Freeman and Petar V Kokotovic. Robust nonlinear control design: state-space and Lyapunov techniques. Springer Science & Business Media, 2008.
• [14] Randy A Freeman and PV Kokotovic. Inverse optimality in robust stabilization. SIAM journal on control and optimization, 34(4):1365–1391, 1996.
• [15] Kevin Galloway, Koushil Sreenath, Aaron D Ames, and Jessy W Grizzle. Torque saturation in bipedal robotic walking through control lyapunov function-based quadratic programs. IEEE Access, 3:323–332, 2015.
• [16] Christopher M Kellett. A compendium of comparison function results. Mathematics of Control, Signals, and Systems, 26(3):339–374, 2014.
• [17] H.K. Khalil. Nonlinear Systems - 3rd Edition. PH, Upper Saddle River, NJ, 2002.
• [18] Miroslav Krstić and Peter V Kokotović. Control lyapunov functions for adaptive nonlinear stabilization. Systems & Control Letters, 26(1):17–23, 1995.
• [19] Yuandan Lin and Eduardo D Sontag. A universal formula for stabilization with bounded controls. Systems & Control Letters, 16(6):393–397, 1991.
• [20] Wen-Loong Ma, Shishir Kolathaya, Eric R Ambrose, Christian M Hubicki, and Aaron D Ames. Bipedal robotic running with durus-2d: Bridging the gap between theory and experiment. In Proceedings of the 20th International Conference on Hybrid Systems: Computation and Control, pages 265–274. ACM, 2017.
• [21] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
• [22] Teodor Mihai Moldovan and Pieter Abbeel. Safe exploration in markov decision processes. arXiv preprint arXiv:1205.4810, 2012.
• [23] Mitio Nagumo. Über die lage der integralkurven gewöhnlicher differentialgleichungen. Proceedings of the Physico-Mathematical Society of Japan. 3rd Series, 24:551–559, 1942.
• [24] Quan Nguyen and Koushil Sreenath. Optimal robust control for bipedal robots through control lyapunov function based quadratic programs. In Robotics: Science and Systems, 2015.
• [25] Guanya Shi, Xichen Shi, Michael O’Connell, Rose Yu, Kamyar Azizzadenesheli, Animashree Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural lander: Stable drone landing control using learned dynamics. In IEEE International Conference on Robotics and Automation (ICRA), 2019.
• [26] Eduardo D Sontag. Smooth stabilization implies coprime factorization. IEEE transactions on automatic control, 34(4):435–443, 1989.
• [27] Eduardo D Sontag. Input to state stability: Basic concepts and results. In Nonlinear and optimal control theory, pages 163–220. Springer, 2008.
• [28] Eduardo D Sontag and Yuan Wang. On characterizations of input-to-state stability with respect to compact sets. In Nonlinear Control Systems Design 1995, pages 203–208. Elsevier, 1995.
• [29] Eduardo D Sontag and Yuan Wang. On characterizations of the input-to-state stability property. Systems & Control Letters, 24(5):351–359, 1995.
• [30] Yanan Sui, Vincent Zhuang, Joel W Burdick, and Yisong Yue. Stagewise safe bayesian optimization with gaussian processes. In International Conference on Machine Learning (ICML), 2018.
• [31] Andrew J Taylor, Victor D Dorobantu, Hoang M Le, Yisong Yue, and Aaron D Ames. Episodic learning with control lyapunov functions for uncertain robotic systems. arXiv preprint arXiv:1903.01577, 2019.