# Assured Neural Network Architectures for Control and Identification of Nonlinear Systems

In this paper, we consider the problem of automatically designing a Rectified Linear Unit (ReLU) Neural Network (NN) architecture (number of layers and number of neurons per layer) with the assurance that it is sufficiently parametrized to control a nonlinear system; i.e. control the system to satisfy a given formal specification. This is unlike current techniques, which provide no assurances on the resultant architecture. Moreover, our approach requires only limited knowledge of the underlying nonlinear system and specification. We assume only that the specification can be satisfied by a Lipschitz-continuous controller with a known bound on its Lipschitz constant; the specific controller need not be known. From this assumption, we bound the number of affine functions needed to construct a Continuous Piecewise Affine (CPWA) function that can approximate any Lipschitz-continuous controller that satisfies the specification. Then we connect this CPWA to a NN architecture using the authors' recent results on the Two-Level Lattice (TLL) NN architecture; the TLL architecture was shown to be parameterized by the number of affine functions present in the CPWA function it realizes.


## 1 Introduction

Recent advances in theory and computation have facilitated widespread adoption of Rectified Linear Unit (ReLU) Neural Networks (NNs) in conventional feedback-control settings, especially for Cyber-Physical Systems (CPSs) [1]. However, this proliferation of NN controllers has also highlighted weaknesses in state-of-the-art NN design techniques, because CPSs are often safety critical. In a safety-critical CPS, it is not enough to simply learn a NN controller, i.e. fit data: such a controller must also have demonstrable safety or robustness properties, usually with respect to closed-loop specifications on a dynamical system. Moreover, a meaningful safety specification for a CPS is often binary: either a controller makes the system safe or it does not. This is in contrast to optimization-based approaches typical in Machine/Reinforcement Learning (ML/RL), where the goal is to optimize a particular cost or reward without any requirement imposed on the eventual value of the quantity in question. Examples of the former include stability about an equilibrium point or forward invariance of a particular set of safe states (e.g. for collision avoidance). Examples of the latter include minimizing the mean-squared fit error, minimizing regret, or maximizing a surrogate for a value function.

The distinction between safety specifications and ML/RL objectives is relevant to more than just learning NN weights and biases, though: it has special relevance to NN architecture design, i.e. deciding on the number of neurons and their connection (or arrangement) in the NN to be trained in the first place. Specifically, (binary) safety specifications raise existential questions about NN architectures in a way that conventional, optimization-focused ML/RL techniques do not: given a particular NN architecture for a controller, it is necessary to ask whether there is any choice of weights and biases that achieves the desired safety specification. By contrast, conventional ML/RL problems take an architecture as given, and attempt to achieve the best training error, reward, etc. subject to that implicit constraint. Thus, in typical ML/RL treatments, the choice of NN architecture merely affects the final cost/reward, rather than potentially rendering the problem ill-posed, as can occur with a safety specification.

In this paper, we directly address the issue of whether a ReLU NN architecture is well-posed as a state-feedback controller for a given nonlinear system and closed-loop (safety) specification. That is, we present a systematic methodology for designing a NN controller architecture that is guaranteed (or assured) to be able to meet a given closed-loop specification. Our approach provides such a guarantee contingent on the following properties of the system and specification:

1. the nonlinear system’s vector field is Lipschitz continuous with known Lipschitz constants;

2. there exists a Lipschitz continuous controller that satisfies the closed-loop specification robustly, and the Lipschitz constant of that controller is known (although the controller itself need not be known); and

3. the conjectured Lipschitz continuous controller makes a compact subset of the state space positively invariant.

The need to assume the existence of a controller is primarily to ensure that the specification is well-posed in general, i.e., for any controller, whether it is a NN or not; we will elaborate on the robustness in (ii) subsequently. Importantly, subject to these conditions, our approach can design a NN controller architecture with the following assurance: there exist neuron weights/biases for that architecture such that it exactly meets the same specification as the assumed non-NN controller (albeit non-robustly). Moreover, our proposed methodology requires only the information described above, so it is applicable even without perfect knowledge of the underlying system dynamics, albeit at the expense of designing somewhat larger architectures.

The cornerstone of our approach is a special ReLU NN architecture introduced by the authors in the context of another control problem: the Two-Level Lattice (TLL) ReLU NN architecture [2]. A TLL NN, like all ReLU NNs, instantiates a Continuous, Piecewise Affine (CPWA) function, and hence implements one of finitely many local linear functions at each point in its domain. (The term “linear” here is somewhat of a misnomer, and “affine” is more accurate; we use this terminology for consistency with the literature.) Thus, its domain, like that of any CPWA function, can be partitioned into linear regions, each of which corresponds to a different local linear function. However, unlike general ReLU NN architectures, a TLL NN exposes these local linear functions directly in the parameters of the network. As a consequence, TLL architectures are easily parameterized by the number of linear regions they must realize: after all, each linear region must instantiate at least one local linear function; see [2, Theorem 2]. In fact, this idea also applies to upper bounds on the number of regions desired in an architecture; see [2, Theorem 3]. TLL NNs thus have the special property that they can be used to connect a desired number of linear regions directly to a ReLU architecture.

Thus, to obtain an assured controller architecture, it is enough to obtain an assured upper bound on the number of linear regions required of a controller. In this paper, we show that such an assured upper bound can be obtained using just the information assumed in (i)-(iii), i.e. primarily Lipschitz constants and bounds on the relevant objects. This bound is derived by counting the number of linear regions needed to linearly and continuously interpolate between regularly spaced “samples” of a controller with known Lipschitz constant.
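The counting idea can be illustrated with a small sketch. It rests on assumptions of our own rather than on the bound derived in this paper: the controller is sampled on a uniform grid over a cube of side length $s$, and the samples are interpolated over the standard Freudenthal (Kuhn) triangulation, which splits each grid cell into $n!$ simplices. Since every point of a cell lies within max-norm distance $h$ of every vertex of that cell, and the interpolant is a convex combination of vertex values, the interpolant of a controller with max-norm Lipschitz constant $L$ stays within $L h$ of it; a cell width $h \le \epsilon / L$ therefore suffices for accuracy $\epsilon$. The helper names are hypothetical.

```python
import math


def grid_cells_per_side(side_length: float, lipschitz: float, eps: float) -> int:
    """Cells per axis so that the cell width h = side_length / k satisfies L * h <= eps."""
    if eps <= 0 or lipschitz < 0 or side_length <= 0:
        raise ValueError("need eps > 0, lipschitz >= 0 and side_length > 0")
    if lipschitz == 0:
        return 1  # a constant controller needs no refinement
    return math.ceil(side_length * lipschitz / eps)


def interpolation_region_bound(n: int, side_length: float, lipschitz: float, eps: float) -> int:
    """Upper bound on the linear regions of the grid interpolant on an n-dimensional cube.

    Each of the k**n cells is split into n! simplices (Freudenthal/Kuhn triangulation),
    and the interpolant is affine on each simplex, so the simplex count bounds the
    number of linear regions from above.
    """
    k = grid_cells_per_side(side_length, lipschitz, eps)
    return math.factorial(n) * k ** n


# Example: 2-D state cube of side 2, Lipschitz constant 3, tolerance 0.5.
print(grid_cells_per_side(2.0, 3.0, 0.5))            # 12
print(interpolation_region_bound(2, 2.0, 3.0, 0.5))  # 2! * 12**2 = 288
```

The bound grows like $(sL/\epsilon)^n$, which is one way to see why architectures designed from Lipschitz information alone can be large.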

Moreover, this core approach is relevant to end-to-end learning beyond just the design of controller architectures: it can also be used to obtain architectures that are guaranteed to represent the dynamics of a nonlinear system itself – i.e. assured architectures for system identification. Indeed, in this paper, we further show that contingent on information akin to (i) and (iii), it is possible to generate a ReLU architecture that is guaranteed to capture the essence of an unknown nonlinear control system. Specifically, we exhibit a methodology for designing an architecture to represent a nonlinear (controlled) vector field with the following assurance: if the ReLU vector field is sufficiently well trained on data from a compatible – but unknown – controlled vector field, then robustly controlling the ReLU dynamics to a specification will yield a controller that likewise controls the unknown dynamics to the same specification (albeit non-robustly). Providing a guaranteed architecture for system identification has unique value for end-to-end learning: because the ReLU control system can be used as a surrogate for the original nonlinear system, control design can be moved entirely from the unknown system to the known ReLU surrogate, the latter of which can be simulated instead. Furthermore, this system identification can be combined with a guaranteed controller architecture to do in silico ReLU control design.

The contributions of this paper can be summarized as follows.

1. A new notion of Abstract Disturbance Simulation (ADS) to formulate robust specification satisfaction; ADS unifies and generalizes several related notions of simulation in the literature (see Section 3.2).

2. A methodology to design a ReLU architecture that is assured to be able to control an unknown nonlinear system to meet a closed-loop specification; this is subject to the existence of a (likewise) unknown controller that robustly meets the same specification.

3. A methodology to design a ReLU architecture that can be used in system identification of an unknown nonlinear control system; this architecture, when adequately trained, is assured to be viable as a surrogate for the original nonlinear system in controller design.

A preliminary version of this paper appeared as [3]. Relative to [3], this paper has the following additional novel content: first, it uses new, dramatically improved techniques to obtain a smaller architecture than [3] (see Remark 3); second, it includes full proofs of every claim; and third, it contains the extension to system identification architectures noted above.

### 1.1 Related Work

The literature most directly relevant to this paper is work by the authors: AReN [2] and the preliminary version of this paper, [3]. The former contains an algorithm that generates an architecture assured to represent an optimal MPC controller. The AReN algorithm is fully automatic, and the assurances provided are the same as those for the referent MPC controller; but it is not very generalizable, given the restriction to MPC control. This paper and its preliminary version offer a significantly improved methodology. The architecture design presented herein is likewise fully automatic; however, it is generalizable to any Lipschitz continuous control system, and it provides assurances for a wide variety of specifications captured by simulation relations.

The largest single class of NN architecture design algorithms is commonly referred to as Neural Architecture Search (NAS). These algorithms essentially use an iterative improvement/optimization scheme to design a NN architecture, thus automating something like a “guess-train-evaluate” hyperparameter tuning loop. There is a large literature on NAS algorithms; [4] is a good summary. Typical NAS algorithms design architectures within some structured class of architectures, such as: “chain” architectures (i.e. a sequence of fully connected layers) [5]; chain architectures with different layer types (e.g. convolution, pooling, etc. in addition to fully connected) [6, 7]; or mini-NN architectures that are replicated and interconnected to form larger networks [8]. They then update these architectures according to a variety of different mechanisms: RL formulations [9, 10, 8]; sequential decision problem formulations where actions correspond to network morphisms [7]; Bayesian optimization formulations [11, 5]; and neuro-evolutionary approaches (relatedly, population dynamics or genetic algorithms) [12, 13]. Different mechanisms are used to evaluate the “quality” of the current architecture iterate: lower-fidelity models [8]; weight inheritance morphisms [4, 7] (a sort of transfer learning); learning curve extrapolation to estimate the final performance of an architecture before training has converged [6]; and one-shot models that agglomerate many architectures into a single large architecture that shares edges between individual architectures [14]. Notably, these algorithms all share the same features: they are highly automatic, even accounting for the need to choose meta-hyperparameters; they are fairly general, since they are data-driven; however, they provide no closed-loop assurances.

At the opposite end of our assessment spectrum are control-based methods for obtaining NN controllers with assurances; we regard these methods as implicit architecture design techniques, since exhibiting a NN controller serves as a direct validation of that controller’s architecture. Examples of these methods include: directly approximating a controller by a NN for non-affine systems [15]; adaptively learning NN controller weights to ensure Input-to-State stability for certain third-order affine systems [16, 17]; and NN hybrid adaptive control for stabilizing uncertain impulsive dynamical systems [18]. These methods are based on the assertion that a function of interest can be approximated by a sufficiently large (usually shallow) NN: the size of this NN is rarely explicit, which limits their effectiveness as architecture design methods. Even neglecting this shortcoming, these methods generally provide just one meaningful assurance (stability); they are not at all general, since they are based on approximating a specific, hand-designed controller; and they are thus highly manual methods.

A subset of NN verification methods from the control system literature is related to the “guess-train-evaluate” architecture design iterations described above. In particular, some closed-loop NN verifiers provide additional dynamical information about how a NN controller fails to meet a specification. Examples include: using complementarity analysis on NN-controlled linear systems, thus obtaining sufficient conditions for stability in terms of LMIs [19]; training and verifying a specific NN architecture as a barrier certificate for hybrid systems [20]; and using adversarial perturbation to verify NN control policies [21]. These methods can be assessed as follows: they are highly automatic (verifiers); they are of limited generalizability, since they require specific models and/or detailed knowledge of the dynamical model; and each provides one and only one assurance (e.g. stability or a barrier certificate). A similar, but less applicable, subset of the control literature consists of experimental work suggesting promising NN controller architectures. These works do not perform automatic NN architecture design, nor do they contain verification algorithms of the type described above. Even so, they experimentally support using some conventional NN architectures as controllers [22], and using Input Convex NNs (ICNNs) for controllers and system identification [23].

We also acknowledge prior work on system identification using NNs, although these works do not emphasize architecture design; they suffer from a lack of assurances on the resultant NNs/architectures [24, 25, 26, 27].

Finally, subsequent to the publication of [3], we became aware of other works that use more or less what we describe as TLL NNs [28, 29]. The former is concerned with simplification of explicit MPC controllers rather than NN architecture design; the latter can be regarded as architecture design for a different application (although the architectures are less efficient than the ones presented here – see Remark 4).

## 2 Preliminaries

### 2.1 Notation

We denote by $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}^+$ the set of natural numbers, the set of real numbers and the set of non-negative real numbers, respectively. For a function $f : A \to B$, let $\text{dom}(f)$ return the domain of $f$, and let $\text{range}(f)$ return the range of $f$. For $x \in \mathbb{R}^n$, we will denote by $\|x\|$ the max-norm of $x$. Relatedly, for $x \in \mathbb{R}^n$ and $\epsilon > 0$ we will denote by $B(x; \epsilon)$ the open ball of radius $\epsilon$ centered at $x$ as specified by $\|\cdot\|$, and by $\bar{B}(x; \epsilon)$ its closed-ball analog. Let $\text{int}(X)$ denote the interior of a set $X$, and $\partial X$ its boundary. $e_i$ will denote the $i^{\text{th}}$ column of the identity matrix, unless otherwise specified. Let $\text{conv}(V)$ denote the convex hull of a set of points $V$. For , , and will denote the same but restricted to the set . The $i^{\text{th}}$ coordinate projection map on $\mathbb{R}^n$ will be denoted by $\pi_i$, so that $\pi_i(x)$ returns the $i^{\text{th}}$ component of the vector $x$ (in the understood coordinate system). Finally, given two sets $A$ and $B$, denote by $B^A$ the set of all functions $A \to B$.

### 2.2 Dynamical Model

In this paper, we will assume an underlying, but not necessarily known, continuous-time nonlinear dynamical system specified by an ordinary differential equation (ODE):

$$\dot{x}(t) = f(x(t), u(t)) \tag{1}$$

where the state vector $x(t) \in \mathbb{R}^n$ and the control vector $u(t) \in \mathbb{R}^m$. Formally, we have the following definition:

###### Definition 1 (Control System).

A control system is a tuple $\Sigma = (X, U, \mathcal{U}, f)$ where

• $X \subset \mathbb{R}^n$ is the connected, compact subset of the state space with non-empty interior;

• $U \subset \mathbb{R}^m$ is the compact set of admissible controls;

• $\mathcal{U} \subseteq U^{\mathbb{R}^+}$ is the space of admissible open-loop control functions, i.e. each $\mu \in \mathcal{U}$ is a function $\mu : \mathbb{R}^+ \to U$; and

• $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a vector field specifying the time evolution of states according to (1).

A control system $\Sigma$ is said to be (globally) Lipschitz if there exist constants $K_x$ and $K_u$ s.t. for all $x, x' \in \mathbb{R}^n$ and $u, u' \in \mathbb{R}^m$:

$$\|f(x,u) - f(x',u')\| \le K_x \|x - x'\| + K_u \|u - u'\|. \tag{2}$$
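Candidate constants $K_x$ and $K_u$ for (2) can at least be sanity-checked numerically. The sketch below is an illustration of our own: it randomly searches for a pair of points that violates (2) in the max-norm, so it can falsify a candidate pair of constants but never prove (2). The pendulum-like vector field and all function names are hypothetical examples.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def max_norm(v: Vector) -> float:
    """Max-norm, the norm used throughout the paper."""
    return max(abs(c) for c in v)


def sub(a: Vector, b: Vector) -> List[float]:
    return [ai - bi for ai, bi in zip(a, b)]


def violates_lipschitz_bound(
    f: Callable[[Vector, Vector], Vector],
    k_x: float,
    k_u: float,
    sample_x: Callable[[], Vector],
    sample_u: Callable[[], Vector],
    trials: int = 10_000,
    tol: float = 1e-12,
) -> bool:
    """Randomly search for a pair of points violating inequality (2).

    Returning False does NOT prove (2); it only means no violation was found.
    """
    for _ in range(trials):
        x, xp, u, up = sample_x(), sample_x(), sample_u(), sample_u()
        lhs = max_norm(sub(f(x, u), f(xp, up)))
        rhs = k_x * max_norm(sub(x, xp)) + k_u * max_norm(sub(u, up))
        if lhs > rhs + tol:
            return True
    return False


def pendulum(x: Vector, u: Vector) -> List[float]:
    """Hypothetical example: a normalized, undamped pendulum with torque input."""
    return [x[1], -math.sin(x[0]) + u[0]]


rng = random.Random(0)  # fixed seed so the search is reproducible
sample_x = lambda: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
sample_u = lambda: [rng.uniform(-1.0, 1.0)]

print(violates_lipschitz_bound(pendulum, 1.0, 1.0, sample_x, sample_u))  # False
print(violates_lipschitz_bound(pendulum, 0.5, 1.0, sample_x, sample_u))  # True
```

For this vector field, (2) holds with $K_x = K_u = 1$ (since $|\sin a - \sin b| \le |a - b|$), so no violation is found; halving $K_x$ is falsified quickly.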

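In the sequel, we will primarily be concerned with solutions to (1) that result from instantaneous state-feedback controllers, $\Psi : \mathbb{R}^n \to \mathbb{R}^m$. Thus, we use $\zeta^{\Psi}_{x_0}$ to denote the closed-loop solution of (1) starting from initial condition $x_0$ (at time $t = 0$) and using state-feedback controller $\Psi$. We refer to such a $\zeta^{\Psi}_{x_0}$ as a (closed-loop) trajectory of its associated control system.

###### Definition 2 (Closed-loop Trajectory).

Let $\Sigma$ be a Lipschitz control system, and let $\Psi : \mathbb{R}^n \to \mathbb{R}^m$ be a globally Lipschitz continuous function. A closed-loop trajectory of $\Sigma$ under controller $\Psi$ and starting from $x_0 \in X$ is the function $\zeta^{\Psi}_{x_0} : \mathbb{R}^+ \to \mathbb{R}^n$ that solves the integral equation:

$$\zeta^{\Psi}_{x_0}(t) = x_0 + \int_0^t f\big(\zeta^{\Psi}_{x_0}(\sigma), \Psi(\zeta^{\Psi}_{x_0}(\sigma))\big)\, d\sigma. \tag{3}$$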

It is well known that such solutions exist and are unique under these assumptions [30].
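In practice a closed-loop trajectory is usually approximated numerically rather than obtained from the integral equation in closed form. The sketch below uses the classical fixed-step Runge-Kutta (RK4) scheme on the closed-loop vector field $z \mapsto f(z, \Psi(z))$; the step count and the scalar example system are our own illustrative choices, not part of the paper's method.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def closed_loop_rk4(
    f: Callable[[Vector, Vector], List[float]],
    psi: Callable[[Vector], List[float]],
    x0: Vector,
    t_final: float,
    steps: int = 1000,
) -> List[float]:
    """Approximate the closed-loop state at t_final with fixed-step classical RK4."""
    h = t_final / steps

    def g(z: Vector) -> List[float]:
        # Closed-loop vector field z -> f(z, psi(z)), the integrand of the trajectory equation.
        return f(z, psi(z))

    x = list(x0)
    for _ in range(steps):
        k1 = g(x)
        k2 = g([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = g([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = g([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + (h / 6.0) * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical example: x' = u with u = psi(x) = -x, so the exact trajectory is x0 * exp(-t).
x_end = closed_loop_rk4(lambda x, u: [u[0]], lambda x: [-x[0]], [2.0], 1.0)
print(x_end[0], 2.0 * math.exp(-1.0))  # the two values agree closely
```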

###### Definition 3 (Feedback Controllable).

A Lipschitz control system $\Sigma$ is feedback controllable by a Lipschitz controller $\Psi$ if the following is satisfied:

$$\Psi \circ \zeta^{\Psi}_{x} \in \mathcal{U} \quad \forall x \in X. \tag{4}$$

If $\Sigma$ is feedback controllable for any such $\Psi$, then we simply say that it is feedback controllable.

Because we are interested in a compact set of states, $X$, we consider only feedback controllers whose closed-loop trajectories stay within $X$.

###### Definition 4 (Positive Invariance).

A feedback trajectory of a Lipschitz control system, $\zeta^{\Psi}_{x_0}$, is positively invariant if $\zeta^{\Psi}_{x_0}(t) \in X$ for all $t \ge 0$. A controller $\Psi$ is positively invariant if $\zeta^{\Psi}_{x_0}$ is positively invariant for all $x_0 \in X$.

For technical reasons, we will also need the following stronger notion of positive invariance.

###### Definition 5 (δ,τ Positive Invariance).

Let $\delta > 0$ and $\tau > 0$. Then a positively invariant controller $\Psi$ is $\delta,\tau$ positively invariant if

$$\forall x_0 \in \text{edge}_\delta(X) .\; \zeta^{\Psi}_{x_0}(\tau) \in X \setminus \text{edge}_\delta(X) \tag{5}$$

and $\Psi$ is positively invariant with respect to $X \setminus \text{edge}_\delta(X)$.

For a $\delta,\tau$ positively invariant controller, trajectories that start $\delta$-close to the boundary of $X$ will end up $\delta$-far away from that boundary after $\tau$ seconds, and remain there forever after.
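Condition (5) can be spot-checked numerically on a toy example. This part of the paper does not spell out $\text{edge}_\delta(X)$, so the sketch below assumes, hypothetically, that for the box $X = [-1, 1]^n$ it is the set of points whose max-norm distance to the boundary is less than $\delta$. Sampling can only falsify (5), not prove it, and the second requirement (positive invariance of the shrunken set) is not checked here.

```python
import math
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


def in_edge(x: Vector, delta: float, radius: float = 1.0) -> bool:
    """Hypothetical edge set for X = [-radius, radius]^n: points whose max-norm
    distance to the boundary of X is less than delta."""
    return max(abs(c) for c in x) > radius - delta


def in_box(x: Vector, radius: float = 1.0) -> bool:
    return max(abs(c) for c in x) <= radius


def spot_check_condition_5(
    flow: Callable[[Vector, float], Vector],
    samples: Iterable[Vector],
    delta: float,
    tau: float,
    radius: float = 1.0,
) -> bool:
    """Return False if some sampled x0 in the edge set fails to land in
    X minus the edge set at time tau; True if no such sample was found."""
    for x0 in samples:
        if in_edge(x0, delta, radius):
            x_tau = flow(x0, tau)
            if not in_box(x_tau, radius) or in_edge(x_tau, delta, radius):
                return False
    return True


# Hypothetical example: x' = u, psi(x) = -x on X = [-1, 1], exact flow x0 * exp(-t).
flow = lambda x0, t: [c * math.exp(-t) for c in x0]
grid = [[-1.0 + 0.01 * i] for i in range(201)]
print(spot_check_condition_5(flow, grid, delta=0.1, tau=0.2))   # True:  exp(-0.2) < 0.9
print(spot_check_condition_5(flow, grid, delta=0.1, tau=0.05))  # False: exp(-0.05) > 0.9
```

The example shows the role of $\tau$: the same controller satisfies (5) for a long enough sample period but not for a short one.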

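Finally, borrowing from [31], we define a $\tau$-sampled transition system embedding of a feedback-controlled system.

###### Definition 6 (τ-sampled Transition System Embedding).

Let $\Sigma$ be a feedback controllable Lipschitz control system, and let $\Psi$ be a Lipschitz continuous feedback controller. For any $\tau > 0$, the $\tau$-sampled transition system embedding of $\Sigma$ under $\Psi$ is the tuple $S_\tau(\Sigma_\Psi) = (X_\tau, U_\tau, \xrightarrow[\tau]{})$ where:

• $X_\tau = X$ is the state space;

• $U_\tau = \{ (\Psi \circ \zeta^{\Psi}_{x})|_{[0,\tau]} : x \in X \}$ is the set of open loop control inputs generated by $\Psi$-feedback, each restricted to the domain $[0, \tau]$; and

• $\xrightarrow[\tau]{} \subseteq X_\tau \times U_\tau \times X_\tau$ such that $x \xrightarrow[\tau]{u} x'$ iff both $u = (\Psi \circ \zeta^{\Psi}_{x})|_{[0,\tau]}$ and $x' = \zeta^{\Psi}_{x}(\tau)$.

$S_\tau(\Sigma_\Psi)$ is thus a metric transition system [31].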

###### Definition 7 (Simulation Relation).

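Let $S_1 = (X_1, U_1, \xrightarrow[1]{})$ and $S_2 = (X_2, U_2, \xrightarrow[2]{})$ be two metric transition systems. Then we say that $S_2$ simulates $S_1$, written $S_1 \preceq S_2$, if there exists a relation $\precsim \subseteq X_1 \times X_2$ such that

• for every $x_1 \in X_1$ there exists $x_2 \in X_2$ with $(x_1, x_2) \in \precsim$; and

• for all $(x_1, x_2) \in \precsim$ we have

$$x_1 \xrightarrow[1]{u_1} x'_1 \implies \exists x'_2 \in X_2 . \big( (x'_1, x'_2) \in \precsim \,\wedge\, x_2 \xrightarrow[2]{u_2} x'_2 \big). \tag{6}$$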
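For finite transition systems, simulation in the sense of Definition 7 can be decided by a standard fixed-point computation. The sketch below, an illustration of our own, starts from $X_1 \times X_2$, repeatedly removes pairs that violate the transfer condition (6), and then checks that every state of $S_1$ is still related to some state of $S_2$. As in (6) as stated, input labels are not required to match. Transitions are encoded as hypothetical `(x, u, x')` triples.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (x, u, x')


def successors(transitions: Set[Transition], x: Hashable) -> Set[Hashable]:
    return {xp for (s, _u, xp) in transitions if s == x}


def greatest_transfer_relation(X1, T1, X2, T2) -> Set[Tuple[Hashable, Hashable]]:
    """Largest relation satisfying the transfer condition (6), found by
    iteratively pruning pairs from X1 x X2."""
    R = set(product(X1, X2))
    changed = True
    while changed:
        changed = False
        for (x1, x2) in list(R):
            ok = all(
                any((x1p, x2p) in R for x2p in successors(T2, x2))
                for x1p in successors(T1, x1)
            )
            if not ok:
                R.discard((x1, x2))
                changed = True
    return R


def is_simulated_by(X1, T1, X2, T2) -> bool:
    """True iff S2 simulates S1 in the sense of Definition 7."""
    R = greatest_transfer_relation(X1, T1, X2, T2)
    # First bullet of Definition 7: every state of S1 is related to some state of S2.
    return all(any((x1, x2) in R for x2 in X2) for x1 in X1)


# A two-state cycle is simulated by a one-state "ok" loop ...
print(is_simulated_by({"a", "b"}, {("a", "u", "b"), ("b", "u", "a")}, {"ok"}, {("ok", "v", "ok")}))  # True
# ... but a self-loop cannot be matched by a system that deadlocks after one step.
print(is_simulated_by({"a"}, {("a", "u", "a")}, {"p", "q"}, {("p", "v", "q")}))  # False
```

Starting from the full product and pruning yields the largest relation satisfying (6), since the union of relations satisfying (6) also satisfies it; so if any simulation relation exists, this one covers $X_1$.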

### 2.3 ReLU Neural Network Architectures

We will consider controlling the nonlinear system defined in (1) with a state-feedback neural network controller $\mathit{NN}$:

$$\mathit{NN} : X \to U \tag{7}$$

where $\mathit{NN}$ denotes a Rectified Linear Unit Neural Network (ReLU NN). Such a ($K$-layer) ReLU NN is specified by composing $K$ layer functions (or just layers). A layer with $i$ inputs and $o$ outputs is specified by an $(o \times i)$ matrix of weights, $W$, and an $(o \times 1)$ matrix of biases, $b$, as follows:

$$L_\theta : \mathbb{R}^{i} \to \mathbb{R}^{o}, \qquad z \mapsto \max\{Wz + b, 0\} \tag{8}$$

where the $\max$ function is taken element-wise, and $\theta \triangleq (W, b)$ for brevity. Thus, a $K$-layer ReLU NN function is specified by $K$ layer functions $L_{\theta^{(1)}}, \dots, L_{\theta^{(K)}}$ whose input and output dimensions are composable: that is, they satisfy $i_k = o_{k-1}$ for $k = 2, \dots, K$. Specifically:

$$\mathit{NN}(x) = \big(L_{\theta^{(K)}} \circ L_{\theta^{(K-1)}} \circ \cdots \circ L_{\theta^{(1)}}\big)(x). \tag{9}$$

When we wish to make the dependence on parameters explicit, we will index a ReLU function $\mathit{NN}_\Theta$ by a list of matrices $\Theta \triangleq (\theta^{(1)}, \dots, \theta^{(K)})$. (That is, $\Theta$ is not the concatenation of the $\theta^{(k)}$ into a single large matrix, so it preserves information about the sizes of the constituent $\theta^{(k)}$.)

Fixing the number of layers and the dimensions of the associated matrices specifies the architecture of a fully-connected ReLU NN. Therefore, we will use:

$$\text{Arch}(\Theta) \triangleq ((n, o_1), (i_2, o_2), \dots, (i_K, m)) \tag{10}$$

to denote the architecture of the ReLU NN $\mathit{NN}_\Theta$.
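The layer, composition and architecture definitions translate directly into a few lines of NumPy. In this sketch (our own, with hypothetical function names) biases are stored as 1-D arrays rather than $(o \times 1)$ matrices, and the ReLU is applied in every layer, including the last, exactly as (8) and (9) are written; outputs are therefore non-negative.

```python
import numpy as np


def relu_layer(W: np.ndarray, b: np.ndarray, z: np.ndarray) -> np.ndarray:
    """One layer L_theta(z) = max(Wz + b, 0), taken element-wise."""
    return np.maximum(W @ z + b, 0.0)


def relu_nn(theta, x) -> np.ndarray:
    """Compose the layers: apply L_theta(1) first and L_theta(K) last."""
    z = np.asarray(x, dtype=float)
    for W, b in theta:
        z = relu_layer(W, b, z)
    return z


def arch(theta):
    """Architecture: the (inputs, outputs) pair of every layer."""
    dims = [(int(W.shape[1]), int(W.shape[0])) for W, _ in theta]
    for (_, o_prev), (i_next, _) in zip(dims, dims[1:]):
        if i_next != o_prev:
            raise ValueError("consecutive layers are not composable")
    return tuple(dims)


# Hypothetical 2-layer example realizing |x| = max(x, 0) + max(-x, 0).
theta_abs = [
    (np.array([[1.0], [-1.0]]), np.zeros(2)),
    (np.array([[1.0, 1.0]]), np.zeros(1)),
]
print(relu_nn(theta_abs, [-3.0]))  # [3.]
print(arch(theta_abs))             # ((1, 2), (2, 1))
```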

Since we are interested in designing ReLU architectures, we will also need the following result from [2, Theorem 7], which states that a Continuous, Piecewise Affine (CPWA) function, $f$, can be implemented exactly using a Two-Level-Lattice (TLL) NN architecture that is parameterized by the local linear functions in $f$.

###### Definition 8 (Local Linear Function).

Let $f : \mathbb{R}^n \to \mathbb{R}$ be CPWA. Then a linear function $\ell : \mathbb{R}^n \to \mathbb{R}$ is a local linear function of $f$ if there exists a non-empty open set $\mathfrak{O} \subseteq \mathbb{R}^n$ such that $\ell(x) = f(x)$ for all $x \in \mathfrak{O}$.

###### Definition 9 (Linear Region).

Let $f : \mathbb{R}^n \to \mathbb{R}$ be CPWA. Then a linear region of $f$ is the largest set $R \subseteq \mathbb{R}^n$ such that $f$ has only one local linear function on $R$.

###### Theorem 1 (Two-Level-Lattice (TLL) NN Architecture [2, Theorem 7]).

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a CPWA function, and let $\bar{N}$ be an upper bound on the number of local linear functions in $f$. Then there is a Two-Level-Lattice (TLL) NN architecture $\text{Arch}(\Theta^{\bar{N}}_{\text{TLL}})$ parameterized by $\bar{N}$, and values of $\Theta^{\bar{N}}_{\text{TLL}}$, such that:

$$f = \mathit{NN}_{\Theta^{\bar{N}}_{\text{TLL}}}. \tag{11}$$

In particular, the number of linear regions of $f$ is such an upper bound on the number of local linear functions.
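To make the lattice structure behind Theorem 1 more concrete, the sketch below evaluates a two-level (max-min) lattice expression of a CPWA function, $f(x) = \max_j \min_{i \in s_j} \ell_i(x)$, using only the identities $\max(a,b) = a + \mathrm{ReLU}(b-a)$ and $\min(a,b) = a - \mathrm{ReLU}(a-b)$. We assume this max-min form is what the TLL architecture of [2] realizes; the code evaluates the formula directly and does not reproduce the network weights or layer layout of [2]. The selector sets $s_j$ and all function names are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def relu(t: float) -> float:
    return max(t, 0.0)


def max2(a: float, b: float) -> float:
    # max(a, b) = a + ReLU(b - a): a pairwise maximum costs one ReLU.
    return a + relu(b - a)


def min2(a: float, b: float) -> float:
    # min(a, b) = a - ReLU(a - b)
    return a - relu(a - b)


def two_level_lattice(
    A: Sequence[Sequence[float]],
    c: Sequence[float],
    selectors: List[List[int]],
    x: Sequence[float],
) -> float:
    """Evaluate max_j min_{i in selectors[j]} l_i(x), where l_i(x) = A[i] . x + c[i]."""
    values = [sum(a * xi for a, xi in zip(row, x)) + ci for row, ci in zip(A, c)]
    inner = [reduce(min2, (values[i] for i in s)) for s in selectors]
    return reduce(max2, inner)


# Hypothetical example: saturation clip(x, 0, 1) = max(min(x, 1), 0)
# with local linear functions l_0 = 0, l_1 = x, l_2 = 1.
A, c = [[0.0], [1.0], [0.0]], [0.0, 0.0, 1.0]
selectors = [[1, 2], [0]]
print([two_level_lattice(A, c, selectors, [x]) for x in (-0.5, 0.3, 2.0)])  # [0.0, 0.3, 1.0]
```

The three local linear functions in the example correspond to the three linear regions of the saturation, consistent with the last sentence of Theorem 1.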

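In this paper, we will find it convenient to use Theorem 1 to create TLL architectures component-wise. To this end, we define the following notion of NN composition.

###### Definition 10.

Let $\mathit{NN}_{\Theta_1}$ and $\mathit{NN}_{\Theta_2}$ be two $K$-layer NNs with parameter lists:

$$\Theta_i = \big((W|^{1}_{i}, b|^{1}_{i}), \dots, (W|^{K}_{i}, b|^{K}_{i})\big), \quad i = 1, 2. \tag{12}$$

Then the parallel composition of $\mathit{NN}_{\Theta_1}$ and $\mathit{NN}_{\Theta_2}$ is a NN given by the parameter list

$$\Theta_1 \| \Theta_2 \triangleq \left( \left( \begin{bmatrix} W|^{1}_{1} \\ W|^{1}_{2} \end{bmatrix}, \begin{bmatrix} b|^{1}_{1} \\ b|^{1}_{2} \end{bmatrix} \right), \dots, \left( \begin{bmatrix} W|^{K}_{1} & 0 \\ 0 & W|^{K}_{2} \end{bmatrix}, \begin{bmatrix} b|^{K}_{1} \\ b|^{K}_{2} \end{bmatrix} \right) \right). \tag{13}$$

That is, $\Theta_1 \| \Theta_2$ accepts an input of the same size as (both) $\Theta_1$ and $\Theta_2$, but has as many outputs as $\Theta_1$ and $\Theta_2$ combined: the first-layer weights are stacked because both networks read the same input, and the weights of every subsequent layer are block diagonal so that the two networks do not interact.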
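One concrete way to realize the parallel composition of Definition 10 is sketched below in NumPy (our own illustration; function names are hypothetical). Both networks read the same input, so their first-layer weights are stacked vertically; every later layer acts on the concatenated hidden state, so its weights are assembled block-diagonally, which keeps the two computations separate. The composed network's output is the concatenation of the two original outputs.

```python
import numpy as np


def relu_nn(theta, x):
    """Evaluate a ReLU NN given as a list of (W, b) layers."""
    z = np.asarray(x, dtype=float)
    for W, b in theta:
        z = np.maximum(W @ z + b, 0.0)
    return z


def parallel_compose(theta1, theta2):
    """Parallel composition of two K-layer ReLU NNs that read the same input."""
    if len(theta1) != len(theta2):
        raise ValueError("both networks must have the same number of layers")
    composed = []
    for k, ((W1, b1), (W2, b2)) in enumerate(zip(theta1, theta2)):
        if k == 0:
            # Both networks read the same input: stack first-layer weights.
            W = np.vstack([W1, W2])
        else:
            # Later layers act on the concatenated hidden state: block-diagonal
            # weights keep the two streams from mixing.
            W = np.block([
                [W1, np.zeros((W1.shape[0], W2.shape[1]))],
                [np.zeros((W2.shape[0], W1.shape[1])), W2],
            ])
        composed.append((W, np.concatenate([b1, b2])))
    return composed


# Hypothetical example: |x| in parallel with max(x, 0), both with 2 layers.
theta_abs = [(np.array([[1.0], [-1.0]]), np.zeros(2)), (np.array([[1.0, 1.0]]), np.zeros(1))]
theta_pos = [(np.array([[1.0]]), np.zeros(1)), (np.array([[1.0]]), np.zeros(1))]
both = parallel_compose(theta_abs, theta_pos)
print(relu_nn(both, [-3.0]))  # [3. 0.]
print(relu_nn(both, [2.0]))   # [2. 2.]
```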

###### Corollary 1.

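Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a CPWA function, each of whose component CPWA functions are denoted by $f_i = \pi_i \circ f$, $i = 1, \dots, m$, and let $\bar{N}$ be an upper bound on the number of linear regions in each $f_i$.

Then for a TLL architecture $\Theta^{\bar{N}}_{\text{TLL}}$ representing a CPWA function $\mathbb{R}^n \to \mathbb{R}$, the $m$-fold parallel TLL architecture:

$$\Theta^{\bar{N}, \|m}_{\text{TLL}} \triangleq \underbrace{\Theta^{\bar{N}}_{\text{TLL}} \| \cdots \| \Theta^{\bar{N}}_{\text{TLL}}}_{m \text{ times}} \tag{14}$$

has the property that there exist values for its parameters such that $f = \mathit{NN}_{\Theta^{\bar{N}, \|m}_{\text{TLL}}}$.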

###### Proof.

Apply Theorem 1 component-wise. ∎

Finally, note that a ReLU NN function, $\mathit{NN}_\Theta$, is known to be a continuous, piecewise affine (CPWA) function consisting of finitely many linear segments. Thus, a function $\mathit{NN}_\Theta$ is itself necessarily globally Lipschitz continuous.
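A simple (and often loose) upper bound on such a Lipschitz constant follows layer by layer. Because ReLU is 1-Lipschitz in each coordinate, each layer $z \mapsto \max\{Wz + b, 0\}$ is Lipschitz in the max-norm with constant $\|W\|_\infty$, the largest absolute row sum of $W$, so the product over layers bounds the whole network. The sketch below is our own illustration and is not used elsewhere in the paper.

```python
import numpy as np


def lipschitz_upper_bound(theta) -> float:
    """Max-norm Lipschitz upper bound for a ReLU NN given as (W, b) layers.

    Each layer z -> max(Wz + b, 0) is Lipschitz with constant ||W||_inf
    (the largest absolute row sum), so the product over layers is valid,
    though often loose.
    """
    bound = 1.0
    for W, _b in theta:
        bound *= np.linalg.norm(W, ord=np.inf)
    return float(bound)


theta_abs = [(np.array([[1.0], [-1.0]]), np.zeros(2)), (np.array([[1.0, 1.0]]), np.zeros(1))]
print(lipschitz_upper_bound(theta_abs))  # 2.0, although |x| itself is 1-Lipschitz
```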

### 2.4 Notation Pertaining to Hypercubes

Since the unit ball of the max-norm, $\|\cdot\|$, on $\mathbb{R}^n$ is a hypercube, we will make use of the following notation.

###### Definition 11 (Face/Corner of a hypercube).

Let $C = [0,1]^n$ be a unit hypercube of dimension $n$. A set $F \subseteq C$ is a $k$-dimensional face of $C$ if there exists a set $J \subseteq \{1, \dots, n\}$ such that $|J| = n - k$ and

$$\forall x \in F .\; \bigwedge_{j \in J} \big(\pi_j(x) \in \{0, 1\}\big). \tag{15}$$

Let $\mathcal{F}_k(C)$ denote the set of $k$-dimensional faces of $C$, and let $\mathcal{F}(C)$ denote the set of all faces of $C$ (of any dimension). A corner of $C$ is a $0$-dimensional face of $C$. Furthermore, we will use the notation $F_{j,b}$ to denote a full-dimensional face ($(n-1)$-dimensional in this case) whose index set is $J = \{j\}$ and whose projection on the $j^{\text{th}}$ coordinate is $b \in \{0, 1\}$.

We extend these definitions by isomorphism to any other hypercube in $\mathbb{R}^n$.
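Definition 11 reads most naturally with each fixed coordinate taking a single value on $F$. Under that reading, a $k$-dimensional face of the unit hypercube is determined by choosing the $n-k$ fixed coordinates and a value in $\{0,1\}$ for each, so there are $\binom{n}{k} 2^{n-k}$ of them. The sketch below (our own, with hypothetical names) enumerates faces this way; the $(n-1)$-dimensional faces are the ones that fix a single coordinate.

```python
from itertools import combinations, product
from math import comb
from typing import Dict, List


def faces(n: int, k: int) -> List[Dict[int, int]]:
    """All k-dimensional faces of [0, 1]^n.

    Each face is encoded by the coordinates it fixes: a dict {j: value} with
    n - k keys and values in {0, 1}; the remaining coordinates range over [0, 1].
    """
    result = []
    for J in combinations(range(n), n - k):
        for values in product((0, 1), repeat=n - k):
            result.append(dict(zip(J, values)))
    return result


def num_faces(n: int, k: int) -> int:
    """Closed-form count: choose the n - k fixed coordinates, then 0 or 1 for each."""
    return comb(n, n - k) * 2 ** (n - k)


print([len(faces(3, k)) for k in range(4)])  # [8, 12, 6, 1]: corners, edges, facets, cube
print(faces(2, 1))  # [{0: 0}, {0: 1}, {1: 0}, {1: 1}]: the four edges of the unit square
```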

## 3 Problem Formulation: NN Architectures for Control

We begin by stating the first main problem that we will consider in this paper: that of designing an assured NN architecture for nonlinear control (hereafter referred to as the controller architecture problem). Specifically, we wish to identify a ReLU architecture to be used as a feedback controller for the control system $\Sigma$; this architecture must further come with the assurance that there exist parameter weights for which the realized NN controller controls $\Sigma$ to some prescribed specification.

However, for pedagogical reasons, we will state two versions of the controller architecture problem in this section. The first will be somewhat generic in order to motivate a crucial innovation of this paper: a new simulation relation, Abstract Disturbance Simulation (ADS) (see also [3]). The second formulation of this problem then incorporates ADS into a formal problem statement, where it serves to facilitate the design of assured controller architectures. Our solution to this second, more specific version is the main contribution of this paper, and appears as Theorem 2 in Section 4.

### 3.1 Generic Controller Architecture Problem

As noted in Section 1, designing an assured NN architecture for control hinges on the well-posedness of the desired (binary) closed-loop specification, and this is as much a statement about the specification as it is about the architecture. Thus, a formal problem of NN architecture design (for control) necessarily begins with a framework for describing closed-loop system specifications.

To this end, we will formulate our controller architecture problem in terms of a $\tau$-sampled metric transition system embedding of the underlying continuous-time models (see Section 2.2). Although this choice may seem an unnatural deviation from the underlying continuous-time models, it affords two important benefits. First, metric transition systems come with a natural and flexible notion of specification satisfaction in the form of (bi)simulation relations. In this paradigm, specifications are described by means of another transition system that encodes the specification; the original system then satisfies the specification if it is simulated by the (transition system) encoding of the specification. Importantly, it is well known that a diverse array of specifications can be captured in this context, among them LTL formula satisfaction [32] and stability. Second, the sample period $\tau$ constitutes an additional degree of freedom in the specification relative to the original continuous-time system (or a prescribed fixed sample-period embedding); this extra degree of freedom will facilitate the development of assured NN architectures.

From this, we consider the following generic formulation of a controller architecture design problem.

###### Problem 1 (Controller Architecture Design – Generic Formulation).

Let $\tau > 0$ and $L_\Psi > 0$ be given. Let $\Sigma$ be a feedback controllable Lipschitz control system, and let $S_{\text{spec}}$ be a transition system encoding for a specification on $\Sigma$.

Now, suppose that there exists a Lipschitz-continuous controller $\Psi$ with Lipschitz constant $L_\Psi$ s.t.:

$$S_\tau(\Sigma_\Psi) \preceq S_{\text{spec}}. \tag{16}$$

Then the problem is to find a ReLU architecture, $\text{Arch}(\Theta)$, with the property that there exist values for $\Theta$ such that:

$$S_\tau(\Sigma_{\mathit{NN}_\Theta}) \preceq S_{\text{spec}}. \tag{17}$$

In both (16) and (17), $\preceq$ is as defined in Definition 7.

The main assumption in Problem 1 is that there exists a controller $\Psi$ which satisfies the specification, $S_{\text{spec}}$. We use this assumption primarily to help ensure that the problem is well posed: indeed, it is known that there are nonlinear control problems for which no continuous controllers exist. Thus, this assumption is in some sense an essential requirement to formulate a well-posed controller architecture problem: if there exists no such $\Psi$, then no NN controller could satisfy the specification either, since NN controllers also belong to the class of Lipschitz continuous functions (modulo discrepancies in Lipschitz constants). In this way, the existence of a controller $\Psi$ also subsumes any conditions on the nonlinear system that one might wish to impose, for example stabilizability.

That Problem 1 is more or less ill-posed without assuming the existence of a controller $\Psi$ (for some constant $L_\Psi$) suggests a natural solution to the problem. In particular, a NN architecture can be bootstrapped from this knowledge by simply designing an architecture that is sufficiently parameterized to adequately approximate any such $\Psi$, i.e. any function with Lipschitz constant at most $L_\Psi$ (this is a preview of the approach we will subsequently use). Unfortunately, however, this approach also reveals a deficiency in the assumption associated with $\Psi$: controller approximation necessarily introduces instantaneous control errors relative to $\Psi$, and these errors can compound from transition to transition through the dynamics. As a consequence, the assumed information about $\Psi$ is not as immediately helpful as it appears. In particular, if $\Psi$ is not very robust, then the accumulation of such errors could make it impossible to prove that an (approximate) NN controller satisfies the same specification as $\Psi$, viz. (17).

This effect can be seen directly in terms of the simulation relations in (16) and (17). Take $x \in X$ and consider two transitions from $x$: one in $S_\tau(\Sigma_\Psi)$ given by $x \xrightarrow[\tau]{u} x'$ and one in $S_\tau(\Sigma_{\mathit{NN}_\Theta})$ given by $x \xrightarrow[\tau]{u_{\mathit{NN}}} x'_{\mathit{NN}}$. Note that if $\mathit{NN}_\Theta$ merely approximates $\Psi$, there will in general be a discrepancy between $x'$ and $x'_{\mathit{NN}}$, i.e. $x' \neq x'_{\mathit{NN}}$. Thus, although $x'$ necessarily has a simulating state in $S_{\text{spec}}$ by assumption, $x'_{\mathit{NN}}$ need not have its own simulating state in $S_{\text{spec}}$. This follows because the simulation relation in Definition 7 can assert simulating states only through transitions (i.e. (6)), and there may be no transition from $x$ to $x'_{\mathit{NN}}$ in $S_\tau(\Sigma_\Psi)$.
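The size of this discrepancy can be estimated with a standard argument of our own (not taken from this paper). Assume the NN controller approximates $\Psi$ uniformly, $\|\mathit{NN}(x) - \Psi(x)\| \le \epsilon$ for all $x$; let $L_\Psi$ be the Lipschitz constant of $\Psi$ and $K_x, K_u$ the constants in (2); and let $\Delta(t) = \zeta^{\mathit{NN}}_{x}(t) - \zeta^{\Psi}_{x}(t)$ for a common initial state $x$. Subtracting the two instances of the integral equation (3), applying (2), and adding and subtracting $\Psi(\zeta^{\mathit{NN}}_{x}(\sigma))$ gives

$$
\begin{aligned}
\|\Delta(t)\| &\le \int_0^t \Big( K_x \|\Delta(\sigma)\| + K_u \big\|\mathit{NN}(\zeta^{\mathit{NN}}_{x}(\sigma)) - \Psi(\zeta^{\Psi}_{x}(\sigma))\big\| \Big)\, d\sigma \\
&\le \int_0^t \Big( (K_x + K_u L_\Psi) \|\Delta(\sigma)\| + K_u \epsilon \Big)\, d\sigma,
\end{aligned}
$$

and Grönwall's inequality then yields

$$\|\Delta(\tau)\| \le \frac{K_u\, \epsilon}{K_x + K_u L_\Psi} \left( e^{(K_x + K_u L_\Psi)\tau} - 1 \right).$$

The gap between the two sampled successors of $x$ thus vanishes only as $\epsilon \to 0$ and grows exponentially in $\tau$; for any fixed approximation error, the NN successor can land away from every successor available in $S_\tau(\Sigma_\Psi)$.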

### 3.2 Abstract Disturbance Simulation

Motivated by the observations above, we propose a new simulation relation as a formal notion of specification satisfaction for metric transition systems; we call this relation abstract disturbance simulation or ADS (see also [3]). Simulation by means of an ADS relation is stronger than ordinary simulation (Definition 7) in order to incorporate a notion of robustness. Thus, abstract disturbance simulation is inspired by, and related to, both robust bisimulation [33] and especially disturbance bisimulation [34]. Crucially, however, it abstracts those notions away from their definitions in terms of specific control system embeddings and explicit modeling of disturbance inputs. As a result, ADS can be used in a generic context such as the one suggested by Problem 1.

Fundamentally, ADS still functions in terms of conventional simulation relations; however, it incorporates robustness by first augmenting the simulated system with “virtual” transitions, each of which has a target that is perturbed from the target of a corresponding “real” transition. In this way, it is conceptually similar to the technique used in [31] and [35] to define a quantized abstraction, where deliberate non-determinism is introduced in order to account for input errors. As a result of these additional transitions, when a metric transition system is ADS simulated by a specification, the system satisfies the specification robustly, which is a stronger conclusion than satisfaction via an ordinary simulation relation.

As a prerequisite for defining ADS, we introduce the following definition: it captures the idea of augmenting a metric transition system with virtual, perturbed transitions.

###### Definition 12 (Perturbed Metric Transition System).

Let $S$ be a metric transition system whose state space $X$ is a subset of a metric space with metric $d$. Then the $\delta$-perturbed metric transition system of $S$, denoted $S_\delta$, is a tuple whose (altered) transition relation, $\xrightarrow[S_\delta]{}$, is defined as:

$$x \xrightarrow[S_\delta]{u} x' \quad \text{iff} \quad \exists x'' \in X \text{ s.t. } d(x'', x') \le \delta \text{ and } x \xrightarrow[S]{u} x''. \tag{18}$$

Note that $S_\delta$ has identical states and input labels to $S$, and it also subsumes all of the transitions of $S$, i.e. $x \xrightarrow[S]{u} x'$ implies $x \xrightarrow[S_\delta]{u} x'$ (take $x'' = x'$ in (18)). However, as noted above, the transition relation of $S_\delta$ explicitly contains new nondeterminism relative to the transition relation of $S$. Each additional nondeterministic transition is obtained by perturbing the target state of a transition in $S$.
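For a finite system, the construction in (18) can be made concrete. The Python sketch below is only an illustration under assumptions of our own. The encoding is hypothetical: states are coordinate tuples, transitions are `(source, input, target)` triples, and the metric defaults to the Euclidean distance via `math.dist`. The function name `perturb` is also ours, not the paper's.

```python
import math


def perturb(states, transitions, delta, dist=math.dist):
    """delta-perturbed transition relation of (18) for a finite system.

    (x, u, x1) is kept iff some original transition (x, u, x2) has a
    target x2 with dist(x2, x1) <= delta, where x1 ranges over `states`.
    """
    perturbed = set()
    for (x, u, x2) in transitions:
        for x1 in states:
            # Each state within delta of a "real" target becomes a "virtual" target.
            if dist(x2, x1) <= delta:
                perturbed.add((x, u, x1))
    return perturbed


states = [(0.0,), (0.5,), (1.0,), (1.2,)]
transitions = {((0.0,), "a", (1.0,))}
print(sorted(perturb(states, transitions, delta=0.25)))
# [((0.0,), 'a', (1.0,)), ((0.0,), 'a', (1.2,))]
```

Whenever every original target is itself a listed state, the original transitions survive (choose `x1 = x2`). This mirrors the observation that $S_\delta$ subsumes $S$. With `delta=0` the relation is unchanged.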

Using this definition, we can finally define an abstract disturbance simulation between two metric transition systems.

###### Definition 13 (Abstract Disturbance Simulation).

Let and be metric transition systems whose state spaces and are subsets of the same metric space . Then abstract-disturbance simulates under disturbance , written if there is a relation such that

1. for every , ;

2. for every there exists a pair ; and

3. for every and there exists a such that .

###### Remark 1.

corresponds with the usual notion of simulation for metric transition systems. Thus,

### 3.3 Main Controller Architecture Problem

Using ADS for specification satisfaction, we can now state the version of Problem 1 that we will consider and solve as the main result of this paper.

###### Problem 2 (Controller Architecture Design).

Let , and be given. Let be a feedback controllable Lipschitz control system, and let be a transition system encoding a specification on .

Now, suppose that there exists a , positively invariant Lipschitz-continuous controller with Lipschitz constant such that:

Then the problem is to find a ReLU architecture, , with the property that there exist values for such that:

Problem 2 is distinct from Problem 1 in two crucial ways. The first of these is the foreshadowed use of ADS for specification satisfaction with respect to . In particular, we now assume the existence of a controller, , that satisfies the specification up to some fixed robustness margin , as captured by . This will be the main technical facilitator of our solution, since it enables the design of a NN architecture around the (still unknown) controller .

However, Problem 2 also has a second additional assumption relative to Problem 1: the controller must also be - positively invariant with respect to the compact subset of states under consideration, ; see Definition 5. This assumption is technically, but not conceptually, relevant: it is an artifact of confining ourselves to a compact subset of the state space. It merely ensures that the perturbations created internally to will lie entirely within . - invariance captures this by asserting that those states within of the boundary of (i.e. ) are “pushed” sufficiently strongly towards the interior of so that after seconds, they are no longer close to the boundary of (i.e. in ). Thus, every transition in starting from has a target in , and the -perturbed version of has no transitions with targets outside of .

## 4 ReLU Architectures for Controlling Nonlinear Systems

We are almost able to state the main theorem of this paper, Theorem 2, which directly solves Problem 2. As a necessary prelude, though, we introduce the following two definitions. The first formalizes the distance between coordinate-wise upper and lower bounds of a compact set (Definition 14). The second formalizes the notion of a (rectangular) grid of points that is sufficiently fine to cover a compact set by -norm balls of a fixed size (Definition 15).

###### Definition 14 (Extent of X).

The extent of a compact set is defined as:

$$\operatorname{ext}(X) \triangleq \max_{k=1,\dots,n} \left| \max_{x \in X} \pi_k(x) - \min_{x \in X} \pi_k(x) \right|. \tag{22}$$

Equivalently, the extent of a compact set $X$ is the smallest edge length of an axis-aligned hypercube that contains $X$.
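The following Python sketch evaluates (22) on a finite point set. The helper name `extent` is hypothetical. The result is exact for a polytope described by its vertices, because each coordinate attains its maximum and minimum at a vertex. For a general sample, it only gives a lower bound on $\operatorname{ext}(X)$. The example also shows why "axis-aligned" matters: a square rotated by 45 degrees has a smaller rotated enclosing cube than its axis-aligned one.

```python
import numpy as np


def extent(points):
    """ext(X) from (22), evaluated on a finite set of points (one per row).

    Exact for a polytope given by its vertices; otherwise a lower bound
    computed from the sample.
    """
    pts = np.asarray(points, dtype=float)
    ranges = pts.max(axis=0) - pts.min(axis=0)   # per-coordinate spread
    return float(ranges.max())


# A square of side sqrt(2) rotated by 45 degrees: its axis-aligned
# bounding box has edge length 2, so ext = 2 rather than sqrt(2).
diamond = [(1, 0), (0, 1), (-1, 0), (0, -1)]
print(extent(diamond))  # 2.0
```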

###### Definition 15 (η-grid).

Let be given, and let be compact and connected with non-empty interior. Then a set is an -grid of if

• the set of -balls, , has the properties that:

1. for all and , there is an integer such that ; and

2. .

Elements of will be denoted by bold-face font, i.e. .

Note: 1) asserts that the elements of are spaced on a rectangular grid, and 2) asserts that centering a closed ball at each element of covers the set .

###### Remark 2.

A compact and connected set with non-empty interior need not have an $\eta$-grid for an arbitrary choice of $\eta$; see Fig. 1 for an illustration.

Now we can state the main theorem of the paper.

###### Theorem 2 (ReLU Architecture for Control).

Let , and be given, and let and be as in the statement of creftype 2. Furthermore, suppose that there exists a positively invariant Lipschitz continuous controller with Lipschitz constant such that:

Finally, suppose that such that:

$$K_u \cdot \mu \cdot \tau \cdot e^{(K_x + 2K_u K_{\mathrm{cont}})\tau} < \delta. \tag{24}$$

If $\eta$ is such that there exists an $\eta$-grid of $X$, then there exists an -fold parallel TLL NN architecture of size $N$ (see Corollary 1):

$$N \ge n! \cdot \left\lceil \frac{\operatorname{ext}(X)}{\eta} + 2 \right\rceil^n \tag{25}$$

with the property that there exist values for such that:

###### Remark 3.

Note that the coefficient in (25) is significantly smaller than the analogous one exhibited in [3]: viz. . Furthermore, [3] also failed to account for the need to choose an $\eta$-grid.

The size of the architecture specified by Theorem 2 is effectively determined by the ratio $\operatorname{ext}(X)/\eta$ (for given state and control dimensions). This quantity has two specific connections to the assumptions in Problem 2. On the one hand, the maximum allowable $\eta$ is set by , which is determined by the Lipschitz constants of the dynamics and the properties of the assumed controller, . On the other hand, the specific character of the state set $X$ itself influences the size of the architecture. It does so both explicitly, by way of $\operatorname{ext}(X)$, and implicitly, via the requirement that an $\eta$-grid exists for $X$.

With regard to the first of these influences, (24) can be rearranged to provide the following upper bound for $\eta$:

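$$\eta \le \frac{1}{3 \cdot K_{\mathrm{cont}} \cdot K_u} \, e^{-(K_x + 2K_u K_{\mathrm{cont}})\tau} \cdot \left(\frac{\delta}{\tau}\right). \tag{27}$$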

Of particular interest is the influence of the unknown but asserted-to-exist controller, . In particular, the robustness of has an intuitive effect on the size of the architecture. As the robustness of the asserted controller increases (for fixed $\tau$), i.e. as $\delta$ increases, $\eta$ is permitted to be larger, and the architecture can be correspondingly smaller. Conversely, as the asserted controller becomes less robust, i.e. as $\delta$ decreases, the architecture must correspondingly become larger. This trade-off makes intuitive sense in light of our eventual proof strategy. In effect, we will design an architecture that can approximate any potential . So the more robust is, the less precisely the NN architecture must be able to approximate it, and hence a smaller architecture suffices (and vice versa). See Section 4.1.
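The trade-off can be checked with simple arithmetic. The Python sketch below evaluates the right-hand side of (27) and then the smallest integer $N$ allowed by (25), varying only $\delta$. All constants are hypothetical. Taking $\eta$ exactly at the bound is for illustration only. If (24) must hold strictly, or if an $\eta$-grid of $X$ forces a smaller $\eta$, the resulting $N$ only grows.

```python
import math


def eta_bound(K_cont, K_u, K_x, tau, delta):
    """Right-hand side of (27): the largest grid parameter allowed by (24)."""
    return (delta / tau) * math.exp(-(K_x + 2 * K_u * K_cont) * tau) / (3 * K_cont * K_u)


def tll_size(n, ext_X, eta):
    """Smallest integer N satisfying (25): n! * ceil(ext(X)/eta + 2)**n."""
    return math.factorial(n) * math.ceil(ext_X / eta + 2) ** n


# Hypothetical constants; only the robustness margin delta varies.
consts = dict(K_cont=1.0, K_u=1.0, K_x=1.0, tau=0.1)
sizes = []
for delta in (0.01, 0.02, 0.04):
    eta = eta_bound(delta=delta, **consts)
    sizes.append(tll_size(n=2, ext_X=1.0, eta=eta))
print(sizes)  # [3698, 1058, 338]
```

Doubling $\delta$ roughly doubles the admissible $\eta$. Because of the exponent $n$ in (25), this cuts $N$ by roughly a factor of $2^n$.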

With regard to the second influence, the extent of the set $X$ has a straightforward and direct effect on the designed architecture: the “larger” the set is, the larger the required architecture. However, the “complexity” of the set indirectly influences the size of the architecture via the $\eta$-grid requirement. Indeed, if the boundary of $X$ has many thin protuberances (see Fig. 1, for example), then an extremely fine grid may be required to ensure that grid points are placed throughout $X$. Crucially, this choice of $\eta$ may need to be significantly smaller than the maximum allowable $\eta$ computed via (24) as above. The result will be a significantly larger architecture than suggested by (24) alone. Unfortunately, this effect is difficult to quantify directly, given the variability in complexity of state sets. Nevertheless, given any compact and connected set $X$ with non-empty interior, there exists an $\eta > 0$ for which $X$ has an $\eta$-grid.

###### Proposition 1.

Let $X \subset \mathbb{R}^n$ be compact and connected with a non-empty interior. Then there exists an $\eta > 0$ such that there exists an $\eta$-grid of $X$.

###### Proof.

We will consider dyadic grids, that is, grids based on , where the associated candidate grid is given by

$$X_{\eta_k} \triangleq \{ x \in X \mid \exists z \in \mathbb{Z}^n .\ x = z / 2^k \}. \tag{28}$$

For convenience, define the following notation:

$$H_{\eta_k} \triangleq \bigcup_{x \in X_{\eta_k}} \overline{B}(x; \eta_k) \tag{29}$$

Now because is connected with non-empty interior, it is the case that . Hence, for every , there exists an such that for all , (consider a truncation of the binary expansion of each coordinate of ). It follows that if there is a such that for all , then the claim is proved (simply choose ).

Thus, suppose, for contradiction, that there is a divergent sequence s.t. for all . However, is compact, so has a convergent subsequence with limit ; this subsequence retains the property that . Moreover, by the above claim, there is some such that for all .

Now we consider two cases: first that for some , and second that is on the boundary of some . In the first case, we note that there exists an such that for all , , simply by the convergence of the subsequence to . This is clearly a contradiction, since it implies that for all .

On the other hand, suppose is on the boundary of some , and suppose that belongs to a -dimensional face of where (if then is itself a dyadic point, and the above argument applies directly, since ). Let this face be denoted . Then there exists a finite such that for some : use coordinate-wise binary expansions to find a dyadic point within the face that includes in the associated open ball (this is possible since has at least one non-dyadic coordinate by the assumption). This leads to the same contradiction as before, since a tail of the subsequence is eventually contained in this ball. ∎
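The dyadic construction in (28) and (29) can be probed numerically. This makes concrete both Proposition 1 and the thin-protuberance effect of Remark 2. The Python sketch below rests on several assumptions of ours, none stated in the proof:

- $\eta_k = 2^{-k}$;
- the balls in (29) are closed $\infty$-norm balls;
- the set $X$ (a box with a thin spike) is a hypothetical example.

Coverage is only tested on a finite sample, so it is a necessary check rather than a proof. The spacing condition 1) of Definition 15 is not tested.

```python
import numpy as np


def in_X(p):
    """Hypothetical X: the box [0,1]x[0,0.5] plus a thin spike [0.29,0.33]x[0.5,1]."""
    x, y = p[:, 0], p[:, 1]
    box = (0 <= x) & (x <= 1) & (0 <= y) & (y <= 0.5)
    spike = (0.29 <= x) & (x <= 0.33) & (0.5 <= y) & (y <= 1)
    return box | spike


def dyadic_grid(k):
    """Candidate grid (28) on [0,1]^2: points z / 2**k, z integer, lying in X."""
    ticks = np.arange(2**k + 1) / 2**k
    gx, gy = np.meshgrid(ticks, ticks)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    return pts[in_X(pts)]


def covers(k, samples):
    """Sample-based check that the union (29) of radius-2**-k inf-norm balls contains X."""
    grid = dyadic_grid(k)
    for p in samples:
        # Chebyshev distance from p to its nearest grid point.
        if np.abs(grid - p).max(axis=1).min() > 2.0**-k:
            return False
    return True


s = np.linspace(0.0, 1.0, 101)
sx, sy = np.meshgrid(s, s)
samples = np.column_stack([sx.ravel(), sy.ravel()])
samples = samples[in_X(samples)]
print([covers(k, samples) for k in range(2, 6)])  # [False, False, True, True]
```

For $k \le 3$ no dyadic point lands inside the spike, so its upper end is left uncovered. From $k = 4$ on, $x = 0.3125$ is a grid coordinate inside the spike, and the whole sample is covered.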

The remainder of this section is organized as follows. Section 4.1 contains a proof sketch of Theorem 2, which divides the proof into two main intermediate steps. Sections 4.2 and 4.3 contain the formal proofs of these intermediate steps, and the overall formal proof of Theorem 2 appears in Section 4.4.

### 4.1 Proof Sketch of Theorem 2

Our proof of Theorem 2 implements the following simple strategy. By assumption, there exists a controller that satisfies the specification robustly (via ADS). We show that any suitably close approximation of this controller will also satisfy the specification (albeit non-robustly). Thus, we need only design a NN architecture with enough parameterization to approximate any possible controller that satisfies the conditions of the theorem.

There is, however, an important and non-obvious sequence of observations required to design such a NN architecture. To start, any such controller is Lipschitz continuous, so it can be uniformly approximated by interpolating between its values on a uniform grid. Moreover, its Lipschitz constant has a known upper bound. For a given approximation accuracy, the fineness of that grid can therefore be chosen conservatively, and hence independently of any particular controller. Next, we show that it is possible to interpolate between points over such a uniform grid using a CPWA whose number of linear regions (Definition 9) is proportional to the number of points in the grid. We reiterate that this number is independent of any particular controller. This CPWA is in effect parameterized by the values it takes on those grid points, but in such a way that its number of linear regions is independent of those parameter values. This shows that an arbitrary such controller can be approximated by a CPWA with a fixed number of linear regions; it remains to connect this to a NN architecture. Fortunately, the TLL NN architecture [2] can be used directly for this purpose by way of Theorem 1 [2, Theorem 7]. That result explicitly specifies a NN architecture that can implement any CPWA with a known, bounded number of linear regions.
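The first two observations can be illustrated in one dimension, as a simplified stand-in for the $n$-dimensional construction. Suppose $f$ is $K$-Lipschitz and grid points are spaced by $h$. On each cell the piecewise-linear interpolant then errs by at most $2Kh\,t(1-t) \le Kh/2$, where $t \in [0,1]$ is the relative position in the cell. The Python sketch below fixes $h$ from $K$ alone. It then interpolates several hypothetical $K$-Lipschitz functions with `numpy.interp`, which always yields the same 20 affine pieces.

```python
import numpy as np

K = 3.0                            # assumed bound on the Lipschitz constant
grid = np.linspace(0.0, 1.0, 21)   # spacing h = 0.05, chosen from K alone
h = grid[1] - grid[0]
x = np.linspace(0.0, 1.0, 20001)   # dense evaluation points

# Three different K-Lipschitz functions (hypothetical examples).
candidates = {
    "abs_sin": lambda t: np.abs(np.sin(K * t)),
    "tent": lambda t: K * np.abs(t - 0.3),
    "quad": lambda t: K * t * (1.0 - t),
}
errors = {}
for name, f in candidates.items():
    # Piecewise-linear interpolant through the grid values: 20 affine pieces for every f.
    errors[name] = float(np.max(np.abs(f(x) - np.interp(x, grid, f(grid)))))
    print(name, errors[name] <= K * h / 2)
```

Only the 21 grid values change from one function to the next; the grid, and hence the number of linear regions, stays fixed.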

Consequently, the proof of Theorem 2 can instead be decomposed into establishing the following two implications:

1. “Approximate controllers satisfy the specification”: There is an approximation accuracy, $\mu$, and a sampling period, $\tau$, with the following property. If the unknown controller satisfies the specification (under disturbance $\delta$ and sampling period $\tau$), then any controller, NN or otherwise, that approximates it to accuracy $\mu$ on $X$ will also satisfy the specification (but under no disturbance). See Lemma 1.

2. “Any controller can be approximated by a CPWA with the same fixed number of linear regions”: If the unknown controller has Lipschitz constant , then it can be approximated by a CPWA with a number of regions that depends only on and the approximation accuracy. See Corollary 2.
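One standard way to obtain a CPWA interpolant with a value-independent number of pieces on each grid cell is the Kuhn (Freudenthal) triangulation. It splits the cube $[0,1]^n$ into $n!$ simplices, one per ordering of the coordinates. We do not claim this is the construction behind Corollary 2, although it produces an $n!$ factor of the same form as in (25). In the Python sketch below, the function name `kuhn_interp` and the test function are hypothetical. The sketch interpolates vertex values barycentrically inside the simplex containing $x$. It is exact on affine functions and reproduces the vertex values.

```python
import math
import numpy as np


def kuhn_interp(vertex_value, x):
    """CPWA interpolation on [0,1]^n via the Kuhn triangulation (n! simplices).

    vertex_value: callable mapping a 0/1 vertex (tuple of ints) to a number.
    x: point in [0,1]^n (assumed; not checked).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(-x, kind="stable")  # coordinates sorted in decreasing order
    xs = x[order]
    # Barycentric weights of the simplex 0 = v_0, v_j = v_{j-1} + e_{order[j-1]}.
    weights = np.concatenate(([1.0 - xs[0]], xs[:-1] - xs[1:], [xs[-1]]))
    v = np.zeros(n, dtype=int)
    total = weights[0] * vertex_value(tuple(v))
    for j in range(n):
        v[order[j]] = 1
        total += weights[j + 1] * vertex_value(tuple(v))
    return total


c = np.array([0.5, -1.0, 2.0])
affine = lambda v: 2.0 + c @ np.array(v)   # hypothetical affine test function
x = np.array([0.2, 0.7, 0.4])
print(np.isclose(kuhn_interp(affine, x), 2.0 + c @ x))  # True
print(math.factorial(3))  # simplices per cube in 3-D: 6
```

Only the vertex values enter as parameters. The simplicial partition, and therefore the count of affine pieces, depends only on $n$ and the number of grid cells.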

The conclusion of Theorem 2 then follows from Step 1 and Step 2 by means of Theorem 1 [2, Theorem 7], since any CPWA with the same number of linear regions (or fewer) can be implemented exactly by a common TLL NN architecture.

###### Remark 4.

Unlike our use of Theorem 1 to directly construct a single TLL, the architectures used in [29] consist of two successive TLL layers. The first is used to represent “basis” functions specified over overlapping regular polytopes (decomposable by a common set of simplexes [29, Figure 3.2]); and the second captures the sum of these basis functions. This approach leads to larger architectures compared to our direct approach: consider [29, Eq. (5.3)], which has exponentially many neurons in the number of linear regions. By contrast, the architecture of Theorem 2 has only polynomially many neurons as a function of the number of regions: i.e. as a function of , polynomially many units are needed [2], so the network has polynomially many neurons; and each network is straightforwardly polynomial in .

### 4.2 Proof of Theorem 2, Step 1: Approximate Controllers Satisfy the Specification

The goal of this section is to choose a constant $\mu$ such that, for any satisfying the assumptions in Theorem 2, any other controller with also satisfies the specification, i.e. .

Specifically, we will prove the following lemma.

###### Lemma 1.

Let be as in Problem 2. Also, let be - positively invariant w.r.t , and have Lipschitz constant at most . Further suppose that is such that:

$$K_u \cdot \mu \cdot \tau \cdot e^{(K_x + K_u K_\Upsilon)\tau} < \delta. \tag{30}$$

Then for any that is Lipschitz continuous with Lipschitz constant , we have that

And hence if in addition , then:

Our specific approach to proving this is as follows. First, we choose $\mu$ small enough that a $\tau$-second flow of does not deviate by more than $\delta$ from over its duration, . This is accomplished by means of a Grönwall-type bound that uses . That is, we assume is - positively invariant, and use in the Grönwall inequality:

 (33)

Then we use this conclusion to construct the appropriate simulation relations that show:

First, we formally obtain the desired conclusion of (33) by way of the following proposition.

###### Proposition 2.

Let and be as in the statement of Lemma 1, and let be Lipschitz continuous with Lipschitz constant . Also, suppose that is such that:

$$K_u \cdot \mu \cdot \tau \cdot e^{(K_x + K_u K_\Upsilon)\tau} < \delta. \tag{35}$$

If , then for any :

$$\|\zeta^x_\Upsilon(\tau) - \zeta^x_\Psi(\tau)\| \le \delta. \tag{36}$$
###### Proof.

By definition, we have:

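$$\begin{aligned}
\|\zeta^x_\Upsilon(t) - \zeta^x_\Psi(t)\| &\le \int_0^t \left\| f\big(\zeta^x_\Upsilon(\sigma), \Upsilon(\zeta^x_\Upsilon(\sigma))\big) - f\big(\zeta^x_\Psi(\sigma), \Psi(\zeta^x_\Psi(\sigma))\big) \right\| d\sigma \\
&\le \int_0^t K_x \|\zeta^x_\Upsilon(\sigma) - \zeta^x_\Psi(\sigma)\| + K_u \|\Upsilon(\zeta^x_\Upsilon(\sigma)) - \Psi(\zeta^x_\Psi(\sigma))\| \, d\sigma
\end{aligned} \tag{37}$$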

Now, we consider bounding the second normed quantity in (37). In particular, we have that:

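$$\begin{aligned}
\|\Upsilon(\zeta^x_\Upsilon(\sigma)) - \Psi(\zeta^x_\Psi(\sigma))\| &\le \|\Upsilon(\zeta^x_\Upsilon(\sigma)) - \Upsilon(\zeta^x_\Psi(\sigma))\| + \|\Upsilon(\zeta^x_\Psi(\sigma)) - \Psi(\zeta^x_\Psi(\sigma))\| \\
&\le K_\Upsilon \cdot \|\zeta^x_\Upsilon(\sigma) - \zeta^x_\Psi(\sigma)\| + \|\Upsilon - \Psi\|_X.
\end{aligned} \tag{38}$$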

The first term is so bounded because $\Upsilon$ is assumed to be globally Lipschitz continuous on all of ; hence this bound holds whether or not lies in . The second term is bounded as claimed because for all by the assumption of (-) forward invariance of . Consequently, the -restricted norm may be employed.

Thus, using (37) and (38), we obtain the bound

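$$\|\zeta^x_\Upsilon(t) - \zeta^x_\Psi(t)\| \le \int_0^t \Big( (K_x + K_u K_\Upsilon) \cdot \|\zeta^x_\Upsilon(\sigma) - \zeta^x_\Psi(\sigma)\| + K_u \cdot \|\Upsilon - \Psi\|_X \Big) \, d\sigma. \tag{39}$$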

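where the second term of the integrand is bounded by the constant $K_u \cdot \mu$ by assumption. Since $t \mapsto K_u \mu t$ is non-decreasing, the Grönwall Inequality [30] applied to (39) yields $\|\zeta^x_\Upsilon(t) - \zeta^x_\Psi(t)\| \le K_u \mu \, t \, e^{(K_x + K_u K_\Upsilon)t}$. Evaluating at $t = \tau$ and invoking (35) gives the claimed bound (36). ∎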
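A quick numerical sanity check of the bound in Proposition 2 is possible on a toy scalar system. The Python sketch below makes the following hypothetical choices:

- dynamics $f(x,u) = -x + u$, so $K_x = K_u = 1$;
- controller $\Upsilon(x) = -x/2$, so $K_\Upsilon = 1/2$;
- approximating controller $\Psi = \Upsilon + \mu\cos(3x)$, so $\|\Upsilon - \Psi\| \le \mu$ everywhere.

Because everything here is global, the restriction to $X$ and the invariance assumption play no role. The sketch integrates both closed loops with classical RK4 and compares the deviation at $t = \tau$ with $K_u \mu \tau e^{(K_x + K_u K_\Upsilon)\tau}$.

```python
import math

# Hypothetical scalar system f(x, u) = -x + u, so K_x = K_u = 1.
K_x, K_u = 1.0, 1.0
K_ups = 0.5          # Lipschitz constant of upsilon below
mu, tau = 0.1, 0.5   # approximation accuracy and sampling period


def f(x, u):
    return -x + u


def upsilon(x):
    return -0.5 * x


def psi(x):
    # Approximating controller with |upsilon(x) - psi(x)| <= mu everywhere.
    return upsilon(x) + mu * math.cos(3.0 * x)


def flow(controller, x0, t_end, steps=2000):
    """Classical RK4 integration of x' = f(x, controller(x)) from x0."""
    h, x = t_end / steps, x0
    g = lambda z: f(z, controller(z))
    for _ in range(steps):
        k1 = g(x)
        k2 = g(x + h * k1 / 2)
        k3 = g(x + h * k2 / 2)
        k4 = g(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


bound = K_u * mu * tau * math.exp((K_x + K_u * K_ups) * tau)
worst = max(abs(flow(upsilon, x0, tau) - flow(psi, x0, tau))
            for x0 in (-2.0, -0.5, 0.0, 1.0, 2.0))
print(f"worst deviation {worst:.4f} <= bound {bound:.4f}: {worst <= bound}")
```

The Grönwall bound is conservative here: the contraction in the closed loop keeps the actual deviation well below it.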

Now we can prove the main result of this section, Lemma 1; this proof follows more or less directly from Proposition 2.

###### Proof.

(Lemma 1) By definition, and have the same state spaces, . Thus, we propose the following as an abstract disturbance simulation under disturbance (i.e. a conventional simulation for metric transition systems):

$$R = \{(x, x) \mid x \in X\}. \tag{40}$$

Clearly, satisfies the property that for all , , and for every , there exists an such that . Thus, it only remains to show the third property of Definition 13 under disturbance.

To wit, let . Then suppose that , so that in ; we will show subsequently that any such must be in . In this situation, it suffices to show that