1 Introduction
Remarkable progress has been made in reinforcement learning (RL) using (deep) neural networks to solve complex decision-making and control problems
[43]. While RL algorithms such as policy gradient [52, 26, 41], Q-learning [49, 35], and actor-critic methods [32, 34] aim at optimizing control performance, the security aspect is of great importance for mission-critical systems, such as autonomous cars and power grids [20, 4, 44]. A fundamental problem is to analyze or certify the stability of the interconnected system in both the RL exploration and deployment stages, which is challenging due to its dynamic and nonconvex nature [20]. The problem under study focuses on a general continuous-time dynamical system:
(1) 
with the state and the control action. In general, the dynamics can be a time-varying and nonlinear function, but for the purpose of stability analysis, we study the important case that
(2) 
where comprises a linear time-invariant (LTI) component
that is Hurwitz (i.e., every eigenvalue of
has strictly negative real part), a control matrix, and a slowly time-varying component that is allowed to be nonlinear and even uncertain (this requirement is not difficult to meet in practice, because one can linearize any nonlinear system around the equilibrium point to obtain a linear component and a nonlinear part). The condition that the linear component is stable is a basic requirement, but the goal of reinforcement learning is to design a controller that optimizes some performance metric that is not necessarily related to the stability condition. For feedback control, we also allow the controller to obtain observations that are a linear function of the states, where the observation matrix may have a sparsity pattern to account for partial observations in the context of decentralized control [8]. Suppose that the controller is a neural network given by an RL agent (parametrized by weights that can be time-varying due to learning) to optimize some reward
revealed through the interaction with the environment. The exploration vector
captures the additive randomization effect during the learning phase, and is assumed to have bounded energy over time. The main goal is to analyze the stability of the system under the actuation of the policy, which is typically a neural network controller, as illustrated in Fig. 1. Specifically, the stability criterion is stated using the concept of gain [55, 16] (this stability metric is widely adopted in practice, and is closely related to bounded-input bounded-output (BIBO) stability and absolute stability, or asymptotic stability; for controllable and observable LTI systems, the equivalence can be established).

Definition 1 (Input-output stability)
The gain of the system controlled by is the worstcase ratio between total output energy and total input energy:
(3) 
where is the set of all square-summable signals, is the total energy over time, and is the control input with exploration. If the gain is finite, then the interconnected system is said to have input-output stability (or finite gain).
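For intuition, the gain in Definition 1 can be probed numerically: simulating the system for any particular input signal yields a lower bound on the worst-case energy ratio in (3). The following is a minimal sketch, using a forward-Euler discretization and taking the state itself as the output; all names are illustrative, not from the paper.

```python
import numpy as np

def empirical_l2_gain(step, w, x0, dt=0.01):
    """Simulate x' = step(x, w) with forward Euler and return the ratio of
    output energy to input energy -- a lower bound on the true L2 gain."""
    x = np.array(x0, dtype=float)
    e_in, e_out = 0.0, 0.0
    for wt in w:
        y = x.copy()                    # output taken as the state here
        e_in += float(wt @ wt) * dt     # accumulate input energy
        e_out += float(y @ y) * dt      # accumulate output energy
        x = x + dt * step(x, wt)
    return e_out / max(e_in, 1e-12)

# Scalar system x' = -x + w with output y = x: its true L2 gain is 1,
# so any simulated input should give a ratio below 1.
rng = np.random.default_rng(0)
w = rng.normal(size=(5000, 1))
ratio = empirical_l2_gain(lambda x, wt: -x + wt, w, x0=[0.0])
```

Because the supremum in (3) ranges over all square-summable inputs, any single simulation can only certify instability (ratio unbounded), never stability; that gap is what the certificates in Section 3 address.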
This study investigates the possibility of using the gradient information of the policy to obtain a stability certificate, because this information can be easily extracted in real time and is generic enough to include a large set of performance-optimizing nonlinear controllers. By denoting
(4) 
as the set of controllers whose partial derivatives are bounded by the given constants, it is desirable to provide a stability certificate as long as the RL policy remains within the above “safety set.” Indeed, this can be checked efficiently, as stated (informally) in the following theorem.
Theorem 1 (Main result)
We call the constants stability-certified gradient bounds for the underlying system. The above result is based on the intuition that a real-world stable controller should exhibit “smoothness,” in the sense that small changes in the input should lead to small changes in the output. This incorporates the special case where controllers are known to have bounded Lipschitz constants (a simple strategy to calculate the Lipschitz constant of a deep neural network is suggested in [48]). To compute the gradient bounds, we borrow powerful ideas from the framework of integral quadratic constraints (in the frequency domain) [33] and dissipativity theory (in the time domain) [51] for robustness analysis. While these tools are celebrated for their non-conservatism in the robust control literature, existing characterizations of multi-input multi-output (MIMO) Lipschitz functions are insufficient. Thus, one major obstacle is to derive nontrivial bounds that could be of use in practice.
To this end, we develop a new quadratic constraint on gradient-bounded functions, which exploits the sparsity of the control architecture and the nonhomogeneity of the output vector. Some key features of the stability-certified smoothness bounds are as follows: (a) the bounds are inherent to the targeted real-world control task; (b) they can be computed efficiently by solving a semidefinite programming (SDP) problem; (c) they can be used to certify stability when reinforcement learning is employed in real-world control with either off-policy or on-policy learning [47]. Furthermore, we analyze the conservatism of the stability certification to show that it is necessary for the robustness of a surrogate system that is closely related to the original system.
The paper is organized as follows. Preliminaries on policy gradient reinforcement learning, the integral quadratic constraint (IQC) framework, and dissipativity theory are presented in Section 2. Main results on gradient bounds for linear and nonlinear systems are presented in Section 3, where we also analyze the conservatism of the certificate. The method is evaluated in Section 4 on two nonlinear decentralized control tasks. Conclusions are drawn in Section 5.
2 Preliminary
In this section, we give an overview of the main topics relevant to this study, namely policy gradient reinforcement learning and robustness analysis based on the IQC framework and dissipativity theory.
2.1 Reinforcement learning using policy gradient
Reinforcement learning aims at guiding an agent to perform a task as efficiently and skillfully as possible through interactions with the environment. The control task is modeled as a Markov decision process (MDP), defined by the tuple
, where is the set of states, is the set of actions, indicates the world dynamics as in (1), is the reward at a given state and action, and is the factor used to discount future rewards. A control strategy is defined by a policy, which can be approximated by a neural network with parameters. For continuous control, the actions follow a multivariate normal distribution, where
is the mean, and the standard deviation in each action dimension is set to a diminishing value during exploration or learning, and to 0 during actual deployment. With a slight abuse of notation, we use
to denote this normal distribution over actions, with a shorthand used for simplicity. The goal of RL is to maximize the expected return:
(5)
where is the control horizon, and the expectation is taken over the policy, the initial state distribution and the world dynamics.
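The expected return in (5) is typically estimated by Monte-Carlo averaging of the discounted return over sampled trajectories. A minimal sketch follows; the function names are illustrative, not from the paper.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Discounted return sum_t gamma^t * r_t of a single trajectory."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.dot(discounts, rewards))

def expected_return(trajectories, gamma):
    """Monte-Carlo estimate of the expected return in (5): average the
    discounted return over sampled reward sequences."""
    return float(np.mean([discounted_return(tr, gamma) for tr in trajectories]))
```

For example, a trajectory with rewards [1, 1, 1] and discount 0.5 yields 1 + 0.5 + 0.25 = 1.75.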
From a practitioner’s point of view, the existing methods can be categorized into four groups based on how the optimal policy is determined: (a) policy gradient methods directly optimize the policy parameters
by estimating the gradient of the expected return (e.g., REINFORCE
[52], natural policy gradient [26], and trust region policy optimization (TRPO) [41]); (b) value-based algorithms like Q-learning do not aim at optimizing the policy directly, but instead approximate the Q-value of the optimal policy for the available actions [49, 35]; (c) actor-critic algorithms keep an estimate of the value function (critic) as well as a policy that maximizes the value function (actor) (e.g., DDPG [32] and A3C [34]); lastly, (d) model-based methods focus on learning the transition model of the underlying dynamics, and then use it for planning or to improve a policy (e.g., Dyna [46] and guided policy search [30]). We adopt an approach based on end-to-end policy gradient that combines TRPO [41] with natural gradient [26] and a smoothness penalty (this method is very useful for RL in dynamical systems described by differential or difference equations).

Trust region policy optimization is a policy gradient method that constrains the step length to be within a “trust region” so that the local estimation of the gradient/curvature has a monotonic improvement guarantee. By manipulating the expected return using the identity proposed in [25], the “surrogate objective” can be designed:
(6) 
where the expectation is taken over the old policy , the ratio inside the expectation is also known as the importance weight, and is the advantage function given by:
(7) 
where the expectation is with respect to the dynamics (the dependence on is omitted), and it measures the improvement of taking action at state over the old policy in terms of the value function . A bound on the difference between and has been derived in [41], which also proves a monotonic improvement result as long as the KL divergence between the new and old policies is small (i.e., the new policy stays within the trust region). In practice, the surrogate loss can be estimated using trajectories sampled from as follows,
(8) 
and the averaged KL divergence over observed states can be used to estimate the trust region.
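A sample-based estimate of the surrogate objective (8) and of the averaged KL divergence can be sketched as follows. This is a simplified illustration working directly with log-probabilities of the sampled actions; the helper names and the crude KL estimator are our own assumptions, not the paper's implementation.

```python
import numpy as np

def surrogate_loss(logp_new, logp_old, advantages):
    """Sample estimate of the surrogate objective (8): importance weights
    pi_new(a|s) / pi_old(a|s) times advantage estimates, averaged over
    trajectories sampled from the old policy."""
    ratios = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    return float(np.mean(ratios * np.asarray(advantages)))

def mean_kl(logp_new, logp_old):
    """Crude sample estimate of the averaged KL divergence used for the
    trust-region check (uses only log-probs of the sampled actions)."""
    return float(np.mean(np.asarray(logp_old) - np.asarray(logp_new)))
```

When the new and old policies coincide, the importance weights are all 1 and the surrogate loss reduces to the average advantage, with zero estimated KL.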
Natural gradient
is defined by a metric based on the probability manifold induced by the KL divergence. It improves the standard gradient by making a step invariant to reparametrization of the parameter coordinates
[3]:
(9)
where is the standard gradient, is the Fisher information matrix estimated with the trajectory data, and is the step size. In practice, when the number of parameters is large, conjugate gradient is employed to estimate the term without requiring any matrix inversion. Since the Fisher information matrix coincides with the second-order approximation of the KL divergence, one can perform a backtracking line search on the step size to ensure that the updated policy stays within the trust region.
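The conjugate-gradient step mentioned above can be sketched as follows: it solves for the natural gradient direction using only Fisher-vector products, never forming or inverting the Fisher matrix explicitly. This is a generic textbook CG implementation, not the authors' code.

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=50, tol=1e-10):
    """Solve F x = g using only Fisher-vector products fvp(v) = F v,
    avoiding explicit inversion of the Fisher information matrix F."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F x, with x = 0 initially
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example with a diagonal Fisher matrix, so the answer is g / diag.
diag = np.array([1.0, 2.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
x = conjugate_gradient(lambda v: diag * v, g)
```

In the RL setting, `fvp` would be implemented by differentiating the KL divergence twice along a direction, which costs roughly two backpropagations per product.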
Smoothness penalty is introduced in this study to empirically improve learning performance on physical dynamical systems. Specifically, we propose to use
(10) 
as a regularization term to induce consistency during exploration. The intuition is that since the change in states between two consecutive time steps is often small, it is desirable to ensure small changes in output actions. This is closely related to another penalty term that has been used in [15], termed “double backpropagation”, and recently rediscovered in [37, 22]:
(11)
which penalizes the gradient of the policy along the trajectories. Since bounded gradients lead to a bounded Lipschitz constant, these penalties induce smooth neural network functions, which is essential to ensure generalizability and, as we will show, stability. In addition, we incorporate a hard threshold (HT) approach that rescales the weight matrices at each layer by a common factor whenever the estimated Lipschitz constant of the neural network exceeds the certified Lipschitz constant, where the factor depends on the number of layers of the neural network. This ensures that the Lipschitz constant of the RL policy remains bounded by the certified value.
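The hard-threshold rescaling can be illustrated under a common assumption: for a feedforward network with 1-Lipschitz activations, the product of the layers' spectral norms upper-bounds the network's Lipschitz constant, so rescaling every layer by the l-th root of the excess restores the certified bound. This is a sketch with illustrative names, not the paper's exact procedure.

```python
import numpy as np

def hard_threshold(weights, L_star):
    """If the layer-wise product of spectral norms (an upper bound on the
    Lipschitz constant for 1-Lipschitz activations) exceeds L_star,
    rescale every weight matrix by (L_star / L)^(1/l), l = #layers."""
    norms = [np.linalg.norm(W, 2) for W in weights]  # spectral norms
    L = float(np.prod(norms))
    if L <= L_star:
        return weights
    scale = (L_star / L) ** (1.0 / len(weights))
    return [scale * W for W in weights]

# Two-layer example: the bound is 2 * 3 = 6, above the target 1.5.
W = [np.diag([2.0, 2.0]), np.diag([3.0, 1.0])]
W_new = hard_threshold(W, L_star=1.5)
```

Distributing the correction evenly across layers keeps the layer-wise scales balanced, which tends to perturb the learned policy less than rescaling a single layer.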
In summary, our policy gradient is based on the weighted objective:
(12) 
where the penalty coefficients are selected to keep the scales of the corresponding terms small relative to the surrogate loss value. In each round, a set of trajectories is collected using the current policy, which is used to estimate the gradient and the Fisher information matrix; a backtracking line search on the step size is then conducted to ensure that the updated policy stays within the trust region. This learning procedure is known as on-policy learning [47].
2.2 Overview of IQC framework
The IQC theory is celebrated for its systematic and efficient stability analysis of a large class of uncertain, dynamic, and interconnected systems [33]. It unifies and extends classical passivity-based multiplier theory, and has close connections to dissipativity theory in the time domain [42].
To state the IQC framework, some terminology is necessary. We define the space for signals supported on , where denotes the spatial dimension, and the extended space (we will use and if it is not necessary to specify the dimension and signal support); we use to denote the signal in general and to denote its value at time . For a vector or matrix, we use a superscript to denote the conjugate transpose. An operator is causal if the current output does not depend on future inputs. It is bounded if it has a finite gain. Let be a bounded linear operator on a Hilbert space. Then, its Hilbert adjoint is the operator such that for all , where denotes the inner product. It is self-adjoint if it equals its adjoint.
Consider the system (see also Fig. 1)
(13)  
(14) 
where is the transfer function of a causal and bounded LTI system (i.e., it maps an input to an output through the internal state dynamics), is the disturbance, and is a bounded and causal function that is used to represent uncertainties in the system. IQC provides a framework to treat uncertainties such as nonlinear dynamics, model approximation and identification errors, time-varying parameters, and disturbance noise, by using their input-output characterizations.
Definition 2 (Integral quadratic constraints)
Consider the signals and
associated with Fourier transforms
and , and , where is a bounded and causal operator. We present both the frequency- and time-domain IQC definitions:
(Frequency domain) Let be a bounded and self-adjoint operator. Then, is said to satisfy the IQC defined by (i.e., ) if:
(15) 
(Time domain) Let be any factorization of such that is stable and . Then, is said to satisfy the hard IQC defined by (i.e., ) if:
(16)
where is the filtered output given by the stable operator . If, instead of requiring nonnegativity at each time , the nonnegativity is considered only when , then the corresponding condition is called a soft IQC.
As established in [42], the time- and frequency-domain IQC definitions are equivalent if there exists a spectral factorization of such that and are stable.
Example 1 (Sector IQC)
A single-input single-output uncertainty is called “sector bounded” between if , for all and . It thus satisfies the sector IQC with and . It also satisfies the IQC with defined above.
Example 2 ( gain bound)
A MIMO uncertainty has the gain if , where . Thus, it satisfies the IQC with and , where . It also satisfies the IQC with defined above. This can be used to characterize nonlinear operators with fast time-varying parameters.
Before stating a stability result, we define the system (13)–(14) (see Fig. 1) to be well-posed if for any , there exists a solution , which depends causally on . A main IQC result for stability is stated below:
Theorem 2 ([33])
The above theorem requires three technical conditions. The well-posedness condition is a generic property for any acceptable model of a physical system. The second condition is implied if has the properties and . The third condition is central, and it requires checking feasibility at every frequency, which represents a main obstacle. As discussed in Section 3.2, this condition can be equivalently represented as a linear matrix inequality (LMI) using the Kalman–Yakubovich–Popov (KYP) lemma. In general, the more IQCs are available for the uncertainty, the better the characterization that can be obtained. If , , where is the number of IQCs satisfied by , then it is easy to show that , where ; thus, the stability test (17) becomes a convex program, i.e., to find such that:
(18) 
The counterpart for the frequencydomain stability condition in the timedomain can be stated using a standard dissipation argument [42].
2.3 Related work
To close this section, we summarize some connections to the existing literature. This work is closely related to the body of work on safe reinforcement learning, defined as the process of learning policies that maximize performance in problems where safety is required during the learning and/or deployment stages [20]. A detailed literature review can be found in [20], which categorizes two main approaches: modifying (1) the optimality condition with a safety factor, and (2) the exploration process to incorporate external knowledge or risk metrics. Risk aversion can be specified in the reward function, for example, by defining risk as the probability of reaching a set of unknown states in a discrete Markov decision process setting [14, 21]. Robust MDPs are designed to maximize rewards while safely exploring the discrete state space [36, 50]. For continuous states and actions, robust model predictive control can be employed to ensure robustness and safety constraints for the learned model with bounded errors [7]. These methods require accurate or estimated models for policy learning. Recently, model-free policy optimization has been successfully demonstrated in real-world tasks such as robotics, business management, smart grid, and transportation [31]. Safety requirements are high in these settings. Existing approaches are based on constraint satisfaction that holds with high probability [45, 1].
The present analysis tackles the safe reinforcement learning problem from a robust control perspective, which aims at providing theoretical guarantees for stability [55]. Lyapunov functions are widely used to analyze and verify stability when the system and its controller are known [39, 10]. For nonlinear systems without global convergence guarantees, a region of convergence is often estimated, where any state trajectory that starts within this region stays within the region for all times and converges to a target state eventually [27]. For example, [9] has recently proposed a learning-based Lyapunov stability verification for physical systems, whose dynamics are sequentially estimated by Gaussian processes. In the same vein, [2] has employed reachability analysis to construct safe regions in the state space by solving a partial differential equation. The main challenge of these methods is to find a suitable non-conservative Lyapunov function to conduct the analysis.
The IQC framework proposed in [33] has been widely used to analyze the stability of large-scale complex systems such as aircraft control [19]. The main advantages of IQC are its computational efficiency, non-conservatism, and unified treatment of a variety of nonlinearities and uncertainties. It has also been employed to analyze the stability of small-sized neural networks in reinforcement learning [28, 5]; however, in that analysis, the exact coefficients of the neural network need to be known a priori for the static stability analysis, and a region of safe coefficients needs to be calculated at each iteration for the dynamic stability analysis. This is computationally intensive, and it quickly becomes intractable as the neural network size grows. On the contrary, because the present analysis is based on a broad characterization of control functions with bounded gradients, it does not need to access the coefficients of the neural network (or any form of the controller). In general, robust analysis using advanced methods such as the structured singular value [38] or IQC can be conservative. There are only a few cases where necessity conditions can be established, such as when the uncertain operator has a block-diagonal structure with bounded singular values [16], but this set of uncertainties is much smaller than the set of performance-oriented controllers learned by RL. To this end, we are able to reduce the conservatism of the results by introducing more informative quadratic constraints for those controllers, and we analyze the necessity of the certificate criteria. This significantly extends the applicability of stability-certified reinforcement learning to large and deep neural networks in nonlinear large-scale real-world systems, whose stability would otherwise be impossible to certify using existing approaches.

3 Main results
This section introduces a set of quadratic constraints on gradient-bounded functions and describes the computation of a smoothness margin for linear (Theorem 3) and nonlinear systems (Theorem 4). Furthermore, we examine the conservatism of the certificate condition in Theorem 3 for linear systems.
3.1 Quadratic constraints on gradientbounded functions
The starting point of this analysis is a less conservative constraint on general vectorvalued functions. We start by recalling the definition of a Lipschitz continuous function:
Definition 3 (Lipschitz continuous function)
We define both the local and global versions of the Lipschitz continuity for a function :

The function is locally Lipschitz continuous on the open subset if there exists a constant (i.e., Lipschitz constant of on ) such that
(19) 
If is Lipschitz continuous on with a constant (i.e., in (19)), then is called globally Lipschitz continuous with the Lipschitz constant .
Lipschitz continuity implies uniform continuity. The above definition also establishes a connection between locally and globally Lipschitz continuity. The norm in the definition can be any norm, but the Lipschitz constant depends on the particular choice of the norm. Unless otherwise stated, we use the Euclidean norm in our analysis.
To explore some useful properties of Lipschitz continuity, consider a scalar-valued function (i.e., ). Let denote a hybrid vector between and , with and . Then, local Lipschitz continuity of on implies that
(20) 
If we were to assume that is differentiable, then its (partial) derivative is bounded by the Lipschitz constant. For a vector-valued function that is Lipschitz, it is necessary that every component be Lipschitz. In general, every continuously differentiable function is locally Lipschitz, but the reverse is not true, since the definition of Lipschitz continuity does not require differentiability. Indeed, by Rademacher's theorem, if is locally Lipschitz on , then it is differentiable at almost every point in [13].
For the purpose of stability analysis, we can express (19) as a pointwise quadratic constraint:
(21) 
The above constraint, nevertheless, can sometimes be too conservative, because it does not exploit the structure of a given problem. To elaborate on this, consider the function defined as
(22) 
where and is a deterministic but unknown parameter with a bounded magnitude. Clearly, to satisfy (19) on for all possible tuples, we need to choose (i.e., the function has Lipschitz constant 1). However, this characterization is too general in this case, because it ignores the nonhomogeneity of and , as well as the sparsity of the problem representation. Indeed, only depends on with its slope restricted to for all possible , and only depends on with its slope restricted to . In the context of controller design, the nonhomogeneity of control outputs often arises from physical constraints and domain knowledge, and the sparsity of the control architecture is inherent in scenarios with distributed local information. To explicitly address these requirements, we state the following quadratic constraint.
Lemma 1
For a vector-valued function that is differentiable with bounded partial derivatives on (i.e., for all ), the following quadratic constraint is satisfied for all , , , and :
(23) 
where is given by
(24) 
where denotes a diagonal matrix with diagonal entries specified by , and is determined by and , is a set of nonnegative multipliers that follow the same index order as , , , , and is related to the output of by the constraint:
(25) 
where denotes the Kronecker product.
Proof
For a vector-valued function that is differentiable with bounded partial derivatives on (i.e., for all ), there exist functions bounded by for all and such that
(26) 
By defining , since , it follows that
(27) 
The result follows by introducing nonnegative multipliers , and the fact that .
The above bound is a direct consequence of standard tools in real analysis [54]. To understand this result, observe that (23) is equivalent to:
(28) 
with , where depends on and . Since (28) holds for all , it is equivalent to the condition that for all and , which is a direct result of the bounds imposed on the partial derivatives of . To illustrate its usage, let us apply the constraint to characterize the example function (22), where , and all the other bounds are zero. This clearly yields a more informative constraint than merely relying on the Lipschitz constraint (21). In fact, for a differentiable Lipschitz function, we have , and by limiting the choice of , (28) reduces to (21). However, as illustrated in this example, the quadratic constraint in Lemma 1 can incorporate richer information about the structure of the problem; therefore, it often gives rise to nontrivial stability bounds in practice.
The constraint introduced above is not a classical IQC, since it involves an intermediate variable that relates to the output through a set of linear equalities. For stability analysis, let be the equilibrium point, and without loss of generality, assume that and . Then, one can define the quadratic functions
and the condition (23) can be written as
(29) 
which can be used to characterize the set of associated with the function , as we will discuss in Section 3.4.
To simplify the mathematical treatment, we have focused on differentiable functions in Lemma 1; nevertheless, the analysis can be extended to non-differentiable but continuous functions (e.g., the ReLU function) using the notion of the generalized gradient [13, Chap. 2]. In brief, by reassigning the bounds on partial derivatives to uniform bounds on the set of generalized partial derivatives, the constraint (23) can be directly applied.

In relation to existing IQCs, this constraint has wider applications for the characterization of gradient-bounded functions. The Zames–Falb IQC introduced in [53] has been widely used for single-input single-output (SISO) functions, but it requires the function to be monotone with the slope restricted to with , i.e., whenever . The MIMO extension holds true only if the nonlinear function is restricted to be the gradient of a convex real-valued function [40, 24]. As for the sector IQC, the scalar version cannot be used (because it requires whenever there exists such that , which is extremely restrictive), and the vector version is in fact (21). In contrast, the quadratic constraint in Lemma 1 can be applied to non-monotone, vector-valued Lipschitz functions.
3.2 Computation of the smoothness margin
With the newly developed quadratic constraint in place, this subsection explains the computation of a smoothness margin for an LTI system , whose state-space representation is given by:
(30) 
where is the state (the dependence on is omitted for simplicity). The system is assumed to be stable, i.e., is Hurwitz. We connect this linear system in feedback with a controller. The signal is the exploration vector introduced in reinforcement learning, and is the policy action. We are interested in certifying the set of gradient bounds of the controller such that the interconnected system is input-output stable at all times, i.e.,
(31) 
where is a finite upper bound for the gain. Let or denote that a matrix is positive semidefinite or positive definite, respectively. To this end, define the following condition:
(32) 
where and
where is defined in (24). We will show next that the stability of the interconnected system can be certified using linear matrix inequalities.
Theorem 3
Let be stable (i.e., is Hurwitz) and be a bounded causal controller. Assume that:

the interconnection of and is wellposed;

has bounded partial derivatives on (i.e., , for all , and ).
If there exist and a scalar such that is feasible, then the feedback interconnection of and is stable (i.e., it satisfies (31)).
Proof
The proof follows a standard dissipation argument. To proceed, we multiply the augmented matrix in (32) by on the left and by its transpose on the right, and use the constraints and . Then, can be written as a dissipation inequality:
where is known as the storage function, and is its derivative with respect to time . Because the second term is guaranteed to be nonnegative by Lemma 1, if is feasible with a solution , we have:
(33) 
which is satisfied at all times. From well-posedness, the above inequality can be integrated from to , and it then follows from that:
(34) 
Hence, the interconnected system with the RL policy is stable.
The above theorem requires that be stable when there is no feedback policy. This is automatically satisfied in many physical systems with an existing stabilizing (but not performance-optimizing) controller. In the case that the original system is not stable, one needs to first design a controller to stabilize the system or design the controller under uncertainty (in this case, the RL policy), both of which are well-studied problems in the literature (e.g., controller synthesis [16]). Then, the result can be used to ensure stability while delegating reinforcement learning to optimize the performance of the policy under gradient bounds.
The above result essentially suggests a computational approach in robust control analysis. Given a stable LTI system depicted in (30), the first step is to represent the RL policy as an uncertainty block in a feedback interconnection. Because the parameters of the neural network policy may not be known a priori and will be continuously updated during learning, we characterize it using bounds on partial gradients (e.g., if it is known that the action is positively correlated with certain observation metric, we can specify its partial gradient to be mostly positive with only a small negative margin). A simple but conservative choice is a gain bound IQC; nevertheless, to achieve a less conservative result, we can employ the quadratic constraint developed in Lemma 1, which exploits both the sparsity of the control architecture and the nonhomogeneity of the outputs. For a given set of gradient bounds , we find the smallest such that (32) is feasible, and corresponds to the upper bound on the gain of the interconnected system both during learning (with the excitation added to facilitate policy exploration) and actual deployment. If is finite, then the system is provably stable in the sense of (31).
We remark that is quasi-convex, in the sense that it reduces to a standard LMI for a fixed . To solve it numerically, we start with a small value and gradually increase it until a solution is found. This is repeated for multiple sets of gradient bounds. Each iteration (i.e., an LMI for a given set of bounds) can be solved efficiently by interior-point methods. As an alternative to this search, more sophisticated methods for solving the generalized eigenvalue optimization problem can be employed [11].
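The role of the search over the gain bound can be illustrated on the simplest special case: with no uncertainty block, the smallest certifiable gain of a stable LTI system equals its H-infinity norm, which admits a classical bisection test on a Hamiltonian matrix. The sketch below is a stand-in for the LMI feasibility oracle, not the paper's SDP.

```python
import numpy as np

def hinf_norm(A, B, C, lo=1e-6, hi=1e6, tol=1e-4):
    """Bisection for the H-infinity norm of G(s) = C (sI - A)^(-1) B with
    D = 0: gamma exceeds the norm iff the Hamiltonian matrix
        H = [[A, B B^T / gamma], [-C^T C / gamma, -A^T]]
    has no eigenvalues on the imaginary axis."""
    def feasible(gamma):
        H = np.block([[A, B @ B.T / gamma],
                      [-C.T @ C / gamma, -A.T]])
        # no imaginary-axis eigenvalues => gamma is a valid upper bound
        return float(np.min(np.abs(np.linalg.eigvals(H).real))) > 1e-8
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid      # mid certifies the gain; shrink from above
        else:
            lo = mid      # mid is below the norm; raise the lower end
    return hi

# First-order lag G(s) = 1/(s + 1): its H-infinity norm is exactly 1.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
```

The full certification problem replaces `feasible` with the LMI feasibility check of (32) for given gradient bounds, but the outer monotone search over the gain bound has the same structure.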
3.3 Extension to nonlinear systems with uncertainty
The previous analysis for LTI systems can be extended to the generic nonlinear system described in (1). The key idea is to model the nonlinear and potentially time-varying part as an uncertain block with IQC constraints on its behavior. Specifically, consider the LTI component :
(35) 
where is the state and is the output. The linearized system is assumed to be stable, i.e., is Hurwitz. The nonlinear part is connected in feedback:
(36) 
where and are defined as before, and is the nonlinear and time-varying component. In addition to characterizing it using the Lipschitz property as in (23), we assume that it satisfies the IQC defined by as in Definition 2. The system has the state-space representation:
(37) 
where is the internal state and is the filtered output. By denoting as the new state, one can combine (35) and (37) via reducing and letting :
(38) 
where , , , , are matrices of proper dimensions defined above. Similar to the case of LTI systems, the objective is to find the gradient bounds on such that the system becomes stable in the sense of (31). In the same vein, we define as:
(39) 
where , and
where is defined in (24). The next theorem provides a stability certificate for the nonlinear time-varying system (1).
Theorem 4
Let be stable (i.e., in (35) is Hurwitz) and be a bounded causal controller. Assume that:

the interconnection of , , and is wellposed;

has bounded partial derivatives on (i.e., for all , and );

, where is stable.
If there exist and a scalar such that in (39) is feasible, then the feedback interconnection of the nonlinear system (1) and is stable (i.e., it satisfies (31)).
Proof
The proof is in the same vein as that of Theorem 3. The main technical difference is the consideration of the filtered state and the output to impose IQC constraints on the nonlinearities in the dynamical system [33]. The dissipation inequality follows by multiplying both sides of the matrix in (39) by