1 Introduction

The computation of Lyapunov functions has attracted significant attention during the last decades. The reason is that the knowledge of a Lyapunov function not only serves as a certificate for asymptotic stability of an equilibrium but also allows one to estimate its domain of attraction or to quantify its robustness with respect to perturbations, for instance, in the sense of input-to-state stability. As explicit analytic expressions for Lyapunov functions are typically not available, numerical methods need to be used.
Known approaches use, e.g., representations by radial basis functions as in Giesl (2007), piecewise affine functions, see Hafstein (2007), or sum-of-squares techniques, see Anderson and Papachristodoulou (2015) and the references therein. For a comprehensive overview we refer to the survey by Giesl and Hafstein (2015).
The usual approaches have in common that the number of degrees of freedom needed for storing the Lyapunov function (or an approximation thereof with a fixed approximation error) grows very rapidly — typically exponentially — with the dimension of the state space. This is the well known curse of dimensionality, which leads to the fact that the mentioned approaches are confined to low dimensional systems.
In general, the same is true if a deep neural network is used as an approximation architecture. While it is known that such a network can approximate every continuous function arbitrarily well, see Cybenko (1989); Hornik et al. (1989), the number of neurons needed for this purpose typically grows exponentially with the state dimension, see (Mhaskar, 1996, Theorem 2.1) or Theorem 4.1, below. However, this situation changes drastically if additional structural assumptions are imposed, which is the approach we follow in this paper.
More precisely, we show that under a small-gain condition a deep neural network can store an approximation of a Lyapunov function with a given required accuracy using a number of neurons that grows only polynomially with the dimension of the system. In other words, under a small-gain condition the curse of dimensionality can be avoided, at least in terms of storage requirement.
The study in this paper was inspired by a couple of recent similar results for solutions of partial differential equations and optimal value functions, such as Darbon et al. (2019); Hutzenthaler et al. (2019); Reisinger and Zhang (2019), and our proof technique makes use of some arguments from Poggio et al. (2017). However, the — to the best of our knowledge — entirely new insight is that a small-gain condition leads to a class of Lyapunov functions that allows for applying these techniques. This is, in our opinion, the main contribution of this paper.
2 Problem Formulation
We consider nonlinear ordinary differential equations of the form

$\dot x(t) = f(x(t)) \qquad (1)$
with a Lipschitz continuous vector field $f: \mathbb{R}^n \to \mathbb{R}^n$. We assume that $x^* = 0$ is a globally asymptotically stable equilibrium of (1). (In order to avoid technicalities in the exposition we consider global asymptotic stability here. The subsequent arguments could be adapted if the domain of attraction of $x^*$ is not the whole $\mathbb{R}^n$.)
It is well known (see, e.g., Hahn (1967)) that this is the case if and only if there exists a smooth function $V: \mathbb{R}^n \to \mathbb{R}$ satisfying $V(0) = 0$, $V(x) > 0$ for all $x \neq 0$, $V(x) \to \infty$ as $\|x\| \to \infty$, and

$DV(x)\, f(x) \le -h(x) \qquad (2)$

for a continuous function $h: \mathbb{R}^n \to \mathbb{R}$ with $h(x) > 0$ for all $x \neq 0$. It is our goal in this paper to design a neural network that is able to store an approximation of such a Lyapunov function $V$ on a compact set in an efficient manner. To this end, we exploit the particular structure of $V$ that is induced by a small-gain condition.
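As a concrete illustration of the decay condition (not an example from the paper — the system and the function $h$ below are ours), consider the two-dimensional system $f(x) = (-x_1 + x_2,\ -x_1 - x_2)$ with $V(x) = (x_1^2 + x_2^2)/2$, for which $DV(x)\,f(x) = -x_1^2 - x_2^2$ exactly:

```python
import random

# Hypothetical 2-d example (ours): f(x) = (-x1 + x2, -x1 - x2),
# V(x) = (x1^2 + x2^2)/2 and h(x) = x1^2 + x2^2 satisfy the decay
# condition DV(x) f(x) <= -h(x), since DV(x) f(x) = -x1^2 - x2^2.

def f(x):
    x1, x2 = x
    return (-x1 + x2, -x1 - x2)

def V(x):
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def DV(x):
    # gradient of V
    return (x[0], x[1])

def h(x):
    return x[0] ** 2 + x[1] ** 2

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    decay = sum(g * fx for g, fx in zip(DV(x), f(x)))
    # small tolerance for floating-point cancellation in the cross terms
    assert decay <= -h(x) + 1e-12
```

The sampled check passes because the cross terms $x_1 x_2$ cancel analytically, so the decay rate equals $-h(x)$ up to rounding.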
3 Small-gain conditions and Lyapunov functions
In small-gain theory, the system (1) is divided into $s$ subsystems $\Sigma_1, \dots, \Sigma_s$ of dimensions $d_1, \dots, d_s$, with $d_1 + \dots + d_s = n$. To this end, the state vector $x$ and the vector field $f$ are split up as

$x = (x_1^T, \dots, x_s^T)^T, \qquad f(x) = (f_1(x)^T, \dots, f_s(x)^T)^T$

with $x_i \in \mathbb{R}^{d_i}$ and $f_i$ denoting the state and dynamics of each $\Sigma_i$, $i = 1, \dots, s$. With

$x_{-i} := (x_j)_{j \neq i}$

and by rearranging the arguments of the $f_i$, the dynamics of each $\Sigma_i$ can then be written as

$\dot x_i = f_i(x_i, x_{-i}).$
Nonlinear small-gain theory as we apply it here relies on the input-to-state stability (ISS) property introduced in Sontag (1989). It goes back to Jiang et al. (1994), and in the form for large-scale systems that we require here it was developed in the thesis by Rüffer (2007) and in a series of papers around 2010, see, e.g., Dashkovskiy et al. (2010, 2011) and the references therein. ISS small-gain conditions can be based on trajectories or Lyapunov functions and exist in different variants. Here, we use the variant that is most convenient for obtaining approximation results because it yields a smooth Lyapunov function. We briefly discuss one other variant in Section 6(vi).
For formulating the small-gain condition, we assume that for the subsystems $\Sigma_i$ there exist smooth ISS-Lyapunov functions $V_i$, $i = 1, \dots, s$, satisfying for all $x \in \mathbb{R}^n$

$DV_i(x_i)\, f_i(x_i, x_{-i}) \le -\alpha_i(V_i(x_i)) + \sum_{j \neq i} \gamma_{ij}(V_j(x_j))$

with functions $\alpha_i \in \mathcal{K}_\infty$ and $\gamma_{ij} \in \mathcal{K}_\infty \cup \{0\}$, $i, j = 1, \dots, s$. (As usual, we define $\mathcal{K}_\infty$ to be the space of continuous functions $\rho: [0,\infty) \to [0,\infty)$ with $\rho(0) = 0$ and $\rho$ strictly increasing to $\infty$.) Setting $\gamma_{ii} := 0$, we define the map $\Gamma: \mathbb{R}^s_{\ge 0} \to \mathbb{R}^s_{\ge 0}$ by

$\Gamma(r) := \Big( \sum_{j=1}^s \gamma_{1j}(r_j), \dots, \sum_{j=1}^s \gamma_{sj}(r_j) \Big)^T$

and the diagonal operator $A: \mathbb{R}^s_{\ge 0} \to \mathbb{R}^s_{\ge 0}$ by

$A(r) := \big(\alpha_1(r_1), \dots, \alpha_s(r_s)\big)^T.$
Definition 3.1. We say that (1) satisfies the small-gain condition if there is a decomposition into subsystems $\Sigma_1, \dots, \Sigma_s$ with ISS-Lyapunov functions $V_i$ satisfying the following condition: there are bounded positive definite functions $\eta_i$, $i = 1, \dots, s$, satisfying $\liminf_{r \to \infty} \eta_i(r) > 0$ and such that for $\eta = (\eta_1, \dots, \eta_s)^T$ the inequality

$\langle \eta(r),\, \Gamma(r) - A(r) \rangle < 0$

holds for all $r \in \mathbb{R}^s_{\ge 0}$ with $r \neq 0$. (A continuous function $\eta: [0,\infty) \to [0,\infty)$ is called positive definite if $\eta(0) = 0$ and $\eta(r) > 0$ for all $r > 0$.)
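For intuition, the small-gain inequality can be checked numerically on a grid. The following sketch (ours, not from the paper) uses two scalar subsystems with $V_i(x_i) = x_i^2/2$, dissipation rates $\alpha_i(r) = 1.75\,r$ and gains $\gamma_{12}(r) = \gamma_{21}(r) = 0.25\,r$ — estimates of the kind that arise for $\dot x_1 = -x_1 + 0.25\,x_2$, $\dot x_2 = -x_2 + 0.25\,x_1$ — together with the bounded positive definite functions $\eta_i(r) = r/(1+r)$:

```python
# Illustrative grid check (ours) of <eta(r), Gamma(r) - A(r)> < 0 for
# two scalar subsystems with alpha_i(r) = 1.75 r and cross gains
# gamma_12(r) = gamma_21(r) = 0.25 r, using eta_i(r) = r / (1 + r)
# as bounded positive definite functions.

def eta(r):
    return r / (1.0 + r)

def Gamma(r1, r2):          # Gamma(r) = (gamma_12(r2), gamma_21(r1))
    return (0.25 * r2, 0.25 * r1)

def A(r1, r2):              # A(r) = (alpha_1(r1), alpha_2(r2))
    return (1.75 * r1, 1.75 * r2)

worst = -float("inf")
for i in range(1, 200):
    for j in range(1, 200):
        r1, r2 = 0.05 * i, 0.05 * j     # grid over (0, 10]^2
        g, a = Gamma(r1, r2), A(r1, r2)
        val = eta(r1) * (g[0] - a[0]) + eta(r2) * (g[1] - a[1])
        worst = max(worst, val)

assert worst < 0  # small-gain inequality holds on the sampled grid
```

Since the gains (slope 0.25) are dominated by the dissipation rates (slope 1.75), the inequality holds with a margin on the whole grid; a grid check is of course only an indication, not a proof.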
The following theorem then follows from Theorem 4.1 in Dashkovskiy et al. (2011).

Theorem 3.2. Assume that (1) satisfies the small-gain condition from Definition 3.1. Then

$V(x) := \sum_{i=1}^{s} \int_0^{V_i(x_i)} \eta_i(r)\, dr$

is a smooth Lyapunov function for (1).
In Dashkovskiy et al. (2011), the property from Definition 3.1 is called a weak small-gain condition. This is because if the system (1) has an additional input (that is taken into account in the assumptions on the $V_i$), then the construction of $V$ yields an integral ISS Lyapunov function as opposed to an ISS Lyapunov function. Under a stronger version of the small-gain condition, the same construction yields an ISS Lyapunov function. We briefly discuss corresponding extensions of our approach in Section 6(iv).
4 Deep neural networks
A deep neural network is a computational architecture that has several inputs, which are processed through hidden layers of neurons. The values in the neurons of the highest layer ($\ell = L$) are then used in order to compute the output of the network. In this paper, we will only consider feedforward networks, in which the input is processed consecutively through the layers $\ell = 1, \ell = 2, \dots, \ell = L$. For our purpose of representing Lyapunov functions, we will use networks with the input vector $x \in \mathbb{R}^n$ and one output $W(x; \theta) \in \mathbb{R}$. Here, the vector $\theta$ represents the free parameters in the network that need to be tuned (or "learned") in order to obtain the desired output. In our case, the output shall approximate the Lyapunov function, i.e., we want to choose $\theta$ such that $W(x; \theta) \approx V(x)$ for all $x \in K$, where $K \subset \mathbb{R}^n$ is a fixed compact set containing the origin. Figure 1 shows generic neural networks with one and two hidden layers.
Here, the topmost layer is the output layer, followed by one or two hidden layers numbered by $\ell$, and the input layer. The number $L$ determines the number of hidden layers, here $L = 1$ or $L = 2$. Each hidden layer consists of $N_\ell$ neurons and the overall number of neurons in the hidden layers is denoted by $N = \sum_{\ell=1}^L N_\ell$. The neurons are indexed using the number $\ell$ of their layer and their position $k$ in the layer. Every neuron has a value $z_{\ell,k} \in \mathbb{R}$; for each layer these values are collected in the vector $z_\ell \in \mathbb{R}^{N_\ell}$. The values of the neurons at the lowest level are given by the inputs, i.e., $z_0 = x$. The values of the neurons in the hidden layers are determined by the formula

$z_{\ell,k} = \sigma(w_{\ell,k} \cdot z_{\ell-1} + b_{\ell,k})$

for $k = 1, \dots, N_\ell$, where

$\sigma: \mathbb{R} \to \mathbb{R}$

is a so-called activation function and $w_{\ell,k} \in \mathbb{R}^{N_{\ell-1}}$, $b_{\ell,k} \in \mathbb{R}$ are the parameters of the $\ell$-th layer. With $w \cdot z$ we denote the Euclidean scalar product between two vectors $w, z$. In the output layer, the values from the topmost hidden layer are linearly combined to deliver the output, i.e.,

$W(x; \theta) = \sum_{k=1}^{N_L} c_k z_{L,k}.$

The vector $\theta$ collects all parameters $w_{\ell,k}$, $b_{\ell,k}$, and $c_k$, $\ell = 1, \dots, L$, $k = 1, \dots, N_\ell$, of the network.

In case of one hidden layer, in which $L = 1$ and thus $z_0 = x$, we obtain the closed-form expression

$W(x; \theta) = \sum_{k=1}^{N_1} c_k\, \sigma(w_{1,k} \cdot x + b_{1,k}).$
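The closed-form expression above can be sketched in a few lines of pure Python (illustrative only; the weights below are arbitrary, and we pick $\sigma = \tanh$ as a smooth, non-polynomial activation):

```python
import math

# Minimal sketch (ours) of the one-hidden-layer network
# W(x; theta) = sum_k c_k * sigma(w_k . x + b_k), with sigma = tanh.

def sigma(r):
    return math.tanh(r)

def W(x, weights, biases, c):
    """Evaluate the network; weights[k] is the vector w_k, biases[k] is b_k."""
    out = 0.0
    for w_k, b_k, c_k in zip(weights, biases, c):
        pre = sum(wi * xi for wi, xi in zip(w_k, x)) + b_k
        out += c_k * sigma(pre)
    return out

# A network with n = 2 inputs and N = 3 hidden neurons:
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases  = [0.0, 0.0, 0.0]
c       = [0.5, 0.5, 1.0]
print(W((0.0, 0.0), weights, biases, c))  # tanh(0) = 0, so the output is 0.0
```

The parameter vector $\theta$ of the text corresponds to the triple `(weights, biases, c)` here.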
The universal approximation theorem states that a neural network with one hidden layer can approximate all smooth functions arbitrarily well. In its qualitative version, going back to Cybenko (1989); Hornik et al. (1989), it states that the set of functions that can be approximated by neural networks with one hidden layer is dense in the set of continuous functions. In Theorem 4.1, below, we state a quantitative version, given as Theorem 1 in Poggio et al. (2017), which is a reformulation of Theorem 2.1 in Mhaskar (1996).
For its formulation we fix a compact set $K \subset \mathbb{R}^n$. For a continuous function $g: K \to \mathbb{R}$ we define the infinity-norm over $K$ as

$\|g\|_{\infty,K} := \max_{x \in K} |g(x)|.$

Given a fixed constant $B > 0$, we then define the set of functions

$\mathcal{G}_m := \big\{ g \in C^m(\mathbb{R}^n, \mathbb{R}) \,\big|\, \|D^\alpha g\|_{\infty,K} \le B \text{ for all } |\alpha| \le m \big\},$

where $C^m(\mathbb{R}^n, \mathbb{R})$ denotes the functions from $\mathbb{R}^n$ to $\mathbb{R}$ that are $m$-times continuously differentiable, $\alpha = (\alpha_1, \dots, \alpha_n)$ are multiindices of length $|\alpha| = \alpha_1 + \dots + \alpha_n$ with entries $\alpha_j \in \mathbb{N}_0$, and $D^\alpha$ denotes the $|\alpha|$-th partial derivative with respect to $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$.
Theorem 4.1. Let $\sigma$ be infinitely differentiable and not a polynomial. Then, for any $\varepsilon > 0$, a neural network with one hidden layer provides an approximation $\|W(\cdot\,; \theta) - g\|_{\infty,K} \le \varepsilon$ for all $g \in \mathcal{G}_m$ with a number $N$ of neurons satisfying

$N = \mathcal{O}\big(\varepsilon^{-n/m}\big),$

and this is the best possible.
Theorem 4.1 implies that one can readily use a network with one hidden layer for approximating Lyapunov functions. However, in general the number $N$ of neurons needed for a fixed approximation accuracy grows exponentially in the dimension $n$, and so does the number of parameters in $\theta$. This means that the storage requirement as well as the effort to determine $\theta$ easily exceeds all reasonable bounds already for moderate dimensions $n$. Hence, this approach also suffers from the curse of dimensionality. In the next section, we will therefore exploit the particular structure of small-gain based Lyapunov functions in order to obtain neural networks with (asymptotically) much lower $N$.
5 Deep neural networks for small-gain based Lyapunov functions

5.1 The case of known subsystems
For our first result, for fixed $d_{\max} \in \mathbb{N}$ we consider the family of Lipschitz maps $f: \mathbb{R}^n \to \mathbb{R}^n$, $n \in \mathbb{N}$, for which (1) satisfies the small-gain condition from Definition 3.1 with $d_i \le d_{\max}$ for all $i = 1, \dots, s$. We assume that for each $f$ in this family we know the subsystems $\Sigma_i$ satisfying the small-gain condition from Definition 3.1, i.e., their dimensions $d_i$ and states $x_i$.
For this situation, we use a network with one hidden layer of the form depicted in Figure 2.
In this network, the (only) hidden layer $\ell = 1$ consists of $s$ sublayers $\mathcal{L}_1, \dots, \mathcal{L}_s$. The input of each of the neurons in $\mathcal{L}_i$ is the state vector $x_i$ of the subsystem $\Sigma_i$, which forms a part of the state vector $x$. We assume that every sublayer has $M$ neurons, whose parameters and values are denoted by, respectively, $w_{i,k}$, $b_{i,k}$, and $z_{i,k}$, $k = 1, \dots, M$. Since there are $s$ sublayers, the layer contains $N = sM$ neurons, which is also the total number of neurons in the hidden layers. The values $z_{i,k}$ are then given by

$z_{i,k} = \sigma(w_{i,k} \cdot x_i + b_{i,k})$

and the overall output of the network is

$W(x; \theta) = \sum_{i=1}^{s} \sum_{k=1}^{M} c_{i,k}\, z_{i,k}.$
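The structured hidden layer can be sketched as follows (a pure-Python illustration with arbitrary parameters, not the paper's implementation): each sublayer only sees its subsystem state $x_i$, and the outputs of all sublayers are summed.

```python
import math

# Sketch (ours) of the structured one-hidden-layer network: the hidden
# layer splits into s sublayers, and sublayer i only receives the
# subsystem state x_i.

def sigma(r):
    return math.tanh(r)

def structured_W(x, dims, params):
    """x: full state; dims: subsystem dimensions (d_1, ..., d_s);
    params[i]: list of (w, b, c) triples for the neurons of sublayer i,
    each w having length dims[i]."""
    out, start = 0.0, 0
    for d_i, sublayer in zip(dims, params):
        x_i = x[start:start + d_i]          # state of subsystem i
        for w, b, c in sublayer:
            out += c * sigma(sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        start += d_i
    return out

# n = 3 split into subsystems of dimensions (2, 1), two neurons per sublayer:
dims = (2, 1)
params = [
    [((1.0, 0.0), 0.0, 1.0), ((0.0, 1.0), 0.0, 1.0)],   # sublayer 1
    [((1.0,), 0.0, 1.0), ((-1.0,), 0.0, 1.0)],          # sublayer 2
]
print(structured_W((0.0, 0.0, 0.0), dims, params))  # 0.0
```

Note that each neuron's weight vector has length $d_i \le d_{\max}$ rather than $n$ — this locality is what the dimension counting below exploits.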
Proposition 5.1. Given a compact set $K \subset \mathbb{R}^n$, for each $f$ from the family defined above there exists a Lyapunov function $V$ such that the following holds. For each $\varepsilon > 0$ the network depicted in and described after Figure 2, with $\sigma$ infinitely differentiable and not a polynomial, provides an approximation $|W(x; \theta) - V(x)| \le \varepsilon$ for all $x \in K$ with a number $N$ of neurons satisfying

$N = \mathcal{O}\big(s\, (s/\varepsilon)^{d_{\max}/m}\big).$
Proof: By Theorem 3.2 there exists a Lyapunov function of the form $V(x) = \sum_{i=1}^s \hat V_i(x_i)$, in which each summand $\hat V_i$ depends only on the state $x_i$ of the subsystem $\Sigma_i$ and thus on at most $d_{\max}$ variables. Hence, by Theorem 4.1 we can conclude that there exist values $w_{i,k}$, $b_{i,k}$, $c_{i,k}$, $k = 1, \dots, M_i$, such that the output of each sublayer $\mathcal{L}_i$ satisfies

$\Big| \sum_{k=1}^{M_i} c_{i,k} z_{i,k} - \hat V_i(x_i) \Big| \le \frac{\varepsilon}{s} \quad \text{for all } x \in K$

for a number of neurons

$M_i = \mathcal{O}\big((s/\varepsilon)^{d_i/m}\big).$

Since this is true for all sublayers $\mathcal{L}_i$, $i = 1, \dots, s$, by summing over the sublayers we obtain $|W(x; \theta) - V(x)| \le \varepsilon$ for all $x \in K$ and thus the assertion, with the overall number of neurons $N = \sum_{i=1}^s M_i = \mathcal{O}\big(s\,(s/\varepsilon)^{d_{\max}/m}\big)$. ∎
5.2 The case of unknown subsystems
The approach in the previous subsection requires the knowledge of the subsystems in order to design the appropriate neural network. This is a rather unrealistic assumption that requires a lot of preliminary analysis in order to set up an appropriate network. Fortunately, there is a remedy for this, which moreover applies to a larger family of systems than considered above. To this end, we consider the family of maps $f$ in (1) with the following property: after a linear coordinate transformation $z = Tx$, with $T \in \mathbb{R}^{n \times n}$ invertible and depending on $f$, the transformed map lies in the family from Section 5.1. In contrast to Section 5.1, now we do not assume that we know the subsystems of the transformed system, nor their dimensions $d_i$ and not even their number $s$.
The neural network that we propose for is depicted in Figure 3.
Here, we use different activation functions in the different layers. While in layer $\ell = 2$ we use the same $\sigma$ as in Proposition 5.1, in layer $\ell = 1$ we use the identity as activation function, i.e., $\sigma_1(r) = r$. Layer $\ell = 2$ consists of sublayers $\mathcal{L}_i$, $i = 1, \dots, n$, each of which has exactly $d_{\max}$ inputs and $M$ neurons. The coefficients and neuron values of each $\mathcal{L}_i$ are again denoted with $w_{i,k}$, $b_{i,k}$, and $z_{i,k}$, respectively, for $k = 1, \dots, M$. The $d_{\max}$-dimensional input of each neuron in $\mathcal{L}_i$ is given by the values $z_{1,(i-1)d_{\max}+1}, \dots, z_{1,i\,d_{\max}}$ of the neurons in layer $\ell = 1$.
We note that this network is a special case of the lower network in Figure 1.
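The two-layer architecture can be sketched as follows (a pure-Python illustration with parameters of our choosing): the identity-activation first layer realizes the linear map $z = Tx$, and the second layer is the structured layer from Section 5.1 applied to consecutive blocks of $z$.

```python
import math

# Sketch (ours) of the two-layer architecture: layer 1 has identity
# activation and realizes z = T x; layer 2 applies structured sublayers
# to consecutive blocks of z of length d_max.

def sigma(r):
    return math.tanh(r)

def two_layer_W(x, T, d_max, sublayers):
    # layer 1: identity activation, weights = rows of T, biases = 0
    z = [sum(t * xi for t, xi in zip(row, x)) for row in T]
    # layer 2: sublayer i receives the block z[i*d_max : (i+1)*d_max]
    out = 0.0
    for i, sub in enumerate(sublayers):
        block = z[i * d_max:(i + 1) * d_max]
        for w, b, c in sub:
            out += c * sigma(sum(wj * zj for wj, zj in zip(w, block)) + b)
    return out

# n = 2, d_max = 1, and a transformation T that swaps the coordinates:
T = [(0.0, 1.0), (1.0, 0.0)]
sublayers = [[((1.0,), 0.0, 1.0)], [((1.0,), 0.0, 1.0)]]
val = two_layer_W((0.5, -0.5), T, 1, sublayers)
# equals tanh(-0.5) + tanh(0.5), i.e., 0.0 up to rounding
```

Learning the first-layer weights amounts to learning the unknown transformation $T$, which is why no a priori knowledge of the subsystem structure is needed.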
Theorem 5.2. Given a compact set $K \subset \mathbb{R}^n$, for each $f$ from the family just defined there exists a Lyapunov function $V$ such that the following holds. For each $\varepsilon > 0$ the network depicted in and described after Figure 3, with $\sigma$ infinitely differentiable and not a polynomial in layer $\ell = 2$ and $\sigma_1 = \mathrm{id}$ in layer $\ell = 1$, provides an approximation $|W(x; \theta) - V(x)| \le \varepsilon$ for all $x \in K$ with a number $N$ of neurons satisfying

$N = n\, d_{\max} + \mathcal{O}\big(s\,(s/\varepsilon)^{d_{\max}/m}\big).$
Proof: Let $d_1, \dots, d_s$ be the (unknown) dimensions of the subsystems and $k_i := 1 + \sum_{j < i} d_j$ the first index of the variables of $\Sigma_i$. Using the notation from above and the fact that $\sigma_1$ is the identity, the value of the inputs of the sublayers is given by

$z_{1,k} = w_{1,k} \cdot x + b_{1,k}.$

Hence, by choosing $b_{1,k} = 0$ and $w_{1,k}$ as the transpose of the $j$-th row of $T$, we obtain $z_{1,k} = (Tx)_j$. Hence, by appropriately assigning all the $w_{1,k}$, we obtain that the input of sublayer $\mathcal{L}_i$ is

$\big((Tx)_{k_i}, \dots, (Tx)_{k_i + d_i - 1}, 0, \dots, 0\big)^T,$

where the number of the zeros equals $d_{\max} - d_i$. This can be done for $i = 1, \dots, s$, with $s$ being the number of subsystems. The inputs for the remaining sublayers $\mathcal{L}_i$, $i = s+1, \dots, n$, are set to $0$ by setting the corresponding $w_{1,k}$ and $b_{1,k}$ to $0$. For this choice of the parameters of the lower layer, each sublayer of the layer $\ell = 2$ receives the transformed subsystem states (and a number of zeros) as input, or the input is $0$.
Since the additional zero-inputs do not affect the properties of the network, the upper part of the network, consisting of the hidden layer and the output, has exactly the structure of the network used in Proposition 5.1. We can thus apply this proposition to the upper part of the network and obtain that it can realize a function that approximates a Lyapunov function for in the sense of Proposition 5.1.
As the lower layer realizes the coordinate transformation $z = Tx$, the overall network then approximates the function $x \mapsto \tilde V(Tx)$, where $\tilde V$ is the Lyapunov function of the transformed system. By means of the invertibility of $T$ and the chain rule one easily checks that this is a Lyapunov function for (1). The claim then follows since the number of neurons in the upper layer is equal to that given in Proposition 5.1, while that in the lower layer equals $n\, d_{\max}$. This leads to the overall number of neurons given in the theorem. ∎
6 Discussion

In this section we discuss a few aspects and possible extensions of the results just presented.
(i) From the expressions for $N$ in Proposition 5.1 and Theorem 5.2 one sees that for a given $\varepsilon > 0$ the storage effort only grows polynomially in the state dimension $n$, where the exponent is determined by the maximal dimension $d_{\max}$ of the subsystems $\Sigma_i$. The proposed approach hence avoids the curse of dimensionality, i.e., the exponential growth of the effort in $n$. There is, however, an exponential dependence on the maximal dimension $d_{\max}$ of the subsystems in the small-gain formulation. This is to be expected, because the construction relies on the low-dimensionality of the $\Sigma_i$, and if this is no longer given, we cannot expect the method to work efficiently.
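A back-of-the-envelope comparison of the two O-estimates (with all constants set to 1, so the numbers are purely illustrative) makes the difference concrete:

```python
# Illustrative neuron-count comparison (constants in the O-estimates
# set to 1): a generic shallow network needs on the order of
# eps**(-n/m) neurons, the structured network on the order of
# s * (s/eps)**(d_max/m) neurons.

def shallow_count(n, m, eps):
    return eps ** (-n / m)

def structured_count(s, d_max, m, eps):
    return s * (s / eps) ** (d_max / m)

n, m, eps = 20, 2, 0.1
s, d_max = 10, 2            # ten subsystems of dimension at most 2
print(shallow_count(n, m, eps))            # ~1e10: exponential in n
print(structured_count(s, d_max, m, eps))  # ~1e3: polynomial in n
```

For $n = 20$ the generic bound is already seven orders of magnitude larger, and the gap widens exponentially as $n$ grows.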
(ii) In order to use the proposed architecture in practice, suitable learning methods have to be developed. This is a serious problem in its own right that is not addressed in this paper. However, given that the decisive property (2) of a Lyapunov function is a partial differential inequality, there is hope that methods similar to those for learning solutions of partial differential equations, see, e.g., Han et al. (2018); Huré et al. (2019); Sirignano and Spiliopoulos (2018), can be implemented. This will be the subject of future research.
(iii) There have been attempts to use small-gain theorems for grid-based constructions of Lyapunov functions, e.g., in Camilli et al. (2009); Li (2015). The problem of this construction, however, is that it represents the functions from Theorem 3.2 separately for the subsystems, and the small-gain condition has to be checked additionally (which is a difficult task). The representation via the neural network requires neither checking the small-gain condition nor precise knowledge of the subsystems.
(iv) The reasoning in the proofs remains valid if we replace (2) by its ISS counterpart and asymptotic stability by ISS. Hence, the proposed network is also capable of storing ISS and iISS Lyapunov functions.
(v) In current neural network applications ReLU activation functions $\sigma(r) = \max\{r, 0\}$ are often preferred over smooth activation functions. The obvious disadvantage of this concept is that the resulting function $W(\cdot\,; \theta)$ is nonsmooth in $x$, which implies the need to use concepts of nonsmooth analysis for interpreting it as a Lyapunov function. While one may circumvent the need to compute the derivative of $W(\cdot\,; \theta)$ by means of using nonsmooth analysis or by passing to an integral representation of (2) (see also item (vii), below), the nonsmoothness may cause problems in learning schemes, and it remains to be explored whether these are compensated by the advantages of ReLU activation functions.
(vi) There are other types of Lyapunov function constructions based on small-gain conditions different from Definition 3.1, e.g., a construction of the form

$V(x) = \max_{i=1,\dots,s} \lambda_i(V_i(x_i))$

for suitable functions $\lambda_i \in \mathcal{K}_\infty$. Since maximization can also be efficiently implemented in neural networks (via so-called max pooling in convolutional networks), we expect that such Lyapunov functions also admit an efficient approximation via deep neural networks. However, when using this formulation we have to cope with two sources of nondifferentiability that complicate the analysis. On the one hand, this is caused by the maximization in the definition of $V$ and on the other hand by the functions $\lambda_i$, which in most references are only ensured to be Lipschitz.
(vii) One may argue that a mere approximation of $V$ is not sufficient for obtaining a proper approximation of a Lyapunov function, because in order for $W(\cdot\,; \theta)$ to be a Lyapunov function, its derivative should also approximate $DV$ such that (2) is maintained approximately.
On the one hand, this is not strictly true. It is known (see, e.g., Section 2.4 in Grüne and Pannek (2017)) that under mild regularity conditions, asymptotic stability follows from the existence of a function $V$ with

$V(x(t)) \le V(x(0)) - \int_0^t h(x(\tau))\, d\tau \qquad (3)$

for all $t \ge 0$ and all solutions $x(\cdot)$ of (1), where $h$ is a continuous function with $h(x) > 0$ for all $x \neq 0$. This integral inequality, in turn, holds for each Lyapunov function simply by integrating (2). Now, in order to ensure that a function $W(\cdot\,; \theta)$ satisfies (3) approximately (i.e., up to an error term on the right-hand side), it is sufficient to approximate $V$ itself by $W(\cdot\,; \theta)$. An approximation of the derivative $DV$ is not needed.
On the other hand, it is known that neural networks also approximate derivatives of functions, cf. Hornik et al. (1990). Hence, it seems likely that our results are extendable to an approximation of $V$ and $DV$. This will be investigated in future research.
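The integral criterion (3) can be checked numerically along simulated trajectories. The following sketch (ours; system, $V$, and $h$ chosen for illustration) verifies it for the scalar system $\dot x = -x$ with $V(x) = x^2$ and $h(x) = x^2$, for which $\frac{d}{dt} V(x(t)) = -2x(t)^2 \le -h(x(t))$:

```python
# Numerical illustration (ours) of the integral inequality (3) for
# dx/dt = -x with V(x) = x^2 and h(x) = x^2: along solutions,
# d/dt V = -2 x^2 <= -h(x), so V(x(t)) <= V(x(0)) - int_0^t h(x(tau)) dtau.

def simulate(x0, t_end, dt=1e-3):
    x, integral = x0, 0.0
    steps = int(t_end / dt)
    for _ in range(steps):
        integral += (x ** 2) * dt   # left-rectangle rule for the h-integral
        x += (-x) * dt              # explicit Euler step for dx/dt = -x
    return x, integral

x0 = 2.0
x_t, integral = simulate(x0, t_end=3.0)
V0, V_t = x0 ** 2, x_t ** 2
assert V_t <= V0 - integral      # inequality (3) at t = 3
```

Because $h$ here is only half of the actual decay rate $2x^2$, the inequality holds with a comfortable margin that easily absorbs the Euler discretization error.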
7 Conclusion

We have proposed a class of deep neural networks that allows for approximating Lyapunov functions for systems satisfying a small-gain condition. The number of neurons needed for an approximation with fixed accuracy depends exponentially on the maximal dimension of the subsystems in the small-gain condition but only polynomially on the overall state dimension. Thus, it avoids the curse of dimensionality. The network structure does not need any knowledge about the exact dimensions of the subsystems and even allows for a subsystem structure that only becomes visible after a linear coordinate transformation.
The result suggests that suitably designed deep neural networks are a promising efficient approximation architecture for computing Lyapunov functions. It motivates future research on the development of suitable training techniques in order to actually compute the Lyapunov functions.
References

- Anderson, J. and Papachristodoulou, A. (2015). Advances in computational Lyapunov analysis using sum-of-squares programming. Discrete Contin. Dyn. Syst. Ser. B, 20(8), 2361–2381.
- Camilli, F., Grüne, L., and Wirth, F. (2009). Domains of attraction of interconnected systems: a Zubov method approach. In Proceedings of the European Control Conference — ECC2009, 91–96. Budapest, Hungary.
- Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems, 2(4), 303–314.
- Darbon, J., Langlois, G.P., and Meng, T. (2019). Overcoming the curse of dimensionality for some Hamilton-Jacobi partial differential equations via neural network architectures. Preprint, arXiv:1910.09045.
- Dashkovskiy, S., Ito, H., and Wirth, F. (2011). On a small gain theorem for ISS networks in dissipative Lyapunov form. Eur. J. Control, 17(4), 357–365.
- Dashkovskiy, S.N., Rüffer, B.S., and Wirth, F.R. (2010). Small gain theorems for large scale systems and construction of ISS Lyapunov functions. SIAM J. Control Optim., 48(6), 4089–4118.
- Giesl, P. (2007). Construction of global Lyapunov functions using radial basis functions, volume 1904 of Lecture Notes in Mathematics. Springer, Berlin.
- Giesl, P. and Hafstein, S. (2015). Review on computational methods for Lyapunov functions. Discrete Contin. Dyn. Syst. Ser. B, 20(8), 2291–2331.
- Grüne, L. and Pannek, J. (2017). Nonlinear Model Predictive Control. Theory and Algorithms. Springer-Verlag, London, 2nd edition.
- Hafstein, S.F. (2007). An algorithm for constructing Lyapunov functions, volume 8 of Electronic Journal of Differential Equations. Monograph. Texas State University–San Marcos, Department of Mathematics, San Marcos, TX. Available electronically at http://ejde.math.txstate.edu/.
- Hahn, W. (1967). Stability of Motion. Springer, Berlin, Heidelberg.
- Han, J., Jentzen, A., and E, W. (2018). Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA, 115(34), 8505–8510.
- Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
- Hornik, K., Stinchcombe, M., and White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5), 551–560.
- Huré, C., Pham, H., and Warin, X. (2019). Some machine learning schemes for high-dimensional nonlinear PDEs. Preprint, arXiv:1902.01599v1.
- Hutzenthaler, M., Jentzen, A., and Kruse, T. (2019). Overcoming the curse of dimensionality in the numerical approximation of parabolic partial differential equations with gradient-dependent nonlinearities. Preprint, arXiv:1912.02571v1.
- Jiang, Z.P., Teel, A.R., and Praly, L. (1994). Small-gain theorem for ISS systems and applications. Math. Control Signals Systems, 7, 95–120.
- Li, H. (2015). Computation of Lyapunov functions and stability of interconnected systems. Dissertation, Fakultät für Mathematik, Physik und Informatik, Universität Bayreuth, Germany.
- Mhaskar, H.N. (1996). Neural networks for optimal approximation of smooth and analytic functions. Neural Computation, 8(1), 164–177.
- Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017). Why and when can deep — but not shallow — networks avoid the curse of dimensionality: a review. Int. J. Automat. Computing, 14(5), 503–519.
- Reisinger, C. and Zhang, Y. (2019). Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. Preprint, arXiv:1903.06652v1.
- Rüffer, B.S. (2007). Monotone Systems, Graphs, and Stability of Large-Scale Interconnected Systems. Dissertation, Fachbereich 3, Mathematik und Informatik, Universität Bremen, Germany.
- Sirignano, J. and Spiliopoulos, K. (2018). DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375, 1339–1364.
- Sontag, E.D. (1989). Smooth stabilization implies coprime factorization. IEEE Trans. Autom. Control, 34, 435–443.