In recent years, the advances in machine learning techniques have motivated the control systems community to investigate the use of Recurrent Neural Networks (RNNs) for the identification of nonlinear dynamical systems. Among the various RNN architectures, Gated Recurrent Units (GRUs,chung2014empirical
) and Long Short-Term Memory (LSTM,hochreiter1997long) networks have proven to be particularly suited for system identification tasks (forgione2019model), as their stateful structure is able to retain long-term memory of past inputs and states. †† This work has been submitted to IFAC for possible publication.
In light of their superior modeling capabilities, these gated RNNs have been widely adopted by control practitioners, in conjunction with nonlinear model-based control strategies, such as Model Predictive Control (MPC), especially in process control domain (wong2018recurrent). For example, in terzi2020learning a complex nonlinear system is identified by means of an LSTM network, which is then used as a prediction model in a Nonlinear MPC (NMPC) regulator. A similar strategy is adopted in lanzetti2019recurrent, where a GRU network is used to identify and control a paper machine.
Despite the adoption of RNNs in many data-driven control applications, only limited theoretical foundations are nowadays available to justify the use of these NN architectures in systems control domain. The available theoretical results are mainly focused on achieving provably stable models. In bonassi2020stability and stipanovic2020stability, the authors derive conditions that should be satisfied by network’s weights in order to guarantee its stability. Similar conditions have been derived for LSTMs networks in bonassi2019lstm and terzi2019model, where an NMPC control strategy with guaranteed closed-loop stability is also formulated.
While the stability of RNN architectures can be used to design standard closed-loop-stable MPC laws, see terzi2019model, the tracking performances of these control strategies mainly depend on the accuracy of the trained networks. To address this problem, in the literature several offset-free NMPC formulations have been proposed, which allow to achieve perfect tracking of output references with stability guarantees (pannocchia2015offset). In morari2012
offset-free tracking is achieved by enlarging the system with a disturbance model. A state observer is then designed for the enlarged system, so that the disturbance is estimated and compensated by the NMPC control system. Another approach is the one proposed bymagni2001output, where the system is augmented with an integral action on the output tracking error, a state observer is designed and a stabilizing NMPC law is synthesized for the augmented system.
The aim of this work is to further elaborate on bonassi2020stability and terzi2019model, and to describe an unitary approach for the identification and off-set free control of an unknown dynamical system. Specifically, we herein propose to use a stable GRU network to identify the unknown plant from data collected from the real system. Once trained, this network is used as predictive model for an NMPC controller which ensures offset-free tracking of constant references with closed-loop stability guarantees, adopting the control strategy proposed by magni2001output. Therefore, sufficient conditions under which it is possible to design a converging state observer for the augmented system are devised. The stabilizing NMPC law is thus formulated for the enlarged system. The approach is tested on a pH neutralization process benchmark system, showing remarkable results in terms of reference tracking capabilities under control constraints.
is a vector,is its transpose and its Euclidean norm. For simplicity, vectors whose components are all equal to one and to zero are compactly denoted as and . By boldface fonts we indicate sequences of vectors, i.e. , and . For compactness, for time-varying quantities the generic time index is omitted if it can be inferred from the context. Superscript indicates some quantity at the successive time instant, i.e. at . Therefore, and . The Hadamard product between matrices or vectors is denoted by . By and
we indicate the sigmoid and tanh activation functions, respectively, i.e.and .
In this section we introduce GRUs, discussing how these networks can be used to identify nonlinear dynamical systems and how stability properties can be enforced. For simplicity we focus on single-layer GRU networks, but the approach can be easily extended to multi-layer networks.
Let us now introduce a single-layer GRU expressed in state-space form (bonassi2020stability)
where is the state, is the input, is the output, and is assumed. Functions and are known as update and forget gates, respectively. The matrices , , , are the weights of the network, and must be tuned during the training procedure in such a way that (1) approximates sufficiently well the unknown plant from which the data was collected. For compactness, may be re-written as
where and can be easily retrieved from (1).
In the following we assume that the input is unity-bounded, i.e. or, equivalently, . Note that this is a quite customary assumption when working with neural networks, and it can be easily satisfied by means of a suitable normalization (terzi2019model). It can be then shown that in finite time the state of the GRU enters a unity-boxed invariant set.
Lemma 1 (bonassi2020stability, Lemma 2)
For any initial state and any input sequence , there exists a finite time instant at which the state enters its invariant set , i.e. .
We remind that some function is said to be a function if , it is strictly increasing and as . A function is of class if it is with respect to and, in addition, as . Under these premises, we can summarize the main stability results for GRU networks. For more details, the interested reader is addressed to bonassi2020stability.
Definition 1 (Iss)
System (2) is said to be Incrementally Input-to-State Stable (ISS) if there exists functions and such that, for any pair of initial states and , and any pair of input sequences and it holds that
where is the state trajectory at time , when (2) is initialized in and it is feed by the input sequence .
Note that, since , this stability property implies that the effects of initial conditions asymptotically vanish, guaranteeing that the modeling performances of the network is independent of the initialization.
As ISS plays a key role in the proposed control design, see Section 3, the following sufficient condition for the ISS of GRU networks proposed by bonassi2020stability is enforced during the training of the network.
Theorem 1 (bonassi2020stability, Theorem 2)
A sufficient condition for the ISS of system (2) is that
where , , and are defined as
Note that this sufficient condition boils down to an inequality on the weights of the network. The fulfillment of (4) can be enforced as a constraint during the training procedure of the network. In case of use of unconstrained training algorithms, this condition may be relaxed, penalizing the violation of (4
) in the loss function, as outlined in Section4 and discussed in bonassi2020stability.
3 Control architecture design
Thus far, a black-box model with guaranteed stability properties has been presented. In this Section, we assume that the has been trained to model the unknown plant, and that the weights of this network satisfy condition (4), so that Theorem 1 guarantees the ISS of the network.
The goal of this section is to employ such model for offset-free tracking of constant reference signals. In particular, we adopt the control strategy proposed by magni2001output which, informally speaking, consists in the following steps: (i) augmenting the system model with an integrator on the output tracking error, (ii) designing a weak detector for the augmented plant, and (iii) synthesizing a stabilizing NMPC law for the augmented system.
The resulting control architecture is depicted in Figure 1. Under mild assumptions on the model’s smoothness, and provided that the state observer is a weak detector for the augmented system and that the NMPC law stabilizes the reference equilibrium, magni2001output guarantees that piece-wise constant references can be tracked with zero offset.
Differently from magni2001output, in general is not an equilibrium of , i.e. and . However, since the designed observer is a weak detector with respect to any equilibrium, it is enough to design the NMPC law stabilizing the generic equilibrium of .
3.1 Model augmentation and state observer design
To achieve offset-free tracking of the output reference , the system model is augmented with the discrete-time integral of the tracking error. The enlarged system reads as
where is the state of the integrator, and is the exogenous input computed by the NMPC. Denoting by the state of the augmented system and by its output, (6) can be compactly denoted by
Then, a state observer needs to be designed for the enlarged system. This observer is required to be a weak detector of (7) for the equilibrium , , and , in the sense specified by the following definition (magni2001bookchapter).
System (7) is weakly detectable with respect to the equilibrium if there exists a function and a function such that the following properties hold:
is of class and
There exist functions , , and of class such that, for any , any and such that , and any such that , it holds
where and .
Then, is called weak detector of .
The proposed state observer for the augmented system reads as follows
The following results allow to provide guidelines for the tuning of the observer gains so as to guarantee that is a weak detector of . The proofs are reported in the Appendix.
The observer satisfies Theorem 2 if its gains are a feasible solution of the following optimization problem
Note that Proposition 1 involves solving a non-linear optimization problem to spot suitable weights of . The constraints of (11) enforce the nominal convergence of the observed states to the real states . Furthermore, recalling that the spectral radius of the matrix is bounded by its norm, i.e. , by minimizing
one minimizes the maximum eigenvalue, likely making the observer convergence faster.
Therefore, the ISS property of the model allows to guarantee the weak detectability of the augmented system, and the designed observer enjoys an estimation error nominally converging to zero.
3.2 MPC control design
Having designed a weak detector for the augmented system , an NMPC stabilizing control law is now synthesized. As discussed, we consider the equilibrium , such that and .
Let , , and be the matrices of the linearized augmented system around this equilibrium
Henceforth it is assumed that the triplet is stabilizable, observable and does not have transmission at point one in the complex plane. These conditions guarantee the existence of an open neighborhood of where any reference can define an equilibrium of the system (de1997narx, Theorem 1).
According to the MPC approach, at each time-step a Finite Horizon Optimization Control Problem (FHOCP) is solved to spot the control sequence minimizing a cost function throughout the horizon . The parameter is called prediction horizon, and it is the time window throughout which the system is simulated and the cost function is evaluated. In addition, we define the parameter , called control horizon. From the end of the control horizon onward, the control is assumed to be computed by the auxiliary law , where is the LQ gain stabilizing the linearized system, i.e. , and computed with the state and control weight matrices and .
According to the Receding Horizon technique, once the optimal solution of the FHOCP is retrieved, the first optimal control move, , is applied to the system, i.e. . At the following time instant, the model is re-initialized in the observed state , and the entire procedure is repeated.
The FHOCP is stated as follows
and and are the matrices used to compute . Note that after the control horizon the LQ auxiliary control law (13f) is assumed to be adopted, thus . The term is the terminal cost, defined as a non-quadratic approximation of the LQ cost-to-go, as discussed later this section.
The system model is embedded in the FHOCP by means of (13d) and (13e), where the state is initialized in the values estimated by the weak detector (13b). Moreover, in addition to the integrator model embedded in , a further integrator model with state is included. This integrator is initialized in the true (known) integrator state , see (13c), and it is used to ensure that the input sequence satisfies the input constraints (13h) throughout the prediction horizon. The state is then updated according to the integrator model (13g), where is the selection matrix extracting from , i.e. .
Eventually, the terminal constraint (13i) is imposed, by means of which the state at the end of the prediction horizon is forced to lie within the terminal set . A suitable design of the terminal cost and of the terminal set is paramount to guarantee the closed-loop stability and the recursive feasibility of the NMPC scheme. Herein we consider the approach proposed in magni2001stabilizing for the computation of these terminal ingredients.
By definition, the terminal set is the set of states that can be steered to the equilibrium by the stabilizing LQ control law . In practice, this set is approximated by an inner hyper-ellipsoid in which a Lyapunov function defined by is decreasing, that is
where is the solution of the LQ Lyapunov equation , with and being design parameters.
Concerning the terminal cost , we define it as a finite-step approximation of the cost required to steer the system from to the equilibrium by means of the LQ auxiliary control law. Specifically, being the state of the system subject to the auxiliary control law, the terminal cost can be approximated by
where it is reminded that , and is the index that determines the truncation of the infinite horizon cost while ensuring an accurate approximation.
Let be the Receding Horizon control law. According to ohno1978new, this control law is Lipschitz continuous, since the constraints of (13) are regular. Thus, provided that a weak detector such as (9) is employed, by Theorem 3 of magni2001output the closed-loop system is guaranteed to converge to the target equilibrium , i.e. offset-free tracking is achieved.
4 Numerical results
In this Section, the pH neutralization process described by pHbenchmark is used as a test-bed for evaluating both the modeling capabilities of GRU networks and the performance of the proposed control architecture.
As depicted in Figure 2, the pH process consists of two tanks. Tank 2 is fed by an acid flow rate , and its output is . Since hydraulic dynamics are faster than the others involved, it is assumed that . Tank 1, named reactor tank, receives three flows: the acid stream , a buffer flow an alkaline flow . The only control variable is , while the other two are considered as disturbances. The goal is to set the pH concentration of the output flow rate, , to the desired level. Hence, the process has only one input, , and one output, . The underlying SISO nonlinear model can be described by the following equations