I Introduction
System identification is the process of generating dynamic models from data [1], and is also referred to as learning dynamical systems (e.g. [2]). When scaling control and identification algorithms to large-scale systems, it can be useful to treat a system as a sparse network of local subsystems interconnected through a graph [3, 4, 5]. In this paper, we propose algorithms for identification of such networked systems in state space form:
(1)  x_{t+1} = a(x_t, u_t)
where x_t and u_t are the state and input respectively, and the model dynamics a(·, ·)
can be either linear or nonlinear. We assume that measurements (or estimates) of state and input sequences are available.
Our approach:

uses distributed computation (i.e. network nodes only share data and parameters with immediate neighbors),

can generate models with a strong form of stability called contraction,

can generate monotone models, i.e. ordering relations between states are preserved.
Imposing contraction and/or monotonicity on models provides two benefits when identifying systems that are known to satisfy those properties. Firstly, incorporating prior knowledge can significantly improve the quality of the identified models. Secondly, it guarantees that properties of the real system that are useful for controller design are present in the identified model.
This work is motivated by the observation that many systems combine large-scale, sparse dynamics with monotonicity and stability. Examples include traffic networks [6, 7, 8], chemical reactions [9], combination therapies [10, 11, 12, 13], wildfires [14] and power scheduling [10].
The key technical difficulty we address is the simultaneous identification and stability verification of large-scale networked systems. We propose a convex model set with scalable stability conditions and an algorithm based on ADMM that decomposes the identification problem into easily solvable subproblems requiring only local communication between subsystems.
I-A Networked System Identification
Standard approaches to system identification do not work well for large-scale networked systems for three reasons [15]: firstly, the data set must be collected at a central location, a process which may be prohibitive for complex systems; secondly, computational and memory complexities prohibit application to large systems; finally, the network structure may not be preserved by identification. For instance, standard subspace identification methods have computational and memory complexities that scale poorly with the number of states, and any sparse structure in the dynamics is destroyed through an unknown similarity transformation [16].
Previous work in networked system identification can be loosely categorized into two areas: the identification of a network topology [17, 18, 19], and the identification of a system's dynamics with known topology. In the latter category, almost all prior work has focused on the case where subsystems are linear time-invariant (LTI) and described by state space models [15, 20, 21] or transfer functions (a.k.a. modules) [22, 23].
When identifying the subsystem dynamics, the states or outputs of neighbors are treated as exogenous inputs, ignoring feedback loops induced by the network topology. This improves scalability, as the identification of each subsystem can be performed in parallel. However, accurate identification of the individual subsystems does not imply accurate identification of the full network, because the ignored feedback loops may have a strong effect and even introduce instability. A simple case with two subsystems which has received significant attention is closed-loop identification [24].
Prior works in networked system identification assume stable LTI network dynamics and establish identifiability [25] and consistency [26]. These assumptions then imply model stability in the infinite-data limit. However, model stability is not guaranteed with finite data sets or in nonlinear black-box identification problems, where the true system is usually not in the model set.
I-B Identification of Stable Models
Standard methods for system identification do not guarantee model stability, even if the system from which the data are collected is stable. For linear system identification, considerable attention has been paid to this problem, and several methods have been suggested based on regularisation or model constraints [27, 28, 29, 30]. Even for linear systems, the set of stable models is not convex in standard parameterisations, and to the authors' knowledge all existing methods introduce some bias into the identification procedure.
For linear systems, most definitions of stability are equivalent. The nonlinear case is more nuanced, and the definition used depends on the requirements of the problem at hand. Standard Lyapunov methods are not appropriate in system identification, as the stability certificate must be constructed about a known stable solution, whereas the very purpose of system identification is to predict a system's response to previously unseen inputs. Contraction [31] and incremental stability (e.g. [32]) are more appropriate, since they ensure stability of all possible solutions and consequently do not require a priori knowledge of the inputs and state trajectories.
Stability guarantees have also been investigated for nonlinear system identification. For instance, systems can be identified using sets of stable recurrent neural networks [33] or stable Gaussian process state space models [34]. A limitation of these approaches is that they do not allow a joint search for a model and its stability certificate, which can be conservative even for linear systems. This paper builds on previous work on jointly convex parameterizations of models and their stability certificates via implicit models [35, 36, 37, 38, 39, 40] and associated convex bounds for model fidelity via Lagrangian relaxation [38, 41, 42]. The main development in this paper is to significantly improve the scalability of this approach via a novel model parametrization and contraction constraint that are jointly convex and permit a particular upstream/downstream network decomposition (defined below).
I-C Monotone and Positive Systems
Monotone systems are a class of dynamical systems characterized by the preservation of an order relation for solutions (c.f. Definition 2 below). A closely related class is positive systems, for which state variables remain nonnegative for all nonnegative inputs (c.f. Definition 3 below). For linear systems, positivity and monotonicity are equivalent.
A useful property of monotone systems is that they often admit simplified stability tests. In particular, for linear positive systems the existence of separable Lyapunov functions, i.e. those representable as the sum or maximum over functions of individual state variables, is necessary and sufficient for stability [43]. This property has been used to simplify analysis [44], control [45] and identification [46] of positive systems. Separable stability certificates have also been shown to exist for certain classes of nonlinear monotone systems [47, 48, 49, 50], and have been used for distributed stability verification [51] and control [52]. Monotonicity can also simplify nonlinear model predictive control [10] and formal verification using signal temporal logic [53].
I-D Least-Squares Equation Error
Identification typically involves the optimization of a quality-of-fit metric over a model set. In this paper we use what is arguably the simplest and most widely applied quality-of-fit metric, least-squares equation error (a.k.a. one-step-ahead prediction error):
(2)  E_eq = Σ_t | x̃_{t+1} − a(x̃_t, ũ_t) |²
where x̃_t and ũ_t are state and input measurements or estimates. Least-squares equation error is a natural choice for short-term prediction if state measurements are available.
If longterm predictions are needed, then simulation error, defined as
(3)  E_sim = Σ_t | x̃_t − x̂_t |², with simulated states x̂_{t+1} = a(x̂_t, ũ_t), x̂_0 = x̃_0,
is a better measure of performance. The dependence on simulated states, however, renders the cost function nonconvex [56, 57] and notoriously difficult to optimize [58]. Consequently, equation error optimization is often used to initialize local search methods (e.g. gradient descent) for models with good simulation error or used as a surrogate for simulation error with better numerical properties. In the latter context, model stability is particularly important since a model can have small equation error but be unstable and therefore exhibit very large simulation error. In fact, when a model is contracting, it can be shown that small equation error implies small simulation error [35].
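The gap between the two criteria can be seen in a small numerical sketch. The example below is hypothetical (a scalar linear system with an assumed slightly unstable fitted coefficient, not from the paper): the unstable model has modest equation error, because one-step predictions restart from measured states, but its simulation error blows up because errors compound through its own simulated states.

```python
import numpy as np

# Hypothetical 1-D example: compare least-squares equation error (one-step)
# with simulation error for a marginally unstable fitted model a_hat.
rng = np.random.default_rng(0)

a_true = 0.9   # true stable dynamics: x_{t+1} = a_true * x_t + u_t
a_hat = 1.02   # fitted model, slightly unstable

T = 200
u = rng.normal(size=T)
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + u[t]

# Equation error: predictions restart from *measured* states each step.
eq_err = np.sum((x[1:] - (a_hat * x[:-1] + u)) ** 2)

# Simulation error: predictions roll forward from the model's *own* states,
# so the instability compounds over the horizon.
x_sim = np.zeros(T + 1)
for t in range(T):
    x_sim[t + 1] = a_hat * x_sim[t] + u[t]
sim_err = np.sum((x[1:] - x_sim[1:]) ** 2)

print(eq_err, sim_err)
```

For a contracting model this divergence cannot occur, which is the sense in which small equation error implies small simulation error [35].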
In many contexts, system state measurements are not available. Nevertheless, equation error frequently arises as a subproblem via estimated states, e.g. in subspace identification algorithms [59, 60, 21], where states are estimated using matrix factorizations, or in maximum likelihood identification via the expectation maximization (EM) algorithm, where they are estimated from the joint smoothing distribution [61, 62].

I-E Contributions
The main contributions of this work are as follows: we propose a model structure and convex constraints that guarantee monotonicity, positivity, and/or contraction of the model. For large-scale networked systems, we refine the model and constraints to have a separable structure, and we introduce a separable bound on equation error so that the identification problem can be solved using distributed computation. The algorithm, based on ADMM, decomposes into easily solved separable optimization problems at each step. Data and parameters are only communicated to immediate neighbours in the network. Finally, we evaluate the scalability and fitting performance of the method on a number of numerical examples.
II Preliminaries and Problem Setup
Notation
A graph is defined by a set of nodes (vertices) and a set of edges. The vector 1 is the column vector of ones, with size inferred from context. For vectors, inequalities are interpreted elementwise; for matrices, elementwise inequalities are written analogously. For symmetric matrices, A ≻ 0 means that A is positive definite. For a vector v, diag(v) is the matrix with the elements of v along the diagonal. The set of symmetric matrices is denoted S, and the set of nonsingular M-matrices is denoted M. The eigenvalues of a matrix are denoted λ_i. For brevity, we will sometimes drop the arguments of a function where the meaning may be inferred from context.

II-A Differential Dynamics
The contraction and monotonicity conditions we study can be verified by way of a system's differential dynamics, a.k.a. the linearized, variational, or prolonged dynamics. For the system (1), the differential dynamics are
(4)  δx_{t+1} = A(x_t, u_t) δx_t + B(x_t, u_t) δu_t
where A(x, u) = ∂a/∂x and B(x, u) = ∂a/∂u. In conjunction with (1), the differential dynamics describe the linearized dynamics along all solutions of the system.
II-B Contraction Analysis
We use the following definition of nonlinear stability:
Definition 1 (Contraction).
A system is termed contracting with rate α, where α ∈ [0, 1), if for any two initial conditions x_0^a, x_0^b, given the same input sequence, there exists a continuous function b(·, ·) ≥ 0 such that the corresponding trajectories satisfy |x_t^a − x_t^b| ≤ α^t b(x_0^a, x_0^b).
Contraction can be proven by finding a contraction metric which verifies conditions on the differential dynamics [31]. A contraction metric is a function V(x, δx) such that:

(5)  μ_1 |δx|² ≤ V(x, δx) ≤ μ_2 |δx|²
(6)  V(x_{t+1}, δx_{t+1}) ≤ α² V(x_t, δx_t)

for some μ_2 ≥ μ_1 > 0.
The choice of contraction metric is problem dependent. Prior works have proposed quadratic contraction metrics for which (6) is linear in the stability certificate and can be verified using semidefinite programming. A number of works have also noted that using a weighted norm can lead to separable constraints [63, 51] allowing for stability verification of largescale networked systems.
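For intuition, in the linear special case a constant quadratic metric can be computed and checked numerically without any parser or SDP solver. The sketch below is a hypothetical illustration only (the paper's conditions allow state-dependent metrics and are enforced by sum-of-squares/semidefinite programming); it uses a discrete Lyapunov equation to build a metric M certifying a given rate.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# For x_{t+1} = A x_t, a constant quadratic metric M > 0 satisfying
# A' M A - a^2 M < 0 certifies contraction with rate a.
A = np.array([[0.6, 0.3],
              [0.1, 0.5]])
a = 0.95  # candidate rate, chosen above the spectral radius of A

# Solve (A/a)' M (A/a) - M = -I; the solution is a valid metric for rate a.
M = solve_discrete_lyapunov((A / a).T, np.eye(2))

# Verify the contraction condition numerically.
lhs = A.T @ M @ A - a**2 * M
assert np.all(np.linalg.eigvals(M).real > 0)    # M is positive definite
assert np.all(np.linalg.eigvals(lhs).real < 0)  # decrement condition holds
```

For nonlinear models the analogous check must hold for all states, which is why the paper turns to convex, globally verifiable parametrizations.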
In the context of system identification, the joint search for a model in (1) and a contraction metric is nonconvex due to the nonlinear composition of the metric with the model dynamics.
II-C Monotone and Positive Systems
We now define monotonicity and positivity of dynamical systems.
Definition 2 (Monotone System).
A system (1) is termed monotone if, for input sequences u^a, u^b and initial conditions x_0^a, x_0^b, ordering of the inputs and initial conditions implies ordering of the resulting trajectories.
Monotonicity follows from A ≥ 0 and B ≥ 0 elementwise, where A and B come from the differential dynamics (4).
Definition 3 (Positive System).
A system (1) is positive if for all inputs u_t ≥ 0 and initial conditions x_0 ≥ 0, the resulting trajectory satisfies x_t ≥ 0 for all t.
A sufficient condition for a system to be positive is for it to be monotone and to admit the zero trajectory as a solution in (1).
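In the linear case these definitions reduce to elementwise sign conditions that are easy to check directly. The following sketch is illustrative (hypothetical matrices, discrete-time linear dynamics, not the paper's general nonlinear setting):

```python
import numpy as np

# For x_{t+1} = A x_t + B u_t, the differential dynamics are constant, so
# monotonicity (Definition 2) reduces to A >= 0 and B >= 0 elementwise;
# such a system is also positive since x = 0, u = 0 is a solution.
def is_monotone_linear(A, B):
    return bool(np.all(A >= 0) and np.all(B >= 0))

A = np.array([[0.5, 0.2],
              [0.1, 0.6]])
B = np.array([[1.0],
              [0.0]])
print(is_monotone_linear(A, B))

# Order preservation: componentwise x_a >= x_b is preserved by the flow.
x_a, x_b = np.array([2.0, 1.0]), np.array([1.0, 0.5])
for _ in range(10):
    x_a, x_b = A @ x_a, A @ x_b
assert np.all(x_a >= x_b)
```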
II-D Network Structure
We assume the model (1) is partitioned into subsystems whose interactions are described by a directed graph, with a set of nodes corresponding to the subsystems. Each subsystem has its own state and may take a local input (input-free subsystems are allowed). The global state and input are obtained by concatenating the states and inputs of each subsystem,
(7) 
The set of edges describes how the subsystems interact with each other. In particular, an edge from node j to node i means that the state of subsystem j affects the state of subsystem i. The edge list may arise naturally from the context of the problem, e.g. in traffic networks where edges come from the physical topology of the road network, or may be identified from data [17, 64].
For each subsystem, we define the set of upstream neighbours and the set of downstream neighbours. The upstream neighbours of a subsystem are the subsystems whose states affect its state, and the downstream neighbours are the subsystems whose states are affected by its state. In general, we allow self-loops, so that a node can be both upstream and downstream of itself. This notation is illustrated in Fig. 1.
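The upstream/downstream bookkeeping can be made concrete with a few lines of code. This is a hypothetical helper (names and edge convention are ours, chosen to match the definition above, where an edge j → i means subsystem j's state affects subsystem i):

```python
# Build upstream/downstream neighbour sets from a directed edge list.
def neighbour_sets(n_nodes, edges):
    upstream = {i: set() for i in range(n_nodes)}
    downstream = {i: set() for i in range(n_nodes)}
    for j, i in edges:          # edge j -> i
        upstream[i].add(j)      # j is upstream of i
        downstream[j].add(i)    # i is downstream of j
    return upstream, downstream

# Chain 0 -> 1 -> 2, with a self-loop at node 1 (allowed by convention).
up, down = neighbour_sets(3, [(0, 1), (1, 2), (1, 1)])
print(up[1], down[1])   # node 1 is both upstream and downstream of itself
```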
We can write the dynamics of the individual interacting subsystems as follows:
(8) 
where each subsystem's dynamics correspond to the associated block of (1) and depend only on the local state and input together with the states of upstream neighbours.
II-E Separable Optimization using ADMM
Consider an optimization problem of the form,
(9) 
which may include constraints on the decision variables via indicator functions appearing in the objective. The indicator function for a constraint set is zero when the constraint is satisfied and infinite otherwise.
Definition 4 (Separable).
The problem (9) is termed separable with respect to a partitioning of the decision variables if it can be written as a sum of terms, each depending only on one block of the partition.
In this paper we encounter problems of the form:
(10) 
where the two terms are separable with respect to two different partitions of the same parameter vector. In our context, these partitionings correspond to the sets of upstream or downstream neighbors discussed in the previous section. For such problems, the alternating direction method of multipliers (ADMM) can be applied [65]. We write (10) as
(11)  
Applying ADMM results in an iteration scheme in which each step is separable with respect to one of the two partitions, and can thus be solved via distributed computation. For convex problems, ADMM is guaranteed to converge to an optimal solution [65].
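As a toy illustration of this splitting, consider a consensus-form problem min f(x) + g(z) subject to x = z, with quadratic f and g (both separable, so each update could be distributed). This sketch uses hypothetical objectives, not the paper's identification problem:

```python
import numpy as np

# Scaled-form ADMM for min 0.5||x - a||^2 + 0.5||z - b||^2  s.t.  x = z.
# The optimum is x = z = (a + b) / 2.
a, b = np.array([1.0, 3.0]), np.array([3.0, 1.0])
rho = 1.0
x = z = y = np.zeros(2)  # y is the scaled dual variable

for _ in range(100):
    x = (a + rho * (z - y)) / (1 + rho)   # f-step (separable in x)
    z = (b + rho * (x + y)) / (1 + rho)   # g-step (separable in z)
    y = y + x - z                         # dual ascent on the consensus gap

assert np.allclose(x, (a + b) / 2, atol=1e-6)
```

Each update touches only one of the two objective terms, which is exactly the property exploited later when the objective is upstream separable and the constraints are downstream separable.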
II-F Problem Statement
To summarise, the main objective of this paper is as follows. Given state and input measurements and a graph describing the network topology, identify models (8) at each node such that:

during the identification procedure, each subsystem only communicates with immediate (upstream and downstream) neighbours;

convergence is guaranteed and least-squares equation error is small at each subsystem;

model behavioural constraints such as contraction, monotonicity, and/or positivity can be guaranteed for the interconnected system (1).
III Convex Behavioral Constraints
In this section we develop a convex parametrization of models with contraction, monotonicity and/or positivity guarantees. As described in Section II-B, jointly searching for a model (1) and a contraction metric is nonconvex.
Following [37, 38], we solve this problem by instead searching for models in the following implicit form:
(12)  e(x_{t+1}) = f(x_t, u_t)
The differential dynamics of (12) are:
(13)  E(x_{t+1}) δx_{t+1} = F(x_t, u_t) δx_t + K(x_t, u_t) δu_t
where E = ∂e/∂x, F = ∂f/∂x, and K = ∂f/∂u.
Definition 5 (WellPosed).
I.e., well-posedness means that e is a bijection, and implies the existence of an explicit model of the form (1) with a = e⁻¹ ∘ f. Furthermore, it implies that for any initial condition and sequence of inputs, there exists a unique trajectory satisfying (12).
III-A Stability and Monotonicity Constraints
In this section, we develop convex conditions on the implicit model (12) that guarantee well-posedness, monotonicity, positivity, and contraction. The main result is the following:
Theorem 1.
A model of the form (12) is:

(a) well-posed if there exists ϵ > 0 such that for all x:
(14) 
(b) contracting with rate α if (a) holds and there exists a matrix function M(x) such that for all x, u:
(15) (16) 
(c) monotone if (a) holds and for all x, u:
(17) 
(d) positive if (c) holds and:
(18) 
(e) contracting and monotone if (a) and (c) hold, and for all x, u:
(19) 
Positivity is also enforced if (18) holds.
Proof.
See appendix A. ∎
We refer to the stability conditions in Theorem 1 (b) or (e) as contraction conditions, as they ensure contraction with respect to a state-dependent weighted norm of the differentials, noting that for the purpose of contraction analysis the exogenous input can be considered as a time-variation.
III-B Model Parametrizations
As formulated above, Theorem 1 applies to models represented by the infinite-dimensional space of continuously differentiable functions e and f. In practice, these functions are usually parametrized by a finite-dimensional vector. In this section we briefly discuss some common model parametrizations and how the constraints can be enforced.
For linear models, (14) is a semidefinite constraint, and (15)–(19) are linear and can be enforced using semidefinite programming. Furthermore, if the metric is diagonal, then (14) is also linear and the model set is polytopic.
If the functions and are multivariate polynomials or trigonometric polynomials, then the constraints can be enforced using sum of squares programming [66, 67].
The model set (12) also contains a class of recurrent neural networks with slope-restricted, invertible activation functions. In this case, e is the inverse of the activation functions, f is affine, and simulation of the explicit model yields the equations of a standard recurrent neural network [68]. The conditions in Theorem 1 (b) or (d) then correspond to diagonal dominance conditions on the weight matrices, which can be enforced via linear constraints.

Finally, if the requirement for global verification of these properties is relaxed, then these constraints can be applied pointwise for arbitrary parametrizations of e and f, which amounts to linear and semidefinite constraints if e and f are linearly parametrized.
IV Distributed Identification
In this section we consider the problem of distributed identification of networked systems with the behavioral constraints introduced in Theorem 1. First, we propose a particular structure for (12) for which the constraints in Theorem 1 are separable. We then propose an objective function that is separable (with respect to a different partition). Finally, we propose an algorithm for fitting the proposed models that requires only local communication between subsystems at each step.
IV-A Distributed Model
We propose the following model structure for distributed identification, in which e depends only on local states and inputs, and f is a summation of nonlinear functions of the states and inputs of upstream neighbours:
(21) 
Models of the form (21) are widely used for statistical modelling and are referred to as generalized additive models (GAMs) [69]. This class of models also includes linear systems and a class of recurrent neural networks. We assume that each of the component functions is linearly parametrized.
We define two partitions of the model parameters: the sets of upstream and downstream parameters. Objective functions, constraints and optimization problems are called upstream-separable or downstream-separable if they are separable with respect to these partitions. Upstream- and downstream-separable optimization problems are closely related to the column-wise and row-wise separable optimization problems used in [70].
For the parametrization (21), the differential dynamics have a sparsity pattern determined by the network topology. In particular, a block of F can be nonzero only if the corresponding pair of subsystems is connected by an upstream edge, and E is block diagonal. This means each diagonal block of E depends only on local parameters, and each block of F depends only on the parameters coupling the corresponding pair of subsystems. As each block of E and F has an independent parametrization, functions of disjoint sets of blocks will be separable.
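This block sparsity can be computed directly from the edge list. The helper below is a hypothetical sketch (our own names and conventions) that builds the admissible nonzero pattern of the Jacobian F for subsystems of possibly different state dimensions:

```python
import numpy as np

# Admissible sparsity of dF/dx for the additive model: block (i, j) can be
# nonzero only when j is an upstream neighbour of i (self-blocks included).
def jacobian_mask(n_nodes, edges, block_sizes):
    idx = np.cumsum([0] + list(block_sizes))
    mask = np.zeros((idx[-1], idx[-1]), dtype=bool)
    for j, i in edges:  # edge j -> i: x_j enters subsystem i's dynamics
        mask[idx[i]:idx[i + 1], idx[j]:idx[j + 1]] = True
    for i in range(n_nodes):  # local self-dependence
        mask[idx[i]:idx[i + 1], idx[i]:idx[i + 1]] = True
    return mask

# Chain 0 -> 1 -> 2 with state dimensions 1, 2, 1.
mask = jacobian_mask(3, [(0, 1), (1, 2)], [1, 2, 1])
print(mask.astype(int))
```

Only the blocks marked True carry free parameters, which is what makes disjoint blocks independently parametrized.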
IV-B Convex Bounds for Equation Error
In Section IV-A we proposed a convex set of implicit models. However, this approach shifts the nonconvexity from the model set to the objective function: equation error (2), with a = e⁻¹ ∘ f, is no longer convex in the model parameters.
One approach might be to minimize the implicit equation error
(22)  Σ_t | e(x̃_{t+1}) − f(x̃_t, ũ_t) |²
as a surrogate for equation error. This approach, however, strongly biases the resulting model and leads to poor performance [42]. Instead we use the convex upper bound for equation error proposed in [42], which is based on Lagrangian relaxation.
The leastsquares equation error (2) for the implicit model (12) is:
(23)  E_eq = Σ_t | x̃_{t+1} − e⁻¹( f(x̃_t, ũ_t) ) |²
Note that this problem is not jointly convex in e and f. The following convex upper bound was proposed in [42]:
(24) 
where λ is a Lagrange multiplier. The function (24) is convex in the model parameters, as it is the supremum of an infinite family of convex functions [38].
For our parametrization (21), E is block diagonal, which implies that (24) is upstream separable, so it can be written as
(25) 
where
The evaluation of this bound is not trivial, as it involves the calculation of the supremum of a nonlinear multivariate function. In this work we linearise (25) with respect to the model parameters and solve for the supremum of the resulting concave quadratic function, giving:
IV-C Alternating Direction Method of Multipliers (ADMM)
In Section IV-A we introduced a model set for which the constraints in Theorem 1 are downstream separable, and in Section IV-B we introduced an upstream-separable objective function. Note, however, that the constraints and objective are not jointly separable with respect to the same partition. We use ADMM to resolve this.
We now develop the algorithm for the case where (12) is well-posed, monotone and contracting; a parallel construction without the monotonicity or contraction constraints introduces no additional complexity. Consider the following set of parameters:
(27) 
Applying ADMM as discussed in Section II-E gives the following iteration scheme at each iteration k:
(28)  
(29)  
(30) 
for k = 0, 1, 2, ….
When using a GAM structure (21), we have the following result:
Proposition 1.
Proof.
See Appendix B. ∎
In particular, the ADMM approach corresponds to performing the following iterations locally at each node:
(31)  
(32)  
(33) 
The distributed algorithm is listed in Algorithm 1. The steps (31) and (32) require access to the upstream and downstream parameters respectively. These can be solved in parallel by the nodes in the graph; however, communication of both upstream and downstream parameters is necessary between steps. The update (33) is trivially separable and can be solved as either an upstream- or downstream-separable problem.
V Discussion
V-A Conservatism of the Separable Model Structure
We have proposed searching over the model set (21) with the constraint set (27), and it is important to understand which systems fall into this model set. A particular question of interest is whether there are contracting and monotone systems that cannot be represented by this structure. There are two main reasons why this may occur: the separable structure of the model (21), and the assumption of a separable contraction metric in condition (19).
An exact characterization of the functions that can be approximated via the GAM structure (21) is difficult to give; however, such models have been widely applied in statistical modelling, see [69] for details. Note that while the functions in the implicit system (21) are additive, the resulting explicit system (8) may not be: the composition of additive functions is not, in general, additive.
Conservatism may also be introduced by the assumption of a separable contraction metric. For the case of linear positive systems, it has been shown that the existence of a separable Lyapunov function is both necessary and sufficient for stability [43]. This means that the proposed model set contains all stable positive linear systems [46]:
Theorem 2.
Proof.
See Appendix C. ∎
Things are more complicated for nonlinear monotone systems. Separable contraction metrics have been shown to exist for certain classes of monotone systems [49], and separable weighted contraction metrics have been used for the analysis of monotone systems [6, 51]. For incrementally exponentially stable systems, it has been shown that the existence of a weighted contraction metric is necessary and sufficient [50]; however, the state-dependent weighting depends on all system states and is therefore not separable in the sense we use. To the authors' knowledge, a complete characterisation of the class of contracting monotone systems that admit separable metrics is still an open problem.
V-B Consistency
It has been previously noted that system identification approaches that guarantee stability can bias the results towards systems that are too stable [28, 29, 71]. Empirical evidence suggests that for methods based on Lagrangian relaxation [41, 42] this bias is smaller.
There are a number of situations that lend themselves to consistent identification. Firstly, consider the situation where we have noiseless state and input measurements produced by a model in the model set, so that the implicit equation (12) holds exactly on the data. Then the bound is tight and LREE recovers the true minimizer of equation error.
Now, consider the situation where the unconstrained minimizer of equation error (2) is a monotone, additive function that is contracting in the identity metric. That is, for the minimizing function the following hold: it is additive, so that (8) can be written in the form (21); it is monotone in the sense of Definition 2; and it is contracting with respect to the identity metric. Then optimizing (26) returns the same solution as the unconstrained least-squares minimizer.
Proposition 2.
Proof.
Our proof mirrors that of [42, Sec. IV, Proposition 1]. ∎
V-C Iteration Complexity of the Distributed Algorithm
In this section, we investigate the computational complexity of each step in the distributed algorithm. In general, the complexity depends on the model parametrization used; however, we limit our discussion to the case where the models are parametrized by polynomials and the constraints are enforced using sum-of-squares programming.
The first step, (31), is a semidefinite program and can be solved using standard solvers. If no structural properties are exploited, a primal-dual interior point method (IPM) requires a number of operations per iteration per node that grows polynomially with the number of upstream free parameters [72].
The second step, (32), is a sum-of-squares problem that can be solved as a semidefinite program. The size of the Gram matrix corresponding to (19) for the additive model (21) grows with the polynomial degree and the number of local and upstream states. Solving (32) using a primal-dual IPM again requires a number of operations per iteration per node that grows polynomially with the number of downstream free parameters [72].
If a local computational resource is associated with each node in the network, and the number of neighbours of each node satisfies a uniform bound, then the time taken for each iteration will not increase with the number of nodes. However, computation time will grow quickly with the number of neighbours, the size of the local states, and the degrees of the polynomials used in the model.
V-D Other Quality-of-Fit Criteria
Lagrangian relaxation of least-squares equation error was chosen as it is convex, upstream separable, quick to compute, and leads to a simple implementation of ADMM. Any method that treats neighbouring states as exogenous inputs will be upstream separable. However, any such approach will also be susceptible to instability arising from the feedback loops introduced by the network topology, even if it guarantees stability of the local models. Consequently, one can similarly apply any convex quality-of-fit criterion, such as convex upper bounds on simulation error [38, 41], and still guarantee convergence of ADMM. Alternatively, a nonconvex quality-of-fit criterion like simulation error can be used at the expense of ADMM's convergence guarantees.
If a model structure does not permit distributed identification, the conditions proposed in Section III can still be used to ensure stability and/or monotonicity. Joint convexity of the model set and stability constraints remains important, as it simplifies constrained optimization, allowing for the easy application of penalty, barrier or projected gradient methods [39].
VI Numerical Experiments
In this section we present numerical results exploring the scalability and identification performance of the proposed approach.
This section is structured as follows: first, we look at the identification of positive linear systems and explore the computational complexity of the two contraction conditions; we then explore the consistency of fitting nonlinear models when the true system lies in the model set, essentially analysing the effect of the convex bound on equation error; finally, we apply the method to the identification of a (simulated) nonlinear traffic network. The traffic network does not lie in the model set, so only an approximate model can be identified. We explore the regularising effect of the model constraints and the scalability of the method to large networks.
Previous methods for the identification of models with stability guarantees have ensured contraction using a quadratic metric [38, 41, 42]. Contraction is implied by the following semidefinite constraint:
(34)  
We refer to (34) as a quadratic-metric contraction condition, as it implies the contraction condition (6) with a state-dependent quadratic norm of the differentials.
We will make future reference to the following convex sets of parameters, in addition to the set defined in (27):
Here the subscripts refer to the following properties:

– monotone contracting models;

– monotone models;

– models that are not constrained to be contracting or monotone;

– models that are monotone and contracting in a quadratic metric.

All functions are polynomials in all monomials of their arguments up to a given degree.
As a baseline for comparison, we will also fit explicit polynomial models (1) by least squares without any separable structure imposed. We will also compare to standard wavelet and sigmoid Nonlinear AutoRegressive with eXogenous input (NARX) models implemented as part of the Matlab System Identification Toolbox.

For the implicit models, the model class prefix is followed by the degrees of the polynomials in e and f in parentheses. For the explicit polynomial models, the degree used follows in parentheses, e.g. explicit polynomial models of degree 5 in all arguments.
The NARX models were fit at each node using local regressors. The wavelet NARX models were set to choose the number of basis functions automatically, and the sigmoid NARX models used a fixed number of basis functions. The focus option for each model was set to produce the best performance: for the wavelet network we used a focus on simulation, and for the sigmoid network a focus on prediction.
The constraints (14), (17), (18), (19) and (34) are enforced using sum-of-squares programming [66]. All programs are solved using the SDP solver MOSEK with the parser YALMIP [73] on a standard desktop computer (Intel Core i7, 16GB RAM).
VI-A Identification of Linear Positive Systems
In this subsection we study the scalability of the proposed method for the identification of linear positive systems.
We compare the computation time using the proposed separable contraction constraint to the previously proposed quadratic-metric contraction constraint (i.e. a quadratic Lyapunov function). Note that for linear systems, both model sets are parameterizations of all stable positive linear systems, so no difference in quality of fit is observed.
VI-A.1 Scalability of Separable Linear and Quadratic Metrics
We illustrate the difference in scalability between the separable and quadratic-metric model sets. Each experimental trial consists of the following steps:

A stable positive system with state dimension is randomly generated using Matlab’s rand function; has a banded structure with band width equal to . Stability was ensured by rescaling to have a spectral radius of .

The system is simulated for time steps;
is obtained by adding white noise to the simulated states at SNR equal to 40dB.

This process is repeated 5 times for each .
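The trial-generation steps above can be sketched as follows (in Python rather than Matlab; the dimension, bandwidth, spectral radius, and trajectory length are placeholder values, since the exact settings are elided in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_banded_positive(n, bandwidth=2, rho=0.95):
    """Random nonnegative banded A, rescaled to spectral radius rho."""
    A = rng.random((n, n))
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= bandwidth
    A *= mask
    return A * (rho / max(abs(np.linalg.eigvals(A))))

def simulate(A, T, x0):
    """Simulate x_{t+1} = A x_t for T steps."""
    X = np.empty((T + 1, A.shape[0]))
    X[0] = x0
    for t in range(T):
        X[t + 1] = A @ X[t]
    return X

def add_noise_snr(X, snr_db=40.0):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(X ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return X + rng.normal(scale=np.sqrt(noise_power), size=X.shape)

A = random_banded_positive(50)
X = simulate(A, 200, rng.random(50))
X_noisy = add_noise_snr(X, 40.0)
```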
The time taken to solve each optimization problem is shown in Fig. 2. Here, we see a significant improvement in the computational complexity from approximately cubic growth for to linear growth for . The networked approach allows us to solve stable identification problems with at least states.
Note that no explicit attempts to exploit the sparsity of the system were made; use of solvers and parsers designed to exploit sparsity could improve performance, especially for the SDPs associated with the LMI parametrization, e.g. [74].
VI-B Identification of Nonlinear Models
In this section we study the consistency of fitting nonlinear implicit models via the LREE bound on equation error. In Section V-B we saw that in the noiseless case, optimization of LREE will return the true model parameters. We will now explore the effect of introducing noise on the model estimates. The experiments in this section can be seen to supplement those in [42, Sec. IV], which studied the effects of noise and model stability on consistency in the linear setting.
We generate models by sampling a parameter vector and then projecting onto the set . The models have degree 3, state size and . We then generate training data with samples by randomly sampling from the uniform distribution on , and generate noisy measurements of by , where is normally distributed noise with a specified Signal-to-Noise Ratio (SNR). Models
are then trained by minimizing with and performance measured using Normalized Equation Error (NEE):(35) 
where is the identified dynamic model, is the true dynamics, and is the sample estimate of the 2-norm of the function .
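For concreteness, one natural implementation of the NEE metric (35) is the ratio of the sample mean-squared equation error to the sample mean-squared norm of the true dynamics; the exact normalization in (35) may differ, and the function names below are ours:

```python
import numpy as np

def nee(f_hat, f_true, xs, us):
    """Normalized Equation Error: sample estimate of
    ||f_hat - f_true||^2 / ||f_true||^2 over the data points."""
    err = sum(float(np.sum((f_hat(x, u) - f_true(x, u)) ** 2))
              for x, u in zip(xs, us))
    nrm = sum(float(np.sum(f_true(x, u) ** 2))
              for x, u in zip(xs, us))
    return err / nrm

# A perfect model has NEE = 0; a slightly perturbed one, a small positive NEE.
f_true = lambda x, u: 0.5 * x + u
f_hat = lambda x, u: 0.5 * x + 1.01 * u
xs = [np.ones(3) * k for k in range(1, 5)]
us = [np.ones(3) for _ in xs]
```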
In Figure 3, we have plotted the NEE that results from fitting models from by optimizing LREE (26) and implicit equation error (22). We can see that LREE provides a much better fit than implicit equation error, especially as the number of data points increases.
To explore the effect of noise on the consistency of LREE, we have plotted NEE versus the size of the dataset for varying noise level (measured in decibels) in Figure 4. If we had a consistent estimator of the explicit model (1), we would expect to see with consistent slope for all SNR levels. What we in fact observe, however, is that in noisier conditions the NEE initially decreases and then plateaus at a certain level. This phenomenon can also be seen in [75, Sec. IV], where LREE produces models biased towards being too stable, even in the infinite data limit.
VI-C Identification of Traffic Networks
In this section we examine a potential application of our approach, the identification of a traffic network. The dynamics of traffic networks are thought to be monotone when operating in the free flow regime [7]. Note that monotonicity of some traffic models is lost when certain nodes are congested [76].
The data are generated using the model in [7], which is not in the proposed model set. Hence this section provides a test of robustness of the proposed approach to modelling assumptions.
For this application, we consider using equation error as a surrogate for simulation error. Model performance is therefore measured using Normalized Simulation Error (NSE):
\[
\mathrm{NSE} = \frac{\sum_{t}\lVert \tilde{x}_t - x_t \rVert^2}{\sum_{t}\lVert x_t \rVert^2} \tag{36}
\]
where $\tilde{x}_t$ are the simulated states and $x_t$ the measured states.
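A minimal sketch of computing NSE for a discrete-time model, assuming the simulation is initialized at the first measured state (function and variable names are ours):

```python
import numpy as np

def nse(f_model, x_data, u_data):
    """Normalized Simulation Error: simulate the model forward from the
    first measured state under the recorded inputs, then compare the
    simulated trajectory with the measured one."""
    x_sim = np.empty_like(x_data)
    x_sim[0] = x_data[0]
    for t in range(len(x_data) - 1):
        x_sim[t + 1] = f_model(x_sim[t], u_data[t])
    return np.sum((x_sim - x_data) ** 2) / np.sum(x_data ** 2)

# Sanity check: a model matching the data-generating system gives NSE = 0.
f = lambda x, u: 0.9 * x + u
u_data = np.ones((20, 2))
x_data = np.empty((20, 2))
x_data[0] = 1.0
for t in range(19):
    x_data[t + 1] = f(x_data[t], u_data[t])
```

Unlike equation error, this metric compounds one-step prediction errors over the full horizon, which is why unstable models show up as unbounded NSE in the experiments below.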
We will first introduce the model, then study the effect of the model constraints by comparison to existing methods, and finally examine scalability to large networks.
VI-C1 Simulation of a traffic network
The dynamics are simulated over a graph (e.g. Fig. 6), where each node represents a road with state corresponding to the density of traffic on the road, denoted . Nodes marked in allow cars to flow into the network, and nodes marked out allow cars to flow out of the network. Each edge is randomly assigned a turning preference denoted such that (this ensures that the total number of cars at each intersection is conserved). Each node has a capacity of . Vehicles transfer from roads to according to the routing policy,
where and are monotone demand and supply curves for road . The dynamics of the complete system are then found to be
(37) 
where
The input nodes take a time varying input . We use the following method to generate data sets of size :
(i) First, we generate an input signal for each of size . This signal changes value every 5 seconds to a new value that is normally distributed with mean . Negative values of are set to zero. An example input signal is shown in Fig. 5.
(ii) The dynamics (37) are integrated over seconds.
(iii) A training set of size is generated by sampling every 0.5 seconds.
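The input-generation procedure can be sketched as follows (the mean, standard deviation, duration, and integration step are placeholders for values elided in the text):

```python
import numpy as np

rng = np.random.default_rng(2)

def piecewise_constant_input(duration, hold=5.0, dt=0.01, mean=1.0, std=0.5):
    """Input signal that jumps to a new normally distributed value every
    `hold` seconds, clipped at zero (negative inflows are not physical)."""
    n_steps = int(round(duration / dt))
    steps_per_hold = int(round(hold / dt))
    n_holds = -(-n_steps // steps_per_hold)  # ceiling division
    levels = np.maximum(rng.normal(mean, std, size=n_holds), 0.0)
    return np.repeat(levels, steps_per_hold)[:n_steps]

u = piecewise_constant_input(duration=100.0)  # 100 s of input at dt = 0.01 s
# After integrating the dynamics (37) on this grid, the training set is
# obtained by sampling every 0.5 s, i.e. every 50th grid point:
train = u[::50]
```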
VI-C2 Regularization Effect of Model Constraints
In this section we will explore the effects of introducing monotonicity, positivity, and contraction constraints.
Introducing model constraints limits the expressivity of our model. Consequently, one might expect the estimator bias to increase and the variance to decrease [77, Chapter 7]. Empirical evidence in this section suggests that a judicious choice of constraints can reduce the variance with a minimal increase in bias.
Using the method outlined in Section VI-C1 for simulating a traffic network and the graph depicted in Fig. 6, we generate 100 different training sets of size with and then compare the results on three different validation sets. The first validation set has inputs generated with parameters (the same as the training set). The second and third validation sets have parameters and respectively. These are used to test the generalizability of our models to inputs outside the training set.
In Figures 10 and 14, we plot the NSE on the training set and on validation sets 1 and 2 for our proposed model sets, the polynomial model, the NARX models and the model set . The percentage of models that displayed instability is indicated in the bar graph in the upper portion of each figure.
In all cases, the identified linear models performed poorly. This is unsurprising as the true system is highly nonlinear.
Comparing the models and with the remaining models, we can see that the stability constraints have a regularizing effect: increasing the degree of the polynomials reduces the median NSE; in other words, increasing model complexity improves model fidelity. The other models, on the other hand, perform worse as complexity increases. This is most clearly seen in the models , where increasing the polynomial degree results in poorer fits on validation data.
Our results also suggest that model stability constraints significantly improve robustness. Without stability constraints, a model that appears stable during training may turn out to be unstable under a slight shift in the input data distribution. This can be seen most clearly in the models and : on the training data distribution, most models are stable; however, increasing the variance of the inputs to the network results in a large number of unstable models with unbounded NSE, cf. Fig. (c)c and Fig. (c)c. Further evidence is shown in Table I, where we can see that once the variance of the input data doubles, almost all models without stability constraints are unstable.
To compare against a standard approach, we also fit wavelet and sigmoid NARX models using the Matlab System Identification Toolbox. The resulting NSE is shown in Fig. 14, and the numbers of unstable models and of negative state estimates are shown in Tables I and II respectively. While we observed extremely high performance of the individually identified subsystems, simulating the network interconnection of those subsystems produces many unstable models, many negative state estimates and a poor quality of fit.