## 1 Introduction

Over the last decade, artificial neural network (ANN) has been extensively studied due to its strength in identifying complex nonlinear structure in data

Bengio13 ; LeCun15 ; Schmidhuber15 . Recent studies on the application of ANN on the data-driven modeling of physical systems have shown promising results kutz17 ; Pathak18 ; RAISSI18 ; Yeo19 . In particular, in Lu17 ; Lu18, it is shown that ANN is capable of predicting the dynamics of a chaotic attractor. When training an ANN for the modeling of a dynamical system, it is typical to use a sufficiently long time series data, so that the ANN fully explores the phase space. Given the huge amount of training data, it is unclear whether the long-term forecast capability of ANN is due to a high dimensional interpolation in a phase space, similar to the delay-coordinate embedding

Packard80 ; Takens81 , or because ANN really learns the nonlinear dynamics. As a first step to answer this question, we first consider the problem, where only a partial observation of a time series data is available. We investigate if ANN can reconstruct the nonlinear dynamics from the incomplete information.Reconstructing dynamics from an incomplete observation has practical applications in many physical, biological, and engineering problems, where the state of the system is accessible only through a sensor network. For example, in geophysics, it is not uncommon to find time series data, which contains a large amount of missing information or is irregularly sampled, due to sensor malfunction, atmospheric conditions, or physical limitations of the sensors Musial11 ; Shen15 ; Ozken18 . When a priori knowledge on the physical system is available, one of the standard approaches to reconstruct the nonlinear dynamics from an incomplete data set is to design a statistical model that incorporates the physical knowledge Gove06 ; MOFFAT07 ; OIKONOMOU18 . In Hamilton15 , it is shown that a high-resolution temporal dynamics can be reconstructed from a low-resolution time series data, which has a 10 times lower time resolution, by using only a subset of the governing equations. Incorporating the physics knowledge on the training of ANN, RAISSI18 proposed “physics informed neural network”, which can recover the complex nonlinear dynamics from only a small fraction of the data.

It is challenging to reconstruct nonlinear dynamics without having any prior knowledge on the underlying process. There are three major approaches to reconstruct the missing information; (a) identifying / modeling (auto-)correlation structures Sinopoli04 ; Kondrashov06 ; Pappas14 , (b) low-dimensional modeling Smith98 ; venturi04 ; RODRIGUES13

, and (c) pattern matching

Phan17 . Although there is a significant progress in imputing the missing data, most of the classical approaches are based on strong assumptions, such as linearization, existence of a low-dimensional structure, or a priori knowledge on the correlation structures. The reconstruction accuracy of these models degrade quickly when the fraction of missing data is large Li18 ,*e.g.*, more than 50%, and the underlying dynamical system is nonlinear.

Recently, a few ANN-based methods have been proposed for the data-driven reconstruction of the missing information by using recurrent neural networks (RNN).

Lipton16 showed that simply filling in a missing period with the last known observation and using a missing data indicator as one of the input data to RNN can increase the prediction accuracy in a clinical binary classification problem. In Berglund15 , a Bidirectional-RNN based method is developed for a probabilistic reconstruction of the missing information for a binary time series data. Che18 proposed an unidirectional RNN model, which assumes a relaxation of the unobserved dynamics towards an empirical mean of the data. Based on the relaxation model of Che18 , Cao18 and Luo18 developed Bi-directional RNN and Generative Adversarial Network models, respectively. Those ANN models showed better performance compared to the conventional statistical imputation approaches for the applications in the health care domain. However, it should be noted that the relaxation towards an empirical mean is essentially based on the assumption of a stationary Markov process, which does not consider temporal structures or dynamics of the underlying process.In this study, we develop a novel RNN model for the data-driven reconstruction of nonlinear dynamics from temporally sparse observations. For the modeling of nonlinear dynamics, the echo state network (ESN) Jaeger04 is employed. The reconstruction problem is formulated as a fixed-point problem. The behavior of the proposed fixed-point ESN is thoroughly studied by using chaotic time series data. It is shown that ESN can reconstruct the chaotic, nonlinear dynamics from as little as 5% of the data. This paper is organized as follows; the echo state network is reviewed in section 2.1. In 2.2, a fixed-point iteration method is proposed. Numerical experiments of the fixed-point ESN model is presented in section 3. Finally, conclusions are given in section 4.

## 2 Echo State Network

### 2.1 Review of Echo State Network

The echo state network (ESN) has been studied extensively due to its strength in learning complex dynamics from time series data Pathak18 ; Lu17 ; Inubushi17 ; Antonik18 . ESN is a nonlinear state-space model, of which latent state consists of a set of randomly generated dynamical systems. Following Jaeger04 ; LUKOSEVICIUS12 , the evolution equation of the latent state is

(1) |

Here, is the latent state of ESN,

is a vector of the input signal to ESN,

is a temporal relaxation coefficient, and is a nonlinear function. The subscript denotes a time stamp, , where denotes a time step size, or data sampling interval. The input signal may consist of the observation of the physical system, , and an exogenous forcing or auxiliary input, , such that . For the nonlinear function, usually a hyperbolic tangent function is used;(2) |

Here, and

are randomly generated weight matrices. Independent uniform random variables are used to generate

and Jaeger04 , e.g., , in which denotes a uniform random distribution in . It is worthwhile to note that the evolution equation of the standard recurrent neural network (RNN) has a very similar structure with (1–2). One of the most significant differences between RNN and ESN is that, in ESN, the weight matrices in (2) are generated randomly, while they are computed by a maximum likelihood method in RNN. The weight matrix, , decides the connectivity between the latent state, . It is suggested that making the connection sparse results in a richer internal dynamics Jaeger04 . Typically, has only 1 2% of nonzero elements. A more detailed explanation about generating is given in section 3.The echo state network solves an initial value problem. While it is straightforward to specify the initial condition of the physical variables, e.g., and , it is challenging to find the correct initial condition of the latent state, . In Jaeger04 , it is proposed that an ESN must have the “echo state property”, which is similar to a “fading memory effect” LUKOSEVICIUS12 ; Jaeger01 . For , it is straightforward to show the fading memory effect. Observe that (1) can be rewritten as

(3) |

in which . If we choose for , it is clear that the evolution equation of the latent state, (1), is the forward Euler approximation to the system of relaxation equations,

(4) |

Since , (4) implies that the effects of the misspecification of will vanish after some initial transient period. It is also clear that ESN relies on a set of randomly generated relaxation processes to model the nonlinear behaviors in the time series data. When , the “echo state property” can be guaranteed by making the spectral radius of smaller than unity, i.e., LUKOSEVICIUS12 .

Once the latent state of ESN is updated by (1), the physical process at the next time step () is computed by a linear projection of the latent state onto the physical space,

(5) |

in which is an augmented latent state, and

is a linear map. Conceptually, ESN is similar to a “kernel method” in the machine learning literature

Bishop06 , where a nonlinear projection is used to map the data into a high dimensional space with a goal of finding a linear pattern in the feature space. It should be noted that, although this class of methods has been successfully applied to solve complex problems in practice, it lacks theoretical analysis, such as the optimality of the projection and the convergence.In the standard ESN, “model training” is to find the linear map, , from the data. Let be a time series data of length . The linear map is computed by solving the following regularized optimization problem;

(6) |

in which denotes the Frobenius norm, and is a regularization parameter. The analytical solution of (6) is

(7) |

The second term in (6) corresponds to the standard regularization, often called the Tikhonov regularization. In ESN, first the physical variable, , is projected onto a high dimensional () latent space by a nonlinear map, , where . Because is generated randomly, there is no guarantee that is linearly independent. Hence, the regularization is usually required to deal with the possible multicollinearity.

Temporal relaxation parameter | |

-regularization parameter | |

Fraction of nonzero elements of | |

Spectral radius of | |

In summary, an ESN consists of the following update equations;

The free parameters of an ESN are listed in Table 1.

### 2.2 Fixed-point ESN for Sparse Observation

Here, we consider the time series data generated by a nonlinear dynamical system,

(8) |

in which is an unknown time marching operator and is the sampling interval of the observation. Let be the full observation obtained by an equidistance sampling of , i.e.,

We assume that the full observation, , is not accessible, and only a partial observation of , which is denoted by , is known. The missing (unobserved) elements of are marked by an arbitrarily large number, , i.e.,

Based on , we can define an indicator set of the observation, , in which

Here, we are interested in reconstructing from a sparse observation, , where the fraction of the missing data,

(9) |

is larger than 0.5. We assume that the exogenous forcing, , is fully known. Figure 1 shows an example of such a sparse observation.

The standard method of training an ANN, so called the “teacher forcing” method, consists of three steps; 1) provide an input data , 2) make a prediction, , and 3) compare the difference between and at every time step, i.e., for . However, when there are missing entries in the data set, such sequential process is not possible. One of the conventional approaches to circumvent the problem is to fill the missing entries by using a statistical imputation method and to train an RNN against the imputed data set, assuming that the imputed data is close to and has no systemic bias. When the missing fraction is small, e.g., , the imputation-and-training method works fairly well. For a very high missing fraction, e.g., , the standard imputation methods become increasingly unreliable.

To develop an ESN for the reconstruction of complex dynamics from a sparse observation, we exploit the capability of ESN as a universal function approximator Schrauwen07 ; Funahashi93 ; GRIGORYEVA18 . We assume that, for a large enough , there is an ESN, which can accurately computes the time series data, , such that

(10) |

and

(11) |

in which and are the submatrices of , which correspond to and , respectively. We can rewrite (10 – 11) as

(12) |

where

(13) | ||||

(14) |

Note that is a function of and depends on through (6). It is clear from (12) that is a fixed point of the nonlinear operator, .

Input: , , tolerance level (), maximum iteration ()

Output:

Now, we define a “reconstructed” observation vector,

(15) |

in which is the Hadamard product, is a vector of ones, and is a model prediction, or reconstruction. Simply, takes the observation if available, otherwise the missing entry is filled by the output of the ESN. From (12), the goal of the nonlinear reconstruction of the dynamics is equivalent to finding and , which satisfy

(16) |

Hence, we propose the following fixed-point iteration method,

(17) |

in which is an under-relaxation parameter of the fixed point iteration, and the superscript is the iteration count. The under relaxation is required to guarantee that the Jacobian of exists (see 37). It is trivial to show that upon convergence.

The linear map at -th iteration is computed by solving the following optimization problem,

(18) |

In other words, is computed by comparing only with the available observation data. Note that the time index starts from . As discussed in section 2.1, we cannot specify the correct initial condition of the latent state, . Hence, we disregard the ESN outputs in the first time steps to remove the effects of the initial condition. The analytical solution of (18) is

(19) |

Here, is the -th column of . Note that after a proper rescaling, the summations in the original ESN formulation (7) are time averaging, while the summations in (19) correspond to an ensemble averaging, or a Monte Carlo sampling. Hence, when the dynamical system is ergodic, the time series data is missing in random, and the number of observation, , is large enough, the linear map of the fixed-point method, , approaches to as the fixed-point iteration converges.

The fixed point iteration procedure is outlined in Algorithm 1. Computing involves updating the ESN twice over the entire time series (); first to compute and then to update . It is possible to reduce the computation cost by storing in the first passage and updating by computing only the projection . However, this approach requires a huge memory space. For example, when and , storing with double precision requires a few gigabytes of memory.

One of the advantages of a fixed point iteration method is that it does not require derivative information of the nonlinear operator, . Computing the derivatives requires an additional storage space of and has computational complexity of . However, it is challenging to analyze the convergence of a fixed point iteration and it usually exhibits a linear convergence. A theoretical analysis of the proposed fixed point iteration is provided in A.

## 3 Numerical experiments

In this section, a series of numerical experiments on the reconstruction of nonlinear dynamical systems is presented. The same ESN parameters are used across all the numerical experiments, except for ;

(20) |

The size of the ESN is also fixed at . To generate the sparse matrix , fist, elements of

are randomly selected and filled by sampling from the uniform distribution,

. Then, is scaled by multiplying a constant to make . As shown in Lu17 , the accuracy of an ESN depends on the choice of those free parameters. However, we do not make an attempt to fine tune those parameters, because, without having a priori knowledge on the dynamical system, hand-tuning those parameters with a sparse observation is impractical.### 3.1 Mackey–Glass Time Series

First, the behavior of the proposed fixed-point ESN is investigated by using a sparse observation of a univariate time series. The ground truth is generated by a discrete sampling of the Mackey-Glass delay-time dynamical system with a sampling interval of . The Mackey-Glass equation is Mackey77 ,

(21) |

where we use , , , and . For this set of parameters, the Mackey-Glass time series becomes chaotic Farmer82 . The length of the time series is . The sparse observation is generated by randomly removing fraction of the data. Figure 2 (a) shows the sparse observation when 95% of the time series data is removed. In general, it is very challenging to reconstruct the missing information of a univariate time series Phan17 . Without a prior knowledge, the usual practice is to use an interpolation. However, as shown in figure 2

(b), a standard cubic spline interpolation results in a poor estimation.

Figure 3 shows the improvement of over the iterations for . The initial condition, , is given as a linear interpolation (fig. 3 a). It is clearly shown that, over the iterations, the fixed-point ESN learns the complex nonlinear, time-delay dynamics closer and closer to the ground truth. By the 200-th iteration, the reconstructed trajectory, , becomes almost indistinguishable from .

Figure 4 shows the reconstructed chaotic attractor of the Mackey-Glass time series. Note that only the linearly interpolated time series (fig. 4 a) together with the locations of missing data is provided to the fixed-point ESN. After 200 iterations, the fixed-point ESN is able to closely reconstruct the chaotic attractor. This result suggests that the capability of ESN is beyond a high-dimensional interpolation in the phase space.

In figure 5, a normalized root mean-square error of is shown as a function of the iteration number. The normalized root mean-square error (NRMSE) is defined as,

(22) |

There is a rapid decrease in NRMSE in the first 100 iterations. NRMSE is decreased from 0.93 of the initial condition to 0.15 at . After , seems to be converged and there is no further improvement in NRMSE.

0.60 | 0.80 | 0.90 | 0.95 | |

0.2 | 0.2 | 0.2 | 0.4 | |

In figure 6, the behavior of the fixed-point ESN is investigated for a range of the missing rate, . The relaxation and regularization parameters are listed in table 2. It is found that, when is large, both and need to be increased to make the fixed-point iteration stable. The data sets have the equal length, . The number of observations changes from at to at .

Figure 6 (a) shows the convergence of the fixed-point ESN for the range of . During the fixed point iteration, the convergence is checked by the -improvement,

(23) |

It is shown that, when is modest, the fixed point iteration converges very rapidly. For , i.e., 40% of the data is available, the fixed-point ESN converges under 5 iterations, which increases to 50 iterations for .

In figure 6 (b), the accuracy of the fixed-point ESN is compared with linear and cubic spline interpolations. The normalized RMSE is computed as,

(24) |

Here, and , respectively, denote the linear and cubic spline interpolations. The fixed-point ESN outperforms both the linear and cubic spline interpolations even at low missing rate, . The root mean-square error of the fixed-point ESN is about 30% of the linear interpolation and 80% of the cubic spline interpolation. The strength of the fixed-point ESN is more pronounced at higher . When , RMSE of the fixed-point ESN is less than 8% of the linear and cubic spline interpolations.

The effects of the free parameters and the random generation of ESN are shown in figure 7. In the experiments, four free parameters considered, , , , and . In each experiment, all of the free parameters are fixed at the reference values (20) except for the one being tested. The normalized RMSEs with respect to the linear interpolation () are computed from 20 randomly generated fixed-point ESNs. It is shown that the expectation of , for the reference parameter set, e.g., in figure 7 (a), is much lower than from a single realization in figure 6 (b); versus . Note that, for the range of parameters tested, the variations of is less than 0.04, indicating that the fixed-point ESN is not so sensitive to the changes in the free parameters.

The effects of the temporal relaxation parameter, , are shown in figure 7 (a). In general, the error decreases at smaller for the Mackey-Glass problem. However, it is found that, when , the fixed-point ESN does not converge. The temporal relaxation parameter, , works effectively as a regularization. Increasing enforces a longer autocorrelation, or higher inertia, which results in a smoother trajectory. On the other hand, does not impose such constraint in the temporal dynamics. When the missing fraction, , is high, a regularization in the temporal structure is required to guarantee temporal smoothness in the solution, which explains why the fixed-point ESN does not converge when .

Figure 7 (b) shows the changes in as a function of the scale of the weight matrix, . It is shown that, while is not so sensitive to , the standard deviation of is an increasing function of . The scale of determines the relative contribution between the input variable, , and the latent state, , to the time evolution of the latent state, . Because the time evolution of becomes more autonomous when is small, the standard deviation of also becomes smaller Yeo19b .

It is argued that making the connection of the latent state sparse leads to a richer dynamics Jaeger04 . In figure 7 (c), is computed as a function of the sparsity, . It is shown that, in the range of explored, the fixed-point ESN is not sensitive to the changes in .

In figure 7 (d), is shown to be an increasing function of when . In general, it is expected that increasing makes the nonlinear representation capability of an ESN stronger. However, when the number of data is fixed, using a large makes it difficult to train an ESN because the number of unknown parameters increases linearly proportional to . For , the number of observations is , while the number of unknown parameters is . The increase in the reconstruction error for seems to be related with the decrease in the ratio between the number of unknown parameters to the number of data.

### 3.2 Lorenz-63 System

In this section, we use the Lorenz-63 system Lorenz63 to validate the fixed-point ESN. The Lorenz-63 system is given by the following set of equations,

(25) |

We used the coefficients from Lorenz63 , , , and . The time series data is generated by sampling at every, . The length of the time series is .

First, we consider a partial observation of the Lorenz-63 system Lu17 . The data set consists of only (). We assume that is completely known, while fraction of the data is randomly missing in . Figure 8 shows an example of the partial observation with missing data. Here, we aim to recover the missing data in from ().

Figure 9 shows the reconstruction, , at four different iteration counts. The fixed-point ESN starts from a linear interpolation (Fig. 9 a). Similar to the Mackey-Glass time series (Fig. 5), the reconstruction error quickly reduces in the first 100 iterations (Fig. 9 b-c). By , is already very close to the ground truth, . At , becomes almost indistinguishable from . The RMSE normalized by the linear interpolation is for the time window shown in figure 9 (d).

0.90 | 0.2 | 0.06 | |
---|---|---|---|

0.95 | 0.4 | 0.37 |

The normalized RMSEs () for and 0.95 are shown in table 3, together with the parameters (, ). is computed over the entire trajectory, . It is shown that, when , RMSE of the fixed-point ESN is only about 6% of the linear interpolation, which changes to 37% at . Figure 10 explains why there is a significant increase in RMSE at . Because of the random sampling, in the time window shown in figure 10, there is only one observation at for . In other words, two extremely large missing intervals ( and ) exist consecutively. This is a rare case, because the average interval of the missing observation is for . As discussed in Lu17 , it is not possible to correctly predict given only , because, when is not observed, the symmetry of the Lorenz-63 system implies that and are equally possible. Hence, the reconstructed trajectory, , predicts a positive peak at , while the ground truth, , reaches a negative peak at . Then, for , the fixed-point ESN cannot adjust its dynamics due to the lack of the data. On one hand, this observation demonstrates the limitation of the fixed-point ESN, which fails to capture the dynamics when the missing interval is too long. One the other hand, the Lorenz system is chaotic, of which long-term predictability is quite limited.

In the next experiment, the sparse observation of all three variables, , are given, and we aim to reconstruct the dynamics of the three-dimensional time series simultaneously. Figure 11 (a–c) shows an example of the sparse observation of the Lorenz-63 time series. All three variables have the same missing rate, i.e., . Figure 11

(d) shows the availability of the data. Since the time series data is removed at random, the probability that all three variables are observed at the same time frame is 0.001 for

.Figure 12 shows the reconstructed Lorenz-63 time series from the 90% randomly missing data after 90 iterations. The relaxation and regularization parameters are, respectively, and . It is shown that the fixed-point ESN is capable of accurately reconstruct the nonlinear dynamics of the Lorenz-63 system.

0.90 | 0.95 | |
---|---|---|

0.18 | 0.51 | |

0.17 | 0.36 |

The normalized RMSEs with respect to are shown in table 4 for and 0.95. The normalized RMSE is defined as,

For , the fixed-point ESN makes a much more accurate reconstruction compared to the linear and cubic spline interpolations. However, the accuracy of the fixed-point ESN significantly decreases at . The fixed-point ESN still outperforms those interpolation methods. But, there is about three-fold increase in as changes from 0.9 to 0.95. At , the cubic spline interpolation results in a large error due to the overshoot, which makes smaller than .

### 3.3 Lorenz-96 System

In this experiment, the fixed-point ESN is tested against a 6-node Lorenz-96 system Lorenz96 ;

(26) |

The Lorenz-96 system is periodic, i.e., . Here, we consider and . The time series data is generated by sampling at every and the length of the time series data is .

Figure 13 (a–c) show a subset of the 6-node Lorenz-96 system for . Figure 13 (d) shows an example of the data availability. Because the probability that every is observed at the same time frame is for , the data set does not contain a time frame at which every is observed.

Figure 14 shows the reconstructed 6-node Lorenz-96 system from the sparse observation of . The parameters of the fixed-point ESN are and . It is again shown that is almost indistinguishable from . The normalized RMSEs are and . When the fixed-point ESN is used to recover the dynamics from a sparser data set of , the normalized RMSEs are significantly increased to and .

### 3.4 Forced van der Pol oscillator

Now, we consider a forced van der Pol oscillator, which is given by the following equations,

(27) | ||||

(28) |

The exogenous forcing, , is given by an Ornstein-Uhlenbeck process as

(29) |

in which is the Wiener process. The parameters used in this simulation are, , , and . Similar to the partial observation of the Lorenz-63 system, is not given and the data set consists of , which is sampled at every . Hereafter, we use to denote . The length of the time series is .

In sections 3.1 – 3.3, we consider the reconstruction of chaotic time series. As shown in figure 10, the accuracy of the reconstruction becomes lower as the interval of the missing information increases. The decrease in the accuracy seems to be related with the chaotic nature of the time series. On the other hand, the unforced van der Pol oscillator has a stable limit cycle. The dynamics of the forced van der Pol oscillator in (27–28) depends on the restoring force to the limit cycle and excursion due to the exogenous forcing, . Hence, in theory, if we know the governing equations and the full history of , we can find no matter how long the interval between the observations.

The sparse observation of the van der Pol oscillator is shown in figure 15. Here, we consider an extremely sparse data, in which 99% of the observations are randomly removed. Since , the number of observations is only 500. Although the average interval of the missing data is , due to the random removal, it is not uncommon to find a missing interval of . The exogenous forcing is shown in figure 15 (b). While the Ornstein-Uhlenbeck process is computed with to numerically integrate (27–28), is downsampled with the sampling interval of . In a sense, we introduce an uncertainty, in which the sub-timescale information of is unknown. Hence, the time series becomes a Markov process, . Although the subscale fluctuations of are not significant, it breaks the assumption of the fixed-point ESN that there exists a deterministic mapping, , which guarantees the existence of a fixed point.

Figure 16 shows the -improvement in terms of the iteration count. In both cases, and are used. It is shown that, for , the fixed-point ESN quickly converges to a solution. The -improvement becomes by the 100-th iteration. When only 1% of the data is available (), the convergence becomes much slower. The -improvement reaches at the iteration number , and there is no further reduction of afterward.

Figure 17 shows the reconstructed dynamics for . Even with the extremely sparse data, the fixed-point ESN is able to reconstruct the nonlinear dynamics. The normalized RMSE is for and for . It is observed that, although the reconstructed trajectory, , closely follows the dynamics, is much noisier compared to the previous experiments on the deterministic time series, which seems to be related with the uncertainty in .

## 4 Concluding remarks

In this study, a novel model-free method is developed to reconstruct nonlinear dynamics from a temporally sparse observation, in which a large fraction of the time series data is randomly missing, or unobserved. Based on the assumptions of the noise-free observation and universal function approximation capability of a recurrent neural network, we show that the reconstruction problem can be solved by a fixed-point problem. The fixed-point method consists of two major components; outer-iteration to find the fixed-point solution and inner-loop to compute the parameters of the recurrent neural network. In theory, any recurrent neural network architecture can be used for the inner-loop, but the computational cost of solving the inner-loop with a stochastic gradient method will make it impractical. Hence, we employ the echo state network for the recurrent neural network, of which parameter can be computed by solving a simple ordinary least square problem. The fixed-point method is simple to implement and can be solved with

floating point operations, because it does not require to compute the derivatives of the nonlinear operator. Although it may be possible to develop a gradient-based optimization algorithm to solve the problem, computing the derivatives of the echo state network will require additional memory of and floating operation of , which makes it difficult to use in practice.The proposed method, referred as the fixed-point ESN, is tested against time series data generated by chaotic dynamical systems. For the Mackey-Glass time series, it is shown that the fixed-point ESN can accurately reconstruct the nonlinear dynamics even when the 95% of the data are randomly removed. For the missing fraction , the average time interval between two consecutive observations is , while the characteristic period of the Mackey-Glass time series is Gers01 . The Lorenz-63 and 6-node Lorenz-96 systems are used to demonstrate the capability of the fixed-point ESN for the reconstruction of nonlinear dynamics of multivariate time series. In both cases, the fixed-point ESN provides very good approximation of the complex nonlinear dynamics up to . Finally, we use a forced van der Pol oscillator, of which dynamics is largely determined by the exogenous forcing, to investigate the behavior of the fixed-point ESN. It is shown that the fixed-point ESN can learn the relation between the exogenous forcing and the dynamical system with only 1% of the time series data.

It is demonstrated that the echo state network is capable of learning the dynamics from only a partial observation of a complex nonlinear dynamical system.
In this study, we limit our interest only to relatively low dimensional dynamical systems. It is a subject of further investigation how the nonlinear reconstruction method can be extended to a high dimensional dynamical system with complex correlation structures, such as a spatio-temporal process. In many practical applications, observations are typically corrupted by a sensor noise, which makes the observation a stochastic process. Although, it is shown that the fixed-point ENS still can learn the nonlinear dynamics for a small noise, the accuracy of the fixed-point ESN is not guaranteed under a small signal-to-noise ratio. Finally, although it is empirically shown that the fixed-point ESN is very effective in the nonlinear reconstruction, our understanding on the theoretical understanding, *e.g.*, accuracy and convergence, is far from complete.

## Appendix A Convergence analysis of fixed-point ESN

It is challenging to theoretically show the convergence of a fixed point iteration of a nonlinear map. One of the widely used method to analyze the convergence is based on the Banach fixed-point theorem, which provides a *sufficient* condition. The Banach fixed point theorem requires a contraction mapping;

(30) |

in which is a Lipschitz constant. In Lu18 , it is *assumed* that ESN provides a contraction mapping. However, it is a too strong assumption and in general such a globally uniform contraction is not expected for a nonlinear map. Instead, here, we explore a local convergence around a fixed point, .

Let denote the Jacobian of in (17). From the contraction mapping theorem, it can be shown that, if for a subordinate matrix norm , there is a neighborhood around the fixed point, , in which the fixed-point iteration converges to .

For simplicity, here we assume that is univariate, i.e., . But, extending the analysis to a multivariate time series is straightforward. The Jacobian of the fixed point iteration is given as

(31) |

in which is the Kronecker delta.

In the second term on the right hand side (RHS) of (31),

(32) |

Here, is a set of nonzero entries of , i.e., the summation is over the available observations, and . By definition,

and, from (10), . Then, (32) can be written as

(33) |

At a fixed point, , we have and , which makes . Therefore, the second term on the RHS of (31) vanishes.