Discovering Governing Equations from Partial Measurements with Deep Delay Autoencoders

by   Joseph Bakarji, et al.
University of Washington

A central challenge in data-driven model discovery is the presence of hidden, or latent, variables that are not directly measured but are dynamically important. Takens' theorem provides conditions for when it is possible to augment these partial measurements with time delayed information, resulting in an attractor that is diffeomorphic to that of the original full-state system. However, the coordinate transformation back to the original attractor is typically unknown, and learning the dynamics in the embedding space has remained an open challenge for decades. Here, we design a custom deep autoencoder network to learn a coordinate transformation from the delay embedded space into a new space where it is possible to represent the dynamics in a sparse, closed form. We demonstrate this approach on the Lorenz, Rössler, and Lotka-Volterra systems, learning dynamics from a single measurement variable. As a challenging example, we learn a Lorenz analogue from a single scalar variable extracted from a video of a chaotic waterwheel experiment. The resulting modeling framework combines deep learning to uncover effective coordinates and the sparse identification of nonlinear dynamics (SINDy) for interpretable modeling. Thus, we show that it is possible to simultaneously learn a closed-form model and the associated coordinate system for partially observed dynamics.



There are no comments yet.


page 6

page 7

page 12

page 15

page 18


Learning normal form autoencoders for data-driven discovery of universal,parameter-dependent governing equations

Complex systems manifest a small number of instabilities and bifurcation...

SINDy-BVP: Sparse Identification of Nonlinear Dynamics for Boundary Value Problems

We develop a data-driven model discovery and system identification techn...

Deep Learning Models for Global Coordinate Transformations that Linearize PDEs

We develop a deep autoencoder architecture that can be used to find a co...

Data-driven modelling of nonlinear dynamics by polytope projections and memory

We present a numerical method to model dynamical systems from data. We u...

Discovering Sparse Interpretable Dynamics from Partial Observations

Identifying the governing equations of a nonlinear dynamical system is k...

Bounded nonlinear forecasts of partially observed geophysical systems with physics-constrained deep learning

The complexity of real-world geophysical systems is often compounded by ...

Extraction of Discrete Spectra Modes from Video Data Using a Deep Convolutional Koopman Network

Recent deep learning extensions in Koopman theory have enabled compact, ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Scientific progress has been driven by the discovery of simple and predictive mathematical models from observations. Interpretable, parsimonious governing equations have been especially valuable as they typically have allowed for greater engineering insight, simple parametrizations, and improved extrapolation capabilities. Increasingly, these models are learned from data using machine learning algorithms, such as genetic programming 

[schmidt2009distilling, Bongard2007pnas, Schmidt2011pb, Daniels2015naturecomm] and the sparse identification of nonlinear dynamics (SINDy) [brunton2016discovering, rudy2017data]. When only partial observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to formulate a closed-form model in these variables, using either analytic or data-driven techniques. Time-delay embedding provides an approach to augment these limited measurements, and under certain conditions, given by Takens’ embedding theorem [takens1981detecting], the delay-augmented state yields an attractor that is diffeomorphic (i.e. continuously differentiably deformable) to the underlying, though unmeasured, full-state attractor. However, the coordinate transformation back to the original attractor is unknown and may be arbitrarily complex to represent. Learning interpretable and generalizable dynamical systems models in the embedding space has remained an open challenge since Takens introduced the embedding theorem in 1981 [takens1981detecting], with several promising recent developments [Daniels2015naturecomm, somacal2020uncovering, atkinson2020bayesian, ribera2021model]. The central goal of this work is to learn parsimonious, closed-form models from partially observed systems, leveraging the SINDy algorithm for model discovery and a custom deep autoencoder to learn an appropriate coordinate transformation from a time-delay embedding.

Time-delay embedding has a rich history in data-driven modeling [roux1983observation, Broomhead1986physd, Crutchfield1987cs, Broomhead1989prsla, kennel1992method, deyle2011generalized, Sugihara2012science, Giannakis2012pnas, Billings2013book, ye2015distinguishing, Ye2015pnas, brunton2017chaos, Loiseau2018jfm, pan2018data, das2019delay, Giannakis2019acha, runge2019inferring, zou2019complex, pan2020structure, giannakis2020delay], especially for making predictions when analytical models are not available. Notable applications of the theory include distinguishing noise from chaotic dynamics [roux1983observation, kennel1992method], and recent work has shown that delay embedding can improve machine learning models of time-series data [lee2019linking, Wehmeyer2018jcp]. Numerous attempts have been made to learn dynamical systems models directly in the delay embedded space. For example, when SINDy was applied to a time-delay embedding for chaotic systems, it was shown that the simplest model that describes the data is linear, establishing a strong connection to Koopman operator theory [brunton2017chaos]. This connection between delay embedding and Koopman has a longer history [mezic2004comparison, Susuki2015cdc, brunton2017chaos, arbabi2017ergodic, champion2019discovery, das2019delay, Giannakis2019acha, pan2020structure, Kamb2020siads, hirsh2021structured], including the use of dynamic mode decomposition (DMD) [Schmid2010jfm, kutz2016dynamic] on time-delay coordinates [tu2014dynamic, brunton2016extracting]. However, for chaotic systems, these linear models in delay coordinates suffer from closure issues and fall short of the goal of parsimonious nonlinear models. In related work, attempts have been made to extend SINDy to partially measured systems, using the variational annealing data assimilation technique [ribera2021model] or applying SINDy to higher derivatives of the measured variables [somacal2020uncovering], which is closely related to time-delay embedding.

Figure 1: Summary of workflow. , and are parametric functions to be discovered.

Because the data in the delay-embedded space is diffeomorphic to the full-state attractor, where a parsimonious nonlinear model often exists, it is reasonable to try and learn a coordinate transformation starting from a delay-embedded state. Figure 1

shows a schematic of the modeling task, involving learning a coordinate transform into a space where an analytical model may be obtained. Deep neural network autoencoders have shown great promise in representing arbitrarily complex coordinate transformations, especially for dynamical systems 

[Takeishi2017nips, lusch2018deep, Yeung2017arxiv, Mardt2018natcomm, Wehmeyer2018jcp, champion2019data, Otto2019siads, linot2020deep, gilpin2020deep, lee2020model, kalia2021learning]. Importantly, Gilpin [gilpin2020deep]

introduced a false nearest neighbors loss function in a deep neural network to promote embedded attractors that are similar to the full-state attractor. Similarly, Wehmeyer and Noé 

[Wehmeyer2018jcp] showed that autoencoders may be combined with time-delayed measurements to learn effective slow reaction coordinates for molecular kinetics. However, neither of these studies learned parsimonious models in the autoencoder coordinates. Champion et al. [champion2019data] showed that it is possible to combine autoencoder networks with SINDy to simultaneously learn a nonlinear coordinate transformation and discover a parsimonious dynamics model, making this a promising technique for learning models in delay-embedded coordinates.

Figure 2: General approach to data-driven discovery. , and are the encoder, decoder and analytical model respectively.

In this work, we leverage the SINDy-autoencoder framework [champion2019data] to learn parsimonious governing dynamical systems from delay-embedding of partial observations, as summarized in Fig. 2. In general, model discovery from full-state measurements is an ill-posed inverse problem, and model discovery from limited measurements compounds this challenge. As with other custom autoencoders that are used to learn dynamics [lusch2018deep, champion2019data]

, there are several critical steps, including choosing the architecture, designing the loss function, initializing the parameters, and tuning the optimization algorithm used to train the model. In addition to demonstrating this algorithm to learn closed-form models from partial observations of several dynamical systems, we describe how this inverse modeling problem may be broken down into sub-problems. In general, starting with a problem where the answer is known is important to guide the design of the architecture and loss functions. Thus, we show how the architecture and individual terms in the loss function affect the learning process with different levels of assumed knowledge. We also investigate what preprocessing should be applied to the data, such as whether or not to apply the singular value decomposition to the delay coordinates before the autoencoder network. Incredibly, we show that for the Lorenz system, from a partial observation of only the first coordinate, it is possible to learn a coordinate system in which the dynamics may be represented by a differential equation with even fewer terms than the original Lorenz system; the original system has 7 linear and quadratic terms, while our model only has 6 terms. We also show that it is possible to learn a Lorenz analogue from a video sequence of a chaotic waterwheel experiment.

The rest of the paper is organized as follows. First, we provide a brief problem definition in Sec. 1.1. Section 2 has background information about SINDy and how it has been combined with autoencoders. Section 3 discusses the proposed delay SINDy autoencoder, including the architecture and custom loss functions. The main results from numerical experiments are provided in Sec. 4. We analyze several canonical nonlinear and chaotic systems, in addition to video data from a chaotic waterwheel experiment that may be viewed as a Lorenz analogue. Finally, conclusions and discussions are provided in Sec. 5

1.1 Mathematical problem definition

An overarching inverse problem in data-driven dynamical systems is to learn a model for the evolution of a system, purely from observation data. Mathematically, this inverse problem can be stated as follows. Let , an

dimensional time-dependent state vector defined on the domain

, with and , be the solution of a nonlinear differential equation


where is a smooth and nonlinear function and a vector of system parameters. Further, let represent noisy measurements of the system, given by


where is a noise process. The goal is then to learn an approximate dynamical system


in terms of a new state , which may either be the measured state or an invertible function of :


If observation data for the full state are available, so that and , an approximation of can be inferred using a variety of machine learning algorithms. Although neural networks have been shown to provide useful predictive models [pathak2018model, vlachas2018data, wu2018deep, raissi2019physics, mardt2020deep], they don’t result in interpretable, closed-form expressions such as (3) that are amenable to analysis. Parsimonious modeling techniques, such as genetic programming [schmidt2009distilling, Bongard2007pnas] and SINDy [brunton2016discovering, rudy2017data], are able to learn analytic, closed-form models that often capture the exact form of the original dynamics (1) when they are known.

However, in many real world applications, only partial measurements are available, so the dimension of is less than the dimension of , i.e. . In general, it is not possible to learn directly in its original dimension in terms of the measurement vector . A direct approach of fitting a recurrent relation for

, using recurrent neural networks, Bayesian methods, Gaussian process regression, etc., or discovering the underlying differential equation using SINDy, do not generalize well if the dynamics of

are nonlinear and depend on hidden variables that control its evolution. The challenge of representing the dynamics in terms of a partial observation of the full-state is known as the closure problem, and it is one of the most fundamental issues in scientific modeling, with a particularly rich history in turbulence modeling [ling2016reynolds, pan2018data, duraisamy2019turbulence, brunton2020machine].

There are several embedding techniques [Coifman2008mmas, Yair2017pnas, brunton2017chaos, gilpin2020deep] to enrich the partial measurement vector, providing a map as in (4), although it remains a challenge to learn the dynamics (3) in these coordinates. Therefore, in general, it is necessary to simultaneously learn the coordinate transformation (4) and the dynamics (3). For simplicity, we will assume scalar measurements , although the following arguments generalize to multi-dimensional vectors of partial measurements. Before learning the map (4), we first time-delay embed the measurement , resulting in the intermediate vector , where and are the number of delays and the time increments between successive measurements respectively. Takens’ theorem [takens1981detecting] provides conditions for when time-delay embedding results in an attractor that is diffeomorphic to the original system. We then simultaneously learn the map (4) from and the dynamics (3) using a modification of the SINDy-autoencoder framework [champion2019discovery]. The goal is an interpretable differential equation in the transformed embedded variable. Takens’ embedding theorem also puts a condition on the embedding dimension to be for the diffeomorphism to be guaranteed.

2 Background

In this section, we review time-delay embedding, the sparse identification of nonlinear dynamics (SINDy) [brunton2016discovering] algorithm, and the recent combination of SINDy with deep autoencoders [champion2019discovery].

Delay embedding.

When defining the delay embedding variable of a measurement time series , the proper choice of and have been studied extensively [ma2006selection, kennel1992determining] (see appendix A). The embedded variable can be assembled into a Hankel matrix


The corresponding eigen-time-delays are given by the columns of

from the singular value decomposition (SVD)

, and they are used for linear system identification [Juang1985jgcd, Juang1991nasatm, Phan:1992, Juang1994book, Brunton2019book] and in the Hankel alternative view of Koopman (HAVOK) algorithm [brunton2017chaos]. The first few eigen-time-delays, whose rank can be chosen based on a Pareto front analysis, are often sufficient to encode the shape of a given signal, as shown in Fig. 13.


The SINDy algorithm frames the problem of discovering the model in (3

) as a sparse generalized linear regression 

[brunton2016discovering]. SINDy assumes a library of candidate functions that determine possible terms in . Given discrete-time samples of and , these may be cast into data matrices and , where , . Assuming candidate functions, with , it is possible to build the library data matrix . The SINDy algorithm then minimizes the loss


where the sparse matrix determines which candidate terms are active in the dynamics, so that . The sparsity constraint minimizes the number of non-zero terms in , but it requires an intractable combinatorial search. Thus, it is either replaced by or a sequentially thresholded least-square algorithm for computational efficiency. A sparse regularizer is motivated by Occam’s razor, which prefers simpler and more interpretable models. SINDy has been widely applied to model a range of complex systems [Schaeffer2017prsa, boninsegna2018sparse, Kaiser2018prsa, Loiseau2017jfm, Brunton2019book, Gelss2019mindy, gurevich2019robust, beetham2020formulating, Reinbold2020pre, schmelzer2020discovery, suri2020capturing, beetham2021sparse, reinbold2021robust].

SINDy autoencoders.

SINDy is most effective when the choice of the dictionary is informed by the structure of the problem. When the potential terms of the differential equations we seek are not known a priori, or if they include integral operators (e.g. non-local closures [bakarji2021data]), a naive implementation of SINDy gives sub-optimal results. A particularly versatile solution is to use autoencoders to transform the coordinates of the input variable before discovering a differential equation, as in Champion et al. [champion2019discovery]. The resulting SINDy autoencoder algorithm is summarized in Fig. 3. We will expand on this algorithm to address the problem of data-driven discovery of high-dimensional equations from low-dimensional measurements.

Figure 3: SINDy Autoencoder summary algorithm. and are encoder and decoder networks respectively. The network Jacobians and

are computed using automatic differentiation in tensorflow (i.e.


SINDy autoencoders simultaneously transform measurement data into a lower dimensional variable and discover a nonlinear differential equation in . The main element of the algorithm is an autoencoder that requires a reconstruction loss


In addition, the method incorporates the SINDy loss given in equation (6) in two different ways. First, the difference between the output of the SINDy layer,

, and that computed by the chain rule

, is minimized. This SINDy loss in is given by


Second, a consistency loss ensuring that can be reconstructed from using the path is included. Using the fact that

the resulting loss in is given by


Finally, a sparsity constraint

is included in the loss function to encourage the discovery of a parsimonious model. While the choice of these losses is sufficient for some high-dimensional data, the current problem requires more loss terms to account for delayed inputs.

3 Delay SINDy Autoencoders

We now leverage the SINDy autoencoders algorithm to learn coordinates and dynamics starting from a delay-embedding. The algorithm is extended by adding losses that take advantage of time-delay embedding properties.

Let be the embedding of discrete time-series data , given by the column vectors of the Hankel matrix in (5). Accordingly, is the -th entry of the embedding vector . Our approach relies on the following premises:

  1. Takens’ embedding theorem proves that a diffeomorphic map between the delay-embedded attractor and the original attractor exists under certain conditions. The theorem doesn’t specify how to find this map.

  2. The universal approximation properties of neural networks [csaji2001approximation] can approximate this diffeomorphism and others where the dynamics are simplified.

The goal is then to design a loss function whose optimal solution is a predictive model with a sparse analytical form. Ideally, the optimal sparse differential equation of the latent variable, , is similar to that of

when it is known. Finding the best loss function is matter of making the hypothesis class as small as possible and as close as possible to the optimal mappings in such a way to minimize the approximation and estimation errors. This is illustrated in Fig. 

4. We explore these premises in Sec. 4.

Figure 4: Illustration of the estimation and approximation errors in the hypothesis class.

In addition to the SINDy autoencoder losses in Sec. 2, we introduce three losses that take advantage of a priori knowledge of time-delay embedding properties, as illustrated in Fig. 5.

Loss 1.

First, the dimensionality reduction of can be performed using linear techniques, such as computing the SVD of the Hankel matrix in (5), or using nonlinear techniques, such as a deep autoencoder network. The diffeomorphism between and is expected to be highly nonlinear, which motivates the use of neural network. However, it is possible to perform an initial linear dimensionality reduction, using the first dominant modes of the Hankel matrix as an input to the autoencoder. This step can filter out noise and improve the computational efficiency of the method. If , then a linear projection of on an intermediate dimension

preserves the majority of the variance in the signal.

Figure 5:

Summary of delay SINDy Autoencoders algorithm. The tensor

denotes the row and column of the Hankel matrix .

In this case, the SVD projection and its inverse are included as part of the encoding and decoding of the time-series . We define to be the first columns of , and the first rows and columns of from the SVD of . The SINDy losses in and , given in the original SINDy autoencoder algorithm by (8) and (9) respectively, now become


where has been replaced by for consistency. Furthermore, the reconstruction loss is obtained from the difference between the feed-forward prediction and the input


Loss 2.

Second, we require the first component of the discovered latent variable to be an exact reconstruction of the input signal . This significantly constrains the hypothesis class by an -fold reduction of the number of permutations of the components of , to possibilities instead of . This reduction ultimately makes the optimization problem more feasible. The first component loss is simply given by


where is the -th delay embedding of and is the corresponding latent variable transformation.

In addition to favorably constraining the optimization, this loss function promotes a more physically interpretable solution. The variable typically has some physical meaning, as it is the quantity that is actually measured experimentally. Thus, fixing the first component of to be equal to guarantees that the dynamics in the latent space are directly relevant for the measured quantity .

Loss 3.

Finally, if the constraint in equation (12) is satisfied, the first dimension of the time integration of using the SINDy model, , should correspond to the vector . In Fig. 5, we set as an initial condition where is the initial time, and integrate it using the SINDy model with a time step that matches the sampling rate of the input data . The integration can be expressed as


The corresponding SINDy consistency loss is therefore given by


where the subscript of the parentheses designates the first component of , and for . The initial condition is not included in the loss to avoid redundancy with the loss. The integration procedure is included in the network as a recurrent layer with an RK4 integration scheme.

Finally, all the losses are combined to give


with weighting coefficients

that are hyperparameters to be optimized.

The proposed algorithm constrains the hypothesis class to make the inverse problem computationally feasible and the corresponding optimal solution predictive of the measurement data. However, the choice of the weights and the losses to include are not unique. Given more information about the system, such as conservation of momentum or energy, more losses can be included to guide the optimization algorithm to find the optimal transform within the hypothesis class.

4 Results

We demonstrate the ability of the proposed deep delay autoencoder network to discover coordinates and parsimonious models for a number of canonical dynamical systems. Simultaneously learning the coordinates and dynamics is a challenging inverse problem. To solve this inverse modeling problem, and to give insight into potential pitfalls, we break it down into the following sub-problems, of increasing difficulty:

  1. Optimize and (with and without an intermediate SVD layer) with known

    ; i.e. using supervised learning.

  2. Optimize and (with and without an SVD layer) with known analytical model , by fixing the coefficient matrix .

  3. Optimize , , and , initializing in the proximity of its original value in a known system of equations.

  4. Optimize , , and with random initialization.

The challenge of solving these optimization problems, in increasing order of difficulty, is due to the size of the hypothesis class which is spanned by the fitting parameters of , , and , as illustrated in Fig. 4. Increasing the complexity of the hypothesis decreases the approximation error by capturing the nonlinear diffeomophisms and . In the chaotic systems examples used in this section, we expect to be nonlinear, with at least second order polynomial features. However, expanding the hypothesis class makes the optimization task more difficult, thus increasing the estimation error that arises because of sub-optimal fitting (e.g. local minimum, over-fitting, lack of data etc.). Accordingly, by assuming some fitting functions to be known prior to optimization, we slowly increase the size of the hypothesis class to explore its robustness and generalization.

We apply the full algorithm to the following systems: the Lorenz system, the Lotka-Volterra equations, the Rössler attractor, and a video of the Lorenz waterwheel. In the following examples, the first component of is used for the partial measurement . However, not all scalar measurements of are equally fit to discover the mappings and . For example, measurement of the third component of the Lorenz system does not satisfy Takens’ embedding theory, and therefore, the delay-embedded attractor is not diffeomorphic to the full-state attractor . Furthermore, the original full-state system may not be the only coordinate system that admits a sparse representation of the dynamics, and indeed there may be even sparser representations, as we will show for the Lorenz system. Therefore, the maps and and the dynamics may not be unique, which we will discuss in the following test cases.

The delay embedding dimension and the time step are chosen to satisfy the equality , such that the delay embedding of is most unfolded in the embedding phase space (see Appendix A); this choice is also consistent with that from HAVOK [brunton2017chaos]. Along with and , the hyperparameters over which we optimize , , and are the SVD rank dimension , the dimension of the latent variable

, the architecture of the neural network (number of nodes, number of layers, activation function etc.), and the SINDy library (polynomial order, trigonometric features, threshold value, etc.). The high dimensionality of the hyperparameter space adds to the challenge of the optimization problem, which we address in the current study by running a hyperparameter search on parallel GPU clusters.

4.1 The Lorenz System

As a benchmark problem, we investigate the three dimensional Lorenz system, which arises in atmospheric modeling and is often used to study chaos and nonlinear dynamics. The system of equations for is given by


where one typically assumes , , and are positive. The system is known to be chaotic for , , and . In this experiment, we use simulation data of the first component as the measurement , which is input to the neural network. The goal is to find a coordinate transformation from the delay-embedded coordinates to a new coordinate system where the dynamics have a sparse, closed-form representation.

Figure 6 shows that sub-problems 1 and 2 can be solved with high accuracy, when either the full state or the dynamics are known, respectively. For small , sub-problem 2 has a large error for large amplitudes of where the trajectory switches lobes. This example also explores performance with and without the use of an intermediate SVD step.

Figure 6: Learning an autoencoder map from delay embedding to the full-state attractor, either with a known (supervised) or with a known full-state model (known equation). We investigate the performance with and without using an intermediate SVD layer. The parameters are , , and , using hidden layers in both the encoder and decoder. The computational advantage of using SVD comes at the cost of lower prediction accuracy.

The diffeomorphic transformations and are approximated in consecutive nonlinear stages through the hidden layers of the autoencoder. Figure 7

illustrates the evolution of the attractor as a function of network layers (column-wise) and training epochs (row-wise) when the analytical model is known and fixed in the SINDy coefficients

. We use the three dominant modes of every layer for visualization and initialize the network to map to , where are the three dominant right singular vector modes of the Hankel matrix . This initialization of and

results in a solution that is closer to the optimal solution than random initialization, particularly by making them smooth functions. This is a form of transfer learning in the context of data-driven modeling.

Figure 7: Evolution of the attractor as a function of epochs and deep network layers for the known equation case through the encoder. The three linear dominant modes of every network layer are shown for visualization. Columns correspond to layers row to selected epochs. At epoch 0, the network is initialized to map the embedded data to the three dominant modes of at the output layer.

Initializations of

with large coefficient perturbations (sampled from a normal distribution

) around the original coefficients of consistently lead to Lorenz-like chaotic attractors with a two-lobe structure. However, the convergence of the algorithm for random initializations of is sensitive to hyperparameters and does not always give rise to a model that has a similar butterfly structure as the original Lorenz attractor. This sporadic convergence indicates that the optimizer is prone to over-fitting and getting stuck in local minima, an issue that is commonly observed in deep learning. We find that applying a separate SINDy optimization on every epochs (with ) improves sparsity and efficiency significantly, while nudging the fitting parameters just enough to avoid local minima. For a given optimal set of weighting coefficients , we discover a sparse system of equations


whose corresponding attractor is similar to the original system, as shown in Fig. 9

. The discovered latent representation of the time-delay embedded data is a almost a scaling and translation transform of the original attractor. This discrepancy is expected from a method that considers linear transforms and rotation of a given dynamical systems to be equally valid diffeomorphic solutions. Note that this Lorenz-like system is more sparse than the original system of equations, containing 6 instead of 7 terms, but with 4 nonlinear terms instead of 2. The coefficients of equation

17 also appear to be involve 3 system parameters (, and ), similar to the original system with parameters , and .

Figure 8: Coefficients evolution as a function of epochs, where the dashed lines are the coefficients of the original system and the solid lines are the evolution of the discovered coefficients (left), loss and reconstruction loss as a function of epochs (bottom right), and input reconstructions as a function of time (top right).

Figure  8 shows the evolution of the coefficients as a function of time, the autoencoder reconstruction comparing and (optimized through equation  (11)), and a comparison between the first components of and (optimized through equation  (12)). The input is nearly perfectly reconstructed at the output of the autoencoder and in the first dimension of the latent variable, with loss below . The coefficients of the discovered model diverge from the original coefficients, although with a similar two-lobe attractor dynamics.

Figure 9: Results for Lorenz, Rössler, and Lotka Volterra systems.

4.2 Other systems

We now test the algorithm on the Lotka-Volterra model and the Rössler system, as well as on a challenging example of a video of a chaotic waterwheel experiment.

Lotka Volterra.

The Lotka-Volterra equations describe the interaction of multiple species with a predator-prey relationship. In the simplest case, a system of equations that quantifies the number of a predator and prey is given by


where , , , and are parameters that depend on the death rate and reproduction rate of both species. In this case, the attractor is a family of limit cycles that depend on the input parameters and the initial condition. The discovered models are structurally similar as shown in Fig. 9.

Rössler system.

The Rössler system is a three-dimensional system of differential equations, commonly used for studying chaotic systems, given by


where , are input parameters. The attractor is known for its intermittently bursting third dimension. Figure 9 shows that the discovered model is less sparse than the original system; however, the bursting effect in the third dimension is accurately captured.

Figure 10: Lorenz waterwheel discovery from video [lorenzww_video]

. An OpenCV line detection algorithm is used to find the angle of the spokes and a k-means algorithm is used for clustering lines that detect the same spoke. The angles of the lines are post-processed to deduce one angular velocity

for each frame, which is used as an input measurement variable.

Chaotic waterwheel.

Finally, we use video footage of the Lorenz waterwheel experiment to discover a latent representation and its corresponding system of equations, as shown in Fig. 10. The video is encoded into a time series using the Hough transform to detect the spokes of the wheel, a k-means clustering algorithm to group lines that identify the same spoke, which are post-processed to calculate the angular velocity of the wheel. A latent representation of the Lorenz-like dynamics of the angular velocity is discovered by the delay SINDy autoencoder, and third degree polynomial features are required in the model for chaotic dynamics (see Appendix B). The rough attractor shape and chaotic dynamics are both reproduced. The evolution of the losses as a function of epochs is shown in Fig. 11.

Figure 11: Evolution of the losses as a function of epochs for the Lorenz waterwheel case.

5 Discussion

This work has shown that it is possible to discover a sparse, interpretable, nonlinear dynamical system from incomplete or partial measurements using a combination of data augmentation through time-delay embedding, nonlinear dimensionality reduction and coordinate transformation with deep neural network autoencoders, and model identification with SINDy. Thus, this work begins to resolve a forty year old open challenge, which began with Takens’ embedding theorem, to explicitly learn a diffeomorphism back from delay coordinates to a coordinate system where the dynamics may be efficiently represented. We also highlight the importance of choosing the proper hypothesis class, as constrained by the relevant loss function. While the resulting analytical reconstructions do not always match the original system from which the data was generated, the discovered latent variables and their underlying models are sparse, sharing key structural features with the original system when it is known. Surprisingly, we discover Lorenz-like models that are more sparse than the original Lorenz equations from 1D measurements. Our models have only six terms, while the original Lorenz equations have seven terms; however, additional sparsity comes at the cost of a higher number of quadratic terms. The method’s ability to discover models that are qualitatively and quantitatively similar to the original system, when the differential equation is initialized with perturbations around the original coefficients, suggests that the optimizer is prone to over-fitting and becoming stuck in local minima, resulting in large estimation errors. Designing and adding appropriate regularization terms to the loss function to solve this issue remains an open question and is the subject of further investigation.

It is unknown whether or not there exists a loss function for which the original system of equations can be exactly recovered from one-dimensional measurements and random fitting parameter initialization. There is always the possibility of adding more physics-informed constraints to the loss function, such as known problem-specific symmetries, to help the optimizer find the desired model. Furthermore, the diffeomorphic transform between the delay embedding and the full-state attractor can be further enforced by using a bijective neural network, i.e. an invertible neural network, that honors the one-to-one map by construction. In addition, the SINDy algorithm used in this study is relatively basic in its use of the loss for enforcing sparsity, and many improvements have been suggested in the past few years [deSilva2020JOSS, kaptanoglu2021pysindy], which can be integrated into the proposed delay SINDy autoencoder framework. Particularly, using a probabilistic [fasel2021ensemble] or a weak [Schaeffer2017prsa, gurevich2019robust, Reinbold2020pre, reinbold2021robust] SINDy algorithm would certainly improve on the results obtained from real-world noisy data such as the Lorenz waterwheel video.

Crucially, this study proposes a general framework for discovering multi-dimensional models from partial, low-dimensional data, motivating many theoretical and numerical investigations into the combined use of time-delay embedding, SINDy, and autoencoders for data-driven modeling. Future work will involve exploring various deep network architectures, such as convolutional autoencoders, for discovering physics from high-dimensional video data.


The authors acknowledge support from the Army Research Office (ARO W911NF-19-1-0045) and the National Science Foundation AI Institute in Dynamic Systems (Grant No. 2112085).

Appendix A Delay embedding hyperparameters

The time delay increment , where is the sample time and is the number of delays included in the Hankel matrix, is typically chosen to make the attractor occupy (or “unfold” in) as much of the embedded phase space as possible. Methods for choosing using measures of autocorrelation or mutual information [fraser1989information] to maximize the phase-space unfolding have been studied extensively [kennel1992method, ma2006selection, wallot2018calculation]. Sampling strategies have also been studied for discovering physics from data [champion2019data]. In this study, we choose the embedding dimension and the time delay increment to match that chosen in HAVOK [brunton2017chaos]. A comparison of the unfolding of the attractor based on alone in figure 12 shows that the optimal unfolding is for .

[trim=0 0 0 40, clip,width=14cm]figures/effect_of_tau.pdf

Figure 12: The delay time increment is chosen to optimally unfold the attractor in the embedding space. In the case of the Lorenz attractor, seems to be the most appropriate choice.

When finding the dominant modes of the Hankel matrix, similar considerations for the dimension have to be taken into account. Figure 13 shows that the first three dominant modes are representative of the majority of the variance in the data. This helps guide the choice of the latent dimension when it is not known a priori. More generally, standard techniques for determining the number of autoencoder latent variables may be used before adding the SINDy losses.

Figure 13: The first 3 SVD modes of the Hankel matrix reflect the two-lobe structure and capture a majority of the variance.

It is reasonable to ask if the the learning of coordinates and identification of models can be separated into sequential learning steps. Fig. 14 shows the latent representation from an autoencoder without SINDy constraints. Because the embedding is non-unique, the autoencoder will typically fail to capture essential symmetries from the original system, making it difficult or impossible to achieve sparse models, thus motivating the combined SINDy-autoencoder approach.

Figure 14: Autoencoder latent representation with only a reconstruction loss (without SINDy). Error is . This coordinate system is unlikely to yield a sparse model.

Appendix B Lorenz waterwheel model

The discovered Lorenz waterwheel attractor shown in Fig. 10 has the same two-lobe structure as the original Lorenz system. However, the sparsity of the data, the measurement noise that arises from the line detection algorithm, the small frame rate of the video (24 frames/second) and its limited time span make the sparse identification of the differential equation a challenging task. After optimizing over hyperparameters, a third order polynomial dictionary (with and a layer encoder/decoder) is found to recreate the chaotic dynamics of the Lorenz system. The discovered coefficients are shown in table 1, where the differential equation is written as


A sparser and more generalizing analytical model can most likely be found by collecting more data and improving the algorithm that maps the video to the angular velocity. In future studies, we will explore machine learning techniques (e.g. using a CNN) to incorporate the video to angular velocity mapping as part of the SINDy autoencoder algorithm.

-0.12 1.70 0.04 -1.79 0.16 -0.02 0.03 -0.37
-1.05 0.05 -0.28 0.12 1.50 0.60 -0.10 0.18 -0.33
0.08 -0.12 0.30 0.07 0.08 -0.29 -0.22 0.35 -0.55 0.06 -0.07 0.20 0.03 -0.25 0.28
Table 1: Discovered coefficients of the Lorenz waterwheel system.