1 Introduction
Scientific progress has been driven by the discovery of simple and predictive mathematical models from observations. Interpretable, parsimonious governing equations have been especially valuable as they typically have allowed for greater engineering insight, simple parametrizations, and improved extrapolation capabilities. Increasingly, these models are learned from data using machine learning algorithms, such as genetic programming
[schmidt2009distilling, Bongard2007pnas, Schmidt2011pb, Daniels2015naturecomm] and the sparse identification of nonlinear dynamics (SINDy) [brunton2016discovering, rudy2017data]. When only partial observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to formulate a closedform model in these variables, using either analytic or datadriven techniques. Timedelay embedding provides an approach to augment these limited measurements, and under certain conditions, given by Takens’ embedding theorem [takens1981detecting], the delayaugmented state yields an attractor that is diffeomorphic (i.e. continuously differentiably deformable) to the underlying, though unmeasured, fullstate attractor. However, the coordinate transformation back to the original attractor is unknown and may be arbitrarily complex to represent. Learning interpretable and generalizable dynamical systems models in the embedding space has remained an open challenge since Takens introduced the embedding theorem in 1981 [takens1981detecting], with several promising recent developments [Daniels2015naturecomm, somacal2020uncovering, atkinson2020bayesian, ribera2021model]. The central goal of this work is to learn parsimonious, closedform models from partially observed systems, leveraging the SINDy algorithm for model discovery and a custom deep autoencoder to learn an appropriate coordinate transformation from a timedelay embedding.Timedelay embedding has a rich history in datadriven modeling [roux1983observation, Broomhead1986physd, Crutchfield1987cs, Broomhead1989prsla, kennel1992method, deyle2011generalized, Sugihara2012science, Giannakis2012pnas, Billings2013book, ye2015distinguishing, Ye2015pnas, brunton2017chaos, Loiseau2018jfm, pan2018data, das2019delay, Giannakis2019acha, runge2019inferring, zou2019complex, pan2020structure, giannakis2020delay], especially for making predictions when analytical models are not available. Notable applications of the theory include distinguishing noise from chaotic dynamics [roux1983observation, kennel1992method], and recent work has shown that delay embedding can improve machine learning models of timeseries data [lee2019linking, Wehmeyer2018jcp]. Numerous attempts have been made to learn dynamical systems models directly in the delay embedded space. For example, when SINDy was applied to a timedelay embedding for chaotic systems, it was shown that the simplest model that describes the data is linear, establishing a strong connection to Koopman operator theory [brunton2017chaos]. This connection between delay embedding and Koopman has a longer history [mezic2004comparison, Susuki2015cdc, brunton2017chaos, arbabi2017ergodic, champion2019discovery, das2019delay, Giannakis2019acha, pan2020structure, Kamb2020siads, hirsh2021structured], including the use of dynamic mode decomposition (DMD) [Schmid2010jfm, kutz2016dynamic] on timedelay coordinates [tu2014dynamic, brunton2016extracting]. However, for chaotic systems, these linear models in delay coordinates suffer from closure issues and fall short of the goal of parsimonious nonlinear models. In related work, attempts have been made to extend SINDy to partially measured systems, using the variational annealing data assimilation technique [ribera2021model] or applying SINDy to higher derivatives of the measured variables [somacal2020uncovering], which is closely related to timedelay embedding.
Because the data in the delayembedded space is diffeomorphic to the fullstate attractor, where a parsimonious nonlinear model often exists, it is reasonable to try and learn a coordinate transformation starting from a delayembedded state. Figure 1
shows a schematic of the modeling task, involving learning a coordinate transform into a space where an analytical model may be obtained. Deep neural network autoencoders have shown great promise in representing arbitrarily complex coordinate transformations, especially for dynamical systems
[Takeishi2017nips, lusch2018deep, Yeung2017arxiv, Mardt2018natcomm, Wehmeyer2018jcp, champion2019data, Otto2019siads, linot2020deep, gilpin2020deep, lee2020model, kalia2021learning]. Importantly, Gilpin [gilpin2020deep]introduced a false nearest neighbors loss function in a deep neural network to promote embedded attractors that are similar to the fullstate attractor. Similarly, Wehmeyer and Noé
[Wehmeyer2018jcp] showed that autoencoders may be combined with timedelayed measurements to learn effective slow reaction coordinates for molecular kinetics. However, neither of these studies learned parsimonious models in the autoencoder coordinates. Champion et al. [champion2019data] showed that it is possible to combine autoencoder networks with SINDy to simultaneously learn a nonlinear coordinate transformation and discover a parsimonious dynamics model, making this a promising technique for learning models in delayembedded coordinates.In this work, we leverage the SINDyautoencoder framework [champion2019data] to learn parsimonious governing dynamical systems from delayembedding of partial observations, as summarized in Fig. 2. In general, model discovery from fullstate measurements is an illposed inverse problem, and model discovery from limited measurements compounds this challenge. As with other custom autoencoders that are used to learn dynamics [lusch2018deep, champion2019data]
, there are several critical steps, including choosing the architecture, designing the loss function, initializing the parameters, and tuning the optimization algorithm used to train the model. In addition to demonstrating this algorithm to learn closedform models from partial observations of several dynamical systems, we describe how this inverse modeling problem may be broken down into subproblems. In general, starting with a problem where the answer is known is important to guide the design of the architecture and loss functions. Thus, we show how the architecture and individual terms in the loss function affect the learning process with different levels of assumed knowledge. We also investigate what preprocessing should be applied to the data, such as whether or not to apply the singular value decomposition to the delay coordinates before the autoencoder network. Incredibly, we show that for the Lorenz system, from a partial observation of only the first coordinate, it is possible to learn a coordinate system in which the dynamics may be represented by a differential equation with even fewer terms than the original Lorenz system; the original system has 7 linear and quadratic terms, while our model only has 6 terms. We also show that it is possible to learn a Lorenz analogue from a video sequence of a chaotic waterwheel experiment.
The rest of the paper is organized as follows. First, we provide a brief problem definition in Sec. 1.1. Section 2 has background information about SINDy and how it has been combined with autoencoders. Section 3 discusses the proposed delay SINDy autoencoder, including the architecture and custom loss functions. The main results from numerical experiments are provided in Sec. 4. We analyze several canonical nonlinear and chaotic systems, in addition to video data from a chaotic waterwheel experiment that may be viewed as a Lorenz analogue. Finally, conclusions and discussions are provided in Sec. 5
1.1 Mathematical problem definition
An overarching inverse problem in datadriven dynamical systems is to learn a model for the evolution of a system, purely from observation data. Mathematically, this inverse problem can be stated as follows. Let , an
dimensional timedependent state vector defined on the domain
, with and , be the solution of a nonlinear differential equation(1) 
where is a smooth and nonlinear function and a vector of system parameters. Further, let represent noisy measurements of the system, given by
(2) 
where is a noise process. The goal is then to learn an approximate dynamical system
(3) 
in terms of a new state , which may either be the measured state or an invertible function of :
(4) 
If observation data for the full state are available, so that and , an approximation of can be inferred using a variety of machine learning algorithms. Although neural networks have been shown to provide useful predictive models [pathak2018model, vlachas2018data, wu2018deep, raissi2019physics, mardt2020deep], they don’t result in interpretable, closedform expressions such as (3) that are amenable to analysis. Parsimonious modeling techniques, such as genetic programming [schmidt2009distilling, Bongard2007pnas] and SINDy [brunton2016discovering, rudy2017data], are able to learn analytic, closedform models that often capture the exact form of the original dynamics (1) when they are known.
However, in many real world applications, only partial measurements are available, so the dimension of is less than the dimension of , i.e. . In general, it is not possible to learn directly in its original dimension in terms of the measurement vector . A direct approach of fitting a recurrent relation for
, using recurrent neural networks, Bayesian methods, Gaussian process regression, etc., or discovering the underlying differential equation using SINDy, do not generalize well if the dynamics of
are nonlinear and depend on hidden variables that control its evolution. The challenge of representing the dynamics in terms of a partial observation of the fullstate is known as the closure problem, and it is one of the most fundamental issues in scientific modeling, with a particularly rich history in turbulence modeling [ling2016reynolds, pan2018data, duraisamy2019turbulence, brunton2020machine].There are several embedding techniques [Coifman2008mmas, Yair2017pnas, brunton2017chaos, gilpin2020deep] to enrich the partial measurement vector, providing a map as in (4), although it remains a challenge to learn the dynamics (3) in these coordinates. Therefore, in general, it is necessary to simultaneously learn the coordinate transformation (4) and the dynamics (3). For simplicity, we will assume scalar measurements , although the following arguments generalize to multidimensional vectors of partial measurements. Before learning the map (4), we first timedelay embed the measurement , resulting in the intermediate vector , where and are the number of delays and the time increments between successive measurements respectively. Takens’ theorem [takens1981detecting] provides conditions for when timedelay embedding results in an attractor that is diffeomorphic to the original system. We then simultaneously learn the map (4) from and the dynamics (3) using a modification of the SINDyautoencoder framework [champion2019discovery]. The goal is an interpretable differential equation in the transformed embedded variable. Takens’ embedding theorem also puts a condition on the embedding dimension to be for the diffeomorphism to be guaranteed.
2 Background
In this section, we review timedelay embedding, the sparse identification of nonlinear dynamics (SINDy) [brunton2016discovering] algorithm, and the recent combination of SINDy with deep autoencoders [champion2019discovery].
Delay embedding.
When defining the delay embedding variable of a measurement time series , the proper choice of and have been studied extensively [ma2006selection, kennel1992determining] (see appendix A). The embedded variable can be assembled into a Hankel matrix
(5) 
The corresponding eigentimedelays are given by the columns of
from the singular value decomposition (SVD)
, and they are used for linear system identification [Juang1985jgcd, Juang1991nasatm, Phan:1992, Juang1994book, Brunton2019book] and in the Hankel alternative view of Koopman (HAVOK) algorithm [brunton2017chaos]. The first few eigentimedelays, whose rank can be chosen based on a Pareto front analysis, are often sufficient to encode the shape of a given signal, as shown in Fig. 13.SINDy.
The SINDy algorithm frames the problem of discovering the model in (3
) as a sparse generalized linear regression
[brunton2016discovering]. SINDy assumes a library of candidate functions that determine possible terms in . Given discretetime samples of and , these may be cast into data matrices and , where , . Assuming candidate functions, with , it is possible to build the library data matrix . The SINDy algorithm then minimizes the loss(6) 
where the sparse matrix determines which candidate terms are active in the dynamics, so that . The sparsity constraint minimizes the number of nonzero terms in , but it requires an intractable combinatorial search. Thus, it is either replaced by or a sequentially thresholded leastsquare algorithm for computational efficiency. A sparse regularizer is motivated by Occam’s razor, which prefers simpler and more interpretable models. SINDy has been widely applied to model a range of complex systems [Schaeffer2017prsa, boninsegna2018sparse, Kaiser2018prsa, Loiseau2017jfm, Brunton2019book, Gelss2019mindy, gurevich2019robust, beetham2020formulating, Reinbold2020pre, schmelzer2020discovery, suri2020capturing, beetham2021sparse, reinbold2021robust].
SINDy autoencoders.
SINDy is most effective when the choice of the dictionary is informed by the structure of the problem. When the potential terms of the differential equations we seek are not known a priori, or if they include integral operators (e.g. nonlocal closures [bakarji2021data]), a naive implementation of SINDy gives suboptimal results. A particularly versatile solution is to use autoencoders to transform the coordinates of the input variable before discovering a differential equation, as in Champion et al. [champion2019discovery]. The resulting SINDy autoencoder algorithm is summarized in Fig. 3. We will expand on this algorithm to address the problem of datadriven discovery of highdimensional equations from lowdimensional measurements.
SINDy autoencoders simultaneously transform measurement data into a lower dimensional variable and discover a nonlinear differential equation in . The main element of the algorithm is an autoencoder that requires a reconstruction loss
(7) 
In addition, the method incorporates the SINDy loss given in equation (6) in two different ways. First, the difference between the output of the SINDy layer,
, and that computed by the chain rule
, is minimized. This SINDy loss in is given by(8) 
Second, a consistency loss ensuring that can be reconstructed from using the path is included. Using the fact that
the resulting loss in is given by
(9) 
Finally, a sparsity constraint
is included in the loss function to encourage the discovery of a parsimonious model. While the choice of these losses is sufficient for some highdimensional data, the current problem requires more loss terms to account for delayed inputs.
3 Delay SINDy Autoencoders
We now leverage the SINDy autoencoders algorithm to learn coordinates and dynamics starting from a delayembedding. The algorithm is extended by adding losses that take advantage of timedelay embedding properties.
Let be the embedding of discrete timeseries data , given by the column vectors of the Hankel matrix in (5). Accordingly, is the th entry of the embedding vector . Our approach relies on the following premises:

Takens’ embedding theorem proves that a diffeomorphic map between the delayembedded attractor and the original attractor exists under certain conditions. The theorem doesn’t specify how to find this map.

The universal approximation properties of neural networks [csaji2001approximation] can approximate this diffeomorphism and others where the dynamics are simplified.
The goal is then to design a loss function whose optimal solution is a predictive model with a sparse analytical form. Ideally, the optimal sparse differential equation of the latent variable, , is similar to that of
when it is known. Finding the best loss function is matter of making the hypothesis class as small as possible and as close as possible to the optimal mappings in such a way to minimize the approximation and estimation errors. This is illustrated in Fig.
4. We explore these premises in Sec. 4.In addition to the SINDy autoencoder losses in Sec. 2, we introduce three losses that take advantage of a priori knowledge of timedelay embedding properties, as illustrated in Fig. 5.
Loss 1.
First, the dimensionality reduction of can be performed using linear techniques, such as computing the SVD of the Hankel matrix in (5), or using nonlinear techniques, such as a deep autoencoder network. The diffeomorphism between and is expected to be highly nonlinear, which motivates the use of neural network. However, it is possible to perform an initial linear dimensionality reduction, using the first dominant modes of the Hankel matrix as an input to the autoencoder. This step can filter out noise and improve the computational efficiency of the method. If , then a linear projection of on an intermediate dimension
preserves the majority of the variance in the signal.
In this case, the SVD projection and its inverse are included as part of the encoding and decoding of the timeseries . We define to be the first columns of , and the first rows and columns of from the SVD of . The SINDy losses in and , given in the original SINDy autoencoder algorithm by (8) and (9) respectively, now become
(10a)  
(10b) 
where has been replaced by for consistency. Furthermore, the reconstruction loss is obtained from the difference between the feedforward prediction and the input
(11) 
Loss 2.
Second, we require the first component of the discovered latent variable to be an exact reconstruction of the input signal . This significantly constrains the hypothesis class by an fold reduction of the number of permutations of the components of , to possibilities instead of . This reduction ultimately makes the optimization problem more feasible. The first component loss is simply given by
(12) 
where is the th delay embedding of and is the corresponding latent variable transformation.
In addition to favorably constraining the optimization, this loss function promotes a more physically interpretable solution. The variable typically has some physical meaning, as it is the quantity that is actually measured experimentally. Thus, fixing the first component of to be equal to guarantees that the dynamics in the latent space are directly relevant for the measured quantity .
Loss 3.
Finally, if the constraint in equation (12) is satisfied, the first dimension of the time integration of using the SINDy model, , should correspond to the vector . In Fig. 5, we set as an initial condition where is the initial time, and integrate it using the SINDy model with a time step that matches the sampling rate of the input data . The integration can be expressed as
(13) 
The corresponding SINDy consistency loss is therefore given by
(14) 
where the subscript of the parentheses designates the first component of , and for . The initial condition is not included in the loss to avoid redundancy with the loss. The integration procedure is included in the network as a recurrent layer with an RK4 integration scheme.
Finally, all the losses are combined to give
(15) 
with weighting coefficients
that are hyperparameters to be optimized.
The proposed algorithm constrains the hypothesis class to make the inverse problem computationally feasible and the corresponding optimal solution predictive of the measurement data. However, the choice of the weights and the losses to include are not unique. Given more information about the system, such as conservation of momentum or energy, more losses can be included to guide the optimization algorithm to find the optimal transform within the hypothesis class.
4 Results
We demonstrate the ability of the proposed deep delay autoencoder network to discover coordinates and parsimonious models for a number of canonical dynamical systems. Simultaneously learning the coordinates and dynamics is a challenging inverse problem. To solve this inverse modeling problem, and to give insight into potential pitfalls, we break it down into the following subproblems, of increasing difficulty:

Optimize and (with and without an intermediate SVD layer) with known
; i.e. using supervised learning.

Optimize and (with and without an SVD layer) with known analytical model , by fixing the coefficient matrix .

Optimize , , and , initializing in the proximity of its original value in a known system of equations.

Optimize , , and with random initialization.
The challenge of solving these optimization problems, in increasing order of difficulty, is due to the size of the hypothesis class which is spanned by the fitting parameters of , , and , as illustrated in Fig. 4. Increasing the complexity of the hypothesis decreases the approximation error by capturing the nonlinear diffeomophisms and . In the chaotic systems examples used in this section, we expect to be nonlinear, with at least second order polynomial features. However, expanding the hypothesis class makes the optimization task more difficult, thus increasing the estimation error that arises because of suboptimal fitting (e.g. local minimum, overfitting, lack of data etc.). Accordingly, by assuming some fitting functions to be known prior to optimization, we slowly increase the size of the hypothesis class to explore its robustness and generalization.
We apply the full algorithm to the following systems: the Lorenz system, the LotkaVolterra equations, the Rössler attractor, and a video of the Lorenz waterwheel. In the following examples, the first component of is used for the partial measurement . However, not all scalar measurements of are equally fit to discover the mappings and . For example, measurement of the third component of the Lorenz system does not satisfy Takens’ embedding theory, and therefore, the delayembedded attractor is not diffeomorphic to the fullstate attractor . Furthermore, the original fullstate system may not be the only coordinate system that admits a sparse representation of the dynamics, and indeed there may be even sparser representations, as we will show for the Lorenz system. Therefore, the maps and and the dynamics may not be unique, which we will discuss in the following test cases.
The delay embedding dimension and the time step are chosen to satisfy the equality , such that the delay embedding of is most unfolded in the embedding phase space (see Appendix A); this choice is also consistent with that from HAVOK [brunton2017chaos]. Along with and , the hyperparameters over which we optimize , , and are the SVD rank dimension , the dimension of the latent variable
, the architecture of the neural network (number of nodes, number of layers, activation function etc.), and the SINDy library (polynomial order, trigonometric features, threshold value, etc.). The high dimensionality of the hyperparameter space adds to the challenge of the optimization problem, which we address in the current study by running a hyperparameter search on parallel GPU clusters.
4.1 The Lorenz System
As a benchmark problem, we investigate the three dimensional Lorenz system, which arises in atmospheric modeling and is often used to study chaos and nonlinear dynamics. The system of equations for is given by
(16a)  
(16b)  
(16c) 
where one typically assumes , , and are positive. The system is known to be chaotic for , , and . In this experiment, we use simulation data of the first component as the measurement , which is input to the neural network. The goal is to find a coordinate transformation from the delayembedded coordinates to a new coordinate system where the dynamics have a sparse, closedform representation.
Figure 6 shows that subproblems 1 and 2 can be solved with high accuracy, when either the full state or the dynamics are known, respectively. For small , subproblem 2 has a large error for large amplitudes of where the trajectory switches lobes. This example also explores performance with and without the use of an intermediate SVD step.
The diffeomorphic transformations and are approximated in consecutive nonlinear stages through the hidden layers of the autoencoder. Figure 7
illustrates the evolution of the attractor as a function of network layers (columnwise) and training epochs (rowwise) when the analytical model is known and fixed in the SINDy coefficients
. We use the three dominant modes of every layer for visualization and initialize the network to map to , where are the three dominant right singular vector modes of the Hankel matrix . This initialization of andresults in a solution that is closer to the optimal solution than random initialization, particularly by making them smooth functions. This is a form of transfer learning in the context of datadriven modeling.
Initializations of
with large coefficient perturbations (sampled from a normal distribution
) around the original coefficients of consistently lead to Lorenzlike chaotic attractors with a twolobe structure. However, the convergence of the algorithm for random initializations of is sensitive to hyperparameters and does not always give rise to a model that has a similar butterfly structure as the original Lorenz attractor. This sporadic convergence indicates that the optimizer is prone to overfitting and getting stuck in local minima, an issue that is commonly observed in deep learning. We find that applying a separate SINDy optimization on every epochs (with ) improves sparsity and efficiency significantly, while nudging the fitting parameters just enough to avoid local minima. For a given optimal set of weighting coefficients , we discover a sparse system of equations(17a)  
(17b)  
(17c) 
whose corresponding attractor is similar to the original system, as shown in Fig. 9
. The discovered latent representation of the timedelay embedded data is a almost a scaling and translation transform of the original attractor. This discrepancy is expected from a method that considers linear transforms and rotation of a given dynamical systems to be equally valid diffeomorphic solutions. Note that this Lorenzlike system is more sparse than the original system of equations, containing 6 instead of 7 terms, but with 4 nonlinear terms instead of 2. The coefficients of equation
17 also appear to be involve 3 system parameters (, and ), similar to the original system with parameters , and .Figure 8 shows the evolution of the coefficients as a function of time, the autoencoder reconstruction comparing and (optimized through equation (11)), and a comparison between the first components of and (optimized through equation (12)). The input is nearly perfectly reconstructed at the output of the autoencoder and in the first dimension of the latent variable, with loss below . The coefficients of the discovered model diverge from the original coefficients, although with a similar twolobe attractor dynamics.
4.2 Other systems
We now test the algorithm on the LotkaVolterra model and the Rössler system, as well as on a challenging example of a video of a chaotic waterwheel experiment.
Lotka Volterra.
The LotkaVolterra equations describe the interaction of multiple species with a predatorprey relationship. In the simplest case, a system of equations that quantifies the number of a predator and prey is given by
(18a)  
(18b) 
where , , , and are parameters that depend on the death rate and reproduction rate of both species. In this case, the attractor is a family of limit cycles that depend on the input parameters and the initial condition. The discovered models are structurally similar as shown in Fig. 9.
Rössler system.
The Rössler system is a threedimensional system of differential equations, commonly used for studying chaotic systems, given by
(19a)  
(19b)  
(19c) 
where , are input parameters. The attractor is known for its intermittently bursting third dimension. Figure 9 shows that the discovered model is less sparse than the original system; however, the bursting effect in the third dimension is accurately captured.
Chaotic waterwheel.
Finally, we use video footage of the Lorenz waterwheel experiment to discover a latent representation and its corresponding system of equations, as shown in Fig. 10. The video is encoded into a time series using the Hough transform to detect the spokes of the wheel, a kmeans clustering algorithm to group lines that identify the same spoke, which are postprocessed to calculate the angular velocity of the wheel. A latent representation of the Lorenzlike dynamics of the angular velocity is discovered by the delay SINDy autoencoder, and third degree polynomial features are required in the model for chaotic dynamics (see Appendix B). The rough attractor shape and chaotic dynamics are both reproduced. The evolution of the losses as a function of epochs is shown in Fig. 11.
5 Discussion
This work has shown that it is possible to discover a sparse, interpretable, nonlinear dynamical system from incomplete or partial measurements using a combination of data augmentation through timedelay embedding, nonlinear dimensionality reduction and coordinate transformation with deep neural network autoencoders, and model identification with SINDy. Thus, this work begins to resolve a forty year old open challenge, which began with Takens’ embedding theorem, to explicitly learn a diffeomorphism back from delay coordinates to a coordinate system where the dynamics may be efficiently represented. We also highlight the importance of choosing the proper hypothesis class, as constrained by the relevant loss function. While the resulting analytical reconstructions do not always match the original system from which the data was generated, the discovered latent variables and their underlying models are sparse, sharing key structural features with the original system when it is known. Surprisingly, we discover Lorenzlike models that are more sparse than the original Lorenz equations from 1D measurements. Our models have only six terms, while the original Lorenz equations have seven terms; however, additional sparsity comes at the cost of a higher number of quadratic terms. The method’s ability to discover models that are qualitatively and quantitatively similar to the original system, when the differential equation is initialized with perturbations around the original coefficients, suggests that the optimizer is prone to overfitting and becoming stuck in local minima, resulting in large estimation errors. Designing and adding appropriate regularization terms to the loss function to solve this issue remains an open question and is the subject of further investigation.
It is unknown whether or not there exists a loss function for which the original system of equations can be exactly recovered from onedimensional measurements and random fitting parameter initialization. There is always the possibility of adding more physicsinformed constraints to the loss function, such as known problemspecific symmetries, to help the optimizer find the desired model. Furthermore, the diffeomorphic transform between the delay embedding and the fullstate attractor can be further enforced by using a bijective neural network, i.e. an invertible neural network, that honors the onetoone map by construction. In addition, the SINDy algorithm used in this study is relatively basic in its use of the loss for enforcing sparsity, and many improvements have been suggested in the past few years [deSilva2020JOSS, kaptanoglu2021pysindy], which can be integrated into the proposed delay SINDy autoencoder framework. Particularly, using a probabilistic [fasel2021ensemble] or a weak [Schaeffer2017prsa, gurevich2019robust, Reinbold2020pre, reinbold2021robust] SINDy algorithm would certainly improve on the results obtained from realworld noisy data such as the Lorenz waterwheel video.
Crucially, this study proposes a general framework for discovering multidimensional models from partial, lowdimensional data, motivating many theoretical and numerical investigations into the combined use of timedelay embedding, SINDy, and autoencoders for datadriven modeling. Future work will involve exploring various deep network architectures, such as convolutional autoencoders, for discovering physics from highdimensional video data.
Acknowledgments
The authors acknowledge support from the Army Research Office (ARO W911NF1910045) and the National Science Foundation AI Institute in Dynamic Systems (Grant No. 2112085).
Appendix A Delay embedding hyperparameters
The time delay increment , where is the sample time and is the number of delays included in the Hankel matrix, is typically chosen to make the attractor occupy (or “unfold” in) as much of the embedded phase space as possible. Methods for choosing using measures of autocorrelation or mutual information [fraser1989information] to maximize the phasespace unfolding have been studied extensively [kennel1992method, ma2006selection, wallot2018calculation]. Sampling strategies have also been studied for discovering physics from data [champion2019data]. In this study, we choose the embedding dimension and the time delay increment to match that chosen in HAVOK [brunton2017chaos]. A comparison of the unfolding of the attractor based on alone in figure 12 shows that the optimal unfolding is for .
When finding the dominant modes of the Hankel matrix, similar considerations for the dimension have to be taken into account. Figure 13 shows that the first three dominant modes are representative of the majority of the variance in the data. This helps guide the choice of the latent dimension when it is not known a priori. More generally, standard techniques for determining the number of autoencoder latent variables may be used before adding the SINDy losses.
It is reasonable to ask if the the learning of coordinates and identification of models can be separated into sequential learning steps. Fig. 14 shows the latent representation from an autoencoder without SINDy constraints. Because the embedding is nonunique, the autoencoder will typically fail to capture essential symmetries from the original system, making it difficult or impossible to achieve sparse models, thus motivating the combined SINDyautoencoder approach.
Appendix B Lorenz waterwheel model
The discovered Lorenz waterwheel attractor shown in Fig. 10 has the same twolobe structure as the original Lorenz system. However, the sparsity of the data, the measurement noise that arises from the line detection algorithm, the small frame rate of the video (24 frames/second) and its limited time span make the sparse identification of the differential equation a challenging task. After optimizing over hyperparameters, a third order polynomial dictionary (with and a layer encoder/decoder) is found to recreate the chaotic dynamics of the Lorenz system. The discovered coefficients are shown in table 1, where the differential equation is written as
(20) 
A sparser and more generalizing analytical model can most likely be found by collecting more data and improving the algorithm that maps the video to the angular velocity. In future studies, we will explore machine learning techniques (e.g. using a CNN) to incorporate the video to angular velocity mapping as part of the SINDy autoencoder algorithm.
0.12  1.70  0.04  1.79  0.16  0.02  0.03  0.37  
1.05  0.05  0.28  0.12  1.50  0.60  0.10  0.18  0.33  
0.08  0.12  0.30  0.07  0.08  0.29  0.22  0.35  0.55  0.06  0.07  0.20  0.03  0.25  0.28 
Comments
There are no comments yet.