Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data

by   Shaowu Pan, et al.
University of Washington

High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterization, design, and control of such large-scale systems often rely on dimensionality reduction to make solutions computationally tractable in real-time. Common existing paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often requires variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies. To resolve these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatial-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling the interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.





1 Introduction

Machine learning and artificial intelligence algorithms have broadly transformed science and engineering, including the application areas of computer vision (Krizhevsky et al., 2012), natural language processing (Sutskever et al., 2014), molecular dynamics (Zhang et al., 2018; Mardt et al., 2018), and dynamical systems (Brunton and Kutz, 2019). The subfield of scientific machine learning, which often focuses on modeling, characterization, design, and control of large-scale, physics-based models, has also experienced significant growth. Despite the achievements in scientific machine learning (Duraisamy et al., 2019; Karniadakis et al., 2021; Kutz, 2017; Brunton et al., 2020), there remain significant challenges in the representation of high-dimensional spatio-temporal dynamics, which are often modeled by nonlinear partial differential equations (PDEs). Data-driven modeling of PDE systems often relies on a more advantageous representation of the physics. In general, manifold-based methods are a dominant paradigm (Hesthaven et al., 2016; Carlberg et al., 2011; Peherstorfer and Willcox, 2016; Zahr and Farhat, 2015; Benner et al., 2015). However, there are recent innovations in developing mesh-based methods (Long et al., 2018; Zhu and Zabaras, 2018; Geneva and Zabaras, 2020; Bar-Sinai et al., 2019; Li et al., 2020a; Pfaff et al., 2020) and mesh-agnostic methods (Raissi et al., 2020; Lu et al., 2021a, b; Sun et al., 2020). Despite the diversity of algorithmic innovations, each has various challenges in efficiently or accurately representing complex spatio-temporal dynamics. Indeed, practical engineering applications require handling variable geometry, non-uniform grid resolutions, adaptive meshes, and/or parametric dependencies. We advocate a general mathematical framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatial-temporal data. NIF leverages a hypernetwork structure that allows one to isolate the spatial complexity, accounting for all other complexity in a second network where parametric dependencies, time, and sensor measurements are encoded and modulate the spatial layer. We show that NIF is highly advantageous for representing spatio-temporal dynamics in comparison with current methods.

Spatio-temporal data is ubiquitous. As such, a diversity of methods has been developed to characterize the underlying physics. In manifold-based modeling, which is the most dominant paradigm for reduced-order modeling, one first extracts a low-dimensional manifold from the solution of a PDE, typically using the singular value decomposition (Benner et al., 2015; Noack et al., 2003; Rowley et al., 2004) or a convolutional autoencoder (CAE) (Brunton and Kutz, 2019; Holmes et al., 2012; Mohan et al., 2019; Murata et al., 2020; Xu and Duraisamy, 2020; Lee and You, 2019; Ahmed et al., 2021b). Then one either directly solves the projected governing equation on the manifold (Carlberg et al., 2011, 2013) or learns the projected dynamics from the data on the manifold with Long Short-Term Memory (LSTM) networks (Mohan and Gaitonde, 2018), artificial neural networks (Pan and Duraisamy, 2018b; San et al., 2019; Lui and Wolf, 2019), or polynomials (Qian et al., 2020; Brunton et al., 2016; Peherstorfer and Willcox, 2016). Other variants include jointly learning the dynamics together with the manifold (Champion et al., 2019; Kalia et al., 2021; Lusch et al., 2018; Takeishi et al., 2017; Yeung et al., 2019; Otto and Rowley, 2019; Pan and Duraisamy, 2020; Lange et al., 2021; Mardt et al., 2020), and closure modeling to account for non-Markovian effects (Pan and Duraisamy, 2018a; Wang et al., 2020; Maulik et al., 2020; Ahmed et al., 2021a). Manifold-based approaches first reduce the prohibitively large spatial degrees of freedom (e.g., millions of grid points in fluid dynamics) into a moderate number by learning a low-dimensional representation, on which we project and solve the PDE, thus inheriting the physics (Carlberg et al., 2011), or simply perform time-series modeling on the reduced manifold (Xu and Duraisamy, 2020) or modal expansion (Taira et al., 2017). Equivalently, this preprocessing step can be viewed as learning a time-dependent, vector-valued, low-dimensional representation of a spatio-temporal field. More broadly, learning an effective low-dimensional representation is often domain specific, for example, using a real-valued matrix with RGB channels for representing images in computer vision (Szeliski, 2010), Word2Vec for representing words in natural language processing (NLP) (Mikolov et al., 2013), and spectrograms for representing audio signals in speech recognition (Flanagan, 2013), among others.

Manifold-based methods have several drawbacks when applied to realistic spatio-temporal data. Specifically, practical engineering applications require handling variable geometry, non-uniform grid resolutions, adaptive meshes, and/or parametric dependencies. This includes data from incompressible flow in unbounded domains (Yu et al., 2022), combustion (Bell and Day, 2011), astrophysics (Almgren et al., 2010), multiphase flows (Sussman, 2005), and fluid-structure interactions (Bhalla et al., 2013), which are generated with advanced meshing techniques such as adaptive mesh refinement (AMR) (Berger and Oliger, 1984) and/or overset grids (Chan, 2009). Such meshes typically change with time or parameters (e.g., the dependency of shock location on Mach number) in order to efficiently capture multi-scale phenomena (for some systems (Bryan et al., 2014), AMR is the only way to make such simulations possible), which violates the fixed-mesh requirement of common SVD approaches. CNNs, meanwhile, require preprocessing the flowfield as an image, i.e., a voxel representation on a uniform Cartesian grid, in order to perform discrete convolution; this is affordable in 2D but becomes increasingly expensive in 3D (Park et al., 2019) due to cubic memory requirements. The memory footprint typically limits the attainable resolution (Mescheder et al., 2019) unless one resorts to an optimized parallel implementation of 3D CNNs on clusters (Mathuriya et al., 2018). Additionally, such uniform processing is inherently inconsistent with the nonuniform, multi-scale nature of PDEs and can decrease the prediction accuracy of downstream tasks, e.g., failing to accurately recover total lift/drag from flowfield predictions (Bhatnagar et al., 2019) or to efficiently capture small-scale geometry variations that can trigger critical physical phenomena. This leads us to pursue a mesh-agnostic and expressive paradigm beyond SVD and CAE for dimensionality reduction of spatio-temporal data.

Recent advances in mesh-based methods (in this paper, "mesh-based" is synonymous with "graph-based"; a key characteristic of graph-based methods is that the online computational cost scales with the size of the graph, i.e., the number of grid points (Chen et al., 2021), whereas manifold-based methods only access the mesh during postprocessing) have shown promising results, either with discretization-invariant operator learning (Li et al., 2020a, c, b) or with meshes taken from the numerical solver (Pfaff et al., 2020; Sanchez-Gonzalez et al., 2020; Xu et al., 2021). On the other hand, mesh-agnostic frameworks (e.g., the physics-informed neural network (PINN) (Raissi et al., 2020)) have been successfully applied to solving canonical PDEs on problems where mesh-based algorithms can be cumbersome due to, for instance, arbitrary geometry (Berg and Nyström, 2018; Sun et al., 2020) and/or high dimensionality (Sirignano and Spiliopoulos, 2018). In addition to leveraging the governing equation of the PDE, the PINN successfully leverages the structure of multilayer perceptrons (MLPs) with the coordinate as input and the solution field as output. It can be viewed as using an adaptive global basis in space-time to approximate the solution of a known PDE, instead of the traditional local polynomial basis inside a cell, as in, e.g., finite element analysis (Hughes, 2012). Concurrently, such MLP structures have also been employed in the computer graphics community for learning representations of 3D shapes (Park et al., 2019; Mescheder et al., 2019) and scenes (Mildenhall et al., 2020; Sitzmann et al., 2020; Tancik et al., 2020). A closely related work called MeshfreeFlowNet (MFN) (Esmaeilzadeh et al., 2020) uses a CNN and a coordinate-based MLP to perform super-resolution for fluid problems. MFN first uses a 2D CNN to extract features from a coarse-scale spatial-temporal field. The extracted features are then concatenated with the space-time coordinate as an augmented input to an MLP, which outputs the high-resolution measurement at that coordinate.

Motivated by the above seminal works, we introduce a mesh-agnostic representation learning paradigm called neural implicit flow (NIF) that exploits the expressiveness and flexibility of the multilayer perceptrons (MLPs) for dimensionality reduction of parametric spatio-temporal fields. In section 2, we present the NIF paradigm and several derived problem-specific frameworks. In section 3, we demonstrate the following capabilities of NIF:

  1. To our knowledge, NIF enables the first scalable 3D nonlinear dimensionality reduction for spatio-temporal datasets on arbitrary, differing meshes. Examples include a three-dimensional fully turbulent flow with over 2 million cells (see section 3.3).

  2. NIF also enables modal analysis of spatial-temporal dynamics on adaptive meshes. As an example, we explore dynamic mode decomposition on the fluid flow past a cylinder with adaptive mesh refinement (see section 3.5).

NIF also provides a performance improvement in the following applications on several canonical spatio-temporal dynamics:

  1. For mesh-agnostic surrogate modeling of the parametric Kuramoto–Sivashinsky PDE, NIF generalizes 40% better in terms of root-mean-square error (RMSE) than a generic mesh-agnostic MLP with the same amount of training data or trainable parameters (see section 3.1).

  2. NIF outperforms both (linear) SVD and (nonlinear) CAE for nonlinear dimensionality reduction, as demonstrated on the Rayleigh-Taylor instability on an adaptive mesh, with error reductions ranging from 50% to a factor of 10 (see section 3.2).

  3. Compared with the original implicit neural representation, which takes all information (including time $t$) into a single feedforward network (Sitzmann et al., 2020; Lu et al., 2021c), NIF enables efficient spatial sampling with 30% less CPU time and around 26% less memory consumption at the same level of accuracy when learning the aforementioned turbulence dataset (see section 3.4).

  4. NIF outperforms a state-of-the-art method (POD-QDEIM (Drmac and Gugercin, 2016)) on the task of data-driven sparse sensing, with 34% smaller testing error on the sea surface temperature dataset (see section 3.6).

Finally, conclusions and future perspectives of NIF are presented in section 4.

2 Neural Implicit Flow

We begin by considering 3D spatio-temporal data with varying time/parameters, namely $u(x, t, \mu)$, with spatial coordinate $x \in \Omega \subseteq \mathbb{R}^3$, time $t$, and parameters $\mu$ (e.g., Reynolds number). Without loss of generality, consider a supervised learning problem: using an $L$-layer MLP with spatial coordinate $x$ as input to fit a single spatial realization at an arbitrary time $t'$ and parameter $\mu'$, i.e., $u(x, t', \mu')$. Such an $L$-layer MLP is a vector-valued function $\hat{u}(x) = h_L$, defined recursively as $h_0 = x$, $h_\ell = \sigma(W_\ell h_{\ell-1} + b_\ell)$ for $\ell = 1, \ldots, L-1$, and $h_L = W_L h_{L-1} + b_L$. The first-layer weight $W_1$ has size $w \times 3$, each remaining hidden-layer weight $W_\ell$ has size $w \times w$, and the last-layer weight $W_L$ has size $o \times w$, where $w$ is the hidden width and $o$ the output dimension. Biases are denoted as $b_\ell$, where the subscript $\ell$ denotes the index of the layer. The sets of weights and biases are defined as $\mathcal{W} = \{W_\ell\}_{\ell=1}^{L}$ and $\mathcal{B} = \{b_\ell\}_{\ell=1}^{L}$, correspondingly. The activation $\sigma$ is a non-linear continuous function. One can use gradient-based optimization to find the weights and biases that minimize the discrepancy between $\hat{u}(x)$ and the single snapshot data at $t'$ and $\mu'$:

$$ \min_{\mathcal{W}, \mathcal{B}} \int_{\Omega} \mathcal{L}\left(\hat{u}(x),\, u(x, t', \mu')\right) d\nu(x), \qquad (1) $$

where $\mathcal{L}$ is a loss function (e.g., mean squared error, mean absolute error) and $\nu$ is some spatial measure, which can depend on $t'$ and $\mu'$. This minimization leads to a key observation: a well-trained set of weights and biases of the MLP fully determines a spatial field that approximates the target field, in a mesh-agnostic sense. Such a trained MLP (what we call ShapeNet) is closely related to the so-called neural implicit representation (Sitzmann et al., 2020); works in computer graphics (Park et al., 2019; Mescheder et al., 2019) used such an MLP to fit signed distance functions (SDFs) of a desired surface, which is implicitly defined by the zero iso-surface of the SDF.
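As a concrete, deliberately simplified illustration of this observation, the sketch below fits a tiny mesh-agnostic network to a single 1D snapshot sampled at scattered points. To keep it self-contained, it replaces gradient-based optimization with a closed-form random-feature fit (fixed first layer, least-squares last layer); all sizes and scales here are hypothetical, not the paper's configuration.

```python
# Minimal sketch: a "ShapeNet-like" MLP fit to ONE spatial snapshot, with no mesh.
# The trained weights alone define the field everywhere (mesh-agnostic sense).
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, (256, 1)), axis=0)   # scattered sample points
u = np.sin(2 * np.pi * x)                              # target spatial field u(x)

# Random-feature stand-in for gradient training: fix the first layer,
# solve the linear last layer by least squares.
w = 128
W1 = rng.normal(0.0, 8.0, (w, 1))
b1 = rng.uniform(-np.pi, np.pi, (w, 1))
H = np.tanh(W1 @ x.T + b1).T                           # (N, w) hidden features
W2, *_ = np.linalg.lstsq(H, u, rcond=None)             # least-squares last layer

u_hat = H @ W2
print(f"fit MSE: {float(np.mean((u_hat - u) ** 2)):.2e}")

# The weights fully determine the field at ANY query point, not just the samples:
xq = np.array([[0.125]])
val = (np.tanh(W1 @ xq.T + b1).T @ W2).item()          # ~ sin(2*pi*0.125)
print(val)
```

The key point is that no mesh connectivity is ever used: the field is queried by evaluating the network at arbitrary coordinates.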

Figure 1: Neural implicit flow framework for dimensionality reduction of a spatio-temporal field from a PDE. NIF uses the bottleneck layer, which is linearly mapped to the weights $\mathcal{W}$ and biases $\mathcal{B}$ of ShapeNet, as the latent representation. ParameterNet correlates external factors with this latent representation.

To explicitly correlate the corresponding spatial fields with external factors of interest, such as time, parameters, and sparse sensor measurements, we use a second MLP (denoted ParameterNet) to learn mappings from these factors to the weights $\mathcal{W}$ and biases $\mathcal{B}$. Note that the typical output dimension of ParameterNet, i.e., the total number of scalars in $\mathcal{W}$ and $\mathcal{B}$, ranges from thousands to tens of thousands. Dimensionality reduction assumes the existence of a rank-$r$ subspace that can approximate the spatio-temporal data. The reduced coordinates are denoted as $z \in \mathbb{R}^r$. One can simply consider a linear output layer in ParameterNet after a bottleneck layer of width $r$. Once $z$ is determined, $\mathcal{W}$ and $\mathcal{B}$ in the ShapeNet are completely determined, which in turn determines the spatial "flow" field conditioned on the time or, more generally, any other external parameters. As a result, the bottleneck layer in fig. 1 can be viewed as an $r$-dimensional latent representation, similar to the bottleneck layer in generic autoencoders (Goodfellow et al., 2016) (note, however, that we have not introduced an "encoder" in fig. 1; we will introduce an encoder based on sparse sensors in section 3.2). In summary, we can find a mesh-agnostic representation of parametric spatio-temporal fields as in the aforementioned manifold-based methods, where spatial complexity, e.g., coherent structures, is explicitly decoupled from temporal and parametric complexity, e.g., chaos and bifurcations.

Note that any neural network that generates the weights and biases of another neural network is generally called a hypernetwork (Ha et al., 2016). This concept has been applied to image (Sitzmann et al., 2020) and language modeling (Ha et al., 2016), and its inspiration can be traced back to fast-weight memory control (Schmidhuber, 1992) in the early 90's. NIF can therefore be viewed as a special family of hypernetworks in which the main MLP takes only the spatial input $x$, while all other factors, e.g., time $t$ and parameters $\mu$, are fed into the hypernetwork. This implies the dimensionality reduction is performed only on the spatial complexity, which is the biggest cause of computational and storage challenges for non-linear PDEs; spatial complexity is thus decoupled from other factors.
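The hypernetwork wiring can be made concrete with a minimal NumPy sketch (untrained, with hypothetical layer sizes): ParameterNet maps $(t, \mu)$ through a narrow bottleneck to every weight and bias of ShapeNet, and ShapeNet then evaluates the field at any spatial point.

```python
# Sketch of the NIF forward pass. All layer sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)

# ShapeNet architecture: x in R^3 -> 16 -> 16 -> u in R^1.
# Parameter tensors in order (W1, b1, W2, b2, W3, b3):
shapes = [(16, 3), (16, 1), (16, 16), (16, 1), (1, 16), (1, 1)]
n_shape_params = int(sum(np.prod(s) for s in shapes))     # total scalars = 353

# ParameterNet: (t, mu) -> hidden(8) -> bottleneck r=2 -> linear -> all ShapeNet params
r = 2
P1 = rng.normal(0, 0.5, (8, 2)); p1 = np.zeros(8)
P2 = rng.normal(0, 0.5, (r, 8)); p2 = np.zeros(r)
P3 = rng.normal(0, 0.1, (n_shape_params, r)); p3 = np.zeros(n_shape_params)

def parameter_net(t, mu):
    h = np.tanh(P1 @ np.array([t, mu]) + p1)
    z = P2 @ h + p2                    # r-dimensional latent representation
    return P3 @ z + p3, z              # final LINEAR map: latent -> ShapeNet params

def shape_net(x, theta):
    # Un-vectorize theta back into ShapeNet weights/biases, then evaluate at x.
    parts, i = [], 0
    for s in shapes:
        n = int(np.prod(s)); parts.append(theta[i:i + n].reshape(s)); i += n
    W1, b1, W2, b2, W3, b3 = parts
    h = np.tanh(W1 @ x + b1.ravel())
    h = np.tanh(W2 @ h + b2.ravel())
    return W3 @ h + b3.ravel()

theta, z = parameter_net(t=0.5, mu=1.0)
u = shape_net(np.array([0.1, 0.2, 0.3]), theta)
print("latent z:", z, " u(x):", u)
```

Only $(t, \mu)$ enters ParameterNet and only $x$ enters ShapeNet, which is exactly the decoupling of spatial complexity from temporal/parametric complexity described above.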

Figure 2: Comparison of NIF with DeepONet (Lu et al., 2021a) and Neural Operator (Li et al., 2020b, c, a). Notice that the skeleton of DeepONet can be viewed as that of a NIF whose ShapeNet is parameterized only in its last layer: information from sensor measurements enters as a matrix multiplication, whereas NIF is parameterized everywhere inside the ShapeNet. Neural Operator learns a continuous nonlinear convolution in order to be mesh-invariant and spatially invariant. Neural Operator and DeepONet focus on learning solution operators, while NIF focuses on learning a reduced latent representation of spatial-temporal dynamics on arbitrary meshes.

A comparison of the NIF architecture with other recent frameworks of machine learning for PDEs is shown in fig. 2. DeepONet (Lu et al., 2021a; Wang et al., 2021a) pioneered the learning of general operators associated with PDEs from data. It has been a building block of a DeepM&MNet (Cai et al., 2021), which has been applied to data assimilation in electroconvection (Cai et al., 2021) and hypersonic flows (Mao et al., 2021). Interestingly, the structure of DeepONet can be viewed as that of NIF with only the last linear layer of the ShapeNet determined by the so-called trunk net. To highlight the expressiveness of NIF, we compare a tiny NIF with only 51 parameters and a moderate DeepONet with 3003 parameters on a 1D modulated traveling sine wave. Figure 3 shows no visual difference between the ground truth and NIF.

Figure 3: Comparison between DeepONet and NIF on a 1D modulated sine wave traveling from left to right. Data is generated by uniformly sampling $x$ with 300 samples and $t$ with 20 samples, and is then standard normalized for training. Configuration of NIF: ShapeNet with input $x$: 1-2-2-2-1; ParameterNet: 2-2-1-19. Configuration of DeepONet: branch net: 1-30-30-17; trunk net: 1-30-30-16-1.

Alternatively, neural operators (Li et al., 2020b, c, a; Kovachki et al., 2021) are discretization-independent, mesh-based frameworks for learning solution operators of PDEs, going beyond the straightforward CNN approach on a fixed grid (Zhu and Zabaras, 2018; Long et al., 2018). Neural operators have been successful in recovering Green's functions, learning chaotic PDE systems (Li et al., 2021a), and leveraging known PDEs such as the Navier-Stokes equations (Li et al., 2021b). By using graph neural networks, they inherit translational and rotational invariance, as opposed to NIF and DeepONet, which might be advantageous in the small-data regime.

Distinct from the above pioneering works aimed at learning solution operators of PDEs, NIF emphasizes a scalable nonlinear dimensionality reduction paradigm that outperforms SVD and CAE in dealing with complex parametric spatial-temporal data (e.g., turbulence) on arbitrary mesh structure. In the following subsections, we will develop problem-specific, NIF-based frameworks on a variety of learning tasks.

2.1 Data-fit parametric surrogate modeling for PDEs

Consider a class of non-linear parametric PDEs, $\partial u / \partial t = F(u, \nabla u, \nabla^2 u, \ldots; \mu)$, with parameter $\mu \in \mathcal{D}$, where $F$ is a non-linear function or operator in general. An example of data-fit parametric surrogate modeling (Benner et al., 2015; Sobieszczanski-Sobieski and Haftka, 1997; Frangos et al., 2010; Amsallem et al., 2013; Bhatnagar et al., 2019) is to find an approximate relation between the parameter $\mu$ and the corresponding PDE solution under fixed initial and boundary conditions. After the data-fit surrogate model is trained, it is expected to make predictions for an unseen input, possibly in real-time. (Note that our major goal in the following PDE examples is to demonstrate NIF for dimensionality reduction, not for learning the solution operator of PDEs as advocated in other neural network architectures. We present the following example simply to show the capability of efficient surrogate modeling whose computational cost is independent of mesh size; existing graph-based frameworks still suffer from poor scaling of computational cost with mesh size.) An illustration of the method is shown in fig. 4. This is attractive for many engineering tasks that require many-query analyses, such as optimization, uncertainty quantification, and control. In contrast to more physics-based models (Benner et al., 2015), a data-fit model reduces surrogate modeling for PDEs to high-dimensional regression without access to prior knowledge. Despite its many disadvantages, including large sample complexity and lack of interpretability, it is the simplest and most widely used type of surrogate model (Qian et al., 2020; Loiseau et al., 2021).

As illustrated in fig. 4, we apply NIF to the above parametric surrogate modeling by simply allowing the weights and biases to depend on time and parameters through the ParameterNet $g_{\theta}$,

$$ [\mathrm{vec}(\mathcal{W});\, \mathrm{vec}(\mathcal{B})] = g_{\theta}(t, \mu) \in \mathbb{R}^{p}, \qquad (2) $$

where $g_{\theta}$ is an MLP with its own weights and biases denoted as $\theta$, $\mathrm{vec}$ is the matrix vectorization, and $p$ is the total number of unknown parameters in $\mathcal{W}$ and $\mathcal{B}$. Again, it is important to note that, since the last linear layer of $g_{\theta}$ has output dimension $p$, the width of the layer before it largely determines the total number of parameters we need to optimize. We set this bottleneck width to $r$, which equals the input dimension of ParameterNet here. Hence, the rank of the latent representation is $r$. We denote the above dependency of the weights and biases simply as $\mathcal{W}(t, \mu)$ and $\mathcal{B}(t, \mu)$. By considering eq. 2 and extending eq. 1 from a single $(t', \mu')$ to the whole range of $(t, \mu)$, we arrive at the following minimization formulation,

$$ \min_{\theta} \int \mathcal{L}\left(\hat{u}(x; t, \mu),\, u(x, t, \mu)\right) d\nu(x, t, \mu), \qquad (3) $$

where $\nu$ now becomes a measure in $\Omega \times \mathcal{T} \times \mathcal{D}$. A natural choice of $\nu$ to cover the domain of interest is the empirical distribution based on the numerical discretization from which the data comes. For example, a tensor product of discretizations independently in $x$, $t$, and $\mu$ leads to $\nu = \sum_{i=1}^{N_x} \sum_{j=1}^{N_t} \sum_{k=1}^{N_\mu} \delta(x - x_i)\, \delta(t - t_j)\, \delta(\mu - \mu_k)$, where $\delta$ denotes the Dirac measure. Here the total number of spatial mesh points is $N_x$, the number of snapshots in time is $N_t$, and the number of parameters selected is $N_\mu$. Once a proper $\nu$ is chosen, eq. 3 can be solved via a gradient-based method, e.g., the Adam optimizer (Kingma and Ba, 2014).

It should be noted that eq. 3 permits using any kind of proper empirical measure to cover $\Omega \times \mathcal{T} \times \mathcal{D}$. As shown in fig. 1, this can be especially advantageous for problems where an efficient adaptive mesh (e.g., AMR, overset), moving mesh (e.g., fluid-structure interaction), or simply parameter-dependent mesh (e.g., varying discontinuity with parameters) is adopted. This distinct feature makes NIF different from previous pipelines for dimensionality reduction (Xu and Duraisamy, 2020; Mohan et al., 2019) with CAE (i.e., a homogeneous isotropic spatial measure) and SVD (i.e., a tensor-product measure structure between $x$ and $(t, \mu)$). In this paper, we take $\mathcal{L}$ as the mean square error and write the objective in its most general form over the $M$ pooled training points (for a tensor-product measure, $M = N_x N_t N_\mu$, which in practice is typically larger than millions). So eq. 3 becomes the standard least-squares regression,

$$ \min_{\theta} \frac{1}{M} \sum_{i=1}^{M} \left\| \hat{u}(x_i;\, t_i, \mu_i) - u_i \right\|_2^2. \qquad (4) $$
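Because the training objective only needs an empirical measure over pooled points, snapshots on completely different meshes can be combined by flattening everything into $(x, t, \mu, u)$ tuples. A minimal sketch with a synthetic traveling wave and hypothetical, randomly sized "meshes":

```python
# Sketch: assembling pointwise training data from snapshots on DIFFERENT meshes.
# Each snapshot has its own node count and node locations (as under AMR), and we
# simply pool (x, t, mu, u) tuples. The field here is a toy traveling wave.
import numpy as np

rng = np.random.default_rng(2)

snapshots = []
for mu in (0.2, 0.25):
    for t in (0.0, 1.0, 2.0):
        n_pts = int(rng.integers(50, 100))       # mesh size varies per snapshot
        x = rng.uniform(0.0, 1.0, n_pts)         # node locations vary too
        u = np.sin(2 * np.pi * (x - mu * t))     # toy field u(x, t; mu)
        snapshots.append((x, t, mu, u))

# Pool everything into one flat regression dataset: inputs (x, t, mu), target u.
X = np.concatenate([np.stack([x, np.full_like(x, t), np.full_like(x, mu)], 1)
                    for x, t, mu, u in snapshots])
y = np.concatenate([u for _, _, _, u in snapshots])
print(X.shape, y.shape)   # rows = total points across all (ragged) meshes
```

No interpolation onto a common grid is ever required, which is precisely what SVD- and CAE-based pipelines cannot avoid.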

Figure 4: Application of NIF on mesh-agnostic surrogate modeling of parametric PDE. The case of 1D parametric Kuramoto-Sivashinsky equation is taken for illustration and further studied in section 3.1.

2.2 Learning representations for multi-scale spatial-temporal data

As described in section 2, ShapeNet is a general MLP with an activation function $\sigma$ still to be determined. Given the universal approximation theorem (Hornik et al., 1989), the choice seems initially arbitrary. However, there is a significant and often overlooked difference between what an MLP can approximate and what an MLP can efficiently learn with gradient-based algorithms. Standard MLPs with common activation functions, such as ReLU, tanh, swish, and sigmoid, have recently been observed and proven (Tancik et al., 2020) to suffer from extreme inefficiency in learning high-frequency functions, even with increased network complexity. Indeed, the failure of standard MLPs on high-frequency datasets is a well-known phenomenon called spectral bias (Rahaman et al., 2019) or the F-principle (Xu et al., 2019). Recall that in fig. 1, ShapeNet critically relies on an MLP with input $x$ approximating the "shape" of the spatial data, which can be problematic in the case of fluid flows with high-wavenumber content, e.g., eddies and hairpin vortices. Note that this issue is also relevant to PINNs (Wang et al., 2021b), which might explain the challenges of using data-free PINNs to solve the Navier-Stokes equations in fully turbulent regimes.

Recent advances in computer graphics propose remedies that use Fourier features (Tancik et al., 2020) or $\omega_0$-scaled sine activation functions (the activation becomes $\sin(\omega_0\, \cdot)$; we use the same $\omega_0$ throughout this work) in the MLP, called SIREN (Sitzmann et al., 2020), to learn high-frequency content. The former Fourier-feature approach has recently been introduced to the PINN community by Wang et al. (2021b), with the need to tune the length scale of the Fourier features for each dataset. Here we take the latter approach, since we empirically found SIREN to be much less sensitive to the hyper-parameters of the network compared with the original Fourier-feature approach (Tancik et al., 2020). Thus, as shown in fig. 5, we design ShapeNet using SIREN with a ResNet-like structure. However, implementing such an $\omega_0$-scaled sine activation function requires a special type of initialization (Sitzmann et al., 2020) of both the MLP parameters and a uniform normalization of the dataset. Implementation details are documented in appendix D. With SIREN for the ShapeNet, we can feed time $t$ into ParameterNet to learn a compressed representation of spatial-temporal data in a mesh-agnostic way, as illustrated in fig. 6. Furthermore, instead of feeding time into ParameterNet, we can build an encoder using sensor measurements from only a few locations as input to the ParameterNet. Such applications are demonstrated in sections 3.3 and 3.2.
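A minimal NumPy sketch of the $\omega_0$-scaled sine layers with the initialization of Sitzmann et al. (2020): the first layer is drawn uniformly from $\pm 1/n_{\text{in}}$, later layers from $\pm\sqrt{6/n_{\text{in}}}/\omega_0$, and every hidden layer applies $\sin(\omega_0\, \cdot)$. The layer sizes and $\omega_0 = 30$ follow common SIREN practice and are hypothetical choices here, not the paper's exact configuration (which also uses a ResNet-like structure, omitted for brevity).

```python
# SIREN-style layer stack (plain feedforward, no skip connections) in NumPy.
import numpy as np

rng = np.random.default_rng(3)
omega_0 = 30.0  # sine frequency scale (SIREN's common default)

def siren_init(n_in, n_out, first):
    # First layer: U(-1/n_in, 1/n_in); later layers: U(+/- sqrt(6/n_in)/omega_0).
    bound = 1.0 / n_in if first else np.sqrt(6.0 / n_in) / omega_0
    return rng.uniform(-bound, bound, (n_out, n_in)), np.zeros(n_out)

def siren_forward(x, layers):
    h = x
    for W, b in layers[:-1]:
        h = np.sin(omega_0 * (W @ h + b))   # omega_0-scaled sine activation
    W, b = layers[-1]
    return W @ h + b                        # linear output layer

layers = [siren_init(3, 64, first=True),
          siren_init(64, 64, first=False),
          siren_init(64, 1, first=False)]
u = siren_forward(np.array([0.1, -0.3, 0.5]), layers)
print(u)
```

The pairing of the $1/\omega_0$ weight scale with the $\omega_0$ inside the sine is what keeps pre-activations well-distributed at initialization, which is the part that standard initializations get wrong for sine activations.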

Figure 5: ShapeNet for multi-scale spatial data consists of SIREN (Sitzmann et al., 2020) with a ResNet-like structure (Lu et al., 2021c) and the $\omega_0$-scaled sine function. Input is the spatial coordinate $x$; output is a vector. Biases are omitted in the figure for clarity. The green dashed line indicates one block; this structure is repeated several times downstream.
Figure 6: Application of NIF with SIREN to mesh-agnostic learning of a latent representation of a large-scale spatial-temporal dataset. An iso-contour of the Q-criterion colored by vorticity magnitude of 3D HIT is shown for illustration and further studied in section 3.3.

2.3 Learning mesh-agnostic linear representations

The proper orthogonal decomposition (POD) (Lumley, 1967) was introduced to the fluid dynamics community by Lumley in 1965 in order to give a mathematical definition of "large eddies" by applying a Karhunen-Loeve expansion (Loeve, 1955) to turbulent velocity fields. Formally, the first $r$ POD modes are determined by the first $r$ eigenfunctions of an integral operator with the cross-correlation kernel of the turbulent velocity field (George, 1988). In practice, the SVD is often used to obtain a discrete realization of the POD modes (Brunton and Kutz, 2019). Such POD modes not only find an $r$-dimensional linear subspace that optimally minimizes the projection error (as shown in eq. 5) but also provide a sequence of ordered basis functions weighted by their singular values (Djouadi, 2008):

$$ \min_{\{\phi_j\}_{j=1}^{r}} \int_{\mathcal{T}} \Big\| u(\cdot, t) - \sum_{j=1}^{r} \langle u(\cdot, t), \phi_j \rangle\, \phi_j \Big\|^2 dt, \qquad (5) $$

subject to the constraints $\langle \phi_i, \phi_j \rangle = \delta_{ij}$ for $i, j = 1, \ldots, r$, where $\delta_{ij}$ is the Kronecker delta. POD via SVD relies on a fixed mesh in order to provide a closed-form discrete approximation of the POD modes. If the mesh changes with time and/or parameters, which is the case for many problems of interest (Teyssier, 2002; Bryan et al., 2014; Vay et al., 2004), then SVD-based approaches are ill-suited for many downstream tasks, such as modal analysis of fluid flows (Taira et al., 2017) and reduced-order modeling (Benner et al., 2015; Loiseau et al., 2021; Noack et al., 2003).

Since multi-scale features often appear in spatio-temporal data, we employ NIF with SIREN from section 2.2 for the applications considered in the rest of this paper. As shown in fig. 7, we first provide a framework based on NIF to directly approximate an optimal $r$-dimensional linear subspace in the sense of classical POD theory (Djouadi, 2008). The key observation is that we can use ParameterNet, with input $t$, to parameterize only the last-layer weights and biases of ShapeNet, while the remaining weights and biases of ShapeNet are determined by optimization. As shown in eq. 6, we arrive at an interpretable approximation of the original spatio-temporal field as a sum of products of spatial functions and temporal coefficients parameterized by MLPs,

$$ u(x, t) \approx \sum_{j=1}^{r} a_j(t)\, \phi_j(x). \qquad (6) $$

The notation in eq. 6 differs slightly from eq. 5 in order to highlight that NIF only approximates the $r$-dimensional linear subspace rather than obtaining a set of ordered orthonormal spatial functions. Note that one needs to take the cell area into account when implementing eq. 6. To remove the effects of the slightly varying magnitude of the spatial basis learned with a neural network, we normalize the spatial basis, $\tilde{\phi}_j = \phi_j / \|\phi_j\|$, with correspondingly rescaled coefficients $\tilde{a}_j(t) = a_j(t)\, \|\phi_j\|$, so that the product is unchanged. Since $\tilde{a}(t) = (\tilde{a}_1(t), \ldots, \tilde{a}_r(t))$ is the corresponding mesh-agnostic $r$-dimensional linear representation, one can effortlessly apply any existing SVD-based framework (e.g., SVD-DMD (Schmid, 2010)) on datasets with arbitrarily varying meshes.
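The normalization step can be sketched in NumPy as follows (synthetic stand-ins for the learned basis and coefficients; for simplicity, the plain Euclidean norm is used, whereas on a real mesh the cell areas would enter as quadrature weights):

```python
# Rescale each learned spatial function to unit norm and absorb the scale into
# the time coefficients, leaving the reconstructed field unchanged.
import numpy as np

rng = np.random.default_rng(5)
n, r, m = 500, 4, 60                                   # mesh points, rank, snapshots
phi = rng.normal(size=(n, r)) * [1.0, 3.0, 0.2, 7.0]   # "learned" basis, uneven scales
a = rng.normal(size=(m, r))                            # time coefficients a(t)

scale = np.linalg.norm(phi, axis=0)                    # ||phi_j|| per column
phi_n = phi / scale                                    # unit-norm spatial basis
a_n = a * scale                                        # rescaled coefficients

# The field u = sum_j a_j(t) phi_j(x) is identical either way:
print(np.allclose(a @ phi.T, a_n @ phi_n.T))           # reconstruction unchanged
print(np.linalg.norm(phi_n, axis=0))                   # columns now unit norm
```

The rows of `a_n` then serve as the mesh-agnostic linear latent trajectory on which standard SVD-based tools (e.g., DMD) can operate directly.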

Figure 7: Application of NIF to mesh-agnostic learning of a linear subspace. Here a flow over a cylinder is used for illustration. Once the latent subspace is learned from the spatio-temporal dataset with AMR, a standard DMD (Brunton and Kutz, 2019) is performed on the latent representation; $f$ denotes the fundamental frequency. When postprocessing the DMD mode shapes, we choose a Cartesian grid with 500 uniform sampling points in each direction.

2.4 Mesh-agnostic data-driven non-linear sparse reconstruction

The goal of data-driven sparse reconstruction is to use limited sensor measurements to infer the entire high-dimensional system state, given a priori information about the low-dimensional manifold on which the system evolves. It has been widely applied in projection-based reduced-order modeling, especially for large-scale nonlinear systems (where it is also known as "hyper-reduction"), since computing expensive nonlinear terms can be avoided. Currently, POD-QDEIM (Drmac and Gugercin, 2016) is one of the most popular methods, showing improved performance over classical compressive sensing techniques (Manohar et al., 2018). The idea of POD-DEIM (Chaturantabut and Sorensen, 2010) is to use least-squares estimators for the latent representation by measuring only a few locations (e.g., sensors).

Chaturantabut and Sorensen (2010) derived an error upper bound for DEIM that indicates two contributions: 1) a spectral norm related to sensor selection and 2) the projection error of the linear subspace. The idea of POD-QDEIM is to minimize the former contribution, the spectral norm, via QR pivoting. Without loss of generality, given a scalar field on n mesh points and m snapshots, the data matrix is

$$X = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_m] \in \mathbb{R}^{n \times m}.$$
The corresponding reduced rank-r SVD is $X \approx \Phi_r \Sigma_r V_r^\top$. Near-optimal sensor placement can be achieved via QR factorization with column pivoting of the transpose of the spatial basis,

$$\Phi_r^\top P = QR.$$
Note that the top r rows of the resulting permutation matrix give a sparse measurement matrix whose rows are canonical basis vectors, each with a unit entry at the index of the corresponding best sensor location and zeros elsewhere. One can then recover the full field by least-squares estimation of the latent representation from the measurements at those sensors.
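The QR-pivoting sensor selection and the subsequent least-squares recovery can be sketched with numpy/scipy. The random data matrix below is a synthetic stand-in for real snapshots:

```python
import numpy as np
from scipy.linalg import qr, svd

rng = np.random.default_rng(0)
n, m, r = 200, 50, 5                     # mesh points, snapshots, rank
X = rng.standard_normal((n, m))          # synthetic snapshot matrix

U, s, Vt = svd(X, full_matrices=False)
Phi = U[:, :r]                           # rank-r spatial basis

# QDEIM: QR with column pivoting of the transpose of the spatial basis
_, _, piv = qr(Phi.T, pivoting=True)
sensors = piv[:r]                        # near-optimal sensor indices

# least-squares recovery of the latent code from r sparse measurements
y = X[sensors, 0]                        # sensor readings of the first snapshot
a, *_ = np.linalg.lstsq(Phi[sensors, :], y, rcond=None)
x_rec = Phi @ a                          # recovered full field (rank-r)
```

With r sensors and r modes, the least-squares fit interpolates the sensor readings exactly; the remaining error is the rank-r projection error discussed next.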

Given the success of POD-QDEIM, we can turn our attention to its other error contribution, i.e., the projection error of the linear subspace, by replacing POD with NIF. We first use the optimized sensor locations determined by POD-QDEIM. Then, as shown in fig. 8, with the sensor measurements as input to ParameterNet and the ground-truth field as target, we end up with a standard supervised learning problem. Once the model is trained, it can be used to predict the full spatial field at any location for a given measurement from the sensors.

Figure 8: Application of NIF on mesh-agnostic data-driven sparse reconstruction.

3 Applications

The code and data for the following applications, as well as the Python package for NIF, are publicly available.

3.1 Learning parametric solutions of Kuramoto–Sivashinsky equation

We apply the data-fit parametric surrogate modeling framework in eq. 4 to the 1D Kuramoto–Sivashinsky (K-S) equation with periodic boundary conditions,

$$u_t + u u_x + u_{xx} + \nu\, u_{xxxx} = 0,$$
with varying parameter $\nu$ as shown in fig. 4. For the K-S equation, we fix the initial condition and vary $\nu$ from 0.2 to 0.28, a regime that is not chaotic (although the K-S equation is most often simulated in the chaotic regime, it is also famous for rich bifurcation phenomena as its parameters change (Papageorgiou and Smyrlis, 1991)). The training data consist of 20 points in the parameter space (i.e., 20 simulations with distinct $\nu$). The testing data consist of 59 simulations with a finer sampling of $\nu$. As shown in fig. 4, the system response as $\nu$ varies from 0.2 to 0.28 is relatively smooth without any chaos. This makes it a well-posed problem for regression. The training data are preprocessed with standard normalization. Details are given in section B.1.

For NIF, we take 4 layers with units 2-30-30-2-6553 for ParameterNet and 5 layers with units 1-56-56-56-1, with ResNet-like skip connections, for ShapeNet. We empirically found that such ResNet-like skip connections help accelerate convergence. Note that 6553 corresponds to the total number of weights and biases in the aforementioned ShapeNet. The Swish activation function (Ramachandran et al., 2017) is adopted. As a comparison, we use a standard MLP with 5 layers with units 3-100-100-100-1, which has around the same number of model parameters, taking space, time, and the parameter as input and the field value as output. This can be viewed as the straightforward idea of using PINNs (Raissi et al., 2020) for regression without minimizing the PDE loss. Note that the same skip connections are employed as well. The model parameters are initialized with a truncated normal distribution with a standard deviation of 0.1 for both cases. (We empirically find that such "small-weight initialization" is an easy way to help the convergence of NIF; systematic work on the initialization of hypernetworks (Chang et al., 2019) might further improve the result.) For the standard MLP case, two extra cases with tanh and ReLU activations are considered. We implemented both models in TensorFlow (Abadi et al., 2016). Note that we heavily rely on einsum in implementing NIF. We adopt the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1e-3, a batch size of 1024, and 40,000 epochs. To account for training randomness and remove outliers, we take the average of 4 well-converged trials for both network structures. As shown in table 1, NIF with Swish activation achieves better performance on both training and testing data than the three MLP counterparts.
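The coupling between the two networks, and the role of einsum, can be sketched with a minimal numpy example. The tiny ShapeNet below is hypothetical and only illustrates the batched per-sample weight contraction, not the actual TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 8
dims = [1, 16, 1]                              # tiny ShapeNet: x -> u

# total ShapeNet parameter count (weights + biases per layer)
n_theta = sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

# stand-in for ParameterNet output: one ShapeNet parameter vector per sample
theta = rng.normal(0, 0.1, (batch, n_theta))
x = rng.uniform(size=(batch, 1))               # one spatial point per sample

# unpack theta and evaluate ShapeNet with batched, per-sample weights
h, i = x, 0
for li, (a, b) in enumerate(zip(dims[:-1], dims[1:])):
    W = theta[:, i:i + a * b].reshape(batch, a, b); i += a * b
    bias = theta[:, i:i + b]; i += b
    h = np.einsum('bi,bij->bj', h, W) + bias   # per-sample matmul via einsum
    if li < len(dims) - 2:                     # last layer stays linear
        h = np.tanh(h)
u = h                                          # predicted field, shape (batch, 1)
```

The einsum contraction is what replaces the usual shared-weight dense layer: each sample in the batch carries its own weight matrices, generated by ParameterNet.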

Model Train RMSE Test RMSE
NIF (Swish) 0.64
MLP (Swish)
NIF (tanh) 0.99
MLP (tanh)
Table 1: Comparison between the standard MLP and NIF in section 2.1 for surrogate modeling of a 1D parametric PDE. RMSE is averaged over all parameter values.

For simplicity, we fix the Swish activation function in the following. We first vary the size of the training data and retrain the models to evaluate data efficiency. We change the number of sampling points in parameter space for training data from the previous 20 to 15, 24, 29. As shown in fig. 9, NIF with Swish activation performs consistently better than MLP with Swish activation. To achieve the same level of testing error, NIF requires approximately half of the training data. Finally, we vary the number of model parameters from 7000 to 34000 while fixing the number of points in parameter space to 20. We then retrain the models to evaluate model expressiveness. As displayed in fig. 9, given the same number of parameters, NIF lowers the testing error by half compared to its MLP counterpart. Details of the comparisons are given in section C.1.

Figure 9: Trend of testing error against (left) varying size of training data and (right) varying number of model parameters.

3.2 Nonlinear dimensionality reduction of Rayleigh-Taylor instability from adaptive mesh

To highlight the advantage of NIF on multi-scale problems, we compare the performance of SVD, CAE, and a NIF-based autoencoder in reducing the dimensionality of the density field from the classical 2D Rayleigh-Taylor (R-T) instability. In this R-T problem, the interface between the high-density fluid above and the low-density fluid below is initially perturbed with a single-mode density profile. The CFD simulation is performed with CASTRO (Almgren et al., 2010), a compressible hydrodynamics code with AMR. Since the mesh changes with time, CAE or SVD can only be applied after projecting the data from the adaptive mesh onto a static fine mesh. Such preprocessing can introduce a so-called projection error and can be computationally challenging, especially in 3D. In the following, we take the static fine mesh as 128×256 for SVD and CAE. In contrast, NIF directly takes the pointwise raw data on the adaptive mesh for training.
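This projection step (required for SVD/CAE but not for NIF) can be sketched with scipy's griddata. The scattered points and density field below are synthetic stand-ins for AMR cell-center data:

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-ins for AMR cell centers and the density field on them.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(500, 2))                  # (x, y) cell centers
rho = np.sin(4 * np.pi * pts[:, 0]) * np.cos(2 * np.pi * pts[:, 1])

# Project onto a static uniform grid, as required before applying SVD or CAE.
gx, gy = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 256))
rho_grid = griddata(pts, rho, (gx, gy), method='nearest')
```

Whatever interpolation method is chosen, the projected field differs from the raw AMR data, and that discrepancy is exactly the projection error discussed above.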

The goal here is to encode the flow state into an r-dimensional latent space, from which one can faithfully reconstruct the flow state; r denotes the dimension of the latent subspace. Since we only take data from a single initial condition, and the collection of data ends before the symmetry breaks, the minimal latent space dimension that preserves the information in this case is one. We sample the flowfield uniformly in time and split this single trajectory into 84 training and 28 testing snapshots, such that the testing snapshots fall between the training snapshots. Details of data preparation are provided in section B.2.

Note that the structure of NIF in fig. 1 is only a decoder rather than an encoder. Hence, unlike SVD and CAE, we still need to choose an encoder that feeds information to the network in order to let it discern one snapshot from another. One can choose either a convolution layer acting on coarse data on Cartesian meshes, just like CAE, or a neural network taking a cluster of point-wise measurements at certain locations. Here we choose the latter: we consider 32 uniformly distributed sensors along the vertical axis in the middle of the flowfield.

For NIF, we take two ResNet-like blocks with 64 units in fig. 5, followed by a linear bottleneck layer of r units, as ParameterNet; ShapeNet contains 2 ResNet-like blocks in fig. 5 with 128 units. ParameterNet takes the 32 sensor measurements as input, and its output dimension is 66,561, the total number of weights and biases of ShapeNet, while ShapeNet takes pointwise coordinates as input and outputs the prediction of density at that location. To enforce a smooth latent representation, we use Jacobian and approximated Hessian regularization (Rifai et al., 2011), together with a small-weight initialization of the encoder, which we find empirically to be helpful.

For CAE, we choose a typical deep convolutional architecture used for fluid flows (Wiewel et al., 2019), with the detailed setup given in section C.2.1. Gradient-based optimization is performed with the Adam optimizer. The learning rate is 2e-5 for NIF and 1e-3 for CAE, with a batch size of 3150 for NIF and 4 for CAE (note that pointwise data are fed to NIF, while snapshot images projected onto a uniform Cartesian mesh are fed to CAE). The total number of training epochs is 10,000 for CAE and 800 for NIF.

Figure 10: Comparison of SVD and CAE with 32×64 (low), 64×128 (middle), and 128×256 (high) resolution for dimensionality reduction on testing data of the 2D Rayleigh-Taylor instability on an adaptive mesh. (1) Evolution of flowfield reconstruction at five selected times in the testing phase. (2) Zoomed view near the left vortex in the last snapshot. (3) Trend of temporally averaged RMSE with varying rank for all models.

In order to make quantitative comparisons, we project all predictions on testing data, together with the ground-truth data, onto a very fine mesh with a resolution of 256×512 by nearest-neighbor interpolation (for CAE and SVD) or direct sampling (for NIF). As shown in fig. 10, where a rank-1 reduction is performed, the prediction of NIF is better than those of SVD and CAE at static mesh resolutions of 32×64, 64×128, and 128×256. As the rank increases, it is expected that the errors of the nonlinear models do not change much, due to the nature of a single realization, while that of the linear method decreases. On average, the predictions of the CAE at the three increasing resolutions lead to 10, 5, and 1.7 times more error than that of the NIF model. Further error analysis in section C.2.2 shows that the NIF-based autoencoder outperforms SVD mainly because of the lack of nonlinear expressiveness in SVD, and outperforms CAE because of the excess projection error from the uniform Cartesian grid.

3.3 Learning spatially compressed representation for 3D homogeneous isotropic turbulence

Next, we apply NIF in section 2.2 to learn a spatially compressed representation of a 3D multi-scale spatio-temporal field. Our goal is to find a vector-valued, continuously differentiable function that "fits" the original spatio-temporal data and is linearly determined by time-dependent reduced coordinates. Note that the latent dimension is several orders of magnitude smaller than the number of mesh points. If this is achieved, one can efficiently send a snapshot of turbulence data at any time by transferring only a low-dimensional vector to the receiver, while the receiver needs only a "skeleton" ShapeNet and a single linear decoder layer (i.e., the last layer of ParameterNet) on a local device in order to decode the message from the sender into a continuously differentiable spatial field. It is important to note that the last layer of ParameterNet is a very wide linear layer, with width on the order of the total number of weights and biases of ShapeNet.

As an illustrative example, we use a temporally evolving (20 snapshots) velocity field of homogeneous isotropic turbulence (HIT) from the Johns Hopkins University turbulence database. Details of data preparation are given in section B.4. It should be highlighted that, distinct from CAE or SVD, the model complexity of NIF is not directly related to the resolution at which the data is stored but rather to the intrinsic spatial complexity of the data itself. As for the network architecture in this example, ShapeNet has 4 ResNet-like blocks as hidden layers with width 200, while ParameterNet has 1 ResNet-like block with width 50, followed by a linear bottleneck layer and a linear layer that maps the low-dimensional vector to all the weights and biases of ShapeNet. The total number of trainable parameters is 1,297,365, all of which are inside ParameterNet. For training, we use the Adam optimizer with a learning rate of 1e-5 and a batch size of 1600.

First, we test our model by comparing the first component of the ground-truth velocity field against the reconstruction from NIF. As displayed in fig. 11, NIF reconstructs the ground-truth velocity, even the small-scale structures, very well. Since most modern studies of turbulence are descriptive from a statistical viewpoint, it is important to verify that the PDFs are well preserved after compression. As shown in fig. 12, the PDFs of various quantities are approximately preserved. For a more stringent test, we verify model performance by visually comparing iso-contours of the Q-criterion and vorticity magnitude. As displayed in fig. 13, most of this higher-order quantity is well preserved by the model, with only small visual differences. Lastly, it is important to highlight that the total number of trained parameters is only a small fraction of the size of the original stored array. Further, this compression ratio can be improved with recent neural network pruning techniques (Han et al., 2015) and more inputs to ParameterNet. We leave this exciting topic for future study.
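The storage arithmetic behind this claim can be checked directly. The parameter count is taken from the text; the 128³ grid size below is an assumption for illustration only:

```python
# Storage comparison for the HIT example.
n_params = 1_297_365                 # trainable NIF parameters (from the text)
n_raw = 128 ** 3 * 3 * 20            # assumed 128^3 grid x 3 components x 20 snapshots
ratio = n_params / n_raw
print(f"compressed size is {ratio:.2%} of raw storage")  # about 1% under this assumption
```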

Figure 11: Evaluation of compressed reconstruction from NIF on the velocity field of the three-dimensional homogeneous isotropic turbulence at three different slices from the JHU dataset (Li et al., 2008). Left: ground-truth sliced field. Middle: reconstruction from NIF. Right: amplified error between ground truth and the NIF reconstruction.
Figure 12: Evaluation of compressed reconstruction from NIF on the PDFs of the velocity magnitude and the velocity components of the 3D HIT from the JHU dataset (Li et al., 2008) at the first time instance.
Figure 13: Comparison of the compressed reconstruction of 3D HIT (left) and the ground truth at the first snapshot (right). The flowfield is visualized with a contour of the Q-criterion colored by vorticity magnitude.

3.4 Efficient spatial query of 3D homogeneous isotropic dataset

When querying large scientific datasets, typically from PDE solvers, we perform many more queries in space than in time or parameter space. For example, spatial statistics in the homogeneous directions are often collected in the analysis of turbulence physics, and visualization of vortex indicators requires intensive spatial queries, e.g., spatial differentiation. The primary reason is that the number of spatial degrees of freedom is typically much larger than the number of temporal or parametric degrees of freedom.

Since the structure of NIF isolates the spatial complexity from all other factors, we can efficiently obtain the spatial field data without unnecessary repeated computations related to those factors, such as time or system parameters. On the contrary, it can be a waste of resources to use a single feedforward neural network with multiple SIREN layers (Sitzmann et al., 2020) that takes all information as input, with the output still being the field of interest (here we refer to this simply as "SIREN"), because such a single network must mix and learn all of the complexities, e.g., multi-scale structure, chaos, and bifurcations. While the number of spatial queries is typically on the order of 10,000 in 2D and 1 million in 3D, one has to repeatedly perform temporal and/or parametric queries for different spatial points when adopting a single network with all information as input for spatial-query-intensive tasks, e.g., learning representations of video (Sitzmann et al., 2020) or of temporally evolving volumetric fields (Lu et al., 2021c). This can lead to a larger network with a longer inference time at the same level of accuracy.

Once the output of ParameterNet is given, spatial inference can be performed by running inference only on ShapeNet. We use the same HIT data (see section 3.3). Our model setup is the same as before, except that the widths of the ShapeNet and of the SIREN vary over 36, 52, 75, 105, and 150, and the bottleneck width of the ParameterNet increases from 3 to 10. As shown in the top left of fig. 14, NIF uses a smaller network for spatial evaluation than its SIREN counterpart at the same level of reconstruction error. This is because SIREN takes both space and time as input, so part of the network capacity is spent on learning temporal variation.
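This amortization is the crux of the design and can be illustrated schematically. The two toy functions below are hypothetical stand-ins for the trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def parameter_net(t):
    """Toy stand-in: (possibly expensive) network evaluated once per time instance."""
    return np.tanh(t * rng.standard_normal(8)) @ rng.standard_normal((8, 4))

def shape_net(x, theta):
    """Toy stand-in: small network evaluated once per spatial point."""
    return np.tanh(x @ theta[:3]) * theta[3]

theta = parameter_net(0.5)                 # one temporal query ...
xs = rng.standard_normal((100_000, 3))     # ... then many spatial points
u = shape_net(xs, theta)                   # amortized over 1e5 spatial queries
```

A single all-inputs network would instead recompute the temporal contribution for every one of the 100,000 spatial points.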

Figure 14: Efficiency comparison between NIF and SIREN in term of CPU runtime and memory consumption when performing spatial query test on half of the first snapshot of the homogeneous isotropic turbulence. Top left: variation of reconstruction error with network size; Top right: variation of CPU runtime with network size; Bottom left: variation of memory consumption with network size. Bottom right: relative reduction in computational cost of NIF compared with that of SIREN. Note that the reconstruction error is computed based on the mean-squared error of last batch for every epoch with data shuffling.

To further compare performance, we measure the CPU runtime and memory consumption of the spatial-query part of the code with the standard Python time module and the Fil memory profiler (Turner-Trauring, 2019). We denote the spatial query of the three velocity components as a forward computation, while computing their spatial gradients is denoted as a backward computation. To make a fair comparison, for NIF we also include the inference time of ParameterNet (although it contributes less than 1% of the total computational time here). Note that in the top-right and bottom-left subfigures of fig. 14, there is not much difference in scaling between NIF and SIREN, which simply reflects the similarity between ShapeNet and SIREN, except that the first layer of SIREN takes space and time as input while that of NIF takes only space. We also note that the backward computation requires nearly twice the computational cost of the forward computation. However, at the same accuracy NIF requires a smaller network than SIREN; hence, given the same reconstruction error, the required widths of NIF and SIREN can be determined. The bottom right of fig. 14 indicates that NIF leads to 27% less neural-network width, 30% less CPU time in the forward and backward passes, and 26%/27% less memory consumption in forward/backward computations for the task of querying homogeneous isotropic turbulence data.

3.5 Modal analysis on adaptive mesh data: flow over cylinder

Next, we test the NIF-based framework by learning a linear subspace for a classical modal analysis problem in fluid mechanics: vortex shedding behind a cylinder. As shown in fig. 7, the simulation is performed with AMR, which is frequently used in computational fluid dynamics for efficient simulation and data storage across a wide range of flows containing moving sharp gradients (e.g., unsteady shocks/flames/vortices). We first collect 100 snapshots of a spatial field consisting of the two velocity components, together with the coordinates of each cell and time. Then, we follow eq. 6 to extract a rank-10 linear subspace from the spatio-temporal field. NIF with SIREN is employed in both ParameterNet and ShapeNet. Details of the data preparation and the reconstruction performance are given in section B.3. Given the learned latent 10-dimensional time series shown in fig. 7, we perform a standard dynamic mode decomposition (DMD) (Schmid, 2022) to extract isolated spatio-temporal modes with distinct frequencies. The DMD mode shapes in fig. 7 agree qualitatively well with existing studies (Pan et al., 2021; Chen et al., 2012). Note that the latent representation takes time as input and the spatial functions take spatial coordinates as input arguments; thus, at the postprocessing stage, one can easily evaluate these functions at any time and/or spatial location, at any resolution one desires.
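For reference, exact DMD of a latent time series can be sketched in a few lines. This is a generic implementation with a synthetic sanity check, not the specific code used in the paper:

```python
import numpy as np

def exact_dmd(A, r):
    """Exact DMD of snapshot matrix A (n_states x n_times), truncated to rank r."""
    X, Y = A[:, :-1], A[:, 1:]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r].conj().T
    Atil = U.conj().T @ Y @ V / s           # reduced linear operator
    evals, W = np.linalg.eig(Atil)          # DMD eigenvalues (frequencies/growth)
    modes = (Y @ V / s) @ W                 # DMD modes
    return evals, modes

# sanity check on a known 2D rotation: eigenvalues should be exp(+-i*theta)
theta = 0.1
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = np.stack([np.linalg.matrix_power(M, k) @ np.array([1.0, 0.0])
              for k in range(30)], axis=1)
evals, modes = exact_dmd(A, r=2)
```

Applied to the 10-dimensional latent trajectory, the eigenvalue phases give the distinct shedding frequencies, and the modes can be pushed through the spatial basis to obtain mode shapes.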

3.6 Data-driven sparse reconstruction of sea surface temperature

As shown in fig. 8, we apply the above NIF-based framework to reconstruct and predict sea surface temperature data (Reynolds et al., 2002) given sparse localized measurements. In order to compare with the state of the art, we use a setup similar to that of Manohar et al. (2018). We take snapshots from 1990 to 2006 as training data and those of the next 15 years, until 2021, as testing data. As mentioned in the previous subsection, we first use POD-QDEIM on the training data to find the best sensor locations on the sea. As shown in the top left of fig. 15, we empirically find that POD-QDEIM performs best with a surprisingly low 5 sensors. In order to perform an exhaustive evaluation of the NIF-based framework, we vary the number of sensors from 5 to 600. Due to the multi-scale structure in sea surface temperature, we use NIF with SIREN in section 2.2. For smaller numbers of sensors, we take 2 ResNet-like blocks in ShapeNet with width 60 and 2 blocks in ParameterNet with width 60; for larger numbers of sensors, we keep ShapeNet the same and widen ParameterNet accordingly. For all cases, we fix the latent dimension equal to the number of sensors, in analogy to the equality between rank and number of sensors in POD-QDEIM. We use the Adam optimizer for mini-batch training with a learning rate of 1e-5 and a batch size of 7000 for 150 epochs. Details of data preparation are in section B.5.

To further evaluate our framework, we make a side-by-side comparison with the state-of-the-art POD-QDEIM in both reconstruction and prediction of sea surface temperature, using the same information from the best sparse sensor locations given by POD-QDEIM. As displayed in the top right of fig. 15, the space-time mean-squared error on training data of both the NIF-based framework and POD-QDEIM decreases as the number of sensors increases, while that of our framework decays much more quickly. The approximate decay rates are shown in the bottom left of fig. 15. We find that our NIF-based framework shows a consistent decay rate as the number of sensors increases, whereas POD-QDEIM decreases its training error slowly for small numbers of sensors, with a faster decay rate appearing only beyond several hundred sensors. It is also interesting to note that as more and more sensors are used, the generalization of POD-QDEIM becomes worse and worse, while our framework in general performs better and better. As shown in the bottom right of fig. 15, given the numbers of sensors considered in this work, our NIF-based model surpasses the best POD-QDEIM model after using 200 sensors, which corresponds to a small fraction of all possible sensor locations on the sea. Finally, as mentioned before, the most generalizable POD-QDEIM model uses 5 sensors, resulting in a space-time MSE of 0.71 (additional parameter sweeps are shown in the top left of fig. 15), while the NIF-based model with the smallest testing error takes 600 sensors and results in a space-time MSE of 0.46, which is 34% smaller than the best POD-QDEIM model.

Apart from comparing the space-time mean-squared error, we also compute the standard deviation of the spatially mean-squared error along the time axis as an indicator of the robustness of model performance (see error bars in fig. 15). Note that the vertical axis of the top-right figure in fig. 15 is in log scale. For the range of numbers of sensors considered, the training error bar of our framework is therefore significantly smaller than that of POD-QDEIM, which indicates that our framework can faithfully reconstruct the training data with higher confidence. This is particularly important for hyper-reduction in projection-based ROMs (Carlberg et al., 2011). The testing error bar of our framework is also smaller than that of POD-QDEIM, which means our framework is more robust in predicting unseen data as well. Additional discussions are in section C.3.

Figure 15: Comparison between NIF-based sparse sensing and POD-QDEIM. Top left: training and testing error of POD-QDEIM with a complete parameter sweep over the number of sensors. Top right: mean-squared error with standard deviation for the number of sensors varying from 5 to 600; error bars represent plus and minus one temporal standard deviation of the spatially mean-squared error. Bottom left: approximate decay rate of training error with respect to increasing number of sensors (POD-QDEIM versus our NIF-based framework). Bottom right: comparison of testing error among NIF-SS, POD-QDEIM, and the best POD-QDEIM model.

4 Conclusions

High-dimensional spatial complexity is a major bottleneck of computational and storage costs in many physical science and engineering fields where the physics is governed by a set of complex partial differential equations (PDEs). Existing frameworks, such as SVD and CAE, suffer from various challenges arising from complex data structures in real-world scientific computing. In this work, a mesh-agnostic representation for parametric spatio-temporal datasets, called neural implicit flow (NIF), is proposed. The key feature of NIF is its ability to separate spatial complexity from other factors such as temporal and parametric complexity, which naturally follows the philosophy of manifold-based learning of PDEs. Compared to existing SVD and CAE frameworks, whose performance is restricted by unnecessary projection, linearity, intractable memory scaling for large-scale 3D datasets, or inefficiency on adaptive meshes, NIF enables scalable, mesh-agnostic, nonlinear dimensionality reduction with improved performance. As a comparison, table 2 summarizes the capabilities and challenges of SVD, CAE, and NIF. Specifically, we demonstrate the advantages of NIF in terms of four derived mesh-agnostic frameworks: data-fit surrogate modeling for parametric PDEs, compressed representation of multi-scale spatio-temporal data, learning linear representations, and nonlinear sparse sensing. To the best of our knowledge, nonlinear dimensionality reduction of 3D turbulence with over 2 million cells and of complex multi-scale dynamics from an AMR solver is achieved for the first time.

4.1 Disadvantages

Intrinsic data complexity:

For spatially complex data, one requires a correspondingly large ShapeNet to accommodate the spatial structures. Larger networks require longer time to converge. As a result, one has to manually decide the size of ShapeNet and ParameterNet for specific problems, whereas SVD and CAE are much easier to configure.

Long training time:

Unlike SVD and CAE, for which best practices are well established (Maulik and Mengaldo, 2021; He et al., 2019), training a hypernetwork of a deep neural network still requires some trial and error. A typical SVD runtime is a few seconds to minutes, and training a CAE usually takes around half an hour to arrive at decent accuracy, while NIF usually takes from an hour to days depending on the data complexity, which in turn affects the model size. First, because the input of NIF is pointwise, the total number of training samples can be huge for 3D datasets. Second, in order to connect the two networks, there are expensive tensor operations between a 1D tensor and a 2D tensor (reshaped from the output of ParameterNet). Note that both tensors are batch-dependent in NIF, which can decrease cache performance. For example, in the 3D turbulence case, it takes up to 1.3 hours to process a single epoch and almost 5 days on a 16GB GPU to obtain good accuracy.

Memory intensive:

The use of hypernetwork leads to more complex network topology than the standard feedforward neural net with a comparable model parameter size. As a result, it creates larger memory footprints, which unfortunately limits the maximal batch size and leads to longer training time.

Lack of invariance:

Despite some promising results (Wang et al., 2022), it is generally more challenging to embed invariance into coordinate-input neural networks than into graph-based approaches (Li et al., 2020a). Such a lack of invariance may worsen the generalization of data-fit regression, especially when the amount of training data is small.

4.2 Advantages

Intrinsic data complexity:

On the flip side, model complexity does not scale with the "superficial" complexity of the data, e.g., the number of mesh points. A finer mesh only provides more training data, while the model complexity can stay the same. Meanwhile, mesh-based models (e.g., CAE) suffer from growing model complexity as the number of mesh points increases.

Heterogeneous data sources:

Since NIF is mesh-agnostic in nature, it becomes very convenient to fuse information from different data sources. For example, PIV data are mostly on a perfectly uniform Cartesian mesh, whereas CFD data are commonly on a well-designed body-fitted mesh. Different types of CFD solvers may also use different meshes, e.g., a multi-level Cartesian mesh for LBM/IBM (Taira and Colonius, 2007).

Well-established manifold-based ROM:

Thanks to the decoupling of spatial complexity, we obtain a direct analogue of SVD that is mesh-agnostic, nonlinear, and scalable to 3D datasets. It is straightforward to extend established ROM frameworks (Carlberg et al., 2011; Bruna et al., 2022) and modal analysis (Schmid, 2022; Taira et al., 2020; Rowley et al., 2009) to more realistic datasets with complex meshes.

Efficient spatial query:

Postprocessing of PDE data, e.g., turbulence (Li et al., 2008), often involves far more intensive spatial queries than temporal or parametric queries. Our design of NIF leads to a compact network for the spatial complexity, which improves the efficiency of massive spatial queries for either point values or spatial derivatives.

5 Prospects

NIF has the potential to extend existing data-driven modeling paradigms based on SVD and CAE to real-world scientific computing, where raw spatio-temporal data can be three-dimensional, large-scale, and stored on arbitrary adaptive meshes. High-fidelity large-scale data from modern scientific computing codes (Almgren et al., 2010; Nonaka et al., 2012) can be reduced to effective low-dimensional spaces, where existing modal analysis (Taira et al., 2020; Towne et al., 2018; McKeon and Sharma, 2010), data-driven flow control (Duriez et al., 2017), and surrogate-based modeling/optimization (Koziel and Leifsson, 2013) can be performed seamlessly on arbitrary adaptive meshes. However, as is typical for nonlinear dimensionality reduction methods (e.g., CAE), the latent representation is more difficult to interpret than its linear counterpart (e.g., SVD), and the latent representation can differ from one training run to another. This raises new questions about how to guide the latent representation to be interpretable for experts. Another exciting direction is to construct projection-based ROMs in such a latent space by minimizing the residual of known governing equations; this has been demonstrated very recently for the material point method (Chen et al., 2021) and for high-dimensional PDEs with active learning (Bruna et al., 2022). Besides dimensionality reduction, NIF can be used to design efficient "decoders" for transferring large spatial scientific data: it essentially trades reductions in storage/transfer and ease of fusing data from heterogeneous meshes against the off-line training of a NIF model and additional function calls to ShapeNet.

Property/Model | SVD | CAE | Neural Implicit Flow
Resolution | strong: convergence to discrete data | weak: requires uniform mesh | strong: continuous field
Variable geometry | strong | weak | strong
Scalability | strong: efficient randomized SVD available | fair/weak: affordable in 2D, resolution-restricted in 3D | strong: point-wise mini-batch training; number of parameters scales only with intrinsic complexity
Parametric/temporal variation of domain discretization | weak: requires the same mesh throughout | weak: requires the same uniform mesh throughout | strong: arbitrary mesh
Ease of training | strong | fair/strong | fair
Interpretability of representation | strong | weak | fair: last-layer parameterization learns a linear subspace
Expressiveness | weak: linear, not ideal for advection-dominated flows | fair: nonlinear but cannot capture multi-scale efficiently | strong: nonlinear with multi-scale capability
Table 2: Capabilities and challenges of representation learning: SVD, CAE, and Neural Implicit Flow.

The authors thank Kevin Carlberg, Karthik Duraisamy, Zongyi Li, and Lu Lu for valuable discussions. The authors acknowledge funding support from the Air Force Office of Scientific Research (AFOSR FA9550-19-1-0386). The authors acknowledge funding from the National Science Foundation AI Institute in Dynamic Systems grant number 2112085. We also appreciate the generous support from Prof. Karthik Duraisamy in using the following computing resources, which were provided by the NSF via the grant “MRI: Acquisition of ConFlux, A Novel Platform for Data-Driven Computational Physics”. This work also used the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al., 2014), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges-2 system, which is supported by NSF award number ACI-1928147, at the Pittsburgh Supercomputing Center (PSC).

Appendix A POD and CAE

As shown in fig. 16, state-of-the-art methods for dimensionality reduction of parametric spatio-temporal data rely on SVD and CAE. Dimensionality reduction via SVD requires the data to be on a single fixed mesh. First, it flattens the spatial data into a long column vector. Second, these column vectors are stacked over time. Finally, SVD is performed on this stacked matrix, and the right singular vectors form the latent representation. CAE treats the flow field as an image. First, one performs a uniform pixelation. Second, one feeds the processed images into a convolutional autoencoder. The corresponding latent representation is formed by the bottleneck layer in fig. 16.
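The SVD procedure described above (flatten, stack over time, decompose) can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code; array names and the placeholder data are ours.

```python
import numpy as np

def svd_latent(snapshots, r):
    """Rank-r latent representation via SVD, following the flatten-stack-decompose
    procedure described above.

    snapshots: list of 2D spatial fields on a single fixed mesh, one per time step.
    r: target rank (latent dimension).
    """
    # Flatten each spatial field into a long column vector and stack over time.
    X = np.stack([s.ravel() for s in snapshots], axis=1)   # (n_space, n_time)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    modes = U[:, :r]                         # spatial modes (left singular vectors)
    latent = (S[:r, None] * Vt[:r, :]).T     # latent trajectory, shape (n_time, r)
    return modes, latent

# Rank-r reconstruction is X ≈ modes @ latent.T (exact here since r = n_time).
rng = np.random.default_rng(0)
snaps = [rng.standard_normal((8, 8)) for _ in range(5)]
modes, latent = svd_latent(snaps, r=5)
X_rec = modes @ latent.T
```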

Figure 16: SVD and CAE for dimensionality reduction of spatio-temporal fields from PDEs. (1) SVD relies on the assumption that the spatio-temporal field is sampled from a single fixed mesh for all parameters across all time. Here the latent representation is formed by the right singular vectors. (2) CAE treats spatio-temporal fields as images by pre-processing with uniform pixelation. The latent space is the flattened vector after several convolution and pooling operations.

Appendix B Data preparation

b.1 1D parametric K-S

Given , , we use the ETD-RK4 method (Kassam and Trefethen, 2005) to solve the 1D Kuramoto-Sivashinsky equation in eq. 9, with a periodic boundary condition and a fixed initial condition . Spatial discretization is performed with 1024 points. Time integration . The raw dataset has a space-time resolution of . We subsample 4 times in space and 1000 times in time, i.e., . Note that the K-S system exhibits rich dynamics as the parameter changes (Papageorgiou and Smyrlis, 1991). In order to keep the problem well-posed, we choose a regime where the system is not chaotic. The parametric variation of the solution in is displayed in fig. 17. Training data is prepared with 20 uniformly spaced samples with , ending up with data points. Testing data is prepared with uniformly spaced samples within , which leads to data points. Each data point has 4 components . Each component is normalized to zero mean and unit standard deviation.
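The subsampling and normalization steps above can be sketched as follows. This is a sketch on placeholder data; the array shapes and names (`u_raw`, `u`) are illustrative, not the exact pipeline.

```python
import numpy as np

# Placeholder for the raw K-S solution: (time, space) array with 1024 spatial points.
rng = np.random.default_rng(1)
u_raw = rng.standard_normal((4000, 1024))

# Subsample by a factor of 4 in space, as described above.
u = u_raw[:, ::4]   # -> 256 spatial points

def normalize(x):
    """Normalize to zero mean and unit standard deviation, returning the stats
    so the transform can be inverted at prediction time."""
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

u_norm, mean, std = normalize(u)
```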

Figure 17: Parametric variation of spatial temporal field generated from Kuramoto Sivashinsky system in at different .

b.2 2D Rayleigh-Taylor

Rayleigh-Taylor instability occurs when a heavy fluid sits above a light fluid in a gravitational field. The interface between the heavy and light fluid is unstable, and “mushroom”-like multi-scale structures grow, with small-scale structures growing much faster than large-scale ones. In order to simulate the system efficiently, adaptive mesh refinement is necessary. The goal of dimensionality reduction for a parametric fluid system is to reduce the spatial complexity so that one may perform modeling, design, or control efficiently in the reduced coordinates. However, without knowing the range of system parameters in the data, it is often difficult to make statements about the true dimensionality of a flow. To simplify our discussion, we choose to simulate only a single realization of R-T with fixed system parameters, for which the dimensionality can be reduced to one.

Using CASTRO, we consider one level of AMR with an initial Cartesian grid of 256 points along the vertical axis and 128 points along the horizontal axis, which efficiently captures the evolution of vortex generation and the advection of density fronts. The AMR refinement ratio is 2 in each direction, and we regrid every 2 steps. The simulation domain is a rectangle 0.5 long along the -axis and 1.0 along the -axis. The top and bottom boundary conditions are slip walls, while the left and right boundaries are periodic. The time span of the simulation is from 0 to 3.5. The density of the heavy fluid is 2, while that of the light fluid is 1. They are separated by a horizontally trigonometric and vertically tanh interface, which is a perturbation to the unstable system. The exact expression follows equations (46-47) in the original CASTRO paper (Almgren et al., 2010).

We save the simulation data every 10 steps. Training data spans the 83rd to the 249th snapshot (corresponding to times from 1.004 to 2.879), skipping even snapshots: 83, 85, …, 247, 249. Testing data spans the 86th to the 248th snapshot with an increment of six: 86, 92, …, 242, 248. Training and testing data do not overlap, but there is no distribution shift between them. In total, the training data has 84 snapshots and the testing data has 28 snapshots. The number of adaptive mesh points ranges from 40,976 to 72,753 across time. For POD and CAE, the original data on the adaptive mesh is projected onto three different Cartesian grids: , , , using the ResampleToImage filter in ParaView (Ahrens et al., 2005). Finally, the original data is also projected onto a very fine Cartesian mesh as “ground truth” in order to make quantitative comparisons of the predictions from different models.

As for NIF, first note that we use a feedforward neural network encoder with sparse sensor measurements as input. As shown on the left of fig. 18, we collect measurements from 32 equally spaced sensors along the vertical axis in the middle of the flow field. At test time, the input data fed to the encoder is displayed on the right of fig. 18, where we can see a clear density front moving towards the bottom. Second, since NIF takes pointwise data, each data point is a 35-dimensional row vector. The first 32 entries are the sensor values at , while the remaining three are and the density at that location . Because we go through all grid points in the adaptive mesh at time , there are as many data points (with the 32 sensor columns repeated) as the total number of points in the adaptive mesh at time . In fact, we end up with 4,190,928 rows with 35 columns for the training data. Third, since we use NIF with SIREN in our ShapeNet, we normalize between -1 and 1, while the remaining 33 columns are normalized to zero mean and unit standard deviation.
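Assembling the 35-column pointwise rows described above can be sketched as follows: for each snapshot, the 32 sensor values are repeated for every adaptive-mesh point and concatenated with the point coordinates and density. This is an illustrative sketch; the function name and placeholder shapes are ours.

```python
import numpy as np

def build_rows(sensors_t, coords_t, rho_t):
    """Build pointwise NIF training rows for one snapshot.

    sensors_t: (32,) sensor values at time t (repeated for every mesh point);
    coords_t:  (n_pts, 2) adaptive-mesh point coordinates at time t;
    rho_t:     (n_pts,) density at those points.
    Returns an (n_pts, 35) array: [32 sensor values, x, y, rho].
    """
    n_pts = coords_t.shape[0]
    repeated = np.tile(sensors_t, (n_pts, 1))                 # (n_pts, 32)
    return np.hstack([repeated, coords_t, rho_t[:, None]])    # (n_pts, 35)

# Stack rows over time steps (3 placeholder snapshots, 100 mesh points each).
rng = np.random.default_rng(2)
rows = np.vstack([
    build_rows(rng.standard_normal(32),
               rng.standard_normal((100, 2)),
               rng.standard_normal(100))
    for _ in range(3)
])
```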

Figure 18: Illustration of sensor measurements for the NIF-autoencoder in the R-T instability. Left: 32 sensor locations distributed evenly on the middle vertical axis. Right: all 32 sensor values at different times in the testing data.

b.3 2D Cylinder Flow with

We generated the data using PeleLM (Nonaka et al., 2012) to solve the incompressible Navier-Stokes equations. The cylinder geometry is represented using the embedded boundary method. The setup is given in our GitHub repo. We set the cylinder centered at the origin. The computational domain in the (streamwise) direction runs from -0.02 to 0.1, while the direction runs from -0.04 to 0.04. The boundary conditions are Dirichlet at the inflow, Neumann at the outflow, and periodic on the sides. We set m/s, viscosity Pa·s, kg/. The radius of the cylinder is m. Thus, the Reynolds number is . The simulation is performed from s to s with sampling interval s. The flow is initialized with a uniform streamwise flow superimposed with a side velocity in order to break the symmetry, so that the flow quickly falls onto the limit cycle. To remove the initial transient effect, we collect the last 100 snapshots. The AMR is based on vorticity: we enable a finer mesh with half the grid size wherever the magnitude of the monitored local vorticity exceeds 3000. Overall, we collect data (AMR data can easily be collected pointwise using ParaView) after the system falls onto the limit cycle and sample the last 100 snapshots. Moreover, in order to remove the effect of the exit boundary and focus only on the region of interest, we only sample cell-centered data within the domain of interest with m and m, as shown in fig. 19. Since we are using NIF with SIREN in appendix D, we normalize to be uniformly distributed between -1 and 1. For the cell area , we scale it by a factor of since we are using single precision in TensorFlow. For the output velocity , we first normalize to zero mean, then normalize to unit variance and normalize with the same factor. Finally, we arrange the collected pointwise data into a big matrix with 966,514 rows and 6 columns, where a wide tilde denotes a normalized variable.

Figure 19: Configuration and mesh of the CFD simulation of 2D cylinder flow at with AMR. The contour of the vorticity magnitude is shown. The domain of interest is marked in light gray with white edges.

b.4 3D Forced Homogeneous Isotropic Turbulence with

We use the forced isotropic turbulence dataset from the JHU Turbulence Database (Li et al., 2008), with a Taylor-scale Reynolds number around 433. The simulation is computed by solving the 3D incompressible Navier-Stokes equations with a pseudo-spectral method. To sustain the turbulence, energy is injected at the large scales. After the system reaches its equilibrium state, the original dataset consists of snapshots collected every 0.02 nondimensional time units. Here we collect the -th snapshots, 20 snapshots in total. We then slice a block with resolution from the original HIT data for these 20 snapshots, which is . Since we are using NIF with SIREN in appendix D, we normalize the range of so that the input signal is uniformly distributed between -1 and 1. For the target velocity , we simply normalize to zero mean and unit variance.

b.5 Sea Surface Temperature

We obtain the weekly averaged sea surface temperature data from 1990 to the present from the NOAA website. At the time we worked on this problem, there were in total 1,642 snapshots (1990-2021), each corresponding to a weekly averaged temperature field. Each temperature snapshot contains 180 uniformly spaced latitude and 360 uniformly spaced longitude coordinates. It should be noted that coordinates corresponding to land are masked. Thus, each masked snapshot contains 44,219 points. We use the temperature fields of the first 832 weeks (16 years) as training data, with a total number of 36,790,208 data points. Each data point is a -dimensional row vector, where is the number of sensors and 3 corresponds to . Locations of sensors are obtained via POD-QDEIM, or equivalently the data-driven sensor placement in (Manohar et al., 2018). Temperature fields for the remaining weeks are testing data. Spatial coordinates and are normalized between -1 and 1. The target temperature is normalized to zero mean and unit variance.
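The QR-pivot sensor placement mentioned above can be sketched in numpy: take the leading POD modes of the training snapshots and pick sensor rows via column-pivoted QR on the transposed mode matrix. This is a numpy-only greedy stand-in for the pivoted QR used in POD-QDEIM (Manohar et al., 2018), not the exact implementation; dimensions and names are illustrative.

```python
import numpy as np

def qr_pivots(A, p):
    """Greedy column-pivoted QR on A (r x n); return the first p pivot columns.
    A minimal numpy-only stand-in for a LAPACK pivoted QR."""
    A = A.astype(float).copy()
    pivots = []
    for _ in range(p):
        norms = np.linalg.norm(A, axis=0)
        norms[pivots] = -1.0                  # exclude already-chosen columns
        j = int(np.argmax(norms))
        pivots.append(j)
        q = A[:, j] / np.linalg.norm(A[:, j])
        A -= np.outer(q, q @ A)               # deflate: project out direction q
    return pivots

# POD-QDEIM-style sensor placement on placeholder snapshot data.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))            # (n_space, n_time) placeholder
U, _, _ = np.linalg.svd(X, full_matrices=False)
Psi = U[:, :10]                               # r = 10 POD modes
sensors = qr_pivots(Psi.T, p=10)              # indices of 10 sensor locations
```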

Appendix C Additional discussions

c.1 Data-fit surrogate modeling for 1D Kuramoto-Sivashinsky

Recall that the advantages of NIF were demonstrated in section 3.1. To further compare model performance, we compute the RMSE for each parameter for four configurations: (1) NIF-Swish, (2) MLP-Swish, (3) NIF-tanh, (4) MLP-tanh. As displayed in fig. 20, NIF with Swish generalizes better than the other configurations, especially when . As seen in section B.1, this parameter regime corresponds to traveling-wave solutions with a transition time that increases with . Meanwhile, NIF-tanh generalizes slightly better than MLP-tanh but similarly to MLP-Swish. Hence, we only compare NIF-Swish with MLP-Swish in the rest of the paper.

Figure 20: Comparison among four model configurations (NIF-Swish, MLP-Swish, NIF-tanh, MLP-tanh) in terms of RMSE on the dataset of 1D parametric KS system with .

Here we further vary the size of the training data to verify whether NIF-Swish still outperforms MLP-Swish. As shown in fig. 21, we prepare three additional datasets with 15, 24, and 29 sampled parameters, corresponding to simulations of the K-S system at those parameters. Figure 21 shows the same phenomenon as before: NIF (Swish) generalizes better than MLP (Swish) when .

Model | ShapeNet/Vanilla MLP | ParameterNet | Number of training parameters
NIF (Swish)-1 | 1-30-30-30-1 | 2-30-30-2-1951 | 6,935
MLP (Swish)-1 | 3-58-58-58-1 | n/a | 7,135
NIF (Swish)-2 | 1-38-38-38-1 | 2-29-29-2-3079 | 10,254
MLP (Swish)-2 | 3-70-70-70-1 | n/a | 10,291
NIF (Swish)-3 | 1-56-56-56-1 | 2-30-30-2-6553 | 20,741
MLP (Swish)-3 | 3-100-100-100-1 | n/a | 20,701
NIF (Swish)-4 | 1-60-60-60-1 | 2-47-47-2-7501 | 24,996
MLP (Swish)-4 | 3-110-110-110-1 | n/a | 24,971
NIF (Swish)-5 | 1-70-70-70-1 | 2-60-60-2-10151 | 34,415
MLP (Swish)-5 | 3-130-130-130-1 | n/a | 34,711
Table 3: Comparison between standard MLP and NIF in section 2.1 for surrogate modeling of 1D parametric PDE. RMSE below is averaged over all parameter .
Figure 21: Comparison between NIF-Swish and MLP-Swish in terms of RMSE on the dataset of 1D parametric KS system with . Training data contains different numbers of simulations from 15 to 29.

Finally, we verify the above phenomenon by changing the number of trainable parameters away from the baseline in section 3.1, as described in table 3. Again, we train for 4 converged runs and compute the average error. Figure 22 shows the same phenomenon, which explains the difference between NIF (Swish) and MLP (Swish).

Figure 22: Comparison between NIF-Swish and MLP-Swish in terms of RMSE on the dataset of the 1D parametric KS system with . Training data contains 20 simulations, while model parameters vary from 7k to 34k.

c.2 Comparison of autoencoders for 2D Rayleigh-Taylor instability

c.2.1 Detailed setup of convolutional autoencoder

The encoder first uses three consecutive stride-2 convolutional layers, each followed by batch normalization and a Swish activation, with 16, 32, and 64 output channels. This is followed by a flatten layer and a linear dense layer with output units. These units are fed to the decoder, which first contains a linear dense layer followed by a reshape layer and two consecutive dilated transposed convolutional layers, each followed by batch normalization and a Swish activation, with 32 and 16 output channels. Finally, the last layer is a dilated transposed convolutional layer that maps to the same size as the input snapshot. For the best results, we do not use any pooling layers throughout.

Layer (type) | Output Shape | Parameter Count
Convolution 2D | (None, 128, 64, 16) | 160
Batch Normalization | (None, 128, 64, 16) | 64
Swish Activation | (None, 128, 64, 16) | 0
Convolution 2D | (None, 64, 32, 32) | 4640
Batch Normalization | (None, 64, 32, 32) | 128
Swish Activation | (None, 64, 32, 32) | 0
Convolution 2D | (None, 32, 16, 64) | 18496
Batch Normalization | (None, 32, 16, 64) | 256
Swish Activation | (None, 32, 16, 64) | 0
Flatten | (None, 32768) | 0
Dense | (None, 8) | 262152
Dense | (None, 32768) | 294912
Reshape | (None, 32, 16, 64) | 0
ConvolutionTranspose 2D | (None, 64, 32, 32) | 18464
Batch Normalization | (None, 64, 32, 32) | 128
Swish Activation | (None, 64, 32, 32) | 0
ConvolutionTranspose 2D | (None, 128, 64, 16) | 4624
Batch Normalization | (None, 128, 64, 16) | 64
Swish Activation | (None, 128, 64, 16) | 0
ConvolutionTranspose 2D | (None, 256, 128, 1) | 145
Table 4: CAE architecture with input shape .

c.2.2 Evolution of fitting, projection and total error

As mentioned before, we can quantitatively compare the performance of the three dimensionality reduction methods by projecting their outputs onto a very fine mesh with 256×512 resolution. Recall that the shape of the POD output is 128×256, and the shape of the CAE output varies among 32×64, 64×128, and 128×256. Hence, we use nearest-neighbor interpolation to obtain the projection onto 256×512, using the transform.resize function in the scikit-image package (Van der Walt et al., 2014). For NIF, we simply perform spatial queries at the coordinates of the cell nodes, which are , , for , . These can be generated in Python with numpy.linspace(2.5e-7, 0.5, 256, endpoint=True) and numpy.linspace(5e-7, 1, 512, endpoint=True).
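The nearest-neighbor projection step can be sketched without scikit-image using plain numpy indexing; the helper below is a minimal stand-in for skimage's transform.resize with order=0 (rounding conventions may differ slightly), and the field data is a placeholder.

```python
import numpy as np

def resize_nearest(field, out_shape):
    """Nearest-neighbor upsampling of a 2D field onto out_shape = (H, W).
    A numpy-only sketch approximating skimage.transform.resize(..., order=0)."""
    h, w = field.shape
    H, W = out_shape
    rows = np.arange(H) * h // H     # source row index for each target row
    cols = np.arange(W) * w // W     # source column index for each target column
    return field[rows[:, None], cols[None, :]]

# Cell-centered query coordinates for NIF on the same fine mesh, as in the text.
x = np.linspace(2.5e-7, 0.5, 256, endpoint=True)
y = np.linspace(5e-7, 1.0, 512, endpoint=True)

# Toy demonstration: upsample a 3x4 field onto a 6x8 grid.
coarse = np.arange(12.0).reshape(3, 4)
fine = resize_nearest(coarse, (6, 8))
```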

Unless highlighted, the following data fields are projected onto the 256×512 mesh and become 2D arrays varying with time . The value of the field at each coordinate and time can be indexed by .

  • : ground-truth data from the adaptive mesh,

  • : ground-truth data sampled on the Cartesian mesh,

  • : ground-truth data sampled on the Cartesian mesh,

  • : ground-truth data sampled on the Cartesian mesh,

  • : output prediction from POD on the Cartesian mesh,

  • : output prediction from CAE on the Cartesian mesh,

  • : output prediction from CAE on the Cartesian mesh,

  • : output prediction from CAE on the Cartesian mesh,

  • : output prediction from NIF evaluated on the 256×512 mesh.

Without loss of generality, let us take the CAE trained with data on the mesh as an example. Notice that


Hence, we define the following three error metrics at each time :

  • fitting error: spatially averaged mean square of the fitting difference,

  • projection error: spatially averaged mean square of the projection difference,

  • total error: spatially averaged mean square of the total difference.
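The three error metrics above can be computed with a short helper: both the model output and the coarse-mesh ground truth are compared against the fine-mesh ground truth on the 256×512 evaluation mesh. This is an illustrative sketch; the function and argument names are ours.

```python
import numpy as np

def error_metrics(u_true_fine, u_true_coarse_on_fine, u_pred_on_fine):
    """Fitting, projection, and total error at a single time t.

    u_true_fine:           ground truth on the fine 256x512 mesh;
    u_true_coarse_on_fine: coarse-mesh ground truth projected onto the fine mesh;
    u_pred_on_fine:        model prediction projected onto the fine mesh.
    """
    fitting = np.mean((u_pred_on_fine - u_true_coarse_on_fine) ** 2)
    projection = np.mean((u_true_coarse_on_fine - u_true_fine) ** 2)
    total = np.mean((u_pred_on_fine - u_true_fine) ** 2)
    return fitting, projection, total
```

Note that the total error is not simply the sum of the other two in general, since the mean square of a sum carries a cross term.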


We can define the same metrics for other models. The evolution of the above three error metrics for all models on training and testing time steps is displayed in the first row of fig. 23. Fitting error (red) contributes the most in POD, while the projection error is more than two orders of magnitude smaller. This implies that the lack of expressiveness in POD leads to its inferior performance. In contrast, projection error (green) contributes the most in CAE, while the fitting error remains unchanged as the projection resolution varies from 32×64 to 128×256. This indicates that CAE is mainly restricted by the projection error introduced during preprocessing in this example. Meanwhile, NIF learns the latent representation directly from the raw data on the adaptive mesh, so its fitting error equals its total error. Moreover, we observe that the projection error grows near-exponentially in time. This is due to the nature of the R-T instability: energy transfers from large-scale structures to small scales as time evolves, and these small-scale vortices generate even smaller vortices. Eventually, the Cartesian grid becomes unable to resolve the flow, which results in a significant amount of projection error. As shown in the rest of fig. 23, this phenomenon persists even when the rank is changed. The fact that the dimensionality of this R-T data is one is consistent with fig. 23, from which only the POD method improves when is increased.

Figure 23: Evolution of fitting, projection and total error for all five dimensionality reductions methods on learning low dimensional latent subspace with (top), (middle), (bottom).

c.3 Sparse reconstruction on sea surface temperature

A comparison of the temporal variation of the spatially mean-squared error between our framework and POD-QDEIM (Manohar et al., 2018) with different numbers of sensors is shown in fig. 24. We can visually confirm that the error variation is much higher in POD-QDEIM compared to our models. This in turn means the model prediction from POD-QDEIM should be accompanied by larger uncertainties. As the number of sensors increases, the training errors of both models decay drastically. Meanwhile, the testing error of POD-QDEIM increases, while that of the NIF-based framework visually stays around the same level.

Figure 24: History of spatially mean-squared error of the NIF-based framework and POD-QDEIM on sea surface temperature from 1990 to 2021 with different numbers of sensors .

A comparison of sea surface temperature contours among the ground truth, our framework, and POD-QDEIM for the very first week of 1990 (in the training dataset) is displayed in fig. 25. We can roughly see that the inferior performance of POD-QDEIM comes from at least two sources:

  1. POD-QDEIM overshoots the average temperature in the middle of the Pacific Ocean.

  2. POD-QDEIM misses small-scale structures, while the NIF-based framework captures them well.

As the number of sensors increases, both models perform increasingly well on the training data.

Figure 25: Sea surface temperature contours for the first week of 1990 from the ground truth (left column), POD-QDEIM (middle column), and our framework (right column) with varying numbers of sensors. Black dots indicate the optimized sensor locations from POD-QDEIM.

To further analyze model generalization performance on the sea surface temperature data, we compute the temporal average of the squared error on unseen testing data (2006 to 2021) and plot its spatial distribution with varying numbers of sensors in fig. 26. Evidently, except at 5 sensors, where NIF-SS shows a testing error similar to POD-QDEIM, NIF-SS generalizes much better than POD-QDEIM regardless of the number of sensors . As increases, the error distributions of both frameworks tend to contain more small-scale patches.

An inspection of the error distribution from the NIF-based framework shows interesting correlations between regions that are difficult to predict and ocean circulation patterns. Let us focus on NIF-SS with the error magnified 5 times (third column) in fig. 26. For example, when , we see that the regions most difficult for NIF-SS to predict occur mostly at the locations of the Gulf Stream, the North Pacific gyre, and the Norwegian Current.

Figure 26: Spatial distribution of temporally mean squared error of NIF-based framework and POD-QDEIM with varying number of sensors. “NIF-SS x 5” means the error is magnified 5 times to increase visibility.

Appendix D Implementation of NIF with SIREN

We adopt SIREN (Sitzmann et al., 2020), a standard MLP with -scaled sine activations, as our ShapeNet. In this work we take and find it to be sufficient for all cases. The initialization of SIREN is given in section D.1. Based on SIREN, we also borrow the ResNet-like structure (Lu et al., 2021c) to further improve training performance, which is presented in section D.2. Section D.3 describes how we connect ParameterNet with ShapeNet. Finally, practical advice on training NIF with SIREN is given in section D.4.

d.1 Initialization of SIREN

First of all, one should normalize each component of the input to SIREN as approximately. Here denotes the uniform distribution on the interval , . For example, in eq. 14, we normalize one coordinate component as


For the output , we choose the standard normalization (zero mean and unit variance).

Second, SIREN requires a special initialization of the weights and biases to achieve superior performance. Without loss of generality, consider a standard MLP equipped with -scaled sine activation functions and a unit structure of ----. / denotes the dimension of the input/output layer, and denotes the dimension of the hidden layers. Next, we initialize the weights and biases of the input layer component-wise in eq. 15 as


where subscript denotes the -th row -th column component. We initialize all other layer weights and biases following eq. 16,


Note that -scaled sine activation is defined as .
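The initialization scheme above can be sketched in numpy. This is an illustrative sketch following the SIREN recipe of Sitzmann et al. (2020): the first layer is drawn from U(-1/d_in, 1/d_in) and deeper layers from U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0), with omega_0 = 30; function names are ours, and the forward pass assumes the omega_0 scaling is applied at every hidden layer.

```python
import numpy as np

def init_siren(layer_widths, omega0=30.0, seed=0):
    """Initialize an MLP with omega_0-scaled sine activations, SIREN-style."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_widths[:-1], layer_widths[1:])):
        # First layer: U(-1/fan_in, 1/fan_in); deeper layers: U(±sqrt(6/fan_in)/omega0).
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / omega0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = rng.uniform(-bound, bound, size=(fan_out,))
        params.append((W, b))
    return params

def siren_forward(params, x, omega0=30.0):
    """Forward pass with omega_0-scaled sine activations (linear last layer)."""
    h = x
    for W, b in params[:-1]:
        h = np.sin(omega0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b

# Example: a 2-input, 1-output SIREN with two hidden layers of width 16.
params = init_siren([2, 16, 16, 1])
y = siren_forward(params, np.zeros((5, 2)))   # shape (5, 1)
```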

d.2 ResNet-like block

After the first layer, we use a ResNet-like block design (Lu et al., 2021c) to build ShapeNet. The configuration is displayed in fig. 5. Without loss of generality, denoting as the input of the -th such block, we have


where is the input for the next -th block.
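A sketch of such a block in numpy is given below. The exact form follows Lu et al. (2021c); here we assume a common two-layer variant, h_next = 0.5 * (h + sin(omega0*(sin(omega0*(h W1 + b1)) W2 + b2))), which preserves the layer width so blocks can be stacked. The averaging factor and weight scale are assumptions for illustration.

```python
import numpy as np

def resnet_block(h, W1, b1, W2, b2, omega0=30.0):
    """ResNet-like block with omega_0-scaled sine activations (assumed form).
    h: (batch, n); W1, W2: (n, n); b1, b2: (n,). Output has the same shape as h."""
    inner = np.sin(omega0 * (h @ W1 + b1))
    return 0.5 * (h + np.sin(omega0 * (inner @ W2 + b2)))

# Toy demonstration with small random weights so the sine arguments stay mild.
rng = np.random.default_rng(4)
W1 = 0.01 * rng.standard_normal((8, 8))
W2 = 0.01 * rng.standard_normal((8, 8))
b1, b2 = np.zeros(8), np.zeros(8)
out = resnet_block(np.ones((3, 8)), W1, b1, W2, b2)
```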

d.3 Building ParameterNet

The final step is to connect the output of ParameterNet to ShapeNet. Note that only ParameterNet contains trainable parameters, while the parameters of ShapeNet are determined by the output of ParameterNet. Therefore, we only need to carefully design the initialization of the last-layer weights and biases of ParameterNet to be consistent with the requirements in section D.1. Recall that the network structure of NIF is a special case of a hypernetwork of an MLP that focuses on decoupling spatial complexity from other factors. We adopt the initialization scheme of hypernetworks in SIREN (Sitzmann et al., 2020). The idea is to simply multiply the last-layer initial weights by a small factor, e.g., , while setting the last-layer initial biases to match the distribution in section D.1.
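This last-layer initialization can be sketched as follows: the weights are scaled by a small factor (we assume 1e-2 here, for illustration) so that at initialization the ParameterNet output is dominated by its bias, which carries a SIREN-style initialization of the ShapeNet parameters. Function and variable names are ours.

```python
import numpy as np

def init_hyper_last_layer(fan_in, shapenet_init_flat, scale=1e-2, seed=0):
    """Initialize the last layer of ParameterNet (a hypernetwork head).

    shapenet_init_flat: SIREN-initialized ShapeNet parameters, flattened.
    Returns (W, b) such that the ParameterNet initially outputs approximately
    shapenet_init_flat, since the weight contribution is scaled down.
    """
    rng = np.random.default_rng(seed)
    bound = np.sqrt(6.0 / fan_in)
    W = scale * rng.uniform(-bound, bound, size=(fan_in, shapenet_init_flat.size))
    b = shapenet_init_flat.copy()   # bias carries the SIREN init distribution
    return W, b

# Toy demonstration: a 16-unit hidden layer emitting 20 ShapeNet parameters.
flat = np.ones(20)
W, b = init_hyper_last_layer(16, flat)
```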

d.4 Practical advice for training

We empirically found that successful and stable training of NIF with SIREN using the Adam optimizer requires 1) a small learning rate, typically around to , and 2) a large batch size, often large enough to fill the whole GPU memory. When debugging, it is recommended to monitor the training error every 5 epochs and plot the prediction at the same time. If the training error, e.g., the mean-squared error, stays relatively high during the first 40 epochs and the prediction is completely useless, then one should further decrease the learning rate or increase the batch size. We also found that NIF with SIREN tends to fit large-scale structures first, then small scales.

However, it is not always possible to increase the batch size given limited GPU memory, especially in the fully 3D case. We found that if we fully parameterize the weights and biases of ShapeNet (as opposed to only parameterizing the last layer, as in section 2.3), the memory footprint of NIF becomes relatively large due to the tensor multiplication with einsum in our implementation. Therefore, the maximal number of affordable epochs can be limited, which can lead to excessively long training times or, in the worst case, unstable training even when the learning rate is set as low as .

If one doesn’t have access to good GPU resources, a more economical remedy on a single GPU is to use small-batch training, which is well known to generalize better than large-batch training (Masters and Luschi, 2018). Here, however, we are only looking for stable training of NIF with SIREN using batch sizes small enough to fit on a single GPU. We empirically found that the L4 optimizer (Rolinek and Martius, 2018) trains more stably than Adam in the small-batch regime for the same batch size. Empirically, the L4 optimizer can reduce the minimal batch size required for stable training by at least 10 times compared to the Adam optimizer, with only a slight increase in training cost. Thus, one can increase the capacity of NIF by 10 times without sacrificing much in overall training time and performance. Such capacity can be crucial for training on a large-scale, fully 3D turbulence dataset with limited GPU resources.

For all the results shown in this paper, we have used Nvidia Tesla P100 (16 GB), Nvidia GeForce RTX 2080 (12 GB), and Nvidia A6000 (48 GB) GPUs. If available, the simplest remedy is to use data parallelism with multiple GPUs. We have implemented data-parallel capability for NIF in our GitHub repo, and it scales well on multiple GPUs. We leave a complete study of distributed learning with NIF for future work.


  • M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. (2016) Tensorflow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283. Cited by: §3.1.
  • S. E. Ahmed, S. Pawar, O. San, A. Rasheed, T. Iliescu, and B. R. Noack (2021a) On closures for reduced order models—a spectrum of first-principle to machine-learned avenues. Physics of Fluids 33 (9), pp. 091301. Cited by: §1.
  • S. E. Ahmed, O. San, A. Rasheed, and T. Iliescu (2021b) Nonlinear proper orthogonal decomposition for convection-dominated flows. Physics of Fluids 33 (12), pp. 121702. Cited by: §1.
  • J. Ahrens, B. Geveci, and C. Law (2005) Paraview: an end-user tool for large data visualization. The Visualization Handbook 717 (8). Cited by: §B.2.
  • A. S. Almgren, V. E. Beckner, J. B. Bell, M. Day, L. H. Howell, C. Joggerst, M. Lijewski, A. Nonaka, M. Singer, and M. Zingale (2010) CASTRO: a new compressible astrophysical solver. i. hydrodynamics and self-gravity. The Astrophysical Journal 715 (2), pp. 1221. Cited by: §B.2, §1, §3.2, §5.
  • D. Amsallem, S. Deolalikar, F. Gurrola, and C. Farhat (2013) Model predictive control under coupled fluid-structure constraints using a database of reduced-order models on a tablet. In 21st AIAA Computational Fluid Dynamics Conference, pp. 2588. Cited by: §2.1.
  • Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner (2019) Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences 116 (31), pp. 15344–15349. Cited by: §1.
  • J. Bell and M. Day (2011) Adaptive methods for simulation of turbulent combustion. In Turbulent Combustion Modeling, pp. 301–329. Cited by: §1.
  • P. Benner, S. Gugercin, and K. Willcox (2015) A survey of projection-based model reduction methods for parametric dynamical systems. SIAM review 57 (4), pp. 483–531. Cited by: §1, §1, §2.1, §2.3.
  • J. Berg and K. Nyström (2018) A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, pp. 28–41. Cited by: §1.
  • M. J. Berger and J. Oliger (1984) Adaptive mesh refinement for hyperbolic partial differential equations. Journal of computational Physics 53 (3), pp. 484–512. Cited by: §1.
  • A. P. S. Bhalla, R. Bale, B. E. Griffith, and N. A. Patankar (2013) A unified mathematical framework and an adaptive numerical method for fluid–structure interaction with rigid, deforming, and elastic bodies. Journal of Computational Physics 250, pp. 446–476. Cited by: §1.
  • S. Bhatnagar, Y. Afshar, S. Pan, K. Duraisamy, and S. Kaushik (2019) Prediction of aerodynamic flow fields using convolutional neural networks. Computational Mechanics 64 (2), pp. 525–545. Cited by: §1, §2.1.
  • J. Bruna, B. Peherstorfer, and E. Vanden-Eijnden (2022) Neural galerkin scheme with active learning for high-dimensional evolution equations. arXiv preprint arXiv:2203.01360. Cited by: §4.2, §5.
  • S. L. Brunton and J. N. Kutz (2019) Data-driven science and engineering: machine learning, dynamical systems, and control. Cambridge University Press. Cited by: §1, §1, Figure 7, §2.3.
  • S. L. Brunton, B. R. Noack, and P. Koumoutsakos (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52, pp. 477–508. Cited by: §1.
  • S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, pp. 201517384. Cited by: §1.
  • G. L. Bryan, M. L. Norman, B. W. O’Shea, T. Abel, J. H. Wise, M. J. Turk, D. R. Reynolds, D. C. Collins, P. Wang, S. W. Skillman, et al. (2014) Enzo: an adaptive mesh refinement code for astrophysics. The Astrophysical Journal Supplement Series 211 (2), pp. 19. Cited by: §2.3, footnote 1.
  • S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis (2021) DeepM&Mnet: inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Computational Physics 436, pp. 110296. Cited by: §2.
  • K. Carlberg, C. Bou-Mosleh, and C. Farhat (2011) Efficient non-linear model reduction via a least-squares petrov–galerkin projection and compressive tensor approximations. International Journal for Numerical Methods in Engineering 86 (2), pp. 155–181. Cited by: §1, §1, §3.6, §4.2.
  • K. Carlberg, C. Farhat, J. Cortial, and D. Amsallem (2013) The gnat method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics 242, pp. 623–647. Cited by: §1.
  • K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton (2019) Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences 116 (45), pp. 22445–22451. Cited by: §1.
  • W. M. Chan (2009) Overset grid technology development at NASA Ames Research Center. Computers & Fluids 38 (3), pp. 496–503. Cited by: §1.
  • O. Chang, L. Flokas, and H. Lipson (2019) Principled weight initialization for hypernetworks. In International Conference on Learning Representations, Cited by: footnote 8.
  • S. Chaturantabut and D. C. Sorensen (2010) Nonlinear model reduction via discrete empirical interpolation. SIAM Journal on Scientific Computing 32 (5), pp. 2737–2764. Cited by: §2.4.
  • K. K. Chen, J. H. Tu, and C. W. Rowley (2012) Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses. Journal of Nonlinear Science 22 (6), pp. 887–915. Cited by: §3.5.
  • P. Y. Chen, M. Chiaramonte, E. Grinspun, and K. Carlberg (2021) Model reduction for the material point method via learning the deformation map and its spatial-temporal gradients. arXiv preprint arXiv:2109.12390. Cited by: §5, footnote 2.
  • S. M. Djouadi (2008) On the optimality of the proper orthogonal decomposition and balanced truncation. In 2008 47th IEEE Conference on Decision and Control, pp. 4221–4226. Cited by: §2.3, §2.3.
  • Z. Drmac and S. Gugercin (2016) A new selection operator for the discrete empirical interpolation method—improved a priori error bound and extensions. SIAM Journal on Scientific Computing 38 (2), pp. A631–A648. Cited by: item 4, §2.4.
  • K. Duraisamy, G. Iaccarino, and H. Xiao (2019) Turbulence modeling in the age of data. Annual Review of Fluid Mechanics 51, pp. 357–377. Cited by: §1.
  • T. Duriez, S. L. Brunton, and B. R. Noack (2017) Machine learning control: taming nonlinear dynamics and turbulence. Springer. Cited by: §5.
  • S. Esmaeilzadeh, K. Azizzadenesheli, K. Kashinath, M. Mustafa, H. A. Tchelepi, P. Marcus, M. Prabhat, A. Anandkumar, et al. (2020) MeshfreeFlowNet: a physics-constrained deep continuous space-time super-resolution framework. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–15. Cited by: §1.
  • J. L. Flanagan (2013) Speech analysis synthesis and perception. Vol. 3, Springer Science & Business Media. Cited by: §1.
  • M. Frangos, Y. Marzouk, K. Willcox, and B. van Bloemen Waanders (2010) Surrogate and reduced-order modeling: a comparison of approaches for large-scale statistical inverse problems. In Large-Scale Inverse Problems and Quantification of Uncertainty, pp. 123–149. ISBN 9780470685853. Cited by: §2.1.
  • N. Geneva and N. Zabaras (2020) Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. Journal of Computational Physics 403, pp. 109056. Cited by: §1.
  • W. K. George (1988) Insight into the dynamics of coherent structures from a proper orthogonal decomposition. In International Seminar on Wall Turbulence, Cited by: §2.3.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press. Cited by: §2.
  • D. Ha, A. Dai, and Q. V. Le (2016) Hypernetworks. arXiv preprint arXiv:1609.09106. Cited by: §2.
  • S. Han, H. Mao, and W. J. Dally (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Cited by: §3.3.
  • T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, and M. Li (2019) Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 558–567. Cited by: §4.1.
  • J. S. Hesthaven, G. Rozza, B. Stamm, et al. (2016) Certified reduced basis methods for parametrized partial differential equations. Vol. 590, Springer. Cited by: §1.
  • P. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley (2012) Turbulence, coherent structures, dynamical systems and symmetry. Cambridge university press. Cited by: §1.
  • K. Hornik, M. Stinchcombe, H. White, et al. (1989) Multilayer feedforward networks are universal approximators.. Neural networks 2 (5), pp. 359–366. Cited by: §2.2.
  • T. J. Hughes (2012) The finite element method: linear static and dynamic finite element analysis. Courier Corporation. Cited by: §1.
  • M. Kalia, S. L. Brunton, H. G. Meijer, C. Brune, and J. N. Kutz (2021) Learning normal form autoencoders for data-driven discovery of universal, parameter-dependent governing equations. arXiv preprint arXiv:2106.05102. Cited by: §1.
  • G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang (2021) Physics-informed machine learning. Nature Reviews Physics 3 (6), pp. 422–440. Cited by: §1.
  • A. Kassam and L. N. Trefethen (2005) Fourth-order time-stepping for stiff PDEs. SIAM Journal on Scientific Computing 26 (4), pp. 1214–1233. Cited by: §B.1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.1, §3.1.
  • N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar (2021) Neural operator: learning maps between function spaces. arXiv preprint arXiv:2108.08481. Cited by: §2.
  • S. Koziel and L. Leifsson (2013) Surrogate-based modeling and optimization. Springer. Cited by: §5.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105. Cited by: §1.
  • J. N. Kutz (2017) Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, pp. 1–4. Cited by: §1.
  • H. Lange, S. L. Brunton, and J. N. Kutz (2021) From Fourier to Koopman: spectral methods for long-term time series prediction. Journal of Machine Learning Research 22, pp. 41–1. Cited by: §1.
  • S. Lee and D. You (2019) Data-driven prediction of unsteady flow over a circular cylinder using deep learning. Journal of Fluid Mechanics 879, pp. 217–254. Cited by: §1.
  • Y. Li, E. Perlman, M. Wan, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink (2008) A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence 9, pp. N31. Cited by: §B.4, Figure 11, Figure 12, §4.2.
  • Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2020a) Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895. Cited by: §1, §1, Figure 2, §2, §4.1.
  • Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2020b) Neural operator: graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485. Cited by: §1, Figure 2, §2.
  • Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2021a) Markov neural operators for learning chaotic systems. arXiv preprint arXiv:2106.06898. Cited by: §2.
  • Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, and A. Anandkumar (2020c) Multipole graph neural operator for parametric partial differential equations. Advances in Neural Information Processing Systems 33, pp. 6755–6766. Cited by: §1, Figure 2, §2.
  • Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar (2021b) Physics-informed neural operator for learning partial differential equations. arXiv preprint arXiv:2111.03794. Cited by: §2.
  • M. Loeve (1955) Probability theory: foundations, random sequences. New York, NY: Van Nostrand. Cited by: §2.3.
  • J. Loiseau, S. Brunton, and B. Noack (2021) From the POD-Galerkin method to sparse manifold models. De Gruyter. Cited by: §2.1, §2.3.
  • Z. Long, Y. Lu, X. Ma, and B. Dong (2018) PDE-Net: learning PDEs from data. In International Conference on Machine Learning, pp. 3208–3216. Cited by: §1, §2.
  • L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis (2021a) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3 (3), pp. 218–229. Cited by: §1, Figure 2, §2.
  • L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis (2021b) A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data. arXiv preprint arXiv:2111.05512. Cited by: §1.
  • Y. Lu, K. Jiang, J. A. Levine, and M. Berger (2021c) Compressive neural representations of volumetric scalar fields. arXiv preprint arXiv:2104.04523. Cited by: §D.2, Appendix D, item 3, Figure 5, §3.4.
  • H. F. Lui and W. R. Wolf (2019) Construction of reduced-order models for fluid flows using deep feedforward neural networks. Journal of Fluid Mechanics 872, pp. 963–994. Cited by: §1.
  • J. L. Lumley (1967) The structure of inhomogeneous turbulent flows. Atmospheric turbulence and radio wave propagation. Cited by: §2.3.
  • B. Lusch, J. N. Kutz, and S. L. Brunton (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9 (1), pp. 4950. Cited by: §1.
  • K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton (2018) Data-driven sparse sensor placement for reconstruction: demonstrating the benefits of exploiting known patterns. IEEE Control Systems Magazine 38 (3), pp. 63–86. Cited by: §B.5, §C.3, §2.4, §3.6.
  • Z. Mao, L. Lu, O. Marxen, T. A. Zaki, and G. E. Karniadakis (2021) DeepM&Mnet for hypersonics: predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. Journal of Computational Physics 447, pp. 110698. Cited by: §2.
  • A. Mardt, L. Pasquali, F. Noé, and H. Wu (2020) Deep learning Markov and Koopman models with physical constraints. In Mathematical and Scientific Machine Learning, pp. 451–475. Cited by: §1.
  • A. Mardt, L. Pasquali, H. Wu, and F. Noé (2018) VAMPnets for deep learning of molecular kinetics. Nature Communications 9 (1), pp. 1–11. Cited by: §1.
  • D. Masters and C. Luschi (2018) Revisiting small batch training for deep neural networks. arXiv preprint arXiv:1804.07612. Cited by: §D.4.
  • A. Mathuriya, D. Bard, P. Mendygral, L. Meadows, J. Arnemann, L. Shao, S. He, T. Kärnä, D. Moise, S. J. Pennycook, et al. (2018) CosmoFlow: using deep learning to learn the universe at scale. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 819–829. Cited by: §1.
  • R. Maulik and G. Mengaldo (2021) PyParSVD: a streaming, distributed and randomized singular-value-decomposition library. In 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7), pp. 19–25. Cited by: §4.1.
  • R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, and D. Livescu (2020) Time-series learning of latent-space dynamics for reduced-order model closure. Physica D: Nonlinear Phenomena 405, pp. 132368. Cited by: §1.
  • B. J. McKeon and A. S. Sharma (2010) A critical-layer framework for turbulent pipe flow. Journal of Fluid Mechanics 658, pp. 336–382. Cited by: §5.
  • L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger (2019) Occupancy networks: learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470. Cited by: §1, §1, §2.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Cited by: §1.
  • B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng (2020) NeRF: representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pp. 405–421. Cited by: §1.
  • A. Mohan, D. Daniel, M. Chertkov, and D. Livescu (2019) Compressed convolutional LSTM: an efficient deep learning framework to model high fidelity 3D turbulence. arXiv preprint arXiv:1903.00033. Cited by: §1, §2.1.
  • A. T. Mohan and D. V. Gaitonde (2018) A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks. arXiv preprint arXiv:1804.09269. Cited by: §1.
  • T. Murata, K. Fukami, and K. Fukagata (2020) Nonlinear mode decomposition with convolutional neural networks for fluid dynamics. Journal of Fluid Mechanics 882. Cited by: §1.
  • B. R. Noack, K. Afanasiev, M. Morzyński, G. Tadmor, and F. Thiele (2003) A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics 497, pp. 335–363. Cited by: §1, §2.3.
  • A. Nonaka, J. Bell, M. Day, C. Gilet, A. Almgren, and M. Minion (2012) A deferred correction coupling strategy for low Mach number flow with complex chemistry. Combustion Theory and Modelling 16 (6), pp. 1053–1088. Cited by: §B.3, §5.
  • S. E. Otto and C. W. Rowley (2019) Linearly recurrent autoencoder networks for learning dynamics. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 558–593. Cited by: §1.
  • S. Pan, N. Arnold-Medabalimi, and K. Duraisamy (2021) Sparsity-promoting algorithms for the discovery of informative koopman-invariant subspaces. Journal of Fluid Mechanics 917. Cited by: §3.5.
  • S. Pan and K. Duraisamy (2018a) Data-driven discovery of closure models. SIAM Journal on Applied Dynamical Systems 17 (4), pp. 2381–2413. Cited by: §1.
  • S. Pan and K. Duraisamy (2018b) Long-time predictive modeling of nonlinear dynamical systems using neural networks. Complexity 2018. Cited by: §1.
  • S. Pan and K. Duraisamy (2020) Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability. SIAM Journal on Applied Dynamical Systems 19 (1), pp. 480–509. Cited by: §1.
  • D. T. Papageorgiou and Y. S. Smyrlis (1991) The route to chaos for the Kuramoto-Sivashinsky equation. Theoretical and Computational Fluid Dynamics 3 (1), pp. 15–42. Cited by: §B.1, footnote 7.
  • J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove (2019) DeepSDF: learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174. Cited by: §1, §1, §2.
  • B. Peherstorfer and K. Willcox (2016) Data-driven operator inference for nonintrusive projection-based model reduction. Computer Methods in Applied Mechanics and Engineering 306, pp. 196–215. Cited by: §1, §1.
  • T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. W. Battaglia (2020) Learning mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409. Cited by: §1, §1.
  • E. Qian, B. Kramer, B. Peherstorfer, and K. Willcox (2020) Lift & learn: physics-informed machine learning for large-scale nonlinear dynamical systems. Physica D: Nonlinear Phenomena 406, pp. 132401. Cited by: §1, §2.1.
  • N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville (2019) On the spectral bias of neural networks. In International Conference on Machine Learning, pp. 5301–5310. Cited by: §2.2.
  • M. Raissi, A. Yazdani, and G. E. Karniadakis (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. Cited by: §1, §1, §3.1.
  • P. Ramachandran, B. Zoph, and Q. V. Le (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941. Cited by: §3.1.
  • R. W. Reynolds, N. A. Rayner, T. M. Smith, D. C. Stokes, and W. Wang (2002) An improved in situ and satellite SST analysis for climate. Journal of Climate 15 (13), pp. 1609–1625. Cited by: §3.6.
  • S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio, Y. Dauphin, and X. Glorot (2011) Higher order contractive auto-encoder. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 645–660. Cited by: §3.2.
  • M. Rolinek and G. Martius (2018) L4: practical loss-based stepsize adaptation for deep learning. arXiv preprint arXiv:1802.05074. Cited by: §D.4.
  • C. W. Rowley, T. Colonius, and R. M. Murray (2004) Model reduction for compressible flows using POD and Galerkin projection. Physica D: Nonlinear Phenomena 189 (1-2), pp. 115–129. Cited by: §1.
  • C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson (2009) Spectral analysis of nonlinear flows. Journal of Fluid Mechanics 641, pp. 115–127. Cited by: §4.2.
  • O. San, R. Maulik, and M. Ahmed (2019) An artificial neural network framework for reduced order modeling of transient flows. Communications in Nonlinear Science and Numerical Simulation 77, pp. 271–287. Cited by: §1.
  • A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. Battaglia (2020) Learning to simulate complex physics with graph networks. In International Conference on Machine Learning, pp. 8459–8468. Cited by: §1.
  • P. J. Schmid (2010) Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, pp. 5–28. Cited by: §2.3.
  • P. J. Schmid (2022) Dynamic mode decomposition and its variants. Annual Review of Fluid Mechanics 54. Cited by: §3.5, §4.2.
  • J. Schmidhuber (1992) Learning to control fast-weight memories: an alternative to dynamic recurrent networks. Neural Computation 4 (1), pp. 131–139. Cited by: §2.
  • J. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364. Cited by: §1.
  • V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein (2020) Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems 33. Cited by: §D.3, Appendix D, item 3, §1, Figure 5, §2.2, §2, §2, §3.4.
  • J. Sobieszczanski-Sobieski and R. T. Haftka (1997) Multidisciplinary aerospace design optimization: survey of recent developments. Structural optimization 14 (1), pp. 1–23. Cited by: §2.1.
  • L. Sun, H. Gao, S. Pan, and J. Wang (2020) Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Computer Methods in Applied Mechanics and Engineering 361, pp. 112732. Cited by: §1, §1.
  • M. Sussman (2005) A parallelized, adaptive algorithm for multiphase flows in general geometries. Computers & Structures 83 (6-7), pp. 435–444. Cited by: §1.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Proc. NIPS, Montreal, CA. Cited by: §1.
  • R. Szeliski (2010) Computer vision: algorithms and applications. Springer Science & Business Media. Cited by: §1.
  • K. Taira, S. L. Brunton, S. T. Dawson, C. W. Rowley, T. Colonius, B. J. McKeon, O. T. Schmidt, S. Gordeyev, V. Theofilis, and L. S. Ukeiley (2017) Modal analysis of fluid flows: an overview. AIAA Journal, pp. 4013–4041. Cited by: §1, §2.3.
  • K. Taira and T. Colonius (2007) The immersed boundary method: a projection approach. Journal of Computational Physics 225 (2), pp. 2118–2137. Cited by: §4.2.
  • K. Taira, M. S. Hemati, S. L. Brunton, Y. Sun, K. Duraisamy, S. Bagheri, S. T. Dawson, and C. Yeh (2020) Modal analysis of fluid flows: applications and outlook. AIAA Journal 58 (3), pp. 998–1022. Cited by: §4.2, §5.
  • N. Takeishi, Y. Kawahara, and T. Yairi (2017) Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pp. 1130–1140. Cited by: §1.
  • M. Tancik, P. P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. T. Barron, and R. Ng (2020) Fourier features let networks learn high frequency functions in low dimensional domains. arXiv preprint arXiv:2006.10739. Cited by: §1, §2.2.
  • R. Teyssier (2002) Cosmological hydrodynamics with adaptive mesh refinement: a new high resolution code called RAMSES. Astronomy & Astrophysics 385 (1), pp. 337–364. Cited by: §2.3.
  • A. Towne, O. T. Schmidt, and T. Colonius (2018) Spectral proper orthogonal decomposition and its relationship to dynamic mode decomposition and resolvent analysis. Journal of Fluid Mechanics 847, pp. 821–867. Cited by: §5.
  • J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, et al. (2014) XSEDE: accelerating scientific discovery. Computing in science & engineering 16 (5), pp. 62–74. Cited by: §5.
  • I. Turner-Trauring (2019) The Fil memory profiler for Python. GitHub. Cited by: §3.4.
  • S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, and T. Yu (2014) Scikit-image: image processing in Python. PeerJ 2, pp. e453. Cited by: §C.2.2.
  • J. Vay, P. Colella, J. Kwan, P. McCorquodale, D. Serafini, A. Friedman, D. Grote, G. Westenskow, J. Adam, A. Heron, et al. (2004) Application of adaptive mesh refinement to particle-in-cell simulations of plasmas and beams. Physics of Plasmas 11 (5), pp. 2928–2934. Cited by: §2.3.
  • H. Wang, R. Planas, A. Chandramowlishwaran, and R. Bostanabad (2022) Mosaic flows: a transferable deep learning framework for solving pdes on unseen domains. Computer Methods in Applied Mechanics and Engineering 389, pp. 114424. Cited by: §4.1.
  • Q. Wang, N. Ripamonti, and J. S. Hesthaven (2020) Recurrent neural network closure of parametric POD-Galerkin reduced-order models based on the Mori-Zwanzig formalism. Journal of Computational Physics 410, pp. 109402. Cited by: §1.
  • S. Wang, H. Wang, and P. Perdikaris (2021a) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. arXiv preprint arXiv:2103.10974. Cited by: §2.
  • S. Wang, H. Wang, and P. Perdikaris (2021b) On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering 384, pp. 113938. Cited by: §2.2.
  • S. Wiewel, M. Becher, and N. Thuerey (2019) Latent space physics: towards learning the temporal evolution of fluid flow. In Computer graphics forum, Vol. 38, pp. 71–82. Cited by: §3.2.
  • J. Xu and K. Duraisamy (2020) Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics. Computer Methods in Applied Mechanics and Engineering 372, pp. 113379. Cited by: §1, §2.1.
  • J. Xu, A. Pradhan, and K. Duraisamy (2021) Conditionally parameterized, discretization-aware neural networks for mesh-based modeling of physical systems. Advances in Neural Information Processing Systems 34. Cited by: §1.
  • Z. J. Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma (2019) Frequency principle: fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523. Cited by: §2.2.
  • E. Yeung, S. Kundu, and N. Hodas (2019) Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. Cited by: §1.
  • K. Yu, B. Dorschner, and T. Colonius (2022) Multi-resolution lattice Green's function method for incompressible flows. Journal of Computational Physics, pp. 110845. Cited by: §1.
  • M. J. Zahr and C. Farhat (2015) Progressive construction of a parametric reduced-order model for pde-constrained optimization. International Journal for Numerical Methods in Engineering 102 (5), pp. 1111–1135. Cited by: §1.
  • L. Zhang, J. Han, H. Wang, R. Car, and E. Weinan (2018) Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Physical review letters 120 (14), pp. 143001. Cited by: §1.
  • Y. Zhu and N. Zabaras (2018) Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics 366, pp. 415–447. Cited by: §1, §2.