Log In Sign Up

Multiscale and Nonlocal Learning for PDEs using Densely Connected RNNs

Learning time-dependent partial differential equations (PDEs) that govern evolutionary observations is one of the core challenges for data-driven inference in many fields. In this work, we propose to capture the essential dynamics of numerically challenging PDEs arising in multiscale modeling and simulation – kinetic equations. These equations are usually nonlocal and contain scales/parameters that vary by several orders of magnitude. We introduce an efficient framework, Densely Connected Recurrent Neural Networks (DC-RNNs), by incorporating a multiscale ansatz and high-order implicit-explicit (IMEX) schemes into RNN structure design to identify analytic representations of multiscale and nonlocal PDEs from discrete-time observations generated from heterogeneous experiments. If present in the observed data, our DC-RNN can capture transport operators, nonlocal projection or collision operators, macroscopic diffusion limit, and other dynamics. We provide numerical results to demonstrate the advantage of our proposed framework and compare it with existing methods.


page 1

page 2

page 3

page 4


Finite Difference Nets: A Deep Recurrent Framework for Solving Evolution PDEs

There has been an arising trend of adopting deep learning methods to stu...

Deep-learning based discovery of partial differential equations in integral form from sparse and noisy data

Data-driven discovery of partial differential equations (PDEs) has attra...

Particles to Partial Differential Equations Parsimoniously

Equations governing physico-chemical processes are usually known at micr...

A new reduced order model of linear parabolic PDEs

How to build an accurate reduced order model (ROM) for multidimensional ...

Taylor Swift: Taylor Driven Temporal Modeling for Swift Future Frame Prediction

While recurrent neural networks (RNNs) demonstrate outstanding capabilit...

Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

Fast and accurate solution of time-dependent partial differential equati...

Implicit Neural Spatial Representations for Time-dependent PDEs

Numerically solving partial differential equations (PDEs) often entails ...

1 Introduction

Data-driven discovery of PDEs is experiencing unprecedented development over the past few years, wherein various kinds of PDEs (featuring, for example, time dependence and nonlinearity) have been studied. In this work, we consider the learning problem for a class of PDEs that involve multiple time/spatial scales and nonlocal operators – kinetic equations. These are an important class of equations in multiscale modeling hieriarchy which bridges microscopic atomistic models (such as N-body Newton equations) and macroscopic continuum models (such as Navier-Stokes equations). For a variety of scientific problems ranging from gas/plasma dynamics, radiative transfer to social/biological systems, kinetic equations have demonstrated their ability to accurately model the dynamics of complex systems [Villani02]. To the best of our knowledge, learning of multiscale kinetic equations, albeit important, has never been explored in the literature.

Specifically, we are interested in developing an efficient symbolic neural network to fit time-dependent data for a class of multiscale kinetic equations. The overall goal is to identify an explicit formula of the map that determines the evolution for and . Therefore, a symbolic neural network with parameters and

is constructed and the following loss function is minimized to find the best parameter set:


Note that approaches the correct model as . The choice of the norm above is flexible. In this paper, we focus on the -norm because our numerical experiments show that it is slightly better than others, e.g., the -norm. Due to the multiscale and nonlocal feature of our target equations, existing learning schemes may not be efficient. We will propose novel symbolic neural networks, new formulations of the loss function in (1), and new regularization methods in this paper to tackle this challenge.

Our first main contribution is a new symbolic neural network built with multiscale and nonlocal features. The key idea for capturing multiscale phenomena is to construct as a sum of different components at different scales of order , where is an integer degree and is a trainable multiscale separator defined by:


with as a trainable parameter. In particular, we propose


where and is the network at the

-th scale. Thus, unlike conventional deep learning recovery algorithms as in

[DBLP:journals/corr/abs-1711-10561, DBLP:journals/corr/abs-1711-10566, BinD18, harlim2020machine, WU2020109307, zslg18], our algorithm is aware of different scales and thus more accurately captures different components at scale .

The key idea to make capable of capturing nonlocal phenomena is to incorporate nonlocal operators in in (3) to construct . Conventionally, is typically constructed as a linear combination of mathematical operators in a pre-specified dictionary, and the combination coefficients are learned via minimizing (1) with sparsity regularization to obtain sparse linear combinations as in [Kaiser2018SparseIO, Schaeffer2017ExtractingSH, Mangan2017ModelSF, Brunton3932, zslg18]. For high-dimensional problems, constructing such a dictionary can be very costly. Hence, we will apply symbolic recurring neural network (RNN) of mathematical operators as in [BinD17, BinD18]

without specifying a large dictionary. Intuitively, due to the high expressiveness of our symbolic RNNs, the class of RNNs with different parameters can form a large dictionary without pre-specifying a costly dictionary. It might be computationally more efficient to use symbolic RNNs to classify the dynamics of data and choose a trainable symbolic model to model data.

The most basic elements in our RNN are a set of (either local or nonlocal) mathematical operators modeling the dynamics in kinetic equations, such as transport, collision, and diffusion operators. The trainable compositions of these basic operators form a basis of our RNN, i.e., each term in (3) is a trainable linear combination of the compositions defined below:


where with entries in . More precisely, we have


where coefficients depend on trainable parameters , and

is a set of index vectors

specified by our symbolic RNN as we shall see later. Similar to polynomial regression [doi:10.1080/00224065.1997.11979762, GERGONNE1974439], our RNN returns a multivariate polynomial of the operators . Due to the expressive power of neural networks [yarotsky2017, Shen2, Shen3, li2020curse, Poggio2017WhyAW, liang2021reproducing], our symbolic RNN of a small size can generate a sufficiently large index vector set . The formulation in (5

) is also natural in physics, equations derived from asymptotic analysis often have recursive structure similar to the compositional operators in (

5), e.g., see [SR09].

Our second main contribution is to propose novel loss functions based on high-order IMEX schemes to discretize of the integral in (1). The most typical numerical method, the forward-Euler scheme, results in the loss function:


which is commonly used in the discovery of governing equations. Though higher order approximations using multistep methods have been investigated in [DBLP:journals/corr/abs-1711-10561, keller2020discovery, Raissi2018MultistepNN, du2021discovery], there is no existing research on the effectiveness of IMEX schemes in the literature of discovering governing equations. For kinetic equations, since they often contain non-stiff terms as well as stiff terms, the IMEX schemes are the natural choices and have demonstrated their power in various applications [PR05, DP13, DP17]. We will consider both IMEX Runge-Kutta schemes such as IMEX-ARS scheme [ARS97] and IMEX multistep schemes such as IMEX-BDF scheme [HR07]. These propagation schemes together with the RNNs will make up our “densely connected recurrent neural network” (DC-RNN).

Our third main contribution is to propose physics-based regularization to the loss function in (1) to improve optimization efficiency and avoid over-fitting. First, a physically correct model is usually described with a small number of mathematical operators in (5), while an over-fitting model would have a large number of operators for a better fitting capacity. Thus, inspired by the lasso approaches in [Tibshirani1996RegressionSA, BSPJ15, 8573778], we propose sparse regularization to avoid over-fitting and remove undesirable features in the governing equation, e.g., adding a -norm penalty term to the coefficients in (5). Second, a micro-macro decomposition of kinetic equations [LM:08] is applied to transfer a challenging recovery problem with a single PDE to an easier recovery problem with a coupled PDE system, enforcing our recovery results to be more physically meaningful. Furthermore, the microscopic part, denoted as (see Section 2), satisfies


which will be used as a constraint of our recovery. Finally, in most cases, kinetic equations have spatial-dependent coefficients, which motivates us to design spatial-dependent parameters in (5) and the regularity in terms of can also be considered as a regularization penalty.

To summarize, the main highlights of our learning algorithm are as follows:

  • DC-RNN built for transport (local) and collision (nonlocal) operators typically involved in kinetic equations.

  • Multiscale-aware RNN structures and learning rates for the recovery of time-dependent PDEs.

  • Novel optimization loss function inspired by high-order IMEX for multiscale equations.

  • Physics-aware loss function and regularization specialized for kinetic equations.

  • Efficient arithmetic and memory cost.

We structure this manuscript as follows. In Section 2, an exemplary PDE for our learning problem is introduced to motivate our algorithm. In Section 3, we mathematically formulate an ansatz that we will use to fit data to PDEs. In Section 4, our physics-aware loss function is introduced to learn PDEs from data. In Section 5, we will carry out several numerical experiments to test our algorithm. Finally, concluding remarks are made in Section 6.

2 Model Equation: the Linear Transport Kinetic Equation

We now present a model equation, the linear transport equation, to motivate our learning algorithm. The linear transport equation is a prototype kinetic equation describing particles such as neutrons or photons interacting with a background medium [Chandrasekhar, Davison]. This equation highlights some of the challenging aspects that an efficient learning algorithm should account for. That is, our model equation will allow us to understand the hypothesis space (the set of functions describing kinetic equations) better. This will lead us to devise ways to capture multiple scales, nonlocal operators, and regularity conditions. In addition, we will be able to discern appropriate numerical techniques needed to carry out our learning algorithm.

In the simple 1D case, the linear transport equation reads



is the probability density function of time

, position , and velocity ; is a projection or collision operator; and are the scattering and absorption coefficients; and is a given source. Finally, is a dimensionless parameter indicating the strength of the scattering. Indeed, when , the equation (8) is in the fully kinetic regime (all operators balance); when , the scattering is so strong that (8) approaches a diffusion limit. To see this, consider the so-called micro-macro decomposition of :


where is the macro part (density) of the solution, and is the micro part. A crucial condition we use is


Equation (10) is the conservation condition and will be numerically indispensable since it allows us to impose exact conditions satisfied by kinetic equations. Substituting (9) into (8), one can derive the following coupled system for and , equivalent to (8):


where denotes the identity operator.

In (12), if , one obtains


which, when substituted into (11), yields


So follows the dynamics of a diffusion equation. We now go through a few things that we can learn from the linear transport equation following the notations used in Section 1.

Involved Basic Mathematical Operators: Identity, Advection, and Projection. Notice that each of Equations (8), (11), and (12) can be recovered from the ansatz:


for , , or . For example, the equation for can be recovered provided


with coefficients


Thus, at the very minimum, our hypothesis space in Equation (3) should involve operators in (16). We expect to see these operators for general kinetic equations. Potentially one can also have cubic or higher order nonlinearities in our hypothesis space. Therefore, we want to generalize Equation (15) to involve greater number of compositions.

Functions of . Equations (8), (11), and (12) involve the functions , , and . Therefore the coefficients should be allowed to depend on .

Scale Disparity. If we want to determine the correct order of each term, then we need to make an asymptotic expansion:


where it is understood that the upper index labels the order of the scale. The multiscale phenomenon here is the main motivation of the multiscale model in Equation (3).

Exact Conditions. Typically, adding regularization to machine learning problems can vastly improve the outcome of the prediction. There is one obvious constraint for our target kinetic equation: Equation (10). An added feature about this condition is that it is independent of and thus helpful for modeling dynamics between the small and large scale limits.

Sparsity. The large number of basis terms in our hypothesis space means that we might have overfitting issues. Thus, the following sparsity regularization term could be considered:


which can be enforced by adding the following regularization term to our loss function in Equation (1):


since is the actual parameters to be optimized in our model in Equation (3).

Numerical Techniques. Finally, we need to consider numerical methods for arriving at the correct set of trainable parameters. For this reason, we will use the IMEX schemes for propagating , where the time rate of change is given by an ansatz like Equation (15). We will use gradient descent, specifically the Adam algorithm, to update trainable parameters.

In sum, the discussion above illustrates the motivation of our optimization problem, model design, and regularization terms introduced in Section 1.

3 Formulating an Ansatz to Fit Data to Kinetic Equations

In this section, we will construct an ansatz capable of representing Equations (11), (12). For simplicity, let us focus on the case when the spatial variable is one-dimensional. It is easy to generalize to high-dimensional cases. We start by introducing notations which will be used throughout the paper.

Notation. The functions involved in (11) or (12) are multidimensional, e.g. and . The values of and will be defined on a mesh and for , , and . To further simplify the notation, we will use to denote as a scalar function evaluated at the -th position corresponding to . The upper index in will correspond to time with denoting evaluated at a time sequence. Matrices will be written with capital letters while operators applied to the data will mainly be written using script letters.

3.1 Operator Evaluation

We will describe the evaluation of commonly used operators in Equations (11), (12), and other kinetic equations below.

1) Identity operator. The identity operator is defined by


The evaluation of at the point simply follows .

2) Pseudo-upwind for the advection operator. We define the advection operator acting on as the dot-product:


where is a velocity distribution. We note that many stable schemes use an upwind stencil for the advection operator. The first-order upwind stencil gives:


which is the evaluation of the advection operator in (22) at the point in the one-dimensional case. This stencil is suitable for a first-order-in-time IMEX-scheme. For higher-order IMEX schemes, one should use higher-order stencils.

3) Projection operators. We define the projection operator as an integral with respect to the variable of a function . In one-dimension, we have:


which can be discretized as a finite sum using Gauss quadrature:

with quadrature weights . Note that the data corresponding to

is represented by a two-index tensor

with corresponding to the values and , respectively. The above quadrature maps to a one-index tensor . To make dimensions consistent, we extend this to a two-index tensor by for each .

4) Other differential operators. Higher-order differential operators such as the Laplacian will be computed by using central difference formulas.

3.2 Ansatz for Fitting PDEs to Data

In this section, we will form an ansatz that will be used to fit a PDE to data, i.e., identifying the governing PDE to which the observed data is a discrete solution. We will consider the following two typical examples for simplicity. The generalization to other cases is simple.

Scalar equation ansatz. Let us consider a first-order in time PDE, then the equation ansatz is built as


with split into multiscale components following our main model in (3):

The integer depends on the number of multiscale components for the problem being considered. If one only expects one fast scale and one slow scale component, is set to . For slow, medium, and fast scales, is set to , etc. is a learnable scaling number defined in (2) restricted to but not necessarily equal to . The operators will be differential operators acting on and constructed as in (5). The construction detail will be provided in the next section.

Two-component vector equation ansatz. For vectorized equations, we build an ansatz for each component individually as


The are generally different operators for each ,, and following the construction in (5). Each has an individual set of network parameters. is a learnable scaling number as in the previous example. For the remainder of the manuscript, will denote the right hand side of the -equation. will denote the right hand side of the -equation. The construction of using an RNN structure will be presented in detail in the next section.

Remark 1

Equation (25) is our chosen ansatz. There are many alternative ways to construct an ansatz. For example, we only consider the linear combination of and it is also possible to explore the products of .

3.3 Building a Dictionary Using RNNs

We construct the operators in Equation (24) for the single-component case. For the multicomponent case as in (25), we construct in the same manner. The only difference is that each will have a different set of parameters depending on . We will omit the index for clarity. To begin with, we will need to supply the RNN with a few basic mathematical operators as mentioned in the introduction. In particular, we consider operators , , and discussed in Section 3.1. It is potentially better to include more basic mathematical operators such as , , , etc.

Next, a symbolic RNN will be introduced to generate a complicated operator using basic mathematical operators. Given basic mathematical operators , we build a -layer RNN for each by successively applying a weight matrix to the operator vector

and then adding a bias vector

times :


Because and are operators, they can be applied to generate a more expressive formulation with a special “composition” denoted as defined below:


where denotes the standard composition.

Now we define :


where the weight matrices are given by:


for and,

with each . The biases are given by:

with .

The operator built by the recursive compositions in (28) is a symbolic RNN operator, the evaluation of which on a given function follows in the basic evaluation rules introduced in Section 3.1. will have to be evaluated at both data sets and , since our model problem depends on both and . A diagrammatic representation of this RNN is shown in Figure LABEL:NNfig.