Log In Sign Up

MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework

by   Chiyu "Max" Jiang, et al.

We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from the low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MeshfreeFlowNet on the task of super-resolution of turbulent flows in the Rayleigh-Benard convection problem. Across a diverse set of evaluation metrics, we show that MeshfreeFlowNet significantly outperforms existing baselines. Furthermore, we provide a large scale implementation of MeshfreeFlowNet and show that it efficiently scales across large clusters, achieving 96.80 less than 4 minutes.


page 1

page 2

page 5

page 6

page 10

page 11

page 12

page 13


Fast Spatio-Temporal Residual Network for Video Super-Resolution

Recently, deep learning based video super-resolution (SR) methods have a...

3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks

In video super-resolution, the spatio-temporal coherence between, and am...

The SPDE approach for spatio-temporal datasets with advection and diffusion

In the task of predicting spatio-temporal fields in environmental scienc...

A Simple Plugin for Transforming Images to Arbitrary Scales

Existing models on super-resolution often specialized for one scale, fun...

Deep inference of latent dynamics with spatio-temporal super-resolution using selective backpropagation through time

Modern neural interfaces allow access to the activity of up to a million...

Multi-Grid Back-Projection Networks

Multi-Grid Back-Projection (MGBP) is a fully-convolutional network archi...

Code Repositories


MeshfreeFlowNet: Physical Constrained Space Time Super-Resolution

view repo

1 Introduction

In recent years, along with the significant growth of computational resources, there has been an increasing attempt towards accurately resolving the wide range of length and time scales present within different physical systems. These length and time scales often differ by several orders of magnitude. Such time and length scale disparities are commonly observed in different physical phenomena such as turbulent flows, convective diffusive systems, subsurface systems, multiphase flows, chemical processes, and climate systems [Hu2007, Hurrell2009, Bagchi2010, Liu1995, Szymczak2013, Qin2020, Quan2011, RIAZ2006, Esmaeilzadeh2020, Esmaeilzadeh2019]. Yet many key physical quantities of interest are highly dependent on correctly resolving such fine scales, such as the energy spectrum in turbulent flows.

From a numerical perspective, resolving the wide range of spatio-temporal scales within such physical systems is challenging since extremely small spatial and temporal numerical stencils would be required. In order to alleviate the computational burden of fully resolving such a wide range of spatial and temporal scales, multiscale computational approaches have been developed. For instance, in the subsurface flow problem, the main idea of the multiscale approach is to build a set of operators that map between the unknowns associated with the computational cells in a fine-grid and the unknowns on a coarser grid. The operators are computed numerically by solving localized flow problems. The multiscale basis functions have subgrid-scale resolutions, ensuring that the fine-scale heterogeneity is correctly accounted for in a systematic manner [Jenny2006, Jenny2005, Jenny2003]. Similarly, in turbulent flows [Bramkamp2004, Modest2005], climate forecast [Shen2013], multiphase systems [Luo2009, Bi2004], reactive transport [Bauer2001, Odman1991], complex fluids [Ren2005] and biological systems [Perdikaris2016], multiscale approaches attempt to relate the coarse-scale solutions to the fine-scale local solutions taking into account the fine-scale physics and dynamics. In these multiscale approaches, although the small scale physics and dynamics are being captured and accounted for at larger scales, the fine-grid solutions are extremely costly and hence impractical to solve for. Reconstructing the fine-scale subgrid solutions by having coarse-scale solutions in space and/or time still remains a challenge.

In this manuscript, we refer to such a process of reconstructing the fine-scale subgrid solutions from the coarse-scale solutions as super-resolution, where the high-resolution solutions are reconstructed using the low-resolution physical solutions in space and/or time. One key insight is that there are inherent statistical correlations between such pairs of low-resolution and high-resolution solutions that can be leveraged to reconstruct high-resolution solutions from the low-resolution inputs. Furthermore, this super-resolution process can be effectively modeled using deep learning models that learn such statistical correlations in a self-supervised manner from low-resolution and high-resolution pairs that exist in a training dataset. A successful super-resolution model should be able to efficiently represent the high-resolution outputs, effectively scale to large spatio-temporal domains as in a realistic problem setting, and allow for incorporating physical constraints in the form of PDEs to regularize the outputs to physically valid solutions. Furthermore, in order for a learning-based methodology to be applied to real problems such as Computational Fluid Dynamics (CFD) simulations and climate modeling, the methodology needs to address the High Performance Computing (HPC) challenges of scalability to large scale computing systems, for both the training and inference stages.

To this end, we propose , a novel physics-constrained, deep learning based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from the low-resolution inputs. first maps the low-resolution inputs to a localized latent context grid using a convolutional encoder, which can then be continuously queried at arbitrary resolutions. In summary, our main contributions are as follows:

  • We propose , a novel and efficient physics-constrained deep learning model for super-resolution tasks.

  • We implement a set of physics-based metrics that allow for an objective assessment of the reconstruction quality of the super-resolved high-resolution turbulent flows.

  • We empirically assess the effectiveness of the framework on the task of super-resolving turbulent flows in the Rayleigh–Bénard convection problem, showing a consistent and significant improvement in recovering key physical quantities of interest in super-resolved solutions over competing baselines.

  • We demonstrate the scalability of our framework to larger and more challenging problems by providing a large scale implementation of that scales to up to 128 GPUs while retaining a scaling efficiency.

2 Related Works

Recently, deep learning models have been applied for studying fluid flow problems in different applications. In particular, a certain class of research works has studied flow problems on a grid representation where dynamic flow properties (e.g., velocity, pressure, etc.) could be considered as structured data similar to images. For instance, guo2016convolutional used convolutional architectures for approximating non-uniform steady laminar flow around 2D and 3D objects. Similarly, bhatnagar2019prediction

used convolutional neural networks for prediction of the flow fields around airfoil objects and for studying aerodynamic forces in a turbulent flow regime.

Zabaras used Bayesian deep convolutional neural networks to build surrogate flow models for the purpose of uncertainty quantification for flow applications in heterogeneous porous media.

Alternatively, a different area of research has attempted to solve PDEs governing different physical phenomena by using simple fully connected deep neural networks. In order to enforce physical constraints, the PDEs under consideration are added as a term to the loss function at the training stage. For instance,


used fully connected deep neural networks with skip connections for solving different classes of PDEs such as Poisson equations and eigenvalue problems.

bar2019unsupervised used fully connected deep neural networks for solving forward and inverse problems in the context of electrical impedance tomography governed by elliptical PDEs. long2017pde proposed a framework to uncover unknown physics in the form of PDEs by learning constrained convolutional kernels. raissi2019physics applied fully connected neural networks for solving PDEs in a forward or inverse setup. smith2020eikonet deployed residual neural networks to solve the Eikonal PDE equation and perform ray tracing for an application in earthquake hypo-center inversion and seismic tomography. Using their proposed framework they solved PDEs in different contexts such as fluids, quantum mechanics, seismology, reaction-diffusion systems, and the propagation of nonlinear shallow-water waves.
More recently, alternative representations for spatial functions have been explored. Anima2020 proposed using message passing on graph networks in order to map between input data of PDEs and their solutions. They showed that the learned networks can generalize between different PDE approximation methods (e.g., Finite Difference and Finite Element methods) and between approximations that correspond to different levels of discretization. Similar to our framework in using a latent context grid that can be continuously decoded into an output field, jiang2020lig

used such representations for the computer vision task of reconstructing 3D scenes, where latent context grids are decoded into continuous implicit functions that represent the surfaces of different geometries.

In the area of fluid dynamics, for turbulent flow predictions, wang2019towards developed a physics-informed deep learning framework. They introduced the use of trainable spectral filters coupled with Reynolds-averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES) models followed by a convolutional architecture in order to predict turbulent flows. jiang2020enforcing presented a differentiable physics layer within the neural networks using spectral methods in order to enforce hard linear constraints. By performing a linear projection in the spectral space, they were able to enforce a divergence-free condition for the velocity. Accordingly, they could conserve mass within the physical system both locally and globally, allowing the super-resolution of turbulent flows to be statistically accurate.
In computer vision applications, the super-resolution (SR) process has been proposed in the context of reconstructing high-resolution (HR) videos/images from the low-resolution (LR) ones. In the literature, different classical super-resolution methods have been proposed such as prediction-based methods [Irani1991, Keys1981], edge-based methods [Freedman2011, Sun2008], statistical methods [KwangInKim2010, ZhiweiXiong2010], patch-based methods [Freeman2002, HongChang2004] and sparse representation methods [JianchaoYang2010, JianchaoYang2008]. Most recently, deep learning based super-resolution models have also been actively explored and differ mainly in their types of specific intended application, architectures, loss functions, and learning principles [Lai2017, Ahn2018, Johnson2016, Bulat2018, Lim2017, Wang2018].

3 Preliminaries

In this section, we provide a detailed explanation of the problem set up, notations, data sets, along with the learning paradigm and evaluation metrics.

3.1 Problem Setup

Consider a space and time dependent partial differential equation (PDE) with a unique solution, written in the general form as


defined on a spatio-temporal domain where represents a dimensional spatial domain, and represents the one dimensional temporal domain. Here the is the operator defining the PDE within the domain (), is the source function, is the boundary condition function, is the operator defining the PDE on the boundary (), and the solution to the PDE in Eqn. 1 is .

For a partial differential equation (Eqn. 1), a problem instance is realized with a given source and boundary condition functions and . For a pair , we consider to be an operator that generates the high-resolution solution , i.e., , and to be an operator that produces the low-resolution solution , i.e., . Consider a compact normed space of operators , a set of operators , mapping low-resolution solutions to high-resolution ones. For a given compact normed space of functions , let denote the approximation gap with respect to , i.e.,


with continuous approximation error in . Here the norm is with respect to a desired space where and is a preferred measure333Note that in the max-min game of the Eqn. 2, the action of the environment player, , is revealed to the approximator player to choose . Therefore, since the game is in the favor of the approximating player, the value of the game, can be much smaller than its min-max counterpart. We require the min-max value to be desirably small when we aim to have a single model to perform well on a set of designated problems.. In this work, given a problem instance specified with , we are interested in learning

using parametric models. In order to map a low-resolution solution to its corresponding super-resolution with low approximation error, we require

to be small and comparable with a desired error level. Therefore, given a problem set , a proper and rich class of operators, , allows for a desirable approximation error.

For a given PDE in Eqn 1, defined on a hyper-rectangular domain in , we consider a spatio-temporal grid of resolutions with denoting the number of discretized points in time domain, and a data set , of points and their solution values.

t1 n1

Figure 1: Configuration of the Rayleigh–Bénard instability problem - Cold and hot plates have temperatures of and and separated with a distance of . Flow parameters , , respectively are the thermal expansion coefficient, kinematic viscosity, and thermal diffusivity. is the gravity acceleration in the direction, and , respectively are the length of the plates and their separation distance. and respectively refer to the coordinates of the Cartesian frame of reference.

t1 n1

Figure 2: An illustration of a typical solution to the Rayleigh–Bénard problem (Eqns. (3a)-(3c)). The contours respectively show the solution for temperature (), pressure (), and the two velocity components ( and ). For this simulation case , , and []. The spatial resolution is = = , and upon an adaptive time stepping scheme the presented solution above is obtained at [].

3.2 Dataset Overview

In this work, we generate the dataset as the solution to a classical fluid dynamics system with a chaotic nature. We consider the well-known Rayleigh–Bénard instability problem in 2D where a static bulk fluid (kinematic viscosity , thermal diffusivity ) is initially occupying the space between two horizontal plates (see Fig. 1). The lower and upper plates are considered to be respectively hot and cold with temperatures of and . Gradually the temperature of the bulk fluid adjacent to the hot (cold) plate increases (decreases) and due to the buoyancy effects and density gradient the bulk fluid ascends (descends), leading to the formation of vortices and growth of flow instability in a chaotic and turbulent regime. The governing partial differential equations for the Rayleigh–Bénard instability problem are


where and with and being the Rayleigh and Prandtl numbers respectively defined as and , with , , , , , and respectively being the gravity acceleration, thermal expansion coefficient, kinematic viscosity, thermal diffusivity, temperature difference between hot and cold plates, and the separation length between the plates. From a physical perspective quantifies the balance between the gravitational forces and viscous damping, and quantifies the ratio between momentum diffusivity and thermal diffusivity (balance between heat convection and conduction).

We use the Dedalus framework [Burns2019] in order to numerically solve the system of Equations (3a)-(3c) using the spectral method approach. We solve Equations (3a)-(3c) for a duration of in time with a time step size of . In a coordinate system of -axis, and -axis, we consider a plate length and a separation distance , and discretize the domain with and points respectively in the and directions.

For the simulation cases, we consider Rayleigh and Prandtl numbers respectively in the range of and . Upon solving the system of Eqns. (3a)-(3c), we create a high-resolution Rayleigh–Bénard simulation dataset , unless otherwise mentioned, with a spatial resolution of = = , and a temporal resolution of = (upon adaptive time stepping). We consider a normalized domain size of unit length in -direction with a domain aspect ratio of 4, i.e., [], and solve the Rayleigh–Bénard problem for a duration of 50 []. Then we create a low-resolution dataset , by downsampling the high-resolution data in both space and time. We use downsampling factors of and for creating the low-resolution data in the temporal and spatial dimensions respectively.

3.3 Evaluation Metrics

In this work, we use multiple physical metrics, each accounting for different aspects of the flow field, in order to report the evaluation of the model for super-resolving low-resolution data. As the specific metrics of evaluation, we report the Normalized Mean Absolute Error (NMAE) and R2 score for the physical metrics between the ground truth and the predicted high-resolution data. Such physical metrics are listed in the following.

  • Total Kinetic Energy () : the kinetic energy per unit mass associated with the flow is defined as the total kinetic energy and can be expressed as  .

  • Root Mean Square Velocity () : the square root of the scaled total kinetic energy is defined as the root mean square (RMS) velocity as  .

  • Dissipation () : is the rate at which turbulence kinetic energy is converted into thermal internal energy by molecular viscosity and defined as with and

    respectively being the rate of strain tensor and the kinematic viscosity.

  • Taylor Microscale () : is the intermediate length scale at which viscous forces significantly affect the dynamics of turbulent eddies and defined as . Length scales larger than the Taylor microscale (i.e., inertial range) are not strongly affected by viscosity. Below the Taylor microscale (i.e., dissipation range) the turbulent motions are subject to strong viscous forces and kinetic energy is dissipated into heat.

  • Taylor-scale Reynolds () : is defined as the ratio of RMS inertial forces to viscous forces and is expressed as  .

  • Kolmogorov Time () and Length () Scales : Kolmogorov microscales are the smallest scales in turbulent flows where viscosity dominates and the turbulent kinetic energy dissipates into heat. Kolmogorov time and length scale can be respectively expressed as and  .

  • Turbulent Integral Scale () : is a measure of the average spatial extent or coherence of the fluctuations and can be expressed as  .

  • Large Eddy Turnover Time () : is defined as the typical time scale for an eddy of length scale to undergo significant distortion and is also the typical time scale for the transfer of energy from scale to smaller scales, since this distortion is the mechanism for energy transfer and expressed as  .

[.65] [.35]

Figure 3:

Schematic for the training pipeline for model for continuous space-time super-resolution. An input low-resolution grid is fed to the Context Generation Network that creates a Latent Context Grid. A random set of points in the corresponding space-time domain is sampled to query the Latent Context Grid, and the physical output values at these query locations can be continuously decoded using a Continuous Decoding Network, implemented as a Multilayer Perception. Due to the differentiable nature of the MLP, any partial derivatives of the output physical quantities with respect to the input space-time coordinates can be effectively computed via backpropagation, that can be combined with the PDE constraints to produce an Equation Loss. On the other hand, the predicted value can be contrasted with the ground truth value at these locations produced by interpolating the high-resolution ground truth to produce a Prediction Loss. Gradients from the combined losses can be backpropagated to the network for training.

Figure 4:

Schematic for the continuous decoding module for . The continuous decoding network is a Multilayer Perception that inputs the spatio-temporal coordinates of a query point, along with a latent context vector, and is decoded into the required physical channels of interest. Since each query point falls into a cell bounded by 8 neighboring vertices, the query is performed 8 times, each using a different latent context vector and a different relative spatio-temporal coordinate with respect to each vertex. The values are then interpolated using trilinear interpolation to get the final value at the query point.

3.4 Systems, Platforms and Configuration

In order to further illustrate the feasibility of utilizing our proposed framework for large physical systems requiring processing orders of magnitude more computation, here we study the scalability of . We scale our model on a GPU stack on the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC). We use data distributed parallelism with synchronous gradient descent, and test performance up to 16 nodes with a total of GPUs.

In a data-parallel distribution strategy, the model is replicated over all GPUs. At every step, each GPU receives its own random batch of the dataset to calculate the local gradients. Gradients are averaged across all devices with an all_reduce operation. To achieve better scaling efficiency, the communication of one layer’s gradients is overlapped with the backprop

computation of the previous layer. PyTorch

torch.distributed [torch_distributed] package provides communication primitives for multiprocess parallelism with different collective communication backends. We use NVIDIA’s Collective Communications Library (NCCL) [nccl] backend which gave us the best performance when running on GPUs, both within and across multiple nodes. The PyTorch torch.nn.parallel.DistributedDataParallel [torch_ddp] wrapper, which we use for the results in this paper, builds on top of the torhch.distributed package to provide efficient data-parallel distributed training. With this setup, we achieve more than throughput efficiency on 16 Cori GPU nodes, 128 GPUs in total. Cori has 8 V100 (Volta) GPUs per node, the GPUs are interconnected with NVLinks in hybrid cube-mesh topology [cori_gpu_topog]. The nodes are equipped with Mellanox MT27800 (ConnectX-5) EDR InfiniBand network cards.


Figure 5: A schematic for the architecture of the framework. consists of two end-to-end trainable modules: the Context Generation Network, and the Continuous Decoding Network. The Baseline (II) method that we exhaustively compare with share the same 3D U-Net structure in the encoding arm, but instead uses a convolutional decoder for producing the final output. In comparison, the allows for continuous output that can be sampled at arbitrary space-time locations.

In this work, we propose the framework as a novel computational algorithm for constructing the super-resolution solutions to partial differential equations using their low-resolution counterpart solutions.

A schematic for the training pipeline for the model is presented in Fig. 4. The model inputs a low-resolution spatio-temporal grid that can be acquired from a PDE solver, along with query locations that can be any continuous value within the range of the input domain. produces a predicted output vector at each query location.

consists of two subnetworks, namely the Context Generation Network, and the Continuous Decoding Network. The Context Generation Network produces a Latent Context Grid, which is a learned localized representation of the function. The Latent Context Grid contains latent context vectors at each grid vertex, which along with spatio-temporal coordinates , can be concatenated and fed into the Context Generation Network, modeled as a Multilayer Preceptron (MLP), to generate the physical outputs . For training the model, two loss terms, namely the prediction loss and the equation loss are used. We present a more detailed architectural breakdown for our method and our baseline method in Fig. 5. We discuss the details of the Context Generation Network (Sec. 4.1), Continuous Decoding Network (Sec. 4.2), and loss functions (Sec. 4.3) in the following subsections.

100NMAE (R2)
avg. R2
0 0.664 (0.9992) 0.752 (0.9986) 0.701 (0.9991) 0.643 (0.9978) 0.503 (0.9987) 0.884 (0.9968) 0.849 (0.9974) 1.311 (0.9925) 0.548 (0.9994) 0.9977
 0.0125 0.698 (0.9990) 0.666 (0.9987) 0.671 (0.9989) 0.554 (0.9983) 0.408 (0.9991) 0.805 (0.9971) 0.767 (0.9978) 1.048 (0.9946) 0.515 (0.9992) 0.9981
0.025 1.271 (0.9976) 0.742 (0.9988) 1.261 (0.9976) 0.623 (0.9984) 0.682 (0.9984) 0.849 (0.9972) 0.819 (0.9978) 0.821 (0.9966) 0.931 (0.9984) 0.9978
0.05 0.951 (0.9985) 0.732 (0.9987) 0.948 (0.9985) 0.611 (0.9977) 0.473 (0.9987) 0.929 (0.9962) 0.881 (0.9971) 0.811 (0.9964) 0.732 (0.9989) 0.9978
0.1 1.691 (0.9960) 1.091 (0.9970) 1.672 (0.9960) 0.902 (0.9967) 0.852 (0.9975) 1.221 (0.9940) 1.193 (0.9950) 1.269 (0.9931) 1.152 (0.9976) 0.9959
0.2 2.031 (0.9940) 2.093 (0.9926) 2.041 (0.9938) 1.623 (0.9932) 1.659 (0.9923) 1.832 (0.9912) 1.841 (0.9927) 1.632 (0.9895) 1.673 (0.9955) 0.9927
0.4 14.701 (0.7569) 4.033 (0.9652) 14.521 (0.7877) 2.374 (0.9840) 5.506 (0.9117) 4.447 (0.9486) 4.269 (0.9580) 4.352 (0.9365) 10.985 (0.8841) 0.9036
0.8 27.162 (-0.1611) 4.979 (0.9564) 27.132 (0.0733) 6.843 (0.9087) 12.604 (0.4617) 6.645 (0.9010) 6.195 (0.9198) 8.132 (0.8057) 20.661 (0.5900) 0.6062
1.0 39.101 (-2.2671) 5.722 (0.9231) 39.923 (-1.4482) 11.562 (0.6148) 21.043 (-1.5721) 5.094 (0.9271) 5.117 (0.9371) 10.109 (0.6812) 32.252 (0.0209) -0.1314
Table 1: Normalized Mean Absolute Error (NMAE) and R2-score of the flow-based evaluation metrics evaluated for the predicted vs. the ground truth high-resolution data. refers to the coefficient of the Equation loss () in the total loss function (Eqn. 10).

4.1 Context Generation Network

The Context Generation Network is a convolutional encoder that produces a Latent Context Grid from the low-resolution physical input . Denote this network as


where is the set of learnable parameters associated with this network, is the generated context grid, where is the number of latent channels or the dimensions of each latent context vector. It is worth noting that the function can be applied on a small fixed-size sub-window of , since the model is fully-convolutional. By applying it to the input in a fully-convolutional manner, the size of the domain at test time (both spatial and temporal sizes) can be orders of magnitude greater than the dimensions of .

In this work, we implement the Context Generation Network as a 3D variant of the U-Net architecture which was originally proposed by Ronneberger2015. U-Net has successfully been applied for different computer vision tasks that involve dense localized predictions such as image style transfer [Zhang2018], image segmentation [Lian2019, Chen2019], image enhancement [Chen2018], image coloring [Liu2019, Fang2019], and image generation [Esser2018, Zhang]

Different from the original U-Net architecture, we replace the 2D convolutions with 3D counterparts and utilize residue blocks instead of individual convolution layers for better training properties of the network. The U-Net comprises of a contractive part followed by an expansive part. The contractive part is composed of multiple stages of convolutional residue blocks, each followed by a max-pooling layer (of stride 2). Each residue block consists of 3 convolution layers (1x1, 3x3, 1x1) interleaved with batch normalization layers and ReLU activation. The expansive part mirrors the contractive part, replacing max-pooling with nearest neighbor upsampling. In between the layers with similar grid sizes within the contractive and expansive parts, a skip connection concatenates the features from the contractive layer with the features in the expansive layer as the input to the subsequent layers in the contractive part in order to preserve the localized contextual information.

100NMAE (R2)
Model avg. R2
Baseline (I) 69.6360 (-19.7894) 3470.132 (-57.7717) 76.338 (-14) 78.164 (-11096) 75.729 (-4934) 77.410 (-12599) 122.55 (-3147) 137.376 (-0.5190) 109.398 (-3.0092) -3541
Baseline (II) 6.489 (0.9557) 8.769 (0.8967) 6.144 (0.9593) 3.903 (0.9490) 2.489 (0.9711) 5.584 (0.9382) 6.019 (0.94019) 2.902 (0.9597) 5.076 (0.9644) 0.9482
,  0.664 (0.9992) 0.752 (0.9986) 0.701 (0.9991) 0.643 (0.9978) 0.503 (0.9987) 0.884 (0.9968) 0.849 (0.9974) 1.311 (0.9925) 0.548 (0.9994) 0.9977
,  0.698 (0.9990) 0.666 (0.9987) 0.671 (0.9989) 0.554 (0.9983) 0.408 (0.9991) 0.805 (0.9971) 0.767 (0.9978) 1.048 (0.9946) 0.515 (0.9992) 0.9981
Table 2: Comparison between the performance of framework for super-resolving the low-resolution data vs. two baseline models.
100NMAE (R2)
Dataset(s) () avg. R2
1 0.698 (0.9990) 0.666 (0.9987) 0.671 (0.9989) 0.554 (0.9983) 0.408 (0.9991) 0.805 (0.9971) 0.767 (0.9978) 1.048 (0.9946) 0.515 (0.9992) 0.9981
10 0.527 (0.9996) 0.574 (0.9989) 0.517 (0.9996) 0.3930 (0.9991) 0.3433 (0.9994) 0.540 (0.9988) 0.536 (0.9990) 0.677 (0.9985) 0.345 (0.9998) 0.9992
Table 3: For a model that has been respectively trained on 1 dataset and 10 datasets each having a different initial condition, the super-resolution performance evaluation is reported on a dataset with an unseen initial condition.

4.2 Continuous Decoding Network

One unique property of our super-resolution methodology is that the output is continuous instead of discrete. This removes the limitations in output resolution, and additionally, it allows for an effective computation of the gradients of predicted output physical quantities, enabling an easy way of enforcing PDE-based physical constraints.

The continuous decoding network can be implemented using a simple Multilayer Perceptron, see Fig.

4. For each query, denote the spatio-temporal query location to be and the latent context grid to be , where are the spatio-temporal coordinates and the latent context vector for the -th vertex of the grid. Denote as the set of neighboring vertices that bound , where for a (+1) dimensional spatio-temporal grid . Denote the continuous decoding network, implemented as a Multilayer Perception as


where is the set of trainable parameters of the Multilayer Perception network. The query value at with respect to the shared network and the latent context grid can be calculated as


where , is the trilinear interpolation weight with respect to the bounding vertex , is the stencil size corresponding to the discretization grid vertices.

Since the Continuous Decoding Network is implemented as an MLP, arbitrary spatio-temporal derivatives of the output quantities: can be effectively computed via backpropagation through the MLP. Denote the approximation of the derivative operator to be . We combine the partial derivatives to compute the equation loss as the norm of the residue of the governing equations.

4.3 Loss Function

We use a weighted combination of two losses to train our network: the norm of the difference between the predicted physical outputs and the ground truth physical outputs, which we refer to as Prediction Loss, and the norm of the residues of the governing PDEs, which we refer to as Equation Loss.

Denote the set of sample locations within a mini-batch of training samples to be where is the mini-batch of point samples for the -th low-resolution input, is the vector that represents the ground truth physical output quantities. In the case of the Rayleigh-Bénard Convection example in this study, we have where are the pressure, temperature, x-velocity and y-velocity terms respectively. The super-resolution for the learned model queried at conditioning on low-resolution input is


where is the predicted output vector. The prediction loss for a mini-batch can be formulated as


where is the Frobenius -norm of the difference. We use the L1 Norm for computing the prediction loss. The Equation loss for a mini-batch can be formulated as


which is the norm of the PDE equation residue, using the PDE definition in Eqn. 1. Finally, a single loss term for training the network can be represented as a weighted sum of the two loss terms as



is a hyperparameter for weighting the equation loss.

100NMAE (R2)
avg. R2
1.480 (0.9962) 2.444 (0.9872) 0.881 (0.9988) 31.319 (0.3788) 0.422 (0.9998) 1.947 (0.8005) 1.564 (0.9490) 32.070 (0.3881) 0.388 (0.997) 0.8328
1.299 (0.9968) 1.226 (0.9981) 1.247 (0.9972) 0.339 (0.9996) 0.325 (0.9996) 1.022 (0.9978) 1.040 (0.9981) 0.946 (0.9988) 0.735 (0.9988) 0.9983
1.242 (0.9961) 1.703 (0.9923) 1.270 (0.9955) 1.070 (0.99517) 0.931 (0.9967) 1.661 (0.9927) 1.669 (0.9927) 0.794 (0.9970) 0.446 (0.9987) 0.9952
1.336 (0.9965) 2.659 (0.9839) 1.352 (0.9965) 1.742 (0.9873) 1.504 (0.9926) 2.205 (0.9890) 2.275 (0.9885) 4.218 (0.9578) 0.831 (0.9985) 0.9878
2.115 (0.9931) 3.11 (0.9832) 2.084 (0.9933) 5.873 (0.9209) 7.483 (0.8742) 4.607 (0.9478) 4.281 (0.9594) 7.381 (0.9192) 1.077 (0.9978) 0.9543
Table 4: For a model that has been trained on 10 datasets each having a different boundary condition (Rayleigh number) as with , the super-resolution performance evaluation is reported for: a Rayleigh number within the range of boundary conditions of the training sets (i.e., ), Rayleigh numbers slightly below and above the range of boundary conditions of the training sets (i.e., and respectively), and Rayleigh numbers far below and above the range of boundary conditions of the training sets (i.e., and respectively).

t1 n1

Figure 6: Sample tuples of low-resolution input data for , the high-resolution super-resolved data by , and the ground truth high-resolution data for the 4 physical parameters of the Rayleigh–Bénard problem, i.e., as the temperature, pressure, and the and components of the velocity. The contours correspond to the model evaluation presented in Table 4 for = .

5 Experiments

In all the experiments, we use an Adam optimizer with learning rate of , and

regularization for the loss function, 3000 random samples per epoch, and train for 100 epochs.

5.1 Prediction Loss vs. Equation Loss

In this part, we investigate the influence of the importance given to the Equation loss and the prediction loss (see, Section 4.3) on the performance of . As presented in Eqn. 10, the total loss () comprises of the Prediction loss () and the Equation loss () where the Equation loss is weighted with a scaling coefficient . Accordingly, we study the influence of the hyperparameter in the loss function given in Eqn. 10 on the accuracy of . For this purpose, we consider . The values of the evaluation metrics for trained with each of the values are presented in Table 1 for a validation set. indicates a loss function which only depends on the Prediction loss, and the physical aspects (PDE imposed constraints) of the predicted high-resolution data are not accounted for. As presented in Table 1, the best performance on the validation set is achieved for a loss function with the Equation loss weighting coefficient of . Allover this work we refer to this optimum weighting coefficient as and perform the training tasks using the loss function in Eqn. 10 where the Equation loss is weighted with . As presented in Table 1, a model that is trained with , which only focuses on the data (i.e., uses Prediction loss only) and does not account for the physics and the PDE constraints (i.e., ignores Equation loss) underperforms compared to the trained model with . On the other hand models trained with a significant focus on the physical constraints only (i.e., large ) underperform in super-resolving the low-resolution data. In general, a balance between the focus of the on the model and the physical constraints leads to an optimal super-resolution performance. In achieving that, in Eqn. 10, the Prediction loss () captures the global structure of the data and the Equation loss () further guides the model in accurately reconstructing the local structure of the data.

(a) Throughput vs. Num. of GPUs
(b) Loss vs. Num. of Epochs
(c) Loss vs. Wall Time
Figure 7: Scaling performance on up to 128 GPUs on NERSC Cori GPU cluster. Scaling efficiency is close to ideal performance, at 96% with all 128 GPUs. Training convergence is greatly accelerated by scaling across more GPUs. The shaded band in Fig. 6(b)-6(c) denotes the noise interval. Throughput refers to the number of samples per second.

5.2 vs. Baseline Models

In this section, we present a comparison between the performance of our proposed framework for super-resolving low-resolution data against two baselines, namely: a classic trilinear interpolation algorithm (Baseline (I)), and a deep learning based 3D U-Net model (Baseline (II)). Specifically for the 3D U-Net model for Baseline (II), we use the same U-Net backbone as in our framework, with the difference being that while uses the 3D U-Net to generate a latent context grid, the baseline U-Net continues with 3D up-sampling and transpose-convolution operations up to the target high-resolution space (see, Fig. 5). Table 2 presents the comparison of the with the baseline models. As shown in Table 2, the Baseline (I) is a purely interpolation-based approach and fails to reconstruct the high-resolution data and resolve the fine-scale details, leading to large errors in flow-based evaluation metrics that characterize the flow dynamics. The extremely large normalized mean absolute error (NMAE) of the calculated velocity for the fine-scale solution found by Baseline (I) indicates that a merely interpolative scheme cannot accurately reconstruct the fine-scale local dynamics (e.g., the flow velocities). Moreover, the NMAE of the total kinetic energy (, as a global characterizing parameter of the turbulent dynamics, is also very large for the high-resolution solution by Baseline (I). On the other hand, the deep learning based Baseline (II) which utilizes the 3D U-Net model directly maps the low-resolution data to the high-resolution space, achieves better performance compared to Baseline (I). However, as presented in Table 2, our model performs significantly better than Baselines (I) and (II). The specification of in Table 2 refers to the weighting coefficient of the Equation loss component in the loss function in Eqn. 10. In Table 2, indicates that only the prediction loss has been considered for training the model, whereas refers to the optimum value for the weighting coefficient of the Equation loss from the ablation study (see, Table 1, Sec. 5.1).

5.3 Generalizability of

We further evaluate the robustness of for resolution enhancement of low-resolution datasets that have physical initial and boundary conditions different from the datasets the model has been trained on. We refer to such initial and boundary conditions as unseen initial/boundary conditions. In order to investigate the generalizability of on unseen initial and boundary conditions, we study the effect of each condition separately in the following setups.

5.3.1 Unseen Physical Initial Conditions

We investigate the robustness of a trained for enhancing the resolution of a low-resolution unseen data with physical initial conditions different than the training datasets that the has been trained on. Table 3 shows the performance of the on a dataset with unseen initial conditions. The first row of Table 3 shows the values of the evaluation metrics for unseen test data when is trained only on one dataset whereas the second row shows the values of the evaluation metrics when is trained on 10 datasets each with a different initial condition. As can be observed from the results, the performance of on unseen cases can be improved by training on a more diverse set of initial conditions.

5.3.2 Unseen Physical Boundary Conditions

We further investigate the robustness of a trained for super-resolving a low-resolution unseen dataset with physical boundary conditions different from the training datasets that the has been trained on. As the use of more datasets in Sec. 5.3.1 was shown to be effective in improving the performance of for super-resolution, here we use a dataset that comprises of 10 different sets of boundary conditions (i.e. different Rayleigh numbers of , corresponding to Reynolds numbers of up to 10,000). Table 4 shows the performance of such a trained for 5 different test datasets each with a different Rayleigh number boundary condition. In Table 4, ’s performance is evaluated for a Rayleigh number within the range of boundary conditions of the training sets (i.e., ), for Rayleigh numbers slightly below and above the range of boundary conditions of the training sets (i.e., and respectively), and for Rayleigh numbers far below and above the range of boundary conditions of the training sets (i.e., and respectively). As presented in Table 4, achieves a good performance not only on the boundary conditions within the range of Rayleigh number boundary conditions it has been trained on, but also on the unseen boundary conditions far out of the range of Rayleigh number boundary conditions it has been trained on. This illustrates the fact that a trained model can generalize well to unseen boundary conditions.

5.4 Scalability of

Last but not least, we study the scalability of the model to study its applicability to larger problems that require orders-of-magnitude more compute. The scaling results are presented in Fig. 7. Figs. 6(a) demonstrates that an almost-ideal scaling performance can be achieved for the model on up to 128 GPU workers, achieving approximately scaling efficiency. In Fig. 6(b), we show the convergence for the model training loss with respect to the number of epochs. In Fig. 6(c)

, we show the loss convergence with respect to total wall time, where an increasing number of GPU workers leads to a drastic decrease in total training time. As the models achieve similar levels of losses after 100 epochs, yet the training throughput scales almost linearly with the number of GPUs, we see a close to an ideal level of scaling for convergence speed on up to 16 GPUs. A small anomaly regarding the loss curve for 128 GPUs where the loss does not decrease to an ideal level after 100 epochs shows that a very large batch size could lead to diminishing scalability with respect to model convergence. This has been observed in numerous machine learning scaling studies and requires further investigations within the community.

6 Conclusion and Future Work

In this work, for the first time, we presented the , a physics-constrained super-resolution framework, that can produce continuous super-resolution outputs, would allow imposing arbitrary combinations of PDE constraints and could be evaluated on arbitrary-sized spatio-temporal domains due to its fully-convolutional nature. We further demonstrated that can recover a wide range of important physical flow quantities (e.g., including Turbulent Kinetic Energy, Kolmogorov Time and Length Scales, etc.) by accurately super-resolving turbulent flows significantly better than traditional (trilinear interpolation) and deep learning based (3D U-Net) baselines. We further illustrated the scalability of to a large cluster of GPU nodes with a high speed interconnect, demonstrating its applicability to problems that require orders of magnitude more computational resources.

Future work includes exploring the applicability of to other physical applications beyond 2D Rayleigh Bernard convection. One interesting direction to pursue is to explore the use of 4D spatio-temporal convolution operators such as those proposed by choy20194d to further extend this framework to 4D space-time simulations. That will open up a wide range of applications in Turbulence modeling, where statistical priors between the low-resolution simulation and subgrid-scale physics can be learned from D Direct Numerical Simulations (DNS). The scalability of the model on HPC clusters will be critical in learning from such large scale datasets, where single node training will be prohibitively slow. The fully convolutional nature of the framework, along with the demonstrated scalability makes it well poised for such challenges. Moreover, due to the generalizability of our PDE constrained framework, it would be interesting to apply this framework on applications beyond turbulent flows.


This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The research was performed at the Lawrence Berkeley National Laboratory for the U.S. Department of Energy under Contract No. DE340AC02- 05CH11231. K. Kashinath is supported by the Intel Big Data Center at NERSC. K. Azizzadenesheli gratefully acknowledges the financial support of Raytheon and Amazon Web Services. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships. We also acknowledge the Industrial Consortium on Reservoir Simulation Research at Stanford University (SUPRI-B).