Physics-enhanced deep surrogates for PDEs

11/10/2021
by Raphaël Pestourie, et al. (MIT, IBM)

We present a "physics-enhanced deep-surrogate ("PEDS") approach towards developing fast surrogate models for complex physical systems described by partial differential equations (PDEs) and similar models: we show how to combine a low-fidelity "coarse" solver with a neural network that generates "coarsified” inputs, trained end-to-end to globally match the output of an expensive high-fidelity numerical solver. In this way, by incorporating limited physical knowledge in the form of the low-fidelity model, we find that a PEDS surrogate can be trained with at least ∼ 10× less data than a "black-box” neural network for the same accuracy. Asymptotically, PEDS appears to learn with a steeper power law than black-box surrogates, and benefits even further when combined with active learning. We demonstrate feasibility and benefit of the proposed approach by using an example problem in electromagnetic scattering that appears in the design of optical metamaterials.


1 Introduction

In mechanics, optics, thermal transport, physical chemistry, climate models, and many other fields, data-driven surrogate models—such as polynomial fits, radial basis functions, or neural networks—are widely used as an efficient solution to replace repetitive calls to slow numerical solvers [baker2019workshop, benner2015survey, willard2020integrating, hoffmann2019machine, pestourie2018inverse]. However, the reuse benefit of surrogate models comes at a significant cost in training time, where a costly high-fidelity numerical solver must be evaluated many times to provide an adequate training set, and this cost rapidly increases with the number of model parameters (the "curse of dimensionality") [boyd2007chebyshev]. In this paper, we explore one promising route to increasing training-data efficiency: incorporating some knowledge of the underlying physics into the surrogate by training a generative neural network (NN) "end-to-end" with an approximate physics model. We call this hybrid system a "physics-enhanced deep surrogate" (PEDS), and demonstrate multiple-order-of-magnitude improvements in sample and time complexity on a test problem involving optical metamaterials—composite materials whose properties are designed via microstructured geometries [pestourie2020active]. In inverse design of metamaterials, similar geometric components may be re-used thousands or millions of times in a large structure such as an optical metasurface [pestourie2018inverse, pestourie2020assume], making surrogate models especially attractive for accelerating computational design [bayati2021inverse, li2021inverse].

In particular, we present a PEDS architecture for modeling transmission through a microstructured multilayer "metasurface" [pestourie2020active], where the high-fidelity model solves Maxwell's equations: a deep NN is combined with a fast approximate Maxwell solver based on an extremely coarse discretization, as depicted in Fig. 1 (Sec. 2). By itself, the coarse model yields large errors if it is applied directly to a downsampled/coarsified geometry (its fractional error exceeds 100% on our test set; Sec. 2.3), but it qualitatively captures key underlying physics of scattering and resonance. To obtain an accurate surrogate, we apply a deep NN to generate a coarse geometry, adaptively mixed with the downsampled input geometry, which is then used as the input to the approximate solver and trained end-to-end to minimize the overall error. In this way, the NN learns to nonlinearly correct for the errors in the coarse model, but at the same time the coarse model "builds in" some knowledge of the physics and geometry. We compare the result of our PEDS model against a NN-only baseline model (Sec. 4) as well as the earlier "space-mapping" (SM) approach [bakr2000neural, zhu2016novel, feng2019coarse], in which a coarse Maxwell solver is combined with a NN that transforms only a low-dimensional parameterization of the fine geometry to a similar low-dimensional parameterization of the coarse geometry (Sec. 2). We find that PEDS not only lowers the error of the surrogate for a given amount of data, but it actually seems to improve the asymptotic rate of learning (a steeper power law), so that the benefits increase as the accuracy tolerance is lowered (Fig. 2 and Sec. 2). For 3.5% accuracy, PEDS requires several orders of magnitude less data than the competing approaches. We show through an ablation study that adding information from the downsampled structure increases the accuracy by 15% in a low-data regime. Furthermore, we find that PEDS gains significant additional benefits by combining it with active-learning techniques from our earlier work [pestourie2020active], and in fact the benefits of active learning seem to be even greater for PEDS than for competing approaches. Although the resulting PEDS surrogate is more expensive to evaluate than a NN by itself, due to the coarse Maxwell solver, it is still much faster than the high-fidelity Maxwell solver (roughly 100× faster in 2D, and at least 10^4× faster in 3D; Sec. 2.3). Furthermore, since our NN generates a coarsified version of the geometry, this output can be further examined to gain insight into the fundamental physical processes affecting the output.

PEDS should not be confused with physics-informed neural networks (PINNs), which solve the full PDE (imposed pointwise throughout the domain) for the entire PDE solution (not a surrogate for a finite set of outputs like the complex transmission) [karniadakis2021physics, lu2021physics], and which do not employ any pre-existing solver; indeed, current PINNs tend to be slower than conventional "fine" PDE solvers (e.g. based on finite elements) [shin2020convergence], but offer potentially greater flexibility. Universal ordinary differential equations (UODEs) [rackauckas2020universal] also tackle a different problem from PEDS: they identify unknown dynamics in an ODE by replacing the unknown terms with neural networks trained on data. In contrast to DeepONet [lu2021learning] and Fourier neural operators [li2020fourier], PEDS includes a numerical solver layer. Finally, in contrast to error-correction techniques at the output level of the surrogate [lu2020extraction, koziel2006space], PEDS includes the solver in an end-to-end fashion during the training process.

2 Results

Figure 1: Diagram of PEDS. (Main) From the geometry parameterization, the surrogate generates a coarse structure, which is combined with a downsampled version of the geometry (in a pixel-averaging sense) and fed into a coarse solver for Maxwell's equations (symbolized by a cartoon picture of James Clerk Maxwell). (Inset) The training data are generated by solving more costly simulations directly with a fine solver (symbolized by a photograph of James Clerk Maxwell).

2.1 Physical model and solvers

Similarly to Ref. pestourie2020active, our surrogate model predicts the complex transmission of a 2D "meta-atom" unit cell with a parameterized geometry, consisting of ten layers of air holes with independent widths etched in a substrate (with a dielectric constant corresponding to silica), with periodic boundary conditions in the transverse direction, outgoing radiation boundary conditions in the propagation direction, and an incoming normally incident planewave from below. In units of the vacuum wavelength of the incident wave (for the largest wavelength considered below), the hole heights are 0.75 and the interstices are 0.35; the fact that the structure is several wavelengths thick causes the transmission to be a complicated oscillatory function that makes the surrogate training challenging [pestourie2020active]. (A "metasurface" consists of a collection of many of these meta-atoms, designed to perform some optical function such as focusing [li2021inverse]. The full solution for a metasurface can be approximated in terms of the transmissions of the individual periodic unit cells via a local periodic approximation [pestourie2018inverse].) A schematic unit cell with 3 holes is shown in Fig. 1, and an example 10-hole structure from the training set is shown in Fig. 2 (right).

Both the "fine" (high-fidelity) and "coarse" (low-fidelity) solvers in this paper employ finite-difference frequency-domain (FDFD) discretizations of Maxwell's equations [champagne2001fdfd], using perfectly matched layers (PMLs) [sacks1995perfectly] to implement outgoing boundary conditions. FDFD essentially represents the geometry by a grid of discretized "pixels," which is some function of the geometry parameters (the hole widths). (In particular, each pixel's dielectric constant is assigned a sub-pixel average of the infinite-resolution structure, which both increases accuracy [oskooi2009accurate] and makes the discretized geometry piecewise differentiable in the parameters.)
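For intuition, a minimal sketch of pixel-averaging downsampling in Julia follows. It is only an illustrative stand-in for the solvers' actual sub-pixel averaging, assuming a 2D dielectric grid whose dimensions are divisible by the coarsening factor; the function name is hypothetical.

```julia
using Statistics: mean

# Block-average a fine-resolution dielectric grid onto a coarser grid.
# Each coarse pixel takes the mean of the fine pixels it covers, which keeps
# the map from geometry parameters to coarse pixels piecewise differentiable.
function downsample(eps_fine::AbstractMatrix, factor::Int)
    nx, ny = size(eps_fine)
    @assert nx % factor == 0 && ny % factor == 0
    return [mean(@view eps_fine[(i-1)*factor+1:i*factor, (j-1)*factor+1:j*factor])
            for i in 1:nx÷factor, j in 1:ny÷factor]
end
```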

An FDFD resolution of 40 pixels per wavelength is used as our "fine" solver, the source of our training data as indicated in Fig. 1 (inset). This resolution is typical for high-fidelity solvers in electromagnetism, because it is comparable to the manufacturing accuracy in nanophotonics and hence suffices for practical metalens design [li2021inverse, bayati2021inverse] within fabrication uncertainty. (Sharp/narrowband resonances can shift if one refines the resolution further, but the positions and the bandwidths of the resonances are accurate to within a few percent.) Each fine-solver data point required seconds of computation (on a 3.5 GHz 6-Core Intel Xeon E5); an analogous simulation in 3D takes tens of minutes. Our PEDS surrogate (below) uses an FDFD solver at a coarser resolution of 10 pixels per wavelength, which is roughly 100× faster in 2D and far faster still in 3D, but has much worse accuracy; as quantified in Sec. 2, its transmission differs from the fine solver's by more than 100% (a fractional error of 1.24) on our test set.

The upfront cost of building the training dataset is the most time-consuming part of developing a supervised surrogate model. We will show that, by building some approximate "coarse" physics knowledge into the surrogate, PEDS greatly reduces the number of expensive simulations required, especially when combined with active learning.

2.2 PEDS

The PEDS surrogate model is shown schematically in Fig. 1, and is computed in the following stages (“layers”):

  1. Given the parameters of the geometry, a deep generative NN model yields a grid of pixels describing a coarse FDFD geometry; we call this function the generator.

  2. We also compute a coarse-grid downsampling (sub-pixel averaging) of the input geometry.

  3. We take a weighted combination of the NN-generated and downsampled geometries, with a weight w (independent of the geometry parameters) that is another learned parameter.

  4. If there are any additional constraints/symmetries that the physical problem imposes on the geometry, they can be applied as a projection of the combined geometry. (For example, in our metasurface problem we could average with its mirror image to ensure that the generated structure is mirror-symmetric like the exact structure.)

  5. Finally, given the combined coarse geometry, we evaluate the coarse solver to obtain the complex transmission.

In summary, denoting the generator NN by G, the downsampling by D, the symmetry projection by P, and the coarse solver by S_coarse, the PEDS model applied to the geometry parameters p is

    PEDS(p) = S_coarse( P[ w G(p) + (1 − w) D(p) ] ),    (1)

where the mixing weight w is learned along with the NN parameters.
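As a concrete illustration, a minimal Julia sketch of this forward pass follows. The names `generator`, `coarsify`, `project`, and `coarse_solver` are placeholders standing in for the trained NN, the sub-pixel averaging of the input geometry, the symmetry projection, and the coarse FDFD solver; this is a sketch of the composition in Eq. (1), not the released implementation.

```julia
# Hypothetical PEDS forward pass, following Eq. (1): the NN-generated coarse
# geometry is mixed with the downsampled geometry, projected onto any imposed
# symmetries, and passed through the low-fidelity solver.
struct PEDS{G,S}
    generator::G       # NN: geometry parameters -> flattened coarse pixel grid
    coarse_solver::S   # low-fidelity solver: coarse geometry -> complex transmission
    w::Vector{Float64} # one-element learned mixing weight (trainable parameter)
end

function (m::PEDS)(p::AbstractVector)
    gen   = m.generator(p)              # generated coarse geometry
    down  = coarsify(p)                 # sub-pixel-averaged input geometry
    w     = clamp(m.w[1], 0, 1)         # keep the mixing weight in [0, 1]
    mixed = w .* gen .+ (1 - w) .* down
    return m.coarse_solver(project(mixed))  # e.g. mirror-symmetrize, then solve
end
```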

A basic PEDS training strategy could simply minimize the mean-square error over a training set of fine-solver simulations with respect to the parameters of the NN and the weight w. In our case, we employ a more complicated loss function that allows us to easily incorporate active-learning strategies [pestourie2020active]: we minimize the Gaussian negative log-likelihood of a Bayesian model [lakshminarayanan2016simple],

    loss = (1/N) Σ_i [ |t_i − PEDS(p_i)|² / (2σ(p_i)²) + ½ log σ(p_i)² ],    (2)

where the Gaussian likelihood is parameterized by the model parameters, t_i is the fine-solver transmission for the training geometry p_i, and the heteroskedastic standard deviation σ is the output of another NN (trained along with our surrogate model). In practice, rather than examining the entire training set at each training step, we follow the standard "batch" approach [goodfellow2016deep] of sampling a random subset of the training set and minimizing the expected loss with the Adam stochastic gradient-descent algorithm [kingma2014adam] (via the Flux.jl [innes:2018] software in the Julia language).
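A minimal Flux.jl sketch of this heteroskedastic loss, under the assumptions above, might look as follows. The names `surrogate` and `sigma_net` are placeholders, `surrogate(p)` is assumed to return a complex scalar and `sigma_net(p)` a positive scalar, and the small constant added to σ is only for numerical safety.

```julia
using Flux, Statistics

# Gaussian negative log-likelihood of Eq. (2), averaged over a minibatch.
function nll_loss(surrogate, sigma_net, ps, ts)
    mean(map(ps, ts) do p, t
        σ = sigma_net(p) + 1e-6                  # keep the std. dev. strictly positive
        abs2(t - surrogate(p)) / (2σ^2) + log(σ) # log σ is equivalent to ½ log σ²
    end)
end

# One Adam step on a random minibatch (explicit-gradient Flux style):
# opt = Flux.setup(Adam(), model)
# gs  = Flux.gradient(m -> nll_loss(m, sigma_net, pbatch, tbatch), model)
# Flux.update!(opt, model, gs[1])
```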

We train our model to predict the complex transmission for 3 frequencies, which are encoded with a one-hot encoding vector. The input of the model is the concatenation of the 10 widths and the one-hot encoding of the frequency. The coarse solver is a layer of the PEDS model, which is trained end-to-end, so we must backpropagate its gradient through the other layers to obtain the overall sensitivities of the loss function. This is accomplished efficiently using well-known "adjoint" methods [molesky2018inverse], which yield a vector-Jacobian product that is then automatically composed with the other layers using automatic differentiation (AD) (via the Zygote.jl [innes2018don] software).
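One common way to expose such an adjoint solve to Zygote is a custom reverse rule via ChainRules.jl (which Sec. 4 notes is used for custom differentiation rules). The sketch below assumes hypothetical functions `solve_coarse` and `adjoint_solve_coarse`; it illustrates the pattern, not the actual rule in the code.

```julia
using ChainRulesCore

# Custom reverse rule: the pullback performs one adjoint Maxwell solve, yielding
# the vector-Jacobian product of the transmission with respect to every coarse pixel.
function ChainRulesCore.rrule(::typeof(solve_coarse), eps_coarse)
    t = solve_coarse(eps_coarse)                      # forward coarse FDFD solve
    function solve_coarse_pullback(Δt)
        d_eps = adjoint_solve_coarse(eps_coarse, Δt)  # one extra linear solve
        return NoTangent(), d_eps
    end
    return t, solve_coarse_pullback
end
```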

Static and dynamic training

In this paper, we investigated two types of supervised end-to-end training: static training, which uses a training set sampled at random, and dynamic Bayesian training, where the training set is iteratively expanded using an active-learning algorithm [pestourie2020active]. Essentially, active learning attempts to sample training points where the model uncertainty is greatest, thereby reducing the number of costly training points that must be generated (by the fine solver). Our previous work on active learning reached more than an order of magnitude improvement in data efficiency for a black-box NN, and in this paper (Sec. 2.3 below) we also find substantial improvements from active learning for PEDS.

The active-learning algorithm iteratively builds a training set by filtering randomly generated points with respect to a trained measure of uncertainty [pestourie2020active]. The hyperparameters of this algorithm are the number of points the surrogate model is initially trained with, the number of exploration iterations, and the number of candidate points randomly generated at each iteration, of which only the points with the highest uncertainty are explored, i.e., we run the expensive fine solver to obtain the PDE solutions at these points. We trained single surrogates as well as ensembles of 5 independent surrogates. We found that models optimizing the negative log-likelihood perform similarly to models optimizing the mean-squared error in the case of static training. This is not surprising, because the mean-squared error is part of the negative log-likelihood objective.
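Schematically, the filtering loop looks like the following Julia sketch. The names `rand_geometry`, `fine_solve`, `uncertainty`, and `train!` are hypothetical placeholders, and the hyperparameter names are illustrative rather than the paper's notation.

```julia
# Schematic active-learning loop: generate many random candidate geometries,
# keep only those where the current surrogate is most uncertain, label them
# with the expensive fine solver, and retrain.
function active_learning!(model; n_init, n_iters, n_candidates, n_keep)
    X = [rand_geometry() for _ in 1:n_init]
    Y = fine_solve.(X)
    train!(model, X, Y)
    for _ in 1:n_iters
        candidates = [rand_geometry() for _ in 1:n_candidates]
        ranked = sort(candidates; by = x -> -uncertainty(model, x))
        chosen = ranked[1:n_keep]        # most uncertain candidates only
        append!(X, chosen)
        append!(Y, fine_solve.(chosen))
        train!(model, X, Y)
    end
    return model
end
```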

SM baseline

PEDS has similarities with input space mapping (SM) [koziel2008space], especially neural SM [bakr2000neural] and coarse-mesh/fine-mesh SM [feng2019coarse], in which the input of a fine solver is mapped into the input of a coarse solver. However, SM uses the same parameterization (e.g. the widths of the holes) for the fine solver and the coarse solver, whereas PEDS uses a much richer coarse-geometry input (a grid of material values, whose dimensionality is different in the coarse and fine geometries) and can therefore incorporate more geometric and physical inductive biases, such as symmetries and the downsampled structure. For comparison, we trained an input-SM baseline model, which is a combination of neural and coarse-mesh/fine-mesh SM [zhu2016novel, feng2019coarse]. In this model, the NN learns a mapping that creates modified geometry parameters, which are then fed to the coarse solver (see Sec. 4 for implementation details). Since the coarsified parameterized geometry is implemented via sub-pixel averaging as described in Sec. 2.1, this mapping is differentiable, so the gradient can backpropagate all the way to the mapping NN.

2.3 Accuracy and performance

Figure 2: (Left) Fractional error (FE) on the test set: PEDS outperforms the other models significantly when combined with active learning (AL). SM performs poorly compared to PEDS and does not gain much accuracy from ensembling or from active learning. (Right) Geometry of the unit cell of the surrogate model. Each of the 10 air holes has an independent width; the simulation is performed with periodic boundary conditions on the long sides, the incident light comes from the bottom, and the complex transmission is measured at the top of the geometry.

We compared PEDS to a NN-only baseline (Sec. 4) and an SM baseline. In Fig. 2, we show that PEDS clearly outperforms all other models when combined with active learning. In the low-data regime, it is more accurate than the baseline. Asymptotically, in the high-data regime, it converges to the true value with a better power-law exponent: a slope of −0.5 on the log-log plot, in contrast to −0.1 for the baseline. SM does not gain much accuracy from ensembling or from active learning, and is even worse than the baseline NN. (We found that SM can perform comparably to the baseline NN if the coarse-solver resolution is doubled to 20, not shown here, at the expense of more computational effort.)

From a data-efficiency perspective, the PEDS+AL surrogate achieves 20% accuracy on the test set with substantially less data than the baseline NN (with or without AL), and orders of magnitude less data than space mapping (SM) with AL. Only PEDS+AL reaches 3.5% accuracy; if we extrapolate the other curves, it is clear that they would require at least two orders of magnitude more data to achieve similar accuracy.

(Note that the fractional error (FE) of the coarse solver is 1.24 on the test set, which shows that the coarse solver alone is clearly insufficient to achieve reasonable accuracy.)

Timing was compared on a 3.5 GHz 6-Core Intel Xeon E5. Evaluating the baseline (with an ensemble of neural networks) is the fastest; PEDS is about ten times slower, and the fine solver is in turn about a hundred times slower than PEDS. In order to simulate the data set quickly, and without loss of generality, we showed results for PEDS in 2D. Although PEDS is already faster than the fine model by two orders of magnitude, this difference will be even starker for 3D simulations: simulating the equivalent structure in 3D with the fine model takes tens of minutes, so PEDS would represent a speed-up of at least four orders of magnitude. Moreover, as we discuss in Sec. 3, the general PEDS approach can be applied to a wide variety of "coarse physics" models, which can be chosen to have a wide range of performance benefits compared to a high-fidelity model.

In order to understand the effect of mixing the generated structure with a downsampled structure, we performed an ablation study on an AL ensemble model in the low-data regime (1280 training points), with results given in Table 1. The edge case of using only the downsampled structure with the coarse solver, corresponding to w = 0 in (1), performs the worst (1.24 error). Conversely, using the NN generator only, corresponding to w = 1 in (1), is still about 15% worse (0.20 error) than using the adaptive mixing with a learned w. Imposing mirror symmetry, via the projection in (1), did not improve the accuracy of the model in this case (but it is a useful option in general, since symmetry may have a larger effect on the physics in other applications).

Generative model for coarse geometry | FE on test set | PEDS improvement
Downsampled structure only (w = 0)   | 1.24           | 86%
NN generator only (w = 1)            | 0.20           | 15%
PEDS with symmetry                   | 0.18           | 5%
PEDS                                 | 0.17           |
Table 1: Ablation study of PEDS with ensembling and active learning for 1280 training points, showing the impact of mixing generated and coarsified geometries, as well as of imposing symmetry.

2.4 Data analysis of the PEDS model

Figure 3: (Left) First 9 principal components, which explain most of the variation in the complex transmission. (Right) Coordinates of randomly generated structures on the first two principal components. The clusters clearly discriminate the input frequencies (shown in blue, orange, and green). (Insets) Example generated geometries corresponding to the three frequencies of the surrogate model. The generated geometry is smoother for the smallest frequency.

Because the trained PEDS model includes a NN that generates coarse-grained geometries "equivalent" to the input structure, it can be interesting to analyze these geometries and potentially extract physical insights.

Frequency dependence:

The neural network generates structures that are qualitatively different as a function of the input frequency (Fig. 3, right insets). As might be expected on physical grounds (e.g. effective-medium theory [holloway2011characterizing]), the lowest frequency (longest wavelength) corresponds to the smoothest generated structures, because the wavelength sets the minimum relevant lengthscale for wave scattering. To help quantify this, we performed a principal-component analysis (PCA) of the generated geometries for uniform random inputs (including random frequency). We show the first few principal components in Fig. 3 (left). The first and second components explain 67% and 13% of the variation, respectively. We show in Fig. 3 (right) that the coordinates of the first two components are sufficient to classify generated geometries according to the input frequency.
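A minimal sketch of this kind of analysis is given below, assuming the generated coarse geometries have been flattened into the columns of a matrix `G`; MultivariateStats.jl is one possible tool for the decomposition, not necessarily the one used here.

```julia
using MultivariateStats, Statistics

# Columns of G are flattened generated coarse geometries for random inputs.
M = fit(PCA, G; maxoutdim = 9)             # first 9 principal components
explained = principalvars(M) ./ tvar(M)    # fraction of variance per component
coords = predict(M, G)                     # PCA coordinates of each geometry
# e.g. scatter coords[1, :] vs coords[2, :], colored by input frequency,
# to see whether the first two components separate the three frequencies.
```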

Scattering richness:

To explore the effect of additional scattering physics produced by multiple layers of holes, we generated coarse geometries for different numbers of layers (equivalently, fixing the parameters of the "deleted" layers to zero). We then decomposed the resulting generated geometries into the PCA components from above. As we increase the number of layers, the average coordinates of some principal components monotonically increase in magnitude. Since we know that more layers produce more scattering richness, the corresponding principal-component geometries provide some geometrical insight into how scattering richness translates into the generated structure. From our analysis of generated structures for the smallest frequency, the first principal-component geometry clearly contributes to scattering richness, with an average coordinate (across 1000 generated structures) increasing from −11 to 26 as the number of layers goes from 1 to 9.

3 Discussion and outlook

The significance of the PEDS approach is that it can easily be applied to a wide variety of physical systems. It is common across many disciplines to have models at varying levels of fidelity, whether they simply differ in spatial resolution (as in this paper) or in the types of physical processes they incorporate. For example, in fluid mechanics the "coarse" model could be Stokes flow (neglecting inertia), while the "fine" model might be a full Navier–Stokes model (vastly more expensive to simulate) [ferziger2002computational], with the generator NN correcting for the deficiencies of the simpler model. As another example, we are currently investigating a PEDS approach to construct a surrogate for complex Boltzmann-transport models [romano2021openbte], where the "coarse" heat-transport equation can simply be a diffusion equation. PEDS provides a data-driven strategy to connect a vast array of simplified physical models with the accuracy of brute-force numerical solvers, offering both more insight and more data efficiency than physics-independent black-box surrogates.

In addition to applying the PEDS approach to additional physical systems, there are a number of other possible technical refinements. For example, one could easily extend the PEDS NN to take an image of the fine-structure geometry rather than its parameterization, perhaps employing convolutional neural networks to represent a translation-independent "coarsification." This type of surrogate could then be employed for topology optimization, in which "every pixel" is a degree of freedom [molesky2018inverse]. Another interesting direction might be to develop new "coarsified" physics models that admit ultra-fast solvers but are too inaccurate to be used except with PEDS; for instance, mapping Maxwell's equations in 3D onto a simpler (scalar-like) wave equation, or mapping the materials into objects that admit especially efficient solvers (such as impedance surfaces [perez2018sideways] or compact objects for surface-integral-equation methods [jin2015finite]).

4 Methods

Coarse solver and gradient

In the present work, the coarse solver is similar to the fine solver except that it uses a much coarser resolution of 10 (which corresponds to fewer than 5 pixels per wavelength in the worst case), instead of 40 for the fine model. The symmetry action was a simple mirror symmetry, implemented by averaging the geometry with its mirror flip.
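As a trivial illustration, that mirror projection can be written as follows (a sketch operating on a 2D coarse grid; the function name and the mirrored dimension are assumptions):

```julia
# Average the coarse geometry with its mirror image so the result is exactly
# mirror-symmetric, matching the symmetry of the exact structure.
symmetrize(eps_coarse::AbstractMatrix) = (eps_coarse .+ reverse(eps_coarse; dims = 1)) ./ 2
```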

Implementation details of PEDS and baselines

The generator neural network of PEDS has two hidden layers with 256 nodes and relu activation functions, and outputs a flattened version of the coarse geometry of dimension 1100 with a hardtanh activation function. The network that outputs the variance of the model takes the generated coarse geometry as input, has 3 hidden layers with relu activation functions, and outputs a scalar with a relu activation function. The corresponding baseline, a neural-network-only (NN-only) method, was chosen to be as close as possible to the PEDS architecture: it replaces the coarse solver with a fully connected layer and outputs two scalars with a tanh activation function. Note that it does not have the information of the downsampled structure. The mapping neural network of the input-SM implementation has two hidden layers with 256 nodes and relu activation functions, and outputs the coarse geometry parameters of dimension 10 with a hardtanh activation function. The variance network is similar to that of PEDS except that its inputs are the SM output geometry parameters. The batch size was set to 64, and every training went through 10 epochs.
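A Flux.jl sketch of the generator and variance networks under the above description is shown below. The input size of 13 follows from the 10 widths plus the 3-entry one-hot frequency vector of Sec. 2.2; the hidden widths of the variance network and the exact output scaling of the hardtanh layer are assumptions.

```julia
using Flux

# Generator: geometry parameters + one-hot frequency -> flattened coarse geometry.
generator = Chain(
    Dense(13, 256, relu),
    Dense(256, 256, relu),
    Dense(256, 1100, hardtanh),   # 1100 coarse pixels with bounded values
)

# Variance network: generated coarse geometry -> scalar standard deviation.
sigma_net = Chain(
    Dense(1100, 256, relu),
    Dense(256, 256, relu),
    Dense(256, 256, relu),
    Dense(256, 1, relu),
)
```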

Active learning implementation details

The active-learning training [pestourie2020active] used fixed values of the hyperparameters described in Sec. 2.2, and K took successive powers of 2.

Parallelization

In order to accelerate the training of the surrogate model, we parallelized the training at the batch loop level. For ensemble learning, we used 320 computing units which were split into 5 groups (one group per model in the ensemble) of 64 computing units. With a batch size of 64, each worker evaluates the surrogate only once per batch loop (a batch size of a multiple of 64 would work well too).

Code

The code was implemented in the Julia language, version 1.6, using MPI.jl for parallelization with MPI, Flux.jl as the neural-network training framework, ChainRules.jl for custom differentiation rules, and Zygote.jl for other automatic differentiation.

References