Learning Composable Energy Surrogates for PDE Order Reduction

05/13/2020 ∙ by Alex Beatson, et al. ∙ 0

Meta-materials are an important emerging class of engineered materials in which complex macroscopic behaviour–whether electromagnetic, thermal, or mechanical–arises from modular substructure. Simulation and optimization of these materials are computationally challenging, as rich substructures necessitate high-fidelity finite element meshes to solve the governing PDEs. To address this, we leverage parametric modular structure to learn component-level surrogates, enabling cheaper high-fidelity simulation. We use a neural network to model the stored potential energy in a component given boundary conditions. This yields a structured prediction task: macroscopic behavior is determined by the minimizer of the system's total potential energy, which can be approximated by composing these surrogate models. Composable energy surrogates thus permit simulation in the reduced basis of component boundaries. Costly ground-truth simulation of the full structure is avoided, as training data are generated by performing finite element analysis with individual components. Using dataset aggregation to choose training boundary conditions allows us to learn energy surrogates which produce accurate macroscopic behavior when composed, accelerating simulation of parametric meta-materials.



There are no comments yet.


page 14

page 16

page 17

page 18

page 19

page 20

page 21

page 22

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Many physical, biological, and mathematical systems are most successfully modeled by partial differential equations (PDEs). Analytic solutions are rarely available for PDEs of practical importance; thus, computational methods to approximate PDE solutions are critical for tackling many problems in science and engineering. Finite element analysis (FEA) is one of the most widely used techniques for solving PDEs on spatial domains; the continuous problem is discretized and replaced by basis functions on a mesh.

The effectiveness of FEA and related methods is largely governed by the granularity of the discrete approximation, i.e., the fineness of the mesh. Complicated domains can require fine meshes that result in prohibitively expensive computations to solve the PDE. This problem is compounded when the task is one of parameter identification or design optimization. In these situations the PDE must be repeatedly solved in the inner loop of a bi-level optimization problem.

An important domain where this challenge is particularly relevant is in the modeling of mechanical meta-materials. Meta-materials are solids in which heterogeneous microstructure leads to rich spaces of macroscopic behavior. Meta-materials offer the possibility of achieving electromagnetic and/or mechanical properties that are otherwise impossible with homogenous materials and traditional design approaches (Poddubny et al., 2013; Cai and Shalaev, 2010; Bertoldi et al., 2017). In this paper we focus on the class of cellular mechanical meta-materials proposed by Overvelde and Bertoldi (2014), which promise new high-performance materials for soft robotics and other domains (see Sec 3). The simulation of these meta-materials is particularly challenging due to the complex cellular structure and the need to accurately capture small-scale nonlinear elastic behavior. Traditional finite element methods have limited ability to scale to these problems, and automated design of these promising materials demands accurate and efficient approximate solutions to the associated PDE.

We develop a framework which takes advantage of spatially local structure in large-scale optimization problems—here the minimization of energy as a function of meta-material displacements. Given that only a small subset of material displacements are of interest, we “collapse out” the remainder using a learned surrogate. Given a component with substructure defined by local parameters, the surrogate produces an accurate proxy energy in terms of the boundary alone. A single surrogate can be trained, then replicated and composed to predict the energy of a larger solid—which is the sum of energies over sub-components. This allows solving the PDE in the reduced basis of component boundaries by minimizing the sum of surrogate energies.

Other methods exist for amortizing the solution of PDEs. Some of the most common approaches use neural networks to map from PDE parameters to solutions (Zhu et al., 2019; Nie et al., 2020)

or construct reduced bases via solving eigenvalue problems or interpolating between snapshots

(Berkooz et al., 1993; Chatterjee, 2000). These approaches typically require solving full systems to produce training data. Our framework uses the modular decomposition of energy to train surrogate models on data generated by querying the finite element "expert" on the energy in small components, avoiding performing FEA on large systems which are expensive to solve.

The remaining sections of this paper are as follows. Section 2 proposes a framework for collapsed-basis optimization when the objective decomposes as a sum of terms which each depend only on local variables. Section 3 provides a brief introduction to the cellular solids of interest and the PDEs governing their behavior. In Section 4, collapsed-basis optimization is applied to the cellular meta-material domain, using the architecture described in Section 5

. The specifics of the training procedure and our use of the imitation learning technique

DAgger are explained in Section 6. Section 7 describes the software and hardware used. In Section 8, empirical evaluation demonstrates that composable energy surrogate models are able to solve cellular solid PDEs accurately and efficiently. Limitations and future work are discussed in Section 9.

2 Learning to optimize in collapsed bases

Solving PDEs like those that govern meta-material behavior can be framed as an optimization problem of finding a solution  which minimizes an energy  subject to constraints. For mechanical meta-materials, is the stored elastic potential energy in the material. We propose a method for amortizing high-dimensional optimization problems where the objective has special conditional independence structure, such as that found in solving these PDEs. Consider the general problem of solving



may be a vector in 

or may belong to a richer space of functions. Often we are only interested in either a small subset of the vector  or the values the function  takes on a small subdomain. To reflect this structure, we take the solution space to be the Cartesian product of a space of primary interest and a “nuisance” space. We denote the solutions as concatenations  where  is the object of interest, and  is the object whose value is not of interest to an application.  is roughly equivalent to auxiliary variables that often appear in probabilistic models, but that are marginalized away or discarded from the simulation. We can use this decomposition to frame Eq. 1 as a bi-level optimization problem:


Consider the collapsed objective. If it is possible to query  and its derivatives without ever representing , we may perform collapsed optimization in the reduced basis of , avoiding either performing optimization in the larger basis of  (Eq. 1), or performing bi-level optimization (Eq. 2) with an inner loop.

However,  is not usually available in closed form. We might consider approximating 

via supervised learning. In the general case, this would require solving 

for each example  we wish to include in our training set. This is the procedure used by many surrogate-based optimization techniques (Queipo et al., 2005; Forrester and Keane, 2009; Shahriari et al., 2015). The high cost of gathering each training example makes this prohibitive when  is high dimensional (such that the minimization is difficult) or when  is high dimensional (such that fitting a surrogate requires many examples).

In some cases, compositional structure in  may assist us with efficiently approximating . Many objectives may be represented as a sum:


Given this decomposition,  and  are conditionally independent given ; i.e., if we constrain  and  to take some values and perform minimization, the resulting or do not vary with the value chosen for . This follows from the partial derivative structure  for .

We propose to learn a collapsed objective , which exploits conditional independence structure by representing . This representation as a sum allows us to use  as targets for supervision, which may be found more cheaply than performing a full minimization. The learned approximations to  may be composed to form an energy function with larger domain.

The language we use to describe this decomposition is intentionally chosen to reflect the conceptual similarity of our framework to collapsed variational inference (Teh et al., 2007) and collapsed Gibbs sampling (Geman and Geman, 1984; Liu, 1994)

. In these procedures, conditional independence is exploited to allow optimization or sampling to proceed in a collapsed space where nuisance random variables are marginalized out of the relevant densities. We exploit similar structure to these techniques, albeit in a deterministic setting. Other approaches to accelerating Eq. 

2 which do not exploit (3) or directly model  include amortizing the inner optimization by predicting  (Kingma and Welling, 2013; Brock et al., 2017), or truncation of the inner loop, either deterministic (Wu et al., 2018; Shaban et al., 2018) or randomized to reduce bias (Tallec and Ollivier, 2017; Beatson and Adams, 2019).

The specific optimization procedure we accelerate is the numerical simulation of mechanical materials, where the objective corresponds to a physically meaningful energy, and the conditional independence structure arises from a spatial decomposition of the domain and the spatial locality of the energy density. We believe this spatial decomposition of the domain and energy could be generalized to learn collapsed energies for solving many other PDEs in reduced bases. This collapsed-basis approach may also be applicable to other bi-level optimization problems where the objective decomposes as a sum of local terms.

3 Mechanical meta-materials

Meta-materials are engineered materials with microstructure which results in macroscopic behavior not found in nature. They are often discussed in the context of materials achieving specific electromagnetic phenomena, such as negative refraction index solids and “invisibility cloaks” which electromagnetically conceal an object through engineered distortion of electromagnetic fields (Poddubny et al., 2013; Cai and Shalaev, 2010). However, they also hold great promise in other domains: mechanical meta-materials use substructure to achieve unusual macroscopic behavior such as negative Poisson’s ratio and nonlinear elastic responses; pores and lattices undergo reversible collapse under large deformation, enabling the engineering of complex physical affordances in soft robotics (Bertoldi et al., 2017).

Figure 1: Cellular meta-materials. Top: materials at rest. Bottom: materials under axial compression, exhibiting periodic instability that varies with pore shape. The left two structures exhibit a negative Poisson’s ratio, which does not occur in nature.

Meta-materials hold promise for modern engineering design, but are challenging to simulate as the microstructure necessitates a very fine finite element mesh, and as the nonlinear response makes them difficult to approximate with a macroscopic material model. Most work on meta-materials has thus relied on engineers and scientists to hand-design materials, rather than numerically optimizing meta-material substructure to maximize some objective (Ion et al., 2016).

We focus on building surrogate models for the two-dimensional cellular solids investigated in Overvelde and Bertoldi (2014). These meta-materials consist of square “cells” of elastomer, each of which has a pore in its center. The pore shapes are defined by parameters  and  which characterize the pore shape in polar coordinates:


is chosen such that the pore covers half the cell’s volume: . Constraints are placed on  and  to enforce a minimum material thicknesses and ensure that as in Overvelde and Bertoldi (2014).

These pore shapes give rise to complicated nonlinear elastic behavior, including negative Poisson’s ratio and double energy wells (i.e., stored elastic energy which does not increase monotonically with strain). Realizations of this class of materials are shown under axial strain in Figure 1.

The continuum mechanics behavior of these elastomer meta-materials can be captured by a neo-Hookean energy model (Ogden, 1997). Let , where  in physical problems, be a point in the resting undeformed material reference configuration, and  be the displacement of this point from reference configuration. The stored energy in a neo-Hookean solid is


where is a scalar energy density over . It is defined for material bulk and shear moduli  and  as:


where is the deformation gradient:


The pore shapes influence the structure of these equations by changing the material domain . The behaviour of these solids can be simulated by solving:


where  is known as the first Piola-Kirchoff stress, and where Eq. 9 defines a boundary condition. For example,  corresponds to a Dirichlet boundary condition; in our case, an externally imposed displacement. corresponds to an external force exerting a pressure on the boundary.

To simulate these meta-materials, the PDE in Eq. 8 is typically solved via finite element analysis. Solving the PDEs arising from large mechanical meta-material structures is computationally challenging due to fine mesh needed to capture pore geometry and due to the highly nonlinear response induced by buckling under large displacements.

Solving the PDE in Eq. 8 corresponds to finding the  which minimizes the stored energy in the material subject to boundary conditions. That is, Eqs. 8 and 9 may be equivalently be expressed in an energy minimization form:


In the next section, we use this energy-based perspective to facilitate learning energy surrogates which allow solving the PDE in a reduced basis of meta-material cell boundaries.

Figure 2: Partitioning a meta-material domain , shown under axial compression, into components to . Black lines show the skeleton . Blue points locate control points of the splines used to represent the reduced-basis solution .

4 Composable energy surrogates

We apply the idea of learning collapsed objectives to the problem of simulating two-dimensional cellular mechanical meta-material behavior. The material response is determined by the displacement field which minimizes the energy , subject to boundary conditions. We divide into regular square subregions , which we choose to be cells with arrays of pores, and denote the intersection of the subregion boundaries with  We let be the restriction of to . We take the quantity of interest to be , the restriction of to , and the nuisance variables to be the restriction of to . The partitioning of is shown in Figure 2.

The total energy in the material decomposes as a sum over the energy within each region:

Let be the restriction of to , noting . We introduce the collapsed energy of a component:

This quantity is the lowest energy achievable by displacements of the interior of the cell , given the boundary conditions specified by  on . depends on the shape of the region , i.e., on the geometry of the pores. Rather than each possible pore shape having a unique collapsed energy function, we introduce the pore shape parameter  as an argument, replacing  with . The macroscopic behavior of the material is fully determined by this single collapsed energy function . Given the true collapsed energy functions, we could accurately simulate material behavior in the reduced basis of the boundaries between each component .111So long as forces and constraints are only applied on .

We learn to approximate this collapsed energy function from data. This function may be duplicated and composed to simulate the material in the reduced basis , an approach we term composable energy surrogates (CESs). A single CES is trained to approximate the function by fitting to supervised data , where and may be drawn from any distribution corresponding to anticipated pore shapes and displacements, and the targets are generated by solving the PDE in a small region with geometry defined by and with imposed as a boundary condition. The CES may be duplicated and composed to approximate the total energy of larger cellular meta-materials.

Our ultimate goal is to efficiently solve for a reduced-basis displacement  on . Reduced-basis solving via CES may be cast as a highly-structured imitation learning problem. Consider using a gradient-based method to minimize the composed surrogate energy:


where  is the model’s prediction of , the collapsed energy of a single component. A sufficient condition for finding the correct minimum is for the "action" taken by the surrogate—the derivative of the energy approximation —to match the "action" taken by an expert—the total derivative, —along the optimization trajectory. If so, the surrogate will follow the trajectory of a valid, if non-standard, bilevel gradient-based procedure for minimizing the energy, corresponding to (2). Given an imperfect surrogate, the error in the final solution could trivially be bounded in terms of the error in approximating  by along the trajectory, and the number of gradient steps taken. This observation informs our model, training, and data collection procedures, described in the following sections.

5 Model architecture

Our CESs take the form of a neural architecture, designed to respect known properties of the true potential energy and to maximize usefulness as surrogate energy to be minimized via a gradient-based procedure. The effects of these design choices are quantified via an ablation study in the appendix.

Reduced-basis parameterization. We require a vector representation for the function . We use one cubic spline for each horizontal and vertical displacement function along each face of the square, with evenly spaced control points and “not-a-knot” boundary conditions. Our vector representation is , formed from the horizontal and the vertical displacement values at each of the control points. Splines on adjacent faces share a control point at the corner. Using control points to parameterize the function along each face requires  control points to parameterize a function around a single cell. For all experiments we use  control points along each edge, resulting in a vector with 72 entries.

Model structure and loss. Our surrogate energy model is:

where is a neural network with parameters and

removes rigid-body rotation and translation. Our loss function is 

which is the sum of losses on the th, st and nd energy derivatives:

In the above, and are the gradient and Hessian of the surrogate energy or the ground-truth energy with respect to , and is sampled independently for each training example in a batch.

Invariance to rigid body transforms. The true stored elastic energy is invariant to rigid body transformation of a solid. This invariance may be hard for a neural network to learn exactly from data. We define a module which applies Procrustes analysis, a procedure that finds and applies the rigid body transformation which minimizes the Euclidean distance to a reference, for which we use the rest configuration. This is differentiable and closed-form.

Encoding a linear elastic bias. The energy is approximated well by a linear elastic model when at rest:  for a stiffness matrix  depending on . We scale our net’s outputs by  so that it needs only capture a “scalar stiffness”  accounting for the geometry of  given  and for deviation from the linear elastic model.

Parameterizing the log-stiffness. The energy of a component is nonnegative, and the ratio of energy to a linear elastic approximation varies over many orders of magnitude. We thus parameterize the log of the scalar stiffness with our neural network rather than the stiffness.

Log-stiffness loss. We wish to find neural network parameters which lead to accurate energy predictions for many different orders of magnitude of energy and displacement. Minimizing the  loss between predicted and true energies penalizes errors in predicting large energies more than proportional errors predicting small energies. Instead, we take the  loss between the predicted log-stiffness  and the effective ground-truth log-stiffness, .

Sobolev training with gradients. When derivatives of a target function are available, training a model to match these derivatives (“Sobolev training”) can aid generalization (Czarnecki et al., 2017). Accuracy of CES’ gradients is crucial to an accurate solution trajectory. We obtain ground-truth gradients cheaply via the adjoint method (Lions, 1971). Given a solution to the PDE in with boundary conditions , the gradient  requires one linear system solve, with the same cost as one Newton step while solving the PDE (Mitusch et al., 2019). The spline is a linear map from to in the finite element basis. We can thus efficiently compute . The gradient of our model, , may be computed with one backward pass.

Sobolev training with Hessian-vector products. Given a solution and gradient, computing requires linear system solves—one for each entry in . As is much smaller than the number of Newton steps we need to solve the PDE for moderate displacements, we expect the increased fidelity from a 2nd-order approximation of the energy to be worth this added computation. Computing the full Hessian of the surrogate energy, , would require backward passes. Instead we train on Hessian-vector products, which require only a single backward pass additional to that required for the gradient.

Cosine distance loss for Sobolev training

. Ground-truth gradient and Hessian values vary over many orders of magnitude, roughly corresponding to lower and higher energy displacements. We wish our model to be robust to outliers and accurate across a range of different conditions. Rather than placing an 

loss on the gradient and Hessian-vector products as in Czarnecki et al. (2017), we minimize the cosine distance between ground truth and approximate gradients and Hessians, which is naturally bounded in .

clip, trim=.4cm .3cm .7cm .3cm

clip, trim=.4cm .3cm .7cm .3cm

clip, trim=.3cm 0cm .7cm .3cm

clip, trim=.3cm 0cm .7cm .3cm

Figure 3:

Error in solution and in estimated energy vs solution wall clock time for the composed energy surrogate and for finite element models with varying mesh sizes. Top: axial compression. Bottom: axial tension.

6 Data and training

We carry out data collection in two phases. First, we collect training and validation datasets using Hamiltonian Monte Carlo (Duane et al., 1987) to preferentially sample displacements which correspond to lower energy modes. Next, we perform dataset aggregation (Ross et al., 2011) to augment the dataset such that the learned energy model will be accurate on the states it encounters when deployed.

6.1 Solving the PDE

To collect training data, we use the reduced-basis displacement corresponding to a vector of spline coefficients as the boundary condition around a domain representing a -pore subdomain, and solve the PDE

using a load-stepped relaxed Newton’s method (Sheng et al., 2002). The relaxed Newton’s method takes the iteration:


Above, is the relaxation parameter (analogous to a step size), and is the vector of coefficients defining in the FEA basis. Newton’s method requires an initial guess which is sufficiently close to the true solution (Kythe et al., 2004). Smaller relaxation parameters yield a greater radius of convergence but necessitate more steps to solve the PDE.

The radius of convergence can also be aided by load-stepping: solving the PDE for a sequence of boundary conditions, annealing from an initial boundary condition for which we have a good initial guess (e.g., the rest configuration) to a final boundary condition , using the solution to the previous problem as an initial guess for Newton’s method for the next problem. We find that combining load stepping with a relaxed Newton’s method allows problems to be solved more efficiently than using either alone. Except where specified, we linearly anneal from rest to  over  load steps and use a relaxation parameter .

6.2 Initial dataset collection

We wish to train on a wide variety of displacement boundary conditions. Solution procedures minimize the energy: thus, lower energy modes will be encountered in the solve. We choose a shaping distribution where the density is the product of a Boltzmann density  and a Gaussian density , where 

is a macroscopic strain tensor

222See the appendix for approximating from . represented by ,

is a target macroscopic strain drawn from an i.i.d. Gaussian with standard deviation 

, and  is set to .

Given a solution to the PDE, the log-density and its displacement may be cheaply computed (the latter via the adjoint method). Making use of these gradients, we sample data points with Hamiltonian Monte Carlo (HMC). After sampling a data point, we compute the corresponding Hessian and save the tuple as a data point.

We initialize each HMC data collector by sampling a macroscopic displacement target and a random pore shape. We do not use load-stepping, instead using the solution for the

used in the previous iteration of HMC’s leapfrog integration as an initial guess for solving the PDE. We randomize HMC hyperparameters for each collector to attempt to minimize the impact of specific settings: see the appendix for exact ranges. We sample

training examples and validation examples altogether. We visualize displacements drawn from this distribution in the appendix.

6.3 Data aggregation

The procedure of deploying the surrogate defies standard i.i.d. assumptions in supervised machine learning. That is, when deployed, the surrogate encounters states determined by the energy it defines and on the boundary conditions placed on the composed body. Depending on this energy and on the boundary conditions, the surrogate may encounter states which do not resemble those sampled with HMC.

This problem—that training an agent to predict expert actions with supervised learning leads to trajectories dissimilar to those on which it was trained—is a central concern in the imitation learning literature. A number of solutions exist (Schroecker and Isbell, 2017). One such technique is dataset aggregation, or DAgger (Ross et al., 2011), an extension of earlier approaches SEARN (Daumé et al., 2009) and SMILe (Ross and Bagnell, 2010), which reduces imitation learning or structured prediction to online learning.

In DAgger, a policy is deployed and trajectories are collected. The expert is queried for ground-truth actions on the states in these trajectories. The state-action pairs are appended to the dataset, and the policy is retrained on this dataset. This process of deployment, querying, appending data, and retraining, is iterated. Under appropriate assumptions, the instantaneous regret of the learned policy vanishes with the number of iterations, i.e., the learned policy will match the expert policy on its own trajectories.

Ross et al. (2011) present DAgger as a method for discrete action spaces. Our problem has a continuous action space: the gradient of the energy in a cell. We do not investigate whether it is possible to generalize DAgger’s regret guarantees to continuous action spaces, but the intuition holds that we wish our model to “imitate” the finite element “expert” on the optimization trajectories the model produces.

We initialize our training data as described in the preceding section. We then apply DAgger by iterating: (i) training the surrogate; (ii) composing surrogates and finding the displacements which minimize the composed energy; (iii) sampling displacements which lie along the surrogate’s solution path, querying the ground-truth energy and energy derivatives using the finite element model, and adding these new data points to the dataset. We visualize displacements generated by DAgger in the appendix.

7 Software, hardware, and systems

We implement the finite element models in dolfin (Logg and Wells, 2010; Logg et al., 2012b), a Python front end to FEniCS (Alnæs et al., 2015; Logg et al., 2012a). To differentiate through finite element solutions, we use the package dolfin-adjoint (Mitusch et al., 2019)

. We implement surrogate models in PyTorch

(Paszke et al., 2019).

We use Ray (Moritz et al., 2018) to run distributed workloads on Amazon EC2. The initial dataset is collected using 80 M4.xlarge CPU spot workers. While training the surrogate, we use a GPU P3.large driver node to train the model, and 80 M4.xlarge CPU spot worker nodes performing DAgger in parallel. These workers receive updated surrogate model parameters, compose and deploy the surrogate, sample displacements along the solution path, query the finite element model for energy and derivatives, and return data to the driver node. Initial dataset collection and model training with DAgger each take about one day in wall-clock time.

8 Empirical evaluation

In this section we demonstrate the ability of Composable Energy Surrogates (CES) to efficiently produce accurate solutions. We compare wall-clock computation time and solution accuracy of CES to that of FEA at varying fidelities.

We consider the systems constructed in Overvelde and Bertoldi (2014): structures with an array of pores, corresponding to a assembly of our surrogates, each of which represents a

-pore component. We sample pore shapes from a uniform distribution over the valid shapes defined in

Overvelde and Bertoldi (2014). For DAgger, we sample macroscopic vertical axial strain magnitudes from

, and choose to apply compression with probability

(as compressive displacements involve more interesting pore collapse) or tension with probability .

We compare our composed surrogates to finite element analysis with a range of different-fidelity meshes under axial compression and tension with a macroscopic displacement of , where  is the original length of the solid. See the appendix for details of the finite element meshes. Comparison is carried out for seven pore shapes: , corresponding to circular pores, and six sampled from a uniform distribution over pore parameters defined as valid in Overvelde and Bertoldi (2014) via rejection sampling.

For the composed surrogate, we use PyTorch’s L-BFGS routine to minimize the energy, with fixed step size and PyTorch’s default criteria for checking convergence. We attempt to solve each finite element model with Newton’s method with load steps and relaxation parameters . We record the time taken for the fastest solve which converges. Under compression these solids exhibit nonlinear behavior, and only the more conservative solves converge. Under tension they behave closer to a linear elastic model, and Newton’s method converges quickly. Measurements are taken on an Amazon AWS M4.xlarge EC2 CPU instance. Using a GPU could provide further acceleration for the composed surrogate.

We measure error in the solution and in the macroscopic energy. The former is , where and are the approximation and ground-truth evaluated at the spline control points. We also measure the relative error, , where is the approximated energy of the approximate solution, and is the ground-truth energy of the ground-truth solution. As the energy function determines macroscopic behavior, accuracy of this energy is a potential indicator of a model’s ability to generalize to larger structures. The highest-fidelity finite element model is used for the ground truth and , and has an error of zero on both metrics by definition. Multiple solutions exist as energy is preserved under vertical and horizontal flips, so before comparing a solution  to the ground-truth  we programatically check these flips and use the flip which minimizes the Euclidean distance.

Figure 3 shows our evaluation. Composed energy surrogates are more efficient than high-fidelity FEA simulations yet more accurate than low-fidelity simulations. CES produces solutions with equivalent  error to FEA solutions which need an order of magnitude more variables and computation time, or produces solutions with an order of magnitude less  error than FEM solutions requiring the same computation. This gap increases to several orders of magnitude when we consider percentage error in the predicted strain energy. We visualize the ground-truth and the CES approximation in Figure 4. See the appendix for visualization of the FEM solutions and of CES for the remaining structures.

clip, trim=3.35cm 1.7cm 2.7cm 1.8cm

clip, trim=2.2cm 1.7cm 2.3cm 1.8cm

clip, trim=4.5cm 1.5cm 4.0cm 1.5cm

clip, trim=4.5cm 1.5cm 4.0cm 1.5cm

Figure 4: Meta-materials under axial compression (top) and tension (bottom), with the solution found via CES shown in red at the spline control points. The CES-approximated solution approximates the FEA solution to high visual fidelity.

9 Limitations and opportunities

Use of DAgger. We use DAgger to assist CES to match the ground-truth on the states encountered during the solution trajectory. DAgger requires one to specify in advance the conditions under which the surrogate will be deployed, limiting the surrogate’s use for arbitrary downstream tasks. Investigating CES’s ability to generalize to novel deployment conditions–and designing surrogates which can do so effectively–is an important direction for future work.

Error estimation, refinement, and guarantees. Finite element methods enjoy advantageous properties. As one uses a higher-resolution mesh or higher-order elements, the solution in the finite element basis approaches the true solution. This provides a straightforward way to estimate the error (compare to the solution in a more-refined basis) and control it (via refinement). CES currently lacks these properties.

Finite element baseline. There is an immense body of work on finite element methods and iterative solvers. We try to provide a representative baseline, but our work should not be taken as a comparison with the “state-of-the-art”. We aim to show that composable machine-learned energy surrogates enjoy some advantages over a reasonable baseline, and hold promise for scalable amortization of solving modular PDEs.

Hyperparameters. Both our method and the finite element baseline rely on a multitude of hyperparameters: the size of the spline reduced basis; the size and learning rate of the neural network; the size and degree of the finite element approximation; and the specific variant of Newton’s method to solve the finite element model. We do not attempt a formal, exhaustive search over these parameters.

Known structure. We leave much fruit on the vine in terms of engineering structure known from the true equations into our surrogate. For example, we do not engineer invariance of the energy to flips and rotations of the spline coefficients. One could also use a more expressive normalizer than , e.g. the energy predicted by a coarse-grained linear elastic model, or exploit spatially local correlation, e.g. by using a 1-d convolutional network around the boundary of the cell.

10 Conclusion

We present a framework for collapsing optimization problems with a local bilevel structure by learning composable energy surrogates. This framework is applied to amortizing the solution of PDEs corresponding to mechanical meta-material behavior. Learned composable energy surrogates are more efficient than high-fidelity FEA yet more accurate than low-fidelity FEA, occupying a new point on the Pareto frontier. We believe that learning composable energy surrogates could accelerate metamaterial design, as well as design and identification of other systems described by PDEs with parametric modular structure.

11 Acknowledgements

We would like to thank Alexander Niewiarowski for numerous helpful discussions about continuum mechanics and FEA, Ari Seff for help finding a particularly difficult bug in our data pipeline, and Maurizio Chiaramonte for inspiring early conversations about metamaterials and model order reduction. This work was funded by NSF IIS-1421780 and by the Princeton Catalysis Initiative.


  • M. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells (2015) The FEniCS project version 1.5. Archive of Numerical Software 3 (100). Cited by: §7.
  • A. Beatson and R. P. Adams (2019) Efficient optimization of loops and limits with randomized telescoping sums. arXiv preprint arXiv:1905.07006. Cited by: §2.
  • G. Berkooz, P. Holmes, and J. L. Lumley (1993) The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25 (1), pp. 539–575. Cited by: §1.
  • K. Bertoldi, V. Vitelli, J. Christensen, and M. van Hecke (2017) Flexible mechanical metamaterials. Nature Reviews Materials 2 (11), pp. 1–11. Cited by: §1, §3.
  • A. Brock, T. Lim, J. M. Ritchie, and N. Weston (2017) Smash: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344. Cited by: §2.
  • W. Cai and V. M. Shalaev (2010) Optical metamaterials. Vol. 10, Springer. Cited by: §1, §3.
  • A. Chatterjee (2000) An introduction to the proper orthogonal decomposition. Current Science, pp. 808–817. Cited by: §1.
  • W. M. Czarnecki, S. Osindero, M. Jaderberg, G. Swirszcz, and R. Pascanu (2017) Sobolev training for neural networks. In Advances in Neural Information Processing Systems, pp. 4278–4287. Cited by: §5, §5.
  • H. Daumé, J. Langford, and D. Marcu (2009) Search-based structured prediction. Machine Learning 75 (3), pp. 297–325. Cited by: §6.3.
  • S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987) Hybrid Monte Carlo. Physics Letters B 195 (2), pp. 216–222. Cited by: §6.
  • A. I. Forrester and A. J. Keane (2009) Recent advances in surrogate-based optimization. Progress in Aerospace Sciences 45 (1-3), pp. 50–79. Cited by: §2.
  • S. Geman and D. Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (6), pp. 721–741. Cited by: §2.
  • A. Ion, J. Frohnhofen, L. Wall, R. Kovacs, M. Alistar, J. Lindsay, P. Lopes, H. Chen, and P. Baudisch (2016) Metamaterial mechanisms. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pp. 529–539. Cited by: §3.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.
  • P. K. Kythe, D. Wei, and M. Okrouhlik (2004) An introduction to linear and nonlinear finite element analysis: a computational approach. Appl. Mech. Rev. 57 (5), pp. B25–B25. Cited by: §6.1.
  • J.L. Lions (1971) Optimal control of systems governed by partial differential equations. Grundlehren der mathematischen Wissenschaften, Springer-Verlag. External Links: LCCN lc78113638 Cited by: §5.
  • J. S. Liu (1994) The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. Journal of the American Statistical Association 89 (427), pp. 958–966. Cited by: §2.
  • A. Logg, K. Mardal, G. N. Wells, et al. (2012a) Automated solution of differential equations by the finite element method. Springer. External Links: Document Cited by: §7.
  • A. Logg, G. N. Wells, and J. Hake (2012b) DOLFIN: a C++/Python finite element library. In Automated solution of differential equations by the finite element method, pp. 173–225. Cited by: §7.
  • A. Logg and G. N. Wells (2010) DOLFIN: automated finite element computing. ACM Transactions on Mathematical Software (TOMS) 37 (2), pp. 1–28. Cited by: §7.
  • S. Mitusch, S. Funke, and J. Dokken (2019) Dolfin-adjoint 2018.1: automated adjoints for FEniCS and Firedrake.

    Journal of Open Source Software

    4 (38), pp. 1292.
    Cited by: §5, §7.
  • P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, et al. (2018) Ray: a distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 561–577. Cited by: §7.
  • Z. Nie, H. Jiang, and L. B. Kara (2020)

    Stress field prediction in cantilevered structures using convolutional neural networks

    Journal of Computing and Information Science in Engineering 20 (1). Cited by: §1.
  • R.W. Ogden (1997) Non-linear elastic deformations. Dover Civil and Mechanical Engineering, Dover Publications. External Links: LCCN 97016162 Cited by: §3.
  • J. T. Overvelde and K. Bertoldi (2014) Relating pore shape to the non-linear response of periodic elastomeric structures. Journal of the Mechanics and Physics of Solids 64, pp. 351–366. Cited by: §1, §3, §8, §8.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019)

    PyTorch: an imperative style, high-performance deep learning library

    In Advances in Neural Information Processing Systems, pp. 8024–8035. Cited by: §7.
  • A. Poddubny, I. Iorsh, P. Belov, and Y. Kivshar (2013) Hyperbolic metamaterials. Nature photonics 7 (12), pp. 948. Cited by: §1, §3.
  • N. V. Queipo, R. T. Haftka, W. Shyy, T. Goel, R. Vaidyanathan, and P. K. Tucker (2005) Surrogate-based analysis and optimization. Progress in Aerospace Sciences 41 (1), pp. 1–28. Cited by: §2.
  • S. Ross and D. Bagnell (2010) Efficient reductions for imitation learning. In

    Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics

    pp. 661–668. Cited by: §6.3.
  • S. Ross, G. Gordon, and D. Bagnell (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. Cited by: §6.3, §6.3, §6.
  • Y. Schroecker and C. L. Isbell (2017) State aware imitation learning. In Advances in Neural Information Processing Systems, pp. 2911–2920. Cited by: §6.3.
  • A. Shaban, C. Cheng, N. Hatch, and B. Boots (2018) Truncated back-propagation for bilevel optimization. arXiv preprint arXiv:1810.10667. Cited by: §2.
  • B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas (2015) Taking the human out of the loop: a review of Bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175. Cited by: §2.
  • D. Sheng, S. W. Sloan, and A. J. Abbo (2002) An automatic Newton–Raphson scheme. The International Journal Geomechanics 2 (4), pp. 471–502. Cited by: §6.1.
  • C. Tallec and Y. Ollivier (2017)

    Unbiasing truncated backpropagation through time

    arXiv preprint arXiv:1705.08209. Cited by: §2.
  • Y. W. Teh, D. Newman, and M. Welling (2007)

    A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation

    In Advances in Neural Information Processing Systems, pp. 1353–1360. Cited by: §2.
  • Y. Wu, M. Ren, R. Liao, and R. Grosse (2018) Understanding short-horizon bias in stochastic meta-optimization. arXiv preprint arXiv:1803.02021. Cited by: §2.
  • Y. Zhu, N. Zabaras, P. Koutsourelakis, and P. Perdikaris (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81. Cited by: §1.