1 Introduction
Many physical, biological, and mathematical systems are most successfully modeled by partial differential equations (PDEs). Analytic solutions are rarely available for PDEs of practical importance; thus, computational methods to approximate PDE solutions are critical for tackling many problems in science and engineering. Finite element analysis (FEA) is one of the most widely used techniques for solving PDEs on spatial domains; the continuous problem is discretized and replaced by basis functions on a mesh.
The effectiveness of FEA and related methods is largely governed by the granularity of the discrete approximation, i.e., the fineness of the mesh. Complicated domains can require fine meshes that result in prohibitively expensive computations to solve the PDE. This problem is compounded when the task is one of parameter identification or design optimization. In these situations the PDE must be repeatedly solved in the inner loop of a bilevel optimization problem.
An important domain where this challenge is particularly relevant is in the modeling of mechanical metamaterials. Metamaterials are solids in which heterogeneous microstructure leads to rich spaces of macroscopic behavior. Metamaterials offer the possibility of achieving electromagnetic and/or mechanical properties that are otherwise impossible with homogenous materials and traditional design approaches (Poddubny et al., 2013; Cai and Shalaev, 2010; Bertoldi et al., 2017). In this paper we focus on the class of cellular mechanical metamaterials proposed by Overvelde and Bertoldi (2014), which promise new highperformance materials for soft robotics and other domains (see Sec 3). The simulation of these metamaterials is particularly challenging due to the complex cellular structure and the need to accurately capture smallscale nonlinear elastic behavior. Traditional finite element methods have limited ability to scale to these problems, and automated design of these promising materials demands accurate and efficient approximate solutions to the associated PDE.
We develop a framework which takes advantage of spatially local structure in largescale optimization problems—here the minimization of energy as a function of metamaterial displacements. Given that only a small subset of material displacements are of interest, we “collapse out” the remainder using a learned surrogate. Given a component with substructure defined by local parameters, the surrogate produces an accurate proxy energy in terms of the boundary alone. A single surrogate can be trained, then replicated and composed to predict the energy of a larger solid—which is the sum of energies over subcomponents. This allows solving the PDE in the reduced basis of component boundaries by minimizing the sum of surrogate energies.
Other methods exist for amortizing the solution of PDEs. Some of the most common approaches use neural networks to map from PDE parameters to solutions (Zhu et al., 2019; Nie et al., 2020)
or construct reduced bases via solving eigenvalue problems or interpolating between snapshots
(Berkooz et al., 1993; Chatterjee, 2000). These approaches typically require solving full systems to produce training data. Our framework uses the modular decomposition of energy to train surrogate models on data generated by querying the finite element "expert" on the energy in small components, avoiding performing FEA on large systems which are expensive to solve.The remaining sections of this paper are as follows. Section 2 proposes a framework for collapsedbasis optimization when the objective decomposes as a sum of terms which each depend only on local variables. Section 3 provides a brief introduction to the cellular solids of interest and the PDEs governing their behavior. In Section 4, collapsedbasis optimization is applied to the cellular metamaterial domain, using the architecture described in Section 5
. The specifics of the training procedure and our use of the imitation learning technique
DAgger are explained in Section 6. Section 7 describes the software and hardware used. In Section 8, empirical evaluation demonstrates that composable energy surrogate models are able to solve cellular solid PDEs accurately and efficiently. Limitations and future work are discussed in Section 9.2 Learning to optimize in collapsed bases
Solving PDEs like those that govern metamaterial behavior can be framed as an optimization problem of finding a solution which minimizes an energy subject to constraints. For mechanical metamaterials, is the stored elastic potential energy in the material. We propose a method for amortizing highdimensional optimization problems where the objective has special conditional independence structure, such as that found in solving these PDEs. Consider the general problem of solving
(1) 
Here,
may be a vector in
or may belong to a richer space of functions. Often we are only interested in either a small subset of the vector or the values the function takes on a small subdomain. To reflect this structure, we take the solution space to be the Cartesian product of a space of primary interest and a “nuisance” space. We denote the solutions as concatenations where is the object of interest, and is the object whose value is not of interest to an application. is roughly equivalent to auxiliary variables that often appear in probabilistic models, but that are marginalized away or discarded from the simulation. We can use this decomposition to frame Eq. 1 as a bilevel optimization problem:(2) 
Consider the collapsed objective, . If it is possible to query and its derivatives without ever representing , we may perform collapsed optimization in the reduced basis of , avoiding either performing optimization in the larger basis of (Eq. 1), or performing bilevel optimization (Eq. 2) with an inner loop.
However, is not usually available in closed form. We might consider approximating
via supervised learning. In the general case, this would require solving
for each example we wish to include in our training set. This is the procedure used by many surrogatebased optimization techniques (Queipo et al., 2005; Forrester and Keane, 2009; Shahriari et al., 2015). The high cost of gathering each training example makes this prohibitive when is high dimensional (such that the minimization is difficult) or when is high dimensional (such that fitting a surrogate requires many examples).In some cases, compositional structure in may assist us with efficiently approximating . Many objectives may be represented as a sum:
(3) 
Given this decomposition, and are conditionally independent given ; i.e., if we constrain and to take some values and perform minimization, the resulting or do not vary with the value chosen for . This follows from the partial derivative structure for .
We propose to learn a collapsed objective , which exploits conditional independence structure by representing . This representation as a sum allows us to use as targets for supervision, which may be found more cheaply than performing a full minimization. The learned approximations to may be composed to form an energy function with larger domain.
The language we use to describe this decomposition is intentionally chosen to reflect the conceptual similarity of our framework to collapsed variational inference (Teh et al., 2007) and collapsed Gibbs sampling (Geman and Geman, 1984; Liu, 1994)
. In these procedures, conditional independence is exploited to allow optimization or sampling to proceed in a collapsed space where nuisance random variables are marginalized out of the relevant densities. We exploit similar structure to these techniques, albeit in a deterministic setting. Other approaches to accelerating Eq.
2 which do not exploit (3) or directly model include amortizing the inner optimization by predicting (Kingma and Welling, 2013; Brock et al., 2017), or truncation of the inner loop, either deterministic (Wu et al., 2018; Shaban et al., 2018) or randomized to reduce bias (Tallec and Ollivier, 2017; Beatson and Adams, 2019).The specific optimization procedure we accelerate is the numerical simulation of mechanical materials, where the objective corresponds to a physically meaningful energy, and the conditional independence structure arises from a spatial decomposition of the domain and the spatial locality of the energy density. We believe this spatial decomposition of the domain and energy could be generalized to learn collapsed energies for solving many other PDEs in reduced bases. This collapsedbasis approach may also be applicable to other bilevel optimization problems where the objective decomposes as a sum of local terms.
3 Mechanical metamaterials
Metamaterials are engineered materials with microstructure which results in macroscopic behavior not found in nature. They are often discussed in the context of materials achieving specific electromagnetic phenomena, such as negative refraction index solids and “invisibility cloaks” which electromagnetically conceal an object through engineered distortion of electromagnetic fields (Poddubny et al., 2013; Cai and Shalaev, 2010). However, they also hold great promise in other domains: mechanical metamaterials use substructure to achieve unusual macroscopic behavior such as negative Poisson’s ratio and nonlinear elastic responses; pores and lattices undergo reversible collapse under large deformation, enabling the engineering of complex physical affordances in soft robotics (Bertoldi et al., 2017).
Metamaterials hold promise for modern engineering design, but are challenging to simulate as the microstructure necessitates a very fine finite element mesh, and as the nonlinear response makes them difficult to approximate with a macroscopic material model. Most work on metamaterials has thus relied on engineers and scientists to handdesign materials, rather than numerically optimizing metamaterial substructure to maximize some objective (Ion et al., 2016).
We focus on building surrogate models for the twodimensional cellular solids investigated in Overvelde and Bertoldi (2014). These metamaterials consist of square “cells” of elastomer, each of which has a pore in its center. The pore shapes are defined by parameters and which characterize the pore shape in polar coordinates:
(4) 
is chosen such that the pore covers half the cell’s volume: . Constraints are placed on and to enforce a minimum material thicknesses and ensure that as in Overvelde and Bertoldi (2014).
These pore shapes give rise to complicated nonlinear elastic behavior, including negative Poisson’s ratio and double energy wells (i.e., stored elastic energy which does not increase monotonically with strain). Realizations of this class of materials are shown under axial strain in Figure 1.
The continuum mechanics behavior of these elastomer metamaterials can be captured by a neoHookean energy model (Ogden, 1997). Let , where in physical problems, be a point in the resting undeformed material reference configuration, and be the displacement of this point from reference configuration. The stored energy in a neoHookean solid is
(5) 
where is a scalar energy density over . It is defined for material bulk and shear moduli and as:
(6) 
where is the deformation gradient:
(7) 
The pore shapes influence the structure of these equations by changing the material domain . The behaviour of these solids can be simulated by solving:
(8) 
(9) 
where is known as the first PiolaKirchoff stress, and where Eq. 9 defines a boundary condition. For example, corresponds to a Dirichlet boundary condition; in our case, an externally imposed displacement. corresponds to an external force exerting a pressure on the boundary.
To simulate these metamaterials, the PDE in Eq. 8 is typically solved via finite element analysis. Solving the PDEs arising from large mechanical metamaterial structures is computationally challenging due to fine mesh needed to capture pore geometry and due to the highly nonlinear response induced by buckling under large displacements.
Solving the PDE in Eq. 8 corresponds to finding the which minimizes the stored energy in the material subject to boundary conditions. That is, Eqs. 8 and 9 may be equivalently be expressed in an energy minimization form:
(10) 
(11) 
In the next section, we use this energybased perspective to facilitate learning energy surrogates which allow solving the PDE in a reduced basis of metamaterial cell boundaries.
4 Composable energy surrogates
We apply the idea of learning collapsed objectives to the problem of simulating twodimensional cellular mechanical metamaterial behavior. The material response is determined by the displacement field which minimizes the energy , subject to boundary conditions. We divide into regular square subregions , which we choose to be cells with arrays of pores, and denote the intersection of the subregion boundaries with We let be the restriction of to . We take the quantity of interest to be , the restriction of to , and the nuisance variables to be the restriction of to . The partitioning of is shown in Figure 2.
The total energy in the material decomposes as a sum over the energy within each region:
Let be the restriction of to , noting . We introduce the collapsed energy of a component:
This quantity is the lowest energy achievable by displacements of the interior of the cell , given the boundary conditions specified by on . depends on the shape of the region , i.e., on the geometry of the pores. Rather than each possible pore shape having a unique collapsed energy function, we introduce the pore shape parameter as an argument, replacing with . The macroscopic behavior of the material is fully determined by this single collapsed energy function . Given the true collapsed energy functions, we could accurately simulate material behavior in the reduced basis of the boundaries between each component .^{1}^{1}1So long as forces and constraints are only applied on .
We learn to approximate this collapsed energy function from data. This function may be duplicated and composed to simulate the material in the reduced basis , an approach we term composable energy surrogates (CESs). A single CES is trained to approximate the function by fitting to supervised data , where and may be drawn from any distribution corresponding to anticipated pore shapes and displacements, and the targets are generated by solving the PDE in a small region with geometry defined by and with imposed as a boundary condition. The CES may be duplicated and composed to approximate the total energy of larger cellular metamaterials.
Our ultimate goal is to efficiently solve for a reducedbasis displacement on . Reducedbasis solving via CES may be cast as a highlystructured imitation learning problem. Consider using a gradientbased method to minimize the composed surrogate energy:
(12) 
where is the model’s prediction of , the collapsed energy of a single component. A sufficient condition for finding the correct minimum is for the "action" taken by the surrogate—the derivative of the energy approximation —to match the "action" taken by an expert—the total derivative, —along the optimization trajectory. If so, the surrogate will follow the trajectory of a valid, if nonstandard, bilevel gradientbased procedure for minimizing the energy, corresponding to (2). Given an imperfect surrogate, the error in the final solution could trivially be bounded in terms of the error in approximating by along the trajectory, and the number of gradient steps taken. This observation informs our model, training, and data collection procedures, described in the following sections.
5 Model architecture
Our CESs take the form of a neural architecture, designed to respect known properties of the true potential energy and to maximize usefulness as surrogate energy to be minimized via a gradientbased procedure. The effects of these design choices are quantified via an ablation study in the appendix.
Reducedbasis parameterization. We require a vector representation for the function . We use one cubic spline for each horizontal and vertical displacement function along each face of the square, with evenly spaced control points and “notaknot” boundary conditions. Our vector representation is , formed from the horizontal and the vertical displacement values at each of the control points. Splines on adjacent faces share a control point at the corner. Using control points to parameterize the function along each face requires control points to parameterize a function around a single cell. For all experiments we use control points along each edge, resulting in a vector with 72 entries.
Model structure and loss. Our surrogate energy model is:
where is a neural network with parameters and
removes rigidbody rotation and translation. Our loss function is
which is the sum of losses on the th, st and nd energy derivatives:In the above, and are the gradient and Hessian of the surrogate energy or the groundtruth energy with respect to , and is sampled independently for each training example in a batch.
Invariance to rigid body transforms. The true stored elastic energy is invariant to rigid body transformation of a solid. This invariance may be hard for a neural network to learn exactly from data. We define a module which applies Procrustes analysis, a procedure that finds and applies the rigid body transformation which minimizes the Euclidean distance to a reference, for which we use the rest configuration. This is differentiable and closedform.
Encoding a linear elastic bias. The energy is approximated well by a linear elastic model when at rest: for a stiffness matrix depending on . We scale our net’s outputs by so that it needs only capture a “scalar stiffness” accounting for the geometry of given and for deviation from the linear elastic model.
Parameterizing the logstiffness. The energy of a component is nonnegative, and the ratio of energy to a linear elastic approximation varies over many orders of magnitude. We thus parameterize the log of the scalar stiffness with our neural network rather than the stiffness.
Logstiffness loss. We wish to find neural network parameters which lead to accurate energy predictions for many different orders of magnitude of energy and displacement. Minimizing the loss between predicted and true energies penalizes errors in predicting large energies more than proportional errors predicting small energies. Instead, we take the loss between the predicted logstiffness and the effective groundtruth logstiffness, .
Sobolev training with gradients. When derivatives of a target function are available, training a model to match these derivatives (“Sobolev training”) can aid generalization (Czarnecki et al., 2017). Accuracy of CES’ gradients is crucial to an accurate solution trajectory. We obtain groundtruth gradients cheaply via the adjoint method (Lions, 1971). Given a solution to the PDE in with boundary conditions , the gradient requires one linear system solve, with the same cost as one Newton step while solving the PDE (Mitusch et al., 2019). The spline is a linear map from to in the finite element basis. We can thus efficiently compute . The gradient of our model, , may be computed with one backward pass.
Sobolev training with Hessianvector products. Given a solution and gradient, computing requires linear system solves—one for each entry in . As is much smaller than the number of Newton steps we need to solve the PDE for moderate displacements, we expect the increased fidelity from a 2ndorder approximation of the energy to be worth this added computation. Computing the full Hessian of the surrogate energy, , would require backward passes. Instead we train on Hessianvector products, which require only a single backward pass additional to that required for the gradient.
Cosine distance loss for Sobolev training
. Groundtruth gradient and Hessian values vary over many orders of magnitude, roughly corresponding to lower and higher energy displacements. We wish our model to be robust to outliers and accurate across a range of different conditions. Rather than placing an
loss on the gradient and Hessianvector products as in Czarnecki et al. (2017), we minimize the cosine distance between ground truth and approximate gradients and Hessians, which is naturally bounded in .
clip, trim=.4cm .3cm .7cm .3cm 
clip, trim=.4cm .3cm .7cm .3cm 
clip, trim=.3cm 0cm .7cm .3cm 
clip, trim=.3cm 0cm .7cm .3cm 
Error in solution and in estimated energy vs solution wall clock time for the composed energy surrogate and for finite element models with varying mesh sizes. Top: axial compression. Bottom: axial tension.
6 Data and training
We carry out data collection in two phases. First, we collect training and validation datasets using Hamiltonian Monte Carlo (Duane et al., 1987) to preferentially sample displacements which correspond to lower energy modes. Next, we perform dataset aggregation (Ross et al., 2011) to augment the dataset such that the learned energy model will be accurate on the states it encounters when deployed.
6.1 Solving the PDE
To collect training data, we use the reducedbasis displacement corresponding to a vector of spline coefficients as the boundary condition around a domain representing a pore subdomain, and solve the PDE
using a loadstepped relaxed Newton’s method (Sheng et al., 2002). The relaxed Newton’s method takes the iteration:
(13) 
Above, is the relaxation parameter (analogous to a step size), and is the vector of coefficients defining in the FEA basis. Newton’s method requires an initial guess which is sufficiently close to the true solution (Kythe et al., 2004). Smaller relaxation parameters yield a greater radius of convergence but necessitate more steps to solve the PDE.
The radius of convergence can also be aided by loadstepping: solving the PDE for a sequence of boundary conditions, annealing from an initial boundary condition for which we have a good initial guess (e.g., the rest configuration) to a final boundary condition , using the solution to the previous problem as an initial guess for Newton’s method for the next problem. We find that combining load stepping with a relaxed Newton’s method allows problems to be solved more efficiently than using either alone. Except where specified, we linearly anneal from rest to over load steps and use a relaxation parameter .
6.2 Initial dataset collection
We wish to train on a wide variety of displacement boundary conditions. Solution procedures minimize the energy: thus, lower energy modes will be encountered in the solve. We choose a shaping distribution where the density is the product of a Boltzmann density and a Gaussian density , where
is a macroscopic strain tensor
^{2}^{2}2See the appendix for approximating from . represented by ,is a target macroscopic strain drawn from an i.i.d. Gaussian with standard deviation
, and is set to .Given a solution to the PDE, the logdensity and its displacement may be cheaply computed (the latter via the adjoint method). Making use of these gradients, we sample data points with Hamiltonian Monte Carlo (HMC). After sampling a data point, we compute the corresponding Hessian and save the tuple as a data point.
We initialize each HMC data collector by sampling a macroscopic displacement target and a random pore shape. We do not use loadstepping, instead using the solution for the
used in the previous iteration of HMC’s leapfrog integration as an initial guess for solving the PDE. We randomize HMC hyperparameters for each collector to attempt to minimize the impact of specific settings: see the appendix for exact ranges. We sample
training examples and validation examples altogether. We visualize displacements drawn from this distribution in the appendix.6.3 Data aggregation
The procedure of deploying the surrogate defies standard i.i.d. assumptions in supervised machine learning. That is, when deployed, the surrogate encounters states determined by the energy it defines and on the boundary conditions placed on the composed body. Depending on this energy and on the boundary conditions, the surrogate may encounter states which do not resemble those sampled with HMC.
This problem—that training an agent to predict expert actions with supervised learning leads to trajectories dissimilar to those on which it was trained—is a central concern in the imitation learning literature. A number of solutions exist (Schroecker and Isbell, 2017). One such technique is dataset aggregation, or DAgger (Ross et al., 2011), an extension of earlier approaches SEARN (Daumé et al., 2009) and SMILe (Ross and Bagnell, 2010), which reduces imitation learning or structured prediction to online learning.
In DAgger, a policy is deployed and trajectories are collected. The expert is queried for groundtruth actions on the states in these trajectories. The stateaction pairs are appended to the dataset, and the policy is retrained on this dataset. This process of deployment, querying, appending data, and retraining, is iterated. Under appropriate assumptions, the instantaneous regret of the learned policy vanishes with the number of iterations, i.e., the learned policy will match the expert policy on its own trajectories.
Ross et al. (2011) present DAgger as a method for discrete action spaces. Our problem has a continuous action space: the gradient of the energy in a cell. We do not investigate whether it is possible to generalize DAgger’s regret guarantees to continuous action spaces, but the intuition holds that we wish our model to “imitate” the finite element “expert” on the optimization trajectories the model produces.
We initialize our training data as described in the preceding section. We then apply DAgger by iterating: (i) training the surrogate; (ii) composing surrogates and finding the displacements which minimize the composed energy; (iii) sampling displacements which lie along the surrogate’s solution path, querying the groundtruth energy and energy derivatives using the finite element model, and adding these new data points to the dataset. We visualize displacements generated by DAgger in the appendix.
7 Software, hardware, and systems
We implement the finite element models in dolfin (Logg and Wells, 2010; Logg et al., 2012b), a Python front end to FEniCS (Alnæs et al., 2015; Logg et al., 2012a). To differentiate through finite element solutions, we use the package dolfinadjoint (Mitusch et al., 2019)
. We implement surrogate models in PyTorch
(Paszke et al., 2019).We use Ray (Moritz et al., 2018) to run distributed workloads on Amazon EC2. The initial dataset is collected using 80 M4.xlarge CPU spot workers. While training the surrogate, we use a GPU P3.large driver node to train the model, and 80 M4.xlarge CPU spot worker nodes performing DAgger in parallel. These workers receive updated surrogate model parameters, compose and deploy the surrogate, sample displacements along the solution path, query the finite element model for energy and derivatives, and return data to the driver node. Initial dataset collection and model training with DAgger each take about one day in wallclock time.
8 Empirical evaluation
In this section we demonstrate the ability of Composable Energy Surrogates (CES) to efficiently produce accurate solutions. We compare wallclock computation time and solution accuracy of CES to that of FEA at varying fidelities.
We consider the systems constructed in Overvelde and Bertoldi (2014): structures with an array of pores, corresponding to a assembly of our surrogates, each of which represents a
pore component. We sample pore shapes from a uniform distribution over the valid shapes defined in
Overvelde and Bertoldi (2014). For DAgger, we sample macroscopic vertical axial strain magnitudes from, and choose to apply compression with probability
(as compressive displacements involve more interesting pore collapse) or tension with probability .We compare our composed surrogates to finite element analysis with a range of differentfidelity meshes under axial compression and tension with a macroscopic displacement of , where is the original length of the solid. See the appendix for details of the finite element meshes. Comparison is carried out for seven pore shapes: , corresponding to circular pores, and six sampled from a uniform distribution over pore parameters defined as valid in Overvelde and Bertoldi (2014) via rejection sampling.
For the composed surrogate, we use PyTorch’s LBFGS routine to minimize the energy, with fixed step size and PyTorch’s default criteria for checking convergence. We attempt to solve each finite element model with Newton’s method with load steps and relaxation parameters . We record the time taken for the fastest solve which converges. Under compression these solids exhibit nonlinear behavior, and only the more conservative solves converge. Under tension they behave closer to a linear elastic model, and Newton’s method converges quickly. Measurements are taken on an Amazon AWS M4.xlarge EC2 CPU instance. Using a GPU could provide further acceleration for the composed surrogate.
We measure error in the solution and in the macroscopic energy. The former is , where and are the approximation and groundtruth evaluated at the spline control points. We also measure the relative error, , where is the approximated energy of the approximate solution, and is the groundtruth energy of the groundtruth solution. As the energy function determines macroscopic behavior, accuracy of this energy is a potential indicator of a model’s ability to generalize to larger structures. The highestfidelity finite element model is used for the ground truth and , and has an error of zero on both metrics by definition. Multiple solutions exist as energy is preserved under vertical and horizontal flips, so before comparing a solution to the groundtruth we programatically check these flips and use the flip which minimizes the Euclidean distance.
Figure 3 shows our evaluation. Composed energy surrogates are more efficient than highfidelity FEA simulations yet more accurate than lowfidelity simulations. CES produces solutions with equivalent error to FEA solutions which need an order of magnitude more variables and computation time, or produces solutions with an order of magnitude less error than FEM solutions requiring the same computation. This gap increases to several orders of magnitude when we consider percentage error in the predicted strain energy. We visualize the groundtruth and the CES approximation in Figure 4. See the appendix for visualization of the FEM solutions and of CES for the remaining structures.
clip, trim=3.35cm 1.7cm 2.7cm 1.8cm 
clip, trim=2.2cm 1.7cm 2.3cm 1.8cm 
clip, trim=4.5cm 1.5cm 4.0cm 1.5cm 
clip, trim=4.5cm 1.5cm 4.0cm 1.5cm 
9 Limitations and opportunities
Use of DAgger. We use DAgger to assist CES to match the groundtruth on the states encountered during the solution trajectory. DAgger requires one to specify in advance the conditions under which the surrogate will be deployed, limiting the surrogate’s use for arbitrary downstream tasks. Investigating CES’s ability to generalize to novel deployment conditions–and designing surrogates which can do so effectively–is an important direction for future work.
Error estimation, refinement, and guarantees. Finite element methods enjoy advantageous properties. As one uses a higherresolution mesh or higherorder elements, the solution in the finite element basis approaches the true solution. This provides a straightforward way to estimate the error (compare to the solution in a morerefined basis) and control it (via refinement). CES currently lacks these properties.
Finite element baseline. There is an immense body of work on finite element methods and iterative solvers. We try to provide a representative baseline, but our work should not be taken as a comparison with the “stateoftheart”. We aim to show that composable machinelearned energy surrogates enjoy some advantages over a reasonable baseline, and hold promise for scalable amortization of solving modular PDEs.
Hyperparameters. Both our method and the finite element baseline rely on a multitude of hyperparameters: the size of the spline reduced basis; the size and learning rate of the neural network; the size and degree of the finite element approximation; and the specific variant of Newton’s method to solve the finite element model. We do not attempt a formal, exhaustive search over these parameters.
Known structure. We leave much fruit on the vine in terms of engineering structure known from the true equations into our surrogate. For example, we do not engineer invariance of the energy to flips and rotations of the spline coefficients. One could also use a more expressive normalizer than , e.g. the energy predicted by a coarsegrained linear elastic model, or exploit spatially local correlation, e.g. by using a 1d convolutional network around the boundary of the cell.
10 Conclusion
We present a framework for collapsing optimization problems with a local bilevel structure by learning composable energy surrogates. This framework is applied to amortizing the solution of PDEs corresponding to mechanical metamaterial behavior. Learned composable energy surrogates are more efficient than highfidelity FEA yet more accurate than lowfidelity FEA, occupying a new point on the Pareto frontier. We believe that learning composable energy surrogates could accelerate metamaterial design, as well as design and identification of other systems described by PDEs with parametric modular structure.
11 Acknowledgements
We would like to thank Alexander Niewiarowski for numerous helpful discussions about continuum mechanics and FEA, Ari Seff for help finding a particularly difficult bug in our data pipeline, and Maurizio Chiaramonte for inspiring early conversations about metamaterials and model order reduction. This work was funded by NSF IIS1421780 and by the Princeton Catalysis Initiative.
References
 The FEniCS project version 1.5. Archive of Numerical Software 3 (100). Cited by: §7.
 Efficient optimization of loops and limits with randomized telescoping sums. arXiv preprint arXiv:1905.07006. Cited by: §2.
 The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25 (1), pp. 539–575. Cited by: §1.
 Flexible mechanical metamaterials. Nature Reviews Materials 2 (11), pp. 1–11. Cited by: §1, §3.
 Smash: oneshot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344. Cited by: §2.
 Optical metamaterials. Vol. 10, Springer. Cited by: §1, §3.
 An introduction to the proper orthogonal decomposition. Current Science, pp. 808–817. Cited by: §1.
 Sobolev training for neural networks. In Advances in Neural Information Processing Systems, pp. 4278–4287. Cited by: §5, §5.
 Searchbased structured prediction. Machine Learning 75 (3), pp. 297–325. Cited by: §6.3.
 Hybrid Monte Carlo. Physics Letters B 195 (2), pp. 216–222. Cited by: §6.
 Recent advances in surrogatebased optimization. Progress in Aerospace Sciences 45 (13), pp. 50–79. Cited by: §2.
 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (6), pp. 721–741. Cited by: §2.
 Metamaterial mechanisms. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pp. 529–539. Cited by: §3.
 Autoencoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.
 An introduction to linear and nonlinear finite element analysis: a computational approach. Appl. Mech. Rev. 57 (5), pp. B25–B25. Cited by: §6.1.
 Optimal control of systems governed by partial differential equations. Grundlehren der mathematischen Wissenschaften, SpringerVerlag. External Links: LCCN lc78113638 Cited by: §5.
 The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. Journal of the American Statistical Association 89 (427), pp. 958–966. Cited by: §2.
 Automated solution of differential equations by the finite element method. Springer. External Links: Document Cited by: §7.
 DOLFIN: a C++/Python finite element library. In Automated solution of differential equations by the finite element method, pp. 173–225. Cited by: §7.
 DOLFIN: automated finite element computing. ACM Transactions on Mathematical Software (TOMS) 37 (2), pp. 1–28. Cited by: §7.

Dolfinadjoint 2018.1: automated adjoints for FEniCS and Firedrake.
Journal of Open Source Software
4 (38), pp. 1292. Cited by: §5, §7.  Ray: a distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 561–577. Cited by: §7.

Stress field prediction in cantilevered structures using convolutional neural networks
. Journal of Computing and Information Science in Engineering 20 (1). Cited by: §1.  Nonlinear elastic deformations. Dover Civil and Mechanical Engineering, Dover Publications. External Links: LCCN 97016162 Cited by: §3.
 Relating pore shape to the nonlinear response of periodic elastomeric structures. Journal of the Mechanics and Physics of Solids 64, pp. 351–366. Cited by: §1, §3, §8, §8.

PyTorch: an imperative style, highperformance deep learning library
. In Advances in Neural Information Processing Systems, pp. 8024–8035. Cited by: §7.  Hyperbolic metamaterials. Nature photonics 7 (12), pp. 948. Cited by: §1, §3.
 Surrogatebased analysis and optimization. Progress in Aerospace Sciences 41 (1), pp. 1–28. Cited by: §2.

Efficient reductions for imitation learning.
In
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
, pp. 661–668. Cited by: §6.3.  A reduction of imitation learning and structured prediction to noregret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. Cited by: §6.3, §6.3, §6.
 State aware imitation learning. In Advances in Neural Information Processing Systems, pp. 2911–2920. Cited by: §6.3.
 Truncated backpropagation for bilevel optimization. arXiv preprint arXiv:1810.10667. Cited by: §2.
 Taking the human out of the loop: a review of Bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175. Cited by: §2.
 An automatic Newton–Raphson scheme. The International Journal Geomechanics 2 (4), pp. 471–502. Cited by: §6.1.

Unbiasing truncated backpropagation through time
. arXiv preprint arXiv:1705.08209. Cited by: §2. 
A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation
. In Advances in Neural Information Processing Systems, pp. 1353–1360. Cited by: §2.  Understanding shorthorizon bias in stochastic metaoptimization. arXiv preprint arXiv:1803.02021. Cited by: §2.
 Physicsconstrained deep learning for highdimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81. Cited by: §1.
Comments
There are no comments yet.