Most problems in materials science and engineering require the exploration of linkages between materials processing history, materials structure, and materials properties. Generally referred as Process-Structure-Property linkages kalidindiMatin, they constitute the core materials knowledge needed to drive materials innovation supporting advances in technology mgi_plan_2014. Traditional physics-based numerical methods have long been the standard for solving the governing field equations underpinning these linkages. For mechanical problems, these have included ubiquitous finite element methods morton_mayers_2005; roters2010_fea; mowei_2007_cahn; mcdowell2009 as well as FFT-based spectral methods moulinec1998; de_Geus_2017; michel2001_fft. However, standard solvers can constitute a major performance bottleneck in problems which require repeated solution over varied inputs, such as inverse problems sneider2001; wu_2007_closure; jain2014 and multi-scale materials design Parno_2016; horstemeyer_2009; chen2020.
As an alternative, machine learning (ML) provides tools to approximate unknown linkages in a parametrized fashion, with great success in many domains carleo_2019physicsML. One of the most successful classes of ML models is neural networks Schmidhuber_2015, which have been applied with excellent results both in general applications carleo_2019physicsML; krizhevvsky2012_alexnet; adler2018; He_resnet, and within materials science chen2020; yang2018Homogenization; yang2019; Schmidt2019RecentAA. Unfortunately, ML models tend to act as “black boxes” whose inner workings do not permit the depth of analysis provided by purely physics-based models buhrmester2019analysis. There is a clear demand for approaches that leverage the advantages of both methodologies in order to build reliable, scalable, and interpretable reduced-order models.
One example of such an effort is the Materials Knowledge Systems (MKS) framework brough2016_mks; Latypov2018MaterialsKS. Aimed at multiscale materials design kalidindi_bayes_2019; kalidindi2019_ela, MKS formulates the governing field equations for heterogeneous materials in a Lippmann-Schwinger (L-S) form moulinec1998; lippmannschwinger. Using regression techniques to calibrate the first-order terms of a series expansion to the L-S equation, MKS presents a generalized approach for solving a broad class of scale-bridging problems in materials design and optimization brough2017; priddy2017. However, improving the accuracy of these models requires higher-order L-S terms, which rapidly become more computationally expensive.
As an alternative, we propose an approach inspired by the intersections between iterative spectral methods lebensohn2020 and recent advances in inverse imaging adler2018; putzky2017; we cast the recurrent L-S equation as an optimization problem. Rather than employing a predefined optimization strategy, such as gradient descent or conjugate gradients Stein1952GradientMI, the optimizer is posed as a recurrent collection of convolutional neural networks. After being calibrated to available curated data (e.g., results of FEA simulations, phase field models), these networks act as proximal (or “update”) operators which take in a candidate solution and output an improved version. This iterative methodology emphasizes the underlying physics to permit greater model robustness and deeper analysis.
In this paper, we begin with a brief analysis of the L-S equation and its application to the linear elasticity problem, followed by discussion on the general L-S equation. Using this analysis, we then demonstrate how the L-S equation can be naturally posed as a machine learning problem, and how a neural network can learn proximal operations which minimize a physical quantity (e.g., stress field divergence) within a solution field. By exploring the interplay between the physical and computational interpretations of the L-S equation, we provide insight into a new class of ML models for materials science. We then analyze which aspects of our approach provide the greatest gains by exploring various model configurations, reinforcing the value of an iterative (rather than feed-forward) approach. Finally, we evaluate our methodology on the problem of elastic localization and compare it to a previous machine learning model.
2.1 Linear Elasticity and L-S
Originally developed in the context of quantum mechanics lippmannschwinger, the L-S equation – or class of equations – is an implicit integral form that can represent a fairly general space of physical phenomena. The L-S equation is especially useful in the context of physics of heterogeneous media with spatially-varying physical parameters: stiffness, conductivity, density, etc. We defer discussion of the general Lippmann-Schwinger form until Section 2.2.
As a case study, we consider the problem of computing the internal elastic strain field of a composite material undergoing bulk stresses or strains moulinec1998; lebensohn2020; landi2010. The composite microstructure is assumed to be composed of two or more distinct phases (i.e., thermodynamic material constituents), each exhibiting its own constitutive laws. This problem is herein referred to as elastic localization. An example two-phase microstructure and corresponding elastic strain field are presented in Figure 1.
Physically, elastic localization is governed by a generalized Hooke’s law relating the variation in stress , strain , and stiffness over some volume , along with the demand that the equilibrium stress field be divergence-free:
We consider constant periodic boundary conditions that correspond to the imposed volume-averaged strain, . With these choices, one can model the internal mechanical response of a representative volume element (RVE) of the material. In these models, the RVE often serves as a statistical unit cell of a larger material structure. We note that the problem statement expressed here serves as a simple illustration of the L-S approach, which has been successfully applied to more complex material systems and/or boundary conditions lebensohn2020; brough2017.
Following the work of Kröner kroner1972, this system is converted into an equivalent form (the elastic L-S equation) by splitting the local elastic stiffness into a selected (constant) reference value and a perturbation value
. Substituting these stiffness tensors into Equations (1) and (2) provides a partitioned version of the governing field equations:
Observe that the left-hand side of this equation is a linear operator, denoted as , which acts on :
Clearly is linear in ; therefore, it will have a corresponding Green’s function green1828_essay. Since divergence is uniform in space, we make the simplification . This function represents the system’s response to an impulse inhomogeneity
where denotes the Dirac-delta function.
We also partition the strain field as , where represents a (constant) reference strain determined by boundary conditions (this is also equal to the internal average strain ) and is the corresponding perturbation strain. Both and are constant, so and thus is the homogeneous solution to . For a given inhomogeniety , we can use the Green’s function to solve for the particular solution :
Now we formally treat the right-hand side of Equation (3) as the inhomogeneity to obtain the elastic Lippmann-Schwinger Equation:
Since many governing equations can be converted into a similar form, we refer to their transformed version as an “L-S form”; that is, Equation (9) is the Lippmann-Schwinger equation corresponding to the elastic governing equations. A Lippmann-Schwinger derivation corresponding to a general governing equation is presented in Section 2.2. The reader is referred to Refs. eisler_1969; wikipedia2020_greensfunction for more background on Green’s functions.
Using insights from previous work yang2019; kalidindi_bayes_2019; landi2010, we modify this form to make it amenable to a learning paradigm. First, we integrate-by-parts to shift the derivative onto and absorb it into a new operator, . Using to concisely denote a convolution, we obtain
Next, we define a binary microstructure representation that equals 1 if the material at point is of phase (or material type) , and 0 otherwise. Since each phase has its own stiffness, we can project the stiffness tensor onto each phase: and likewise . Finally, we combine the Green’s function terms with the expression to obtain yet another operator . Applying all of these modifications, the elastic L-S form becomes
The problem of elastic localization has thus been reduced to a single convolutional integral containing the microstructure , candidate strain field , and a physics-determined stencil . Curiously, the first two terms appear solely as an element-wise product between and . This is due to the fact that the strain field is constrained indirectly by the divergence-free condition on . One also observes that all effects of have been absorbed into . This corresponds to the fact that is not unique: infinitely many choices of could result in this equation. Although mathematically equivalent to the original physics (and visually more complicated), Equation (11) provides significant advantages for solution over large heterogeneous volumes. Several solution strategies have been explored to address elastic localization in heterogeneous material systems, resulting from different interpretations of the L-S equation.
Mathematically, is a set of convolutional kernels – one for each phase – encoding the underlying physics of the problem. Given a strain field, it computes the corresponding stresses and peels off the strain perturbation field required to minimize the stress divergence. Using this perspective, many existing models view the L-S equation as a fixed-point equation and solve it via root-finding moulinec1998. The rate of convergence of these methods tends to depend heavily on the variation in material properties and the choice of lebensohn2020. All of these physics-based approaches require a quantitative knowledge of the Green’s function .
From a computer science perspective, the term simply represents the strain field segmented by phase (since is a binary indicator function). Given a collection of true structure-strain pairs, one could either learn , or some approximation, to best conserve the equality. Following this view, several ML-based elastic localization models yang2019; landi2010 have been applied to learn (non-iterative) linkages between and by either approximating a series expansion of , or using a neural network to map to directly, bypassing completely. The disadvantage of these models is that they either truncate or ignore the underlying physics, trying to re-learn it from data. The tension between these two perspectives leaves room for a hybrid method which retains the iterative L-S structure, but uses ML to deduce the internal details of the operator.
2.2 General L-S equation
This section presents a derivation for the general L-S equation from a generic governing equation, drawing on the work of Moulinec moulinec1998 and Kröner kroner1972. This is provided for two reasons: to provide greater intuition and background for the L-S equation, and to motivate its compatibility with ML solvers. First, we write (in conservation form) a governing differential equation controlling a field which varies over space spanning some volume :
Observe that any governing equation can be partitioned into two coupled subequations, each governing their own subsystems. First define an operator which captures all linear (and spatially homogeneous) components of . Now define a second (possibly nonlinear) operator containing the rest of . One obtains the earlier example of elastic localization with the substitutions , , and . Although not explicitly denoted, both and may contain implicit information about the solution domain’s structure (terms such as or ).
Using these operators we can rewrite the original equation as:
This partitions the governing equation into two coupled systems: a linear homogeneous system permitting only “simple” solutions, and a nonlinear, heterogeneous system where the solution is more complicated. Before solving the complete equation, we consider the auxiliary system
for some inhomogeneity . We define the homogeneous solution to this system as , so that . Note that in general, is determined by both and the relevant boundary conditions, and for some problems there may be more than one suitable . For problems with a constant solution field on the boundary, one finds that , i.e., the reference field is the average solution everywhere.
The choice of induces a corresponding perturbation (or “particular solution”) . Because is annihilated by , note that . Since is linear, it will have a Green’s function , which captures the system’s impulse response eisler_1969. Using this we write the particular solution to the auxiliary equation as a convolution between and and reconstruct the complete solution:
Now we return to Equation (13b) and apply the auxiliary approach, this time treating the entire term as our homogeneity (and noting the attached minus sign). Plugging this into the perturbation expression for gives us:
This is the general Lippmann-Schwinger equation; since this derivation holds for any operator , we use the term “L-S form” for a given to describe the result of partitioning and substituting that into Equation (18). Referring back to the example of elastic localization, Equation (9) is the equivalent L-S form for Hooke’s law (Equation (2)). Note that the linear system only enters the L-S equation through the definitions of and . For example, if one used the trivial choice of the identity for , the corresponding Green’s function would just be the Dirac delta function, and Equation (18) would simplify to the original governing equation.
For the L-S form to be advantageous over , must capture a non-trivial amount of the underlying equation. There are three primary factors which make the L-S equation useful. First, it is partitioned: the original system is broken into two coupled physical systems. This makes it similar to a traditional “splitting method” where the linear and nonlinear components are separated, allowing the solution of the nonlinear components to be informed by the homogeneous, linear solution. The coupling between systems means that the L-S equation is also recursive: the presence of in the inhomogeneity term leads to its appearance on both sides of the equation. If one desires to solve the problem analytically, the L-S form is likely no more useful than the original governing equation. However, the implicit structure is very suitable for iterative and optimization-based solvers michel2001_fft; lebensohn2013_crystal. Finally, the L-S equation is convolutional: the integral is actually a convolution between the term and a (possibly-unknown) Green’s function . Roughly speaking, Equation (18) presents the solution field as a balance between the global homogeneous “pull” () and the localized “tug” () of the solution values in a neighborhood near . In situations where is a purely differential operator (such as elastic localization), and with appropriate boundary conditions, Equation (18) can be integrated-by-parts to shift part of onto . This can simplify the integral term so that all of the physics is contained in a single convolutional stencil.
2.3 Neural Networks Background
As one of the most popular ML tools in use, neural networks are a class of parametrized function approximators that can be calibrated to curated data Schmidhuber_2015
. At a high level, an artificial neural network (ANN) operates as an alternating sequence of tunable linear transforms and nonlinear activation functions – mimicking the operation of physical neurons in the brainmccullock1988_neuralnets; rosenblatt1957perceptron. Under certain conditions, a sufficiently large neural network can be shown to act as a universal function approximator Cybenko1989ApproximationBS; Pinkus1999ApproximationTO, motivating their use in a myriad of disciplines.
Two relevant specializations are the convolutional neural network (CNN), which uses convolution with a fixed-width stencil as its transform operation lecun1998_CNNs; krizhevvsky2012_alexnet
, and the recurrent neural network (RNN), which operates on sequential data and considers latent information carried across input iterationslipton2015rnn. These tools can be combined to model the underlying structure of various problems. A well-designed ML model trained on sufficient data can be significantly faster than an analytical equivalent and still provide reasonable accuracy streib_2020. This comes at the expense of interpretability – they return a “black-box” model which is difficult to understand and analyze buhrmester2019analysis. Additionally, the topology of these networks (e.g., number of layers, nodes per layer, activation functions) strongly determines their success Ojha_architecture, and the “best” configuration is problem-dependent and often constructed ad-hoc.
Recently there has been tremendous interest in the application of neural networks to mathematical problems carleo_2019physicsML. Specifically, variations of recurrent CNNs have been explored andrychowicz2016learning to learn Bayesian priors for image denoising putzky2017 or proximal operators for medical imaging adler2018. These image analysis methods pose the problem such that the desired output is obtained via a learned optimization procedure, where the optimizer itself is formulated as a neural network. Surprisingly, these methods often employ very simple network designs, especially compared to deeper and more elaborate structures found in mainstream ML He_resnet; Iandola_squeezenet.
3.1 L-S as learned optimization
We now explore how the perturbation expansion and L-S form allow a governing equation to be interpreted naturally as a machine learning problem. We first define a new operator representing the entire right-hand side of Equation (18). We also use to represent a problem domain’s underlying microstructure, which influences the inhomogeneity . Given a sample microstructure , we obtain the corresponding strain by minimizing the error (or loss) between and over all .
Although may not be linear itself, linear analysis methods provide a useful interpretation: for a given microstructure ,
has a (possibly non-unique) generalized eigenfunction
with unit eigenvalue. Issues regarding the solution’s uniqueness and existence can arise from the original governing equation’s nonlinearity. In the case of the elasticity problem, the governing equation is linear, so a unique solution will exist.
Now the original problem of solving the governing equation has been reduced to that of minimizing the scalar loss given a particular microstructure via a learned optimization strategy. To do this, we define a parametrized learner that performs a sequence of proximal operations which progressively improve the solution field to match the governing physics. Here represents all possible parameters of this learner. The optimal set of parameters is obtained by minimizing the expected error produced by w.r.t.
; in the case of a CNN this represents the optimal network weights, and can be obtained via standard backpropagationrumelhart1988_backprop. Given a microstructure and initial guess , we want to provide a solution field which is approximately the true solution field :
This is accomplished by a sequence of updates
where represents the perturbation proximal operator used at iteration
; given a microstructure and candidate strain field, it outputs the perturbation component corresponding to an improved estimate of the true strain field. A pseudocode and visual representation of the approach developed in this work are presented in Figure2. Ideally, after the model is trained, and have the same eigenfunctions (i.e., ).
The optimization strategy employed by the model is directly determined by the choice of
. To explore what this might look like for the elastic problem, consider an L2 loss function and plug in the elastic LS formula (Equation (11)):
The original fixed-point approach of Moulinec moulinec1998 corresponds to the choice
As an alternative, we can obtain a “steepest descent” formula by taking the gradient Stein1952GradientMI of Equation (24):
where denotes the step size at iteration and
represents the identity operator. By flattening everything into vectors and representing the convolution with an equivalent Toeplitz matrixgray_toeplitz, one can convert the product term in Equation (27) into a single linear operator acting on . The linearity of comes from the fact that the term is actually independent of . Using a constant to collect remaining terms of , the steepest descent rule becomes:
Effectively, the gradient descent rule says that the new perturbation field is obtained by correcting the previous perturbation field () using a set of convolutional operations involving and , then subtracting off a factor of such that the output perturbation field is zero-mean.
A variety of more complicated update rules have been proposed to accelerate the solution of different forms of the L-S equation lebensohn2020. From the examples above one sees that any update rule will involve various terms of , which itself contains both physical properties () as well as derivatives of the Green’s function (); more complicated update rules will simply require higher-order combinations. Therefore, if we hope to learn , it must be parametrized in a way such that it can capture global convolutional stencils, as well as various derivatives thereof. Finally, we note that although most analytical approaches employ the same operator for each iteration, the formulation permits varying across iterations.
3.2 CNNs for Lippmann-Schwinger
Most analytical minimization procedures are (by design) problem-agnostic; this means that they can be expected to work reasonably well for many problems, but may not be optimal for the problem at hand. Rather than using a predefined optimization strategy, we formulate each as a CNN that learns a proximal operator mapping a given microstructure and candidate solution field to an improved perturbation field. The central motivation behind using a CNN proximal operator is that given sufficient parameterization, it can emulate almost any optimization strategy; furthermore, that strategy will be customized to the problem at hand during training. Of course, this comes with the immense caveat that, absent any advances in theoretical ML, a learned optimizer will not have any provable convergence guarantees. The means that even as , our learned model may not converge to the true solution field for a given microstructure. In practice, however, the model can be tuned and trained until it consistently produces solutions within acceptable error tolerances.
We define the model imbued with a CNN as a recurrent localization network (RLN), which performs the following operations during both training and evaluation phases: given a microstructure and initial guess , estimate the true solution field by refining it over iterations. At iteration , the microstructure is combined with candidate solutions from previous iterations and passed through CNN . Note that outputs a perturbation field . The reference solution is already known, so there is no need to learn that. In order to simulate a multi-step solver morton_mayers_2005; ullah_multistep and estimate higher-order derivative terms, considers the last solutions via multiple input channels (rather than just ). One could potentially achieve this property, and perhaps obtain better results, by using GRU or LSTM modules putzky2017 which learn a “latent” state to pass between iterations; however, 3D convolutional implementations for these operations were not part of major ML libraries at the time of writing.
Specifically for elastic localization, and are combined via element-wise multiplication following Equation (11). To enforce computational stability, all strains are normalized by the average strain . Moreover, the output of each network has its average subtracted to enforce the constraint that the perturbation strain field is always zero-mean.
Following prior work adler2018, we define a full RLN as using a different for each iteration (although we use the same CNN structure for each), for a total of distinct networks. The idea behind this is to allow the network capture different properties at each iteration, akin to terms in a series expansion. Having significantly more tunable parameters, this configuration provides the most expressive model. By allowing different operators to be employed at different iterations, this approach also deviates the most from standard analytical optimization procedures.
Alternatively, one may wish to reuse the same intermediate network across iterations (). This approach is denoted as RLN-t since the weights are tied between iterations. This means that the same operator will be employed at each iteration; however, since a time series is fed into each (rather than a single data point), the RLN-t is still able to learn higher-order derivatives. It has the primary advantage of simplicity and efficiency, since it uses a factor of fewer parameters than the full RLN.
Finally, we test the importance of the iterative and recurrent nature by considering a single network, i.e., choosing . We call this a feed-forward localization network (FLN) and use it as a control to quantify the benefits of iteration vs. network design. Although the RLN-t and the FLN have the same number of parameters, the RLN-t uses each parameter times, effectively simulating a deeper network.
For , the proximal CNNs are all calibrated simultaneously via backpropagation. During training a batch of true structure-strain pairs are fed through in its entirety, and all proximal networks are updated simultaneously to minimize a calibration loss function (different from the solution field loss above). Rather than only consider the loss of the last iterate, we use a weighted sum of the loss of individual iterates: for some weights . The goal is to encourage the network to progress between iterations, while also finding the best possible solution. Following the analysis of Andrychowicz et al. andrychowicz2016learning we interpret the use of as a variant of Backpropogation Through Time Mozer1989_backprop_time. The choice of weights could theoretically act as a form of stabilization or even regularization. By requiring that each iteration output a reasonable candidate solution, each proximal operator is constrained to behave somewhat physically, which might help prevent overfitting. However, this means that the network is encouraged to make larger changes in early iterations, potentially reducing its final-iterate accuracy. Conversely, if only the final result is important, then intermediate iterations could explore the loss curve more. However, only updating based on the last iteration will slow the model’s training, and possibly increase the chances of the network weights converging to a poor local minimum. Clearly further experiments are required to explore these hypotheses.
Following similar works putzky2017 we chose the uniform weighting . This induces the network to make larger corrections in early iterations (to avoid carrying costly errors through several iterations) and relatively smaller corrections in later iterations. However, our numerical experiments (Section 4) indicate that perhaps a different weighting might help the network capture fine-scale microstructure features. The appropriate number of iterations depends on the specific problem; for elasticity proved sufficient in both the RLN and the RLN-t, and more iterations yielded little benefit. For the number of previous iterations to track, the value
was chosen. The above choices of hyperparameter were largely heuristic and made for simplicity, but they worked well for elasticity; for a more intensive problem a cross-validation procedure would be a better method for their selection.
3.3 Proximal operator design
Various network topologies for adler2018; putzky2017; milletari2016vnet were tested with one goal in mind: use convolutional kernels to effectively capture local interactions. The architecture that proved most successful for elasticity was based roughly on a V-Net milletari2016vnet and is presented in Figure 3. By combining information across length scales through the use of up- and down-sampling operations, this architecture is able to encode and combine both fine- and coarse-grained features. Notably, even a simple sequence of 3 convolutional layers (similar to that of Adler adler2018) worked reasonably well. As with the hyperparameters above, a cross-validation procedure could aid in picking a superior network topology for a given problem; for this work, we intentionally avoided hyperparameter optimization to emphasize the mathematical analysis and iterative approach.
For this study, most convolutional kernels are 3x3x3, with 1x1x1 kernels used for channel reduction and 2x2x2 for up/down-sampling, summing to 160,000 learnable parameters for each proximal network
. A parametric rectified linear unit (PReLU) activation functionhe2015_prelu
was used was used after the 3x3x3 convolutions. This activation is similar the regular ReLU, but has a non-zero slopefor negative inputs; is trained with all the other network weights during backpropogation. In our experiments this improved the model’s stability during training and accuracy during testing. Effectively, it allows the network to be much more expressive with only a few extra parameters – changing one scales the entire output of a channel, rather than weights representing each voxel in that channel’s filter.
4 Linear Elasticity Experiments
We now present results from the application of RLN-type models to the elastic localization problem in two-phase composites. A synthetic dataset of 20,480 microstructure/strain field pairs was generated and randomly split 40% / 20 % / 40% into disjoint train/validation/test sets, respectively. The train set was used to calibrate , the validation set served to measure its quality while training, and the test set was used to compute error metrics.
The microstructures were generated via PyMKS brough2016_mks by creating a uniform random field, applying a set of Gaussian filters, and thresholding the result. In this microstructure generation process, the filter’s width in each direction controls the characteristic size and shape of the material grains, and the threshold controls the relative volume fraction of each phase.
Combinations of these four parameters were generated via a Latin hypercube sampling procedure to generate 4096 design points for microstructure generation. For each design point, 5 random microstructures were generated in order to produce statistically similar inputs, so that the datasets had some statistical redundancy. A random sample of 8 training microstructures is displayed in Figure 4. By varying these design parameters, one can cover a vast subspace of all possible microstructures, although we note that the space of all microstructures in intractably large – ignoring circular symmetries there are possible different microstructures with
voxels. By selecting our training and testing sets using the same procedure, we demonstrate the RLN’s ability to interpolate between “familiar” microstructure samples, but not to extrapolate to microstructures which lie outside the closure of the training set. It is important to note that these microstructures do not necessarily correspond to thermodynamically favorable structures (i.e. they might appear in nature). However, this is a limitation only in data and not methodology – by using a fully-convolutional structurelong2015fully the RLN can handle any voxelized input, including experimental data.
The elastic strain fields in each microstructure were obtained from previously-built finite element models yang2019. One major simplification is that even though the imposed 3D strain field is actually a second-order tensor (and the elastic stiffness field a fourth-order tensor), the linear nature of the problem allows us to solve for each component individually and superimpose their solutions as needed landi2010. As such we apply an overall displacement to a material RVE along only the -axis and we focus solely on the term. For these simulations, a total strain of 0.1% was applied via periodic boundary conditions.
The relevant material parameters to be prescribed are thus the elastic moduli and Poisson ratios of the two phases. To roughly match most common metals, we chose . With these selections, the contrast ratio has the most dominant role on the final strain field. With the choice , one observes that . In general, as the contrast in stiffnesses increases, the problem becomes harder to solve with both iterative and data-driven methods lebensohn2020. Following prior work yang2019, we tested the RLN on contrast ratios of 10 and 50.
Each RLN model was implemented in PyTorchPaszke_2019pytorch
and trained independently for 60 epochs on an NVIDIA V100 GPUPACE, which took approximately 8 hours. The RLN models were all calibrated using the Adam optimizer, a Mean Square Error loss, and a Cosine annealing learning rate decay loshchilov2017sgdr. The last choice proved especially important since the models demonstrated great sensitivity to learning rate; introducing the cosine decay helped the model converge smoothly and quickly. After training, the epoch with the lowest validation loss was chosen as the “optimal”. In reality, the training and validation losses were tightly coupled as demonstrated in Figure 5. Note that training losses are aggregated during the epoch, whereas validation losses are computed after the epoch is complete, which causes the validation loss to occasionally appear lower.
Following Yang et al. yang2019, the accuracy of the model was evaluated for each instance using the mean absolute strain error (MASE) over all voxels:
presents the MASE distribution across each microstructure/strain pair in the test set. Note that the MASE is an aggregate measure which measures RVE-wide error; the pointwise error variation within a microstructure is explored below. The mean and standard deviation of the MASE distribution are collected in Table1 for the RLN-type models. For comparison we also present results from a recent study yang2019
using a feed-forward deep learning (DL) model to predict the strain fields; note that the DL model was trained and tested on its own dataset prior to this effort.
|Model||MASE (mean std. dev.)|
|Comparison DL model yang2019||3.07%1.22%|
|Comparison DL model||5.71%2.46%|
It is important to note that the DL model had a fundamentally different architecture: it was designed to work on voxel structures (whereas ours was tested on ), and it predicted the strain one voxel at a time (whereas ours predicts strain across the entire microstructure simultaneously). The dataset for the DL model employed large amounts of data augmentation using circular permutations, and contained significantly more input-output pairs. Finally, the DL model employed linear layers (to collapse to a single output); the size of these layers implies that the DL model used substantially more network weights than the RLN. As a result, it is difficult to compare the DL dataset and results with ours. We emphasize that the DL model represents a strong comparison model for ML elastic localization, but that objective ranking of relative performance is difficult. Nevertheless, the RLN architecture is able to produce significantly more accurate strain field estimates on the RLN dataset than the DL architecture produces on the DL dataset. Keeping these caveats in mind, the following analysis explores the difference between RLN configurations.
Looking at the aggregate statistics, the FLN performs worst of all models analyzed; it is vastly outperformed by the RLN-t even though they have the same number of parameters. This is also reflected in the learning curve in Figure 5
: the RLN-t trained faster than the FLN and converged to a more accurate model simply by applying the proximal operator repeatedly across multiple iterations. Intriguingly, the FLN results are somewhat worse than the DL results. This implies that our simple network topology and relative lack of hyperparameter tuning produced a less-powerful model than the DL. However, the RLN-t did much better, producing less error on average (and significantly less variance) than the DL control for both contrast ratios. The disparity in performance indicates that dataset and network topology alone cannot explain the improved MASE relative to the DL model – the iterative methodology produces a more powerful model.
The full RLN is the most accurate model, producing roughly half as much error and variability as the DL model for both contrast ratios. The improvement over the RLN-t has a number of possible origins: the RLN uses a factor of more parameters, and in turn it uses a different operator at each iteration. We note that after training, the full RLN increases the memory overhead but not the prediction time: up to GPU memory details, each iteration has the same computational costs regardless of weight tying. Excluding data I/O overhead, the RLN-t and the full RLN take only seconds to predict strain fields for the entire testing set (8,192 microstructures), or roughly 11 milliseconds per microstructure (c.f. 5 seconds per microstructure for FEA).
Figure 7 presents the worst (by voxel) test-set slice for the RLN model compared to the FEA-generated fields. The RLN appears to perform poorest near extreme strain peaks, especially in a 3-voxel wide cube around these peaks. This is caused by two issues. First, the microstructure is effectively undersampled in these areas. Referring back to Figure 4, one sees that many microstructures have spatial features that are only one or two voxels wide. Furthermore, the output of the RLN is effectively ‘smeared’ near these peaks due to the usage of a 3x3x3 filter. A deeper network, or one using multiple filter sizes, might be able to better capture these features.
Finally, we explore how the predicted strain field evolves across iterations, as well as its difference . This is presented in Figure 8 for the FLN, Figure 9 for the RLN-t, and Figure 10 for the full RLN. The greyscale coloring is the same scale as Figure 7 and has its maximum at the true strain response’s maximum. The redscale coloring represents a negative strain update ( decreasing between iterations).
The FLN appears to handle low-strain areas fairly well, but misses the strain peaks in the center. Note that it does not apply the same operation as the first iteration of the RLN-t; the FLN is trained to make the best possible answer within a single jump. In comparison, the RLN-t does a better job of capturing strain peaks, likely since it can build them up over several iterations. What is less clear is that it fails to capture the central strain peak as well as the full RLN – the differences between the two are evidently rather fine-scale, so we focus our analysis on the full RLN.
After the first iteration the RLN has picked up most of the relative troughs and peaks, and finer tweaks are handled by the later iterations. For example, most of the high-strain regions (greyscale areas) appear to be captured in the first two iterations, whereas low-strain regions (redscale areas) continue to be refined in later iterations.
This is due to the RLN’s ability to learn different, and nonlinear, operators at each iteration – the MSE loss function tends to magnify high-magnitude errors, even if they are very localized (i.e., strain peaks and troughs). The model is therefore encouraged to estimate these outlier areas first (corresponding tohaving much more magnitude). Of course, this puts a lot of weight on the first proximal network to capture the strain outliers correctly. One possible solution would be to adjust the loss function weighting so that early iterations are encouraged to converge gradually towards a good solution, rather than quickly towards a decent one. In other words, by allowing earlier iterations to mispredict strain peaks somewhat, the model may be more likely to escape local minima and actually obtain higher final-iteration accuracy.
This approach differs from traditional finite-element based approaches in that it seeks a solution by learning Green’s functions (and derivatives thereof) of the governing equation, rather than solving the governing equation numerically. This provides an advantage in prediction speed by requiring a rather costly (but one-time) training overhead. The computational complexity of convolving a 3D field containing voxels with a 3D stencil containing voxels is (since each voxel in the first field must be multiplied by each voxel in the stencil). For a fixed network topology,
is a constant; therefore the prediction runtime will increase linearly with the number of microstructure voxels. A spectral method using the Fast Fourier Transform will cost at least, and any numerical methods (finite element or otherwise) employing linear solvers will likely be even more expensive. We reiterate that using GPU parallelization, the RLN requires on average 11 milliseconds to predict the strain field for a given microstructure, compared to several seconds for the finite element solver. This makes it very valuable for inverse problems, where the localization problem must be solved thousands or even millions of times in order to solve a higher-level metaproblem chen2020. Once trained for a specific governing equation (e.g. linear elasticity) and set of material properties (e.g., contrast ratio), the RLN methodology can be applied to any voxelized microstructure. Although we only tested a structure, in principle a model could be trained on one structure size and used on another (possibly with reduced accuracy); this is the subject of ongoing work. Note that the model must be trained anew to predict strains for a different contrast ratio.
In this paper, we describe a new learning-based methodology for addressing Lippmann-Schwinger type physics problems. Embedding recurrent CNNs into a learned optimization procedure provides for a flexible, but interpretable, ML model. The design of proximal networks is informed by problem-specific domain knowledge; that knowledge also provides a physical interpretation of what the model is learning. Furthermore, the partitioned and convolutional structure of this approach acts as a regularizer by enforcing underlying physical properties such as mean field values and spatial invariance. The iterative scheme allows for emulation of a deeper network, vastly increasing model robustness without increasing the parameterization space. If space allows, using a different network for each iteration further improves the model’s expressiveness and accuracy.
When applied to the elasticity localization problem and using our dataset, our model produced much more accurate and interpretable results than previous deep learning models produced on similar datasets, while being much faster than analytical approaches. The CNN architecture used here was designed for simplicity, but could be improved with more advanced techniques such as inception modulesszegedy_2014inception, perceptual losses johnson2016_perceptuallosses, Fourier layers li2020fourier, or variational layers shridhar2019. Moreover, many hyperparameters, such as number of iterations and loss function weighting, can be tuned on a problem-specific basis.
This work was supported by NSF Graduate Research Fellowship DGE-1650044, and SK acknowledges support from NSF 2027105. We used the Hive cluster supported by NSF 1828187 and managed by PACE at Georgia Institute of Technology, USA. The authors thank Anne Hanna, Andreas Robertson, Nic Olsen, and Mady Veith for insightful discussions that helped shape this work.
7 Data and Software Availability
The raw and processed data required to reproduce these findings are available to download from [dataset] https://www.dropbox.com/sh/pma0npf1wr86n9i/AADFc7xWNOe6WilrJQbSHC8Va dropbox_data
. The implementation and training code for each RLN configuration, as well as trained models, are available under an open source license on Github (see Ref.github_code).