In the past decade, multi-scale simulation methods have demonstrated significant advantages for computational mechanics due to their ability to account for microscopic heterogeneities inside a material. The macroscopic material properties are strongly affected by the morphology and evolution of the microstructure, especially under extreme events with large material deformations and geometric nonlinearities. While direct simulations of large-scale heterogeneous structures are extremely expensive and uncommon in industrial applications, representative volume element (RVE) techniques based on homogenization theory [hill1965self, feyel2000a, geers2010multi] are a class of hierarchical multi-scale simulation methods that offer a numerical constitutive closure relationship at each macroscopic point.
Direct numerical simulation (DNS) approaches, such as the finite element method and fast Fourier transform (FFT)-based methods [moulinec1998a, de2017finite], offer high accuracy at the expense of high computational costs. A myriad of model reduction techniques have been introduced to predict the effective mechanical properties in a manner that balances computational cost and accuracy. Analytical micromechanics methods [eshelby1957determination, hashin1963variational, hill1965self, mura1987micromechanics, liu2015statistical, liu2016extended] can be regarded as one type of reduced-order model with high efficiency. However, because detailed physics is lost at the microscale, they normally lose accuracy or require extensive model calibration when irregular complex morphologies, nonlinear history-dependent properties, or large deformations are present. For heterogeneous hyperelastic materials, manifold-learning methods such as isomap have been used for nonlinear dimensionality reduction of microscopic strain fields [bhattacharjee2016nonlinear]. Model reduction of history-dependent plastic materials is more complex and challenging. Two examples are non-uniform transformation field analysis (NTFA) [michel2003a, michel2016model] and variants of principal component analysis [jolliffe2002a] or proper orthogonal decomposition (POD) [yvonnet2007a, kerfriden2013a, oliver2017reduced]. However, they usually require extensive a priori simulations in the offline stage. Recently, the self-consistent clustering analysis (SCA) [liu2016self, liu2018microstructural] has demonstrated a powerful trade-off between accuracy and efficiency in predicting small-strain elasto-plastic behavior through clustering techniques, and it only requires linear elastic simulations in the offline stage.
Meanwhile, advanced machine learning models (e.g., artificial neural networks and deep learning) have achieved great success in broad areas of computer engineering, such as computer vision, gaming, and natural language processing [hinton2012deep, lecun2015deep, Goodfellow-et-al-2016, silver2017mastering]. Although these techniques are able to construct models for complex input-output relations, their applications to the mechanics of materials are still limited. Methods like tensor-product approximation and neural networks have been employed to directly construct the overall strain energy density surface of the RVE [le2015computational, yvonnet2013computational, bessa2017, ibanez2017data]. Artificial neural networks have also been used to approximate constitutive behavior by fitting the stress-strain relation directly [ghaboussi1991knowledge, unger2008coupling, wang2018multiscale]. However, these techniques are usually problem-dependent and may suffer from "the danger of extrapolation" beyond the original sampling space, e.g., to different material laws and loading paths. Issues like material history dependency, physical invariance, and conservation laws are not naturally resolved, mainly because physics is lost in the current machine learning models.
This paper introduces a novel multiscale material modeling method, the deep material network, which represents the DNS model of an RVE by a hierarchical topological structure with mechanistic building blocks and is able to predict nonlinear material behavior both accurately and efficiently. In the proposed material network, the above-mentioned limitations of various reduced-order methods are addressed simultaneously by meeting three fundamental goals: 1) avoiding an extensive offline sampling stage (cf. POD, NTFA and generic neural networks) by requiring only linear elastic RVE analyses; 2) eliminating the need for extra calibration of the constitutive laws (cf. NTFA, isomap-based methods) or micro-mechanical assumptions in the homogenization (cf. micromechanics methods, the self-consistent scheme in SCA); 3) discovering an efficient reduced representation of the RVE without the loss of physics and the danger of extrapolation (cf. generic neural networks). Due to its intrinsic hierarchical structure, its computational time is proportional to the number of degrees of freedom in the system. After a one-time offline training, the optimized material network serves as a microstructural database of the RVE by virtue of its unique capability of extrapolating to unknown material and loading spaces, which is useful for multiscale concurrent simulation and material design.
In Section 2, the theory of the material network, including the building block and the network architecture, is explained for RVEs under the 2-dimensional (2D) plane strain condition. In Section 3, machine learning approaches, such as stochastic gradient descent (SGD) with the backpropagation algorithm and model compression algorithms, are developed for training the material network. Extrapolations to general RVE problems with material and geometric nonlinearities are discussed in Section 4. Finally, applications to several challenging problems are presented in Section 5, including linear elasticity with a high contrast of phase properties, nonlinear history-dependent plasticity, and finite-strain hyperelasticity under large deformations. Concluding remarks are given in Section 6.
2 Theory of material network
The basic concept of the material network is to use a collection of connected simple building blocks to describe complex RVE responses, similar to an artificial neural network, where connected neurons can represent an arbitrary function. The building block is chosen to be a simple structure with analytical homogenization solutions, and the architecture of the material network represents the path of the homogenization process from the individual phases to the overall macroscopic material. With the mechanistic machine learning approaches introduced in Section 3, the trained material network offers a simplified representation of an RVE with heterogeneous microstructures.
2.1 The physically based building block
The theory of the material network is first developed for a two-phase linearly elastic building block under the 2-dimensional (2D) plane strain condition. Under the small-strain assumption, the strain and stress measures are the infinitesimal strain and the Cauchy stress, which are related by the fourth-order compliance tensor. The overall stress-strain relation of the building block can be expressed as
For materials 1 and 2, we have
where $f_1$ and $f_2$ are the volume fractions of materials 1 and 2, respectively. In Mandel notation, the stress and strain can be written as
where the subscript 3 denotes the shear direction. The compliance matrix can be written as
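The displayed vector forms were lost in extraction, so a small sketch (with an illustrative helper name) may help fix the convention: in 2D Mandel notation the shear component carries a factor of $\sqrt{2}$, which makes the plain dot product of the 3-vectors reproduce the tensor contraction of stress and strain.

```python
import numpy as np

def to_mandel(t11, t22, t12):
    """2D Mandel vector of a symmetric second-order tensor:
    the shear component (subscript 3) is scaled by sqrt(2)."""
    return np.array([t11, t22, np.sqrt(2.0) * t12])
```

With this scaling, `np.dot(to_mandel(*s), to_mandel(*e))` equals `s11*e11 + s22*e22 + 2*s12*e12`, so the energy pairing of stress and strain is preserved by the vector representation.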
Similar to artificial neural networks, we want the building block to be simple and easy to analyze. Here a two-layer structure is proposed. As shown in Figure 1, there are two operations in the building block: 1) homogenization of the two layers, and 2) rotation of the homogenized two-layer structure.
The homogenized compliance matrix before rotation can be expressed as a function of the microscopic compliance matrix of each constituent and its morphological descriptors, in this case the volume fractions $f_1$ and $f_2$, with $f_1 + f_2 = 1$. Mathematically, it can be written as
Analytical homogenized results are available for this simple two-layer structure, derived based on the equilibrium condition
and kinematic constraint
Analytical expressions of the components in the homogenized compliance matrix are
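The analytical component expressions did not survive extraction. As a sketch, the same homogenization can be carried out numerically from the two stated conditions, assuming the layers are stacked along direction 2 so that the stress components in directions 2 and 3 are continuous across the interface while the strain in direction 1 is shared. The function name and this stacking convention are assumptions for illustration.

```python
import numpy as np

def homogenize_two_layer(D1, D2, f1):
    """Numerically homogenize a two-layer laminate from the equilibrium
    condition (sigma_2, sigma_3 continuous) and kinematic constraint
    (epsilon_1 shared). D1, D2: 3x3 Mandel compliances; f1: volume fraction."""
    f2 = 1.0 - f1
    Dbar = np.zeros((3, 3))
    for k in range(3):
        sbar = np.zeros(3)
        sbar[k] = 1.0                      # unit macroscopic stress
        # Unknowns: sigma_1 in each layer.
        # Row 1: volume average of sigma_1 equals macroscopic sigma_1.
        # Row 2: epsilon_1 equal in both layers.
        A = np.array([[f1, f2],
                      [D1[0, 0], -D2[0, 0]]])
        b = np.array([sbar[0],
                      (D2[0, 1] - D1[0, 1]) * sbar[1]
                      + (D2[0, 2] - D1[0, 2]) * sbar[2]])
        s1, s2 = np.linalg.solve(A, b)
        sig1 = np.array([s1, sbar[1], sbar[2]])
        sig2 = np.array([s2, sbar[1], sbar[2]])
        Dbar[:, k] = f1 * (D1 @ sig1) + f2 * (D2 @ sig2)
    return Dbar
```

As a sanity check, two identical layers return the phase compliance unchanged, and the homogenized matrix inherits the major symmetry of its inputs.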
After the homogenization operation, the two-layer structure is rotated and the new compliance matrix is passed to a child node of the next material building block in the upper level. As shown in Fig. 1, the rotation angle is denoted by $\theta$. The matrix R defines the rotation of a second-order tensor through the angle $\theta$ under Mandel notation,
and it satisfies
After rotation, the new compliance matrix can be expressed as
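As a concrete sketch of the rotation operation (the sign convention of the angle is an assumption here, since the displayed matrix was lost in extraction):

```python
import numpy as np

def mandel_rotation(theta):
    """Rotation matrix R(theta) acting on 2D Mandel vectors
    (v1, v2, v3) = (t11, t22, sqrt(2) t12); sign convention assumed."""
    c, s = np.cos(theta), np.sin(theta)
    r2 = np.sqrt(2.0)
    return np.array([[c * c, s * s, -r2 * c * s],
                     [s * s, c * c,  r2 * c * s],
                     [r2 * c * s, -r2 * c * s, c * c - s * s]])

def rotate_compliance(D, theta):
    """Rotated compliance D' = R D R^T (R is orthogonal in Mandel form)."""
    R = mandel_rotation(theta)
    return R @ D @ R.T
```

Because R is orthogonal in Mandel notation, its inverse is its transpose (the rotation by the opposite angle), and the Frobenius norm of the compliance matrix is unchanged by rotation — the invariance exploited by the cost function in Section 3.1.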
It is equivalent to derive all the analytical homogenization and rotation functions based on the stiffness tensor C, as we will show in Section 4 and Appendix A. However, the reason we use the compliance tensor D here is that the expressions of the analytical solution based on D are neater for this small-strain 2D building block under the plane strain condition.
Note that analytical forms for building blocks with multiple layers could also be derived for multi-phase materials; however, the material network would become more complex and more partial derivatives would be involved in the learning process. For simplicity, this paper focuses on two-phase RVEs, and the two-layer structure, as well as the corresponding binary-tree architectures, will be mainly discussed. In practice, the compliance matrix is stored in a vectorized form with 6 independent variables. The data flow of one building block after the vectorization is shown in Fig. 2.
Derivatives of the homogenization and rotation operations will be used in the stochastic gradient descent with backpropagation algorithm introduced in Section 3.2. Specifically, the derivatives of the rotated compliance matrix with respect to the rotation angle $\theta$ are
In terms of the derivatives of the rotated compliance matrix with respect to the components of the unrotated one, we have
Derivatives of the homogenized compliance matrix with respect to the volume fraction $f_1$ are
Moreover, derivatives of the homogenized compliance matrix with respect to the components of the phase-1 compliance matrix are
Similarly, the forms for phase 2 can be obtained by switching the phase index in Eq. (17).
In Section 5.2.3, we will extend the theory of material network to general finite-strain problems, such as heterogeneous hyperelastic RVEs. Furthermore, the concept of two-layer building block can be applied to 3-dimensional (3D) problems, which will be explored in our future work. It should be noted that the existence of an analytical homogenization function is crucial for optimizing the fitting parameters in the material network using gradient-based methods. More details on the training of material network can be found in Section 3.
2.2 Architects of material network
As shown in Fig. 3, a material network with a binary-tree structure is proposed. The depth of the tree structure is denoted by $N$, and $i$ is the layer index. Layer 0 is the output layer, equivalent to the macroscopic homogenized material, and layer $N$ contains the inputs from the individual microscopic constituents; layer $N$ is regarded as the bottom layer. Since a two-layer structure is considered as the basic homogenization building block, each node has 2 child nodes, and there are $2^i$ nodes at layer $i$.
To explain the data flows inside the material network, a homogenization building block for the $j$-th node at layer $i$ is shown in Fig. 4. Correspondingly, the indices of its two child nodes at layer $i+1$ are $2j-1$ and $2j$.
In addition to the compliance matrix at each node, a so-called weighting function is added to the data flow, which tracks the volume fractions in each building block. In the feed-forward process, the weights of the two child nodes are summed and passed to their parent node,
By performing the summation recursively, the weight of any node can be expressed as a summation of the weights of its descendant nodes at the bottom layer $N$,
Therefore, only the weights at the bottom layer are independent parameters. The partial derivative of a node's weight with respect to a bottom-layer weight can be written as
The weights at the bottom layer $N$ are determined from the activations through rectified linear units (ReLU) [glorot2011deep], which have been widely used in deep learning. For the bottom-layer weights, we have
The activations are defined at the bottom layer and determine all the weighting functions in the network; therefore, they are treated as the fitting parameters. The derivative of the ReLU is
Once a unit is deactivated (its activation becomes non-positive and its weight vanishes), its gradient vanishes and the unit will never be activated again during the training process. This automatically simplifies the material network and increases the training speed, as long as the learning rate is set appropriately.
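A minimal sketch of this feed-forward weight propagation, assuming the nodes of each layer are stored left to right so that a parent's children form adjacent pairs (function names are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def propagate_weights(z_bottom):
    """Feed the bottom-layer activations through ReLU and sum pairs of
    child weights up the binary tree. Returns a list weights[i] holding
    the 2**i node weights at layer i (layer 0 is the output layer)."""
    N = int(np.log2(len(z_bottom)))
    weights = [None] * (N + 1)
    weights[N] = relu(np.asarray(z_bottom, dtype=float))
    for i in range(N - 1, -1, -1):
        w_child = weights[i + 1]
        weights[i] = w_child[0::2] + w_child[1::2]  # w_parent = w_left + w_right
    return weights
```

A deactivated unit (negative activation) contributes zero weight to every ancestor, which is exactly the automatic pruning described above.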
Returning to the building block in Fig. 4, the volume fraction of the first child node in the corresponding two-layer structure can be written as
The derivative of this volume fraction with respect to the activations at the bottom layer is
The compliance matrices at the $j$-th node in layer $i$ become
By combining Eqs. (19), (21), (23) and (26), the homogenized compliance matrix of the RVE can be written as a function of the compliance matrices of the two material phases and the fitting parameters (the activations and rotation angles),
The number of layers $N$ can be regarded as a hyper-parameter of the material network, and it determines the total number of fitting parameters of a 2D material network of depth $N$. The optimum choice of $N$ needs to be studied through numerical experiments. A shallow material network with a small $N$ may not be sufficient for capturing the RVE behavior, while a deep network with a large $N$ will increase the training time, as well as the computational cost of online extrapolation.
Other than defining activations in the bottom layer, it is also possible to directly use the volume fractions of the nodes at each layer as the fitting parameters. However, according to our tests, this would cause a vanishing-gradient problem for the volume fractions in the deep layers and might also result in unwanted early deactivation of sub-tree structures, whereas the training process based on the bottom-layer activations is more continuous and stable.
3 Machine learning of deep material network
3.1 Cost function and dataset for training
The procedure for training the material network is illustrated in Fig. 5. The dataset for training is generated through high-fidelity DNS of an RVE, or through experiments (which can be challenging due to the limited number of experimental tests). The inputs, outputs and fitting parameters of this training/optimization problem are listed below, with the number of variables given in parentheses,
Fitting parameters:
When generating the input data, the components of the phase compliance matrices cannot be assigned arbitrarily: for a physical material, situations resulting in negative strain energy density must be avoided. For ease of data generation, the two material phases are considered to be orthotropically elastic,
To remove the redundancy due to the scaling effect, we have
The other inputs are selected randomly as
where $U$ stands for the uniform distribution. The Poisson ratios are selected to guarantee that the compliance matrices are positive definite, so that the strain energy density is positive,
Design of experiments based on Latin hypercube sampling is performed to generate the input space.
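A basic Latin hypercube design can be sketched as follows; the bounds passed in the example are placeholders, not the ranges used in the study.

```python
import numpy as np

def latin_hypercube(n, bounds, seed=None):
    """Basic Latin hypercube design: each dimension is split into n
    equal-probability strata and each stratum is sampled exactly once,
    with independent random permutations across dimensions."""
    rng = np.random.default_rng(seed)
    d = len(bounds)
    samples = np.empty((n, d))
    for j, (lo, hi) in enumerate(bounds):
        strata = rng.permutation(n)           # one stratum per sample
        u = (strata + rng.random(n)) / n      # uniform draw within each stratum
        samples[:, j] = lo + u * (hi - lo)
    return samples
```

Compared with plain random sampling, this guarantees that every marginal range is covered evenly, which matters when the training budget of DNS samples is small.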
To quantify how well the network is trained, a cost function based on the mean squared error (MSE) is proposed:
Here, the activations and rotation angles are the parameters to be fitted, and the error is averaged over all samples (or data points) in the training dataset. Note that the cost function is normalized by the squared norm of the reference compliance matrix to remove the scaling effect. The operator denotes the matrix norm. For a general second-order matrix B of dimension $n \times n$, the matrix norm is defined as
and it is also called the Frobenius norm of the matrix B. Since the trace of a matrix product is invariant under cyclic permutations, it can be shown that the norm of a compliance matrix in Mandel notation is invariant under arbitrary rotations. Using Eq. (11) and the symmetry of the compliance matrix, we have
This guarantees that the cost function defined in Eq. (30), as well as the optimized fitting parameters, is independent of the choice of the coordinate system.
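The relative mean-square error described above can be sketched as follows (names are illustrative, and any constant prefactor of Eq. (30) is omitted): each sample's squared Frobenius-norm error is normalized by the norm of its DNS reference before averaging.

```python
import numpy as np

def mse_cost(D_pred, D_dns):
    """Relative mean-square error over samples: for each pair of
    compliance matrices, ||D_pred - D_dns||_F^2 / ||D_dns||_F^2."""
    errs = [np.linalg.norm(p - d, 'fro') ** 2 / np.linalg.norm(d, 'fro') ** 2
            for p, d in zip(D_pred, D_dns)]
    return float(np.mean(errs))
```

Because the Frobenius norm is rotation-invariant in Mandel notation, this cost does not depend on the coordinate system in which the DNS data were generated.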
To constrain the magnitude of the activations and make the optimization problem well-posed, an additional term is appended to the cost function,
where this term is defined as
A hyper-parameter controls the magnitude of the additional term. Although this hyper-parameter will not alter the optimum fitting parameters for a given training dataset, in practice it should be set appropriately to expedite the gradient descent algorithm during training.
3.2 Stochastic gradient descent and backpropagation algorithm
To minimize the cost function, the stochastic gradient descent (SGD) with backpropagation algorithm will be adopted. The gradient vector of the cost function can be written as
Due to the simplicity of the constraint term, this section focuses on deriving the gradients of the main part of the cost function.
In the context of training neural networks, backpropagation is commonly used to adjust the weights of neurons by calculating the gradient of the cost function. Sometimes this technique is also called backward propagation of errors, because the error is calculated at the output and distributed backward through the network layers. A similar technique can be utilized to train the material network. The heart of backpropagation is the chain rule of calculus, by which the time for computing the partial derivatives can be greatly reduced and less memory is needed. The derivation and implementation of backpropagation for training the material network are discussed below.
As shown in Fig. 2, each independent component of a compliance matrix is regarded as a neuron in the network. After the vectorization, the components of the compliance matrices D for all the nodes at layer $i$ are collected into one vector, and a second vector collects the corresponding components before the rotation operation. In the algorithm, we define the error of a neuron in layer $i$ by
and the error by
In the output layer, the error components are given by
Since the cost function takes a quadratic form in Eq. (30), the output-layer error can be easily computed. For the layers below, the errors can be computed recursively by
By combining Eqs. (38), (39) and (40), we can compute the errors at any layer in the network. Within one step, the error of the cost function propagates backward from the output layer to the bottom layer $N$. After the backpropagation of the errors for the compliance matrices, we can compute the rate of change of the cost function with respect to a rotation angle at layer $i$ by
where the derivative with respect to the rotation angle can be found in Eq. (13). Meanwhile, the rate of change of the cost function with respect to an activation in the bottom layer can be computed by
Finally, we summarize the backpropagation algorithm for finding the gradients of the cost function with respect to the fitting parameters in the material network.
Input: Set the initial values of the activations and rotation angles
Feedforward: For each layer, compute the weights and the compliance matrices
Output error : Compute the vector using Eq. (38)
In a standard gradient descent method, the gradients of all samples are computed and averaged to obtain the total gradient based on Eq. (30),
To accelerate training, stochastic gradient descent (SGD) is used to train the material network once the gradient vector is available. Instead of computing and averaging the gradients over all the samples at each step, a small number of samples is randomly picked to estimate the gradient. In this way, the original dataset is divided into several mini-batches which are used in a sequence of learning steps. If we label the samples in a mini-batch, the gradient of the cost function for the corresponding learning step can be approximated by
Writing out the gradient descent updating rule in terms of the components, we have
where the learning rate is a positive parameter. An epoch is completed each time the algorithm has processed all the mini-batches and thus seen all the samples in the original dataset. In practice, after an epoch is completed, the dataset is randomly shuffled and prepared for the next epoch to minimize sample bias.
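The mini-batch procedure can be sketched generically; `grad_fn` stands in for the backpropagated gradient of a single sample and `params` for the vector of fitting parameters, both illustrative names.

```python
import numpy as np

def sgd(params, grad_fn, data, batch_size, lr, epochs, seed=0):
    """Mini-batch SGD: each epoch shuffles the dataset, splits it into
    mini-batches, and updates the parameters with the averaged gradient."""
    rng = np.random.default_rng(seed)
    params = np.asarray(params, dtype=float).copy()
    for _ in range(epochs):
        idx = rng.permutation(len(data))   # reshuffle to minimize sample bias
        for start in range(0, len(data), batch_size):
            batch = [data[i] for i in idx[start:start + batch_size]]
            g = np.mean([grad_fn(params, s) for s in batch], axis=0)
            params -= lr * g
    return params
```

For instance, minimizing the mean of $(p - s)^2$ over samples $s$ with gradient $2(p - s)$ drives the parameter to the sample mean.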
All fitting parameters are initialized randomly following a uniform distribution at the beginning of the SGD algorithm,
In order to reduce the influence of the constraint term at early training stage, the hyper-parameter is chosen to be
This helps avoid unwanted early deactivation of the material network. Additionally, the learning rate is an important hyper-parameter that determines the performance of SGD. In this paper, we have utilized a bold driver algorithm to dynamically adapt the learning rate, which has effectively improved the training speed in our numerical studies. In the future, more sophisticated schemes, such as annealing, can be used to tune the learning rate.
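The text does not spell out the bold driver rule; a common variant (the growth and decay factors here are assumed, not taken from the paper) increases the learning rate slightly after each improvement of the cost, and cuts it sharply, rejecting the step, after a deterioration:

```python
def bold_driver(lr, cost_new, cost_old, grow=1.05, shrink=0.5):
    """Bold driver heuristic for adapting the learning rate.
    Returns (new_lr, accept_step): grow lr after an improvement,
    shrink it and reject the step after a deterioration."""
    if cost_new <= cost_old:
        return lr * grow, True
    return lr * shrink, False
```

The asymmetry (mild growth, aggressive shrinkage) lets the step size creep up on smooth stretches of the cost surface while recovering quickly from overshoots.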
3.3 Model compression and parameter reduction
The speed and convergence rate of the network training process can be improved by removing redundancy in the network, a procedure called model compression. Moreover, a less complex network is always beneficial for the online extrapolation stage. As discussed in Section 2.2, the rectified linear unit enables automatic deactivation of nodes during training, so it already acts as a form of network compression. Two additional approaches are introduced for further model compression: 1) deletion of a parent node with only one child node; 2) subtree merging based on similarity search. Illustrations of these two model compression operations are shown in Fig. 6.
In Fig. 6 (a), node 2 is deleted as its volume fraction is equal to 1. Meanwhile, the rotation angle of node 1 needs to be updated to avoid a sudden jump in the cost function,
As shown in Fig. 6 (b), merging of subtree structures is based on a similarity search. The comparison of two subtrees is performed across all descendant layers of their root nodes. The differences are evaluated by
where the index denotes the order of the nodes in the subtree. If both differences are below their tolerances, the two subtrees are regarded as similar
and will be merged to compress the network and reduce the number of fitting parameters. Afterwards, the parameters in the new subtree become
Note that a deletion operation follows right after the merging operation, since one volume fraction becomes 1 upon the merging of the two subtrees in Fig. 6 (b).
The merging operation should be performed on an ordered material network. At any layer of an ordered material network, the following condition should be satisfied,
Each time before the similarity search and subtree merging, the whole material network is reordered based on the weighting functions. To save training time, the network compression operations are performed every 10 epochs in our study.
4 Prediction and extrapolation
4.1 Nonlinear small-strain plasticity
Since the material network captures the essential topological structure of an RVE, it can be used for online prediction of nonlinear plasticity, not just linear elasticity. Each node at the bottom layer can be regarded as an individual material node with independent degrees of freedom (DOFs), namely its infinitesimal strain. As a result, the total number of DOFs in the material network at the beginning of training is proportional to the number of bottom-layer nodes, $2^N$. On the other hand, the ReLU activation function and the compression algorithms introduced in Section 3.3 decrease the model complexity during training, so for the final trained network the number of DOFs becomes proportional to the number of active/remaining nodes at the bottom layer,
In the bottom layer, each node carries its own Cauchy stress. At each loading step, the node's stress-strain relation can be written as
The compliance matrix and incremental stress are given by the local constitutive law,
where the constitutive law depends on a vector of history-dependent internal variables. Once the compliance matrix and incremental stress are obtained, we can compute the residual strain in the bottom layer by
Note that the residual strain arises from the material nonlinearity. In other words, if all the materials are linear elastic, there is no residual strain and the homogenization procedure is the same as the one described for the training process.
In a network with material nonlinearity, both the compliance matrix D and the residual strain are fed forward from the bottom layer to the output layer. In Fig. 4, the stress-strain relations of the two child nodes on layer $i+1$ at each loading step can be written as
and the stress-strain relation of their parent node at layer $i$ is
In the forward homogenization process, the compliance matrix and residual strain at the parent node are calculated from the ones in its child nodes,