A Deep Material Network for Multiscale Topology Learning and Accelerated Nonlinear Modeling of Heterogeneous Materials

by   Zeliang Liu, et al.

The discovery of efficient and accurate descriptions for the macroscopic constitutive behavior of heterogeneous materials with complex microstructure remains an outstanding challenge in mechanics. The difficulty of finding the macroscopic responses becomes apparent when material or geometric nonlinearities (e.g. irreversible plasticity, large deformations) are considered. In this paper, a new data-driven multiscale material modeling method, which we refer to as deep material network, is developed based on mechanistic homogenization theory of representative volume element (RVE) and advanced machine learning techniques. Inspired by the basic concept in artificial neural networks, we use a collection of connected simple building blocks with analytical homogenization solutions to describe complex overall material responses. With linear elastic RVE data from offline direct numerical simulations (DNS), the material network can be effectively trained using stochastic gradient descent with backpropagation algorithm, enhanced by various model compression methods. Furthermore, extrapolations of the trained network to a wide range of problems are also validated through numerical experiments, including linear elasticity with high contrast of phase properties, nonlinear history-dependent plasticity and finite-strain hyperelasticity under large deformations. By discovering a proper topological representation of RVE with fewer degrees of freedom, this intelligent material model is believed to open new possibilities of high-fidelity efficient concurrent simulations for a large-scale heterogeneous structure. It also provides a mechanistic understanding of structure-property relations across material length scales and enables the development of parametrized microstructural database for material design and manufacturing.


page 16

page 20

page 25

page 26


Exploring the 3D architectures of deep material network in data-driven multiscale mechanics

This paper extends the deep material network (DMN) proposed by Liu et al...

Intelligent multiscale simulation based on process-guided composite database

In the paper, we present an integrated data-driven modeling framework ba...

An Adaptive Quasi-Continuum Approach for Modeling Fracture in Networked Materials: Application to Modeling of Polymer Networks

Materials with network-like microstructure, including polymers, are the ...

A Manifold Learning Approach to Accelerate Phase Field Fracture Simulations in the Representative Volume Element

The multiscale simulation of heterogeneous materials is a popular and im...

Learning Deep Implicit Fourier Neural Operators (IFNOs) with Applications to Heterogeneous Material Modeling

Constitutive modeling based on continuum mechanics theory has been a cla...

Lossless Multi-Scale Constitutive Elastic Relations with Artificial Intelligence

The elastic properties of materials derive from their electronic and ato...

1 Introduction

In the past decade, multi-scale simulation methods have demonstrated significant advantages for computational mechanics due to their ability to consider microscopic heterogeneities inside a material. The macroscopic material properties are strongly affected by the morphology and evolution of the microstructures, especially under extreme events with large material deformations and geometric nonlinearities. While direct simulations for large-scale heterogeneous structures are extremely expensive and uncommon in industrial applications, the representative volume element (RVE) techniques based on homogenization theory hill1965self ; feyel2000a ; geers2010multi are a type of hierarchical multi-scale simulation methods offering the numerical constitutive closure relationship at the macroscopic point.

In terms of RVE analysis and homogenization, direct numerical simulations (DNS), such as finite element method (FEM) feyel2000a ; belytschko2008a , meshfree methods wu2012three ; wu2016immersed

, and fast Fourier transform (FFT)-based methods

moulinec1998a ; de2017finite , offer high accuracy at the expense of high computational costs. A myriad of model reduction techniques have been introduced for predicting the effective mechanical properties in a manner that balances computational cost and accuracy. Analytical micromechanics methods eshelby1957determination ; hashin1963variational ; hill1965self ; mura1987micromechanics ; liu2015statistical ; liu2016extended can be regarded as one type of reduced-order models with high efficiency. However, due to a loss of detailed physics in the microscale, they normally lose accuracy or require extensive model calibrations when irregular complex morphologies, nonlinear history-dependent properties or large deformations are presented. For heterogeneous hyperelastic materials, manifold-learning methods like isomap are used for nonlinear dimensionality reduction of microscopic strain fields bhattacharjee2016nonlinear . The model reduction of history-dependent plastic materials can be more complex and challenging. Two examples are non-uniform transformation field analysis (NTFA) michel2003a ; michel2016model and variants of the principle component analysis jolliffe2002a or proper orthogonal decomposition (POD) yvonnet2007a ; kerfriden2013a ; oliver2017reduced . However, they usually require extensive a priori

simulations for interpolating nonlinear responses, and their extrapolation capability for new material inputs is usually limited, Recently, the self-consistent clustering analysis (SCA)

liu2016self ; liu2018microstructural has demonstrated a powerful trade-off between accuracy and efficiency in predicting small-strain elasto-plastic behavior though clustering techniques, and it only requires linear elastic simulations in the offline stage.

Meanwhile, current advanced machine learning models (e.g. artificial neural networks and deep learning) have achieved great successes in broad areas of computer engineering, such as computer vision, gaming, and natural language processing

hinton2012deep ; lecun2015deep ; Goodfellow-et-al-2016 ; silver2017mastering

. Although these techniques are able to construct models for complex input-output relations, their applications to mechanics of materials are still limited. Methods like tensor product approximation and neural networks have been employed to directly construct the overall strain energy density surface of the RVE

le2015computational ; yvonnet2013computational ; bessa2017 ; ibanez2017data . Artificial neural networks have also been used to approximate constitutive behavior by fitting the stress-strain relation directly ghaboussi1991knowledge ; unger2008coupling ; wang2018multiscale . However, these techniques are usually problem-dependent and may suffer from ”the danger of extrapolation” beyond the original sampling space, e.g. different material laws and loading paths. Issues like material history dependency, physical invariance and conservation laws are not naturally resolved, mainly due to the loss of physics in the current machine learning models.

The paper introduces a novel multiscale material modeling method called deep material network, which represents the DNS model of RVE by a hierarchical topological structure with mechanistic building blocks, and is able to predict nonlinear material behaviors both accurately and efficiently. In the proposed material network, the above-mentioned limitations of various reduced order methods are addressed simultaneously by meeting three fundamental goals : 1) avoiding an extensive offline sampling stage (e.g. POD, NTFA and generic neural network) and only requiring linear elastic RVE analysis; 2) eliminating the need for extra calibration of the constitutive laws (e.g. NTFA, methods based on isomap) or micro-mechanical assumption of homogenization (e.g. micromechanics methods, self-consistent scheme in SCA); 3) discovering an efficient reduced representation of RVE without the loss of physics and danger of extrapolation (e.g. generic neural network). Due to its intrinsic hierarchy structure, its computational time is proportional to the number of degrees of freedom in the system. After one time offline training, the optimized material network creates a microstructural database of the RVE by virtue of its unique capability of extrapolation to unknown material and loading spaces, which is useful for multiscale concurrent simulation and material design.

In Section 2, the theory of material network, including the building block and network architect, are explained for RVEs under 2-dimensional (2D) plane strain condition. In Section 3, machine learning approaches, such as stochastic gradient descent (SGD) with backpropagation algorithm and model compression algorithms, are developed for training the material network. Extrapolations to general RVE problems with material and geometric nonlinearities are discussed in Section 4. Finally, applications to several challenging problems are addressed in Section 5, including linear elasticity with high contrast of phase properties, nonlinear history-dependent plasticity and finite-strain hyperelasticity under large deformations. Concluding remarks are given in Section 6.

2 Theory of material network

The basic concept of material network is to use a collection of connected simple building blocks to describe complex RVE responses, similar to the one of artificial neural network where neurons are connected to define an arbitrary function. The building block is chosen to be a simple structure with analytical homogenization solutions, and the architect of the material network represents the path of the homogenization process from each individual phase to the overall macroscopic material. With advanced mechanistic machine learning approaches introduced in Section

3, the trained material network provides a possibility to obtain a simplified representation of the RVE with heterogeneous microstructures.

2.1 The physically based building block

The theory of material network is first developed for a two-phase linearly elastic building block in 2-dimensional (2D) plane strain condition. Under small-strain assumption, the strain and stress measures are the infinitesimal strain and the Cauchy stress , which are related by the fourth-order compliance tensor . The overall stress-strain relation of the building block can be expressed as


For materials 1 and 2, we have


where and are the volume fractions of materials 1 and 2, respectively. In Mandel notation, the stress and strain can be written as



where the subscript 3 denotes the shear direction. The compliance matrix can be written as

Figure 1: Illustration of the two-layer building block. The compliance matrix after the homogenization operation is , and the one after the rotation operation is denoted by .

Similar to artificial neural networks, we want the building block to be simple and easy for the analysis. Here a two-layer structure is proposed. As shown in Figure 1. there are two operations in the building block: 1) homogenization which gives and 2) rotation of the two-layer structure which gives .

The homogenized compliance matrix before the rotation can be expressed as a function of the microscopic compliance matrix of each constituent and its morphological descriptors, in this case, the volume fractions , with . Mathematically, it can be written as


Analytical homogenized results are available for this simple two-layer structure, derived based on the equilibrium condition


and kinematic constraint


Analytical expressions of the components in the homogenized compliance matrix are



After the homogenization operation, the two-layer structure is rotated and the new compliance matrix is passed to a child node of the next material building block in the upper level. As shown in Fig 1, the rotation angle is denoted as . The matrix R defines the rotation of a second-order tensor through the angle under Mandel notation,


and it satisfies


After rotation, the new compliance matrix can be expressed as


Combining the homogenization and rotation operations in Eq. (5) and (11) yields the completed homogenization function of the two-layer building block,

Remark 1

It is equivalent to deriving all the analytical homogenization and rotation functions based on the stiffness tensor C, as we will show in Section 4 and A. However, the reason why we use the compliance tensor D here is that the expressions of the analytical solution based on D are neater for this small-strain 2D building block in plane strain condition.

Note that analytical forms for building blocks with multiple layers could also be derived for multi-phase material, however, the material network would become more complex and more partial derivatives would be evolved in the learning process. For simplicity, this paper will focus on two-phase RVEs, and the two-layer structure, as well as the corresponding binary-tree architects, will be mainly discussed. In practice, the compliance matrix are stored in a vectorized form with 6 independent variables. The data flow for one building block after the vectorization is shown in Fig.


Figure 2: Detailed data flows of the components in the compliance matrices inside a building block.

Derivatives of and will be used in the stochastic gradient descent with backpropagation algorithm introduced in Section 3.2. Specifically, derivatives of with respect to the rotation angle are




In terms of the derivatives of with respect to the components in , we have


Derivatives of with respect to the volume fraction () are


Moreover, derivatives of with respect to the components in are


Similarly, forms of can be obtained by switching the phase index in Eq. (17).

In Section 5.2.3, we will extend the theory of material network to general finite-strain problems, such as heterogeneous hyperelastic RVEs. Furthermore, the concept of two-layer building block can be applied to 3-dimensional (3D) problems, which will be explored in our future work. It should be noted that the existence of an analytical homogenization function is crucial for optimizing the fitting parameters in the material network using gradient-based methods. More details on the training of material network can be found in Section 3.

2.2 Architects of material network

As shown in Fig. 3, a material network with a binary-tree structure is proposed. The depth of the layer in the tree structure is denoted by , and is the layer index. Layer 0 represents the output layer equivalent to the macroscopic homogenized material, and layer contains the input from each individual microscopic constituent. Layer is regarded as the bottom layer. Since a two-layer structure is considered as the basic homogenization building block, each node has 2 child nodes, and there are nodes at layer for .

Figure 3: Illustration of material network with depth N=3. The nodes in each dashed box form a building block.

To explain the data flows inside the material network, a homogenization building block for the -th node at layer is provided in Fig. 4. Correspondingly, the indices of its two child nodes at layer are and .

Figure 4: Data flow in the building block for the -th node at layer . The volume fraction is defined upon the weights.

Other than the compliance matrix at the each node, a so-called weighting function is added into the data flow, which is used for tracking the volume fractions in each building block. When feed forward, the weights of the two child nodes are summed up and passed to their parent node,


By performing the summation recursively, the weight can be expressed as a summation of weights of its descendant nodes at the bottom layer ,


Therefore, only the weights at bottom layer are independent parameters. The partial derivative of with respect to can be written as


The weights at layer

are activated through the rectified linear unit (ReLU)

glorot2011deep , which have been widely used in deep learning. For , we have


The activations are defined at the bottom layer and determine all the weighting functions in the network. Therefore, are treated as the fitting parameters. The derivative of the ReLU is


Once a unit is deactivated ( and ), its gradient vanishes and the unit will never be activated again during the training process. This is good for automatically simplifying the material network and increasing the training speed, as long as the learning rate is set appropriately.

Back to the building block in Fig. 4, the volume fraction of the first child nodes in the corresponding two-layer structure can be written as


The derivative of with respect to at the bottom layer is


The compliance matrices and D at the -th node in layer become




By combining Eq. (19), (21), (23) and (26) the homogenized compliance matrix of the RVE can be written as a function of the compliance matrix from each material phase ( and ) and the fitting parameters ( and ),


The number of layers can be regarded as a hyper-parameter of the material network, and there are totally fitting parameters in a material network with depth for a 2D problem. The optimum choice of needed to be studied through numerical experiments. A shallow material network with a small may not be sufficient for capturing the RVE behavior, while a deep network with a large will increase the training time, as well as the computational cost for online extrapolation.

Remark 2

Other than

in the bottom layer, it is also possible to directly use the volume fraction of the nodes at each layer as the fitting parameters. However, according to our tests, this would cause the vanishing gradient problem of volume fractions in the deep layers and might also result in unwanted early deactivation of sub-tree structures, whereas, the training process based on

is more continuous and stable.

3 Machine learning of deep material network

3.1 Cost function and dataset for training

The procedure for training the material network is illustrated in Fig. 5. The dataset for training is generated through high-fidelity DNS of an RVE, or experiments (this can be challenging due to limited experimental tests). The inputs, outputs and fitting parameters for this training/optimization problem are listed below, with the number of variables given in the parentheses,

Inputs (12):

Outputs (6):

Fitting parameters ():

Figure 5: Training procedure of material network. The training data may be generated from high-fidelity DNS and experiments.

When generating the input data, the components in and cannot be assigned arbitrarily. For a physical material, situations resulting in negative strain energy density should be avoided. For the ease of data generation, the two material phases are considered to be orthotropical elastic,


To remove the redundancy due to the scaling effect, we have


The other inputs are selected randomly as


stands for uniform distribution. Poisson ratios are selected to guarantee the compliance matrices are positive definite for the strain energy density to be positive,

Design of experiments based on the Latin hypercube is performed to generate the input space.

To quantify how well we are training the network, a cost function based on the mean square errors (MSE) is proposed:


Here, and are the parameters to be fitted, is the index of the sample (or data point), and is the total number of training samples. Note that the cost function is normalized by the squared norm of to remove the scaling effect. The operator denotes the matrix norm. For a general second-order matrix B of dimension , the matrix norm is defined as


and it is also called the Frobenius norm of matrix B. Since the trace of a matrix product is invariant under cyclic permutation, it can be proved that the norm of a compliance matrix under Mandel notation is invariant under arbitrary rotation. By using Eq. (11) and the symmetry of the compliance matrix, we have


This guarantees that the cost function defined in Eq. (30), as well as the optimized fitting parameters, is independent of the choice of the coordinate system.

To constrain the magnitude of activations and make the optimization problem well-posed, an additional term is appended to the cost function ,


where is defined as


The hyper-parameter determines the magnitude of . Although the hyper-parameter will not alter the optimum fitting parameters for a given training dataset, in practice, it should be set appropriately to expedite the gradient descent algorithm in the training process.

3.2 Stochastic gradient descent and backpropagation algorithm

To minimize the cost function, the stochastic gradient descent (SGD) with backpropagation algorithm will be adopted. The gradient vector of the cost function can be written as



Due to the simplicity of , we will only focus on deriving the gradients of the cost function in this section.

In the context of training the neural networks, backpropagation is commonly used to adjust the weight of neurons by calculating the gradient of the cost function. Sometime, this technique is also called backward propagation of errors, because the error is calculated at the output and distributed backward through the network layers. Similar technique can be utilized to train the material network. The heart of backpropagation is the chain rule in calculus, by which the time for computing those partial differentiates can be greatly reduced and less memory is needed. The derivation and implementations of backpropagation for training the material network will be discussed below.

As shown in Fig. 2, each independent component of compliance matrix is regarded as a neuron in the network. After the vectorization, the components in the compliance matrices D for all the nodes at layer are denoted by , with the index ranging from to . Meanwhile, we use to represent the components in for all the nodes at layer . In the algorithm, we define the error of neuron in layer by


and the error by


In the output layer, the components of are given by


Since takes a quadratic form in Eq. (30), can be easily computed. For , the error and can be computed by


All the derivatives have been previously defined. The expressions of and can be found in Eq. (15) and (17), respectively.

By combining Eq. (38), (39) and (40), we can compute the error and at any layer in the network. Within one step, the error of cost function propagates backward from the output layer to the bottom layer . After the backpropagation of error for compliance matrices, we can compute the rate of change of the cost function with respect to a rotation angle at layer by


where can be found in Eq. (13). Meanwhile, the rate of change of the cost function with respect to an activation in the bottom layer can be computed by


where denotes the Hadamard (element-wise) product. Analytical forms of are provided in Eq. (16), and is defined in Eq. (24).

Finally, we conclude the backpropagation algorithm for finding gradients of the cost function with respect to the fitting parameters in the material network.

  1. Input: Set the initial values for and

  2. Feedforward: For each layer , compute and

  3. Output error : Compute the vector using Eq. (38)

  4. Backpropagate the error: For each , compute and iteratively using Eq. (39) and (40)

  5. Output: Compute the gradients of the cost function using Eq. (41) and (42)

In a standard gradient descent method, the gradients of each sample are computed and averaged to get based on Eq. (30),


To accelerate the training speed, the stochastic gradient descent (SGD) is used to train the material network after the gradient vector is obtained. Instead of computing and averaging the gradients over all the samples at each step, a small number

of samples are randomly picked to estimate the gradient

. In this way, the original dataset is divided into several mini-batches which will be used in a sequence of learning steps. If we label the samples in a mini-batch by , the gradient of cost function for the corresponding learning step can be approximated by


Writing out the gradient decent updating rule in terms of the components, we have





is a positive parameter known as the learning rate. Here, an epoch is defined as each time the algorithm has processed all the mini-batches and seen all the samples in the original dataset. In practice, after an epoch is completed, the dataset will be randomly shuffled and prepared for the next epoch to minimize the sample bias.

All fitting parameters are initialized randomly following a uniform distribution at the beginning of the SGD algorithm,


In order to reduce the influence of the constraint term at early training stage, the hyper-parameter is chosen to be


This will help to avoid unwanted early deactivation of the material network. Additionally, the learning rate is an important hyper-parameter that determines the performance of SGD. In the paper, we have utilized a Bold driver algorithm to dynamically adapting the learning rate, and it has effectively improved the training speed in our numerical study. In future, more sophisticated algorithms, like annealing, can be used to tune the learning rate.

3.3 Model compression and parameter reduction

The speed and convergence rate of the network training process can be improved by removing the redundancy in the network, which is called model compression. On the other hand, a less complex network is always beneficial for the extrapolation process in the online stage. As discussed in Section 2.2, the rectified linear unit enables automatical deactivation of nodes during the training; hence it already has a function of network compression. Moreover, two additional approaches are also introduced for further model compression: 1)deletion of the parent node with only one child node; 2) subtree merging based on the similarity search. Illustrations of these two model compression operations are shown in Fig. 6.

Figure 6: Illustrations of (a) deletion of node and (b) subtree merging for network compression. The merging operation is performed after reordering of the network.

In Fig. 6 (a), node 2 is deleted as its volume fraction is equal to 1. Meanwhile, the rotation angle of node 1 needs to be updated to avoid a sudden jump in the cost function,


As shown in Fig. 6 (b), merging of the subtree structures is based on the similarity search. The comparison of the two subtrees, and , are performed between all descendant layers of their root nodes and . The differences are evaluated by


where the index denotes the order of nodes in the subtree. If the differences of and are both below the tolerances, it is said that

and will be merged left to compress the network and reduce the number of fitting parameters. Afterwards, the parameters in the new subtree become


Note that a deletion operation will follow right after the merging operation since becomes 1 upon the merging of the two subtrees in Fig. 6 (b).

The merging operation should be performed on an ordered material network. At any layer of an order material network, the following condition should be satisfied,

Each time before the similarity search and subtree merging, the whole material network will be reordered based on the weighting functions. To save the training time, the network compression operations are performed every 10 epochs in our study.

4 Prediction and extrapolation

4.1 Nonlinear small-strain plasticity

Since the material network can capture the essential topological structure of an RVE, it can be used for online prediction of nonlinear plasticity, more than just linear elasticity. Each node at the bottom layer can be regarded as an individual material node with independent degrees of freedom (DOFs), precisely, the infinitesimal strain . As a result, the total number of DOFs in the material network at the beginning of training is proportional to

. On the other hand, the ReLU activation function and several compression algorithms introduced in Section

3.3 are used to decrease the model complexity during the training. For the final trained network, the number of DOFs becomes proportional to the number of active/remaining nodes at the bottom layer, denoted by ,


In the bottom layer, the Cauchy stress at the -th node is denoted by . At each loading step, its stress-strain relation can be written as


The compliance matrix and incremental stress are given by the local constitutive law,




where is the vector of history-dependent internal variables. Once the compliance matrix and incremental stress are obtained, we can compute the residual strain in the bottom layer by


Note that the residual strain comes from the material non-linearity. In other words, if all the materials are linear elastic, there will be no residual strain and the homogenization procedure will be same as the one described for the training process.

In a network with material non-linearity, both of the compliance matrix D and residual strain are feed forward from the bottom layer to the output layer . In Fig. 4, the stress-strain relations for the -th and -th nodes on layer at each loading step can be written as


and the stress-strain relation of their parent node at layer is


In the forward homogenization process, the compliance matrix and residual strain at the parent node are calculated from the ones in its child nodes,