Traditional constitutive modeling is based on constitutive or material laws to describe the explicit relationship among the measurable material states, e.g., stresses and strains, and internal state variables (ISVs) based on experimental observations, mechanistic hypothesis, and mathematical simplifications. However, limited data and functional form assumptions inevitably introduce errors to the model parameter calibration and model prediction. Moreover, with the pre-defined functions, constitutive laws often lack generality to capture full aspects of material behaviors He et al. (2020a, 2021).
Path-dependent constitutive modeling typically applies models with evolving ISVs in addition to the state space of deformation Coleman and Gurtin (1967); Horstemeyer and Bammann (2010). The ISV constitutive modeling framework has been effectively applied to model various nonlinear solid material behaviors, e.g., elasto-plasticity Kratochvil and Dillon Jr (1969); Simo and Miehe (1992), visco-plasticity Simo (1992), and material damage Perzyna (1986). However, ISVs are often non-measurable, which makes it challenging to define a complete and appropriate set of ISVs for highly nonlinear and complicated materials, e.g., geomechanical materials. Further, the traditional ISV constitutive modeling approach often results in excessive complexities with high computational cost, which is undesirable in practical applications.
In recent years, machine learning (ML) based data-driven approaches have demonstrated successful applications in various engineering problems, such as solving partial differential equations Raissi et al. (2019); He and Tartakovsky (2021); Karniadakis et al. (2021); Kadeethum et al. (2021), system or parameter identification Brunton et al. (2016); Raissi et al. (2019); Cranmer et al. (2020); Tartakovsky et al. (2020); Haghighat et al. (2021); Kadeethum et al. (2021); He et al. (2022a), data-driven computational mechanics Kirchdoerfer and Ortiz (2017); Ayensa-Jiménez et al. (2018); He and Chen (2020); Eggersmann et al. (2019); He et al. (2020b, 2021); Kanno (2021); Bahmani and Sun (2021), reduced-order modeling Xie et al. (2019); Bai and Peng (2021); Kaneko et al. (2021); Kim et al. (2022); Fries et al. (2022); He et al. (2022b); Kadeethum et al. (2022), material design Bessa et al. (2017); Butler et al. (2018), etc. ML models, such as deep neural networks (DNNs), have emerged as a promising alternative for constitutive modeling due to their strong flexibility and capability in extracting complex features and patterns from data Bock et al. (2019). DNNs have been applied to model a variety of materials, including concrete materials Ghaboussi et al. (1991), hyper-elastic materials Shen et al. (2005), visco-plastic material of steel Furukawa and Yagawa (1998), and homogenized properties of composite structures Lefik et al. (2009). DNN-based constitutive models have been integrated into finite element solvers to predict path- or rate-dependent material behaviors Lefik and Schrefler (2003); Hashash et al. (2004); Jung and Ghaboussi (2006); Stoffel et al. (2019); Zhang and Mohr (2020). Recently, physical constraints or principles have been integrated into DNNs for data-driven constitutive modeling, including symmetric positive definiteness Xu et al. (2021), material frame invariance Ling et al. (2016), and thermodynamics Vlassis and Sun (2021); Masi et al. (2021).
However, to model path-dependent materials, DNN-based constitutive models require the material's internal states to be fully understood and prescribed. This is difficult for materials with highly nonlinear and complicated path-dependent behaviors and limits the practical application of these models.
Recurrent neural networks (RNNs) designed for sequence learning have been successfully applied in various domains, such as machine translation and speech recognition, due to their capability of learning history-dependent features that are essential for sequential prediction Lipton et al. (2015); Yu et al. (2019). The RNN and gated variants, e.g., the long short-term memory (LSTM) Hochreiter and Schmidhuber (1997) cells and the gated recurrent units (GRUs) Cho et al. (2014); Chung et al. (2014), have been applied to path-dependent materials modeling Heider et al. (2020), including plastic composites Mozaffar et al. (2019), visco-elasticity Chen (2021), and homogeneous anisotropic hardening Gorji et al. (2020). RNN-based constitutive models have also been applied to accelerate multi-scale simulations with path-dependent characteristics Wang and Sun (2018); Ghavamian and Simone (2019); Wu et al. (2020); Logarzo et al. (2021); Wu and Noels (2022). Recently, Bonatti and Mohr (2022) proposed a self-consistent RNN for path-dependent materials such that the model predictions converge as the loading increment is decreased. However, these RNN-based data-driven constitutive models may not satisfy the underlying thermodynamics principles of path-dependent materials.
In this study, we propose a thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials, which relies purely on measurable material states. The first thermodynamics principle is integrated into the model architecture whereas the second thermodynamics principle is enforced by a constraint on the network parameters. In the proposed model, an RNN is trained to infer intrinsic ISVs from its hidden (or memory) state that captures essential history-dependent features of data through a sequential input. The RNN describing the evolution of the data-driven machine-learned ISVs follows the second law of thermodynamics. In addition, a DNN is trained simultaneously to predict the material energy potential given strain, ISVs, and temperature (for non-isothermal processes). Further, model robustness and accuracy are enhanced by introducing stochasticity to the training inputs to account for uncertainties of input conditions in testing.
The remainder of this paper is organized as follows. The background of thermodynamics principles is first introduced in Section 2. In Section 3, DNNs and RNNs are introduced and their applications to path-dependent materials modeling are discussed. Section 4 introduces the proposed thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials, where two thermodynamically consistent recurrent neural networks (TCRNNs) are discussed. Finally, in Section 5, the effectiveness and generalization capability of the proposed TCRNN models are examined by modeling an elasto-plastic material and undrained soil under cyclic shear loading. A parametric study is conducted to investigate the effects of the number of RNN steps, the internal state dimension, the model complexity, and the strain increment on the model performance. Concluding remarks and discussions are summarized in Section 6.
2 Thermodynamics Principles
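The balance of energy, i.e., the first thermodynamics principle, can be expressed as Silhavy (2013)
$$\rho \dot{e} = \boldsymbol{\sigma} : \dot{\boldsymbol{\varepsilon}} - \nabla \cdot \mathbf{q} + \rho r, \tag{1}$$
where the superposed “.” denotes the material time derivative; $\rho$ is the material density; $e$ is the specific internal energy and $\dot{e}$ is its rate; $\boldsymbol{\sigma}$ is the Cauchy stress tensor; $\boldsymbol{\varepsilon}$ is the strain tensor; $\boldsymbol{\sigma} : \dot{\boldsymbol{\varepsilon}}$ denotes the rate of mechanical work; $\mathbf{q}$ is the heat flux; $r$ is the specific rate of heat supply. The local form of the second thermodynamics principle expressed in terms of the Clausius-Duhem inequality reads Silhavy (2013)
$$\rho \dot{s} + \nabla \cdot \left( \frac{\mathbf{q}}{\theta} \right) - \frac{\rho r}{\theta} \geq 0, \tag{2}$$
where $s$ denotes the specific entropy and $\theta$ is the positive absolute temperature. Combining the first and second thermodynamic principles yields the dissipation inequality
$$\rho \theta \dot{s} - \rho \dot{e} + \boldsymbol{\sigma} : \dot{\boldsymbol{\varepsilon}} - \frac{1}{\theta} \mathbf{q} \cdot \nabla \theta \geq 0. \tag{3}$$
The left-hand side of Eq. (3) represents the total dissipation rate that can be decomposed into the non-negative mechanical dissipation rate $\mathcal{D}$ and the non-negative thermal dissipation rate $\mathcal{D}_{\text{th}}$ Silhavy (2013); Simo and Hughes (2006):
$$\mathcal{D} = \rho \theta \dot{s} - \rho \dot{e} + \boldsymbol{\sigma} : \dot{\boldsymbol{\varepsilon}} \geq 0, \tag{4a}$$
$$\mathcal{D}_{\text{th}} = -\frac{1}{\theta} \mathbf{q} \cdot \nabla \theta \geq 0. \tag{4b}$$
Considering a constant material density and defining the specific internal energy per unit volume as $u = \rho e$ and the specific entropy per unit volume as $\eta = \rho s$, we have $\dot{u} = \rho \dot{e}$ and $\dot{\eta} = \rho \dot{s}$. Therefore, Eq. (4a) can be rewritten as
$$\mathcal{D} = \theta \dot{\eta} - \dot{u} + \boldsymbol{\sigma} : \dot{\boldsymbol{\varepsilon}} \geq 0. \tag{5}$$
Denoting the specific Helmholtz free energy as Silhavy (2013)
$$\psi = u - \theta \eta \tag{6}$$
and taking the time derivative gives
$$\dot{\psi} = \dot{u} - \dot{\theta} \eta - \theta \dot{\eta}. \tag{7}$$
For strain-rate independent materials, the Helmholtz free energy can be defined as Silhavy (2013)
$$\psi = \psi(\boldsymbol{\varepsilon}, \theta, \mathbf{z}), \tag{8}$$
where $\mathbf{z}$ is a collection of internal state variables (ISVs) introduced to characterize the state of path-dependent materials, which can also be interpreted as history variables Eggersmann et al. (2019). However, ISVs are often non-measurable and the identification of the ISVs is often based on empiricism, which is non-trivial for materials with highly complex and nonlinear path-dependent behaviors. Here, an ML-enhanced data-driven approach is proposed to automatically infer the essential ISVs that follow the thermodynamics principles, which will be discussed in Section 4. Differentiation of Eq. (8) gives
$$\dot{\psi} = \frac{\partial \psi}{\partial \boldsymbol{\varepsilon}} : \dot{\boldsymbol{\varepsilon}} + \frac{\partial \psi}{\partial \theta} \dot{\theta} + \frac{\partial \psi}{\partial \mathbf{z}} \cdot \dot{\mathbf{z}}. \tag{9}$$
Substituting Eqs. (7) and (9) into Eq. (5) yields
$$\mathcal{D} = \left( \boldsymbol{\sigma} - \frac{\partial \psi}{\partial \boldsymbol{\varepsilon}} \right) : \dot{\boldsymbol{\varepsilon}} - \left( \eta + \frac{\partial \psi}{\partial \theta} \right) \dot{\theta} - \frac{\partial \psi}{\partial \mathbf{z}} \cdot \dot{\mathbf{z}} \geq 0. \tag{10}$$
The arbitrariness of $\dot{\theta}$, $\dot{\boldsymbol{\varepsilon}}$, and $\dot{\mathbf{z}}$ leads to the following relations
$$\eta = -\frac{\partial \psi}{\partial \theta}, \tag{11a}$$
$$\boldsymbol{\sigma} = \frac{\partial \psi}{\partial \boldsymbol{\varepsilon}}, \tag{11b}$$
$$\mathcal{D} = -\frac{\partial \psi}{\partial \mathbf{z}} \cdot \dot{\mathbf{z}} \geq 0. \tag{11c}$$
These relations are derived from the universal thermodynamics principles and are considered in the proposed data-driven constitutive models. In Section 4, we will introduce a thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials with the consideration of the thermodynamically consistent relations (Eq. (11)).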
3 Black-Box Data-Driven Modeling of Path-Dependent Materials
3.1 Deep Neural Networks (DNNs) Constitutive Models
Deep neural networks (DNNs), as the core of deep learning Goodfellow et al. (2016), represent complex models that relate data inputs, $\mathbf{x}$, to data outputs, $\mathbf{y}$. A typical DNN is composed of an input layer, an output layer, and $L$ hidden layers. Each hidden layer transforms the outputs of the previous layer through an affine mapping followed by a nonlinear activation function, which can be written as:
$$\mathbf{h}^{(l)} = a\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right), \quad l = 1, \dots, L, \tag{12}$$
where the superscript $(l)$ denotes the layer the quantities belong to, e.g., $\mathbf{h}^{(l)} \in \mathbb{R}^{n_l}$ is the outputs of layer $l$ with $n_l$ neurons, and $\mathbf{h}^{(0)} = \mathbf{x}$. $\mathbf{W}^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ and $\mathbf{b}^{(l)} \in \mathbb{R}^{n_l}$ are the weight matrix for linear mapping and the bias vector of layer $l$, respectively, where $n_{l-1}$ is the input dimension of the layer. They are trainable parameters to be optimized through training. For a fully-connected layer, the number of trainable parameters is $n_l (n_{l-1} + 1)$. $a(\cdot)$ denotes the activation function. Note that the choice of the activation of the output layer depends on the type of ML task. For regression tasks, which are the application of this study, a linear function is used in the output layer where the last hidden layer information is mapped to the output vector, expressed as: $\hat{\mathbf{y}} = \mathbf{W}^{(L+1)} \mathbf{h}^{(L)} + \mathbf{b}^{(L+1)}$, where $\hat{\mathbf{y}}$ denotes the DNN approximation of the data output $\mathbf{y}$. Fig. 1(a) shows the computational graph of a feed-forward DNN with three input neurons, two hidden layers, and two output neurons.
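The layer-by-layer mapping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this study: the hidden width of 4, the `tanh` hidden activation, the random initialization, and all function names are assumptions made for the example. It also counts trainable parameters as one weight per input-output pair plus one bias per neuron in each fully-connected layer.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) and zero bias for one fully-connected layer."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def affine(W, b, h):
    """Affine map W h + b for a list-of-lists matrix and a list vector."""
    return [sum(w * v for w, v in zip(row, h)) + b_i for row, b_i in zip(W, b)]


def dnn_forward(layers, x):
    """Nonlinear (tanh) hidden layers followed by a linear output layer for regression."""
    h = x
    for W, b in layers[:-1]:
        h = [math.tanh(v) for v in affine(W, b, h)]
    W_out, b_out = layers[-1]
    return affine(W_out, b_out, h)


def count_parameters(layers):
    """Sum of n_out * (n_in + 1) over all fully-connected layers."""
    return sum(len(W) * (len(W[0]) + 1) for W, _ in layers)


rng = random.Random(0)
sizes = [3, 4, 4, 2]  # three inputs, two hidden layers (assumed width 4), two outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

y_hat = dnn_forward(layers, [0.1, -0.2, 0.3])
n_params = count_parameters(layers)
```

For the layer sizes chosen here, the parameter count is 4(3+1) + 4(4+1) + 2(4+1) = 46.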
For path-dependent materials, the current stress response depends on the past stress-strain history. Fig. 1(b) shows the computational graph of a DNN constitutive model for path-dependent materials that takes one history step of stress-strain states and the current strain increment as input and predicts the current stress increment. Alternatively, the current strain increment can be replaced with the current total strain as input and the current total stress as the output, as shown in Fig. 1(c). The effects of the input/output representation will be discussed in the next subsection. These DNN constitutive models can be extended to consider pre-defined ISVs as input in addition to the measurable material states. An additional DNN can be used to model the evolution of the ISVs Masi et al. (2021). However, the dependency on pre-defined ISVs limits its applications, especially when only the measurable material states of path-dependent behaviors are available, e.g., the soil example to be demonstrated in Section 5.2.
Note that for DNN constitutive models, the number of history input steps is fixed once the model architecture is determined, which means the number of history input steps used for testing must be the same as that used in training. Furthermore, it is difficult for DNNs with recurrent connections from the output of step $t-1$ to the input of step $t$ to capture the essential information about the past history, since the outputs are explicitly trained only to match the training set targets without being informed of the past history Goodfellow et al. (2016). These issues are addressed by RNNs introduced in the next subsection.
3.2 Recurrent Neural Networks (RNNs) Constitutive Models
Recurrent neural networks (RNNs) designed for sequence learning have demonstrated successful applications in various domains, such as machine translation and speech recognition, due to their capability of learning history-dependent features that are essential for sequential prediction Lipton et al. (2015); Yu et al. (2019). Fig. 2(a) illustrates the computational graphs of a folded RNN and an unfolded RNN, where $\mathbf{h}$ is a hidden state that captures essential history-dependent features from past information, which makes RNNs particularly suitable for modeling path-dependent material behaviors. Unfolding of the RNN computational graph results in parameter sharing across the network structure, reducing the number of trainable parameters and thus leading to more efficient training. The length of input/output sequences can be arbitrary, which allows generalization to sequence lengths that do not appear in the training set. Each step can be viewed as a state. Regardless of the history sequence length, the trained RNN model always has the same input size, since it is specified in terms of transition from one state to another rather than in terms of a variable-length history of states Goodfellow et al. (2016). The forward propagation of an RNN begins with an initial hidden state $\mathbf{h}_0$ and the propagation equations at time step (state) $t$ are defined as
$$\mathbf{h}_t = \tanh\left( \mathbf{W}_{xh} \mathbf{x}_t + \mathbf{W}_{hh} \mathbf{h}_{t-1} + \mathbf{b}_h \right), \tag{13a}$$
$$\hat{\mathbf{y}}_t = \mathbf{W}_{hy} \mathbf{h}_t + \mathbf{b}_y, \tag{13b}$$
where $\tanh$ is the hyperbolic tangent function; $\mathbf{W}_{xh}$, $\mathbf{W}_{hh}$, and $\mathbf{W}_{hy}$ are trainable weight coefficients for input-to-hidden, hidden-to-hidden, and hidden-to-output transformations, respectively; $\mathbf{b}_h$ and $\mathbf{b}_y$ are trainable bias coefficients. These trainable parameters are shared across all RNN steps. Eq. (13a) transforms the previous hidden state $\mathbf{h}_{t-1}$ and the current input $\mathbf{x}_t$ to the current hidden state $\mathbf{h}_t$, while Eq. (13b) transforms the current hidden state $\mathbf{h}_t$ to the current output $\hat{\mathbf{y}}_t$. The history information is captured by the hidden state of the RNN by repeating the transformation in Eq. (13a) for all RNN steps. The hidden state that carries the essential history-dependent information is passed to the final step and informs the final prediction.
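A minimal sketch of this recurrence with scalar input, hidden state, and output may help make the parameter sharing concrete. The scalar weights, their values, and the names `rnn_forward`, `w_xh`, `w_hh`, `w_hy`, `b_h`, `b_y` are illustrative assumptions, not the trained model of this study. The same five parameters are reused at every step, so sequences of any length can be processed.

```python
import math


def rnn_forward(xs, params, h0=0.0):
    """Unfold a scalar RNN over the input sequence xs.

    Hidden update: h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b_h)
    Output:        y_t = w_hy * h_t + b_y
    The same parameters are shared by all steps.
    """
    w_xh, w_hh, b_h, w_hy, b_y = params
    h = h0
    outputs = []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        outputs.append(w_hy * h + b_y)
    return outputs, h


params = (0.8, 0.5, 0.0, 1.2, 0.1)  # hypothetical values for illustration
short_out, _ = rnn_forward([0.1, 0.2, 0.3], params)
long_out, _ = rnn_forward([0.1, 0.2, 0.3, 0.4, 0.5], params)
```

Because the hidden state only depends on past inputs, the first three outputs of the longer sequence coincide with the outputs of the shorter one.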
Depending on applications, RNNs can have flexible architectures of input and output, such as one-to-one, one-to-many, many-to-one, and many-to-many Goyal et al. (2018). For example, the unfolded RNN shown in Fig. 2(a) is a many-to-many type of RNN, which can be applied to, for example, named entity recognition. For path-dependent constitutive modeling, the many-to-one type of RNN is more suitable. Fig. 2(b) illustrates the computational graph of an RNN constitutive model that takes one history step of stress-strain states and the current strain increment as input and predicts the current stress increment, defined as an incremental RNN, whereas Fig. 2(c) shows the computational graph of an RNN constitutive model that takes one history step of stress-strain states and the current total strain as input and predicts the current total stress, defined as a total-form RNN. Unlike standard RNNs that have the same input size for all time steps (states), the history-step input of the RNN constitutive models shown in Fig. 2(b)-(c) contains both strain and stress components whereas the current-step input contains only strain components. Considering one history step, the forward propagation of a typical total-form RNN is expressed as follows
$$\mathbf{h}_{t-1} = \tanh\left( \mathbf{W}_{hh} \mathbf{h}_{t-2} + \mathbf{W}_{\varepsilon h} \boldsymbol{\varepsilon}_{t-1} + \mathbf{W}_{\sigma h} \boldsymbol{\sigma}_{t-1} + \mathbf{b}_h \right), \tag{14a}$$
$$\mathbf{h}_t = \tanh\left( \mathbf{W}_{hh} \mathbf{h}_{t-1} + \mathbf{W}_{\varepsilon h} \boldsymbol{\varepsilon}_t + \mathbf{b}_h \right), \tag{14b}$$
$$\hat{\boldsymbol{\sigma}}_t = \mathbf{W}_{h\sigma} \mathbf{h}_t + \mathbf{b}_\sigma, \tag{14c}$$
where $\mathbf{h}_{t-2}$ is the initial hidden state; $\mathbf{W}_{hh}$, $\mathbf{W}_{\varepsilon h}$, $\mathbf{W}_{\sigma h}$, and $\mathbf{W}_{h\sigma}$ are trainable weight coefficients for hidden-to-hidden, strain-to-hidden, stress-to-hidden, and hidden-to-output transformations, respectively, and $\mathbf{b}_h$ and $\mathbf{b}_\sigma$ are trainable bias coefficients.
To capture complex history-dependent patterns, deep RNNs are more advantageous Goodfellow et al. (2016). Similar to DNNs, stacking of fully-connected hidden layers can be added to affine input-to-hidden, hidden-to-hidden, and hidden-to-output transformations.
Our studies show that the total-form RNN is less sensitive to the strain increment than the incremental RNN, as a consequence of interpolation outperforming extrapolation. For instance, considering a training dataset with one stress-strain path and a constant loading (strain) increment, the final-step input (the strain increment) of the incremental RNN is constant, whereas the final-step input (the total strain) of the total-form RNN is not constant and covers the whole range of strain in the stress-strain path. During testing on the same stress-strain path but with a different constant loading increment, larger or smaller than that of the training data, the incremental RNN becomes inaccurate since the final-step input (the strain increment) of the testing data is beyond the range of the training strain increment and the prediction is an extrapolation. In contrast, the total-form RNN remains accurate because the final-step input (the total strain) of the testing data is within the range of training total strain and the prediction is an interpolation. Therefore, the proposed data-driven models in this study are built upon the total form.
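The interpolation-versus-extrapolation argument above can be checked with a small numerical illustration. The strain range (0 to 0.02) and the two increments are made-up values chosen for the example, and the helper names are hypothetical. The sketch collects the final-step inputs seen by each formulation and tests whether the testing inputs stay inside the training range.

```python
def strain_path(eps_max, d_eps):
    """Monotonic strain path from 0 to eps_max with a constant increment d_eps."""
    n_steps = round(eps_max / d_eps)
    return [i * d_eps for i in range(n_steps + 1)]


def final_step_inputs(path):
    """Final-step inputs of the incremental RNN (increments) and the total-form RNN (totals)."""
    increments = [b - a for a, b in zip(path[:-1], path[1:])]
    totals = path[1:]
    return increments, totals


def within(values, lo, hi, tol=1e-12):
    """True if every value lies inside [lo, hi] up to a small tolerance."""
    return all(lo - tol <= v <= hi + tol for v in values)


train_inc, train_tot = final_step_inputs(strain_path(0.02, 1e-3))  # training increment
test_inc, test_tot = final_step_inputs(strain_path(0.02, 2e-3))    # larger testing increment

inc_is_interpolation = within(test_inc, min(train_inc), max(train_inc))
tot_is_interpolation = within(test_tot, min(train_tot), max(train_tot))
```

Here `inc_is_interpolation` is `False` (the testing increment lies outside the single training value) while `tot_is_interpolation` is `True` (the testing total strains fall inside the training range), which mirrors the reasoning for preferring the total form.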
3.2.1 Gated Recurrent Units (GRUs)
Standard RNNs suffer from short-term memory due to vanishing and exploding gradient issues that arise from recurrent connections Bengio et al. (1993, 1994); Goodfellow et al. (2016). More effective RNNs for learning long-term dependencies have been developed, including the long short-term memory (LSTM) Hochreiter and Schmidhuber (1997) cells and gated recurrent units (GRUs) Cho et al. (2014); Chung et al. (2014). A typical GRU consists of a reset gate that removes irrelevant past information, an update gate that controls the amount of past information passing to the next step, and a candidate hidden state Chung et al. (2014). Compared to LSTM, the GRU has fewer parameters as it lacks an output gate. Considering one history step, the forward propagation of a typical GRU is expressed as follows
where denotes the Hadamard (element-wise) product; is the sigmoid function; is the hyperbolic tangent function; , , , , , , and are trainable weight coefficients; , , , , and are trainable bias coefficients. Eq. (15d) calculates the current hidden state by a linear interpolation between the previous hidden state and the candidate hidden state , based on the update gate . The RNN-based constitutive models proposed in this study are applicable to all types of RNNs for complicated path-dependent material behaviors with long-term dependent features.
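As a concrete illustration of the gating mechanism, the sketch below implements one step of a generic scalar GRU in the form commonly attributed to Cho et al. (2014). It is not the exact formulation of Eq. (15): the parameter names, their values, and the interpolation convention (an update gate of 0 keeps the previous hidden state) are assumptions, and conventions differ between references and libraries.

```python
import math


def sigmoid(v):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One scalar GRU step returning the new hidden state, update gate, and candidate."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    # Candidate hidden state: the reset gate scales how much past information enters.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Linear interpolation between the previous and candidate hidden states.
    h = (1.0 - z) * h_prev + z * h_cand
    return h, z, h_cand


# Hypothetical parameter values for illustration only.
p = {"w_r": 0.5, "u_r": -0.3, "b_r": 0.0,
     "w_z": 0.7, "u_z": 0.2, "b_z": 0.0,
     "w_h": 1.0, "u_h": 0.8, "b_h": 0.0}

h, z, h_cand = gru_step(0.4, 0.2, p)

# With a strongly negative update-gate bias, the previous state is carried over.
p_closed = dict(p, b_z=-50.0)
h_kept, _, _ = gru_step(0.4, 0.2, p_closed)
```

Because the new hidden state is a convex combination, it always lies between the previous and candidate states, which is what allows information to be carried across many steps.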
3.2.2 Model Training
Because the forward propagation of RNNs is inherently sequential, i.e., each time step can only be computed after the previous one, the computation of the gradient of the loss function with respect to the trainable parameters cannot be parallelized and needs to follow the reverse unfolded computational graph. The back-propagation through time algorithm is applied to RNNs Goodfellow et al. (2016).
During training, the model receives the ground truth stress data of history steps, which is a teacher forcing procedure emerging from the maximum likelihood criterion Goodfellow et al. (2016). However, the disadvantage of teacher forcing training arises when the trained model is applied in an open-loop test mode with the network’s previous outputs fed back as input for future predictions. The computational graphs of the test mode are shown in Fig. 3 corresponding to the RNN models (in the train mode) shown in Fig. 2(b)-(c). In this case, the inputs the trained model receives could be quite different from those received during training, which forces the model to perform extrapolative predictions and thus leads to large errors. Furthermore, such prediction errors could occur at the very first prediction, accumulate and propagate quickly, and contaminate the subsequent predictions. To mitigate the issue of error propagation due to the teacher forcing training and enhance model accuracy and robustness, we introduce stochasticity to the training set by adding random perturbations to the ground truth stress data. In this way, the network can also learn the variability of the input conditions resembling those in the test mode.
4 Thermodynamically Consistent Machine-Learned Internal State Variable Approach for Path-Dependent Materials
4.1 Thermodynamically Consistent Recurrent Neural Networks (TCRNNs)
To ensure thermodynamic consistency in the data-driven path-dependent materials modeling, the thermodynamics principles introduced in Section 2 are embedded into RNNs to extract the hidden ISVs. The proposed TCRNN consists of an RNN and a DNN. The computational graphs for non-isothermal or isothermal processes are illustrated in Fig. 4.
Since the hidden state of RNNs captures essential history-dependent features from past material information, we propose to extract ISVs of materials from the hidden state of an RNN, as shown in Fig. 4. Considering one history step, the RNN-inferred ISV is expressed as follows
Hereafter, a hat symbol is used to denote the predicted quantities. Considering one single layer for input-to-hidden, hidden-to-hidden, and hidden-to-output transformations in RNN, the RNN function consists of the following
where , , , , and are trainable weight coefficients for hidden-to-hidden, strain-to-hidden, stress-to-hidden, temperature-to-hidden, and hidden-to-ISV transformations, respectively; and are trainable bias coefficients; and are activation functions. Note that the trainable parameters are shared across all steps of the RNN, which enhances training efficiency. is the machine-learned ISVs from the hidden state of the RNN and its rate, , can be computed by automatic differentiation Baydin et al. (2018), which will be discussed in the next subsection. Eqs. (17a-b) transform the history and current measurable material states to the current hidden state that carries the essential past information. For an RNN with more than one history step, Eq. (17a) is repeated for all history steps. Eq. (17c) infers the current ISV from the current hidden state .
A linear activation () is used for the transformation from the hidden state to the ISV in Eq. (17c). The selection of the activation requires particular attention due to the issue of second-order vanishing gradients Masi et al. (2021). For effective training via back-propagation, the gradient of the output derivative with respect to the trainable parameters requires non-zero second-order gradients of activation functions. As a result, the activation function SiLU is selected for due to its smoothness and non-zero second-order gradients, as shown in Fig. 5.
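The point about second-order gradients can be verified numerically. The sketch below evaluates the closed-form second derivative of SiLU, defined as x times the logistic sigmoid of x, and compares it with a central finite difference; for contrast, the second derivative of ReLU is zero everywhere away from the kink at the origin. The function names are chosen for this illustration only.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def silu_second_derivative(x):
    """Closed form: s'(x) * (2 + x * (1 - 2 s(x))), with s'(x) = s(x) * (1 - s(x))."""
    s = sigmoid(x)
    ds = s * (1.0 - s)
    return ds * (2.0 + x * (1.0 - 2.0 * s))


def relu_second_derivative(x):
    """ReLU is piecewise linear, so its second derivative vanishes away from x = 0."""
    return 0.0


def central_second_difference(f, x, h=1e-4):
    """Second-order accurate finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


d2_silu_at_zero = silu_second_derivative(0.0)
d2_silu_at_one = silu_second_derivative(1.0)
d2_silu_fd = central_second_difference(silu, 1.0)
d2_relu_at_one = relu_second_derivative(1.0)
```

Since the loss depends on derivatives of the network output, its gradient with respect to the trainable parameters involves these second derivatives; an activation for which they vanish would pass no training signal through those terms.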
Following the extraction of the ISV, a DNN is then used to predict the Helmholtz free energy given the strain, the temperature and the machine-learned ISVs,
where represents a DNN, as discussed in Section 3.1. The activation in the output layer is linear. The output Helmholtz free energy is then used to compute the stress , the dissipation rate , and the entropy according to Eq. (11), which implicitly enforces the first thermodynamics principle. The second thermodynamics principle, i.e., , is enforced in the loss function by constraining the network parameters, which will be discussed in the next subsection. The gradients of the output with respect to the inputs are obtained by automatic differentiation Baydin et al. (2018). Since the output derivatives are involved in the loss function, SiLU is used for the activation of the hidden layers to avoid the issue of second-order vanishing gradients as discussed above.
4.2 Secondary Outputs
To balance feature contributions to the loss function and accelerate the training process, the training dataset is standardized to have zero mean and unit variance. For instance, a feature $x$ of the dataset is standardized by its mean $\mu_x$ and standard deviation $s_x$,
$$\bar{x} = \frac{x - \mu_x}{s_x}. \tag{19}$$
Hereafter, a bar symbol is used to denote standardized quantities. From Eq. (19), we have
Therefore, the predicted stress is calculated by
according to Eq. (20). is the second-order identity tensor. is the gradient of the output with respect to the input in Eq. (22) and can be obtained by automatic differentiation Baydin et al. (2018). Similarly, the predicted entropy is computed by
The predicted dissipation rate is computed by
can be obtained by applying the chain rule to Eq. (21)
which requires the rate of the input variables, including , , , , , etc.
The direct calculation of through Eq. (27) can be computationally intractable, especially when the dataset size, the internal state dimension, and the number of RNN input steps are large. Alternatively, the rate of the ISVs can be approximated by , where is the increment of the ISVs at the current step . To obtain , alternative TCRNNs are proposed, as shown in Fig. 6, where the last two steps of the RNN infer the ISVs, and , respectively. These TCRNN models enhance training efficiency.
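The two ingredients of the secondary outputs, undoing the standardization in the gradients and approximating the ISV rate by a finite increment, can be sketched as follows. This is a scalar illustration under stated assumptions: the free energy is a made-up quadratic function of strain with modulus `E = 200`, gradients are taken by central differences instead of automatic differentiation, and all function and variable names are hypothetical.

```python
def standardize(values):
    """Return standardized values together with the mean and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values], mean, std


def unscale_gradient(dpsi_bar_deps_bar, psi_std, eps_std):
    """Chain rule: d(psi)/d(eps) = (psi_std / eps_std) * d(psi_bar)/d(eps_bar)."""
    return psi_std / eps_std * dpsi_bar_deps_bar


def isv_rate(z_prev, z_cur, dt):
    """Finite-increment approximation of the ISV rate: (z_cur - z_prev) / dt."""
    return [(b - a) / dt for a, b in zip(z_prev, z_cur)]


def dissipation_rate(dpsi_dz, z_rate):
    """Mechanical dissipation rate: minus the product of d(psi)/dz and the ISV rate."""
    return -sum(g * r for g, r in zip(dpsi_dz, z_rate))


# Toy data: psi = 0.5 * E * eps^2 (assumed form, for checking the chain rule only).
E = 200.0
eps_data = [0.0, 0.01, 0.02, 0.03]
psi_data = [0.5 * E * e ** 2 for e in eps_data]
_, eps_mean, eps_std = standardize(eps_data)
_, psi_mean, psi_std = standardize(psi_data)


def psi_bar(eps_bar):
    """Standardized free energy as a function of standardized strain."""
    eps = eps_bar * eps_std + eps_mean
    return (0.5 * E * eps ** 2 - psi_mean) / psi_std


eps = 0.02
eps_bar = (eps - eps_mean) / eps_std
h = 1e-6
grad_bar = (psi_bar(eps_bar + h) - psi_bar(eps_bar - h)) / (2.0 * h)
stress = unscale_gradient(grad_bar, psi_std, eps_std)

D = dissipation_rate([-2.0], isv_rate([0.1], [0.101], dt=0.01))
```

For this toy energy the recovered stress equals `E * eps`, confirming that the ratio of standard deviations is the only factor needed to map the standardized gradient back to physical units.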
4.3 Model Training
The loss function is expressed as
where , , are regularization parameters. denotes the norm, and is set to be zero if the data of the entropy is unavailable. The training consists of forward propagation and backward propagation. During the forward propagation, the machine-learned ISVs are implicitly embedded in the calculation of the predicted measurable quantities by following the thermodynamics principles. During the backward propagation, the errors of the measurable quantities are back-propagated to update the model’s trainable parameters and refine machine-learned ISVs.
In some cases where the data of the dissipation rate is unavailable, the non-negativity condition, i.e., $\hat{\mathcal{D}} \geq 0$, can be imposed instead, which results from the second law of thermodynamics, Eq. (4a). To this end, the rectified linear unit (ReLU) can be utilized, i.e., ReLU($x$) = $\max(0, x)$, which is positive only if $x$ is positive. Hence, ReLU($-\hat{\mathcal{D}}$) is positive only if $\hat{\mathcal{D}}$ is negative, which corresponds to violation of the non-negativity condition $\hat{\mathcal{D}} \geq 0$. Including ReLU($-\hat{\mathcal{D}}$) in the loss function penalizes the violation of the non-negativity condition and enforces $\hat{\mathcal{D}} \geq 0$, which imposes a constraint on the network parameters during training. The loss function becomes
Similarly, if the data of the Helmholtz free energy is unavailable, the non-negativity condition, i.e., $\hat{\psi} \geq 0$, can be imposed by adding ReLU($-\hat{\psi}$) to the loss function,
In some cases where prior knowledge of certain ISVs is available, the TCRNN models can be trained in a hybrid mode by leveraging the existing ISVs and simultaneously inferring additional thermodynamically consistent ISVs that are essential to path-dependent behaviors. Considering as the known ISVs and as the corresponding standardized quantity, the loss function becomes
where the last term enables the TCRNN model to learn the existing ISVs. For the TCRNN model to infer additional essential ISVs, the prescribed internal state dimension, , should be greater than the dimension of the existing ISVs, . Note that both existing and machine-learned ISVs are passed to the DNN to predict the Helmholtz free energy (Eq. 22) and the downstream calculations (Eqs. 26-27).
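The ReLU penalty terms discussed in this subsection can be sketched as below. This is an illustrative form only: the mean-squared stress misfit, the averaging over steps, the weights `lam_d` and `lam_psi`, and the function names are assumptions for the example and do not reproduce the exact loss functions of this section.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def penalized_loss(stress_pred, stress_data, dissipation_pred, energy_pred,
                   lam_d=1.0, lam_psi=1.0):
    """Stress misfit plus penalties that are active only when the predicted
    dissipation rate or free energy becomes negative."""
    data_term = mse(stress_pred, stress_data)
    d_penalty = sum(relu(-d) for d in dissipation_pred) / len(dissipation_pred)
    psi_penalty = sum(relu(-p) for p in energy_pred) / len(energy_pred)
    return data_term + lam_d * d_penalty + lam_psi * psi_penalty


stress_pred, stress_data = [1.0, 2.0], [1.0, 2.5]

# Thermodynamically admissible predictions: both penalties vanish.
loss_ok = penalized_loss(stress_pred, stress_data, [0.1, 0.0], [0.2, 0.3])

# One negative dissipation rate: the penalty raises the loss.
loss_violated = penalized_loss(stress_pred, stress_data, [0.1, -0.4], [0.2, 0.3])
```

Because the penalty is zero whenever the conditions hold, it does not bias predictions that are already admissible; it only pushes the network parameters away from regions that violate the inequalities.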
Apart from thermodynamics, the time (or self) consistency condition is critical for convergence of numerical approximation when Xu et al. (2021).
To achieve the time consistency condition, the training set can be augmented by additional samples constituted by zero strain increment and zero stress increment at different material states (time steps), which enables the machine-learned material model to learn the time consistency condition from data. Alternatively, the self-consistency condition can be integrated into the RNN architecture by definition Bonatti and Mohr (2022).
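The data augmentation strategy for time consistency can be sketched as follows. The sample layout, a tuple of previous strain, previous stress, current strain, and current stress, is a hypothetical choice made for this example; the idea is simply to append, for each material state, a sample whose strain and stress increments are both zero.

```python
def augment_with_zero_increments(samples):
    """Append one zero-increment sample per material state.

    Each sample is (eps_prev, sig_prev, eps_cur, sig_cur). The added sample
    repeats the current state as both history and target, so that a zero
    strain increment maps to a zero stress increment.
    """
    augmented = list(samples)
    for _, _, eps_cur, sig_cur in samples:
        augmented.append((eps_cur, sig_cur, eps_cur, sig_cur))
    return augmented


# Hypothetical stress-strain samples for illustration.
samples = [
    (0.000, 0.0, 0.001, 0.2),
    (0.001, 0.2, 0.002, 0.4),
    (0.002, 0.4, 0.003, 0.6),
]
augmented = augment_with_zero_increments(samples)
added = augmented[len(samples):]
```

With this layout the augmented set is twice the size of the original one; in practice a subset of material states could be used to limit the growth of the training set.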
5 Numerical Results
5.1 Modeling Elasto-Plastic Materials
To demonstrate the accuracy, robustness, and generalization performance, the proposed TCRNN is applied to model a material with synthetic data generated by a one-dimensional elasto-plastic material model with kinematic hardening. The Helmholtz free energy potential is expressed as
where is the Young’s modulus; is the kinematic hardening parameter; is the total strain; is the plastic strain, which can be considered as a phenomenological ISV. The yield stress is . The stress and the dissipation rate can be obtained by Eq. (11b-c) as follows
The dataset is generated by Eqs. (33)-(34), which contains five samples with the same stress-strain path (two loading-unloading cycles), as shown in Fig. 7. The only difference in these samples is the strain increment, ranging from to . The data of the sample with a strain increment of is used to train the TCRNN. The remaining samples are in the testing set to evaluate the trained model.
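A minimal sketch of how such synthetic data could be generated is given below, using a standard return-mapping update for one-dimensional plasticity with linear kinematic hardening. The quadratic free energy (elastic part plus hardening part), the material constants `E`, `H`, and `sigma_y`, the strain path, and the strain increment are all assumptions chosen for illustration; they are not the values used in this study.

```python
def piecewise_path(turning_points, d_eps):
    """Strain history visiting the turning points with steps of about d_eps."""
    path = [turning_points[0]]
    for a, b in zip(turning_points[:-1], turning_points[1:]):
        n = max(1, round(abs(b - a) / d_eps))
        path.extend(a + (b - a) * i / n for i in range(1, n + 1))
    return path


def simulate(strain_path, dt=1.0, E=200.0, H=50.0, sigma_y=1.0):
    """Return stress, free energy, plastic strain, and dissipation rate per step."""
    eps_p = 0.0
    records = []
    for eps in strain_path:
        # Trial relative stress (stress minus back stress H * eps_p).
        xi_trial = E * (eps - eps_p) - H * eps_p
        overshoot = abs(xi_trial) - sigma_y
        d_eps_p = 0.0
        if overshoot > 0.0:
            # Plastic correction that brings the state back to the yield surface.
            d_gamma = overshoot / (E + H)
            d_eps_p = d_gamma if xi_trial > 0.0 else -d_gamma
            eps_p += d_eps_p
        stress = E * (eps - eps_p)  # d(psi)/d(eps)
        psi = 0.5 * E * (eps - eps_p) ** 2 + 0.5 * H * eps_p ** 2
        # -d(psi)/d(eps_p) times the plastic strain rate.
        dissipation = (stress - H * eps_p) * d_eps_p / dt
        records.append({"eps": eps, "eps_p": eps_p, "stress": stress,
                        "psi": psi, "dissipation": dissipation})
    return records


# Two loading-unloading cycles with assumed peak strains.
path = piecewise_path([0.0, 0.02, 0.0, 0.03, 0.0], d_eps=5e-4)
records = simulate(path)
```

Calling `piecewise_path` with different values of `d_eps` produces samples that share the same stress-strain path but differ in strain increment, which is the structure of the dataset described above.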
To address the issue of error propagation in the test mode due to the teacher forcing training, as discussed in Section 3.2.2, and enhance model accuracy and robustness, stochasticity is introduced to the stress data so that the model learns the uncertainties of the input conditions resembling those in the test mode. Random perturbations are generated from a normal (Gaussian) distribution with a zero mean and a standard deviation of, where is the maximum stress in the data and is a user-defined parameter to control the level of randomness. Fig. 7 shows the original stress-strain data in a black solid line and the randomly perturbed data in red circles. During supervised training, the noiseless stress is the ground truth and the input stress variable is no longer deterministic but rather stochastic, contributed by the random perturbations.
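The perturbation of the input stress can be sketched as follows. The standard deviation is taken here as the product of a user-defined level and the maximum absolute stress, which is an assumption about the exact scaling; the function name, the `level` value, and the seed are likewise illustrative.

```python
import random


def perturb_stress(stress, level, seed=0):
    """Add zero-mean Gaussian noise with standard deviation level * max|stress|."""
    rng = random.Random(seed)
    sigma_max = max(abs(s) for s in stress)
    std = level * sigma_max
    return [s + rng.gauss(0.0, std) for s in stress]


clean = [0.0, 0.4, 0.8, 1.2, 1.6, 1.2, 0.8]
noisy_inputs = perturb_stress(clean, level=0.02)   # fed to the history-step input
targets = clean                                    # noiseless ground truth for the loss
unperturbed = perturb_stress(clean, level=0.0)
```

Only the history-step stress input is perturbed in this sketch; the training target remains the noiseless stress, consistent with the description above.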
The TCRNN model based on the time rates of ISVs (), as shown in Fig. 4(b), is employed in this example. A GRU is used to infer the ISV and describe its evolution by following the thermodynamics second law. The GRU consists of one hidden layer for all affine transformations in Eq. (15) with the model complexity represented by the dimension of the hidden state, . Since the data of the free energy and the dissipation rate can be obtained by Eq. (34) in this example, the loss function in Eq. (28) is employed with and . A relative error used to measure the prediction accuracy is defined as follows
where and contain the stress data and stress predictions at all time steps of a loading path, respectively.
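A relative error of this kind can be computed as in the sketch below, which assumes the Euclidean norm of the misfit over all time steps divided by the Euclidean norm of the data; the choice of norm and the function name are assumptions for illustration.

```python
import math


def relative_error(stress_data, stress_pred):
    """Norm of the prediction error divided by the norm of the data."""
    num = math.sqrt(sum((d - p) ** 2 for d, p in zip(stress_data, stress_pred)))
    den = math.sqrt(sum(d ** 2 for d in stress_data))
    return num / den


err_exact = relative_error([3.0, 4.0], [3.0, 4.0])
err_small = relative_error([3.0, 4.0], [3.0, 4.5])
err_zero_pred = relative_error([3.0, 4.0], [0.0, 0.0])
```

In this toy case a misfit of 0.5 on data of norm 5 gives a relative error of 0.1, i.e., 10%.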
In the following subsections, the effects of the strain increments on model performance are first investigated. Since the machine-inferred ISVs are critical to the accuracy of path-dependent materials modeling, various factors that can affect the quality of the machine-inferred ISVs are investigated, including the number of RNN steps, the internal state dimension (), and the model complexity (). Further, the generalization capabilities of the TCRNN model are examined.
5.1.1 Effects of Strain Increments
We first investigate how the strain increment affects the prediction accuracy of the TCRNN model. The model has a scalar ISV and a hidden state dimension of 30 (). Fig. 8 compares the predictions with data, where the case shown with a green line is used for training and those shown with blue lines are used for testing. The trained model achieves a 1.1% relative error (Eq. (35)) on the stress prediction for the training case and a 1.9% mean relative error for the testing cases. The mean relative error is obtained by averaging the relative errors (Eq. (35)) of all cases. The good agreement between predictions and data for all quantities, including the stress, the Helmholtz free energy, and the dissipation rate, demonstrates that the model maintains high prediction accuracy and robustness as the strain increment varies. Fig. 9 shows that the machine-learned ISV is monotonically correlated with the phenomenological ISV, which demonstrates the capability of the TCRNN model in extracting a mechanistically and thermodynamically consistent ISV essential to dissipative elasto-plastic material behaviors.
5.1.2 Effects of The Number of RNN Steps
In the second example, TCRNN models with a scalar ISV and various RNN steps (including history and current steps) are examined. Note that the model complexity is independent of the number of RNN steps due to parameter sharing across all RNN steps. For a relatively compact model with , Fig. 10(a) shows that the relative errors of training and testing samples decrease significantly when the number of RNN steps reaches 4, a critical number of RNN steps, which is expected since increasing the number of RNN steps enables the model to extract more accurate path-dependent features from longer-term stress-strain history. As the number of RNN steps further increases, the relative errors of training and testing samples remain at a plateau, with around 0.4% and 1.6% relative errors, respectively. The plateau indicates that further increasing the number of RNN steps does not improve the quality of the machine-inferred ISVs and thus the model accuracy, which could be potentially limited by or . For a more complex model with , Fig. 10(b) shows a similar convergence behavior but the critical number of RNN steps increases to 5. This shows that the number of RNN steps plays an important role in model accuracy and performance. Unnecessarily increasing the model complexity may lead to an increase in the number of RNN steps required to achieve the same level of accuracy.
5.1.3 Effects of Internal State Dimension
The internal state dimension has a direct impact on the quality of the machine-inferred ISVs and thus the model performance. If it is too small, the TCRNN model cannot capture all important thermodynamically consistent path-dependent features, even if the number of RNN steps and the model complexity are sufficient. In this example, TCRNN models with 5 RNN steps and various internal state dimensions are examined. Fig. 11(a) shows that as the dimension of the ISV increases from 1 to 5, the relative errors of the training and testing samples remain at a plateau of around 0.5% and 1.5%, respectively. This indicates that a scalar ISV is sufficient to capture the path-dependent material behavior in this case. The convergence of the model performance with respect to the internal state dimension is particularly important because, in practice, the internal state dimension of path-dependent materials is often unknown a priori. The convergence property shows that the TCRNN model remains accurate and robust even if an excessive internal state dimension is prescribed. It also allows one to identify the optimal internal state dimension given the measurable material states of path-dependent materials.
5.1.4 Effects of Model Complexity
The model complexity, represented by the hidden state dimension, is another important factor influencing the quality of the machine-inferred ISVs and the model performance, since the ISVs are inferred from the hidden states that directly capture the essential stress-strain path-dependent features. If the hidden state dimension is too small, important stress-strain path-dependent features could be lost, leading to inaccurate machine-inferred ISVs and poor model performance. In this example, the effects of the TCRNN model complexity (hidden state dimension) are investigated. The TCRNN models examined have a scalar ISV, 5 RNN steps, and various hidden state dimensions ranging from 5 to 100. Fig. 11(b) shows that as the hidden state dimension increases, the relative errors of the training and testing samples decrease and then reach a plateau of around 0.5% and 1.5%, respectively. This shows that a compact network is sufficient to achieve satisfactory accuracy in this example and that increasing the model complexity further does not improve the model accuracy.
5.1.5 Model Generalization
In the following examples, three variables are considered to evaluate the generalization performance of the TCRNN model: the loading strain per cycle, the unloading strain per cycle, and the number of loading-unloading cycles. A TCRNN model with 15 RNN steps and a scalar ISV is employed.
In the first test, we consider a two-dimensional parameter space constituted by the loading strain per cycle and the unloading strain per cycle. The dataset contains 16 cases with the same number of loading-unloading cycles but with different loading and unloading strains per cycle. Fig. 12 compares the predicted stress with the data, where cases 1, 4, 9, and 16, located at the corners in the figure, are used for training (data marked with green solid lines) and the remaining cases are used for testing (data marked with blue solid lines). The loading strain per cycle increases from top to bottom, and the unloading strain per cycle increases from left to right. The mean relative errors of the training and testing cases are 3.1% and 2.8%, respectively. The good agreement between the data and the predictions demonstrates that the trained TCRNN model can successfully predict the testing cases within the prescribed parameter space.
In the second test, we consider a two-dimensional parameter space constituted by the number of loading-unloading cycles and the loading strain per cycle. The dataset contains 16 cases with the same unloading strain per cycle. Fig. 13 compares the predicted stress with the data, where cases 1, 4, 9, and 16, located at the corners in the figure, are used for training (data marked with green solid lines) and the remaining cases are used for testing (data marked with blue solid lines). From top to bottom, the number of loading-unloading cycles increases from 1 to 4. From left to right, the loading strain per cycle increases. The mean relative errors of the training and testing cases are 2.3% and 3.4%, respectively. The good agreement between the data and the predictions further demonstrates the strong generalization ability of the TCRNN constitutive model.
5.2 Modeling Soil under Cyclic Shear Loading
The effectiveness of the proposed TCRNN constitutive model is further evaluated by modeling undrained soil under cyclic shear loading Bastidas (2016); Ghoraiby et al. (2020). The experimental data are collected from undrained soil samples under an initial triaxial confinement and cyclic shear loading. A cyclic stress ratio (CSR) is defined as the ratio of the maximum shear stress to the initial vertical stress. The experimental data contain the shear strain, the vertical strain, the shear stress, and the vertical stress. Fig. 14 shows the experimental data with CSRs of 0.15, 0.16, and 0.17. The stress-strain relationships are highly nonlinear and path-dependent due to the coupling effects of changes in volume, matric suction, degree of saturation, effective stress, shear modulus, etc. Rong and McCartney (2021). Modeling such path-dependent material behaviors with a phenomenological approach is challenging and complicated, and it often relies on certain phenomenological ISVs.
Given only the stress-strain data, data-driven models and phenomenological models that require pre-defined ISVs cannot be applied. In contrast, the proposed TCRNN model can be applied effectively, since it requires only measurable material states and is capable of inferring essential ISVs from them by following thermodynamic principles.
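The CSR definition above can be written as a short sketch. The function name `cyclic_stress_ratio`, the use of the absolute value to handle stress reversal in cyclic loading, and the numbers (in arbitrary but consistent stress units) are assumptions for illustration and do not come from the experimental dataset.

```python
def cyclic_stress_ratio(shear_stress, initial_vertical_stress):
    """Ratio of the maximum shear stress magnitude to the initial vertical stress."""
    if not shear_stress:
        raise ValueError("shear_stress must not be empty")
    if initial_vertical_stress <= 0.0:
        raise ValueError("initial_vertical_stress must be positive")
    # Absolute value is an assumption: cyclic loading reverses the shear direction.
    max_shear = max(abs(tau) for tau in shear_stress)
    return max_shear / initial_vertical_stress


# Placeholder shear stress history and initial vertical stress.
shear_stress = [0.0, 7.5, 0.0, -7.5, 0.0, 7.5]
initial_vertical_stress = 50.0
print(cyclic_stress_ratio(shear_stress, initial_vertical_stress))
```

With these placeholder values the ratio is 7.5 / 50, which falls in the range of CSRs considered in this example.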
The TCRNN based on the increment of ISVs, as shown in Fig. 6(b), is employed in this example. A GRU is used to infer the ISV in Eq. (21) and describe its evolution by following the second law of thermodynamics in Eq. (2). The GRU consists of one hidden layer for all affine transformations in Eq. (15), with the model complexity represented by the hidden state dimension. Since the training data contain only stresses and strains, the loss function in Eq. (30) is employed. The experimental data with CSRs of 0.15 and 0.17 are used for training, while the experimental data with a CSR of 0.16 are used for testing. The effects of the number of RNN steps, the internal state dimension, and the model complexity on the model performance are investigated.
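To illustrate an increment-based evaluation of the dissipation rate, the following sketch replaces the learned free energy network with a toy one-dimensional free energy, psi = 0.5 E (eps - z)^2, and approximates the ISV rate by a finite difference over one step. The dissipation form D = -(d psi / d z) * (dz/dt) is the standard isothermal ISV expression and is assumed here to correspond to Eq. (2); the function names, the toy energy, and all numbers are hypothetical.

```python
def dpsi_dz(strain, z, modulus):
    """Derivative of the toy free energy 0.5 * E * (strain - z)**2 w.r.t. z."""
    return -modulus * (strain - z)


def dissipation_rate_incremental(strain, z_prev, z_curr, dt, modulus):
    """Approximate D = -(dpsi/dz) * dz/dt with dz/dt ~ (z_curr - z_prev) / dt."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    z_rate = (z_curr - z_prev) / dt  # finite-difference ISV rate (assumption)
    return -dpsi_dz(strain, z_curr, modulus) * z_rate


# Placeholder state: the ISV grows while the elastic part (strain - z) is positive.
D = dissipation_rate_incremental(
    strain=0.02, z_prev=0.005, z_curr=0.01, dt=0.1, modulus=200.0
)
print(D)         # approximately 0.1 for these placeholder values
print(D >= 0.0)  # the second law requires a non-negative dissipation rate
```

In this toy setting the derivative of the free energy is available in closed form; in a learned model it would come from differentiating the free energy network, while the ISV rate would still need only the difference of two consecutive ISV values.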
Given TCRNN models with an internal state dimension of 2 and a hidden state dimension of 30, the number of RNN steps is varied from 5 to 60, and its influence on the model prediction accuracy is shown in Fig. 15(a). As the number of RNN steps increases, the relative errors of the training and testing samples decrease and eventually converge to a plateau of around 3% and 11%, respectively. The plateau indicates that further increasing the number of RNN steps does not improve the model accuracy.
The internal state dimension required to effectively model the path-dependent material behaviors depends on the complexity of those behaviors and is unknown a priori. Here, we investigate its effect on the model prediction accuracy by varying it from 1 to 10 while the number of RNN steps and the hidden state dimension are fixed at 40 and 30, respectively. Fig. 15(b) shows that the relative errors of the training and testing samples are large when the machine-inferred ISV is a scalar, indicating that a scalar ISV is insufficient to capture all essential path-dependent features. As the internal state dimension increases, the relative errors of the training and testing samples decrease and then reach a plateau of around 2.7% and 12%, respectively, which shows that the TCRNN model remains accurate and robust even if an excessive internal state dimension is prescribed. The convergence behavior also allows one to identify the optimal internal state dimension, around 2 in this example, given the measurable material states of path-dependent materials.
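One simple way to read an optimal dimension off such a convergence curve is to pick the smallest value whose error lies within a tolerance of the best error. The sketch below is only an illustration of that idea: the helper `smallest_converged_value`, the 10% tolerance, and the error values are hypothetical and merely mimic the shape of a curve that drops and then plateaus.

```python
def smallest_converged_value(values, errors, rel_tol=0.1):
    """Return the smallest value whose error is within rel_tol of the best error."""
    if not values or len(values) != len(errors):
        raise ValueError("values and errors must be non-empty and equally long")
    best = min(errors)
    threshold = best * (1.0 + rel_tol)
    # Scan in ascending order of the hyperparameter value.
    for value, error in sorted(zip(values, errors)):
        if error <= threshold:
            return value
    raise RuntimeError("unreachable: the best error always meets the threshold")


# Placeholder errors (fractions) for a sweep over the internal state dimension.
dims = [1, 2, 3, 5, 10]
test_errors = [0.20, 0.121, 0.12, 0.122, 0.12]
print(smallest_converged_value(dims, test_errors))  # 2
```

The tolerance controls how strictly the plateau is defined; a tighter tolerance would favor larger dimensions.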
Lastly, the internal state dimension and the number of RNN steps are fixed at 5 and 40, respectively, while the hidden state dimension is varied from 5 to 100 to investigate the effects of model complexity on model performance. Fig. 15(c) shows that the relative errors of the training and testing samples decrease as the hidden state dimension increases and eventually reach a plateau of around 2.8% and 14%, respectively. The plateaus in the convergence curves indicate that further increasing the model complexity does not improve the model accuracy.
Fig. 16 compares the shear stress experimental data with the predictions of the trained TCRNN model that employs 40 RNN steps. The relative errors of the training and testing samples are around 3.2% and 9.4%, respectively. This shows that the TCRNN model is able to learn the path-dependent material behaviors from the measurable material states under given loading conditions and to effectively predict the path-dependent responses under untrained loading conditions, further demonstrating the generalization ability and effectiveness of the TCRNN model in practical applications. Further, the trained TCRNN material model is thermodynamically consistent, which is verified by the non-negative predicted free energy and by the predicted dissipation rates that satisfy the second law of thermodynamics, as shown in the second and third rows of Fig. 16, respectively. The histories of the machine-learned ISVs are shown in the last row of Fig. 16, revealing path-dependent patterns similar to those of the predicted free energy and dissipation rate.
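The consistency check described above amounts to verifying two inequalities along the predicted history. A minimal sketch, assuming the predicted free energy and dissipation rate are available as plain sequences, is given below; the function name, the tolerance, and the sample values are hypothetical.

```python
def check_thermodynamic_consistency(free_energy, dissipation_rate, tol=1e-8):
    """Return the step indices that violate psi >= 0 and D >= 0.

    Two empty lists mean that the history is consistent up to the tolerance.
    """
    bad_energy = [i for i, psi in enumerate(free_energy) if psi < -tol]
    bad_dissipation = [i for i, d in enumerate(dissipation_rate) if d < -tol]
    return bad_energy, bad_dissipation


# Placeholder histories (not model output from the paper).
free_energy = [0.0, 0.3, 0.1]
dissipation_rate = [0.0, 0.02, -0.01]
print(check_thermodynamic_consistency(free_energy, dissipation_rate))  # ([], [2])
```

Returning the violating indices rather than a single flag makes it easier to locate where along the loading path a prediction would break the second law.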
6 Conclusion
In this study, we introduced a machine-learned internal state variable (ISV) approach for data-driven modeling of path-dependent materials, which is thermodynamically consistent and relies purely on the measurable material states. The proposed TCRNN constitutive models consist of two main components: an RNN that infers ISVs (Eq. (21)) and describes their evolution by following the second law of thermodynamics (Eq. (2)), and a DNN that predicts the Helmholtz free energy (Eq. (22)) given strain, ISVs, and temperature (for non-isothermal processes). Two TCRNN constitutive models are developed, one based on the time rates of ISVs, as shown in Fig. 4, and the other based on the increments of ISVs, as shown in Fig. 6. The latter model is more efficient, as it uses an approximation of the ISV time rates to calculate the dissipation rate and avoids the time-consuming differentiation of the RNN outputs with respect to all RNN inputs. Model robustness and accuracy are enhanced by introducing stochasticity to the training data to account for uncertainties of the input conditions in testing.
In the demonstration of modeling elasto-plastic materials, the parametric study shows that the model accuracy converges as the number of RNN steps, the internal state dimension, and the model complexity increase. All these factors play an important role in the model performance. Given path-dependent material behaviors, there exists an optimal internal state dimension to capture the essential path-dependent features by the TCRNN model. It has been shown that the TCRNN model remains accurate and robust even if an excessive internal state dimension is prescribed. The monotonic correlation between the machine-inferred and the phenomenological ISV of the elasto-plastic material demonstrates that the TCRNN constitutive model can infer mechanistically and thermodynamically consistent ISVs. The proposed TCRNN constitutive model is shown to remain robust against various strain increments and have strong generalization capabilities.
The effectiveness of the proposed TCRNN constitutive model is further demonstrated by modeling undrained soil under cyclic shear loading using experimental data, where only measurable material states (stresses and strains) are available. Similar convergence behaviors of the model accuracy are observed in a parametric study of the number of RNN steps, the internal state dimension, and the model complexity. The generalization capability of the TCRNN constitutive model is demonstrated by the effective prediction of the thermodynamically consistent response of undrained soil under loading conditions different from those used in training, which reveals the promising potential of the proposed method for modeling complex path-dependent material behaviors in real applications.
The proposed TCRNN constitutive model is general and applicable to a wide range of path-dependent materials. It is efficient and can be applied to accelerate large-scale multi-scale simulations with complex microstructures and path-dependent material systems. To investigate the reliability of model predictions, a future extension would be to integrate uncertainty quantification Smith (2013) into the proposed TCRNN model.
Acknowledgments
The support of this work by the DOE Nuclear Engineering University Program under Award Number DE-NE0008951 to the University of California, San Diego is very much appreciated.
References
- A new reliability-based data-driven approach for noisy experimental data with physical constraints. Computer Methods in Applied Mechanics and Engineering 328, pp. 752–774. Cited by: §1.
- Manifold embedding data-driven mechanics. arXiv preprint arXiv:2112.09842. Cited by: §1.
- Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators. Advanced Modeling and Simulation in Engineering Sciences 8 (1), pp. 1–24. Cited by: §1.
- Ottawa F-65 sand characterization. University of California, Davis. Cited by: §5.2.
- Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research 18, pp. 1–43. Cited by: §4.1, §4.1, §4.2.
- The problem of learning long-term dependencies in recurrent networks. In IEEE international conference on neural networks, pp. 1183–1188. Cited by: §3.2.1.
- Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks 5 (2), pp. 157–166. Cited by: §3.2.1.
- A framework for data-driven analysis of materials under uncertainty: countering the curse of dimensionality. Computer Methods in Applied Mechanics and Engineering 320, pp. 633–667. Cited by: §1.
- A review of the application of machine learning and data mining approaches in continuum materials mechanics. Frontiers in Materials 6, pp. 110. Cited by: §1.
- On the importance of self-consistency in recurrent neural network models representing elasto-plastic solids. Journal of the Mechanics and Physics of Solids 158, pp. 104697. Cited by: §1, §4.3.
- Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences 113 (15), pp. 3932–3937. Cited by: §1.
- Machine learning for molecular and materials science. Nature 559 (7715), pp. 547–555. Cited by: §1.
- Recurrent neural networks (RNNs) learn the constitutive law of viscoelasticity. Computational Mechanics 67 (3), pp. 1009–1019. Cited by: §1.
- On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259. Cited by: §1, §3.2.1.
- Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: §1, §3.2.1.
- Thermodynamics with internal state variables. The journal of chemical physics 47 (2), pp. 597–613. Cited by: §1.
- Discovering symbolic models from deep learning with inductive biases. Advances in Neural Information Processing Systems 33, pp. 17429–17442. Cited by: §1.
- Model-free data-driven inelasticity. Computer Methods in Applied Mechanics and Engineering 350, pp. 81–99. Cited by: §1, §2.
- LaSDI: parametric latent space dynamics identification. arXiv preprint arXiv:2203.02076. Cited by: §1.
- Implicit constitutive modelling for viscoplasticity using neural networks. International Journal for Numerical Methods in Engineering 43 (2), pp. 195–219. Cited by: §1.
- Knowledge-based modeling of material behavior with neural networks. Journal of engineering mechanics 117 (1), pp. 132–153. Cited by: §1.
- Accelerating multiscale finite element simulations of history-dependent materials using a recurrent neural network. Computer Methods in Applied Mechanics and Engineering 357, pp. 112594. Cited by: §1.
- Physical and mechanical properties of Ottawa F65 sand. In Model tests and numerical simulations of liquefaction and lateral spreading, pp. 45–67. Cited by: §5.2.
- Deep learning. MIT press. Cited by: §3.1, §3.1, §3.2.1, §3.2.2, §3.2.2, §3.2, §3.2.
- On the potential of recurrent neural networks for modeling path dependent plasticity. Journal of the Mechanics and Physics of Solids 143, pp. 103972. Cited by: §1.
- Deep learning for natural language processing. New York: Apress. Cited by: §3.2.
- A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Computer Methods in Applied Mechanics and Engineering 379, pp. 113741. Cited by: §1.
- Numerical implementation of a neural network based material model in finite element analysis. International Journal for numerical methods in engineering 59 (7), pp. 989–1005. Cited by: §1.
- A physics-constrained data-driven approach based on locally convex reconstruction for noisy database. Computer Methods in Applied Mechanics and Engineering 363, pp. 112791. Cited by: §1.
- Manifold learning based data-driven modeling for soft biological tissues. Journal of Biomechanics 117, pp. 110124. Cited by: §1.
- Physics-constrained deep neural network method for estimating parameters in a redox flow battery. Journal of Power Sources 528, pp. 231147. Cited by: §1.
- Physics-informed neural network method for forward and backward advection-dispersion equations. Water Resources Research 57 (7), pp. e2020WR029479. Cited by: §1.
- GLaSDI: parametric physics-informed greedy latent space dynamics identification. arXiv preprint arXiv:2204.12005. Cited by: §1.
- Physics-constrained local convexity data-driven modeling of anisotropic nonlinear elastic solids. Data-Centric Engineering 1. Cited by: §1.
- Deep autoencoders for physics-constrained data-driven nonlinear materials modeling. Computer Methods in Applied Mechanics and Engineering 385, pp. 114034. Cited by: §1, §1.
- SO (3)-invariance of informed-graph-based deep neural network for anisotropic elastoplastic materials. Computer Methods in Applied Mechanics and Engineering 363, pp. 112875. Cited by: §1.
- Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §1, §3.2.1.
- Historical review of internal state variable theory for inelasticity. International Journal of Plasticity 26 (9), pp. 1310–1334. Cited by: §1.
- Neural network constitutive model for rate-dependent materials. Computers & Structures 84 (15-16), pp. 955–963. Cited by: §1.
- Non-intrusive reduced order modeling of natural convection in porous media using convolutional autoencoders: comparison with linear subspace techniques. Advances in Water Resources, pp. 104098. Cited by: §1.
- A framework for data-driven solution and parameter estimation of PDEs using conditional generative adversarial networks. Nature Computational Science 1 (12), pp. 819–829. Cited by: §1.
- A hyper-reduction computational method for accelerated modeling of thermal cycling-induced plastic deformations. Journal of the Mechanics and Physics of Solids 151, pp. 104385. Cited by: §1.
- A kernel method for learning constitutive relation in data-driven computational elasticity. Japan Journal of Industrial and Applied Mathematics 38 (1), pp. 39–77. Cited by: §1.
- Physics-informed machine learning. Nature Reviews Physics 3 (6), pp. 422–440. Cited by: §1.
- A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder. Journal of Computational Physics 451, pp. 110841. Cited by: §1.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.3.
- Data driven computing with noisy material data sets. Computer Methods in Applied Mechanics and Engineering 326, pp. 622–641. Cited by: §1.
- Thermodynamics of elastic-plastic materials as a theory with internal state variables. Journal of Applied Physics 40 (8), pp. 3207–3218. Cited by: §1.
- Artificial neural networks in numerical modelling of composites. Computer Methods in Applied Mechanics and Engineering 198 (21-26), pp. 1785–1804. Cited by: §1.
- Artificial neural network as an incremental non-linear constitutive model for a finite element code. Computer methods in applied mechanics and engineering 192 (28-30), pp. 3265–3283. Cited by: §1.
- Machine learning strategies for systems with invariance properties. Journal of Computational Physics 318, pp. 22–35. Cited by: §1.
- A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019. Cited by: §1, §3.2.
- Smart constitutive laws: inelastic homogenization through machine learning. Computer methods in applied mechanics and engineering 373, pp. 113482. Cited by: §1.
- Thermodynamics-based artificial neural networks for constitutive modeling. Journal of the Mechanics and Physics of Solids 147, pp. 104277. Cited by: §1, §3.1, §4.1.
- Deep learning predicts path-dependent plasticity. Proceedings of the National Academy of Sciences 116 (52), pp. 26414–26420. Cited by: §1.
- PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32. Cited by: §4.3.
- Internal state variable description of dynamic fracture of ductile solids. International Journal of Solids and Structures 22 (7), pp. 797–818. Cited by: §1.
- Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
- Undrained seismic compression of unsaturated sand. Journal of Geotechnical and Geoenvironmental Engineering 147 (1), pp. 04020145. Cited by: §5.2.
- Finite element analysis of v-ribbed belts using neural network based hyperelastic material model. International Journal of Non-Linear Mechanics 40 (6), pp. 875–890. Cited by: §1.
- The mechanics and thermodynamics of continuous media. Springer Science & Business Media. Cited by: §2, §2.
- Associative coupled thermoplasticity at finite strains: formulation, numerical analysis and implementation. Computer Methods in Applied Mechanics and Engineering 98 (1), pp. 41–104. Cited by: §1.
- Computational inelasticity. Vol. 7, Springer Science & Business Media. Cited by: §2.
- Algorithms for static and dynamic multiplicative plasticity that preserve the classical return mapping schemes of the infinitesimal theory. Computer Methods in Applied Mechanics and Engineering 99 (1), pp. 61–112. Cited by: §1.
- Uncertainty quantification: theory, implementation, and applications. Vol. 12, SIAM. Cited by: §6.
- Neural network based constitutive modeling of nonlinear viscoplastic structural response. Mechanics Research Communications 95, pp. 85–88. Cited by: §1.
- Physics-informed deep neural networks for learning parameters and constitutive relationships in subsurface flow problems. Water Resources Research 56 (5), pp. e2019WR026731. Cited by: §1.
- Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening. Computer Methods in Applied Mechanics and Engineering 377, pp. 113695. Cited by: §1.
- A multiscale multi-permeability poroplasticity model linked by recursive homogenizations and deep learning. Computer Methods in Applied Mechanics and Engineering 334, pp. 337–380. Cited by: §1.
- A recurrent neural network-accelerated multi-scale model for elasto-plastic heterogeneous materials subjected to random cyclic and non-proportional loading paths. Computer Methods in Applied Mechanics and Engineering 369, pp. 113234. Cited by: §1.
- Recurrent neural networks (RNNs) with dimensionality reduction and break down in computational mechanics; application to multi-scale localization step. Computer Methods in Applied Mechanics and Engineering 390, pp. 114476. Cited by: §1.
- Non-intrusive inference reduced order model for fluids using deep multistep neural network. Mathematics 7 (8), pp. 757. Cited by: §1.
- Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853. Cited by: §3.1.
- Learning constitutive relations using symmetric positive definite neural networks. Journal of Computational Physics 428, pp. 110072. Cited by: §1, §4.3.
- A review of recurrent neural networks: LSTM cells and network architectures. Neural computation 31 (7), pp. 1235–1270. Cited by: §1, §3.2.
- Using neural networks to represent von Mises plasticity with isotropic hardening. International Journal of Plasticity 132, pp. 102732. Cited by: §1.