Physics-informed neural networks for modeling rate- and temperature-dependent plasticity

01/20/2022
by   Rajat Arora, et al.

This work presents a physics-informed neural network based framework to model the strain-rate and temperature dependence of the deformation fields (displacement, stress, plastic strain) in elastic-viscoplastic solids. A detailed discussion on the construction of the physics-based loss criterion along with a brief outline on ways to avoid unbalanced back-propagated gradients during training is also presented. We also present a simple strategy with no added computational complexity for choosing scalar weights that balance the interplay between different terms in the composite loss. Moreover, we also highlight a fundamental challenge involving selection of appropriate model outputs so that the mechanical problem can be faithfully solved using neural networks. Finally, the effectiveness of the proposed framework is demonstrated by studying two test problems modeling the elastic-viscoplastic deformation in solids at different strain-rates and temperatures, respectively.


1 Introduction

Modeling the elastic-plastic response of materials using conventional numerical methods, such as the finite element method, isogeometric analysis, or mesh-free methods, has always been computationally expensive due to the inherent iterative nature of the discretization algorithms used in such methods. Furthermore, the multitude of ‘fundamentally accurate’ theories for the high-fidelity modeling of dislocation-mediated plastic deformation at different scales [nielsen2019finite, arora2020finite, fleck1994strain, arora2020unification, niordson2019homogenized, lynggaard2019finite, kuroda2008finite, evers2004non, arora2020dislocation, greer2013shear, arora2019computational] or fracture modeling in materials [miehe2010thermodynamically, borden2014higher, yingjun2016phase, areias2016phase, kuhn2010continuum] is pushing these numerical solvers to their limits. Therefore, attempts are being made to explore the use of artificial intelligence, specifically Deep Neural Networks (DNNs), to speed up (nonlinear) mechanical modeling of materials. The recent flurry of research using DNNs to model physical systems has also been facilitated by notable speed improvements in modern computer architectures and by the availability of computationally efficient open-source machine learning frameworks (PyTorch [NEURIPS2019_9015], TensorFlow [tensorflow2015-whitepaper], and Keras [chollet2015keras], among others).

In fact, the idea of using the extraordinary approximating capability of DNNs, established by the universal approximation theorem [cybenko1989approximation], to obtain the solution of Partial Differential Equations (PDEs) by minimizing a network loss function comprising the residual error of the governing PDEs and their initial/boundary conditions has been around for some time [lagaris1998artificial, lagaris2000neural]. More recently, Raissi et al. [raissi2017physicsI, raissi2019physics] extended this concept to develop a general Physics-Informed Neural Network (PINN) framework for solving forward and inverse problems involving general nonlinear PDEs while relying on small or even empty labeled datasets. Several applications of PINNs can be found in the literature, ranging from the modeling of fluid flows and the Navier-Stokes equations [sun2020surrogate, rao2020physics, jin2021nsfnets, gao2020phygeonet] to cardiovascular systems [kissas2020machine, sahli2020physics] and material modeling [frankel_prediction_2020, tipireddy2019comparative, zhang2020physics, meng2020composite, zhu2021machine, arora2021machine], among others. In addition to the PINN-based approaches mentioned above, several data-driven approaches [versino2017data, li2017data, montans2019data, brunton2019methods] have been proposed to generate surrogate models for the solution of PDEs governing the behavior of physical systems. However, data-driven methods usually require an extensive amount of experimentally or computationally generated data to learn a reliable model, which may still fail to satisfy physics-based constraints. The embedding of physics in the PINN framework, on the other hand, enforces physics-based constraints on the neural network outputs, enabling a high-fidelity model while reducing (or even eliminating) the need for large training datasets.
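To make the PINN recipe concrete — autodifferentiated PDE residual plus boundary terms in one loss — here is a minimal sketch for a toy first-order ODE, u'(x) = cos(x) with u(0) = 0. This is a hypothetical illustration, not the paper's solver:

```python
import torch

# Tiny fully connected network approximating u(x).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

def pinn_loss(x):
    """PDE residual of u'(x) = cos(x) plus the boundary condition u(0) = 0."""
    x = x.requires_grad_(True)
    u = net(x)
    # Autodiff gives du/dx at the collocation points.
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    pde = ((du - torch.cos(x)) ** 2).mean()       # governing-equation residual
    bc = net(torch.zeros(1, 1)).pow(2).mean()     # boundary-condition residual
    return pde + bc

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = pinn_loss(torch.rand(64, 1) * 3.14)    # random collocation points
    loss.backward()
    opt.step()
```

The same structure — network outputs differentiated through autograd and penalized against the governing equations — carries over to the vector-valued mechanics problem below.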

Recurrent Neural Networks (RNNs) and their variants, Gated Recurrent Units (GRUs) [cho2014learning] and Long Short-Term Memory (LSTM) networks [hochreiter1997long], are another family of neural network models specifically designed to handle sequential data, making them an attractive choice for learning and predicting (path-dependent) plastic behavior in metals. The history-dependent hidden states in these models enable them to re-use relevant information from previous inputs when making future predictions, a role analogous to that of history-dependent internal variables (such as the equivalent plastic strain) in computational plasticity. Various approaches using RNNs [mozaffar2019deep, deep_learning_plasticity2, Lstm_plasticity2, Huang_pod_cmame, gorji_potention_plast] have successfully demonstrated the capability of neural networks to learn path-dependent constitutive behavior in metals with great accuracy. These works showcase the potential of RNN-based constitutive modeling to speed up existing numerical solvers by permitting direct evaluation of stresses at the integration points without iterative return-mapping algorithms. However, developing a physics-informed neural network that models the spatio-temporal variation of deformation in elastic-plastic solids, along with its dependence on strain rate and temperature, poses several technical challenges. Through this work, we take a first step in highlighting these challenges and demonstrate the strength of PINNs for modeling elastic-viscoplastic deformation in materials.

The goal of the present work is to demonstrate the (physics-based) predictive capability of DNN models for spatio-temporally varying deformation fields within elastic-plastic solids under monotonic loading conditions. In particular, two feedforward DNNs are used as global approximators to study the dependence of the spatio-temporally varying deformation fields (displacement, stress, and plastic strain) on strain rate (applied loading rate) and temperature, respectively. We provide several (but non-exhaustive) comparisons of choices of the DNN architecture (number of hidden layers and number of neurons per layer) used to approximate the solution. A detailed discussion on the construction of the (physics-based) composite loss, along with a brief summary of ways to avoid unbalanced back-propagated (exploding) gradients during model training, is also presented. We also present a strategy, with no added computational complexity, for choosing the scalar weights that balance the interplay between the different terms in the composite loss. Although the current work focuses on monotonic loading paths, we note that the deformation of an elastic-viscoplastic solid is a highly nonlinear function of temperature, strain rate, spatial coordinates, and strain. This real-time stress-prediction capability for elastic-viscoplastic materials is particularly valuable for energy-storage devices, such as in the design and development of lithium-metal solid-state batteries. Specifically, the study conducted here corresponds to analyzing the effect of impact (i.e., crash) and heat on the solid anode in solid-state batteries.

The rest of this paper is organized as follows: after introducing notation and terminology immediately below, Sec. 2 presents a brief review of the equations of mechanical equilibrium for elastic-viscoplastic solids. The details of the neural network architecture and the construction of the loss function are presented in Sections 3 and 4, respectively. Section 5 presents results that evaluate the neural network models and demonstrate their predictive capabilities in modeling the temperature and strain-rate dependence of the deformation fields, along with their spatio-temporal evolution, for the two test cases discussed. Finally, the conclusions of the current work, along with an outlook on future work, are presented in Sec. 6.

Notation and Terminology

A Cartesian coordinate system is invoked for the ambient space, and all (vector and) tensor components are expressed with respect to the basis of this coordinate system. Vectors and tensors are represented by boldface lower- and upper-case letters, respectively. The symbol ‘·’ denotes single contraction of adjacent indices of two tensors. The symbol ‘:’ denotes double contraction of adjacent indices of two tensors of rank two or higher. The following list describes the recurring mathematical symbols used in this paper.

Mathematical symbols
C — Fourth-order elasticity tensor
tr — Trace of a vector/tensor quantity
|·| — Norm of a vector/tensor quantity
div — Divergence operator
∇ — Gradient operator
I — Second-order identity tensor
(·)′ — Deviatoric part of a second-order tensor, i.e. A′ = A − (1/3) tr(A) I
σ — Stress in the material
ε̇ — Strain rate
t — Time
ε — Scalar strain
Ω — Volumetric domain
x — Spatial coordinates in the domain
θ — Temperature of the domain
∂Ω — External boundary of the domain
n — External unit outward normal to ∂Ω
ε^e — Elastic strain tensor
ε^p — Plastic strain tensor
ε̇^p — Plastic strain rate tensor
ε — Total strain tensor
u — Displacement vector
E — Young’s modulus
G — Shear modulus
ν — Poisson’s ratio
ε̇_eq^p — Equivalent plastic strain rate
t̄ — Traction vector on the Neumann boundary
û — Displacement vector on the Dirichlet boundary
A — Pre-exponential factor
Q — Activation energy
R — Gas constant
m — Strain-rate-sensitivity parameter
sign(·) — Sign of a scalar quantity
S — Material strength
S* — Saturation value of S for a given strain rate and temperature
h₀, a — Strain-hardening parameters
S₀ — Initial value of the material strength

2 Governing equations

We briefly recall the system of nonlinear PDEs governing the behavior of elastic-viscoplastic solids under loads at small deformation. The reader is referred to standard textbooks [gurtin_fried_anand_2010] for a detailed discussion of the thermodynamics and mechanics of continuous media. The equilibrium equation, in the absence of body and inertial forces, is written as

    div σ = 0 in Ω.     (1)

The above PDE, together with the boundary conditions

    σ · n = t̄ on the Neumann boundary,    u = û on the Dirichlet boundary,     (2)

describe the strong form of mechanical equilibrium. The total strain ε is given by the symmetric part of the displacement gradient, i.e. ε = (∇u + ∇uᵀ)/2. ε is decomposed into the sum of elastic and plastic strain components, i.e. ε = ε^e + ε^p. The stress is given by Hooke’s law

    σ = C : ε^e = C : (ε − ε^p).     (3)

The plastic strain evolution is governed by the flow rule

    ε̇^p = ε̇_eq^p (3/2) (σ′ / σ̄),  with  σ̄ = √((3/2) σ′ : σ′),     (4)

where σ̄ denotes the equivalent stress. The equivalent plastic strain rate is assumed to be given by a power law of the form

    ε̇_eq^p = A exp(−Q / (R θ)) (σ̄ / S)^(1/m).     (5)

The flow strength S is assumed to evolve according to

    Ṡ = h(S) ε̇_eq^p.     (6)

Following [anand2019elastic], the hardening function is taken as

    h(S) = h₀ |1 − S/S*|^a sign(1 − S/S*),

where S* denotes the saturation value of the strength at the given strain rate and temperature.
In this work, we use the values of material parameters listed in Table 2 which correspond to an elastic-viscoplastic model for Lithium [anand2019elastic]. These parameters have been calibrated using the experimental data from direct tension tests on polycrystalline lithium specimens [lepage2019lithium].

E = 7810 MPa    ν = 0.38    (remaining calibrated parameters of the original table not recoverable)
Table 2: Material parameters for the elastic-viscoplastic model of Lithium.

3 Model Architecture

The central idea is to use a fully connected DNN to approximate the nonlinear mapping x ↦ y = f(x) for the deformation fields, where x and y denote the n-dimensional input and m-dimensional output arrays of the mapping, respectively. In particular, we design and train two fully connected neural networks for modeling the elastic-viscoplastic behavior of two-dimensional solids and its dependence on strain rate and temperature, respectively. To this end, we first specify the inputs and outputs of the model for the two cases before describing the model architecture.

Inputs of the model: The inputs for the two cases are:

Case I (strain-rate dependence): spatial coordinates, time, and applied strain rate.
Case II (temperature dependence): spatial coordinates, time, and temperature.
Outputs of the model: The elastic-plastic deformation can be uniquely characterized by determining the displacement vector and any one of the two tensor fields, plastic strain or stress (since they are related by (3)), along with the internal variables. Therefore, a natural choice of outputs for the neural network model would be u and one of the two fields (ε^p or σ) plus the internal variables. However, as demonstrated in Appendix A, a DNN model with such a choice of outputs suffers from degraded accuracy and convergence issues. Therefore, we propose a mixed-variable formulation, i.e., displacement, stress, plastic strain, and the internal variables as the DNN outputs in this work. This formulation is found to be superior with regard to the trainability of the network, as discussed in Appendix A. The model outputs for the proposed mixed-variable formulation under two-dimensional plane-strain conditions comprise the in-plane displacement components, the independent stress components, the independent plastic strain components, and the material strength S. In writing these, we have used the facts that the stress tensor is symmetric and the plastic strain tensor is symmetric and deviatoric (i.e. tr(ε^p) = 0).

The hyper-parameters defining the neural network model comprise the number of hidden layers, the number of neurons per layer, and the activation function(s). The fully connected neural network representing the mapping also includes the following main elements:

  • Normalization of the inputs: Each input component is individually scaled between −1 and 1 before being used as the network input. The normalization linearly projects a (possibly transformed) input into the range [−1, 1] as

    x̂ᵢ = 2 (g(xᵢ) − gᵢ,min) / (gᵢ,max − gᵢ,min) − 1,     (7)

    where g is a (known) transformation motivated by physical considerations of the elastic-viscoplastic deformation process. The use of such transformations has previously been shown to improve neural network training in terms of speed and accuracy [kapoor2005use, Lstm_plasticity2].

  • Activation of the hidden layers: z^k, the output array of hidden layer k, is calculated as

    z^k = f(W^k z^{k−1} + b^k),     (8)

    where f denotes the activation function of the hidden layer, operating elementwise on its argument, and W^k and b^k denote the weight matrix and bias array of hidden layer k, respectively. For k = 1, z^0 = x̂, the normalized input array.

  • Computation of the outputs: The normalized output array ŷ is similarly calculated as

    ŷ = W^{L+1} z^L + b^{L+1},     (9)

    where W^{L+1} and b^{L+1} denote the weight matrix and bias array of the output layer, respectively, and L is the number of hidden layers.

  • Denormalization of the network outputs: Each component ŷᵢ is then individually denormalized to yield the output array y as

    yᵢ = yᵢ,min + (ŷᵢ + 1)(yᵢ,max − yᵢ,min)/2.     (10)

Henceforth, we denote the sets of all weight matrices and bias arrays of the neural network by W and b, respectively. In the present work, the hyperbolic tangent is used as the activation function for all hidden layers, i.e. f = tanh. The transformation g (see (7)) is taken to be the identity for all inputs except the strain rate, for which it is taken to be the logarithm, since strain rates may vary over multiple orders of magnitude.
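The normalize → tanh hidden layers → denormalize pipeline of steps (7)-(10) can be sketched as a single PyTorch module. The input/output ranges, width, and depth below are illustrative assumptions, not the paper's values:

```python
import torch

class ScaledMLP(torch.nn.Module):
    """Fully connected network with linear input/output scaling to [-1, 1].

    in_lo/in_hi and out_lo/out_hi are assumed physical ranges for the
    (de)normalization; a log transform of inputs such as strain rate
    would be applied before normalization."""

    def __init__(self, in_lo, in_hi, out_lo, out_hi, width=50, depth=4):
        super().__init__()
        for name, v in [("in_lo", in_lo), ("in_hi", in_hi),
                        ("out_lo", out_lo), ("out_hi", out_hi)]:
            self.register_buffer(name, torch.tensor(v, dtype=torch.float32))
        sizes = [len(in_lo)] + [width] * depth + [len(out_lo)]
        layers = []
        for a, b in zip(sizes[:-1], sizes[1:]):
            layers += [torch.nn.Linear(a, b), torch.nn.Tanh()]
        self.net = torch.nn.Sequential(*layers[:-1])  # linear output layer

    def forward(self, x):
        xn = 2 * (x - self.in_lo) / (self.in_hi - self.in_lo) - 1        # Eq. (7)
        yn = self.net(xn)                                                # Eqs. (8)-(9)
        return self.out_lo + (yn + 1) * (self.out_hi - self.out_lo) / 2  # Eq. (10)
```

Keeping the scaling inside the module means the physics-based losses can be evaluated directly on physically dimensioned outputs.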

Figure 1: Schematic showing the geometry and the applied boundary conditions.

4 Construction of the loss function

The development of a PINN-based approach to solve a system of nonlinear PDEs yields an optimization problem: find the parameters (W, b) that minimize the network’s total loss. The composite loss in the semi-supervised approach comprises the sum of the supervised data loss and the physics-based loss, i.e. L = L_data + L_phys. The (nondimensional) supervised data loss measures the discrepancy between the normalized ground truth data and the neural network outputs and is given by

    L_data = (1 / N_data) Σ |ŷ_pred − ŷ_data|²,     (11)

where N_data denotes the number of ground truth samples.

To evaluate the physics-based loss L_phys, we introduce a finite set of randomly distributed collocation points discretizing the (normalized) input space. Subsets of these collocation points intersect the Dirichlet and Neumann portions of the boundary and the initial-time plane, and the corresponding loss components are evaluated on those subsets.

To this end, we construct the physics-based loss with seven components: i) the PDE loss, ii) the Dirichlet boundary condition loss, iii) the Neumann boundary condition loss, iv) the initial condition loss, v) the constitutive loss corresponding to the satisfaction of the constitutive law, vi) the plastic strain rate loss corresponding to the equation governing the evolution of plastic strain, and vii) the strength loss enforcing the material strength evolution equation. Each component of L_phys is individually calculated as follows:

(12)

In the above, the initial state of the system denotes the outputs at t = 0. The loss criterion is discussed in detail below. L_phys is then given as the weighted sum of these loss components,

    L_phys = Σᵢ λᵢ Lᵢ,

where λᵢ are the scalar weights.
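The weighted-sum assembly of the physics-based loss can be sketched as follows; the component names and weight values are illustrative placeholders, not the paper's:

```python
import torch

def composite_loss(residuals, weights):
    """Fixed-weight sum of physics-based loss components.

    `residuals` maps component names (e.g. 'pde', 'dirichlet', 'neumann',
    'init', 'constitutive', 'plastic_rate', 'strength') to residual
    tensors evaluated at the collocation points; `weights` holds the
    constant scalar coefficients chosen by nondimensionalization."""
    total = torch.zeros(())
    for name, r in residuals.items():
        # Mean squared residual of each component, scaled by its weight.
        total = total + weights.get(name, 1.0) * (r ** 2).mean()
    return total
```

Because the weights are fixed up front, no adaptive rebalancing is computed inside the training loop.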

Next, we briefly discuss the two main difficulties that hinder the training of DNNs for elastic-viscoplastic modeling applications.

  1. The power-law dependence of the equivalent plastic strain rate (Eq. (5)) leads to large values of the norm of the loss, which causes an unstable imbalance in the magnitude of the back-propagated gradients during training when using common loss criteria such as the Mean Squared Error. Therefore, in this work, we use a novel Modified Mean Squared Error (MMSE) loss criterion to reduce the numerical stiffness associated with equation (5) and allow stable gradients to be used during training:

    (13)

    In the above, r denotes the residual value. The MMSE criterion is equivalent to the Mean Squared Error (MSE) criterion when the residual values are small.

  2. The relative coefficients of the losses comprising L_phys play an important role in mitigating gradient pathologies during training [wang2020understanding]. Competing effects between the different loss components can lead to convergence issues during the minimization of the composite loss (see [sun2020surrogate, Sec. 4.1]). While recent advances in mitigating gradient pathologies [wang2020understanding, bischof2021multi] may improve predictive accuracy, they introduce additional computational and memory overhead because an adaptive factor must be calculated for each loss component. In this work, we devise a simple strategy, with no added computational complexity, to evaluate coefficients that remain constant during the course of training. The strategy is outlined as follows:

    • The Dirichlet boundary condition and initial condition losses are calculated in a normalized manner (on quantities scaled to [−1, 1]), so their weights are taken as unity.

    • The other loss components are nondimensionalized using the appropriate scales shown in Table 3, where a constant with units of stress is used to scale stress-like quantities. Based on the observation that stress is often nondimensionalized by the shear modulus in conventional numerical methods, we choose this stress scale accordingly to achieve a tight tolerance on the equilibrium equation and traction boundary conditions.

    • Since the material strength and the stresses differ by orders of magnitude, the strength is nondimensionalized separately by its own characteristic scale.

    • We nondimensionalize time using the applied strain rate, since it sets the time scale for the problem.

    • Lengths are nondimensionalized by the characteristic length of the domain.

Table 3: Scaling constants for the different physics-based loss components.
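The stated property of the MMSE criterion in item 1 above — behaving like MSE for small residuals while keeping back-propagated gradients bounded for large ones — can be illustrated with a robust loss of this kind. The paper's exact MMSE formula is not recoverable here; `log1p` of the squared residual is our stand-in with the same qualitative behavior:

```python
import torch

def robust_residual_loss(residual):
    """Illustrative stand-in for a Modified-MSE-style criterion (NOT the
    paper's exact formula): log1p(r^2) matches the mean squared error for
    small residuals but grows only logarithmically, so the gradient
    2r/(1+r^2) stays bounded for the large residuals produced by the
    power-law plastic strain rate (Eq. (5))."""
    return torch.log1p(residual ** 2).mean()
```

For |r| ≪ 1 this reduces to the usual MSE, while for |r| ≫ 1 the gradient magnitude decays instead of exploding.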

4.1 Training the network

The neural network is implemented and trained using the PyTorch framework [NEURIPS2019_9015]. Before training, the trainable parameters of the neural network are initialized using the Xavier initialization technique [glorot2010understanding]. The Adam optimizer [kingma_adam:_2015] is used as the optimization algorithm, with the other hyper-parameters set to their default values. Training proceeds for a fixed number of epochs (complete passes through the whole training dataset), during which the learning rate is monitored and adaptively reduced using the ReduceLROnPlateau scheduler. The data collection strategy and the splitting into training/validation/test datasets are discussed in greater detail in Section 5.
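A minimal PyTorch sketch of this training setup follows; the layer sizes, learning rate, patience, and loss are placeholders, since the paper's exact values are not reproduced here:

```python
import torch

# Placeholder architecture (4 inputs, 10 outputs).
model = torch.nn.Sequential(
    torch.nn.Linear(4, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 10))

# Xavier (Glorot) initialization of all trainable linear layers.
for m in model.modules():
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        torch.nn.init.zeros_(m.bias)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder lr
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=100)                   # placeholder patience

for epoch in range(5):  # stand-in loop; real training runs many epochs
    optimizer.zero_grad()
    loss = model(torch.rand(32, 4)).pow(2).mean()          # stand-in loss
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # plateau detection on the monitored loss
```

`ReduceLROnPlateau` lowers the learning rate by `factor` once the monitored loss stops improving for `patience` steps, matching the adaptive schedule described above.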

5 Results & Discussion

To illustrate the application of the proposed approach, we design and train specific neural networks for the two test cases, focusing on reproducing the complex behavior of an elastic-viscoplastic material under loads along with its dependence on a) strain rate and b) temperature. Fig. 2 shows a schematic of the body along with the applied boundary conditions for both cases. Without loss of generality, the body is assumed to deform quasistatically under plane-strain conditions in the absence of body forces. An in-house code developed using deal.II [BangerthHartmannKanschat2007] generates the ground truth data by solving the system of equations (1)-(6). The code uses the Finite Element Method with bi-linear elements on a uniform grid. To quantitatively assess the accuracy of the predictions, we define the Root-Mean-Squared Error (RMSE) as

    RMSE(f) = √( (1/N_ref) Σᵢ (f̂ᵢ − fᵢ)² ),     (14)

where f is the (normalized) quantity of interest, f̂ its prediction, and N_ref denotes the number of reference points in the spatial domain. The averaged RMSE over all output quantities is then defined as

    RMSE_avg = (1/m) Σ_f RMSE(f).     (15)
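The RMSE of Eq. (14) and its average over output fields, Eq. (15), amount to the following (function and field names are ours):

```python
import numpy as np

def rmse(pred, ref):
    """Root-mean-squared error of one (normalized) output field over the
    reference points, as in Eq. (14)."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2)))

def averaged_rmse(fields):
    """Averaged RMSE over all output quantities, as in Eq. (15);
    `fields` is a list of (prediction, reference) pairs."""
    return float(np.mean([rmse(p, r) for p, r in fields]))
```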

In the following, a neural network architecture with n_HL hidden layers and n_N neurons per layer will be referred to as (n_HL, n_N).

Figure 2: Schematic showing the geometry and the applied boundary conditions.

Figure 3: Comparison of the results obtained from the physics-informed neural network model (left block) with the ground truth reference data (right block).

5.1 Case I: Strain rate dependence

Data collection strategy: To study the effect of strain rate on the spatio-temporal evolution of the deformation fields in the body, we generate numerical data at the centroid of each grid element for several strain rates at a constant temperature. The dataset is then randomly split for training and validation purposes. We note that the test split is left empty, as model testing is performed separately for several strain rates different from those used for generating the training data.

First, we perform a (non-exhaustive) parametric study to identify a suitable number of hidden layers and neurons per layer needed to model the deformation fields with acceptable accuracy. We train neural networks with six different architectures; Figure 4 presents the training history for each of them. As expected, we initially see a benefit in increasing both the depth and the width, but the final value of the composite loss stops improving as the number of layers is increased further at fixed width. The best-performing architectures reduce the nondimensional composite loss by almost five orders of magnitude. The corresponding validation losses are monitored to detect any overfitting issues. We use one of these well-performing architectures for the results presented in this subsection.

Next, we compare the predicted values of the stress, plastic strain, and displacement fields in the domain with a test dataset at different strains. Figure 3 presents the predicted values of these deformation field components (left block) along with the FEM reference solution (right block). We observe that the predicted fields have no visible artifacts and are in excellent agreement with the FEM reference results. We also calculate the RMSE (see (14)) for each output quantity and report it underneath the corresponding field plot. The small RMSE values further confirm that the neural network predictions match the FEM reference results remarkably well.

Figure 4: Comparison of the composite loss for different neural network architectures; n_HL and n_N denote the number of layers and the number of neurons per layer, respectively. Figure 5: Stress-strain responses for different strain rates inside and outside of the training data range.

Next, we test the predictive capabilities of the trained neural network for input values that lie outside of the subspace spanned by the training data. Specifically, we calculate the averaged RMSE (see (15)) for strain rates and strains beyond the ranges covered by the training data. Figure 7 shows the variation of the error with strain at different strain rates. We make two important observations from this plot:

  1. For values of strain rate within the training range, the error is very small up to the largest strain seen during training. Beyond this strain, the error steadily increases.

  2. For values of strain rate outside of the training data range, the error is large at all strains, implying that the predicted values do not match the FEM reference data well.

Both of these observations are consistent with the stress-strain responses plotted in Figure 5. The stress-strain response is obtained by plotting the total force on the top surface divided by its length.

We therefore conclude that a fully connected DNN is able to learn the highly nonlinear dependence of the deformation fields on the applied strain rate along with their spatio-temporal evolution. The predictions match remarkably well with the ground truth reference data for inputs within the training data range. However, as expected, the accuracy of the predictions degrades for strains and strain rates outside the training data range.

Figure 6: Comparison of the results obtained from the physics-informed neural network model (left block) and the ground truth reference data (right block).

5.2 Case II: Temperature dependence

Data collection strategy: To approximate the nonlinear dependence of the (spatio-temporally varying) deformation fields (stress, plastic strain, and displacement) on temperature and strain, we generate numerical data at the centroid of each grid element for several temperatures at a constant strain rate. In the same spirit as before (see Sec. 5.1), the dataset is split for training and validation purposes. The test split is again left empty, as model testing is performed separately for several temperatures different from those used during model training.

As before, we first conduct a study of the effect of depth and width on the composite loss by training the six aforementioned neural network architectures (see Sec. 5.1). Figure 8 presents the training history for each of these architectures, which shows a similar trend as Figure 4. We therefore use the same well-performing architecture for the results presented in this subsection.

Next, we compare the predicted values of the stress, plastic strain, and displacement fields in the domain with a test dataset at different strains. Figure 6 presents the predicted values of these deformation field components (left block) along with the FEM reference solution (right block). We observe that the error (see (14)) is small and the predictions are in excellent agreement with the FEM reference results.

Figure 9 shows the variation of the error with strain for different temperatures. We note that the error rises as the strains go beyond the training data range. On the other hand, for values of temperature outside the training data range, the errors remain small, in contrast with the error trend observed in Figure 7 when the strain rate was outside the training data range. The stress-strain responses plotted in Figure 10 further confirm these observations.

We therefore conclude that a fully connected neural network is able to learn the nonlinear dependence of the deformation fields on temperature along with their spatio-temporal evolution.

Figure 7: Variation of error with strain for different strain rates inside and outside of the training data range. Figure 8: Comparison of the composite loss for different neural network architectures; n_HL and n_N denote the number of layers and the number of neurons per layer, respectively.
Figure 9: Variation of error with strain for different temperatures inside and outside of the training data range. Figure 10: Stress-strain responses for different temperatures inside and outside of the training data range.

6 Conclusion

This work demonstrates the strength of physics-informed neural networks for problems dealing with the evolution of highly nonlinear deformation fields in elastic-viscoplastic materials under monotonic loading. In particular, we trained specific physics-informed neural networks and applied them to two test cases, modeling the spatio-temporally varying deformation in elastic-viscoplastic materials at different strain rates and temperatures, respectively. The results are in excellent agreement with the ground truth reference data for both test cases discussed in this work.

This work also discusses in detail the construction of the composite loss function, comprising the data loss component and the physics-based loss components. We also present a simple physics-based strategy for evaluating the nondimensional scalar constants that weigh each component in the physics-based loss function without any added computational complexity. Moreover, a novel loss criterion for the residual calculation corresponding to the plastic strain rate equation is proposed to alleviate issues related to unbalanced back-propagated (exploding) gradients during model training.

We also highlighted a fundamental challenge involving the selection of appropriate model outputs so that the mechanical problem can be faithfully solved using neural networks. We present and compare two potential choices of outputs for the model in Appendix A and give detailed reasoning for preferring one choice over the other. The real-time stress-field prediction in such a highly nonlinear mechanical system paves the way for many new applications, such as the design and optimization of lithium-ion batteries or inverse modeling problems that were previously computationally intractable.

Future work will focus on extending the framework to account for the path dependency of the loading by using recurrent architectures such as long short-term memory (LSTM) [hochreiter1997long] and gated recurrent unit (GRU) [cho2014learning] networks to capture history-dependent features. It would also be interesting to investigate the effect of enforcing boundary conditions in a hard manner [rao2021physics] within the current framework.

Appendix A Comparison between two models

This section compares the results obtained from two PINN models: a) Model I, with displacement, stress, plastic strain, and strength as outputs, and b) Model II, with displacement and plastic strain as outputs. For Model II, the physics-based loss is obtained from the set of equations (12) with the following important changes: i) the stress is calculated directly from the displacements and plastic strains, which are outputs of the neural network, via Hooke’s law (3); this implicitly satisfies the constitutive law, so the corresponding loss component is dropped. ii) The data loss is also modified to account for the current model outputs.

The study conducted here corresponds to Case I: understanding the effect of strain rate on the spatio-temporal evolution of deformation in an elastic-viscoplastic material. The learning rate for Model II is adjusted, while the collocation points and all other hyperparameters are kept the same for both architectures, as described in Section 5.1.

The convergence of the training loss for both models is presented in Fig. 11. The loss for Model II stagnates at a value approximately one hundred times larger than the converged loss value obtained for Model I. As Figure 11 indicates, Model I does not suffer from any such degraded accuracy or convergence issues. This result extends a similar observation for purely linear elastic calculations presented in [rao2021physics] to the general elastic-plastic modeling case discussed here.

While the exact reasons for this behavior are still unclear, we highlight the main differences between the two models. First, the stress calculated in Model II is sensitive to noise in the displacement gradients. Second, the highest order of the spatial derivatives occurring in the composite loss function is one for Model I and two for Model II. Moreover, in elastic and elastic-plastic deformations, the magnitudes of the displacement components in the two coordinate directions can differ vastly because of the loading setup and the Poisson effect. We believe that these factors combine to give rise to the convergence issues and degraded accuracy of Model II. The use of improved training techniques such as Sobolev training [czarnecki2017sobolev], which also approximates target derivatives along with target values, may alleviate these issues for Model II, but at added computational cost; this remains a subject of future investigation.

Figure 11: Comparison of training history for Model I and II differing only in the model outputs.

References