A deep learning framework for solution and discovery in solid mechanics

by   Ehsan Haghighat, et al.

We present the application of a class of deep learning, known as Physics Informed Neural Networks (PINN), to learning and discovery in solid mechanics. We explain how to incorporate the momentum balance and constitutive relations into PINN, and explore in detail the application to linear elasticity, and illustrate its extension to nonlinear problems through an example that showcases von Mises elastoplasticity. While common PINN algorithms are based on training one deep neural network (DNN), we propose a multi-network model that results in more accurate representation of the field variables. To validate the model, we test the framework on synthetic data generated from analytical and numerical reference solutions. We study convergence of the PINN model, and show that Isogeometric Analysis (IGA) results in superior accuracy and convergence characteristics compared with classic low-order Finite Element Method (FEM). We also show the applicability of the framework for transfer learning, and find vastly accelerated convergence during network re-training. Finally, we find that honoring the physics leads to improved robustness: when trained only on a few parameters, we find that the PINN model can accurately predict the solution for a wide range of parameters new to the network—thus pointing to an important application of this framework to sensitivity analysis and surrogate modeling.




1 Introduction

Over the past few years, there has been a revolution in the successful application of Artificial Neural Networks (ANN), also commonly referred to as Deep Neural Networks (DNN) and Deep Learning (DL), in various fields including image classification, handwriting recognition, speech recognition and translation, and computer vision. These ANN approaches have led to a sea change in the performance of search engines, autonomous driving, e-commerce, and photography (see Bishop (2006); LeCun et al. (2015); Goodfellow et al. (2016) for a review). In engineering and science, ANNs have been applied to an increasing number of areas, including geosciences Yoon et al. (2015); Bergen et al. (2019); DeVries et al. (2018); Kong et al. (2018); Ren et al. (2019), material science Pilania et al. (2013); Butler et al. (2018); Shi et al. (2019); Brunton and Kutz (2019), fluid mechanics Brenner et al. (2019); Brunton et al. (2020), genetics Libbrecht and Noble (2015), and infrastructure health monitoring Rafiei and Adeli (2017); Sen et al. (2019), to name a few examples. In the solid and geomechanics community, deep learning has been used primarily for material modeling, in an attempt to replace classical constitutive models with ANNs Ghaboussi and Sidarta (1998); Kalidindi et al. (2011); Mozaffar et al. (2019). In these applications, training of the network, i.e., evaluation of the network parameters, is carried out by minimizing the norm of the distance between the network output (prediction) and the true output (training data). In this paper, we will refer to ANNs trained in this way as “data-driven.”

A different class of ANNs, known as Physics-Informed Neural Networks (PINN), was introduced recently Rudy et al. (2019); Raissi et al. (2019); Han et al. (2018); Bar-Sinai et al. (2019); Zhu et al. (2019). This concept of ANNs was developed to endow the network model with known equations that govern the physics of a system. The training of PINNs is performed with a cost function that, in addition to data, includes the governing equations and the initial and boundary conditions. This architecture can be used for solution and discovery (finding parameters) of systems of ordinary differential equations (ODEs) and partial differential equations (PDEs). While solving ODEs and PDEs with ANNs is not a new topic, e.g., Meade and Fernandez (1994); Lagaris et al. (1998, 2000), the success of these recent studies can be broadly attributed to: (1) the choice of network architecture, i.e., the set of inputs and outputs of the ANN, so that one can impose governing equations on the network; (2) algorithmic advances, including graph-based automatic differentiation for accurate differentiation of ANN functionals and for error back-propagation; and (3) availability of advanced machine-learning software with CPU and GPU parallel processing capabilities, including Theano Bergstra et al. (2010) and TensorFlow Abadi et al. (2016).

This framework has been used for solution and discovery of Schrödinger, Allen–Cahn, and Navier–Stokes equations Raissi et al. (2019); Rudy et al. (2019). It has also been used for solution of high-dimensional stochastic PDEs Han et al. (2018). As pointed out in Han et al. (2018), this approach can be considered as a class of Reinforcement Learning Lange et al. (2012), where the learning is on maximizing an incentive or minimizing a loss rather than direct training on data. If the network prediction does not satisfy a governing equation, it will result in an increase in the cost and therefore the learning traverses a path that minimizes that cost.

Here, we focus on the novel application of PINNs to solution and discovery in solid mechanics. We study linear elasticity in detail, but then illustrate the performance on nonlinear von Mises elastoplasticity. Since parameters of the governing PDEs can also be defined as trainable parameters, the framework inherently allows us to perform parameter identification (model inversion). We validate the framework on synthetic data generated from low-order and high-order Finite Element Methods (FEM) and from Isogeometric Analysis (IGA) Hughes et al. (2005); Cottrell et al. (2009). These datasets satisfy the governing equations with different orders of accuracy, where the error can be considered as noise in data. We find that the training converges faster on more accurate datasets, pointing to the importance of higher-order numerical methods for pre-training ANNs. We also find that if the data is pre-processed properly, the training converges to the correct solution and correct parameters even on data generated with a coarse mesh and low-order FEM, an important result that illustrates the robustness of the proposed approach. Finally, we find that, due to the imposition of the physics constraints, the training converges on a very sparse data set, which is a crucial property in practice given that the installation of a dense network of sensors can be very costly.

Parameter estimation (identification) of complex models is a challenging task that requires a large number of forward simulations, depending on model complexity and the number of parameters. As a result, most inversion techniques have been applied to simplified models. The use of PINNs, however, allows us to perform identification simultaneously with fitting the ANN model on data Raissi et al. (2019). This property highlights the potential of this approach compared with classical methods. We explore the application of PINN models for identification of multiple datasets generated with different parameters. Similar to transfer learning, where a pre-trained model is used as the initial state of the network Taylor and Stone (2009), we perform re-training on new datasets starting from a previously trained network on a different dataset (with different parameters). We find that the re-training and identification of other datasets take far less time. Since the successfully trained PINN model should also satisfy the physics constraints, it is in effect a surrogate model that can be used for extrapolation on unexplored data. To test this property, we train a network on four datasets with different parameters and then test it on a wide range of new parameter sets, and find that the results remain relatively accurate. This property points to the applicability of PINN models for sensitivity analysis, where classical approaches typically require an exceedingly large number of forward simulations.

2 Physics-Informed Neural Networks: Linear Elasticity

In this section, we review the equations of linear elastostatics with emphasis on PINN implementation.

2.1 Linear elasticity

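The equations expressing momentum balance, the constitutive model and the kinematic relations are, respectively,

$$
\begin{aligned}
\sigma_{ij,j} + f_i &= 0, \\
\sigma_{ij} &= \lambda \delta_{ij} \varepsilon_{kk} + 2 \mu \varepsilon_{ij}, \\
\varepsilon_{ij} &= \tfrac{1}{2} \left( u_{i,j} + u_{j,i} \right).
\end{aligned}
\qquad (1)
$$

Here, $\sigma_{ij}$ denotes the Cauchy stress tensor. For the two-dimensional problems considered here, $i, j = 1, 2$ (or $i, j = x, y$). We use the summation convention, and a subscript comma denotes a partial derivative. The function $f_i$ denotes a body force, $u_i$ represents the displacements, $\varepsilon_{ij}$ is the infinitesimal strain tensor and $\delta_{ij}$ is the Kronecker delta. The Lamé parameters $\lambda$ and $\mu$ are the quantities to be inferred using PINN.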
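As a small illustration of the kinematic and constitutive relations of linear elasticity, the sketch below evaluates the infinitesimal strain from a displacement gradient and then the in-plane stress for an isotropic material under plane strain (where the out-of-plane strain vanishes, so the 2×2 trace equals the volumetric strain). The function names, the displacement gradient, and the Lamé values are hypothetical and chosen only for the example.

```python
import numpy as np

def strain_from_displacement_gradient(grad_u):
    """Infinitesimal strain: eps_ij = (u_i,j + u_j,i) / 2."""
    return 0.5 * (grad_u + grad_u.T)

def stress_from_strain(eps, lam, mu):
    """Isotropic linear elasticity: sigma_ij = lam * delta_ij * eps_kk + 2 * mu * eps_ij."""
    return lam * np.trace(eps) * np.eye(eps.shape[0]) + 2.0 * mu * eps

# Hypothetical in-plane displacement gradient u_i,j (rows: i, columns: j).
grad_u = np.array([[0.010, 0.004],
                   [0.000, -0.002]])

eps = strain_from_displacement_gradient(grad_u)
sigma = stress_from_strain(eps, lam=1.0, mu=0.5)  # illustrative Lamé parameters
```

Both tensors come out symmetric, which is why only three independent in-plane components of each need to be tracked.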

2.2 Introduction to Physics-Informed Neural Networks

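In this section, we provide an overview of the Physics-Informed Neural Networks (PINN) architecture, with emphasis on their application to model inversion. Let $\mathcal{N}^L(\mathbf{x}; \mathbf{W}, \mathbf{b})$ be an $L$-layer neural network with input vector $\mathbf{x}$, output vector $\mathbf{y}$, and network parameters $\mathbf{W}, \mathbf{b}$. This network is a feed-forward network, meaning that each layer creates data for the next layer through the following nested transformations:

$$
\mathbf{z}^{\ell} = \sigma^{\ell}\left( \mathbf{W}^{\ell} \mathbf{z}^{\ell-1} + \mathbf{b}^{\ell} \right), \quad \ell = 1, \dots, L, \qquad (2)
$$

where $\mathbf{z}^{0} \equiv \mathbf{x}$ and $\mathbf{z}^{L} \equiv \mathbf{y}$ are inputs and outputs of the model, and $\mathbf{W}^{\ell}, \mathbf{b}^{\ell}$ are parameters of each layer $\ell$, known as weights and biases, respectively. The functions $\sigma^{\ell}$ are called activation functions and make the network nonlinear with respect to the inputs. For instance, an ANN functional of some field variable, such as displacement $u(\mathbf{x})$, with three hidden layers and with $\tanh$ as the activation function for all layers except the last can be written as

$$
u(\mathbf{x}) = \mathbf{W}^{4} \tanh\left( \mathbf{W}^{3} \tanh\left( \mathbf{W}^{2} \tanh\left( \mathbf{W}^{1} \mathbf{x} + \mathbf{b}^{1} \right) + \mathbf{b}^{2} \right) + \mathbf{b}^{3} \right) + \mathbf{b}^{4}. \qquad (3)
$$

This model can be considered as an approximate solution for the field variable $u$ of a partial differential equation.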
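The nested feed-forward transformation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer widths, the random initialization, and the helper names `init_mlp` and `forward` are hypothetical, and a smooth `tanh` activation is assumed on every layer except the last, which is linear.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create (W, b) pairs for a dense network; sizes and scaling are illustrative."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Apply z = tanh(W z_prev + b) on hidden layers and a linear map on the last layer."""
    z = x
    for W, b in params[:-1]:
        z = np.tanh(z @ W.T + b)
    W, b = params[-1]
    return z @ W.T + b

# Two inputs (x, y), three hidden layers of 20 neurons, one output (e.g. one displacement).
params = init_mlp([2, 20, 20, 20, 1])
xy = np.array([[0.0, 0.0], [0.5, 0.25], [1.0, 1.0]])
u = forward(params, xy)
n_params = sum(W.size + b.size for W, b in params)
```

Here `u` holds one network output per input point, and `n_params` counts all weights and biases of this particular toy network.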

In the PINN architecture, the network inputs (also known as features) are space and time variables, i.e., $(x, y, z, t)$ in Cartesian coordinates, which makes it meaningful to differentiate the network’s output with respect to any of the input variables. Classical implementations based on finite difference approximations are not accurate when applied to deep networks (see Baydin et al. (2017) for a review). Thanks to modern graph-based implementations of the feed-forward network (e.g., Theano Bergstra et al. (2010), TensorFlow Abadi et al. (2016), MXNet Chen et al. (2015)), this can be carried out using Automatic Differentiation at machine precision, therefore allowing for many hidden layers to represent nonlinear response. Hence, evaluation of a partial differential operator $\mathcal{P}$ acting on $u$ is achieved naturally with graph-based differentiation and can then be incorporated in the cost function along with initial and boundary conditions as:


where is the domain boundary, is the initial condition at , and indicates the expected (true) value for the differential relation at any given training point. The norm of a generic quantity defined in denotes where the ’s are the spatial points where the data is known. The dataset is then fed to the neural network and an optimization is performed to evaluate all the parameters of the model, including the parameters of the PDE.
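To make the role of exact differentiation concrete, the sketch below differentiates a tiny one-hidden-layer `tanh` network with respect to its input using the chain rule written out by hand, which stands in here for what a graph-based automatic differentiation engine evaluates, and compares the result with central finite differences. The network size, the random weights, and the toy operator `u'' + 1` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
w1, b1 = rng.normal(size=8), rng.normal(size=8)  # hidden layer with 8 neurons
w2, b2 = rng.normal(size=8), 0.1                 # linear output layer

def u(x):
    """Scalar network output u(x) = w2 . tanh(w1 x + b1) + b2."""
    return w2 @ np.tanh(w1 * x + b1) + b2

def du_dx(x):
    """Exact first derivative from the chain rule."""
    t = np.tanh(w1 * x + b1)
    return w2 @ ((1.0 - t**2) * w1)

def d2u_dx2(x):
    """Exact second derivative from the chain rule."""
    t = np.tanh(w1 * x + b1)
    return w2 @ (-2.0 * t * (1.0 - t**2) * w1**2)

x0, h = 0.3, 1e-4
fd1 = (u(x0 + h) - u(x0 - h)) / (2.0 * h)            # central first difference
fd2 = (u(x0 + h) - 2.0 * u(x0) + u(x0 - h)) / h**2   # central second difference

# Residual of a toy differential relation P(u) = u'' + 1 at the training point x0;
# in a physics-informed loss, such residuals are penalized alongside the data misfit.
residual = d2u_dx2(x0) + 1.0
```

The finite-difference values agree with the exact derivatives only up to truncation and round-off error that depends on `h`, whereas the chain-rule expressions carry no step-size parameter at all.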

2.3 Training PINN

Different algorithms can be used to train a neural network. Among the choices available in Keras Chollet and others (2015), we use the Adam optimization scheme Kingma and Ba (2014), which we have found to outperform other choices, such as Adagrad Duchi et al. (2011), for this task. Several algorithmic parameters affect the rate of convergence of the network training. Here we adopt the terminology in Keras Chollet and others (2015), but the terminology in other modern machine learning packages is similar. The algorithmic parameters include batch-size, epochs, shuffle, and patience. Batch-size controls the number of samples from a dataset used to evaluate one gradient update. A batch-size of 1 corresponds to fully stochastic gradient descent. One epoch is one round of training on a dataset. If a dataset is shuffled, then a new round of training (epoch) results in an updated parameter set because the batched gradients are evaluated on different batches. It is common to re-shuffle a dataset many times and perform the back-propagation updates. This is mainly because we are dealing with non-convex optimization and we need to test the training from different starting points and in different directions to build confidence in the parameters evaluated from minimization of the cost function on a dataset. The optimizer may, however, stop earlier if it finds that new rounds of epochs are not improving the cost function. That is where the last keyword, patience, comes in: it is the parameter that controls when the optimizer should stop the training.

There are three ways to train the network: (1) generate a sufficiently large number of datasets and perform a one-epoch training on each dataset, (2) work on one dataset over many epochs by reshuffling the data, and (3) a combination of these. When dealing with synthetic data, all approaches are feasible to pursue. However, strategy (1) above is usually impossible to apply in practice, especially in space, where sensors are installed at fixed and limited locations. In the original work on PINN Raissi et al. (2019), approach (1) was used to train the model, where datasets are generated on random space discretizations at each epoch. Here, we follow approach (2) to use training data that we could realistically have in practice. For all examples, unless otherwise noted, we use a batch-size of 64, a limit of 10,000 epochs with shuffling, and a patience of 500 to perform the training.
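The interplay of batch-size, epochs, shuffling, and patience can be sketched with a plain training loop. The example below is a minimal illustration, not the Keras implementation: it fits a hypothetical linear model with ordinary mini-batch gradient descent (a simple stand-in for Adam), and the `train` function, learning rate, and improvement threshold are assumptions. The batch-size of 64, the limit of 10,000 epochs, and the patience of 500 follow the values quoted above.

```python
import numpy as np

def train(x, y, batch_size=64, max_epochs=10_000, patience=500, lr=0.05, seed=0):
    """Mini-batch gradient descent on y ~ w*x + b with shuffling and early stopping."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    best_loss, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        order = rng.permutation(len(x))              # re-shuffle the dataset each epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]    # one batch -> one gradient update
            err = w * x[idx] + b - y[idx]
            w -= lr * 2.0 * np.mean(err * x[idx])
            b -= lr * 2.0 * np.mean(err)
        loss = np.mean((w * x + b - y) ** 2)         # monitor the full-dataset loss
        if loss < best_loss - 1e-12:
            best_loss, best_epoch = loss, epoch      # improvement: reset the patience window
        elif epoch - best_epoch >= patience:
            break                                    # no improvement for `patience` epochs
    return w, b, epoch + 1

# Hypothetical noise-free data from the line y = 2x + 1.
x = np.linspace(0.0, 1.0, 256)
y = 2.0 * x + 1.0
w, b, epochs_run = train(x, y)
```

With these settings the loop stops through the patience criterion well before the epoch limit, once further epochs no longer reduce the monitored loss.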

3 Illustrative Example and Discussion

In this section, we use the PINN architecture on an illustrative linear elasticity problem.

3.1 Problem setup

To illustrate the application of the proposed approach, we consider an elastic plane-strain problem on the unit square (Fig. 1), subject to the boundary conditions depicted in the figure. The body forces are:


The exact solution of this problem is

Figure 1: Problem setup and boundary conditions.

which is plotted in Fig. 2, for parameter values of , , and .

Figure 2: Exact solution in Eqs. (6)–(7) for parameter values of , , and .

3.2 Neural Network setup

Due to the symmetry of the stress and strain tensors, the quantities of interest for a two-dimensional problem are $u_x$, $u_y$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$. There are a few potential architectures that we can use to design our network. The input features (variables) are the spatial coordinates $(x, y)$ for all the network choices. For the outputs, a potential design is to have a densely connected network with two outputs, $(u_x, u_y)$. Another option is to have two densely connected independent networks with only one output each, associated with $u_x$ and $u_y$, respectively (Fig. 3). Then, the remaining quantities of interest, i.e., $\sigma_{ij}$ and $\varepsilon_{ij}$, can be obtained through differentiation. Alternatively, we may have $(u_x, u_y, \sigma_{xx}, \sigma_{yy}, \sigma_{xy})$ or $(u_x, u_y, \sigma_{xx}, \sigma_{yy}, \sigma_{xy}, \varepsilon_{xx}, \varepsilon_{yy}, \varepsilon_{xy})$ as outputs of one network or multiple independent networks. As can be seen from Fig. 3, these choices affect the number of parameters of the network and how different quantities of interest are correlated. Equation (3) shows that the feed-forward neural network imposes a special functional form on the network that may not necessarily follow any cross-dependence between variables in the governing equations (1). Our data shows that using separate networks for each variable results in a far more effective strategy. Therefore, we propose to have variables defined as independent ANNs as our architecture of choice (see Fig. 4), i.e.,

$$
\begin{aligned}
u_x(x, y) &\simeq \mathcal{N}_{u_x}(x, y), &
u_y(x, y) &\simeq \mathcal{N}_{u_y}(x, y), \\
\sigma_{xx}(x, y) &\simeq \mathcal{N}_{\sigma_{xx}}(x, y), &
\sigma_{yy}(x, y) &\simeq \mathcal{N}_{\sigma_{yy}}(x, y), \\
\sigma_{xy}(x, y) &\simeq \mathcal{N}_{\sigma_{xy}}(x, y).
\end{aligned}
\qquad (8)
$$

Figure 3: Potential PINN network choices, with $u_x$ and $u_y$ as outputs of a single network (left), or outputs of two independent networks with different parameters (right).

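The cost function is defined as

$$
\begin{aligned}
\mathcal{L} ={}& \left| u_x - u_x^* \right| + \left| u_y - u_y^* \right| \\
&+ \left| \sigma_{xx} - \sigma_{xx}^* \right| + \left| \sigma_{yy} - \sigma_{yy}^* \right| + \left| \sigma_{xy} - \sigma_{xy}^* \right| \\
&+ \left| \sigma_{xx,x} + \sigma_{xy,y} + f_x^* \right| + \left| \sigma_{xy,x} + \sigma_{yy,y} + f_y^* \right| \\
&+ \left| (\lambda + 2\mu)\,\varepsilon_{xx} + \lambda\,\varepsilon_{yy} - \sigma_{xx} \right| + \left| (\lambda + 2\mu)\,\varepsilon_{yy} + \lambda\,\varepsilon_{xx} - \sigma_{yy} \right| + \left| 2\mu\,\varepsilon_{xy} - \sigma_{xy} \right|.
\end{aligned}
\qquad (9)
$$

The quantities with asterisks represent given data. We will train the networks so that their output values are as close as possible to the data, which may be real field data or, in this paper, synthetic data from the exact solution to the problem or the result of a high-fidelity simulation. The values without asterisk represent either direct outputs of the network (e.g., $u_x$ or $\sigma_{xx}$; see Eq. (8)) or quantities obtained through automatic graph-based differentiation Baydin et al. (2017) of the network outputs (e.g., $\varepsilon_{xx} = u_{x,x}$). In Eq. (9), $f_x^*$ and $f_y^*$ represent data on the body forces obtained as $f_i^* = -\sigma_{ij,j}^*$.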

The different terms in the cost function represent measures of the error in the displacement and stress fields, the momentum balance, and the constitutive law. This cost function can be used for deep-learning-based solution of PDEs as well as for identification of the model parameters. For the solution of PDEs, $\lambda$ and $\mu$ are treated as fixed numbers in the network. For parameter identification, $\lambda$ and $\mu$ are treated as network parameters that change during the training phase (see Fig. 4). In TensorFlow Abadi et al. (2016) this can be accomplished by defining $\lambda$ and $\mu$ as Constant (PDE solution) or Variable (parameter identification) objects, respectively. We set up the problem using the SciANN Haghighat and Juanes (2019) framework, a high-level Keras Chollet and others (2015) wrapper for physics-informed deep learning and scientific computations. Experimenting with all of the previously mentioned network choices can be easily done in SciANN with minimal coding. (The code for some of the examples solved here is available at: https://github.com/sciann/examples.)

Figure 4: Network architecture of choice used in this study. We define five networks, one for each variable of interest, i.e., $u_x$, $u_y$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$. Each network has $(x, y)$ as input features.
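The idea of treating the Lamé parameters as trainable quantities can be illustrated in isolation from the networks. The sketch below is a simplified stand-in, not the SciANN setup: it takes hypothetical strain and stress samples, uses only a mean-square plane-strain constitutive misfit as the loss, and updates `lam` and `mu` by plain gradient descent with hand-derived gradients. The "true" values, the step size, and the starting guess are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true, mu_true = 1.0, 0.5   # hypothetical values used only to generate the data

# Synthetic plane-strain "data": small random strains and the matching stresses.
eps_xx, eps_yy, eps_xy = rng.normal(scale=1e-2, size=(3, 200))
sig_xx = (lam_true + 2 * mu_true) * eps_xx + lam_true * eps_yy
sig_yy = (lam_true + 2 * mu_true) * eps_yy + lam_true * eps_xx
sig_xy = 2 * mu_true * eps_xy

def constitutive_loss_and_grad(lam, mu):
    """Mean-square constitutive residual and its gradient with respect to (lam, mu)."""
    r_xx = (lam + 2 * mu) * eps_xx + lam * eps_yy - sig_xx
    r_yy = (lam + 2 * mu) * eps_yy + lam * eps_xx - sig_yy
    r_xy = 2 * mu * eps_xy - sig_xy
    loss = np.mean(r_xx**2 + r_yy**2 + r_xy**2)
    tr = eps_xx + eps_yy
    g_lam = np.mean(2 * r_xx * tr + 2 * r_yy * tr)
    g_mu = np.mean(4 * r_xx * eps_xx + 4 * r_yy * eps_yy + 4 * r_xy * eps_xy)
    return loss, g_lam, g_mu

lam, mu = 0.1, 0.1   # initial guess for the trainable parameters
lr = 300.0           # large step because the strains (and hence gradients) are small
for _ in range(2000):
    loss, g_lam, g_mu = constitutive_loss_and_grad(lam, mu)
    lam -= lr * g_lam
    mu -= lr * g_mu
```

Freezing `lam` and `mu` at known values instead of updating them corresponds to the forward (PDE solution) mode described above.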

3.3 Identification of model parameters: PINN trained on the exact solution

Here, we use PINN to identify the model parameters $\lambda$ and $\mu$. Our data corresponds to the exact solution with parameter values , and . Our default dataset consists of 100 × 100 sample points, uniformly distributed. We study how the accuracy and the efficiency of the identification process depend on the architecture and functional form of the network; the available data; and whether we use one or several independent networks for the different quantities of interest. To study the impact of the architecture and functional form of the ANN, we use 4 different networks with either 5 or 10 hidden layers, and either 20 or 50 neurons per layer; see Table 1. The role of the network functional form is studied by comparing the performance of the two most widely used activation functions, i.e., $\tanh$ and ReLU, where $\mathrm{ReLU}(x) = \max(0, x)$ Bishop (2006).

Studying the impact of the available data on the identification process is crucial because we are interested in identifying the model parameters with as little data as possible. We undertake the analysis considering two scenarios:

  1. Stress-complete data: In this case, we have data at a set of points for the displacements and their first-order derivatives, that is, , , , , . Because our cost function (9) involves also data that depends on the stress derivatives ( and ), this approach relies on an additional algorithmic procedure for differentiation of stresses. In this section we compute the stress derivatives using second-order central finite-difference approximations.

  2. Force-complete data: In this scenario, we have data at a set of points for the displacements, their first derivatives and their second derivatives. The availability of the displacement second derivatives allows us to determine data for the body forces and using the momentum balance equation without resorting to any differentiation algorithm.
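For the stress-complete scenario, the body-force data can be recovered from gridded stress values with central finite differences and the momentum balance. The sketch below shows one way to do this with `numpy.gradient`, which uses second-order central differences at interior points. The stress field is a hypothetical smooth function chosen so that the exact divergence is known; it is not the solution used in the paper.

```python
import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, y, indexing="ij")   # axis 0 is x, axis 1 is y

# Hypothetical smooth stress field standing in for the stress data.
sig_xx = np.sin(np.pi * X) * np.cos(np.pi * Y)
sig_yy = np.cos(np.pi * X) * np.sin(np.pi * Y)
sig_xy = X**2 * Y

# Second-order central differences in the interior, second-order one-sided at the edges.
dsxx_dx = np.gradient(sig_xx, x, axis=0, edge_order=2)
dsxy_dy = np.gradient(sig_xy, y, axis=1, edge_order=2)
dsxy_dx = np.gradient(sig_xy, x, axis=0, edge_order=2)
dsyy_dy = np.gradient(sig_yy, y, axis=1, edge_order=2)

# Momentum balance sigma_ij,j + f_i = 0 gives the body-force data.
f_x = -(dsxx_dx + dsxy_dy)
f_y = -(dsxy_dx + dsyy_dy)

# Exact divergence of the hypothetical field, for comparison only.
f_x_exact = -(np.pi * np.cos(np.pi * X) * np.cos(np.pi * Y) + X**2)
f_y_exact = -(2.0 * X * Y + np.pi * np.cos(np.pi * X) * np.cos(np.pi * Y))
max_err = max(np.abs(f_x - f_x_exact).max(), np.abs(f_y - f_y_exact).max())
```

The remaining discrepancy `max_err` is the discretization error of the differencing step, which enters the training data as a form of noise that the force-complete scenario avoids.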

Network   Layers   Neurons   Number of parameters
                             Independent networks   Single network
i         5        20        12336                  1893
ii        5        50        72816                  10713
iii       10       20        27036                  3993
iv        10       50        162066                 23463

Table 1: Statistics of the networks of choice to perform PINN learning.

In Fig. 5 we compare the evolution of the cost function for stress-complete data (Fig. 5a) and force-complete data (Fig. 5b). Both figures show a comparison of the four network architectures that we study; see Table 1. We find that training on the force-complete data performs slightly better (lower loss) at a given epoch.

The result of convergence of model identification is shown in Fig. 6. The training converges to the true values of the parameters $\lambda$ and $\mu$ for all cases. We find that the optimization is very quick on the parameters while it takes far more epochs to fit the network on the field variables. Additionally, we observe that deeper networks produce less accurate parameters. We attribute the loss of accuracy as we increase the ANN complexity to over-fitting Bishop (2006); Goodfellow et al. (2016). Convergence of the individual terms in the loss function (9) is shown in Fig. 7 for Net-ii (see Table 1). We find that all terms in the loss, i.e., data-driven and physics-informed, show oscillations during the optimization. Therefore, no individual term is solely responsible for the oscillations in the total loss (Fig. 5).

Figure 5: The result of training networks i, ii, iii, and iv on the analytical data set , , , , and ; (a) body forces are evaluated from central-difference differentiation of stress components, (b) body forces are also given analytically.
Figure 6: The result of identification for for networks i, ii, iii, and iv on the analytical data set , , , , and ; (a) body forces are evaluated from central-difference differentiation of stress components, (b) body forces are also given analytically.
Figure 7: Individual terms of total loss (9) for network ii on the analytical data set , , , , and ; (a) body forces are evaluated from central-difference differentiation of stress components, (b) body forces are also given analytically.

The impact of the ANN functional form can be examined by comparing the data in Figs. 5b and 8a, which show the evolution of the cost function using the activation functions $\tanh$ and ReLU, respectively. The function ReLU has discontinuous derivatives, which explains its poor performance for physics-informed deep learning, whose effectiveness relies heavily on accurate evaluation of derivatives.
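The effect of the activation function on derivatives can be seen directly by differentiating the two functions twice. The short sketch below, an illustration on an assumed one-dimensional grid, shows that the second derivative of ReLU vanishes everywhere away from the kink at zero, while the second derivative of `tanh` is smooth and nonzero, so it can carry information in second-order terms such as the divergence of stress.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 401)
h = x[1] - x[0]

def second_derivative(f):
    """Central second difference at the interior grid points."""
    return (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2

relu = np.maximum(0.0, x)
tanh = np.tanh(x)

d2_relu = second_derivative(relu)
d2_tanh = second_derivative(tanh)
d2_tanh_exact = -2.0 * tanh[1:-1] * (1.0 - tanh[1:-1] ** 2)

# Mask that excludes the immediate neighborhood of the ReLU kink at x = 0.
away_from_kink = np.abs(x[1:-1]) > 2.0 * h
```

A network built only from piecewise-linear activations therefore has second derivatives with respect to its inputs that are zero almost everywhere, whatever its weights.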

A comparison of Figs. 5b and 8b shows that using independent networks for displacements and stresses is more effective than using a single network. We find that the single network leads to less accurate elastic parameters because the cross-dependencies of the network outputs through the kinematic and constitutive relations may not be adequately represented by the activation function.

Figure 8: (a) ReLU activation function on the analytical data set , , , , , , and . (b) Connected network.

Fig. 9 analyzes the effect of availability of data on the training. We computed the exact solution on four different uniform grids of size , , , and ; and carried out the parameter identification process. We performed the comparison using force-complete data and a network with 10 layers and 20 neurons per layer (network iii). The training process found good approximations to the parameters for all cases, including that with only  points. The results show that fewer data points require many more epoch cycles, but the overall computational cost is far lower.

Figure 9: Training on different sizes of data. All parameters are accurately identified; however, the accuracy of the solution varies with the size of the dataset.

3.4 PINN models trained on the FEM solution

Here, we generate synthetic data from FEM solutions, and then perform the training. The domain is discretized with a mesh comprised of elements. Four datasets are prepared using quadrilateral bilinear, biquadratic, bicubic, and biquartic Lagrange elements using the commercial FEM software COMSOL COMSOL (2020). We evaluate the FEM displacements, strains, stresses and stress derivatives at the center of each element. Then, we map the data to a training grid using SciPy’s griddata function with cubic interpolation. This step is performed as a data-augmentation procedure, which is a common practice in machine learning Bishop (2006).
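The mapping from element-center values to a regular training grid can be sketched with `scipy.interpolate.griddata` and `method="cubic"`. Everything specific in this example is an assumption: the 20 × 20 element mesh, the 50 × 50 target grid, and the analytic field that stands in for an FEM result. The target grid is kept strictly inside the convex hull of the element centers, because cubic `griddata` returns NaN outside it.

```python
import numpy as np
from scipy.interpolate import griddata

n_el = 20                                        # hypothetical number of elements per side
centers_1d = (np.arange(n_el) + 0.5) / n_el      # element-center coordinates on [0, 1]
Xc, Yc = np.meshgrid(centers_1d, centers_1d, indexing="ij")
points = np.column_stack([Xc.ravel(), Yc.ravel()])

# Stand-in for an FEM field sampled at the element centers.
values = np.sin(np.pi * points[:, 0]) * np.cos(np.pi * points[:, 1])

# Regular training grid, strictly inside the convex hull of the centers.
g = np.linspace(0.05, 0.95, 50)
Xg, Yg = np.meshgrid(g, g, indexing="ij")
u_grid = griddata(points, values, (Xg, Yg), method="cubic")

# Interpolation error against the known stand-in field.
err = np.abs(u_grid - np.sin(np.pi * Xg) * np.cos(np.pi * Yg)).max()
```

The same call can be repeated for each field (displacements, stresses, and stress derivatives) to assemble the training set on a common grid.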

To analyze the importance of data satisfying the governing equations of the system, we focus our attention on network ii and we study cases with stress- and force-complete data. The results of training are presented in Fig. 10. As can be seen here, the bilinear element performs poorly on the learning and identification. The performance of training on the other elements is good, comparable to that using the analytical solution. Further analysis shows that this is indeed expected as FEM differentiation of bilinear elements provides a poor approximation of the body forces. The error in the body forces is shown in Fig. 11, which indicates a high error for bilinear elements. We conclude that the standard bilinear elements are not suitable for this problem to generate numerical data for deep learning. Fig. 10a2 confirms that pre-processing the data can remove the error that was present in the numerical solution with bilinear elements, and enable the optimization to successfully complete the identification.

Figure 10: (a1) Training on the FEM dataset using , , , , , and components. (a2) Training with body forces and evaluated from central-differentiation of stress components.
Figure 11: The error in bilinear, biquadratic, bicubic, and biquartic FEM data, that is evaluated as the difference between FEM evaluation of momentum relation, i.e., and true body forces in (top) and (bottom) directions.

3.5 PINN models trained on the IGA solution

Observing the lowest loss on the analytical solution, we decided to study the influence of the global continuity of the numerical solution. We generated a -continuous dataset using Isogeometric analysis Bazilevs et al. (2010). We, therefore, analyze the system using IGA elements with again a grid of dimension. The data are then mapped on to a grid of and used to train the PINN models. The training results are shown in Fig. 12. The outputs are very similar to the high-order FEM datasets.

Figure 12: (a1) Training on the IGA dataset using , , , , , and components. (a2) Learning with body forces and evaluated from central differentiation of stress components.

3.6 Identification using transfer learning

Here we explore the applicability of our PINN framework to transfer learning: a neural network that is pre-trained is used to perform identification on a new dataset. The expectation is that since the initial state of neural network is not randomly chosen anymore, training should converge faster to the solution and parameters of the data. This is crucial for many practical aspects including adaptation to new data for online search or purchase history Taylor and Stone (2009) or in geosciences, where we can train a representative PINN in highly-instrumented regions and use them at other locations with limited observational datasets. To this end, we use the pre-trained model on Net-iii (Fig. 5), which was trained on a dataset with and and then we explore how the loss evolves and the training converges when data is generated with different values of .

In Fig. 13 we show the convergence of the model with different datasets. Note that the loss is normalized by the initial value from the pre-trained network on (Fig. 5). As can be seen here, re-training on new datasets costs only a few hundred epochs with a smaller initial value for the loss. This is pointing to the advantage of deep learning and PINN, where retraining on similar data is much less costly than classical methods that rely on forward simulations.

Figure 13: Identification of a new dataset generated with different values of using a pre-trained neural network on . The re-training takes far fewer epochs to converge, with a much smaller initial value of the loss.
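The benefit of starting from a pre-trained state can be illustrated with a deliberately simple stand-in for a network: a linear least-squares model trained by gradient descent. In the sketch below, all names, sizes, and parameter values are hypothetical. The model is first fitted to an "old" dataset, and then re-fitted to a "new" dataset generated with one slightly different parameter, once from zeros (cold start) and once from the pre-trained parameters (warm start).

```python
import numpy as np

def fit(A, y, theta0, lr=0.1, tol=1e-10, max_iter=100_000):
    """Gradient descent on the mean-square misfit; returns parameters and iteration count."""
    theta = theta0.copy()
    for it in range(max_iter):
        r = A @ theta - y
        if np.mean(r**2) < tol:
            return theta, it
        theta -= lr * 2.0 * A.T @ r / len(y)
    return theta, max_iter

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))                                   # hypothetical features
theta_old = np.array([1.0, 0.5, -2.0, 0.3, 4.0])                # parameters behind the old data
theta_new = theta_old + np.array([0.0, 0.1, 0.0, 0.0, 0.0])     # one parameter changed

pretrained, _ = fit(A, A @ theta_old, np.zeros(5))              # "pre-training"
_, iters_cold = fit(A, A @ theta_new, np.zeros(5))              # re-train from scratch
_, iters_warm = fit(A, A @ theta_new, pretrained)               # re-train from pre-trained state
```

The warm start begins with a much smaller loss and reaches the tolerance in fewer iterations, which mirrors qualitatively the behavior reported for the re-trained PINN; the size of the saving in this toy problem says nothing about the size of the saving for a deep network.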

3.7 Application to sensitivity analysis

Performing sensitivity analysis is an expensive task when the analytical solution is not available, since it requires performing many forward numerical simulations. Alternatively, if we can construct a surrogate model to be a function of parameters of interest, then performing sensitivity analysis becomes tractable. However, construction of such a surrogate model is itself an expensive task within classical frameworks. Within PINN, however, this seems to be naturally possible. Let us suppose that the parameter of interest is the shear modulus $\mu$. Consider an ANN model with inputs $(x, y, \mu)$ and outputs $(u_x, u_y, \sigma_{xx}, \sigma_{yy}, \sigma_{xy})$. We can, therefore, use a similar framework to construct a model that is a function of $\mu$ in addition to the space variables. Again, PINN constrains the model to adapt to the physics of interest, and therefore less data is needed to construct such a model.

Here, we explore if a PINN model trained on multiple datasets generated with various material parameters, i.e., different values of , can be used as a surrogate model to perform sensitivity analysis. The network in Fig. 4 is now slightly adapted to carry as an extra input (in addition to ). The training set is prepared based on and . Note that there is no identification in this case, and therefore the parameters and are known at any given training data. The results of the analysis are shown in Fig. 14. For a wide range of values of , the model performs very well in terms of displacements; it is less accurate, but still very useful, in terms of stresses with a maximum error for near-incompressible conditions, .

Figure 14: Application to sensitivity analysis: the model is trained on multiple datasets generated with different values of (highlighted in dot-dashed lines). The model is then tested on a continuous range of values for . The error is defined as at the point where is maximum.

4 Extension to Nonlinear Elastoplasticity

In this section, we discuss the application of PINN to nonlinear solid mechanics problems undergoing elastic and plastic deformation. We use the von Mises elastoplastic constitutive model—a commonly used model to describe mechanical behavior of solid materials, in particular metals. We first describe the extension of the linear-elasticity relations in Eq. (1) to the von Mises elastoplastic relations. We then discuss the neural-network setup and apply the PINN framework to identify parameters of a classic problem: a perforated strip subjected to uniaxial extension.

4.1 von Mises elastoplasticity

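We adopt the classic elastoplasticity postulate of additive decomposition of the strain tensor Simo and Hughes (1998),

$$
\varepsilon_{ij} = \varepsilon_{ij}^{e} + \varepsilon_{ij}^{p}.
$$

The stress tensor is now linearly dependent on the elastic strain tensor:

$$
\sigma_{ij} = \lambda \delta_{ij} \varepsilon_{kk}^{e} + 2 \mu \varepsilon_{ij}^{e}.
$$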
The plastic part of deformation tensor is evaluated through a plasticity model. The von Mises model implies that the plastic deformation occurs in the direction of normal to a yield surface defined as , as


where  is the yield stress,  is the equivalent stress defined as , with the components of the deviatoric stress tensor, . The strain remains strictly elastic as long as the state of stress  remains inside the yield surface, . Plastic deformation occurs when the state of stress is on the yield surface, . The condition is associated with an inadmissible state of stress. Parameter  is the plastic multiplier, subject to the condition , and evaluated through a predictor–corrector algorithm by imposing the condition  Simo and Hughes (1998). In the case of von Mises plasticity, the volumetric plastic deformation is zero, . It can be shown that the plastic multiplier  is equal to the equivalent plastic strain , where are the components of deviatoric strain tensor, .

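Therefore, writing $e_{ij} = \varepsilon_{ij} - \tfrac{1}{3}\varepsilon_{kk}\delta_{ij}$ for the deviatoric part of the total strain, the elastoplastic relations for a plane-strain problem can be summarized as:

$$
\begin{aligned}
\sigma_{ij} &= s_{ij} + \tfrac{1}{3}\sigma_{kk}\,\delta_{ij}, \\
\tfrac{1}{3}\sigma_{kk} &= \left(\lambda + \tfrac{2}{3}\mu\right)\varepsilon_{kk}, \\
s_{ij} &= 2\mu\left(e_{ij} - e^p_{ij}\right), \\
e^p_{ij} &= \varepsilon^p_{ij} = \gamma\,\frac{3}{2}\,\frac{s_{ij}}{q}, \\
\varepsilon_{zz} &= \varepsilon_{xz} = \varepsilon_{yz} = 0,
\end{aligned}
$$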

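subject to the elastoplasticity conditions:

$$\gamma \ge 0, \qquad F \le 0, \qquad \gamma F = 0,$$

also known as the Karush–Kuhn–Tucker (KKT) conditions. For the von Mises model under plastic loading ($F = 0$), the plastic multiplier can be expressed as

$$\gamma = \bar{\varepsilon}^p = \bar{\varepsilon} - \frac{\sigma_Y}{3\mu} \ge 0,$$

where $\bar{\varepsilon}$ is the total equivalent strain, i.e., $\bar{\varepsilon} = \sqrt{\tfrac{2}{3}\, e_{ij} e_{ij}}$, with $e_{ij}$ the components of the deviatoric total strain tensor. Therefore, the parameters of this model are the Lamé elastic parameters $\lambda$ and $\mu$, and the yield stress $\sigma_Y$.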
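To make the closed-form multiplier concrete, the sketch below evaluates the stress for a prescribed total strain. It assumes monotonic loading and perfect plasticity, so that $\gamma = \max\left(0,\ \bar{\varepsilon} - \sigma_Y/(3\mu)\right)$ and, combining $s_{ij} = 2\mu(e_{ij} - e^p_{ij})$ with the flow rule, $e^p_{ij} = \gamma\, e_{ij}/\bar{\varepsilon}$. The function names and material values are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import math

def deviator(t):
    """Deviatoric part of a symmetric 3x3 tensor given as nested lists."""
    mean = (t[0][0] + t[1][1] + t[2][2]) / 3.0
    return [[t[i][j] - (mean if i == j else 0.0) for j in range(3)]
            for i in range(3)]

def norm_sq(t):
    """Double contraction t_ij * t_ij."""
    return sum(t[i][j] ** 2 for i in range(3) for j in range(3))

def elastoplastic_state(strain, lam, mu, sigma_y):
    """Return (stress, plastic strain, gamma) for a total strain tensor."""
    eps_v = strain[0][0] + strain[1][1] + strain[2][2]   # volumetric strain
    e = deviator(strain)
    eps_bar = math.sqrt(2.0 / 3.0 * norm_sq(e))          # total equivalent strain
    gamma = max(0.0, eps_bar - sigma_y / (3.0 * mu))     # plastic multiplier
    ratio = gamma / eps_bar if gamma > 0.0 else 0.0
    eps_p = [[ratio * e[i][j] for j in range(3)] for i in range(3)]
    bulk = lam + 2.0 * mu / 3.0                          # bulk modulus
    stress = [[2.0 * mu * (e[i][j] - eps_p[i][j])
               + (bulk * eps_v if i == j else 0.0) for j in range(3)]
              for i in range(3)]
    return stress, eps_p, gamma

# Hypothetical material values (not from the paper), in consistent units.
lam, mu, sigma_y = 20.0, 30.0, 0.25

# Plane-strain state: all out-of-plane total strain components are zero.
strain = [[0.010, 0.0, 0.0], [0.0, -0.002, 0.0], [0.0, 0.0, 0.0]]
stress, eps_p, gamma = elastoplastic_state(strain, lam, mu, sigma_y)

# Equivalent stress of the result; it sits on the yield surface when gamma > 0.
q = math.sqrt(1.5 * norm_sq(deviator(stress)))
```

In this sketch the out-of-plane plastic strain `eps_p[2][2]` is nonzero even though the total out-of-plane strain vanishes, and the returned stress satisfies $q = \sigma_Y$ whenever `gamma` is positive.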

4.2 Neural Network setup

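The solution variables for a two-dimensional problem are $u_x$, $u_y$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, $\varepsilon^p_{xx}$, $\varepsilon^p_{yy}$, $\varepsilon^p_{zz}$, $\varepsilon^p_{xy}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{zz}$, $\sigma_{xy}$. Since the out-of-plane components are no longer zero, they must be reflected in the choice of independent networks. Following the discussion of the linear elasticity case, we approximate the displacement and stress components with nonlinear neural networks as:

$$
\begin{aligned}
u_x(x, y) &\simeq \mathcal{N}_{u_x}(x, y), & u_y(x, y) &\simeq \mathcal{N}_{u_y}(x, y), \\
\sigma_{xx}(x, y) &\simeq \mathcal{N}_{\sigma_{xx}}(x, y), & \sigma_{yy}(x, y) &\simeq \mathcal{N}_{\sigma_{yy}}(x, y), \\
\sigma_{zz}(x, y) &\simeq \mathcal{N}_{\sigma_{zz}}(x, y), & \sigma_{xy}(x, y) &\simeq \mathcal{N}_{\sigma_{xy}}(x, y).
\end{aligned}
$$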

The associated cost function is then defined as


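The KKT positivity and negativity conditions are imposed through a penalty constraint in the loss function. For instance, the positivity of the plastic multiplier, $\gamma \ge 0$, is incorporated in the loss as $\mathrm{ReLU}(-\gamma) = \max(0, -\gamma)$. Therefore, for values of $\gamma < 0$, the resulting ‘cost’ is $-\gamma$, which should vanish.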
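The penalty idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: it writes the three KKT conditions for a plastic multiplier `gamma` and a yield-function value `f` as residuals that vanish when the conditions hold, and aggregates them as a mean of squares. The names `kkt_penalty` and `mean_squared_penalty`, and the choice of a squared aggregation, are hypothetical and not taken from the paper.

```python
def relu(x):
    """Rectified linear unit, max(0, x)."""
    return max(0.0, x)

def kkt_penalty(gamma, f):
    """Residuals for gamma >= 0, F <= 0 and gamma * F = 0 at one sample point."""
    positivity = relu(-gamma)         # nonzero only when gamma < 0
    negativity = relu(f)              # nonzero only when F > 0
    complementarity = abs(gamma * f)  # nonzero when both are active at once
    return positivity, negativity, complementarity

def mean_squared_penalty(samples):
    """Mean of the squared residuals over (gamma, F) pairs (assumed aggregation)."""
    total = 0.0
    for gamma, f in samples:
        total += sum(term ** 2 for term in kkt_penalty(gamma, f))
    return total / len(samples)

# Admissible states contribute nothing to the loss.
elastic_point = kkt_penalty(0.0, -10.0)   # inside the yield surface
plastic_point = kkt_penalty(0.01, 0.0)    # on the yield surface

# A negative multiplier is penalized by its magnitude.
violating_point = kkt_penalty(-0.02, -5.0)

loss = mean_squared_penalty([(0.0, -10.0), (0.01, 0.0), (-0.02, -5.0)])
```

In an actual PINN the same residuals would be built from differentiable tensor operations so that gradients can flow back to the network weights; plain floats are used here only to keep the sketch self-contained.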

4.3 Illustrative example

We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension Zienkiewicz et al. (1969); Simo and Hughes (1998). Consider a plate of dimensions , with a circular hole of diameter located in the center of the plate. The plate is subjected to extension displacements of along the short edge, under plane-strain condition, and without body forces, . The parameters are , and . Due to symmetry, only a quarter of the domain needs to be considered in the simulation. The synthetic data is generated from a high-fidelity FEM simulation using COMSOL software COMSOL (2020) on a mesh of 13041 quartic triangular elements (Fig. 15). The plate undergoes significant plastic deformation around the circular hole, as can be seen from contours in Fig. 15. This results in localized deformation in the form of a shear band. While the strain exhibits localization, the stress field remains continuous and smooth—a behavior that is due to the choice of a perfect-plasticity model with no hardening.

We use 2,000 data points from this reference solution, randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used for the linear elasticity problem. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 16); and (2) the precise identification of yield stress and relatively accurate identification of elastic parameters and , yielding estimated values , and .
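For orientation, the network size quoted above can be illustrated with a small standard-library sketch. It assumes that "4 layers" means four hidden layers, that each network takes the coordinates $(x, y)$ as input and returns one scalar field, and that one independent network is used per displacement and stress component; the Glorot-style initialization and all names are hypothetical choices for this example, not the paper's implementation.

```python
import math
import random

def init_network(sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / (n_in + n_out))  # Glorot-style scale (assumed)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, inputs):
    """Evaluate the network: tanh on hidden layers, linear output layer."""
    values = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        values = [sum(w * v for w, v in zip(row, values)) + b
                  for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            values = [math.tanh(v) for v in values]
    return values

def count_parameters(layers):
    """Total number of weights and biases in one network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
sizes = [2, 100, 100, 100, 100, 1]  # (x, y) -> 4 hidden layers -> one field

# One independent network per approximated field (assumed set of fields).
field_names = ["u_x", "u_y", "sigma_xx", "sigma_yy", "sigma_zz", "sigma_xy"]
networks = {name: init_network(sizes, rng) for name in field_names}

# Evaluate every field at a single point of the domain.
prediction = {name: forward(net, [0.3, 0.7])[0] for name, net in networks.items()}
n_params = count_parameters(networks["u_x"])
```

Under these assumptions each network has 2·100 + 100 parameters in the first layer, 3·(100·100 + 100) in the remaining hidden layers and 100 + 1 in the output layer, for a total of 30,701 trainable parameters per field.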

Figure 15: Reference solution of extension loading of a perforated plate from a high-fidelity FEM simulation. The true parameters are , and .
Figure 16: Error in predicted values from the PINN framework for displacements, strains, plastic strains and stresses.

5 Conclusions

We study the application of a class of deep learning, known as Physics-Informed Neural Networks (PINN), for solution and discovery in solid mechanics. In this work, we formulate and apply the framework to a linear elastostatics problem, which we analyze in detail, and then illustrate the application of the method to nonlinear elastoplasticity. We study the sensitivity of the proposed framework to noise in data coming from different numerical techniques. We find that the optimizer performs much better on data from high-order classical finite elements, or from methods with enhanced continuity such as Isogeometric Analysis. We analyze the impact of the size and depth of the network, and of the size of the dataset obtained by uniform sampling of the numerical solution, an aspect that is important in practice given the cost of a dense monitoring network. We find that the proposed PINN approach is able to converge to the solution and identify the parameters quite efficiently with as few as 100 data points.

We also explore transfer learning, that is, the use of a pre-trained neural network to perform training on new datasets with different parameters. We find that training converges much faster when this is done. Lastly, we study the applicability of the model as a surrogate model for sensitivity analysis. To this end, we introduce the shear modulus $\mu$ as an input variable to the network. When training on only four values of $\mu$, we find that the network predicts the solution quite accurately over a wide range of values of $\mu$, a feature that is indicative of the robustness of the approach.

Despite the success exhibited by the PINN approach, we have found that it faces challenges when dealing with problems with discontinuous solutions. The network architecture is less accurate on problems with localized high gradients that result from discontinuities in the material properties or boundary conditions. We find that, in those cases, the results are artificially diffuse where they should be sharp. We speculate that the underlying reason for this behavior is the particular architecture of the network, where the input variables are only the spatial coordinates ($x$ and $y$), leaving the network unable to produce the variability needed for gradient-based optimization to capture solutions with high gradients. Addressing this limitation is an exciting avenue for future work in machine-learning applications to solid mechanics.


Acknowledgments

This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale Reservoir Science’.

References
  • M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng (2016) TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, pp. 265–283.
  • Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner (2019) Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences 116 (31), pp. 15344–15349. https://www.pnas.org/content/116/31/15344.full.pdf
  • A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind (2017) Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research 18 (1), pp. 5595–5637.
  • Y. Bazilevs, V. M. Calo, J. A. Cottrell, J. A. Evans, T. J. R. Hughes, S. Lipton, M. A. Scott, and T. W. Sederberg (2010) Isogeometric analysis using T-splines. Computer Methods in Applied Mechanics and Engineering 199 (5-8), pp. 229–263.
  • K. J. Bergen, P. A. Johnson, M. V. de Hoop, and G. C. Beroza (2019) Machine learning for data-driven discovery in solid earth geoscience. Science 363 (6433). https://science.sciencemag.org/content/363/6433/eaau0323.full.pdf
  • J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio (2010) Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), Vol. 4.
  • C. M. Bishop (2006) Pattern recognition and machine learning. Springer-Verlag, Berlin, Heidelberg.
  • M. P. Brenner, J. D. Eldredge, and J. B. Freund (2019) Perspective on machine learning for advancing fluid mechanics. Physical Review Fluids 4 (10), pp. 100501.
  • S. L. Brunton and J. N. Kutz (2019) Methods for data-driven multiscale model discovery for materials. Journal of Physics: Materials 2 (4), pp. 044002.
  • S. L. Brunton, B. R. Noack, and P. Koumoutsakos (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52 (1), pp. 477–508.
  • K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh (2018) Machine learning for molecular and materials science. Nature 559 (7715), pp. 547–555.
  • T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang (2015) MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274.
  • F. Chollet et al. (2015) Keras.
  • COMSOL (2020) COMSOL Multiphysics user’s guide. COMSOL, Stockholm, Sweden.
  • J. A. Cottrell, T. J. Hughes, and Y. Bazilevs (2009) Isogeometric analysis: toward integration of CAD and FEA. John Wiley & Sons.
  • P. M.R. DeVries, F. Viégas, M. Wattenberg, and B. J. Meade (2018) Deep learning of aftershock patterns following large earthquakes. Nature 560 (7720), pp. 632–634.
  • J. Duchi, E. Hazan, and Y. Singer (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12 (Jul), pp. 2121–2159.
  • J. Ghaboussi and D. Sidarta (1998) New nested adaptive neural networks (NANN) for constitutive modeling. Computers and Geotechnics 22 (1), pp. 29–52.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press.
  • E. Haghighat and R. Juanes (2019) SciANN: a Keras wrapper for scientific computations and physics-informed deep learning using artificial neural networks. https://sciann.com
  • J. Han, A. Jentzen, and W. E (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510.
  • T.J.R. Hughes, J.A. Cottrell, and Y. Bazilevs (2005) Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering 194 (39), pp. 4135–4195.
  • S. R. Kalidindi, S. R. Niezgoda, and A. A. Salem (2011) Microstructure informatics using higher-order statistics and efficient data-mining protocols. JOM 63 (4), pp. 34–41.
  • D. P. Kingma and J. Ba (2014) Adam: A method for stochastic optimization. arXiv:1412.6980.
  • Q. Kong, D. T. Trugman, Z. E. Ross, M. J. Bianco, B. J. Meade, and P. Gerstoft (2018) Machine learning in seismology: turning data into insights. Seismological Research Letters 90 (1), pp. 3–14.
  • I. E. Lagaris, A. Likas, and D. I. Fotiadis (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9 (5), pp. 987–1000.
  • I. E. Lagaris, A. C. Likas, and D. G. Papageorgiou (2000) Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks 11 (5), pp. 1041–1049.
  • S. Lange, T. Gabel, and M. Riedmiller (2012) Reinforcement learning. Adaptation, Learning, and Optimization, Vol. 12, Springer Berlin Heidelberg, Berlin, Heidelberg.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444.
  • M. W. Libbrecht and W. S. Noble (2015) Machine learning applications in genetics and genomics. Nature Reviews Genetics 16 (6), pp. 321–332.
  • A. J. Meade and A. A. Fernandez (1994) The numerical solution of linear ordinary differential equations by feed-forward neural networks. Mathematical and Computer Modelling 19 (12), pp. 1–25.
  • M. Mozaffar, R. Bostanabad, W. Chen, K. Ehmann, J. Cao, and M. A. Bessa (2019) Deep learning predicts path-dependent plasticity. Proceedings of the National Academy of Sciences 116 (52), pp. 26414–26420. https://www.pnas.org/content/116/52/26414.full.pdf
  • G. Pilania, C. Wang, X. Jiang, S. Rajasekaran, and R. Ramprasad (2013) Accelerating materials property predictions using machine learning. Scientific Reports 3, pp. 1–6.
  • M. H. Rafiei and H. Adeli (2017) A novel machine learning-based algorithm to detect damage in high-rise building structures. Structural Design of Tall and Special Buildings 26 (18), pp. 1–11.
  • M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
  • C. X. Ren, O. Dorostkar, B. Rouet-Leduc, C. Hulbert, D. Strebel, R. A. Guyer, P. A. Johnson, and J. Carmeliet (2019) Machine learning reveals the state of intermittent frictional dynamics in a sheared granular fault. Geophysical Research Letters 46 (13), pp. 7395–7403. https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2019GL082706
  • S. Rudy, A. Alla, S. L. Brunton, and J. N. Kutz (2019) Data-driven identification of parametric partial differential equations. SIAM Journal on Applied Dynamical Systems 18 (2), pp. 643–660.
  • D. Sen, A. Aghazadeh, A. Mousavi, S. Nagarajaiah, R. Baraniuk, and A. Dabak (2019) Data-driven semi-supervised and supervised learning algorithms for health monitoring of pipes. Mechanical Systems and Signal Processing 131, pp. 524–537.
  • Z. Shi, E. Tsymbalov, M. Dao, S. Suresh, A. Shapeev, and J. Li (2019) Deep elastic strain engineering of bandgap through machine learning. Proceedings of the National Academy of Sciences 116 (10), pp. 4117–4122.
  • J. C. Simo and T. J. R. Hughes (1998) Computational inelasticity. Interdisciplinary Applied Mathematics, Vol. 7, Springer, New York.
  • M. E. Taylor and P. Stone (2009) Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10 (Jul), pp. 1633–1685.
  • C. E. Yoon, O. O’Reilly, K. J. Bergen, and G. C. Beroza (2015) Earthquake detection through computationally efficient similarity search. Science Advances 1 (11), pp. e1501057.
  • Y. Zhu, N. Zabaras, P. Koutsourelakis, and P. Perdikaris (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81.
  • O. Zienkiewicz, S. Valliappan, and I. King (1969) Elasto-plastic solutions of engineering problems ‘initial stress’, finite element approach. International Journal for Numerical Methods in Engineering 1 (1), pp. 75–100.