Over the past decade, artificial neural networks, also known as deep learning, have revolutionized many computational tasks, including image classification and computer vision Bishop2006; NIPS2012_4824; Lecun2015, search engines and recommender systems jannach2010recommender; zhang2019deep, speech recognition graves2013speech, autonomous driving bojarski2016end, and healthcare miotto2018deep (for a review, see, e.g., Goodfellow2016). Even more recently, this data-driven framework has made inroads in engineering and scientific applications, such as earthquake detection Kong2018; Ross2019; Bergen2019, fluid mechanics and turbulence modeling Brenner2019; Brunton2020, dynamical systems Dana2020, and constitutive modeling Tartakovsky2018; Xu2020. A recent class of deep learning known as physics-informed neural networks (PINN) has been shown to be particularly well suited for solution and inversion of equations governing physical systems, in domains such as fluid mechanics Raissi2018; Raissi2018c, solid mechanics Haghighat2020 and dynamical systems Rudy2019. Deep learning models are typically built with open-source packages such as Theano bergstra2010theano, Tensorflow abadi2016tensorflow, MXNET chen2015mxnet, and Keras chollet2015keras, which offer features such as high-performance computing and automatic differentiation Gune2018.
Advances in deep learning have led to the emergence of different neural network architectures, including densely connected multi-layer deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and residual networks (ResNets). This proliferation of network architectures, and the (often steep) learning curve for each package, makes it challenging for new researchers in the field to use deep learning tools in their computational workflows. In this paper, we introduce an open-source Python package, SciANN, developed on Tensorflow and Keras, which is designed with scientific computations and physics-informed deep learning in mind. As such, the abstractions used in this programming interface target engineering applications such as model fitting, solution of ordinary and partial differential equations, and model inversion (parameter identification).
The outline of the paper is as follows. We first describe the functional form associated with deep neural networks. We then discuss different interfaces in SciANN that can be used to set up neural networks and optimization problems. We then illustrate SciANN’s application to curve fitting, the solution of the Burgers equation, and the identification of the Navier–Stokes equations and the von Mises plasticity model from data. Lastly, we show how to use SciANN in the context of the variational PINN framework Kharazmi2020. The examples discussed here and several additional applications are freely available at github.com/sciann/sciann-applications.
2 Artificial Neural Networks as Universal Approximators
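A single-layer feed-forward neural network with inputs $\mathbf{x} \in \mathbb{R}^m$, outputs $\mathbf{y} \in \mathbb{R}^n$, and $d$ hidden units is constructed as:
$$\mathbf{y} = \mathbf{W}^1 \sigma(\mathbf{W}^0 \mathbf{x} + \mathbf{b}^0) + \mathbf{b}^1, \tag{1}$$
where ($\mathbf{W}^0 \in \mathbb{R}^{d \times m}$, $\mathbf{b}^0 \in \mathbb{R}^d$), ($\mathbf{W}^1 \in \mathbb{R}^{n \times d}$, $\mathbf{b}^1 \in \mathbb{R}^n$) are parameters of this transformation, also known as weights and biases, and $\sigma$ is the activation function. As shown in Hornik1989; Cybenko1989, this transformation can approximate any measurable function, independently of the size of input features $m$ or the activation function $\sigma$. If we define the transformation $\Sigma^i$ as $\Sigma^i(\hat{\mathbf{x}}^i) := \hat{\mathbf{y}}^i = \sigma^i(\mathbf{W}^i \hat{\mathbf{x}}^i + \mathbf{b}^i)$, with $\hat{\mathbf{x}}^i$ as the input to and $\hat{\mathbf{y}}^i$ as the output of any hidden layer $i$, $\mathbf{x} = \hat{\mathbf{x}}^0$ as the main input to the network, and $\mathbf{y} = \Sigma^L(\hat{\mathbf{x}}^L)$ as the final output of the network, we can construct a general $L$-layer neural network as a composition of $\Sigma^i$ functions as:
$$\mathbf{y} = \Sigma^L \circ \Sigma^{L-1} \circ \cdots \circ \Sigma^0(\mathbf{x}), \tag{2}$$
with $\sigma^i$ as activation functions that make the transformations nonlinear. Some common activation functions are:
$$\text{ReLU}: \hat{x} \mapsto \max(0, \hat{x}), \qquad \text{sigmoid}: \hat{x} \mapsto \frac{1}{1 + e^{-\hat{x}}}, \qquad \tanh: \hat{x} \mapsto \frac{e^{\hat{x}} - e^{-\hat{x}}}{e^{\hat{x}} + e^{-\hat{x}}}. \tag{3}$$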
In general, this multilayer feed-forward neural network is capable of approximating functions to any desired accuracy Hornik1989; Hornik1991. Inaccurate approximation may arise due to lack of a deterministic relation between input and outputs, insufficient number of hidden units, inadequate training, or poor choice of the optimization algorithm.
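The layer-by-layer composition described above can be sketched in a few lines of plain Python. This is only an illustration and is independent of SciANN; the helper names `dense` and `compose`, as well as the hand-picked weights and biases, are assumptions made for this example.

```python
import math

def dense(weights, biases, activation):
    """Return one layer: x -> activation(W x + b), applied element-wise."""
    def layer(x):
        return [
            activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(weights, biases)
        ]
    return layer

def compose(layers):
    """Chain layers so that the output of one is the input of the next."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network

def identity(z):
    return z

# Toy network: 2 inputs, 3 tanh hidden units, 1 linear output.
# The parameter values are arbitrary (hypothetical), not trained.
hidden = dense([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0], math.tanh)
output = dense([[1.0, -1.0, 0.5]], [0.2], identity)
net = compose([hidden, output])

y = net([0.3, -0.7])
print(y)
```

With a linear output layer, this toy network has the single-hidden-layer form discussed at the start of this section; adding more `dense` layers to the list gives a deeper composition.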
The parameters of the neural network, $\mathbf{W}^i$ and $\mathbf{b}^i$ of all layers $i$, are identified through minimization using a back-propagation algorithm Rumelhart1986. For instance, if we approximate a field variable such as temperature $T$ with a multi-layer neural network as $T(\mathbf{x}) \approx \hat{T}(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \mathbf{W}, \mathbf{b})$, we can set up the optimization problem as
$$\arg\min_{\mathbf{W}, \mathbf{b}} \mathcal{L}(\mathbf{W}, \mathbf{b}) := \left\| T(\mathbf{x}^*) - \hat{T}(\mathbf{x}^*) \right\|, \tag{4}$$
where $\mathbf{x}^*$ is the set of discrete training points, and $\|\circ\|$ is the mean squared norm. Note that one can use other choices for the loss function, such as mean absolute error or cross-entropy. The optimization problem (4) is nonconvex, which may require significant trial-and-error effort to find an effective optimization algorithm and optimization parameters.
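As a minimal sketch of such a minimization, the snippet below fits a one-parameter model by gradient descent on the mean squared error. The model, data, learning rate and iteration count are all hypothetical. Unlike a real neural network, this toy problem is convex, so it converges without any tuning.

```python
def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mean_absolute_error(targets, predictions):
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

# Training points x* and targets generated from T(x) = 2 x.
x_star = [0.0, 0.5, 1.0, 1.5, 2.0]
t_star = [2.0 * x for x in x_star]

w = 0.0               # single trainable weight of the model T_hat(x) = w x
learning_rate = 0.1
history = []          # loss value at the start of every iteration
for _ in range(50):
    preds = [w * x for x in x_star]
    history.append(mean_squared_error(t_star, preds))
    # d(MSE)/dw = -(2/N) * sum((t - w x) * x)
    grad = -2.0 * sum((t - p) * x for t, p, x in zip(t_star, preds, x_star)) / len(x_star)
    w -= learning_rate * grad

print(w, history[0], history[-1])
```

Swapping `mean_squared_error` for `mean_absolute_error` changes the objective but not the structure of the loop, provided the gradient expression is changed accordingly.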
We can construct deep neural networks with an arbitrary number of layers and neurons. We can also define multiple networks and combine them to generate the final output. There are many types of neural networks that have been optimized for specific tasks. An example is the ResNet architecture introduced for image classification, consisting of many blocks, each of the form:
$$\mathbf{z}^{i} = \mathcal{F}^{i}(\mathbf{z}^{i-1}) + \mathbf{z}^{i-1}, \tag{5}$$
where $i$ is the block number, $\mathcal{F}^{i}$ is the multi-layer transformation of block $i$, and $\mathbf{z}^{i-1}$ is the output of the previous block, with $\mathbf{x} = \mathbf{z}^{0}$ and $\mathbf{y} = \mathbf{z}^{l}$ as the main inputs to and outputs of a network of $l$ blocks. Therefore, artificial neural networks offer a simple way of constructing very complex but dependent solution spaces (see, e.g., Fig. 1).
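The residual structure can be sketched as follows: each block returns its transformation of the input plus the input itself. The names `residual_block` and `stack`, and the inner transformation used here, are hypothetical and only serve to illustrate the skip connection.

```python
import math

def residual_block(transform):
    """Wrap a transformation F so that the block returns F(z) + z."""
    def block(z):
        return [f_k + z_k for f_k, z_k in zip(transform(z), z)]
    return block

def stack(blocks):
    """Apply the blocks one after another."""
    def network(z):
        for block in blocks:
            z = block(z)
        return z
    return network

def inner(z):
    # Hypothetical inner transformation: element-wise tanh scaled by 0.5.
    return [0.5 * math.tanh(z_k) for z_k in z]

net = stack([residual_block(inner) for _ in range(3)])
z_out = net([0.1, -0.2])
print(z_out)
```

If the inner transformation returns zero, the block reduces to the identity map. This is one common argument for why such blocks are easy to stack.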
3 SciANN: Scientific Computing with Artificial Neural Networks
SciANN is an open-source neural-network library, based on Tensorflow abadi2016tensorflow and Keras chollet2015keras, which abstracts the application of deep learning for scientific computing purposes. In this section, we discuss abstraction choices for SciANN and illustrate how one can use it for scientific computations.
3.1 Brief description of SciANN
SciANN is implemented on the most popular deep-learning packages, Tensorflow and Keras, and therefore it inherits all the functionalities they provide. Among those, the most important ones include graph-based automatic differentiation and massive heterogeneous high-performance computing capabilities. It is designed for an audience with a background in scientific computation or computational science and engineering.
SciANN currently supports fully connected feed-forward deep neural networks, and recurrent networks are under development. Some architectures, such as convolutional networks, are not a good fit for scientific computing applications and therefore are not currently in our development plans. Tensorflow and Keras provide a wide range of features, including optimization algorithms, automatic differentiation, and model parameter exports for transfer learning.
To install SciANN, one can simply use Python’s pip package installer: pip install sciann
It can be imported into the active Python environment using Python’s import statement: import sciann as sn
Its mathematical functions are located in the sn.math interface. For instance, the function diff is accessed through sn.math.diff. The main building blocks of SciANN include:
sn.Variable: class to define inputs to the network.
sn.Field: class to define outputs of the network.
sn.Functional: class to construct a nonlinear neural network approximation.
sn.Parameter: class to define a parameter for inversion purposes.
sn.Data, sn.Tie: classes to define the targets. If there are observations for any variable, the ‘sn.Data’ interface is used when building the optimization model. For physical constraints such as PDEs or equality relations between different variables, the ‘sn.Tie’ interface is used to build the optimizer.
sn.SciModel: class to set up the optimization problem, i.e. inputs to the networks, targets (objectives), and the loss function.
sn.math: mathematical operations are accessed here. SciANN also supports operator overloading, which improves readability when setting up complex mathematical relations such as PDEs.
3.2 An illustrative example: curve fitting
We illustrate SciANN’s capabilities with its application to a curve-fitting problem. Given a set of discrete data, generated from $f(x, y) = \sin(x)\sin(y)$ over the domain $x, y \in [-\pi, \pi]$, we want to fit a surface, in the form of a neural network, to this dataset. A multi-layer neural network approximating the function $f$ can be constructed as $\hat{f}: (x, y) \mapsto \mathcal{N}_f(x, y; \mathbf{W}, \mathbf{b})$, with inputs $x, y$ and output $\hat{f}$. In the most common mathematical and Pythonic abstraction, the inputs and output can be implemented as:
A 3-layer neural network with 6 neural units per layer and hyperbolic-tangent activation function can then be constructed as
This definition can be further compressed as
At this stage, the parameters of the networks, i.e. the set of $\mathbf{W}, \mathbf{b}$ for all layers, are randomly initialized. Their current values can be retrieved using the command get_weights:
One can set the parameters of the network to any desired values using the command set_weights.
As another example, a more complex neural-network functional, built as the composition of three blocks as shown in Fig. 1, can be constructed as
Any of these functions can be evaluated immediately or after training using the eval function, by providing discrete data for the inputs:
Once the networks are initialized, we set up the optimization problem and train the network by minimizing an objective function, i.e. solving the optimization problem for $\mathbf{W}$ and $\mathbf{b}$. The optimization problem for a data-driven curve-fitting is defined as:
$$\arg\min_{\mathbf{W}, \mathbf{b}} \mathcal{L}(\mathbf{W}, \mathbf{b}) := \left\| f(x^*, y^*) - \hat{f}(x^*, y^*) \right\|, \tag{6}$$
where $(x^*, y^*)$ is the set of all discrete points where $f$ is given. For the loss function $\|\circ\|$, we use the mean squared-error norm $\|\circ\| = \frac{1}{N} \sum (\circ)^2$. This problem is set up in SciANN through the SciModel class as:
The train function of the model is then used to perform the training and identify the parameters of the neural network:
Once the training is completed, one can set parameters of a Functional to be trainable or non-trainable (fixed). For instance, to set $\hat{f}$ to be non-trainable:
The result of this training is shown in Fig. 2, where we have used 400 epochs to perform the training on a dataset generated on a uniform grid.
Since the data was generated from $f(x, y) = \sin(x)\sin(y)$, we know that this is a solution to $\Delta f + 2f = 0$, with $\Delta$ as the Laplacian operator. As a first illustration of SciANN for physics-informed deep learning, we can constrain the curve-fitting problem with this ‘governing equation’. In SciANN, the differentiation operators are evaluated through the sn.math.diff function. Here, this differential equation can be evaluated as:
with order expressing the order of differentiation.
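As a sanity check on such a ‘governing equation’, one can verify numerically that a candidate function satisfies it. The sketch below does not use SciANN. It assumes, for illustration, that the data-generating function is sin(x) sin(y) and that the constraint is Laplacian(f) + 2 f = 0, and it replaces automatic differentiation with central finite differences.

```python
import math

def f(x, y):
    # Assumed data-generating function (an assumption of this sketch).
    return math.sin(x) * math.sin(y)

def second_derivative(g, x, y, axis, h=1e-3):
    """Central finite-difference estimate of the second derivative along one axis."""
    if axis == "x":
        return (g(x + h, y) - 2.0 * g(x, y) + g(x - h, y)) / h**2
    return (g(x, y + h) - 2.0 * g(x, y) + g(x, y - h)) / h**2

def residual(x, y):
    """Residual of the assumed constraint: Laplacian(f) + 2 f."""
    return (second_derivative(f, x, y, "x")
            + second_derivative(f, x, y, "y")
            + 2.0 * f(x, y))

points = [(0.3, 1.1), (-2.0, 0.7), (1.5, -2.5)]
max_residual = max(abs(residual(x, y)) for x, y in points)
print(max_residual)
```

The residual is small but not exactly zero, because the finite-difference stencil carries a truncation error of order h squared. Automatic differentiation avoids this error.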
Based on the physics-informed deep learning framework, the governing equation can be imposed through the objective function. The optimization problem can then be defined as
$$\arg\min_{\mathbf{W}, \mathbf{b}} \mathcal{L}(\mathbf{W}, \mathbf{b}) := \left\| f(x^*, y^*) - \hat{f}(x^*, y^*) \right\| + \left\| \Delta \hat{f}(x, y) + 2 \hat{f}(x, y) \right\|, \tag{7}$$
and implemented in SciANN as
Note that while the inputs are the same as for the previous case, the optimization model is defined with two targets, fxy and L. The training data for fxy remains the same; the sampling grid, however, can be expanded further, as the ‘physics’ can be imposed everywhere. A larger sampling grid is used here, where data is only given at the same locations as in the previous case, i.e. on the original grid. To impose target L, it is simply set to ‘zero’. The new result is shown in Fig. 3. We find that, for the same network size and training parameters, incorporating the ‘physics’ reduces the error significantly.
Once the training is completed, the weights for all layers can be saved using the command save_weights, for future use. These weights can later be used to initialize a network of the same structure using the load_weights_from keyword in SciModel.
4 Application of SciANN to Physics-Informed Deep Learning
In this section, we explore how to use SciANN to solve and discover some representative case studies of physics-informed deep learning.
4.1 Burgers equation
As the first example, we illustrate the use of SciANN to solve the Burgers equation, which arises in fluid mechanics, acoustics, and traffic flow dafermos-claws. Following Raissi2018, we explore the governing equation:
$$u_t + u\,u_x - (0.01/\pi)\,u_{xx} = 0, \qquad t \in [0, 1], \quad x \in [-1, 1], \tag{8}$$
where subscripts denote partial derivatives, subject to initial and boundary conditions $u(t=0, x) = -\sin(\pi x)$ and $u(t, x=\pm 1) = 0$, respectively. The solution variable $u$ can be approximated by $\hat{u}$, defined in the form of a nonlinear neural network as $\hat{u}: (t, x) \mapsto \mathcal{N}_u(t, x; \mathbf{W}, \mathbf{b})$. The network used in Raissi2018 consists of 8 hidden layers, each with 20 neurons, and with $\tanh$ activation function, and can be defined in SciANN as:
To set up the optimization problem, we need to identify the targets. The first target, as used in the PINN framework, is the PDE in Eq. (8), and is defined in SciANN as:
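To impose boundary conditions, one can define them as continuous mathematical functions defined at all sampling points:
$$\mathcal{L}_2 = \left(1 - \mathrm{sign}(t - t_{\min})\right)\left(u + \sin(\pi x)\right), \qquad \mathcal{L}_3 = \left(1 - \mathrm{sign}(x - x_{\min})\right) u, \qquad \mathcal{L}_4 = \left(1 + \mathrm{sign}(x - x_{\max})\right) u. \tag{9}$$
For instance, $\mathcal{L}_2$ is zero at all sampling points except for $t < t_{\min}$, where $t_{\min}$ is chosen as a small tolerance. Instead of $\mathrm{sign}$, one can use smoother functions such as $\tanh$. In this way, the optimization model can be set up as: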
In this case, all targets should ‘vanish’, therefore the training is done as:
An alternative approach to define the boundary conditions in SciANN is to define the target in the sn.SciModel as the variable of interest and pass the ‘ids’ of training data where the conditions should be imposed. This is achieved as:
Here, ids_ic_bc are ids associated with collocation points (t_data, x_data) where the initial condition and boundary condition are given. An important point to keep in mind is that if the sampling points where boundary conditions are imposed are only a very small portion of all sampling points, the mini-batch optimization parameter batch_size should be set to a large number to guarantee consistent mini-batch optimization. Otherwise, some mini-batches may not acquire any data on the boundary and therefore not generate the correct gradient for the gradient-descent update. Also worth noting is that setting governing relations to ‘zero’ is conveniently done in SciANN.
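The batch-size remark can be quantified with a short calculation. Assuming each mini-batch is a uniform random sample without replacement, the probability that it contains no boundary point follows from a ratio of binomial coefficients. The function name and the point counts below are hypothetical.

```python
import math

def prob_batch_misses_boundary(n_total, n_boundary, batch_size):
    """Probability that a batch drawn without replacement has no boundary point."""
    n_interior = n_total - n_boundary
    if batch_size > n_interior:
        return 0.0
    return math.comb(n_interior, batch_size) / math.comb(n_total, batch_size)

# Hypothetical sizes: 10,000 collocation points, 100 of them on the boundary.
for batch_size in (32, 256, 2048):
    p = prob_batch_misses_boundary(10_000, 100, batch_size)
    print(f"batch_size={batch_size:5d}  P(no boundary point)={p:.3e}")
```

Under these assumed counts, most batches of 32 points see no boundary data at all, while batches of a few thousand points almost always do.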
The result of solving the Burgers equation using the deep learning framework is shown in Fig. 4. The results match the exact solution accurately, and reproduce the formation of a shock (self-sharpening discontinuity) in the solution at $x = 0$.
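To make the role of the PDE target concrete, the sketch below evaluates the Burgers residual for a known exact solution. It assumes the benchmark form u_t + u u_x - (0.01/pi) u_xx = 0 and uses central finite differences in place of automatic differentiation. The travelling-wave profile is a textbook exact solution of the viscous Burgers equation, not the solution of the initial-boundary-value problem above, and its amplitude and speed are hypothetical.

```python
import math

NU = 0.01 / math.pi  # viscosity of the assumed benchmark form

def travelling_wave(t, x, a=0.1, c=0.2):
    """Exact travelling-wave solution of u_t + u u_x = NU u_xx."""
    return c - a * math.tanh(a * (x - c * t) / (2.0 * NU))

def not_a_solution(t, x):
    """A smooth field that does not satisfy the equation."""
    return math.sin(x)

def burgers_residual(u, t, x, h=1e-4):
    """Central finite-difference estimate of u_t + u u_x - NU u_xx at (t, x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / h**2
    return u_t + u(t, x) * u_x - NU * u_xx

print(burgers_residual(travelling_wave, 0.5, 0.1))
print(burgers_residual(not_a_solution, 0.0, 1.0))
```

A PINN drives this kind of residual toward zero at the collocation points, which is why the corresponding target data are set to zero during training.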
4.2 Data-driven discovery of Navier–Stokes equations
As a second example, we show how SciANN can be used for discovery of partial differential equations. We choose the incompressible Navier–Stokes problem used in Raissi2018. The equations are:
$$u_t + \lambda_1 (u\,u_x + v\,u_y) = -p_x + \lambda_2 (u_{xx} + u_{yy}), \qquad v_t + \lambda_1 (u\,v_x + v\,v_y) = -p_y + \lambda_2 (v_{xx} + v_{yy}), \tag{10}$$
where $u$ and $v$ are components of the velocity field in the $x$ and $y$ directions, respectively, $p$ is the density-normalized pressure, $\lambda_1$ should be identically equal to 1 for Newtonian fluids, and $\lambda_2$ is the kinematic viscosity. The true values of the parameters to be identified are $\lambda_1 = 1$ and $\lambda_2 = 0.01$. Given the assumption of fluid incompressibility, we use the divergence-free form of the equations, from which the components of the velocity are obtained as:
$$u = \psi_y, \qquad v = -\psi_x, \tag{11}$$
where $\psi$ is the potential function.
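The benefit of the potential-function formulation is that the velocity field is divergence-free by construction. The sketch below checks this numerically, assuming the usual convention u = d(psi)/dy and v = -d(psi)/dx. The stream function is an arbitrary smooth example chosen for this check, and finite differences stand in for automatic differentiation.

```python
import math

def psi(x, y):
    """Hypothetical smooth stream function, used only for this check."""
    return math.sin(x) * math.cos(2.0 * y) + 0.3 * x * y**2

def velocity(x, y, h=1e-4):
    """u = d(psi)/dy and v = -d(psi)/dx via central differences."""
    u = (psi(x, y + h) - psi(x, y - h)) / (2.0 * h)
    v = -(psi(x + h, y) - psi(x - h, y)) / (2.0 * h)
    return u, v

def divergence(x, y, h=1e-4):
    """du/dx + dv/dy via central differences of the velocity field."""
    du_dx = (velocity(x + h, y, h)[0] - velocity(x - h, y, h)[0]) / (2.0 * h)
    dv_dy = (velocity(x, y + h, h)[1] - velocity(x, y - h, h)[1]) / (2.0 * h)
    return du_dx + dv_dy

x0, y0 = 0.4, -0.9
u0, v0 = velocity(x0, y0)
print(u0, v0, divergence(x0, y0))
```

Because central differences in x and y commute, the discrete divergence vanishes up to round-off here. With automatic differentiation, the continuity equation is satisfied identically, so no separate target is needed for it.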
Here, the independent field variables $p$ and $\psi$ are approximated as $\hat{p}$ and $\hat{\psi}$, respectively, using nonlinear artificial neural networks as $\hat{p}: (t, x, y) \mapsto \mathcal{N}_p(t, x, y; \mathbf{W}, \mathbf{b})$ and $\hat{\psi}: (t, x, y) \mapsto \mathcal{N}_\psi(t, x, y; \mathbf{W}, \mathbf{b})$. Using the same network size and activation function that was used in Raissi2018c, we set up the neural networks in SciANN as:
Note that this way of defining the networks results in two separate networks for $p$ and $\psi$, which we find more suitable for many problems. To replicate the one-network model used in the original study, one can use:
Here, the objective is to identify parameters $\lambda_1$ and $\lambda_2$ of the Navier–Stokes equations (10) on a dataset with given velocity field. Therefore, we need to define these as trainable parameters of the network. This is done using the sn.Parameter interface as:
Note that these parameters are given an initial value when they are defined. The required derivatives in Equations (10) and (11) are evaluated as:
with ‘order’ indicating the order of differentiation. We can now set up the targets of the problem as:
The optimization model is now set up as:
where only training points for $u$ and $v$ are provided, as in Raissi2018c. The results are shown in Fig. 5.
4.3 Discovery of nonlinear solid mechanics with von Mises plasticity
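Here, we illustrate the use of PINN for solution and discovery of nonlinear solid mechanics. We use the von Mises elastoplastic constitutive model, which is commonly used to describe the mechanical behavior of solid materials, in particular metals. Elastoplasticity relations give rise to inequality constraints on the governing equations simohughes-ci and, therefore, compared to the Navier–Stokes equations, they pose a different challenge for incorporation in PINN. The elastoplastic relations for a plane-strain problem are:
$$\sigma_{ij,j} + f_i = 0, \quad \sigma_{ij} = s_{ij} - p\,\delta_{ij}, \quad p = -\sigma_{kk}/3 = -\left(\lambda + \tfrac{2}{3}\mu\right)\varepsilon_v, \quad s_{ij} = 2\mu\,e^{e}_{ij}, \quad \varepsilon_{ij} = (u_{i,j} + u_{j,i})/2 = e_{ij} + \varepsilon_v\,\delta_{ij}/3, \quad e_{ij} = e^{e}_{ij} + e^{p}_{ij}. \tag{12}$$
Here, the summation notation is used with $i, j \in \{x, y, z\}$.
$\sigma_{ij}$ are components of the Cauchy stress tensor, and $s_{ij}$ and $p$ are its deviatoric components and its pressure invariant, respectively. $\varepsilon_{ij}$ are components of the infinitesimal strain tensor derived from the displacements $u_i$, and $e_{ij}$ and $\varepsilon_v$ are its deviatoric and volumetric components, respectively.
According to the von Mises plasticity model, the admissible state of stress is defined inside the cylindrical yield surface as $\mathcal{F}(\sigma_{ij}) := q - \sigma_Y \le 0$. Here, $q$ is the equivalent stress defined as $q = \sqrt{\tfrac{3}{2} s_{ij} s_{ij}}$. Assuming the associative flow rule, the plastic strain components are:
$$e^{p}_{ij} = \bar{\varepsilon}^{p}\,\frac{\partial \mathcal{F}}{\partial \sigma_{ij}} = \frac{3}{2}\,\bar{\varepsilon}^{p}\,\frac{s_{ij}}{q}, \tag{13}$$
where $\bar{\varepsilon}^{p}$ is the equivalent plastic strain, subject to $\bar{\varepsilon}^{p} \ge 0$. For the von Mises model, it can be shown that $\bar{\varepsilon}^{p}$ is evaluated as
$$\bar{\varepsilon}^{p} = \bar{\varepsilon} - \frac{\sigma_Y}{3\mu} \ge 0, \tag{14}$$
where $\bar{\varepsilon}$ is the total equivalent strain, defined as $\bar{\varepsilon} = \sqrt{\tfrac{2}{3} e_{ij} e_{ij}}$. Note that for von Mises plasticity, the volumetric part of the plastic strain tensor is zero, $\varepsilon^{p}_v = 0$. Finally, the parameters of this model include the Lamé elastic parameters $\lambda$ and $\mu$, and the yield stress $\sigma_Y$.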
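The yield check at the heart of the von Mises model is easy to reproduce by hand. The sketch below uses the standard definitions of the deviatoric stress and of the equivalent stress, q = sqrt(3/2 s_ij s_ij). The function names, the stress values and the yield stress are hypothetical, with units left unspecified.

```python
import math

def deviatoric(stress):
    """Deviatoric part s_ij = sigma_ij - (sigma_kk / 3) delta_ij of a 3x3 tensor."""
    mean = (stress[0][0] + stress[1][1] + stress[2][2]) / 3.0
    return [[stress[i][j] - (mean if i == j else 0.0) for j in range(3)]
            for i in range(3)]

def equivalent_stress(stress):
    """von Mises equivalent stress q = sqrt(3/2 s_ij s_ij)."""
    s = deviatoric(stress)
    return math.sqrt(1.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))

def is_admissible(stress, yield_stress):
    """True if the stress state lies inside or on the yield surface."""
    return equivalent_stress(stress) <= yield_stress

uniaxial = [[200.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pure_shear = [[0.0, 100.0, 0.0], [100.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
hydrostatic = [[-50.0, 0.0, 0.0], [0.0, -50.0, 0.0], [0.0, 0.0, -50.0]]

for name, sigma in [("uniaxial", uniaxial), ("pure shear", pure_shear),
                    ("hydrostatic", hydrostatic)]:
    print(name, equivalent_stress(sigma), is_admissible(sigma, 250.0))
```

For uniaxial tension the equivalent stress equals the applied stress, and a purely hydrostatic state never yields. This is why the yield surface is a cylinder aligned with the hydrostatic axis.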
We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension zienkiewicz1969elasto; simohughes-ci. Consider a plate of dimensions , with a circular hole of diameter located in the center of the plate. The plate is subjected to extension displacements of along the short edge, under plane-strain condition, and without body forces, . The parameters are , and . Due to symmetry, only a quarter of the domain needs to be considered in the simulation. The synthetic data is generated from a high-fidelity FEM simulation using COMSOL software COMSOL on a mesh of approximately 13,000 quartic triangular elements. The plate undergoes significant plastic deformation around the circular hole. This results in localized deformation in the form of a shear band. While the strain exhibits localization, the stress field remains continuous and smooth—a behavior that is due to the choice of a perfect-plasticity model with no hardening.
Following the approach proposed in Haghighat2020, we approximate displacement and stress components with nonlinear neural networks as:
$$u_x \approx \mathcal{N}_{u_x}(x, y), \quad u_y \approx \mathcal{N}_{u_y}(x, y), \quad \sigma_{xx} \approx \mathcal{N}_{\sigma_{xx}}(x, y), \quad \sigma_{yy} \approx \mathcal{N}_{\sigma_{yy}}(x, y), \quad \sigma_{zz} \approx \mathcal{N}_{\sigma_{zz}}(x, y), \quad \sigma_{xy} \approx \mathcal{N}_{\sigma_{xy}}(x, y). \tag{15}$$
Note that due to plastic deformation, the out-of-plane stress $\sigma_{zz}$ is not predefined, and therefore we also approximate it with a neural network. These neural networks and parameters are defined as follows:
The kinematic relations, deviatoric stress components and plastic strains can be defined as:
The operator-overloading abstraction of SciANN improves readability significantly. Assuming access to the measured data for variables $u_x$, $u_y$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{zz}$, $\sigma_{xy}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, the optimization targets for training data can be described using L* = sn.Data(*), where * refers to each variable. The physics-informed constraints are set as:
We use 2,000 data points from this reference solution, randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used in Haghighat2020. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 6); and (2) the precise identification of the yield stress $\sigma_Y$ and the relatively accurate identification of the elastic parameters $\lambda$ and $\mu$.
5 Application to Variational PINN
Neural networks have recently been used to solve the variational form of differential equations as well Weinan2018; Berg2018. In a recent study Kharazmi2020, the vPINN framework for solving PDEs was introduced and analyzed. Like PINN, it is based on graph-based automatic differentiation. The authors of Kharazmi2020 suggest a Petrov–Galerkin approach, where the test functions are chosen differently from the trial functions. For the test functions, they propose the use of polynomials that vanish on the boundary of the domain. Here, we illustrate how to use SciANN for vPINN, and we show how to construct proper test functions using neural networks.
Consider the steady-state heat equation subject to Dirichlet boundary conditions and a known heat source Kharazmi2020:
subject to the following boundary conditions:
and a heat source:
The analytical solution to this problem is:
The weak form of Eq. (16) is expressed as:
where $\Omega$ is the domain of the problem, $\partial\Omega$ is the boundary of the domain, $q_n$ is the boundary heat flux, and $Q$ is the test function. The trial space for the temperature field $T$ is constructed by a neural network as $T: (x, y) \mapsto \mathcal{N}_T(x, y; \mathbf{W}, \mathbf{b})$. For the test space $Q$, the authors of Kharazmi2020 suggest the use of polynomials that satisfy the boundary conditions. However, considering the universal approximation capabilities of neural networks, we suggest that this step is unnecessary, and a general neural network can be used as the test function. Note that test functions should satisfy continuity requirements as well as boundary conditions. A multi-layer neural network with any nonlinear activation function is a good candidate for the continuity requirements. To satisfy the boundary conditions, we can simply train the test functions to vanish on the boundary. Note that this step is associated with the construction of a proper test function and is done as a preprocessing step. Once the test functions satisfy the (homogeneous) boundary conditions, there is no need to further train them, and therefore their parameters can be set to non-trainable at this stage. We also find that there is no need for the $T$ and $Q$ networks to be of the same size, or to use the same activation functions.
Therefore, the test function is defined as $Q: (x, y) \mapsto \mathcal{N}_Q(x, y; \bar{\mathbf{W}}, \bar{\mathbf{b}})$ subject to $Q(\partial\Omega) = 0$. Here, overbar weights and biases indicate that their values are predefined and fixed (non-trainable). Therefore, the boundary flux integral on the right side of Eq. (20) vanishes, and the resulting weak form can be expressed as
The problem can be defined in SciANN as follows. The first step is to construct a proper test function:
As discussed earlier, Q_data takes a value of zero for training points on the boundary and random values at interior quadrature points. Additionally, the parameters of $Q$ are set to non-trainable at the end of this step. The trial function and the target weak form in Eq. (21) are now implemented as:
Since the variational relation (21) takes an integral form, we need to perform a domain integral. Therefore, the volume information should be passed to the network along with the body-force information at the quadrature points. This is achieved by introducing two new SciANN variables as the inputs to the network. The optimization model is then defined as:
The second target imposes the boundary conditions at specific quadrature points bc_ids.
Following the details in Kharazmi2020, we perform the integration on a grid. The results are shown in Fig. 7, which are very similar to those reported in Kharazmi2020.
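The quadrature step can be illustrated on a one-dimensional analogue. The sketch below is not the two-dimensional problem of this section. It assumes the model problem T'' + f = 0 on [0, 1], a test function Q that vanishes at both ends, and the corresponding weak form: the integral of (Q' T' - Q f) over the domain equals zero. All functions are hypothetical closed-form stand-ins for the trial and test networks, and the integral is evaluated with a midpoint rule in which each quadrature point carries its own cell ‘volume’.

```python
import math

def T(x):            # stand-in for the trained trial function
    return math.sin(math.pi * x)

def dT_dx(x):
    return math.pi * math.cos(math.pi * x)

def f_source(x):     # heat source chosen so that T'' + f = 0
    return math.pi**2 * math.sin(math.pi * x)

def Q(x):            # test function vanishing at x = 0 and x = 1
    return x * (1.0 - x)

def dQ_dx(x):
    return 1.0 - 2.0 * x

def weak_residual(n_cells=200):
    """Midpoint-rule estimate of the integral of (Q' T' - Q f) over [0, 1]."""
    dx = 1.0 / n_cells            # 'volume' attached to each quadrature point
    total = 0.0
    for k in range(n_cells):
        x = (k + 0.5) * dx
        total += (dQ_dx(x) * dT_dx(x) - Q(x) * f_source(x)) * dx
    return total

print(weak_residual(100), weak_residual(400))
```

The residual shrinks as the quadrature grid is refined, which mirrors the need to pass per-point volume and source data to the network when the weak form is used as a training target.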
In this paper, we have introduced the open-source deep-learning package SciANN, designed specifically to facilitate physics-informed simulation, inversion, and discovery in the context of computational science and engineering problems. It can be used for regression and physics-informed deep learning with minimal effort on the neural network setup. It is based on the Tensorflow and Keras packages, and therefore it inherits all the high-performance computing capabilities of the Tensorflow back-end, including CPU/GPU parallelization capabilities.
The objective of this paper is to introduce an environment based on a modern implementation of graph-based neural networks and automatic differentiation, to be used as a platform for scientific computations. In a series of examples, we have shown how to use SciANN for curve-fitting, solving PDEs in strong and weak form, and for model inversion in the context of physics-informed deep learning. The examples presented here, as well as the package itself, are all open-source and available in the GitHub repository github.com/sciann.
This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale Reservoir Science’.