A Graph-based probabilistic geometric deep learning framework with online physics-based corrections to predict the criticality of defects in porous materials

05/13/2022
by Vasilis Krokos, et al.

Stress prediction in porous materials and structures is challenging due to the high computational cost associated with direct numerical simulations. Convolutional Neural Network (CNN) based architectures have recently been proposed as surrogates to approximate and extrapolate the solution of such multiscale simulations. These methodologies are usually limited to 2D problems due to the high computational cost of 3D voxel-based CNNs. We propose a novel geometric learning approach based on a Graph Neural Network (GNN) that efficiently deals with three-dimensional problems by performing convolutions over 2D surfaces only. Following our previous developments using pixel-based CNNs, we train the GNN to automatically add local fine-scale stress corrections to an inexpensively computed coarse stress prediction in the porous structure of interest. Our method is Bayesian and generates densities of stress fields, from which credible intervals may be extracted. As a second scientific contribution, we propose to improve the extrapolation ability of our network by deploying a strategy of online physics-based corrections. Specifically, we condition the posterior predictions of our probabilistic network to satisfy partial equilibrium at the microscale, at the inference stage. This is done using an Ensemble Kalman algorithm, to ensure tractability of the Bayesian conditioning operation. We show that this innovative methodology allows us to alleviate the effect of undesirable biases observed in the outputs of the uncorrected GNN, and improves the accuracy of the predictions in general.



1 Introduction

Multiscale computational analysis is very common in various engineering areas such as geo-engineering (heterogeneous soil layers and subterranean cavities), bio-mechanical engineering (bones and soft tissues) and mechanical engineering (composite and architectured materials). Finite Element Analysis (FEA) of these structures is notoriously challenging, since the FE mesh required to fully resolve the finer-scale features has to be very dense, resulting in intractable computational problems.

Common approaches to tackle multiscale problems usually fall into either homogenisation or concurrent scale coupling. In both cases the multiscale problem is split into a macroscale and a microscale problem. The macroscale problem diffuses the overall stress field in the entire structure, and the microscale computations are performed to correct the macroscale fields. In homogenisation it is assumed that the gradients of the macroscale displacement are homogeneous over the entire material sample (i.e. the representative volume element, RVE) [Sanchez-Palencia, 1987; Zohdi and Wriggers, 2005]. If that assumption does not hold, for instance due to fast macroscale gradients induced by sharp macroscale features, then concurrent scale coupling methods are employed, which are usually more expensive and intrusive compared to RVE-based methods. In these techniques the results of homogenisation are applied to the boundary of regions of interest for concurrent microscale corrections to be performed [Raghavan and Ghosh, 2004; Oden et al., 2006; Kerfriden et al., 2009; Hesthaven et al., 2015; Paladim et al., 2016; Krokos et al., 2021], which may be done in a purely downward approach [Oden et al., 1999], or with 2-way coupling [Gendre et al., 2009].

In this work we propose a Neural Network (NN) based concurrent scale coupling method to tackle 3D multiscale problems and specifically we focus on porous media with no separation of scales. Neural Networks (NNs) have been successfully employed in solving engineering problems by reproducing the output of simulation codes in a variety of fields like fluid dynamics [Lye et al., 2020; Raissi et al., 2020], solid mechanics [Nie et al., 2019; Jiang et al., 2020; Deshpande et al., 2021] and computational biology [Senior et al., 2019, 2020]. Most of the publications that focus on learning the response of FE models employ Convolutional Neural Networks (CNNs), that are very efficient in working with images [Nie et al., 2019; Mendizabal et al., 2020; Sun et al., 2020].

Unfortunately, most of the aforementioned works deal with 2D problems. One of the reasons, in our opinion, is that 3D CNNs are very expensive and thus the literature on the subject is limited [Mendizabal et al., 2020; Rao and Liu, 2020] and mostly focused on segmentation [Buda et al., 2019; Lu et al., 2019] and classification tasks [Khosla et al., 2018; Vu et al., 2020]. An important reason for this is that representing arbitrary geometries with voxels is highly inefficient because the resolution of the image has to be constant while the geometry of the object can be better described with a variable resolution. Specifically, for multiscale problems, this means that the voxel size needs to be very small to capture the microscale features. This results in a very computationally wasteful data representation that uses a large number of voxels to represent areas with no microscale features that could be represented with only a few voxels. This is also true for CNNs applied to 2D problems, but the computational cost is disproportionately larger in 3D problems. We will follow a different approach, that of geometric learning.

Another problem with current deep learning models is their poor generalisation in cases where the training dataset is small or different from the cases we want to predict. The problem is more prominent when the computational model is very expensive and cannot be evaluated multiple times or when no computational model is in place and experiments need to be performed. Consequently, it is difficult to create a representative and large enough training dataset to get reasonable predictions.

Our paper addresses these two points by extending our previous work [Krokos et al., 2021], while adding two novel algorithmic elements. First, we manage to tackle 3D problems by using geometric learning where instead of voxelised images we learn directly from the mesh that is used to perform the FE simulations. Second, we use physics based online corrections to improve the generalisation of the NN without the need of creating more data.

Geometric learning can be seen as an extension of previously established NN techniques from Euclidean, grid structures to non-Euclidean, graph and manifold structures. The NNs that perform geometric learning are called Graph Neural Networks (GNNs). GNNs can overcome the constant voxel resolution problem by operating directly on the FE mesh used to solve the FE problem. The FE mesh can have a large number of triangles or tetrahedra in areas where the geometry is complex or large gradients are expected during the FEA, and a small number of triangles or tetrahedra elsewhere. This adaptive resolution strategy allows for a very efficient use of computational resources. Additionally, because we are primarily interested in predicting the maximum stress, GNNs allow us to further reduce the computational cost. Specifically, we show that the maximum stress, in the cases we examine, occurs on the surface and not in the volume of the mesh [section 3.2], and thus we choose to extract and work only on the surface mesh, which greatly reduces the training time.

We use the Ensemble Kalman method [Evensen, 1994] as the online correction technique. The difference with respect to the original Kalman method is that the covariance matrix is replaced by the sample covariance. This reduces the computational cost and makes this variant suitable for problems with a large number of variables, such as in our case where the number of variables is equal to the number of nodes of the FE mesh used to solve the PDE. In order to use the Ensemble Kalman method we need to convert the GNN into a Bayesian GNN (BGNN). In this work we use the Bayes by Backprop (BBB) method [Blundell et al., 2015] to quantify the uncertainty of the GNN and convert it into a BGNN. The online correction step takes as input the BGNN output (the prior distribution) and performs an update step using Bayes' Theorem, which produces the updated distribution (the posterior) that satisfies the desired physical law. This technique allows us to improve the BGNN prediction by taking advantage of the physics in place, without creating new data or retraining the BGNN.

BGNNs are a very useful tool even outside the online correction framework, since they can provide credible intervals (CIs) along with a mean prediction. When the BGNN makes a prediction for a case that is different from the cases in the training dataset, the CIs will be broad, reflecting the fact that the BGNN is uncertain about the prediction. Conversely, very tight CIs imply that the BGNN is confident in the prediction. The BGNN not only warns the user about the potential uncertainty in the prediction, but also attempts to bracket the correct value between an upper and a lower bound.

Our paper is organised as follows. In section [2] we present the reference multiscale mechanical model that we aim to solve online, and we introduce useful definitions for our methodology. In section [3] we discuss the GNN architecture and the input and output features. In section [4] we explain the strategy that we follow to convert the deterministic GNN to a BGNN, and in section [4.2] we introduce the online correction method we use. Finally, in section [5] we apply our multiscale GNN method to two different problems. We demonstrate the effectiveness of the GNN in accurately predicting the maximum stress values and the stress distributions, and we also demonstrate the online stress correction technique that we use to improve the GNN prediction in common cases where NNs fail, namely small training datasets and extrapolation.

2 Deep-learning-based meta-modelling for multiscale finite element analysis

2.1 Problem statement: linear elasticity

We consider a 3D body occupying a domain $\Omega$ with boundary $\partial\Omega$. The body is subjected to prescribed displacements $u_D$ on part of its boundary $\partial\Omega_D$ and prescribed tractions $t$ on the complementary boundary $\partial\Omega_N = \partial\Omega \setminus \partial\Omega_D$. We consider a body force $f$ and a displacement field $u$. For an initial configuration $x$, the boundary value problem of linear elasticity consists in finding $u = \arg\min E_p(u)$, where the potential energy $E_p$ is defined as

$$ E_p(u) = \int_{\Omega} \psi(\varepsilon(u)) \, \mathrm{d}\Omega - \int_{\Omega} f \cdot u \, \mathrm{d}\Omega - \int_{\partial\Omega_N} t \cdot u \, \mathrm{d}S \qquad (1) $$

We consider a linear Saint-Venant–Kirchhoff material model defined by its strain energy density

$$ \psi(\varepsilon) = \frac{\lambda}{2} \left( \mathrm{tr}\,\varepsilon \right)^2 + \mu \, \varepsilon : \varepsilon \qquad (2) $$

In the equations above, $\lambda$ and $\mu$ are the Lamé elasticity parameters and $\varepsilon$ is the linearised strain tensor defined as

$$ \varepsilon(u) = \frac{1}{2} \left( \nabla u + \nabla u^{T} \right) \qquad (3) $$

The Cauchy stress tensor is calculated as follows:

$$ \sigma = \lambda \, \mathrm{tr}(\varepsilon(u)) \, I + 2 \mu \, \varepsilon(u) \qquad (4) $$

2.2 Case study: mechanical specimen made of a porous material

For the purpose of training the GNN we need to create multiple realisations of a multiscale geometry with both macroscale and microscale features. In this work we consider a structure that we will refer to from now on as dogbone [Fig 1]. The dogbone specimen has a cylindrical hole as a macroscale feature. The material is porous, made of randomly distributed spherical pores as microscale features.

Figure 1: In the top subfigure, we can see a realisation of the dogbone structure. The dogbone is porous and it has a cylindrical hole in the middle. The porous phase is geometrically defined as the union of randomly distributed spheres. The spheres are allowed to intersect with each other, the boundaries of the dogbone and the cylindrical hole. In the bottom subfigure, we see the mesh of the dogbone. We observe that the mesh is denser in the middle of the structure, where the porous phase is present, and coarser everywhere else.

2.3 Mesh and Finite element solver

The FE meshes are created using the open source 3D finite element mesh generator gmsh [Geuzaine and Remacle, 2009], using pygmsh [Schlömer, 2021] as a python interface. Unstructured tetrahedral meshes will be used.

The FE problems, comprising the macroscopic surrogate models and the full fine-scale models (as defined in the next subsection [2.4]), will be solved using the FEniCS software, a popular python-based open-source computing platform for solving partial differential equations (PDEs) [Alnæs et al., 2015; Logg et al., 2012]. We use standard linear tetrahedral elements to solve the elasticity problem.
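To make the setting concrete, a minimal FEniCS sketch of the linear elasticity problem of section 2.1 might look as follows. The mesh file, Lamé parameter values and boundary condition are placeholders for illustration, not the actual setup used to generate our datasets.

```python
from dolfin import *  # legacy FEniCS interface

mesh = Mesh("patch.xml")                      # hypothetical mesh file
V = VectorFunctionSpace(mesh, "Lagrange", 1)  # linear tetrahedral elements

lmbda, mu = Constant(1.0), Constant(0.5)      # placeholder Lamé parameters

def epsilon(u):
    return sym(grad(u))                       # linearised strain, Eq. (3)

def sigma(u):
    return lmbda * tr(epsilon(u)) * Identity(3) + 2.0 * mu * epsilon(u)  # Eq. (4)

u, v = TrialFunction(V), TestFunction(V)
f = Constant((0.0, 0.0, 0.0))                 # body force

a = inner(sigma(u), epsilon(v)) * dx          # weak form of equilibrium
L = dot(f, v) * dx

bc = DirichletBC(V, Constant((0.0, 0.0, 0.0)), "on_boundary")  # placeholder BC
u_sol = Function(V)
solve(a == L, u_sol, bc)
```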

Meshes are manipulated using the python package pyvista [Sullivan and Kaszynski, 2019] that allows for easy interaction with VTK objects through Numpy arrays.

2.4 Global-local framework

The multiscale framework that we use here is a direct extension of our previous work [Krokos et al., 2021]. As mentioned in section [2.2], the examples that we show in this work are dedicated to linear elasticity for porous media made of a homogeneous matrix with a random distribution of microscale features that are modelled as spherical voids.

Firstly, we use a coarse finite element mesh that ignores the existence of the microscale features to obtain the macroscale solution, at a small computational cost, over the entire domain $\Omega$. Then, we assume there exists a function that takes as input the geometry of the microscale features and the local macroscale solution in a window of the domain, which we call a patch, and outputs the microscale stress field in a sub-region of the patch that we call Region of Interest (ROI) [Fig 2].

This function will be approximated through a GNN. The existence of such a function is not proved mathematically, but it is supported by Saint-Venant's principle and by our previous work [Krokos et al., 2021], and it will be shown to be valid via numerical experiments in the context of the present paper, which extends our previous findings to the three-dimensional setting.

Figure 2: Porous material with extracted patches. We can observe a dogbone structure from which we have extracted 2 patches. In green we can see the ROI of the patches.

2.5 Quantity of interest and accuracy measure

We are interested in predicting a mechanical quantity that indicates potential crack initiation sites. In this work we choose the Von Mises stress (VM) for this purpose, which can be calculated from the following formula:

$$ \sigma_{VM} = \sqrt{ \frac{1}{2} \left[ \left( \sigma_{11} - \sigma_{22} \right)^2 + \left( \sigma_{22} - \sigma_{33} \right)^2 + \left( \sigma_{33} - \sigma_{11} \right)^2 \right] + 3 \left( \sigma_{12}^2 + \sigma_{13}^2 + \sigma_{23}^2 \right) } \qquad (5) $$

Additionally, we will define accuracy in a way that serves the purpose of accurately predicting the maximum Von Mises stress in the patches. We consider that the maximum Von Mises stress is accurately calculated in the ROI of the patch if it is predicted with a relative error less than the threshold of 10%, unless stated otherwise. This procedure is summarised in [Appendix A].
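As an illustration, a minimal NumPy sketch of the Von Mises stress of Eq. (5) and of the patch-level accuracy criterion described above reads as follows; the component ordering and array names are our own, for illustration.

```python
import numpy as np

def von_mises(s):
    # s has shape (n_nodes, 6): [s11, s22, s33, s12, s13, s23], Eq. (5)
    s11, s22, s33, s12, s13, s23 = s.T
    return np.sqrt(0.5 * ((s11 - s22)**2 + (s22 - s33)**2 + (s33 - s11)**2)
                   + 3.0 * (s12**2 + s13**2 + s23**2))

def is_accurate(vm_pred, vm_ref, tol=0.10):
    # The maximum Von Mises stress in the ROI is considered accurately
    # predicted when its relative error is below the 10% threshold.
    return abs(vm_pred.max() - vm_ref.max()) / vm_ref.max() < tol
```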

3 Graph-based geometric deep learning

In this section we briefly review geometric learning methods reported in the literature. We then explain the geometric learning framework that we choose to use, and finally we discuss the structure of the GNN along with the input and output features.

3.1 Brief review of earlier work on geometric deep learning

GNNs have been vastly developed over the last years, with various works on segmentation [Hanocka et al., 2019; Schult et al., 2020; Lei et al., 2021] and shape correspondence or retrieval tasks [Masci et al., 2018; Gong et al., 2020]. One early attempt at deep learning in non-Euclidean spaces is introduced by [Qi et al., 2017], where a method for geometric learning using point clouds is described. Even though point clouds are a very simple way to describe 3D objects, a more common and informative way is to use a mesh. One common approach for geometric learning with meshes is described in [Hanocka et al., 2019], where a generic framework, MeshCNN, for processing 3D models for classification and segmentation tasks is presented. MeshCNN operates directly on the mesh rather than on a voxel-based representation of the object, which would increase the computational cost, or on a point cloud, which would result in a loss of valuable topology information.

GNNs have also been used for various engineering applications. [Vlassis et al., 2020] used geometric deep learning to model anisotropic hyperelastic materials. Specifically, they modelled polycrystals using a graph where the nodes represent the monocrystals and the edges their connectivity. This utilisation of non-Euclidean data allowed them to take advantage of rich microstructure information not described by classical Euclidean descriptors like density and porosity. Additionally, [Sanchez-Gonzalez et al., 2020] used GNNs to simulate particle-based dynamic systems where one or more types of particles, ranging from water to sand, are involved. The physical state of the system is described by a graph of interacting particles and the physics is predicted by message passing through the nodes (particles) of this graph. Later, [Pfaff et al., 2021] used GNNs for mesh-based dynamic simulations. In contrast to particle-based problems, in this work the authors took advantage of the connectivity of the mesh to define the edges of the graph. They demonstrated their method on a variety of applications such as deformable bodies, fluid flow around an obstacle and air flow around an aircraft wing. Specifically, for the air flow around an aircraft wing they showed that GNNs have superior performance compared to CNNs. Recently, [Mylonas et al., 2022] used a Bayesian GNN to infer the position and shape of an unknown crack via patterns of dynamic strain field measurements at discrete locations. Lastly, [Lino et al., 2021] developed a multiscale GNN that efficiently diffuses information across different scales, making it ideal for tackling strongly non-local problems such as advection and incompressible fluid dynamics.

Most of these methods are using the formulation described in [Battaglia et al., 2018] which provides a general framework to work with GNNs. We use the same formulation since it allows for a very natural encoding of information on the mesh where absolute information, for instance material properties, can be encoded on the vertices of the mesh while relative information, for example distance between vertices, is encoded on the edges of the mesh.

3.2 Geometric learning for multiscale stress analysis: assumptions and justification

As can be seen from the literature review, researchers can take advantage of the flexibility that GNNs offer, to reduce the computational cost by modelling their system in more meaningful ways that stem from the physical understanding of the underlying problem, as for example [Vlassis et al., 2020] who modelled the behaviour of a polycrystal by considering every crystal as a node of the graph instead of operating on the volume mesh. Following this paradigm, for reasons we explain in the next paragraphs, we choose to work only on the surface mesh of the porous medium and not the volume mesh.

First of all, we claim that the maximum stress in the cases we examine is located at the surface of the structure, not in the bulk of it. In order to test this hypothesis we conduct an experiment where we perform 100 FE simulations with 100 realisations of the dogbone structure. For the boundary conditions we apply displacements of the same magnitude but opposite direction on opposite faces of the specimen. Afterwards, we extract from the volume mesh the surface mesh, with the stress tensors encoded on the nodes of the surface mesh [Appendix B]. Lastly, we calculate the maximum Von Mises stress both on the volume and on the surface mesh and we compare the two. From [Fig 3] we can conclude that the maximum volume stress indeed coincides with the maximum surface stress.

Additionally, we hypothesise that the surface mesh is a comprehensive representation of the geometry of the specimen. Lastly, we assume that the trace of the smooth macroscale stress over the skin mesh is sufficient to inform the GNN of the macroscopic stress state over the patch. We verify these two hypotheses numerically in the context of the current paper.

Consequently, we choose to work only on the surface mesh since we are primarily interested in predicting the maximum stress. This greatly reduces the memory requirements and the training time.

Figure 3: The x axis corresponds to the maximum Von Mises stress on the volume mesh and the y axis to the maximum Von Mises stress on the surface mesh for each of the 100 performed simulations. The red line is the y = x line. We can observe that all the points lie on this line and thus the maximum volume and surface stress values are the same.
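Extracting the skin mesh together with its nodal data is a one-liner in pyvista; a minimal sketch is shown below, where the file name and the name of the point-data array are hypothetical.

```python
import pyvista as pv

# "dogbone_volume.vtu" and the "von_mises" point-data array are hypothetical names.
grid = pv.read("dogbone_volume.vtu")   # volume mesh with nodal stress data
surf = grid.extract_surface()          # skin mesh; point data carries over

print(grid.point_data["von_mises"].max(),   # maximum over the volume...
      surf.point_data["von_mises"].max())   # ...and over the surface (Fig 3)
```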

3.3 Graph construction

In our attempt to perform geometric learning on the surface mesh of a dogbone structure, we consider a graph $G = (V, E)$ described by a set of nodes $V$ and a set of edges $E$.

The nodes of the graph coincide with the nodes of the surface mesh, since their position is optimised by the mesh generation software to best describe the geometry of the structure. The edges of the graph are used to pass messages between the nodes and they define the connectivity of the graph. Consequently, the edges of the graph cannot coincide with the edges of the surface mesh, since the surface mesh has disconnected areas. This means that it is possible for areas of the surface mesh that are close to each other to not share an edge, for example two interacting but non-intersecting defects. This does not allow passing of information between the two, and thus their interaction cannot be modelled. To overcome this problem we define 2 different neighbourhoods [Fig 4].

  • Geodesic: The Geodesic, or 1-ring neighbourhood, includes the nodes that share an edge with the central node. This is equivalent to the neighbourhood dictated by the surface mesh connectivity.

  • Euclidean: The Euclidean neighbourhood in this work is defined using radius neighbours. Every node is connected with every node that falls into a sphere of predefined radius centred at that node. The predefined radius used in this work is 4 times the radius of the spherical voids, as suggested in [Krokos et al., 2021]. In practice this results in a very high number of neighbours, and thus in a very expensive computational problem. To tackle this problem we restrict the number of neighbours of each node by setting a threshold and randomly choosing neighbours so that the total number of neighbours is below the threshold. In section [D.5] we discuss the most appropriate value for this threshold.

Figure 4: On the left we show a cross section of the surface mesh depicting a rectangle with 2 spherical features. In red we see the node that is considered the central node for the convolutional neighbourhood. On the right we can see 2 possible neighbourhoods for the central node: the Geodesic (1-ring) neighbourhood on top, and the Euclidean neighbourhood at the bottom with a threshold for the maximum number of neighbours equal to 3. In the Geodesic neighbourhood there is no edge between the two spherical features and thus message passing between the two is not possible.

In this work we use either Euclidean or a combination of Euclidean and Geodesic neighbours, as proposed by [Schult et al., 2020], both of which allow the modelling of interactions between non-intersecting parts of the mesh.

The edges of the graph are bidirectional, so that information can be exchanged both ways between two connected nodes. This means that for every edge passing a message from node $i$ to node $j$ there exists an edge passing a message from node $j$ to node $i$.
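A minimal sketch of this graph construction, assuming a k-d tree for the radius queries, could read as follows; the function name and the random subsampling details are our own illustration of the procedure described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_graph(coords, radius, max_neighbours, seed=0):
    # Connect every node to the nodes inside a sphere of the given radius
    # (4x the void radius in this work), randomly capped at max_neighbours.
    rng = np.random.default_rng(seed)
    tree = cKDTree(coords)
    edges = set()
    for i, neigh in enumerate(tree.query_ball_point(coords, r=radius)):
        neigh = [j for j in neigh if j != i]
        if len(neigh) > max_neighbours:
            neigh = rng.choice(neigh, size=max_neighbours, replace=False)
        for j in neigh:
            edges.add((i, int(j)))
            edges.add((int(j), i))   # bidirectional: messages pass both ways
    senders, receivers = map(np.array, zip(*sorted(edges)))
    # Edge features: relative position and distance between the two nodes (Fig 5)
    rel = coords[receivers] - coords[senders]
    dist = np.linalg.norm(rel, axis=1, keepdims=True)
    return senders, receivers, np.hstack([rel, dist])
```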

3.4 Input and Output features

The framework we use requires the input and output graphs to be of the same structure. Consequently, we choose to interpolate the 3D FE solution from the macroscale mesh, i.e. mesh without spherical voids, to the microscale mesh, i.e. mesh with the spherical voids.

For the input of the GNN we encode node features on the nodes of the graph and edge features on the edges of the graph. The input node features are the independent components of the macroscale stress tensor, along with the microscale feature indicator, a single integer indicating whether the node belongs to a microscale feature or not. The input edge features are the relative position of the 2 nodes that the edge connects, along with the distance between these 2 nodes. This is summarised in [Fig 5].

Figure 5: Example of input node and edge features. In orange we see the nodes and in light blue the directed edge. The node features are the micro indicator, a single integer that indicates whether the node belongs to a macroscale or a microscale feature, and the components of the macroscale stress tensor. The edge features are the relative position of the two nodes and the distance between them.

The output of the graph contains information only on the nodes of the mesh. Specifically, the output node features are the components of the microscale stress tensor.

We sum up in [Fig 6] the input and output of the GNN. We have only included the node features since the edge features are calculated as a pre-processing step from the connectivity of the mesh.

Figure 6: In this figure we can see the input and output of the GNN. The input consists of the 6 stress components of the macroscale stress tensor along with the micro indicator, a single integer per node that determines whether the node belongs to a microscale or a macroscale geometrical feature. For the patch corresponding to the micro indicator, the FE mesh is visible. The output of the GNN is the 6 stress components of the microscale stress tensor.

3.5 GN block

The GN block is the basic building block of the GNN and it is used to map an input graph to an output graph with the same nodes and edges but with updated node and edge features. The node and edge features are jointly passed through an edge update Multilayer Perceptron (MLP) and a node update MLP in a 2-step procedure, described in [Battaglia et al., 2018], that sequentially updates the edge features first and the node features later. A sketch of this procedure can be found in [Fig 7].

For the edge update step, consider an edge with features $e_k$ that sends a message from the sender node with features $v_{s_k}$ to the receiver node with features $v_{r_k}$. The updated edge feature, $e'_k$, is calculated by passing the concatenation of these features through the edge MLP, $e'_k = \mathrm{MLP}_e([e_k, v_{s_k}, v_{r_k}])$. If $N^e$ is the number of edges of the graph, $d_v$ the number of node features and $d_e$ the number of edge features, then the input of the edge update MLP is of size $N^e \times (d_e + 2 d_v)$.

For the node update step, the updated edge features, $e'_k$, are aggregated per receiver node. In this work we use mean aggregation, although elementwise summation, maximum or any other symmetric operation can be used. After the aggregation step we have a single aggregated edge feature, $\bar{e}'_i$, corresponding to each node feature $v_i$. The updated node features, $v'_i$, are calculated by passing the concatenation of the original node features with the aggregated edge features through the node MLP, $v'_i = \mathrm{MLP}_v([v_i, \bar{e}'_i])$. If $N^v$ is the number of nodes of the graph, $d_v$ the number of node features and $d_{e'}$ the dimensionality of the updated edge features, then the input of the node update MLP is of size $N^v \times (d_v + d_{e'})$.

Figure 7: Structure of the GN block used in this work. The update of node and edge features happens through a 2-step procedure as described in this figure. FC stands for fully connected layer, Dp for dropout and LN for layer normalisation.
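A compact PyTorch sketch of this 2-step update is given below, assuming the sender/receiver index arrays built in section 3.3; the layer sizes and dropout rate are placeholders and the MLP depth is reduced for brevity.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=128):
    # FC -> Dropout -> ReLU hidden block with a LayerNorm after the output,
    # mirroring the structure of Fig 7 (depth reduced for brevity).
    return nn.Sequential(nn.Linear(d_in, hidden), nn.Dropout(0.1), nn.ReLU(),
                         nn.Linear(hidden, d_out), nn.LayerNorm(d_out))

class GNBlock(nn.Module):
    """Two-step GN block of [Battaglia et al., 2018]: edges first, nodes second."""
    def __init__(self, d_v, d_e):
        super().__init__()
        self.edge_mlp = mlp(d_e + 2 * d_v, d_e)  # input: [e_k, v_sender, v_receiver]
        self.node_mlp = mlp(d_v + d_e, d_v)      # input: [v_i, aggregated edges]

    def forward(self, v, e, senders, receivers):
        # 1) edge update: concatenate edge, sender and receiver features
        e = self.edge_mlp(torch.cat([e, v[senders], v[receivers]], dim=-1))
        # 2) mean-aggregate the updated edge features per receiver node
        agg = torch.zeros(v.shape[0], e.shape[-1], device=v.device)
        agg.index_add_(0, receivers, e)
        deg = torch.zeros(v.shape[0], device=v.device)
        deg.index_add_(0, receivers, torch.ones(len(receivers), device=v.device))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)
        # 3) node update: concatenate node features with aggregated edges
        v = self.node_mlp(torch.cat([v, agg], dim=-1))
        return v, e
```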

3.6 Model

In this work we deploy an Encode-Process-Decode GNN as described by [Battaglia et al., 2018]. The encoder encodes the node and edge features in a latent space of higher dimension, the processor updates these node and edge features, and finally the decoder decodes these features from the latent space to the output space.

  • Encoder: The node and edge features are independently encoded into a latent space of 128 dimensions using 2 distinct MLPs with the same structure. Each MLP has 3 hidden layers, each followed by a Dropout layer [Hinton et al., 2012] and a ReLU activation function, and a Layer Normalization layer after the output layer, which improves training stability and decreases training time in recurrent NNs [Ba et al., 2016].

  • Processor: The processor is composed of 5 residual GN Blocks. By using residual blocks the NN itself can choose its depth by skipping the training of a few layers using skip connections. This is currently the most common way to train deep NNs [He et al., 2015; Kim et al., 2016; Zagoruyko and Komodakis, 2017; Lim et al., 2017] and it also applies to GNNs [Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Mylonas et al., 2022]. The structure of the MLPs of the Processor is the same as the structure of the encoder MLPs. The processor combines information from edges and nodes and passes messages between nodes that share an edge.

  • Decoder: The decoder operates only on the node features, as the output of the GNN is the microscale stress field on the nodes. A single MLP is used to decode the node features from the latent space to the output space. The decoder MLP is similar to the MLPs used in the encoder and the processor, but there is no Layer Normalization layer applied after the output layer.

There is a skip connection that connects the macroscale stress tensor, from the input, with the output of the Decoder. That way the network can learn how the microscale stress field diverges from the macroscale stress field. In section [D.1] we demonstrate that this leads to smoother and faster convergence. A sketch of the GNN can be found below [Fig 8].

Figure 8: Architecture of the GNN
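Putting the pieces together, the Encode-Process-Decode architecture with the final skip connection might be sketched as follows, reusing the hypothetical mlp and GNBlock helpers above; the feature dimensions reflect the 6 stress components plus the micro indicator, and the decoder's missing LayerNorm is only noted in a comment.

```python
class EncodeProcessDecode(nn.Module):
    """Sketch of the architecture of Fig 8, under our own naming assumptions."""
    def __init__(self, d_in=7, d_edge=4, d_out=6, latent=128, n_blocks=5):
        super().__init__()
        self.node_enc = mlp(d_in, latent)
        self.edge_enc = mlp(d_edge, latent)
        self.blocks = nn.ModuleList(GNBlock(latent, latent) for _ in range(n_blocks))
        self.node_dec = mlp(latent, d_out)   # in the paper the decoder has no
                                             # final LayerNorm; kept here for brevity

    def forward(self, v, e, senders, receivers):
        macro_stress = v[:, :6]              # kept for the final skip connection
        v, e = self.node_enc(v), self.edge_enc(e)
        for block in self.blocks:
            dv, de = block(v, e, senders, receivers)
            v, e = v + dv, e + de            # residual GN blocks
        return macro_stress + self.node_dec(v)   # micro = macro + correction
```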

4 Stress correction using homogeneous Neumann conditions: an online Kalman-based approach

In this section we introduce an online hierarchical Bayesian method to constrain the prediction of the GNN. This method allows us to improve the network prediction without creating more data and retraining the network, which is computationally expensive and requires access to the function that creates the training data. We impose homogeneous Neumann conditions, which cannot be explicitly imposed while training the network, even though the training examples satisfy them up to the FE accuracy.

To apply this method we need a distribution of outputs from the GNN. To achieve that, we convert the deterministic GNN to a Bayesian GNN (BGNN). The posterior estimation of the BGNN for the stress field over the surface of the structure becomes a prior for the online assimilation step. Physical constraints are applied statistically by introducing them as partial information of the stress field, yielding a Bayesian posterior update. We choose to resort to a Gaussian approximation of the posterior in the form of the ensemble Kalman method. Alternative methods include Langevin MCMC and Variational Inference; however, their online cost may prove prohibitive.

4.1 Bayesian GNN

Several techniques have been proposed to quantify the uncertainty in NNs resulting in Bayesian NNs [Hinton and van Camp, 1993; Graves, 2011; Kingma and Welling, 2014; Blundell et al., 2015]. In this work we use the Bayes by Backprop method (BBB) to convert the deterministic GNN to a BGNN [Blundell et al., 2015].

In order to define the optimisation objective, we consider a probabilistic NN where we replace the constant weights of a deterministic NN with a distribution over each weight. Before observing the data, $\mathcal{D}$, the weight distribution is called the prior, $P(\mathbf{w})$, and after observing the data the weight distribution is updated, so that the BNN will fit the data, and is called the posterior, $P(\mathbf{w}|\mathcal{D})$. For an input $x$ the output of the probabilistic NN is a probability distribution over all possible outputs $y$. Minimising the loss function results in determining the posterior distribution of the weights given the training data, $P(\mathbf{w}|\mathcal{D})$. The posterior cannot be computed analytically through Bayes' rule, $P(\mathbf{w}|\mathcal{D}) = P(\mathcal{D}|\mathbf{w}) P(\mathbf{w}) / P(\mathcal{D})$, because of the existence of intractable integrals. A common approach is to perform Variational Inference (VI) to approximate the posterior with a variational distribution $q(\mathbf{w}|\theta)$, parameterised by $\theta$ [Hinton and van Camp, 1993; Graves, 2011]. The new objective is to find the parameters $\theta^*$ [Eq. 6] that minimise the Kullback-Leibler (KL) divergence between the approximate posterior and the true Bayesian posterior:

$$ \theta^* = \arg\min_{\theta} \; \mathrm{KL}\left[ q(\mathbf{w}|\theta) \,\|\, P(\mathbf{w}|\mathcal{D}) \right] \qquad (6) $$

We consider the approximate posterior to be a fully factorised Gaussian [Graves, 2011], corresponding to a squared loss [Blundell et al., 2015]. The prior is also a Gaussian distribution, corresponding to L2 regularisation [Blundell et al., 2015]. Each layer has a single prior and thus a single prior mean, $\mu_{prior}$, and a single prior standard deviation, $\sigma_{prior}$. In a forward pass of the BGNN the weights are sampled from the posterior distribution. This creates a problem during backpropagation, since we cannot define a gradient for this sampling operation with respect to the weights. To tackle this problem we couple the BBB method with the reparametrisation trick [Kingma and Welling, 2014], as described in [Blundell et al., 2015], which replaces the sampling from the posterior with sampling from a unit Gaussian, scaled by the posterior standard deviation, $\sigma$, and shifted by the posterior mean, $\mu$. The standard deviation needs to be positive, thus it is parameterised using a parameter, $\rho$, as $\sigma = \log(1 + \exp(\rho))$. Consequently, the variational parameters to be optimised are $\theta = (\mu, \rho)$, and each weight is sampled as $\mathbf{w} = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$.
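A minimal PyTorch sketch of such a Bayes-by-Backprop layer with the reparametrisation trick is given below; the initialisation values are placeholders, the bias is omitted, and the KL term added to the loss is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with weights w = mu + log(1 + exp(rho)) * eps, eps ~ N(0,1)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.empty(d_out, d_in).normal_(0, 0.1))
        self.rho = nn.Parameter(torch.full((d_out, d_in), -3.0))  # placeholder init

    def forward(self, x):
        sigma = F.softplus(self.rho)    # sigma = log(1 + exp(rho)) > 0
        eps = torch.randn_like(sigma)   # unit Gaussian noise
        w = self.mu + sigma * eps       # sampled weights, differentiable in (mu, rho)
        return F.linear(x, w)           # bias omitted for brevity
```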

4.2 Ensemble Kalman method for stress correction

The ensemble Kalman method is used to update the prior state vector of a system, $x$. In this work the state vector corresponds to the microscale stress components predicted by the BGNN on the nodes of the mesh. In order to generate the prior ensemble we draw a number of samples from the BGNN. For node $j$ of sample $i$ we have

$$ x_j^{(i)} = \left[ \sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{13}, \sigma_{23} \right]_j^{(i)} \qquad (7) $$

Assuming that the mesh has $N$ nodes and that we draw $m$ samples from the BGNN, the prior ensemble matrix can be expressed as

$$ X = \left[ x^{(1)}, x^{(2)}, \dots, x^{(m)} \right] \qquad (8) $$

where

$$ x^{(i)} = \left[ x_1^{(i)}, x_2^{(i)}, \dots, x_N^{(i)} \right]^{T} \qquad (9) $$

The updated stress distribution (posterior), $x^{(i),post}$, after observing the data $d$, can be obtained as

$$ x^{(i),post} = x^{(i)} + K \left( d - H x^{(i)} \right) \qquad (10) $$

where $H$ is the observation matrix that is used to provide the noise-free value of the data, $H x^{(i)}$, and $K$ is the Kalman gain matrix, calculated as

$$ K = C H^{T} \left( H C H^{T} + R \right)^{-1} \qquad (11) $$

where $R$ is the error covariance of the data and $C$ is the ensemble covariance, calculated as

$$ C = \frac{1}{m-1} \sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right) \left( x^{(i)} - \bar{x} \right)^{T} \qquad (12) $$

The data, $d$, reflect the homogeneous Neumann conditions that we impose on every node $j$:

$$ d_j = 0 \qquad (13) $$

Instead of explicitly defining the observation matrix we use an observation function, $h$ (as described in appendix E.3), and we calculate the term $H x^{(i)}$ on every node $j$ as

$$ h\left(x^{(i)}\right)_j = \sigma_j^{(i)} \, n_j \qquad (14) $$

where $\sigma_j^{(i)}$ is the symmetric microscale stress tensor and $n_j$ is the unit normal vector:

$$ \sigma_j^{(i)} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix}_j^{(i)}, \qquad n_j = \begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix}_j \qquad (15) $$

For more details the reader can refer to [appendix E].
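In NumPy, the whole update of Eqs. (10)-(12) with an observation function reduces to a few lines. The sketch below is illustrative: the sample cross-covariances stand in for the $C H^{T}$ and $H C H^{T}$ products of Eq. (11), and the dense inverse would be replaced by a linear solver in practice.

```python
import numpy as np

def enkf_update(X, d, h, R):
    # X: prior ensemble, shape (n_dof, m), one column per BGNN sample.
    # d: observed data (zeros for homogeneous Neumann conditions, Eq. (13)).
    # h: observation function mapping a state vector to the nodal tractions
    #    sigma.n (Eq. (14)). R: error covariance of the data.
    m = X.shape[1]
    HX = np.stack([h(X[:, i]) for i in range(m)], axis=1)  # observed ensemble
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    C_xh = A @ HA.T / (m - 1)            # stands in for C H^T in Eq. (11)
    C_hh = HA @ HA.T / (m - 1)           # stands in for H C H^T in Eq. (11)
    K = C_xh @ np.linalg.inv(C_hh + R)   # Kalman gain, Eq. (11)
    return X + K @ (d[:, None] - HX)     # updated ensemble, Eq. (10)
```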

5 Numerical Examples

5.1 Cubical Heterogeneous material

In this section we want to test our methodology using a dataset containing samples from a heterogeneous material. We will demonstrate how we can use our methodology to

  • predict the microscale stress distribution

  • predict the maximum microscale Von Mises stress over the patches

  • quantify the uncertainty of the prediction

  • improve the BGNN prediction using the online correction technique introduced in section [4.2]

5.1.1 FEA set-up and generation of the training dataset

For the purpose of training our model we will work with a synthetic heterogeneous material from which we extract and examine cubical specimens. The size of each specimen is 2 units along each spatial dimension. The specimen has two elliptical pores, with random position, size and orientation, and a distribution of 50 to 100 spherical pores that share the same radius. The spherical and elliptical pores are generated without intersecting. For the macroscale simulations, input of the BGNN, only the elliptical pores are taken into account and the spheres are ignored. Two realisations of the cubical specimen are shown in [Fig 9].

Figure 9: Two realisations of a cubical specimen from a heterogeneous material with a bimodal distribution of random pores: large elliptical pores and smaller equally-sized spherical pores.

As already explained in [2.4], the input to the GNN is a patch of the geometry, not the entire structure, and the prediction of the GNN happens in a sub-region of the patch that we called ROI. The patch and ROI size depend on the radius of interaction of the spherical pores. Based on the work of [Pilkey and Pilkey, 2008] on stress concentration factors, we assume that the effect of spherical voids on the global stress field fades out at distances larger than 4 radii from the centre of the spherical voids. We combine this information with our previous work [Krokos et al., 2021] and we choose patch and ROI sizes defined as multiples of $r$, the radius of the spherical voids, as can be seen in [Fig 10].

Figure 10: Sketch of a cross section of a patch. In light brown we see the patch, in green the region of interest (ROI) and in black the spherical voids.

The patches that are extracted never intersect with the exterior boundaries of the FE mesh. Additionally, we apply homogeneous Dirichlet boundary conditions away from the material specimen, as represented in [Fig 11], [Eq. 16].

(a)
(b)
Figure 11: On the left (a) we can see a volume mesh of the multiscale structure. The buffer area where the mesh is coarse is visible. In the centre we can see the dense mesh area from where we extract the patches. On the right (b) we can see a zoom in the dense mesh region.
$$ u(x) = \begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{12} & E_{22} & E_{23} \\ E_{13} & E_{23} & E_{33} \end{bmatrix} \left( x - x_0 \right) \qquad (16) $$

where $E_{11}$, $E_{22}$, $E_{33}$, $E_{12}$, $E_{13}$, $E_{23}$ are far-field load parameters, $x$ is the position of a point in $\Omega$ and $x_0$ is the initial position of the centre of the body in $\Omega$.

For data pre-processing we use a simple linear rescaling to map the input data to a range from -1 to 1. Further information can be found in [appendix C].
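Such a rescaling is one line of arithmetic; a sketch is given below, where the per-feature minima and maxima would be computed on the training set only.

```python
def rescale(x, x_min, x_max):
    # Linear map of a feature from [x_min, x_max] to [-1, 1].
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0
```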

The dataset we use consists of 200 FE simulations, completed in 8.7 hours on 5 cores of an Intel® Xeon® Gold 6148 CPU @ 2.40GHz. From every FE simulation we extract 5 patches, at random locations in the cubic specimen, resulting in 1,000 data points, 900 for training and 100 for testing.

5.1.2 Training parameters

For the training of the network we used a batch size of 2 and the Adam optimiser, with the initial learning rate decreasing by 5% every 50 epochs. Additionally, we identified some key parameters for the training of the GNN and performed several tests to identify their optimum values. We concluded that for optimum behaviour we use a GNN with 5 GN blocks, residual connections, an independent encoder, 128 filters and a maximum of 10 neighbours per node. More details can be found in [Appendix D.1 - D.5].

Training on 900 patches using the parameters identified above requires 6 hours on an Nvidia V100 GPU with 16GB of RAM with a maximum memory requirement of 4.3GB.
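This learning-rate schedule maps directly onto a step scheduler; a PyTorch sketch is given below, where the stand-in model, the initial learning rate and the epoch count are placeholders (the initial rate is not recoverable from the text).

```python
import torch

model = torch.nn.Linear(7, 6)   # stand-in for the GNN of section 3.6
# 1e-3 is a placeholder initial learning rate; "decreasing by 5% every
# 50 epochs" maps directly onto StepLR with gamma=0.95 and step_size=50.
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimiser, step_size=50, gamma=0.95)

for epoch in range(200):        # placeholder epoch count
    # ...forward pass, loss and loss.backward() over the training patches...
    optimiser.step()
    scheduler.step()            # decay the learning rate at the epoch boundary
```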

5.1.3 BGNN MAP prediction

In this section we will examine the mean prediction provided by the BGNN, without referring to the uncertainty estimation.

In [Fig 12(a)] we observe that the maximum stress components predicted by the BGNN are in good agreement with the ones calculated by the FEA. Specifically, the coefficient of determination, $R^2$, for all 6 stress components is larger than 0.98. In [Fig 12(b)] we observe similar results for the maximum Von Mises stress. Although the points are more scattered and the coefficient of determination drops to 0.84, the accuracy is 96%.

(a) maximum microscale components
(b) maximum microscale Von Mises stress
Figure 12: In both subfigures the x-axis corresponds to the FE prediction in the ROI of each patch and the y-axis to the BGNN prediction in the ROI of each patch. In the top subfigure, (a), we observe 6 diagrams corresponding to each of the maximum absolute microscale stress components. In the bottom subfigure, (b), we observe the maximum Von Mises stress.

Additionally, we examine the stress distribution on the FE mesh. In [Fig 13] we compare the microscale Von Mises stress in the ROI of a patch as predicted by the GNN with the one calculated by FE simulations. We observe that the predicted stress distribution is very close to the ground-truth.

Figure 13: Comparison between the Von Mises stress distribution as calculated by FEA (right) and the Von Mises stress distribution as predicted by the GNN (left).

5.1.4 BGNN uncertainty estimation

After studying the quality of the predictions provided by the mode of the BGNN, we propose to evaluate the ability of the BGNN to quantify the uncertainty of the prediction and provide credible intervals (CIs) that reflect the error in the prediction. From [Fig 14] we can see that the percentage of points inside the 95% CIs is 92%, which is satisfactory.

Figure 14: In this figure we see the upper and lower 95% CIs for the maximum Von Mises stress prediction, blue and orange points respectively. The BGNN is able to include the real value in its 95% CIs in 92% of the cases.

5.2 Online bias correction

In this section we will demonstrate the use of the Ensemble Kalman method, section [4.2], to improve the BGNN prediction. We will examine 2 realistic cases: the first is to improve the prediction of an under trained network and the second is to improve the prediction of a network in an extrapolation regime.

5.2.1 Under trained GNN

With the phrase under trained we refer to a network for which we use a small dataset for training. In [Fig 15] we show the accuracy plots for the mean BGNN prediction as a function of the epochs for 2 GNNs trained with the same parameters, but where the one on the left uses data from 32 FE simulations for training and the one on the right uses data from 100 FE simulations. The BGNN trained with more data performs better than the under trained network.

(a) 32 FE simulations
(b) 100 FE simulations
Figure 15: In both diagrams we see accuracy curves defined using the mean BGNN prediction for the maximum Von Mises stress. Specifically, we see the accuracy as function of the training epochs for the training set (blue) and the test set (red). In the diagram on the left (a) the GNN is trained with 32 FE simulations, 160 patches, and in the diagram on the right (b) the GNN is trained with 100 FE simulations, 500 patches.

Using the Ensemble Kalman method we want to update the output distribution of the BGNN, the prior, and obtain the posterior. In [Fig 16] we can see the posterior and the prior for the prediction of the maximum Von Mises stress. The coefficient of determination between the posterior and the FE results has a value of 0.3074, which is larger than the coefficient of determination between the prior and the FE results, which has a value of 0.0176. Also, we can see that the best-fit line of the posterior data is closer to the y = x line (the ideal result) than the best-fit line of the prior data, i.e. the output of the BGNN without the online correction. Therefore, the posterior mean fits the data better than the prior mean, and thus the online stress correction resulted in an improvement of the mean BGNN prediction.

Figure 16: A diagram showing the posterior (yellow points) maximum Von Mises prediction and the prior (blue points) maximum Von Mises prediction. We observe that the posterior, after the stress update, has a larger coefficient of determination compared to the prior.

Furthermore, we want to examine the posterior distribution and not only the posterior mean value. In [Fig 17] we see the comparison between the prior 95% CIs (left) and the posterior 95% CIs (right). Firstly, we can observe that even in the prior case, where we have only used 32 FE simulations, the 95% CIs give a good estimation of where the real value is. We can see that the percentage of points that are inside the 95% CIs is 79%. Additionally, we see that this value climbs to 88% in the posterior case. This clearly means that by using the online stress update we were able to improve the BGNN prediction without adding more data to the training dataset. We note that we only update the mean value and the posterior standard deviation is taken equal to the prior one.

Figure 17: Comparison between the prior and posterior 95% CIs for the maximum Von Mises stress in the ROI of the patches for a BGNN trained with only 32 FE simulation data. The diagram on the left, (a), shows the upper and lower 95% CIs, blue and orange points respectively, before the stress update, while the diagram on the right, (b), shows them after the stress update. Two clusters, A and B, are marked before and after the stress correction, highlighting that blue points, +95% CIs, that were below the y = x line in the prior case have moved towards the line, and in a lot of cases surpassed it, in the posterior case.

Lastly, in [Fig 18] we compare the mean Von Mises stress distribution on the mesh before and after the stress update with the one calculated through FE simulations. We can see that the posterior, middle image, is closer to the FE results, image on the left, compared to the prior, image on the right.

Figure 18: Comparison between the Von Mises stress before and after the online stress update with the one calculated through FE simulations. On the left we see the FE results, in the middle the posterior and on the right the prior. For the posterior and prior we show the mean prediction.

5.2.2 Prediction for out-of-training microstructural inputs

In this section, with the phrase out-of-training, we refer to a network trained with spherical defects but evaluated on cases that have elliptical defects. The BGNN is extrapolating, which is a particularly undesirable but common scenario in real applications. In [Fig 19] we show a realisation of the training dataset on the left and a realisation of the test dataset on the right.

(a) realisation of the training dataset
(b) realisation of the test dataset
Figure 19: On the left (a) we see a realisation of the training dataset and on the right (b) of the test dataset. For the training dataset the porous phase is composed of spheres while for the test set of ellipsoids.

In [Fig 20] we see the comparison between the prior 95% CIs (left) and the posterior 95% CIs (right). Firstly, we can observe that even in the prior case, where the BGNN extrapolates, the 95% CIs give a good estimation of where the real value is. We can see that the percentage of points that are inside the 95% CIs is 91%. Additionally, we see that this value climbs to 95% in the posterior case. Most importantly we see that using the ensemble Kalman method we managed to avoid underpredicting the maximum values. This clearly means that by using the Ensemble Kalman method we were able to improve the BGNN prediction without adding more data to the training dataset. We note that we only update the mean value and the posterior standard deviation is taken equal to the prior one.

Figure 20: Comparison between the prior and posterior 95% CIs for the maximum Von Mises stress in the ROI of the patches for a BGNN trained on a dataset with spheres as microscale features and tested on a dataset with ellipsoids as microscale features. The diagram on the left, (a), shows the upper and lower 95% CIs, blue and orange points respectively, before the stress update, while the diagram on the right, (b), shows them after the stress update. A cluster of points is marked before and after the stress correction, highlighting that blue points, +95% CIs, that were below the y = x line in the prior case have moved towards the line, and in a lot of cases surpassed it, in the posterior case.

5.3 Dogbone

After demonstrating the applicability of our GNN and studying the online correction technique, in this subsection we apply the proposed deep learning methodology to a more challenging case. We also provide multiple examples of the Von Mises stress distribution on the surface of the mesh.

5.3.1 Training dataset

The geometry that we choose to model is a dogbone with a single hole in the middle that does not change from one realisation to another, and with a random distribution of 50 to 100 spherical pores. The size of the dogbone is 2 units along the x axis (length), 0.6 units along the y axis (height) and 0.08 units along the z axis (width). The radius of the hole is 0.032 units and the radius of the spherical pores is 0.016 units. The spherical pores may intersect with each other, with the boundaries of the geometry and with the hole. For the macroscale simulations, input of the BGNN, only the hole is taken into account and the spherical pores are ignored. In [Fig 21] we show 2 examples of the geometry. For the boundary conditions we apply displacements of the same magnitude but opposite sign along the length of the specimen.

Figure 21: Two realisations of the dogbone used to train the GNN. Each dogbone has a cylinder-shaped hole and the porous material is defined via a random distribution of spherical voids.

We performed 500 FE simulations, completed in 5.7 hours on 10 cores of an Intel® Xeon® Gold 6148 CPU @ 2.40GHz. From every FE simulation we extract 10 patches, resulting in 5,000 data points, 4,000 for training and 1,000 for testing. As explained in section [5.1.1], the patch and ROI sizes are defined as multiples of $r$, the radius of the spherical pores.

5.3.2 GNN parameters

For the GNN training we used the Adam optimiser, with the initial learning rate decreasing by 5% every 50 epochs. Additionally, we started with the parameters that we identified in [5.1.2], namely 5 GN blocks, residual connections, an independent encoder, 128 filters and a maximum of 10 neighbours per node. After experiments we concluded that in this case the GNN can benefit from a maximum of 20 neighbours per node, but all the other hyperparameters remained unchanged. Training of the GNN was performed on an Nvidia V100 GPU with 16GB of RAM.

In this more challenging problem we investigate the idea of using dual convolutions. Instead of only considering the Euclidean neighbourhood we also consider the Geodesic neighbourhood of every node in order to perform joint Geodesic and Euclidean convolutions as explained in [3.3]. After experiments, we concluded that in our case the optimum value for the ratio between Geodesic and total convolutions is 75%. For more details the reader can refer to [Appendix D.6].

In addition, the reader can find experiments that support our choice to predict the full stress tensor instead of the Von Mises stress in [Appendix D.7] and our choice to work on patches of the structure instead of the entire structure in [Appendix D.8].

5.3.3 Numerical examples with Deterministic GNN

As can be seen in [Fig 22(a)], when training with 4000 patches the accuracy for the maximum Von Mises stress is 71%, which is considerably lower than the accuracy reported for the cubical heterogeneous material case, 96%. In [Fig 22(b)] we can also evaluate the performance of the GNN on the entire dogbone and not only at patch level. The accuracy does not change considerably in this case, but the data are more spread around the y = x line, which is indicated by the lower coefficient of determination.

(a) evaluate on patches
(b) evaluate on entire structure
Figure 22: In both subfigures the x-axis corresponds to the maximum Von Mises stress as calculated by FEA and the y-axis as calculated by the GNN. For the left subfigure, (a), we see results for the ROI of the patches while for the right subfigure, (b), for the entire dogbone.

Furthermore, we demonstrate the quality of the stress distribution. In [Fig 23] and [Fig 24] we show that the stress field predicted by the GNN is in good agreement with that provided by direct FEA.

Figure 23: Comparison between the Von Mises stress distribution as calculated by FEA (top) and Von Mises stress distribution predicted by the GNN (bottom). The GNN result is reconstructed from the union of multiple patch predictions where only the ROI is extracted.
Figure 24: Comparison between the Von Mises stress distribution, extracted from a cross section of the structure (top), as calculated by FEA (middle) and as predicted by the GNN (bottom). We can observe that the two distributions are qualitatively very similar. The GNN result is reconstructed from the union of multiple patch predictions where only the ROI is extracted.

Lastly, we are interested in studying the predicted stress distribution in the highly stressed regions of the specimen. In [Fig 25(a)] we have zoomed in on a region where a spherical void interacts strongly with the cylinder-shaped hole, and in [Fig 25(b)] on a strong interaction between 2 spherical voids. We observe that in both cases the GNN successfully located the area where the maximum Von Mises stress occurs, and predicted both its value and the stress distribution around it.

(a) spherical void - cylinder-shaped hole interaction
(b) spherical void - spherical void interaction
Figure 25: In both subfigures we compare the maximum Von Mises stress as calculated by FEA and as predicted by the GNN. In the top subfigure (a), the maximum is due to an interaction between spherical voids and the cylinder-shaped hole while in the bottom, (b), due to an interaction between 2 spherical voids.

5.4 Variable Dimension Dogbone

In this section we demonstrate the ability of our method to be used for the exploration of responses in a space of shapes described by a few geometrical parameters. To this end, we perform geometric learning on a family of dogbone shapes where not only the spherical void distribution differs from realisation to realisation but also the cylindrical holes and the dimensions of the dogbone. We focus on the dependence of the quality of the results on the size of the dataset, both in terms of the maximum Von Mises stress and of the stress distribution.

5.4.1 Training dataset

To create this dataset we follow the same process as in the Dogbone section, 5.3. The difference lies in the fact that here the dimensions of the specimen are not constant. The length of the dogbone is equal to 2, the width varies from 0.06 to 0.12 and the height varies from 0.24 to 0.7. Additionally, the number of cylindrical holes varies from 1 to 4 in every realisation, and their radii from 0.024 to 0.06. Two realisations of the variable dogbone geometry can be found in [Fig 26].

Figure 26: Two realisations of the variable dogbone structure. In the top subfigure we observe a dogbone with large height, small width and 4 cylindrical holes. In the bottom subfigure we observe a dogbone with small height, large width and 2 cylindrical holes.

5.4.2 Numerical Examples with variable dimension dogbone

We investigate the performance of the GNN with respect to the size of the training dataset. We use patches extracted from 100 FE simulations for the test dataset and 4 training datasets consisting of patches extracted from 50, 100, 200, and 400 FE simulations. Results can be found in [Fig 27]. From [Fig 27(a)] we can observe that the accuracy increases as we increase the size of the training dataset, from 50.21% for the GNN trained with 50 FE simulations to 64.30% for the GNN trained with 400 FE simulations. We see the opposite trend in [Fig 27(b)], where the loss decreases in fewer epochs and reaches a lower value as the size of the training dataset increases. Compared to the previous family of dogbones, where the dimensions of the dogbone and the cylindrical hole did not change from one realisation to another, we observe that the accuracy achieved using 400 FE simulations decreases from 71% to 64% for the variable dogbone case. This is an expected result, since the space of shapes we try to learn now has more parameters than before. In order to achieve a similar accuracy, we create an additional 600 FE simulations and train the GNN for 100 epochs using as initial weights the ones determined by the GNN trained with 400 FE simulations. The accuracy achieved in this case is 70%.

(a) Accuracy
(b) Loss
Figure 27: In the diagram on the left (a) we see accuracy curves defined using the maximum Von Mises stress and a 15% threshold for the relative error. Specifically, we see the accuracy as function of the training epochs. In the diagram on the right (b) we see the loss function as a function of the epochs. For both diagrams, coloured lines correspond to GNNs trained with datasets of different size, namely patches extracted from 50, 100, 200 and 400 FE simulations.

Additionally, in order to examine the stress distribution inside the structure, we have extracted its upper part so that the interior can be observed. This is shown in [Fig 28], where we see that the GNN successfully reconstructs the Von Mises stress distribution.

Figure 28: Comparison between the Von Mises stress distribution, extracted from the top of the structure (top), as calculated by FEA (middle) and as predicted by the GNN (bottom). We can observe that the two distributions are qualitatively very similar. The GNN result is reconstructed from the union of multiple patch predictions where only the ROI is extracted.

Lastly, we want to examine the quality of the predicted stress distribution with respect to the size of the training dataset. In [Fig 29] we see the Von Mises prediction for a case where few strong interactions are present. We observe that all the GNNs, even the one trained with 100 FE simulations, successfully predict the correct Von Mises stress distribution. On the other hand, in [Fig 30] we see a case where two intersecting defects, located at the corner of the domain, interact strongly, and the GNNs trained with fewer than 200 FE simulations fail to predict this interaction.

Figure 29: On the left, the Von Mises distribution on the surface of a dogbone structure as calculated by FE simulations. On the right, a zoom of an area of the dogbone. From top to bottom we see the FE simulation results and the predictions of GNNs trained with patches extracted from 400, 200 and 100 FE simulations. We observe that all the GNN predictions are very close to the FE results.
Figure 30: On the left, the Von Mises distribution on the surface of a dogbone structure as calculated by FE simulations. On the right, a zoom of the maximum Von Mises stress area. From top to bottom we see the FE simulation results and the predictions of GNNs trained with patches extracted from 400, 200 and 100 FE simulations. We observe that only the GNN trained with 400 FE simulations was able to correctly predict the maximum Von Mises stress created by the interaction between two spherical voids.

6 Conclusions

In this work we have developed a Neural Network (NN) based concurrent scale coupling method for 3D multiscale stress prediction in porous media, overcoming the need for computationally expensive direct numerical multiscale simulations. Our approach does not assume separation of scales, nor does it require prior geometrical parametrisation of the multiscale problem.

Our approach is based on geometric learning, which is well suited to multiscale problems as it allows us to operate on a variable-resolution mesh of the geometry rather than on a fixed-size voxel-based image. Additionally, we made the choice to analyse and predict only the surface stress of the geometry. This choice was based on three assumptions, two for the input and one for the output. For the input of the GNN, we assumed that the surface mesh is a comprehensive representation of the geometry, and that the trace of the smooth macroscale stress over the surface mesh is sufficient to inform the GNN of the macroscopic stress state. For the output, we assumed, and verified on our dataset, that the maximum microscale stress occurs on the surface of the structure.

This work extended our previous work on 2D image-based CNNs [Krokos et al., 2021] to 3D graph-based Graph Neural Networks (GNNs). We found that the GNN predicts stress fields and local maxima, and quantifies uncertainty, as effectively as in the 2D case.

Moreover, we introduced a way to enforce physics constraints at evaluation time, using a Bayesian framework whereby the Bayesian GNN (BGNN) predictions were considered as prior, and posterior predictions that conform to homogeneous Neumann conditions on free surfaces of the porous materials were computed via an ensemble Kalman update. We have used this method to tackle problems commonly reported with data-driven methodologies, namely lack of training data and poor extrapolation ability. We have shown that without this online correction, downward bias is observed for maximum equivalent stresses, which is problematic. This effect is alleviated thanks to the online physics-based correction.

Lastly, we showed that for Monte-Carlo simulations of porous microstructures, relatively few simulations are required to train the GNN: 50 simulations already give 50% accuracy. The trained GNN can be used in an inverse problem setting, for instance for the optimisation of a CAD model under uncertainty stemming from the microstructure.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 764644.

This paper only contains the authors' views, and the Research Executive Agency and the Commission are not responsible for any use that may be made of the information it contains.

Stéphane P.A. BORDAS received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 811099 TWINNING Project DRIVEN for the University of Luxembourg and from the Fonds National de la Recherche Luxembourg FNR under grant O17-QCCAAS-11758809. We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.

We thank Synopsys Northern Europe Ltd for its support and specifically Dr Viet Bui Xuan for his constructive suggestions during the planning and development of this research work.

References

  • M. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. Rognes, and G. Wells (2015) The fenics project version 1.5. 3, pp. . External Links: Document Cited by: §2.3.
  • J. L. Ba, J. R. Kiros, and G. E. Hinton (2016) Layer normalization. Note: arXiv preprint arXiv: 1607.06450 External Links: 1607.06450 Cited by: 1st item.
  • P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu (2018) Relational inductive biases, deep learning, and graph networks. Note: arXiv preprint arXiv: 1806.01261 External Links: 1806.01261 Cited by: §3.1, §3.5, §3.6.
  • C. M. Bishop (1995) Neural networks for pattern recognition. Oxford University Press, Inc., USA. External Links: ISBN 0198538642 Cited by: Appendix C.
  • C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra (2015) Weight uncertainty in neural networks. Note: arXiv preprint arXiv: 1505.05424 External Links: 1505.05424 Cited by: §1, §4.1, §4.1.
  • M. Buda, A. Saha, and M. A. Mazurowski (2019) Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm. Computers in Biology and Medicine 109, pp. 218–225. External Links: ISSN 0010-4825, Document, Link Cited by: §1.
  • S. Deshpande, J. Lengiewicz, and S. P. A. Bordas (2021) FEM-based real-time simulations of large deformations with probabilistic deep learning. Note: arXiv preprint arXiv: 2111.01867 External Links: 2111.01867 Cited by: §1.
  • G. Evensen (1994) Sequential data assimilation with a nonlinear quasi-geostrophic model using monte carlo methods to forecast error statistics. 99 (C5), pp. 10143–10162. External Links: Document, Link, https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/94JC00572 Cited by: §1.
  • L. Gendre, O. Allix, P. Gosselet, and F. Comte (2009) Non-intrusive and exact global/local techniques for structural problems with local plasticity. 44 (2), pp. 233–245. External Links: Link, Document Cited by: §1.
  • C. Geuzaine and J. Remacle (2009) Gmsh: a 3-d finite element mesh generator with built-in pre- and post-processing facilities. 79, pp. 1309 – 1331. External Links: Document Cited by: §2.3.
  • S. Gong, M. Bahri, M. M. Bronstein, and S. Zafeiriou (2020) Geometrically principled connections in graph neural networks. Note: arXiv preprint arXiv: 2004.02658 External Links: 2004.02658 Cited by: §3.1.
  • A. Graves (2011) Practical variational inference for neural networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, Red Hook, NY, USA, pp. 2348–2356. External Links: ISBN 9781618395993 Cited by: §4.1, §4.1, §4.1.
  • R. Hanocka, A. Hertz, N. Fish, R. Giryes, S. Fleishman, and D. Cohen-Or (2019) MeshCNN: a network with an edge. 38, pp. 1 – 12. Cited by: §3.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Deep residual learning for image recognition. Note: arXiv: 1512.03385 External Links: 1512.03385 Cited by: 2nd item.
  • J. Hesthaven, S. Zhang, and X. Zhu (2015) Reduced basis multiscale finite element methods for elliptic problems. 13, pp. 316–337. External Links: Document Cited by: §1.
  • G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012) Improving neural networks by preventing co-adaptation of feature detectors. Note: arXiv preprint arXiv: 1207.0580 External Links: 1207.0580 Cited by: 1st item.
  • G. E. Hinton and D. van Camp (1993) Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, COLT ’93, New York, NY, USA, pp. 5–13. External Links: ISBN 0897916115, Link, Document Cited by: §4.1, §4.1.
  • H. Jiang, Z. Nie, R. Yeo, A. B. Farimani, and L. B. Kara (2020) StressGAN: a generative deep learning model for 2d stress distribution prediction. Note: arXiv: 2006.11376 External Links: 2006.11376 Cited by: §1.
  • P. Kerfriden, O. Allix, and P. Gosselet (2009) A three-scale domain decomposition method for the 3d analysis of debonding in laminates. 44, pp. 343–362. External Links: Document Cited by: §1.
  • M. Khosla, K. Jamison, A. Kuceyeski, and M. Sabuncu (2018) 3D convolutional neural networks for classification of functional connectomes: 4th international workshop, dlmia 2018, and 8th international workshop, ml-cds 2018, held in conjunction with miccai 2018, granada, spain, september 20, 2018, proceedings. pp. 137–145. External Links: ISBN 978-3-030-00888-8, Document Cited by: §1.
  • J. Kim, J. K. Lee, and K. M. Lee (2016) Deeply-recursive convolutional network for image super-resolution. Note: arXiv: 1511.04491 External Links: 1511.04491 Cited by: 2nd item.
  • D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. Note: arXiv preprint arXiv: 1312.6114 External Links: 1312.6114 Cited by: §4.1, §4.1.
  • V. Krokos, V. Bui Xuan, S. Bordas, P. Young, and P. Kerfriden (2021) A bayesian multiscale cnn framework to predict local stress fields in structures with microscale features. pp. 1–34. Cited by: §1, §1, §2.4, §2.4, 2nd item, §5.1.1, §6.
  • H. Lei, N. Akhtar, and A. Mian (2021) Picasso: a cuda-based library for deep learning over 3d meshes. Note: arXiv preprint arXiv: 2103.15076 External Links: Document, Link Cited by: §3.1.
  • B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee (2017) Enhanced deep residual networks for single image super-resolution. Note: arXiv preprint arXiv: 1707.02921 External Links: 1707.02921 Cited by: 2nd item.
  • M. Lino, C. Cantwell, A. A. Bharath, and S. Fotiadis (2021) Simulating continuum mechanics with multi-scale graph neural networks. Note: arXiv preprint arXiv: 2106.04900 External Links: 2106.04900 Cited by: §3.1.
  • A. Logg, K. Mardal, G. N. Wells, et al. (2012) Automated solution of differential equations by the finite element method. Springer. External Links: Document, ISBN 978-3-642-23098-1 Cited by: §2.3.
  • H. Lu, H. Wang, Q. Zhang, S. W. Yoon, and D. Won (2019) A 3d convolutional neural network for volumetric image semantic segmentation. 39, pp. 422–428. Note: 25th International Conference on Production Research Manufacturing Innovation: Cyber Physical Manufacturing August 9-14, 2019 — Chicago, Illinois (USA) External Links: ISSN 2351-9789, Document, Link Cited by: §1.
  • K. O. Lye, S. Mishra, and D. Ray (2020) Deep learning observables in computational fluid dynamics. Journal of Computational Physics 410, pp. 109339. External Links: ISSN 0021-9991, Document, Link Cited by: §1.
  • J. Masci, D. Boscaini, M. M. Bronstein, and P. Vandergheynst (2018) Geodesic convolutional neural networks on riemannian manifolds. Note: arXiv preprint arXiv: 1501.06297 External Links: 1501.06297 Cited by: §3.1.
  • A. Mendizabal, P. Márquez-Neila, and S. Cotin (2020) Simulation of hyperelastic materials in real-time using deep learning. Medical Image Analysis 59, pp. 101569. External Links: ISSN 1361-8415, Document, Link Cited by: §1, §1.
  • C. Mylonas, G. Tsialiamanis, K. Worden, and E. N. Chatzi (2022) Bayesian graph neural networks for strain-based crack localization. In Data Science in Engineering, Volume 9, R. Madarshahian and F. Hemez (Eds.), Cham, pp. 253–261. External Links: ISBN 978-3-030-76004-5 Cited by: §D.3, 2nd item, §3.1.
  • Z. Nie, H. Jiang, and L. B. Kara (2019) Stress field prediction in cantilevered structures using convolutional neural networks. Journal of Computing and Information Science in Engineering 20 (1). External Links: ISSN 1944-7078, Link, Document Cited by: §1.
  • J. T. Oden, S. Prudhomme, A. Romkes, and P. T. Bauman (2006) Multiscale modeling of physical phenomena: adaptive control of models. 28 (6), pp. 2359–2389 (English (US)). External Links: Document, ISSN 1064-8275 Cited by: §1.
  • J. T. Oden, K. Vemaganti, and N. Moës (1999) Hierarchical modeling of heterogeneous solids. 172 (1), pp. 3–25. External Links: ISSN 0045-7825, Document, Link Cited by: §1.
  • D. Paladim, J. Almeida, S. Bordas, and P. Kerfriden (2016) Guaranteed error bounds in homogenisation: an optimum stochastic approach to preserve the numerical separation of scales. 110, pp. . External Links: Document Cited by: §1.
  • T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. W. Battaglia (2021) Learning mesh-based simulation with graph networks. Note: arXiv preprint arXiv: 2010.03409 External Links: 2010.03409 Cited by: §D.3, 2nd item, §3.1.
  • W.D. Pilkey and D.F. Pilkey (2008) Peterson’s stress concentration factors, third edition. pp. 1–522. External Links: Document Cited by: §5.1.1.
  • C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) PointNet: deep learning on point sets for 3d classification and segmentation. Note: arXiv preprint arXiv: 1612.00593 External Links: 1612.00593 Cited by: §3.1.
  • P. Raghavan and S. Ghosh (2004) Concurrent multi-scale analysis of elastic composites by a multi-level computational model. 193 (6), pp. 497–538. External Links: ISSN 0045-7825, Document, Link Cited by: §1.
  • M. Raissi, A. Yazdani, and G. E. Karniadakis (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. External Links: Document, Link, https://www.science.org/doi/pdf/10.1126/science.aaw4741 Cited by: §1.
  • C. Rao and Y. Liu (2020) Three-dimensional convolutional neural network (3d-cnn) for heterogeneous material homogenization. 184, pp. 109850. External Links: ISSN 0927-0256, Document, Link Cited by: §1.
  • A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia (2020) Learning to simulate complex physics with graph networks. Note: arXiv preprint arXiv: 2002.09405 External Links: 2002.09405 Cited by: §D.3, 2nd item, §3.1.
  • É. Sanchez-Palencia (1987) General introduction to asymptotic methods. Vol. 272. External Links: ISBN ISBN 978-3-540-17616-9, Document Cited by: §1.
  • N. Schlömer (2021) Pygmsh: a python frontend for gmsh External Links: Document, Link Cited by: §2.3.
  • J. Schult, F. Engelmann, T. Kontogianni, and B. Leibe (2020) DualConvMesh-net: joint geodesic and euclidean convolutions on 3d meshes. Note: arXiv preprint arXiv: 2004.01002 External Links: 2004.01002 Cited by: §3.1, §3.3.
  • A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, and D. Hassabis (2020) Improved protein structure prediction using potentials from deep learning. Nature 577 (7792), pp. 706—710. External Links: Document, ISSN 0028-0836, Link Cited by: §1.
  • A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, and D. Hassabis (2019) Protein structure prediction using multiple deep neural networks in the 13th critical assessment of protein structure prediction (casp13). Proteins: Structure, Function, and Bioinformatics 87 (12), pp. 1141–1148. External Links: Document, Link, https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.25834 Cited by: §1.
  • C. B. Sullivan and A. Kaszynski (2019) PyVista: 3d plotting and mesh analysis through a streamlined interface for the visualization toolkit (VTK). 4 (37), pp. 1450. External Links: Document, Link Cited by: §2.3.
  • Y. Sun, I. Hanhan, M. D. Sangid, and G. Lin (2020) Predicting mechanical properties from microstructure images in fiber-reinforced polymers using convolutional neural networks. Note: arXiv: 2010.03675 External Links: 2010.03675 Cited by: §1.
  • N. N. Vlassis, R. Ma, and W. Sun (2020) Geometric deep learning for computational mechanics part i: anisotropic hyperelasticity. 371, pp. 113299. External Links: ISSN 0045-7825, Link, Document Cited by: §3.1, §3.2.
  • H. Vu, H. Kim, M. Jung, and J. Lee (2020) FMRI volume classification using a 3d convolutional neural network robust to shifted and scaled neuronal activations. NeuroImage 223, pp. 117328. External Links: ISSN 1053-8119, Document, Link Cited by: §1.
  • S. Zagoruyko and N. Komodakis (2017) Wide residual networks. Note: arXiv: 1605.07146 External Links: 1605.07146 Cited by: 2nd item.
  • T. Zohdi and P. Wriggers (2005) An introduction to computational micromechanics. Vol. 20. External Links: ISBN ISBN 978-3-540-32360-0, Document Cited by: §1.

Appendix A Accuracy

Below we can see the algorithm that we use to calculate the accuracy of the GNN prediction [Algorithm 1].

function ACC(D, ε = 0.1)
    n_accept ← 0
    for each patch p in D do
        S_pred ← GNN(p)                      ▷ get the predicted stress tensor for the current patch
        S_real ← FEA(p)                      ▷ get the real stress tensor for the current patch
        S_pred_ROI ← restrict(S_pred, ROI)   ▷ extract the predicted stress values present in the ROI
        S_real_ROI ← restrict(S_real, ROI)   ▷ extract the real stress values present in the ROI
        σ_pred ← VonMises(S_pred_ROI)        ▷ calculate the Von Mises stress for the predicted stress tensor
        σ_real ← VonMises(S_real_ROI)        ▷ calculate the Von Mises stress for the real stress tensor
        m_pred ← max(σ_pred)                 ▷ get the maximum Von Mises stress for the predicted stress tensor
        m_real ← max(σ_real)                 ▷ get the maximum Von Mises stress for the real stress tensor
        e ← |m_pred − m_real| / m_real       ▷ calculate the relative error
        if e ≤ ε then                        ▷ decide if the error is acceptable
            n_accept ← n_accept + 1
    accuracy ← n_accept / |D|
    return accuracy
Algorithm 1 Compute accuracy
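A NumPy transcription of Algorithm 1 is sketched below. The (n_nodes, 6) Voigt-style stress layout, the component ordering and the helper names are assumptions made for illustration, not our actual data structures.

    import numpy as np

    def von_mises(s):
        # Von Mises stress from stress tensors stored as rows
        # (s11, s22, s33, s12, s23, s13); the component order is an assumption.
        s11, s22, s33, s12, s23, s13 = s.T
        return np.sqrt(0.5 * ((s11 - s22) ** 2 + (s22 - s33) ** 2 + (s33 - s11) ** 2)
                       + 3.0 * (s12 ** 2 + s23 ** 2 + s13 ** 2))

    def accuracy(pred_patches, real_patches, roi_masks, eps=0.1):
        # Fraction of patches whose maximum Von Mises relative error
        # in the ROI is at most eps, as in Algorithm 1.
        n_accept = 0
        for s_pred, s_real, roi in zip(pred_patches, real_patches, roi_masks):
            m_pred = von_mises(s_pred[roi]).max()
            m_real = von_mises(s_real[roi]).max()
            if abs(m_pred - m_real) / m_real <= eps:
                n_accept += 1
        return n_accept / len(pred_patches)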

Appendix B Volume and surface mesh

As discussed in section [3.2], the input and output of the network are the surface stresses, not the volume stresses. In [Fig. 31] we see an example of the surface stress extracted from the volume mesh. We observe that the maximum value of the Von Mises stress is the same in both cases, and that it occurs at the same location.

(a) Volume Mesh
(b) Surface Mesh
Figure 31: Two structures corresponding to one quarter of a random realisation of the dogbone geometry. In the diagram on the left (a) we see the Von Mises stress distribution on the volume mesh. In the diagram on the right (b) we see the Von Mises stress distribution on the surface mesh. We can see, in the red circles, that the maximum values in both structures are in the same location.

Appendix C Data pre-processing

Data pre-processing is a standard technique in data-driven methodologies. A common practice is to pre-process the input data, usually with a simple linear rescaling [Bishop, 1995]. In this work we rescale in order both to obtain smoother convergence of the optimiser and to restrict the space of possible inputs. We divide the input of the NN, and the output for consistency, by the maximum absolute component of the stress tensor. This rescales the input data to the range [-1, 1]. The rescaling alleviates the differences in scale across the data points, which helps the optimiser avoid bad local minima and converge to a stable solution. Additionally, scaling the input in the case of linear elasticity carries a deeper physical meaning: scaling the macroscale stress field by a factor α results in scaling the microscale stress tensor by the same factor α. By taking advantage of this linear relationship we can train our NN on inputs in the range [-1, 1] and then evaluate it on an input of any magnitude. This is achieved by dividing the new input by its scaling factor α, to map it to [-1, 1]. To retrieve the real output, i.e. the microscale stress tensor, the output of the NN is multiplied by the scaling factor α.
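A minimal sketch of this pre-processing follows, assuming stress tensors stored as NumPy arrays; the hypothetical gnn call stands for the trained network.

    import numpy as np

    def rescale(macro_stress):
        # Divide by the largest absolute stress component so every
        # input component lies in [-1, 1]; keep the factor for inversion.
        alpha = np.abs(macro_stress).max()
        return macro_stress / alpha, alpha

    macro = 50.0 * np.random.randn(100, 6)  # dummy macroscale stress patch
    macro_scaled, alpha = rescale(macro)
    # By linearity of elasticity the prediction is un-scaled afterwards:
    # micro = alpha * gnn(macro_scaled)     # hypothetical GNN call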

Appendix D GNN parameters

We identified some key parameters for the training of the GNN and performed several tests to identify their optimum values. The parameters that we examined are the number of filters in the MLPs, the number of GN blocks, the maximum number of neighbours per node, the existence of a skip connection between the input and the output of the network and, finally, the type of encoder. For these tests we used 500 patches as the training set and 100 patches as the test set.

d.1 Skip Connection

We suggest that using a skip connection to add the input macroscale stress tensor to the output of the network improves the GNN's performance. The reason is that the GNN then learns how the microscale stress field deviates from the macroscale stress field, instead of learning the microscale stress field from scratch, which we believe is an easier task and should also improve the generalisation ability of the network. To validate this assumption we train two GNNs with the exact same parameters and training set, except that one has the skip connection and the other does not. Both networks have MLPs with 128 filters, 10 GN blocks, and each node has at most 10 neighbours. The results can be found in [Fig 32], where we observe that the GNN trained with the skip connection converges more smoothly than the GNN trained without it. The skip connection also improved the accuracy from 65% to 80%. We conclude that the skip connection helps both to stabilise training and to improve accuracy, and we therefore include it in our architecture.

Figure 32: In the diagram we see accuracy curves for the test dataset defined using the maximum Von Mises stress. The yellow line corresponds to a GNN trained with a skip connection that adds the input macroscale stress to the output of the GNN, and the blue line corresponds to a GNN trained without this skip connection. We observe that the skip connection results in smoother training and higher accuracy.
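In code, the skip connection is a one-line wrapper around the correction network. The sketch below uses a stand-in core network and is illustrative only, not the actual architecture.

    import torch
    import torch.nn as nn

    class SkipGNN(nn.Module):
        # Input/output skip connection of Appendix D.1: the wrapped network
        # only has to learn the deviation of the microscale stress from the
        # macroscale stress. `core` is a placeholder for the real GNN stack.
        def __init__(self, core):
            super().__init__()
            self.core = core
        def forward(self, macro_stress):
            return macro_stress + self.core(macro_stress)

    core = nn.Sequential(nn.Linear(6, 128), nn.ReLU(), nn.Linear(128, 6))
    model = SkipGNN(core)
    micro = model(torch.randn(32, 6))  # 32 nodes, 6 stress components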

d.2 Number of Filters

Another important parameter that heavily influences not only the accuracy of the GNN but also the training time and memory requirements is the number of filters in the MLPs. In order to identify the smallest number of filters that yields optimum accuracy, we trained 3 GNNs with 64, 128 and 256 filters in the MLPs, keeping all other parameters the same. Specifically, all networks have the skip connection described in [D.1], 10 GN blocks, and at most 10 neighbours per node. The results can be found in [Fig 33], where we observe that 128 and 256 filters result in the same accuracy, which is higher than the accuracy for 64 filters, 80% compared to 72%. Additionally, the training time for the GNN with 256 filters is 12.7 hours and the maximum required memory is 15.4 GB, while for the 128-filter GNN training completed in 5.8 hours with a maximum memory of 7.7 GB. By choosing 128 filters in the MLPs we achieve the same accuracy as with 256 filters but with a 54% decrease in training time and a 50% decrease in memory requirements.

Figure 33: In the diagram we see accuracy curves for the test dataset defined using the maximum Von Mises stress. Coloured lines correspond to networks trained with different numbers of filters, namely 64, 128 and 256. We observe that 128 and 256 filters result in the same accuracy of 80%, whereas 64 filters result in a lower accuracy of 72%.

d.3 Independent Decoder

As already discussed in [3.6], in the first layer of the GNN we choose to encode the edge and node features independently into the latent dimension, as suggested by [Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Mylonas et al., 2022]. To test whether this choice results in an improved GNN, we compare two GNNs with the same parameters, where one uses a GN block to encode the inputs into the latent dimension and the other encodes them independently. Both GNNs have 10 GN blocks, 128 filters in the MLPs, the skip connection described in [D.1], and each node has at most 10 neighbours. The results can be found in [Fig 34], where we observe that both GNNs have similar accuracy curves and the same final accuracy of 80%. Nevertheless, in the independent-encoder case the accuracy starts increasing sooner, at epoch 50, whereas in the GN block case it starts increasing later, at epoch 120. Lastly, the independent encoder involves slightly fewer calculations and results in a 5% decrease in training time. Consequently, we choose the independent encoder.

Figure 34: In the diagram we see accuracy curves for the test dataset defined using the maximum Von Mises stress. The yellow line corresponds to a GNN trained with an independent encoder and the blue line corresponds to a GNN trained with a GN block as an encoder. We observe that both GNNs have similar accuracy curves with the same final accuracy, but the GNN with the independent encoder starts increasing its accuracy sooner, at epoch 50, compared to the other one, at epoch 120.
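Schematically, the independent encoder lifts node and edge features to the latent space with two separate MLPs and no message passing at this stage. The sketch below is illustrative; the input feature sizes are assumptions.

    import torch
    import torch.nn as nn

    class IndependentEncoder(nn.Module):
        # First layer of the GNN (Appendix D.3): node and edge features are
        # mapped to the 128-dimensional latent space by two separate MLPs.
        def __init__(self, node_in=6, edge_in=4, latent=128):
            super().__init__()
            self.node_mlp = nn.Sequential(nn.Linear(node_in, latent), nn.ReLU(),
                                          nn.Linear(latent, latent))
            self.edge_mlp = nn.Sequential(nn.Linear(edge_in, latent), nn.ReLU(),
                                          nn.Linear(latent, latent))
        def forward(self, node_feats, edge_feats):
            return self.node_mlp(node_feats), self.edge_mlp(edge_feats)

    enc = IndependentEncoder()
    h_nodes, h_edges = enc(torch.randn(100, 6), torch.randn(500, 4))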

d.4 Number of GN blocks

Another important parameter that affects both the memory requirements and the training time is the number of residual GN blocks. Because we are using residual connections, we do not expect a decrease in accuracy as we add more GN blocks; rather, we expect a saturation point beyond which the GNN no longer benefits from additional GN blocks and merely performs unnecessary calculations, increasing the computational cost. To validate this assumption and determine the most appropriate number of GN blocks, we train 3 GNNs with the same parameters but different numbers of GN blocks. All the GNNs have 128 filters in the MLPs, the skip connection described in [D.1], the independent encoder, and each node has at most 10 neighbours. The results can be found in [Fig 35], where we observe that the GNN with only 3 GN blocks shows a 10% decrease in test accuracy compared to the other two, while the GNNs with 5 and 10 GN blocks have similar test accuracy curves and both reach 80%. We opt for the GNN with 5 GN blocks, which offers optimum accuracy without additional computational cost. This results in a training time of 3.1 hours and a maximum memory of 4.3 GB, a decrease of 45% in training time and 44% in memory requirements compared to the GNN with 10 GN blocks.

Figure 35: In the diagram we see accuracy curves for the test dataset defined using the maximum Von Mises stress. Coloured lines correspond to networks trained with different numbers of GN blocks, namely 3, 5 and 10. We observe that the GNN with 3 GN blocks has a decreased accuracy compared to the other two. Additionally, we can see that the GNNs trained with 5 and 10 GN blocks behave almost identically.
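The residual GN blocks discussed above can be sketched as follows. The edge/node update rule shown is a simplified generic GN block in the spirit of [Battaglia et al., 2018], with sum aggregation and illustrative widths, not a line-by-line reproduction of our layer.

    import torch
    import torch.nn as nn

    class ResidualGNBlock(nn.Module):
        # Edge update from the two incident node states and the edge state,
        # node update from the sum of incoming messages; both with residual
        # connections so blocks can be stacked safely.
        def __init__(self, latent=128):
            super().__init__()
            self.edge_mlp = nn.Sequential(nn.Linear(3 * latent, latent), nn.ReLU(),
                                          nn.Linear(latent, latent))
            self.node_mlp = nn.Sequential(nn.Linear(2 * latent, latent), nn.ReLU(),
                                          nn.Linear(latent, latent))
        def forward(self, h, e, edge_index):
            src, dst = edge_index  # source/destination node index per edge
            e_new = e + self.edge_mlp(torch.cat([h[src], h[dst], e], dim=-1))
            agg = torch.zeros_like(h).index_add_(0, dst, e_new)  # sum messages per node
            h_new = h + self.node_mlp(torch.cat([h, agg], dim=-1))
            return h_new, e_new

    blocks = nn.ModuleList([ResidualGNBlock() for _ in range(5)])  # 5 blocks, per D.4
    h, e = torch.randn(100, 128), torch.randn(400, 128)
    edge_index = torch.randint(0, 100, (2, 400))
    for blk in blocks:
        h, e = blk(h, e, edge_index)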

d.5 Number of neighbours

Lastly, we want to study the effect of the maximum number of neighbours on the GNN training. A small maximum number of neighbours results in a graph where disconnected areas of the FE mesh do not share an edge and thus cannot exchange information, for instance between microscale and macroscale features. We train 3 GNNs with the same parameters apart from the maximum number of neighbours, which takes the values 5, 10 and 15. All the GNNs have 5 GN blocks, 128 filters in the MLPs, the skip connection described in [D.1] and the independent encoder. The results can be found in [Fig 36], where we observe that 10 and 15 neighbours result in the same accuracy, which is higher than the accuracy for 5 neighbours, 81% compared to 75%. Additionally, the training time for the GNN with 15 neighbours is 27% higher and the memory requirements 31% higher compared to the GNN with 10 neighbours. We conclude that in our case there is no reason to use more than 10 neighbours, although for denser meshes this number could differ.

Figure 36: In the diagram we see accuracy curves for the test dataset defined using the maximum Von Mises stress. Coloured lines correspond to networks trained with different maximum numbers of neighbours per node, namely 5, 10 and 15. We observe that 10 and 15 neighbours result in the same accuracy of 81%, whereas 5 neighbours result in a lower accuracy of 75%.
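A brute-force sketch of how such a neighbour cap could be enforced when building the Euclidean edges of the graph is given below; in practice a spatial tree structure would be preferred for large meshes, and the details are illustrative.

    import numpy as np

    def knn_edges(coords, k=10):
        # Connect each node to its k nearest neighbours in Euclidean space
        # (Appendix D.5 caps this at 10). Brute-force pairwise distances.
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)           # exclude self-loops
        nbrs = np.argsort(d, axis=1)[:, :k]   # k closest nodes per node
        src = np.repeat(np.arange(len(coords)), k)
        return np.stack([src, nbrs.ravel()])  # shape (2, n_nodes * k)

    edges = knn_edges(np.random.rand(50, 3), k=10)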

d.6 Geodesic and Euclidean Convolutions

We study the effect of the joint convolutions by training 6 GNNs with different numbers of Geodesic and Euclidean filters. We keep the total number of filters constant at 128 but change the ratio of Geodesic to total (Euclidean + Geodesic) filters. We investigate ratios of 0% (only Euclidean), 25%, 50%, 75%, 87.5% and 100% (only Geodesic). All the GNNs have 5 GN blocks, the residual connection, the independent encoder, 128 filters and a maximum of 20 neighbours per node. The results can be found in [Fig 37]. We observe that, for both the accuracy and the loss, the worst case is a ratio of 0%, i.e. no Geodesic convolutions. Both the accuracy and the loss improve (accuracy increases and loss decreases) as we increase the ratio up to 75%; beyond that, no further improvement is observed, the accuracy and loss curves for ratios of 75% and 87.5% being the same. On the contrary, when the ratio reaches 100% there is a substantial drop in performance, which is expected since no Euclidean convolutions are performed. We conclude that a 75% ratio is the most beneficial in this case.

(a) Accuracy
(b) Loss
Figure 37: In the diagram on the left (a) we see accuracy curves defined using the maximum Von Mises stress and a 15% threshold for the relative error. Specifically, we see the accuracy as function of the training epochs. In the diagram on the right (b) we see the loss function as a function of the epochs. For both diagrams, coloured lines correspond to GNNs trained with different Geodesic to total (Euclidean + Geodesic) filter ratio, namely 0% (only Euclidean), 25%, 50%, 75%, 87.5% and 100% (only Geodesic).
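Schematically, the joint convolution splits the 128 latent filters between the two edge sets. The sketch below (96 Geodesic / 32 Euclidean, the 75% ratio) uses simple sum aggregation over mesh edges and over k-nearest-neighbour edges; it is an illustration of the filter split rather than our exact layer.

    import torch
    import torch.nn as nn

    class DualConvLayer(nn.Module):
        # Geodesic filters aggregate over mesh (surface) edges, Euclidean
        # filters over kNN edges; the two results are concatenated back to
        # the full latent width. Widths realise the 75% geodesic ratio.
        def __init__(self, latent=128, n_geo=96, n_euc=32):
            super().__init__()
            self.geo = nn.Linear(latent, n_geo)
            self.euc = nn.Linear(latent, n_euc)
        def aggregate(self, h, edge_index):
            src, dst = edge_index
            return torch.zeros_like(h).index_add_(0, dst, h[src])
        def forward(self, h, mesh_edges, knn_edges):
            h_geo = self.geo(self.aggregate(h, mesh_edges))
            h_euc = self.euc(self.aggregate(h, knn_edges))
            return torch.relu(torch.cat([h_geo, h_euc], dim=-1))

    layer = DualConvLayer()
    h = torch.randn(100, 128)
    mesh_edges = torch.randint(0, 100, (2, 300))
    knn_edges = torch.randint(0, 100, (2, 300))
    h = layer(h, mesh_edges, knn_edges)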

d.7 Full stress field VS maximum stress

We investigate the option of directly predicting the maximum Von Mises stress on the surface of the dogbone structure. To this end, we train 2 GNNs with identical architectures on patches extracted from 100 FE simulations and evaluate them on a different set of patches extracted from 100 FE simulations. One of the GNNs directly predicts the maximum Von Mises stress and the other the full stress tensor. Both GNNs have 5 GN blocks, the independent encoder, 128 filters (32 Euclidean and 96 Geodesic) and a maximum of 20 neighbours per node. The results can be found in [Fig 38]. We observe that the 2 GNNs have similar accuracy, but the one predicting the full stress has slightly higher values. We conclude that predicting the full stress is beneficial, since it not only results in better prediction of the maximum Von Mises stress but also provides the full stress tensor, which can be used, for instance, for the online bias corrections.

Figure 38: Accuracy curves as a function of the threshold value used for the relative error. The accuracy curves are defined using the maximum Von Mises stress. The blue line corresponds to a GNN that predicts the full microscale stress distribution first and then from this the maximum Von Mises stress in the ROI of the patch. The orange line corresponds to a GNN that directly predicts the maximum Von Mises in the ROI of the patch.

d.8 Patches VS full structure

Lastly, we want to evaluate to what degree our choice to train the GNN on patches of the geometry affects the quality of the results. This choice stems from the fact that defects only locally affect the macroscale stress field, and distant areas do not offer significant information. On one hand, training on patches makes the GNN unaware of the specific structure and encourages it to focus on predicting how the microscale features affect the global stress field, thus leading to improved generalisation. On the other hand, the size of the ROI and of the patch has to be chosen carefully so that the network has all the information necessary to make predictions in the ROI. Training the GNN on the entire structure removes this difficulty. We perform an experiment to compare the two strategies. We train two GNNs with the same parameters, namely 5 GN blocks, the residual connection, the independent encoder, 128 filters (32 Euclidean and 96 Geodesic) and a maximum of 20 neighbours per node. The first one is trained on patches from 400 FE simulations (10 patches from each simulation) and the second directly on the 400 FE simulation results. We evaluate both GNNs on a separate set of patches extracted from 100 FE simulations. The results can be found in [Fig 39]. We observe that for the GNN trained on patches the maximum Von Mises stress in the ROI of the patches is less scattered around the identity line (the true value). The coefficient of determination for the patch case is 0.79 compared to 0.71 for the full case, and the accuracy drops significantly from 70% for the patch case to 55% for the full dogbone case. We therefore conclude that our choice to train the GNN on patches indeed leads to improved generalisation. We also compare the Von Mises stress distribution on the surface of a dogbone predicted by the GNN trained on patches with that of the GNN trained on the entire structure. In [Fig 40] we see that the GNN trained on patches predicts the high-stress area more accurately than the GNN trained on the full structure.

(a) patches
(b) full
Figure 39: In both diagrams the x-axis corresponds to the maximum Von Mises stress in the ROI of the patches calculated by FE simulations and the y-axis to the maximum Von Mises stress in the ROI of the patches predicted by the GNN. The yellow lines correspond to the 15% relative error threshold. The accuracy is defined as the percentage of points with relative error less than 15%. The image on the left (a) corresponds to a GNN trained on patches extracted from 400 FE simulations and the image on the right (b) to a GNN trained directly on the same 400 FE simulations. We observe that the GNN trained on patches outperforms the GNN trained on the full structures both in terms of accuracy and coefficient of determination.
Figure 40: Von Mises stress on the surface of a dogbone structure (top). On the bottom of the image we have zoomed in on the high-stress area. From left to right we see the FE result, the prediction of the GNN trained on patches and the prediction of the GNN trained on full structures. We observe that the GNN trained on patches is in better agreement with the FE results than the one trained on the full structures.

Appendix E Ensemble Kalman Method

e.1 Kalman method for data assimilation

The Kalman method can be used to update the n-dimensional state vector of a system, x, after observing some data, y. We assume that the state vector follows a Gaussian distribution, which we call the prior, with mean μ and covariance Σ:

    p(x) = N(μ, Σ)                                                  (17)

The data, y, are assumed to be noisy. Specifically, we assume that the data follow a Gaussian distribution, centred around H x, which is the value of the data if no noise is present, with a covariance R that accounts for the noise. The matrix H is called the observation matrix. Consequently, the data likelihood can be expressed as

    p(y | x) = N(H x, R)                                            (18)

Using Bayes' theorem, we can obtain the posterior distribution, p(x | y):

    p(x | y) = N(μ_post, Σ_post)                                    (19)

where the posterior mean is calculated as

    μ_post = μ + K (y − H μ)                                        (20)

and the posterior covariance is calculated as

    Σ_post = (I − K H) Σ                                            (21)

The term K is called the Kalman gain matrix and is calculated as

    K = Σ H^T (H Σ H^T + R)^{-1}                                    (22)
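Equations (20)-(22) translate directly into a few lines of NumPy. The sketch below is a textbook implementation offered for illustration, with no claim about the numerics used in our solver.

    import numpy as np

    def kalman_update(mu, Sigma, y, H, R):
        # Posterior mean and covariance from Eqs. (20)-(22).
        K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + R)  # Kalman gain, Eq. (22)
        mu_post = mu + K @ (y - H @ mu)                       # Eq. (20)
        Sigma_post = (np.eye(len(mu)) - K @ H) @ Sigma        # Eq. (21)
        return mu_post, Sigma_post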

e.2 Ensemble Kalman method for data assimilation

The ensemble Kalman method is an approximate version of the Kalman method in which the state distribution is represented by a sample (or ensemble) of size N. This leads to the creation of a prior ensemble matrix whose columns correspond to the ensemble samples:

    X = [x^(1), x^(2), ..., x^(N)]                                  (23)

The state covariance, Σ, is replaced with the ensemble covariance, Σ_e:

    Σ_e = (1 / (N − 1)) (X − X̄)(X − X̄)^T                           (24)

where every column of X̄ contains the ensemble mean. A data matrix, D, is created by perturbing the observation y with samples of the measurement noise:

    ε^(j) ~ N(0, R),   j = 1, ..., N                                (25a)
    d^(j) = y + ε^(j)                                               (25b)
    D = [d^(1), d^(2), ..., d^(N)]                                  (25c)

The new Kalman gain matrix is calculated as

    K_e = Σ_e H^T (H Σ_e H^T + R)^{-1}                              (26)

The posterior ensemble, X_post, is calculated as

    X_post = X + K_e (D − H X)                                      (27)

e.3 Observation matrix free implementation

We can further simplify the calculations by avoiding the explicit definition of the observation matrix. Instead, we define a function that provides the noise-free value of the data. This function is called the observation function and is of the form

    y = h(x)                                                        (28)

The posterior ensemble can then be written using ensemble statistics only:

    Y = [h(x^(1)), h(x^(2)), ..., h(x^(N))]                         (29a)
    Σ_xy = (1 / (N − 1)) (X − X̄)(Y − Ȳ)^T                          (29b)
    Σ_yy = (1 / (N − 1)) (Y − Ȳ)(Y − Ȳ)^T                          (29c)
    K_e = Σ_xy (Σ_yy + R)^{-1}                                      (29d)
    X_post = X + K_e (D − Y)                                        (29e)

where every column of Ȳ contains the mean of the predicted observations Y.
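Putting equations (25) and (29) together, a compact NumPy sketch of the observation-matrix-free ensemble Kalman update follows. The array layout and the toy observation function are illustrative assumptions.

    import numpy as np

    def enkf_update(X, y, h, R, rng=None):
        # Observation-matrix-free ensemble Kalman update, Eqs. (29a)-(29e).
        # X : (n, N) prior ensemble; y : (m,) observation; h : callable mapping
        # a state vector to its noise-free observation; R : (m, m) noise cov.
        if rng is None:
            rng = np.random.default_rng(0)
        n, N = X.shape
        Y = np.column_stack([h(X[:, j]) for j in range(N)])   # predicted observations, (m, N)
        D = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T  # Eqs. (25a)-(25c)
        Xc = X - X.mean(axis=1, keepdims=True)                # state anomalies
        Yc = Y - Y.mean(axis=1, keepdims=True)                # observation anomalies
        Sxy = Xc @ Yc.T / (N - 1)                             # Eq. (29b)
        Syy = Yc @ Yc.T / (N - 1)                             # Eq. (29c)
        K = Sxy @ np.linalg.inv(Syy + R)                      # gain, Eq. (29d)
        return X + K @ (D - Y)                                # posterior ensemble, Eq. (29e)

    # Toy usage: condition a 2D Gaussian ensemble on a noisy sum observation.
    X = np.random.default_rng(1).normal(size=(2, 500))
    X_post = enkf_update(X, np.array([1.0]), lambda x: np.array([x.sum()]), 0.1 * np.eye(1))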