1 Introduction
Multiscale computational analysis is common in many engineering areas, such as geoengineering (heterogeneous soil layers and subterranean cavities), biomechanical engineering (bones and soft tissues) and mechanical engineering (composite and architectured materials). Finite Element Analysis (FEA) of such structures is notoriously challenging, since the FE mesh required to fully resolve the finer-scale features has to be very dense, resulting in intractable computational problems.
Common approaches to multiscale problems usually fall into either homogenisation or concurrent scale coupling. In both cases the multiscale problem is split into a macroscale and a microscale problem. The macroscale problem diffuses the overall stress field in the entire structure, and the microscale computations are performed to correct the macroscale fields. In homogenisation it is assumed that the gradients of the macroscale displacement are homogeneous over the entire material sample (i.e. the representative volume element, RVE) [Sanchez-Palencia, 1987; Zohdi and Wriggers, 2005]. If that assumption does not hold, for instance due to fast macroscale gradients induced by sharp macroscale features, then concurrent scale coupling methods are employed, which are usually more expensive and intrusive than RVE-based methods. In these techniques the results of homogenisation are applied to the boundary of regions of interest where concurrent microscale corrections are performed [Raghavan and Ghosh, 2004; Oden et al., 2006; Kerfriden et al., 2009; Hesthaven et al., 2015; Paladim et al., 2016; Krokos et al., 2021], either in a purely downward approach [Oden et al., 1999] or with two-way coupling [Gendre et al., 2009].
In this work we propose a Neural Network (NN) based concurrent scale coupling method to tackle 3D multiscale problems and specifically we focus on porous media with no separation of scales. Neural Networks (NNs) have been successfully employed in solving engineering problems by reproducing the output of simulation codes in a variety of fields like fluid dynamics [Lye et al., 2020; Raissi et al., 2020], solid mechanics [Nie et al., 2019; Jiang et al., 2020; Deshpande et al., 2021] and computational biology [Senior et al., 2019, 2020]. Most of the publications that focus on learning the response of FE models employ Convolutional Neural Networks (CNNs), that are very efficient in working with images [Nie et al., 2019; Mendizabal et al., 2020; Sun et al., 2020].
Unfortunately, most of the aforementioned works deal with 2D problems. One of the reasons, in our opinion, is that 3D CNNs are computationally very expensive, and thus the literature on the subject is limited [Mendizabal et al., 2020; Rao and Liu, 2020] and mostly focused on segmentation [Buda et al., 2019; Lu et al., 2019] and classification tasks [Khosla et al., 2018; Vu et al., 2020]. An important reason for this is that representing arbitrary geometries with voxels is highly inefficient, because the resolution of the image has to be constant, whereas the geometry of the object is better described with a variable resolution. Specifically, for multiscale problems, this means that the voxel size needs to be very small to capture the microscale features. The result is a computationally wasteful data representation that uses a large number of voxels for areas with no microscale features that could otherwise be represented with only a few. This is also true for CNNs applied to 2D problems, but the computational cost is disproportionately larger in 3D. We will follow a different approach, that of geometric learning.
Another problem with current deep learning models is their poor generalisation when the training dataset is small or differs from the cases we want to predict. The problem is more prominent when the computational model is very expensive and cannot be evaluated many times, or when no computational model is in place and experiments need to be performed. Consequently, it is difficult to create a representative and sufficiently large training dataset to obtain reasonable predictions.
Our paper addresses these two points by extending our previous work [Krokos et al., 2021] with two novel algorithmic elements. First, we tackle 3D problems by using geometric learning, where instead of voxelised images we learn directly from the mesh that is used to perform the FE simulations. Second, we use physics-based online corrections to improve the generalisation of the NN without the need to create more data.
Geometric learning can be seen as an extension of previously established NN techniques from Euclidean, grid structures to non-Euclidean, graph and manifold structures. The NNs that perform geometric learning are called Graph Neural Networks (GNNs). GNNs can overcome the constant voxel resolution problem by operating directly on the FE mesh used to solve the FE problem. The FE mesh can have a large number of triangles or tetrahedra in areas where the geometry is complex or large gradients are expected during the FEA, and a small number elsewhere. This adaptive resolution strategy allows for a very efficient use of computational resources. Additionally, because we are primarily interested in predicting the maximum stress, GNNs allow us to further reduce the computational cost. Specifically, we show that the maximum stress, in the cases we examine, occurs on the surface and not in the volume of the mesh [section 3.2], and thus we choose to extract and work only on the surface mesh, which greatly reduces the training time.
We use the ensemble Kalman method [Evensen, 1994] as the online correction technique. The difference with respect to the original Kalman method is that the covariance matrix is replaced by the sample covariance. This reduces the computational cost and makes this variant suitable for problems with a large number of variables, such as ours, where the number of variables is equal to the number of nodes of the FE mesh used to solve the PDE. In order to use the ensemble Kalman method we need to convert the GNN into a Bayesian GNN (BGNN). In this work we use the Bayes by Backprop (BBB) method [Blundell et al., 2015] to quantify the uncertainty of the GNN and convert it into a BGNN. The online correction step takes the BGNN output as a prior distribution and performs an update step using Bayes' Theorem, producing an updated posterior distribution that satisfies the desired physical law. This technique allows us to improve the BGNN prediction by taking advantage of the physics in place, without creating new data or retraining the BGNN.
BGNNs are a very useful tool even outside the online correction framework, since they can provide credible intervals (CIs) along with a mean prediction. When the BGNN makes a prediction for a case that differs from those in the training dataset, the CIs will be broad, reflecting the fact that the BGNN is uncertain about the prediction. When the CIs are very tight, the BGNN is confident in the prediction. The BGNN thus not only warns the user of potential uncertainty in the prediction, but also attempts to bracket the correct value between an upper and a lower bound.
Our paper is organised as follows. In section [2] we present the reference multiscale mechanical model that we aim to solve online, and we introduce useful definitions for our methodology. In section [3] we discuss the GNN architecture and the input and output features. In section [4] we explain the strategy that we follow to convert the deterministic GNN into a BGNN, and in section [4.2] we introduce the online correction method we use. Finally, in section [5] we apply our multiscale GNN method to two different problems: we demonstrate the effectiveness of the GNN in accurately predicting the maximum stress values and the stress distributions, and we demonstrate the online stress correction technique that we use to improve the GNN prediction in common cases where NNs fail, namely small training datasets and extrapolation.
2 Deep-learning-based metamodelling for multiscale finite element analysis
2.1 Problem statement: linear elasticity
We consider a 3D body occupying a domain $\Omega$ with boundary $\partial\Omega$. The body is subjected to prescribed displacements $u_D$ on the part $\partial\Omega_D$ of its boundary and prescribed tractions $t$ on the complementary boundary $\partial\Omega_N = \partial\Omega \setminus \partial\Omega_D$. We consider a body force $f$ and a displacement field $u$. For an initial configuration $x$, the boundary value problem of linear elasticity consists in finding the minimiser $u$ of the potential energy $E_p$, defined as

(1) $E_p(u) = \int_\Omega \psi(\varepsilon(u)) \, d\Omega - \int_\Omega f \cdot u \, d\Omega - \int_{\partial\Omega_N} t \cdot u \, dS$
We consider a linear Saint-Venant–Kirchhoff material model defined by its strain energy density

(2) $\psi(\varepsilon) = \dfrac{\lambda}{2}\operatorname{tr}(\varepsilon)^2 + \mu \operatorname{tr}(\varepsilon^2)$
In the equations above, $\lambda$ and $\mu$ are the Lamé elasticity parameters and $\varepsilon$ is the linearised strain tensor, defined as

(3) $\varepsilon(u) = \dfrac{1}{2}\left(\nabla u + \nabla u^T\right)$
The Cauchy stress tensor is calculated as follows:

(4) $\sigma = \lambda \operatorname{tr}(\varepsilon)\, I + 2\mu\, \varepsilon$
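Equations (3)–(4) translate directly into code. The sketch below, in plain NumPy, is illustrative only; the Lamé parameter values and the function name are our own choices, not values from the paper:

```python
import numpy as np

def cauchy_stress(grad_u, lam, mu):
    """Linearised strain (Eq. 3) and Cauchy stress (Eq. 4) from a
    3x3 displacement gradient and the Lame parameters."""
    eps = 0.5 * (grad_u + grad_u.T)                           # Eq. 3
    return lam * np.trace(eps) * np.eye(3) + 2.0 * mu * eps   # Eq. 4

# Illustrative uniaxial stretch: du1/dx1 = 1e-3, all other gradients zero.
sigma = cauchy_stress(np.diag([1e-3, 0.0, 0.0]), lam=1.0, mu=1.0)
```

The resulting stress tensor is symmetric by construction, which is why only six independent components are carried around in the rest of the paper.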
2.2 Case study: mechanical specimen made of a porous material
For the purpose of training the GNN we need to create multiple realisations of a multiscale geometry with both macroscale and microscale features. In this work we consider a structure that we will refer to from now on as dogbone [Fig 1]. The dogbone specimen has a cylindrical hole as a macroscale feature. The material is porous, made of randomly distributed spherical pores as microscale features.
2.3 Mesh and Finite element solver
The FE meshes are created using the open source 3D finite element mesh generator gmsh
[Geuzaine and Remacle, 2009], using pygmsh [Schlömer, 2021] as a python interface. Unstructured tetrahedral meshes will be used.The FE problems, comprising the macroscopic surrogates models and the full finescale models (as defined in the next subsection [2.4
]), will be solved using the FEniCS software which is a python based popular opensource computing platform for solving partial differential equations (PDEs)
[Alnæs et al., 2015; Logg et al., 2012]. We use standard linear tetrahedral elements to solve the elasticity problem.Meshes are manipulated using the python package pyvista [Sullivan and Kaszynski, 2019] that allows for easy interaction with VTK objects through Numpy arrays.
2.4 Global-local framework
The multiscale framework that we use here is a direct extension of our previous work [Krokos et al., 2021]. As mentioned in section [2.2], the examples that we show in this work are dedicated to linear elasticity for porous media made of a homogeneous matrix with a random distribution of microscale features that are modelled as spherical voids.
Firstly, we use a coarse finite element mesh that ignores the existence of the microscale features to obtain the macroscale solution, at a small computational cost, over the entire domain. Then, we assume there exists a function that takes as input the geometry of the microscale features and the local macroscale solution in a window, which we call a patch, and outputs the microscale stress field in a subregion of the patch that we call the Region of Interest (ROI) [Fig 2].
This function will be approximated by a GNN. The existence of such a function is not proved mathematically, but it is supported by Saint-Venant's principle and our previous work [Krokos et al., 2021], and will be shown to be valid via numerical experiments in the present paper, which extends our previous findings to the three-dimensional setting.
2.5 Quantity of interest and accuracy measure
We are interested in predicting a mechanical quantity that indicates potential crack initiation sites. In this work we choose the Von Mises stress (VM) for this purpose, which can be calculated from the following formula

(5) $\sigma_{VM} = \sqrt{\dfrac{1}{2}\left[(\sigma_{11}-\sigma_{22})^2 + (\sigma_{22}-\sigma_{33})^2 + (\sigma_{33}-\sigma_{11})^2\right] + 3\left(\sigma_{12}^2 + \sigma_{23}^2 + \sigma_{31}^2\right)}$
Additionally, we will define accuracy in a way that serves the purpose of accurately predicting the maximum Von Mises stress in the patches. We consider that the maximum Von Mises stress is accurately calculated in the ROI of the patch if it is predicted with a relative error less than the threshold of 10%, unless stated otherwise. This procedure is summarised in [Appendix A].
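The Von Mises formula and the accuracy measure above can be sketched as follows; the function names and the two-patch example are our own, illustrative choices:

```python
import math

def von_mises(s11, s22, s33, s12, s23, s31):
    """Von Mises stress (Eq. 5) from the six independent stress components."""
    return math.sqrt(0.5 * ((s11 - s22)**2 + (s22 - s33)**2 + (s33 - s11)**2)
                     + 3.0 * (s12**2 + s23**2 + s31**2))

def accuracy(predicted_max, true_max, threshold=0.10):
    """Fraction of patches whose maximum Von Mises stress is predicted
    within a 10% relative error (the accuracy measure of this section)."""
    hits = [abs(p - t) / abs(t) < threshold
            for p, t in zip(predicted_max, true_max)]
    return sum(hits) / len(hits)
```

For pure uniaxial stress, `von_mises(s, 0, 0, 0, 0, 0)` reduces to `s`, which is a convenient sanity check.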
3 Graphbased geometric deep learning
In this section we briefly review geometric learning methods reported in the literature. We then explain the geometric learning framework that we choose to use, and finally we discuss the structure of the GNN along with the input and output features.
3.1 Brief review of earlier work on geometric deep learning
GNNs have developed rapidly over the last years, with various works on segmentation [Hanocka et al., 2019; Schult et al., 2020; Lei et al., 2021] and shape correspondence or retrieval tasks [Masci et al., 2018; Gong et al., 2020]. One early attempt at deep learning in non-Euclidean spaces is introduced by [Qi et al., 2017], where a method for geometric learning using point clouds is described. Even though point clouds are a very simple way to describe 3D objects, a more common and informative way is a mesh. One common approach for geometric learning with meshes is MeshCNN [Hanocka et al., 2019], a generic framework for processing 3D models for classification and segmentation tasks. MeshCNN operates directly on the mesh, avoiding both voxel-based representations of the object, which lead to increased computational cost, and point clouds, which result in the loss of valuable topology information.
GNNs have also been used for various engineering applications. [Vlassis et al., 2020] used geometric deep learning to model anisotropic hyperelastic materials. Specifically, they modelled polycrystals using a graph where the nodes represent the monocrystals and the edges their connectivity. This utilisation of non-Euclidean data allowed them to take advantage of rich microstructure information not described by classical Euclidean descriptors like density and porosity. Additionally, [Sanchez-Gonzalez et al., 2020] used GNNs to simulate particle-based dynamic systems where one or more types of particles, ranging from water to sand, are involved. The physical state of the system is described by a graph of interacting particles, and the physics is predicted by message passing through the nodes (particles) of this graph. Later, [Pfaff et al., 2021] used GNNs for mesh-based dynamic simulations. In contrast to particle-based problems, in this work the authors took advantage of the connectivity of the mesh to define the edges of the graph. They demonstrated their method in a variety of applications such as deformable bodies, fluid flow around an obstacle and air flow around an aircraft wing; for the latter they show that GNNs have superior performance compared to CNNs. Recently, [Mylonas et al., 2022] used a Bayesian GNN to infer the position and shape of an unknown crack from patterns of dynamic strain field measurements at discrete locations. Lastly, [Lino et al., 2021] developed a multiscale GNN that efficiently diffuses information across different scales, making it ideal for tackling strongly non-local problems such as advection and incompressible fluid dynamics.
Most of these methods use the formulation described in [Battaglia et al., 2018], which provides a general framework for working with GNNs. We use the same formulation, since it allows for a very natural encoding of information on the mesh: absolute information, for instance material properties, can be encoded on the vertices of the mesh, while relative information, for example the distance between vertices, is encoded on the edges of the mesh.
3.2 Geometric learning for multiscale stress analysis: assumptions and justification
As can be seen from the literature review, researchers can take advantage of the flexibility that GNNs offer to reduce the computational cost by modelling their system in more meaningful ways that stem from a physical understanding of the underlying problem; for example, [Vlassis et al., 2020] modelled the behaviour of a polycrystal by considering every crystal as a node of the graph instead of operating on the volume mesh. Following this paradigm, for reasons we explain in the next paragraphs, we choose to work only on the surface mesh of the porous medium and not the volume mesh.
First of all, we claim that the maximum stress in the cases we examine is located at the surface of the structure, not in the bulk. To test this hypothesis we conduct an experiment where we perform 100 FE simulations with 100 realisations of the dogbone structure. For the boundary conditions we apply displacements of the same magnitude but opposite direction along two coordinate directions. Afterwards, we extract from the volume mesh the surface mesh, with the stress tensors encoded on the nodes of the surface mesh [Appendix B]. Lastly, we calculate the maximum Von Mises stress both on the volume and the surface mesh and compare the two. From [Fig 3] we can conclude that the maximum volume stress is indeed the same as the maximum surface stress.
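The surface-extraction step can be illustrated with a minimal sketch. The paper performs this with pyvista; the function below is a hypothetical stand-in that exploits the fact that a boundary triangle belongs to exactly one tetrahedron, while interior faces are shared by two:

```python
from collections import Counter
from itertools import combinations

def surface_faces(tets):
    """Extract the boundary triangles of a tetrahedral mesh: a face that
    appears in exactly one tetrahedron lies on the surface."""
    counts = Counter()
    for tet in tets:
        for face in combinations(sorted(tet), 3):  # 4 faces per tet, canonical order
            counts[face] += 1
    return [face for face, c in counts.items() if c == 1]

# A single tetrahedron: all four faces are on the surface.
assert len(surface_faces([(0, 1, 2, 3)])) == 4
# Two tets sharing face (1, 2, 3): 6 boundary faces remain.
assert len(surface_faces([(0, 1, 2, 3), (1, 2, 3, 4)])) == 6
```

The surface nodes are then the union of the vertices of these boundary faces, and the stress tensors computed on the volume mesh are carried over to them.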
Additionally, we hypothesise that the surface mesh is a comprehensive representation of the geometry of the specimen. Lastly, we assume that the trace of the smooth macroscale stress over the skin mesh is sufficient to inform the GNN of the macroscopic stress state over the patch. We verify these two hypotheses numerically in the context of the current paper.
Consequently, we choose to work only on the surface mesh since we are primarily interested in predicting the maximum stress. This greatly reduces the memory requirements and the training time.
3.3 Graph construction
In order to perform geometric learning on the surface mesh of a dogbone structure, we consider a graph $G$ described by a set of nodes $V$, with $|V|$ the number of nodes of $G$, and a set of edges $E$, with $|E|$ the number of edges of $G$.
The nodes of the graph coincide with the nodes of the surface mesh, since their position is optimised by the mesh generation software to best describe the geometry of the structure. The edges of the graph are used to pass messages between the nodes, and they define the connectivity of the graph. Consequently, the edges of the graph cannot coincide with the edges of the surface mesh, since the surface mesh has disconnected areas. This means that it is possible for areas of the surface mesh that are close to each other not to share an edge, for example two interacting but non-intersecting defects. This prevents the passing of information between the two, so their interaction cannot be modelled. To overcome this problem we define two different neighbourhoods [Fig 4].

Geodesic: The geodesic, or 1-ring, neighbourhood includes the nodes that share an edge with the central node. This is equivalent to the neighbourhood dictated by the surface mesh connectivity.

Euclidean: The Euclidean neighbourhood in this work is defined using radius neighbours. Every node is connected to every node that falls into a sphere of predefined radius centred at it. The radius used in this work is 4 times the radius of the spherical voids, as suggested in [Krokos et al., 2021]. In practice this results in a very high number of neighbours and thus a very expensive computational problem. To tackle this we restrict the number of neighbours of each node by setting a threshold and randomly choosing neighbours so that the total number of neighbours is below the threshold. In section [D.5] we discuss the most appropriate value for this threshold.
In this work we use either Euclidean neighbours or a combination of Euclidean and geodesic neighbours, as proposed by [Schult et al., 2020], both of which allow the modelling of interactions between non-intersecting parts of the mesh.
The edges of the graph are bidirectional, so that information can be exchanged both ways between two connected nodes. This means that for every edge passing a message from one node to another, there exists an edge passing a message in the opposite direction.
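A minimal sketch of the two neighbourhood constructions, assuming nodes are given as 3D coordinate tuples; the neighbour cap mirrors the random thresholding described above, and all names are illustrative:

```python
import math
import random

def euclidean_edges(coords, radius, max_neighbours, seed=0):
    """Bidirectional Euclidean edges: connect each node to nodes within
    `radius`, randomly capping the neighbour count at `max_neighbours`."""
    rng = random.Random(seed)
    edges = set()
    for i, pi in enumerate(coords):
        neigh = [j for j, pj in enumerate(coords)
                 if j != i and math.dist(pi, pj) <= radius]
        if len(neigh) > max_neighbours:
            neigh = rng.sample(neigh, max_neighbours)
        for j in neigh:
            edges.add((i, j))
            edges.add((j, i))  # bidirectional message passing
    return sorted(edges)

def geodesic_edges(faces):
    """1-ring (geodesic) edges derived from the surface-mesh triangles."""
    edges = set()
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((i, j))
            edges.add((j, i))
    return sorted(edges)
```

A combined neighbourhood is then simply the union of the two edge sets.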
3.4 Input and Output features
The framework we use requires the input and output graphs to be of the same structure. Consequently, we choose to interpolate the 3D FE solution from the macroscale mesh, i.e. mesh without spherical voids, to the microscale mesh, i.e. mesh with the spherical voids.
For the input of the GNN we encode node features on the nodes of the graph and edge features on the edges of the graph. The input node features are the independent components of the macroscale stress tensor, along with the microscale feature indicator, a single integer indicating whether the node corresponds to a microscale feature or not. The input edge features are the relative position of the two nodes that the edge connects, along with the distance between them. This is summarised in [Fig 5].
The output of the graph contains information only on the nodes of the mesh. Specifically, the output node features are the components of the microscale stress tensor.
We sum up in [Fig 6] the input and output of the GNN. We have only included the node features since the edge features are calculated as a preprocessing step from the connectivity of the mesh.
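The feature assembly can be sketched as below; the feature ordering and function names are our own assumptions, not the paper's implementation:

```python
import math

def edge_features(coords, edges):
    """Edge input features: relative position of receiver w.r.t. sender
    plus their Euclidean distance (four features per edge)."""
    feats = []
    for s, r in edges:
        dx = [coords[r][k] - coords[s][k] for k in range(3)]
        feats.append(dx + [math.dist(coords[s], coords[r])])
    return feats

def node_features(macro_stress, is_micro_feature):
    """Node input features: six macroscale stress components plus the
    binary microscale-feature indicator (seven features per node)."""
    return [list(s) + [int(flag)]
            for s, flag in zip(macro_stress, is_micro_feature)]
```

Since the edge features depend only on node coordinates and connectivity, they are computed once per mesh as a preprocessing step, exactly as noted above.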
3.5 GN block
The GN block is the basic building block of the GNN, and it is used to map an input graph to an output graph with the same nodes and edges but with updated node and edge features. The node and edge features are jointly passed through an edge update Multilayer Perceptron (MLP) and a node update MLP in a two-step procedure described in [Battaglia et al., 2018] that updates the edge features first and the node features afterwards. A sketch of this procedure can be found in [Fig 7].

For the edge update step, consider an edge with edge features $e_k$ that sends a message from the sender node with node features $v_s$ to the receiver node with node features $v_r$. The updated edge feature, $e_k'$, is calculated by passing the concatenation of these features, $\mathrm{concat}(e_k, v_s, v_r)$, through the edge MLP. If $N_e$ is the number of edges of the graph, $F_v$ the number of node features and $F_e$ the number of edge features, then the input of the edge update MLP is of size $N_e \times (F_e + 2F_v)$.
For the node update step, the updated edge features $e_k'$ are aggregated per node. In this work we use mean aggregation, although element-wise summation, maximum or any other symmetric operation can be used. After the aggregation step, we have a single aggregated edge feature, $\bar{e}_i'$, corresponding to each node feature $v_i$. The updated node features, $v_i'$, are calculated by passing through the node MLP the concatenation of the original node features with the aggregated updated edge features corresponding to this node, $\mathrm{concat}(v_i, \bar{e}_i')$. If $N_v$ is the number of nodes of the graph, $F_v$ the number of node features and $F_{e'}$ the dimensionality of the updated edge features, then the input of the node update MLP is of size $N_v \times (F_v + F_{e'})$.
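The two-step GN-block update can be sketched as follows, with a single linear-plus-ReLU layer standing in for the actual edge and node MLPs (the real blocks use deeper MLPs with Dropout and Layer Normalization); all sizes in the toy graph are illustrative:

```python
import numpy as np

def relu_layer(x, w, b):
    """One linear layer + ReLU, a stand-in for the update MLPs."""
    return np.maximum(w @ x + b, 0.0)

def gn_block(node_feats, edges, edge_feats, w_e, b_e, w_n, b_n):
    """One GN block (Battaglia et al., 2018): update each edge from the
    concatenation [edge, sender, receiver], mean-aggregate the updated
    edges per receiver node, then update each node from [node, aggregate]."""
    new_edges = [relu_layer(np.concatenate([e, node_feats[s], node_feats[r]]), w_e, b_e)
                 for (s, r), e in zip(edges, edge_feats)]
    agg = np.zeros((len(node_feats), b_e.size))
    count = np.zeros(len(node_feats))
    for (s, r), e in zip(edges, new_edges):
        agg[r] += e
        count[r] += 1
    agg /= np.maximum(count, 1.0)[:, None]  # mean aggregation per node
    new_nodes = np.stack([relu_layer(np.concatenate([v, a]), w_n, b_n)
                          for v, a in zip(node_feats, agg)])
    return new_nodes, new_edges

# Tiny illustrative graph: 2 nodes (2 features), 2 directed edges (1 feature).
rng = np.random.default_rng(0)
w_e, b_e = rng.standard_normal((3, 5)), np.zeros(3)  # edge MLP: 1 + 2 + 2 -> 3
w_n, b_n = rng.standard_normal((4, 5)), np.zeros(4)  # node MLP: 2 + 3 -> 4
new_nodes, new_edges = gn_block(np.ones((2, 2)), [(0, 1), (1, 0)],
                                [np.ones(1), np.ones(1)], w_e, b_e, w_n, b_n)
```

The input sizes of the two MLPs match the dimensions stated above: each edge MLP input has $F_e + 2F_v$ entries, each node MLP input $F_v + F_{e'}$ entries.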
3.6 Model
In this work we deploy an Encode-Process-Decode GNN as described by [Battaglia et al., 2018]. The encoder encodes the node and edge features into a latent space of higher dimension, the processor updates these node and edge features, and finally the decoder decodes the features from the latent space to the output space.

Encoder: The node and edge features are independently encoded into a latent space of 128 dimensions using two distinct MLPs with the same structure. Each MLP has 3 hidden layers, each followed by a Dropout layer [Hinton et al., 2012] and a ReLU activation function, and a Layer Normalization layer after the output layer, which improves training stability and decreases training time in recurrent NNs [Ba et al., 2016].
Processor: The processor is composed of 5 residual GN blocks. By using residual blocks the NN itself can choose its depth, skipping the training of a few layers through skip connections. This is currently the most common way to train deep NNs [He et al., 2015; Kim et al., 2016; Zagoruyko and Komodakis, 2017; Lim et al., 2017], and it also applies to GNNs [Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Mylonas et al., 2022]. The structure of the MLPs of the processor is the same as the structure of the encoder MLPs. The processor combines information from edges and nodes and passes messages between nodes that share an edge.

Decoder: The decoder operates only on the node features, as the output of the GNN is the microscale stress field on the nodes. A single MLP is used to decode the node features from the latent space to the output space. The decoder MLP is similar to the MLPs used in the encoder and the processor, but there is no Layer Normalization layer applied after the output layer.
There is a skip connection that connects the macroscale stress tensor, from the input, with the output of the Decoder. That way the network can learn how the microscale stress field diverges from the macroscale stress field. In section [D.1] we demonstrate that this leads to smoother and faster convergence. A sketch of the GNN can be found below [Fig 8].
4 Stress correction using homogeneous Neumann conditions: an online Kalman-based approach
In this section we introduce an online hierarchical Bayesian method to constrain the prediction of the GNN. This method allows us to improve the network prediction without creating more data and retraining the network, which is computationally expensive and requires access to the function that creates the training data. We impose homogeneous Neumann conditions, which cannot be explicitly imposed while training the network, even though the training examples satisfy them up to the FE accuracy.
To apply this method we need a distribution of outputs from the GNN. To achieve that, we convert the deterministic GNN to a Bayesian GNN (BGNN). The posterior estimation of the BGNN for the stress field over the surface of the structure becomes a prior for the online assimilation step. Physical constraints are applied statistically by introducing them as partial information of the stress field, yielding a Bayesian posterior update. We choose to resort to a Gaussian approximation of the posterior in the form of the ensemble Kalman method. Alternative methods include Langevin MCMC and Variational Inference; however, their online cost may prove prohibitive.
4.1 Bayesian GNN
Several techniques have been proposed to quantify the uncertainty in NNs resulting in Bayesian NNs [Hinton and van Camp, 1993; Graves, 2011; Kingma and Welling, 2014; Blundell et al., 2015]. In this work we use the Bayes by Backprop method (BBB) to convert the deterministic GNN to a BGNN [Blundell et al., 2015].
In order to define the optimisation objective, we consider a probabilistic NN where we replace the constant weights of a deterministic NN with a distribution over each weight. Before observing the data $\mathcal{D}$, the weight distribution is called the prior, $P(w)$; after observing the data, the weight distribution is updated so that the BNN fits the data, and is called the posterior, $P(w \mid \mathcal{D})$. For an input, the output of the probabilistic NN is a probability distribution over all possible outputs. Minimising the loss function amounts to determining the posterior distribution of the weights given the training data, $P(w \mid \mathcal{D})$. The posterior cannot be computed analytically through Bayes' rule, $P(w \mid \mathcal{D}) = P(\mathcal{D} \mid w)P(w)/P(\mathcal{D})$, because of the intractable integrals involved. A common approach is to perform Variational Inference (VI) to approximate the posterior with a variational distribution $q(w \mid \theta)$ [Hinton and van Camp, 1993; Graves, 2011], parameterised by $\theta$. The new objective is to find the parameters $\theta^\star$ [Eq. 6] that minimise the Kullback-Leibler (KL) divergence between the approximate posterior and the true Bayesian posterior.

(6) $\theta^\star = \arg\min_\theta \mathrm{KL}\left[ q(w \mid \theta) \,\middle\|\, P(w \mid \mathcal{D}) \right]$
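The weight sampling that makes this variational objective trainable relies on the reparametrisation trick described below in the text. A minimal sketch, with illustrative parameter values:

```python
import math
import random

def sample_weight(mu, rho, rng):
    """Draw one weight via the reparametrisation trick: w = mu + sigma * eps,
    with sigma = log(1 + exp(rho)) (softplus, always positive) and
    eps ~ N(0, 1), so gradients can flow to the variational parameters
    (mu, rho) instead of through the random draw itself."""
    sigma = math.log1p(math.exp(rho))
    return mu + sigma * rng.gauss(0.0, 1.0)

rng = random.Random(0)
draws = [sample_weight(1.0, -5.0, rng) for _ in range(2000)]
```

Repeated forward passes with freshly sampled weights yield the distribution of BGNN outputs used as the prior ensemble in the next subsection.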
We consider the approximate posterior to be a fully factorised Gaussian [Graves, 2011], corresponding to a squared loss [Blundell et al., 2015]. The prior is also a Gaussian distribution, corresponding to L2 regularization [Blundell et al., 2015]. Each layer has a single prior and thus a single prior mean and a single prior standard deviation. In a forward pass of the BGNN the weights are sampled from the posterior distribution. This creates a problem during backpropagation, since we cannot define a gradient of this operation with respect to the weights. To tackle this problem we couple the BBB method with the reparametrisation trick [Kingma and Welling, 2014], as described in [Blundell et al., 2015], which replaces the sampling from the posterior with sampling from a unit Gaussian scaled by the posterior standard deviation, $\sigma$, and shifted by the posterior mean, $\mu$. The standard deviation needs to be positive, and is thus parameterised using a parameter $\rho$ as $\sigma = \log(1 + \exp(\rho))$. Consequently, the variational parameters to be optimised are $\theta = (\mu, \rho)$.

4.2 Ensemble Kalman method for stress correction
The ensemble Kalman method is used to update the prior state vector of a system. In this work the state vector corresponds to the microscale stress components predicted by the BGNN on the nodes of the mesh. In order to generate the prior ensemble we draw a number of samples from the BGNN. For node $n$ of sample $j$ we have

(7) $x_j^{(n)} = \left( \sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{23}, \sigma_{31} \right)_j^{(n)}$

Assuming that the mesh has $N$ nodes and that we draw $J$ samples from the BGNN, the prior ensemble matrix can be expressed as

(8) $X = \left[ x_1, x_2, \ldots, x_J \right]$

where

(9) $x_j = \left[ x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(N)} \right]^T$
The updated stress distribution (posterior), $x_j^{post}$, after observing the data $d$, can be obtained as

(10) $x_j^{post} = x_j + K \left( d - H x_j \right)$

where $H$ is the observation matrix that is used to provide the noise-free value of the data, $H x_j$, and $K$ is the Kalman gain matrix, calculated as

(11) $K = C H^T \left( H C H^T + R \right)^{-1}$

where $R$ is the error covariance of the data and $C$ is the ensemble covariance, calculated as

(12) $C = \dfrac{1}{J-1} \sum_{j=1}^{J} \left( x_j - \bar{x} \right) \left( x_j - \bar{x} \right)^T$
The data, $d$, reflect the homogeneous Neumann conditions that we impose on every node

(13) $d^{(n)} = 0 \quad \forall n$
Instead of explicitly defining the observation matrix, we use an observation function, $h$ (as described in appendix E.3), and we calculate the term $H x_j$ on every node as

(14) $h(x_j)^{(n)} = \sigma_j^{(n)} \, n^{(n)}$
where $\sigma$ is the symmetric microscale stress tensor and $n$ is the unit normal vector

(15) $\sigma \, n = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{31} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{23} & \sigma_{33} \end{pmatrix} \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$
For more details the reader can refer to [appendix E].
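A sketch of the update above, assuming for illustration an explicit linear observation matrix H in place of the observation function of appendix E.3; the stress-component ordering in `traction` is our own assumption:

```python
import numpy as np

def traction(s, n):
    """sigma . n from the six independent components, assumed ordered
    (s11, s22, s33, s12, s23, s31)."""
    s11, s22, s33, s12, s23, s31 = s
    sig = np.array([[s11, s12, s31],
                    [s12, s22, s23],
                    [s31, s23, s33]])
    return sig @ np.asarray(n, dtype=float)

def ensemble_kalman_update(X, d, H, R):
    """Ensemble Kalman update: X is the (n_vars x n_samples) prior ensemble,
    d the observed data, H the (linear) observation operator and R the
    observation-error covariance. Returns the posterior ensemble."""
    J = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    C = (A @ A.T) / (J - 1)                        # sample covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)   # Kalman gain
    return X + K @ (d[:, None] - H @ X)            # posterior ensemble
```

With nearly noise-free zero-traction data, the posterior ensemble collapses onto the observed value wherever the observation operator acts, which is exactly the mechanism that pulls the BGNN stresses towards satisfying the free-surface condition.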
5 Numerical Examples
5.1 Cubical Heterogeneous material
In this section we want to test our methodology using a dataset containing samples from a heterogeneous material. We will demonstrate how we can use our methodology to

predict the microscale stress distribution

predict the maximum microscale Von Mises stress over the patches

quantify the uncertainty of the prediction

improve the BGNN prediction using the online correction technique introduced in section [4.2]
5.1.1 FEA setup and generation of the training dataset
For the purpose of training our model we work with a synthetic heterogeneous material from which we extract and examine cubical specimens. The size of each specimen is 2 units along each spatial dimension. The specimen has two elliptical pores, with random position, size and orientation, and a distribution of 50 to 100 spherical pores with the same radius. The spherical and elliptical pores are generated without intersecting. For the macroscale simulations, which form the input of the BGNN, only the elliptical pores are taken into account and the spheres are ignored. Two realisations of the cubical specimen are shown in [Fig 9].
As already explained in [2.4], the input to the GNN is a patch of the geometry, not the entire structure, and the prediction of the GNN happens in a subregion of the patch that we called the ROI. The patch and ROI size depend on the radius of interaction of the spherical pores. Based on the work of [Pilkey and Pilkey, 2008] on stress intensity factors, we assume that the effect of spherical voids on the global stress field fades out at distances larger than 4 radii from the centre of the voids. We combine this information with our previous work [Krokos et al., 2021] and choose patch and ROI sizes expressed as multiples of the radius of the spherical voids, as can be seen in [Fig 10].
The patches that are extracted never intersect with the exterior boundaries of the FE mesh. Additionally, we apply homogeneous Dirichlet boundary conditions away from the material specimen, as represented in [Fig 11], [Eq. 16].
(16) 
where the coefficients are far-field load parameters, $x$ is the position of a point in the domain and $x_0$ is the initial position of the centre of the body.
For data preprocessing we use a simple linear rescaling to map the input data to the range −1 to 1. Further information can be found in [Appendix C].
The dataset we use consists of 200 FE simulations, completed in 8.7 hours on 5 cores of an Intel® Xeon® Gold 6148 CPU @ 2.40GHz. From every FE simulation we extract 5 patches, at random locations in the cubic specimen, resulting in 1,000 data points, 900 for training and 100 for testing.
5.1.2 Training parameters
For the training of the network we used a batch size of 2 and the Adam optimiser, with an initial learning rate that decreases by 5% every 50 epochs. Additionally, we identified some key parameters for the training of the GNN and performed several tests to identify their optimum values. We concluded that optimum behaviour is achieved with a GNN with 5 GN blocks, residual connections, an independent encoder, 128 filters and a maximum of 10 neighbours per node. More details can be found in [Appendix D.1 – D.5]. Training on 900 patches using the parameters identified above requires 6 hours on an Nvidia V100 GPU with 16GB of RAM, with a maximum memory requirement of 4.3GB.
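The stepped learning-rate decay described above amounts to the following schedule. The initial value `lr0` is an assumed placeholder, since the paper's exact figure was lost here:

```python
def learning_rate(epoch, lr0=1e-3, decay=0.95, step=50):
    """Learning rate reduced by 5% every 50 epochs (lr0 is an assumed
    initial value, not the paper's)."""
    return lr0 * decay ** (epoch // step)
```

The same schedule is available out of the box in most frameworks, e.g. as a step decay scheduler attached to the Adam optimiser.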
5.1.3 BGNN MAP prediction
In this section we will examine the mean prediction provided by the BGNN, without referring to the uncertainty estimation.
In [Fig 11(a)] we observe that the maximum stress components predicted by the BGNN are in good agreement with the ones calculated by FEA. Specifically, the coefficient of determination, R², is larger than 0.98 for all 6 stress components. In [Fig 11(b)] we observe similar results for the maximum Von Mises stress. Although the points are more scattered and the coefficient of determination drops to 0.84, the accuracy is 96%.
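The coefficient of determination quoted above follows the standard definition, which can be computed as, e.g.:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 between FE ground truth and the
    network prediction (standard definition; the paper's 'accuracy'
    metric, Algorithm 1, is a separate quantity)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```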
Additionally, we examine the stress distribution on the FE mesh. In [Fig 13] we compare the microscale Von Mises stress in the ROI of a patch as predicted by the GNN with the one calculated by FE simulations. We observe that the predicted stress distribution is very close to the ground truth.
5.1.4 BGNN uncertainty estimation
After studying the quality of the predictions provided by the mode of the BGNN, we propose to evaluate the ability of the BGNN to quantify the uncertainty of the prediction and provide credible intervals (CIs) that reflect the error in the prediction. From [Fig 14] we can see that the percentage of points inside the 95% CIs is 92%, which is satisfactory.
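The coverage figure reported above is simply the fraction of ground-truth values that fall inside the predicted credible intervals:

```python
import numpy as np

def ci_coverage(y_true, lower, upper):
    """Fraction of ground-truth values inside the predicted credible
    intervals (e.g. the 95% CIs of the BGNN)."""
    y = np.asarray(y_true, float)
    inside = (y >= np.asarray(lower, float)) & (y <= np.asarray(upper, float))
    return inside.mean()
```

For well-calibrated 95% CIs this fraction should be close to 0.95; the 92% observed above is therefore satisfactory.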
5.2 Online bias correction
In this section we demonstrate the use of the ensemble Kalman method, section [4.2], to improve the BGNN prediction. We examine 2 realistic cases: first, improving the prediction of an undertrained network, and second, improving the prediction of a network in an extrapolation regime.
5.2.1 Undertrained GNN
By undertrained we refer to a network trained with a small dataset. In [Fig 15] we show the accuracy plots for the mean BGNN prediction as a function of the epochs for 2 GNNs trained with the same parameters, except that the one on the left uses data from 32 FE simulations for training and the one on the right uses data from 100 FE simulations. The BGNN trained with more data performs better than the undertrained network.
Using the ensemble Kalman method we want to update the output distribution of the BGNN, the prior, and obtain the posterior. In [Fig 16] we can see the posterior and the prior for the prediction of the maximum Von Mises stress. The coefficient of determination between the posterior and the FE results is 0.3074, larger than the value of 0.0176 obtained between the prior and the FE results. We can also see that the best-fit line of the posterior data is closer to the ideal line than the best-fit line of the prior data, i.e. the output of the BGNN without the online correction. Therefore, the posterior mean fits the data better than the prior mean, and the online stress correction has improved the mean BGNN prediction.
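A generic ensemble Kalman update of this kind can be sketched as below. This is a textbook formulation, not necessarily the exact one of section [4.2]; `H` is an assumed linear map from the predicted stress state to the observed quantities (e.g. tractions that should vanish on free surfaces):

```python
import numpy as np

def enkf_update(ensemble, H, y_obs, obs_cov):
    """One ensemble Kalman update: shift a prior ensemble of predictions
    towards observations y_obs (generic sketch)."""
    X = np.asarray(ensemble, float)        # (n_members, n_state)
    A = X - X.mean(axis=0)                 # ensemble anomalies
    n = X.shape[0]
    P = A.T @ A / (n - 1)                  # sample covariance of the prior
    S = H @ P @ H.T + obs_cov              # innovation covariance
    K = P @ H.T @ np.linalg.solve(S, np.eye(S.shape[0]))  # Kalman gain
    # move every member towards the observation
    return X + (y_obs - X @ H.T) @ K.T
```

With a noise-free observation the updated members reproduce it exactly in the observed direction, which is how zero-traction conditions can be enforced at evaluation time.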
Furthermore, we want to examine the full posterior distribution and not only the posterior mean value. In [Fig 17] we see the comparison between the prior 95% CIs (left) and the posterior 95% CIs (right). Firstly, we observe that even in the prior case, where we have only used 32 FE simulations, the 95% CIs give a good estimation of where the real value lies: the percentage of points inside the 95% CIs is 79%. This value climbs to 88% in the posterior case. This means that by using the online stress update we were able to improve the BGNN prediction without adding more data to the training dataset. We note that we only update the mean value; the posterior standard deviation is taken equal to the prior one.
Lastly, in [Fig 18] we compare the mean Von Mises stress distribution on the mesh before and after the stress update with the one calculated through FE simulations. We can see that the posterior (middle image) is closer to the FE results (left image) than the prior (right image).
5.2.2 Prediction for out-of-training microstructural inputs
In this section, by out-of-training we refer to a network trained with spherical defects but evaluated on cases that have elliptical defects. The BGNN is therefore extrapolating, which is a particularly undesirable but common scenario in real applications. In [Fig 19] we show a realisation of the training dataset on the left and a realisation of the test dataset on the right.
In [Fig 20] we see the comparison between the prior 95% CIs (left) and the posterior 95% CIs (right). Firstly, we observe that even in the prior case, where the BGNN extrapolates, the 95% CIs give a good estimation of where the real value lies: the percentage of points inside the 95% CIs is 91%. This value climbs to 95% in the posterior case. Most importantly, using the ensemble Kalman method we avoided underpredicting the maximum values. This means that the ensemble Kalman method improved the BGNN prediction without adding more data to the training dataset. We note that we only update the mean value; the posterior standard deviation is taken equal to the prior one.
5.3 Dogbone
After demonstrating the applicability of our GNN and studying the online correction technique, in this subsection we apply the proposed deep learning methodology to a more challenging case. We also provide multiple examples of the Von Mises stress distribution on the surface of the mesh.
5.3.1 Training dataset
The geometry that we choose to model is a dogbone with a single hole in the middle, which does not change from one realisation to another, and with a random distribution of 50 to 100 spherical pores. The dogbone is 2 units long, 0.6 units high and 0.08 units wide. The radius of the hole is 0.032 units and the radius of the spherical pores is 0.016 units. The spherical pores may intersect with each other, with the boundaries of the geometry and with the hole. For the macroscale simulations, which form the input of the BGNN, only the hole is taken into account and the spherical pores are ignored. In [Fig 21] we show 2 examples of the geometry. For the boundary conditions we apply displacements of the same magnitude but opposite sign along the length of the specimen.
We performed 500 FE simulations, completed in 5.7 hours on 10 cores of an Intel® Xeon® Gold 6148 CPU @ 2.40GHz. From every FE simulation we extract 10 patches, resulting in 5,000 data points, 4,000 for training and 1,000 for testing. As explained in section [5.1.1], the patch and ROI sizes are fixed multiples of the radius of the spherical pores.
5.3.2 GNN parameters
For the GNN training we used the Adam optimiser with an initial learning rate decreasing by 5% every 50 epochs. Additionally, we started with the parameters that we identified in [5.1.2], namely 5 GN blocks, residual connections, an independent encoder, 128 filters and a maximum of 10 neighbours per node. After experiments we concluded that in this case the GNN benefits from a maximum of 20 neighbours per node, while all the other hyperparameters remained unchanged. Training of the GNN was performed on an Nvidia V100 GPU with 16GB of RAM.
In this more challenging problem we investigate the idea of using dual convolutions. Instead of only considering the Euclidean neighbourhood, we also consider the geodesic neighbourhood of every node in order to perform joint geodesic and Euclidean convolutions, as explained in [3.3]. After experiments, we concluded that in our case the optimum ratio of geodesic to total convolutions is 75%. For more details the reader can refer to [Appendix D.6].
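The two neighbourhood types can be sketched as follows: Euclidean neighbours are the k nearest nodes in 3D space, while geodesic neighbours are reached by walking along mesh edges, so they never leak across a void between nearby but disconnected surfaces. These are illustrative helpers, not the paper's implementation:

```python
import numpy as np
from collections import deque

def euclidean_neighbours(coords, i, k):
    """Indices of the k nearest nodes to node i in 3D space."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return list(np.argsort(d)[1:k + 1])  # skip the node itself

def geodesic_neighbours(adjacency, i, hops):
    """Nodes reachable from node i within `hops` mesh edges, found by
    breadth-first search over the surface-mesh adjacency."""
    seen, queue, out = {i}, deque([(i, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                out.append(nb)
                queue.append((nb, depth + 1))
    return out
```

On a node sitting on one face of a thin void, the Euclidean set may include nodes on the opposite face, while the geodesic set only contains nodes connected through the mesh, which is the distinction exploited by dual convolutions.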
5.3.3 Numerical examples with Deterministic GNN
As can be seen in [Fig 21(a)], when training with 4,000 patches the accuracy for the maximum Von Mises stress is 71%, which is considerably lower than the 96% accuracy reported for the cubical heterogeneous material case. In [Fig 21(b)] we also evaluate the performance of the GNN on the entire dogbone and not only at patch level. The accuracy does not change considerably in this case, but the data are more spread around the ideal line, as indicated by the lower coefficient of determination R².
Furthermore, we demonstrate the quality of the stress distribution. In [Fig 23] and [Fig 24] we show that the stress field predicted by the GNN is in good agreement with that provided by direct FEA.
Lastly, we are interested in studying the predicted stress distribution in the highly stressed regions of the specimen. In [Fig 25] we have zoomed in on a region where a spherical void interacts strongly with the cylinder-shaped hole, and on a region with a strong interaction between 2 spherical voids. We observe that in both cases the GNN successfully located the area where the maximum Von Mises stress occurs, and predicted both its value and the stress distribution around it.
5.4 Variable Dimension Dogbone
In this section we demonstrate the ability of our method to explore responses in a space of shapes described by a few geometrical parameters. To this end, we perform geometric learning on a family of dogbone shapes where not only the spherical void distribution differs from realisation to realisation, but also the cylindrical holes and the dimensions of the dogbone. We focus on the dependence of the quality of the results on the size of the dataset, both in terms of maximum Von Mises stress and of stress distribution.
5.4.1 Training dataset
To create this dataset we follow the same process as in the dogbone section, 5.3. The difference lies in the fact that here the dimensions of the specimen are not constant. The length of the dogbone is equal to 2, the width varies from 0.06 to 0.12 and the height varies from 0.24 to 0.7. Additionally, the number of cylindrical holes varies from 1 to 4 in every realisation, and their radii from 0.024 to 0.06. Two realisations of the variable dogbone geometry can be found in [Fig 26].
5.4.2 Numerical Examples with variable dimension dogbone
We investigate the performance of the GNN with respect to the size of the training dataset. We use patches extracted from 100 FE simulations for the test dataset and 4 training datasets consisting of patches extracted from 50, 100, 200 and 400 FE simulations. Results can be found in [Fig 27]. From [Fig 27(a)] we observe that the accuracy increases with the size of the training dataset, from 50.21% for the GNN trained with 50 FE simulations to 64.30% for the GNN trained with 400 FE simulations. We see the complementary trend in [Fig 27(b)], where the loss decreases in fewer epochs and reaches a lower value as the size of the training dataset increases. Compared to the previous family of dogbones, where the dimensions of the dogbone and the cylindrical hole did not change from one realisation to another, the accuracy achieved using 400 FE simulations decreases from 71% to 64% for the variable dogbone case. This is an expected result, since the space of shapes we try to learn now has more parameters than before. In order to achieve similar accuracy, we created an additional 600 FE simulations and trained the GNN for 100 epochs, using as initial weights the ones determined by the GNN trained with 400 FE simulations. The accuracy achieved in this case is 70%.
Additionally, in order to examine the stress distribution, we have extracted the upper part of the structure so that we can observe the stress distribution inside. This can be seen in [Fig 28], where the GNN successfully reconstructs the Von Mises distribution.
Lastly, we want to examine the quality of the stress distribution with respect to the size of the training dataset. In [Fig 29] we see the Von Mises prediction for a case where few strong interactions are present. We observe that all the GNNs, even the one trained with only 100 FE simulations, successfully predict the correct Von Mises stress distribution. On the other hand, in [Fig 30] we see a case where two intersecting defects, located at the corner of the domain, interact strongly, and we observe that the GNNs trained with fewer than 200 FE results fail to predict this interaction.
6 Conclusions
In this work we have developed a Neural Network (NN) based concurrent scale coupling method to deal with 3D multiscale stress predictions on porous media to overcome the need for computationally expensive direct numerical multiscale simulations. Our approach does not assume separation of scales nor does it require prior geometrical parametrisation of the multiscale problem.
Our approach is based on geometric learning, which is more suitable for multiscale problems as it allows us to operate on a variable-resolution mesh of the geometry rather than a fixed-size voxel-based image. Additionally, we made the choice to analyse and predict only the surface stress of the geometry. This choice was based on 3 assumptions, 2 for the input and 1 for the output. For the input of the GNN we assumed that the surface representation is a comprehensive representation of the geometry and that the trace of the smooth macro stress over the surface mesh is sufficient to inform the GNN of the macroscopic stress state. For the output, we assumed, and verified on our dataset, that the maximum microscale stress can be found on the surface of the structure.
This work extends our previous work on 2D image-based CNNs [Krokos et al., 2021] to 3D graph-based Graph Neural Networks (GNNs). We found the same efficiency as in the 2D case in predicting stress fields, locating local maxima and quantifying uncertainty.
Moreover, we introduced a way to enforce physics constraints at evaluation time, using a Bayesian framework whereby the Bayesian GNN (BGNN) predictions are considered as the prior, and posterior predictions that conform to homogeneous Neumann conditions on the free surfaces of the porous materials are computed via an ensemble Kalman update. We have used this method to tackle problems commonly reported with data-driven methodologies, namely lack of training data and poor extrapolation ability. We have shown that without this online correction a downward bias is observed in the maximum equivalent stresses, which is problematic. This effect is alleviated thanks to the online physics-based correction.
Lastly, we showed that for Monte Carlo simulations of porous microstructures, relatively few simulations are required to train the GNN: 50 simulations already give 50% accuracy. The trained GNN can be used in an inverse problem setting, for instance for the optimisation of a CAD model under uncertainty stemming from the microstructure.
Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 764644.
This paper only contains the author’s views and the Research Executive Agency and the Commission are not responsible for any use that may be made of the information it contains.
Stéphane P.A. BORDAS received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 811099 TWINNING Project DRIVEN for the University of Luxembourg and from the Fonds National de la Recherche Luxembourg FNR under grant O17QCCAAS11758809. We acknowledge the support of the Supercomputing Wales project, which is partfunded by the European Regional Development Fund (ERDF) via Welsh Government.
We thank Synopsys Northern Europe Ltd for its support and specifically Dr Viet Bui Xuan for his constructive suggestions during the planning and development of this research work.
References
 The FEniCS project version 1.5. 3. External Links: Document Cited by: §2.3.
 Layer normalization. Note: arXiv preprint arXiv: 1607.06450 External Links: 1607.06450 Cited by: 1st item.
 Relational inductive biases, deep learning, and graph networks. Note: arXiv preprint arXiv: 1806.01261 External Links: 1806.01261 Cited by: §3.1, §3.5, §3.6.

 Neural networks for pattern recognition. Oxford University Press, Inc., USA. External Links: ISBN 0198538642 Cited by: Appendix C.
 Weight uncertainty in neural networks. Note: arXiv preprint arXiv: 1505.05424 External Links: 1505.05424 Cited by: §1, §4.1, §4.1.
 Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm. Computers in Biology and Medicine 109, pp. 218–225. External Links: ISSN 00104825, Document, Link Cited by: §1.
 FEMbased realtime simulations of large deformations with probabilistic deep learning. Note: arXiv preprint arXiv: 2111.01867 External Links: 2111.01867 Cited by: §1.
 Sequential data assimilation with a nonlinear quasigeostrophic model using monte carlo methods to forecast error statistics. 99 (C5), pp. 10143–10162. External Links: Document, Link, https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/94JC00572 Cited by: §1.
 Nonintrusive and exact global/local techniques for structural problems with local plasticity. 44 (2), pp. 233–245. External Links: Link, Document Cited by: §1.
 Gmsh: a 3d finite element mesh generator with builtin pre and postprocessing facilities. 79, pp. 1309 – 1331. External Links: Document Cited by: §2.3.
 Geometrically principled connections in graph neural networks. Note: arXiv preprint arXiv: 2004.02658 External Links: 2004.02658 Cited by: §3.1.
 Practical variational inference for neural networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, Red Hook, NY, USA, pp. 2348–2356. External Links: ISBN 9781618395993 Cited by: §4.1, §4.1, §4.1.
 MeshCNN: a network with an edge. 38, pp. 1 – 12. Cited by: §3.1.
 Deep residual learning for image recognition. Note: arXiv: 1512.03385 External Links: 1512.03385 Cited by: 2nd item.
 Reduced basis multiscale finite element methods for elliptic problems. 13, pp. 316–337. External Links: Document Cited by: §1.
 Improving neural networks by preventing coadaptation of feature detectors. Note: arXiv preprint arXiv: 1207.0580 External Links: 1207.0580 Cited by: 1st item.

 Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, COLT '93, New York, NY, USA, pp. 5–13. External Links: ISBN 0897916115, Link, Document Cited by: §4.1, §4.1.
 StressGAN: a generative deep learning model for 2D stress distribution prediction. Note: arXiv: 2006.11376 External Links: 2006.11376 Cited by: §1.
 A threescale domain decomposition method for the 3d analysis of debonding in laminates. 44, pp. 343–362. External Links: Document Cited by: §1.
 3D convolutional neural networks for classification of functional connectomes: 4th international workshop, dlmia 2018, and 8th international workshop, mlcds 2018, held in conjunction with miccai 2018, granada, spain, september 20, 2018, proceedings. pp. 137–145. External Links: ISBN 9783030008888, Document Cited by: §1.

 Deeply-recursive convolutional network for image super-resolution. Note: arXiv: 1511.04491 External Links: 1511.04491 Cited by: 2nd item.
 Auto-encoding variational Bayes. Note: arXiv preprint arXiv: 1312.6114 External Links: 1312.6114 Cited by: §4.1, §4.1.
 A bayesian multiscale cnn framework to predict local stress fields in structures with microscale features. pp. 1–34. Cited by: §1, §1, §2.4, §2.4, 2nd item, §5.1.1, §6.
 Picasso: a cudabased library for deep learning over 3d meshes. Note: arXiv preprint arXiv: 2103.15076 External Links: Document, Link Cited by: §3.1.
 Enhanced deep residual networks for single image superresolution. Note: arXiv preprint arXiv: 1707.02921 External Links: 1707.02921 Cited by: 2nd item.
 Simulating continuum mechanics with multiscale graph neural networks. Note: arXiv preprint arXiv: 2106.04900 External Links: 2106.04900 Cited by: §3.1.
 Automated solution of differential equations by the finite element method. Springer. External Links: Document, ISBN 9783642230981 Cited by: §2.3.
 A 3d convolutional neural network for volumetric image semantic segmentation. 39, pp. 422–428. Note: 25th International Conference on Production Research Manufacturing Innovation: Cyber Physical Manufacturing August 914, 2019 — Chicago, Illinois (USA) External Links: ISSN 23519789, Document, Link Cited by: §1.
 Deep learning observables in computational fluid dynamics. Journal of Computational Physics 410, pp. 109339. External Links: ISSN 00219991, Document, Link Cited by: §1.
 Geodesic convolutional neural networks on riemannian manifolds. Note: arXiv preprint arXiv: 1501.06297 External Links: 1501.06297 Cited by: §3.1.
 Simulation of hyperelastic materials in realtime using deep learning. Medical Image Analysis 59, pp. 101569. External Links: ISSN 13618415, Document, Link Cited by: §1, §1.
 Bayesian graph neural networks for strainbased crack localization. In Data Science in Engineering, Volume 9, R. Madarshahian and F. Hemez (Eds.), Cham, pp. 253–261. External Links: ISBN 9783030760045 Cited by: §D.3, 2nd item, §3.1.
 Stress field prediction in cantilevered structures using convolutional neural networks. Journal of Computing and Information Science in Engineering 20 (1). External Links: ISSN 19447078, Link, Document Cited by: §1.
 Multiscale modeling of physical phenomena: adaptive control of models. 28 (6), pp. 2359–2389 (English (US)). External Links: Document, ISSN 10648275 Cited by: §1.
 Hierarchical modeling of heterogeneous solids. 172 (1), pp. 3–25. External Links: ISSN 00457825, Document, Link Cited by: §1.
 Guaranteed error bounds in homogenisation: an optimum stochastic approach to preserve the numerical separation of scales. 110, pp. . External Links: Document Cited by: §1.
 Learning meshbased simulation with graph networks. Note: arXiv preprint arXiv: 2010.03409 External Links: 2010.03409 Cited by: §D.3, 2nd item, §3.1.
 Peterson’s stress concentration factors, third edition. pp. 1–522. External Links: Document Cited by: §5.1.1.
 PointNet: deep learning on point sets for 3d classification and segmentation. Note: arXiv preprint arXiv: 1612.00593 External Links: 1612.00593 Cited by: §3.1.
 Concurrent multiscale analysis of elastic composites by a multilevel computational model. 193 (6), pp. 497–538. External Links: ISSN 00457825, Document, Link Cited by: §1.
 Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. External Links: Document, Link, https://www.science.org/doi/pdf/10.1126/science.aaw4741 Cited by: §1.
 Threedimensional convolutional neural network (3dcnn) for heterogeneous material homogenization. 184, pp. 109850. External Links: ISSN 09270256, Document, Link Cited by: §1.
 Learning to simulate complex physics with graph networks. Note: arXiv preprint arXiv: 2002.09405 External Links: 2002.09405 Cited by: §D.3, 2nd item, §3.1.
 General introduction to asymptotic methods. Vol. 272. External Links: ISBN ISBN 9783540176169, Document Cited by: §1.
 Pygmsh: a python frontend for gmsh External Links: Document, Link Cited by: §2.3.
 DualConvMeshnet: joint geodesic and euclidean convolutions on 3d meshes. Note: arXiv preprint arXiv: 2004.01002 External Links: 2004.01002 Cited by: §3.1, §3.3.
 Improved protein structure prediction using potentials from deep learning. Nature 577 (7792), pp. 706—710. External Links: Document, ISSN 00280836, Link Cited by: §1.
 Protein structure prediction using multiple deep neural networks in the 13th critical assessment of protein structure prediction (casp13). Proteins: Structure, Function, and Bioinformatics 87 (12), pp. 1141–1148. External Links: Document, Link, https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.25834 Cited by: §1.
 PyVista: 3d plotting and mesh analysis through a streamlined interface for the visualization toolkit (VTK). 4 (37), pp. 1450. External Links: Document, Link Cited by: §2.3.
 Predicting mechanical properties from microstructure images in fiberreinforced polymers using convolutional neural networks. Note: arXiv: 2010.03675 External Links: 2010.03675 Cited by: §1.
 Geometric deep learning for computational mechanics part i: anisotropic hyperelasticity. 371, pp. 113299. External Links: ISSN 00457825, Link, Document Cited by: §3.1, §3.2.

 FMRI volume classification using a 3D convolutional neural network robust to shifted and scaled neuronal activations. 223, pp. 117328. External Links: ISSN 10538119, Document, Link Cited by: §1.
 Wide residual networks. Note: arXiv: 1605.07146 External Links: 1605.07146 Cited by: 2nd item.
 An introduction to computational micromechanics. Vol. 20. External Links: ISBN ISBN 9783540323600, Document Cited by: §1.
Appendix A Accuracy
Below we can see the algorithm that we use to calculate the accuracy of the GNN prediction [Algorithm 1].
Appendix B Volume and surface mesh
As discussed in section [3.2], the input and the output are the surface stress and not the volume stress. In [Fig. 31] we see an example of a surface stress field extracted from the volume mesh. We can observe that the maximum value of the Von Mises stress is the same in both cases, and that the location where the maximum Von Mises stress occurs is the same as well.
Appendix C Data preprocessing
Data preprocessing is a standard technique used in data-driven methodologies. A common practice is to preprocess the input data, usually with a simple linear rescaling [Bishop, 1995]. In this work we perform rescaling both to obtain smoother convergence of the optimiser and to restrict the space of possible inputs. We divide both the input and, for consistency, the output of the NN by the maximum absolute component of the stress tensor. This rescales the input data to the range [−1, 1]. This rescaling alleviates the differences in scale across the data points, which helps the optimiser avoid bad local minima and converge to a stable solution. Additionally, scaling the input carries a deeper physical meaning in the case of linear elasticity: scaling the macroscale stress field by a given factor results in scaling the microscale stress tensor by the same factor. By taking advantage of this linear relationship we can effectively train our NN on values in the range −1 to 1 and then evaluate it on an input of any range. This is achieved by dividing the new input by its scaling factor, to map it to the range [−1, 1]. To retrieve the real output, the microscale tensor, the output of the NN is multiplied by the scaling factor.
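The scaling and its inverse described above can be sketched as:

```python
import numpy as np

def rescale(macro_stress):
    """Scale a macroscale stress patch to [-1, 1] by its maximum absolute
    component; return the scaled field and the factor needed to map the
    network output back to physical units (valid under linear elasticity)."""
    s = np.abs(macro_stress).max()
    return macro_stress / s, s
```

At evaluation time the NN is fed `macro_stress / s` and its output is multiplied by `s` to recover the microscale stress in physical units.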
Appendix D GNN parameters
We identified some key parameters for the training of the GNN and performed several tests to identify their optimum values. The parameters that we examined are the number of filters in the MLPs, the number of GN blocks, the maximum number of neighbours per node, the existence of a skip connection between the input and the output of the network and, finally, the type of encoder. For these tests we used 500 patches as the training set and 100 patches as the test set.
D.1 Skip Connection
We suggest that using a skip connection to add the input macroscale stress tensor to the output of the network improves the GNN's performance. The reason is that the GNN then learns how the microscale stress field deviates from the macroscale stress field, instead of learning the microscale stress field from scratch, which we believe is easier and should also improve the generalisation ability of the network. In order to validate this assumption we train 2 identical GNNs with the exact same parameters and training set, but one of them has a skip connection and the other does not. Both networks have MLPs with 128 filters, 10 GN blocks and at most 10 neighbours per node. The results can be found in [Fig 32], where we observe that the GNN trained with the skip connection converges more smoothly than the GNN trained without it. The skip connection also improved the accuracy from 65% to 80%. We conclude that the skip connection indeed helps, not only towards more stable training but also towards higher accuracy, and we therefore include it in our architecture.
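The residual formulation above can be sketched as follows, with `gnn` any callable standing in for the trained network (a hypothetical stand-in, not the paper's model):

```python
import numpy as np

def forward_with_skip(gnn, macro_stress):
    """Skip connection: the network predicts the deviation of the
    microscale stress from the macroscale input, which is added back
    at the output."""
    return macro_stress + gnn(macro_stress)
```

If the microscale correction is zero (no microscale features), the prediction reduces exactly to the macroscale field, which is the behaviour the residual connection encodes.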
D.2 Number of Filters
Another important parameter that heavily influences not only the accuracy of the GNN but also the training time and memory requirements is the number of filters in the MLPs. In order to identify the minimum number of filters that results in optimum accuracy, we trained 3 GNNs with 64, 128 and 256 filters in the MLPs, keeping all the other parameters the same. Specifically, all networks have the skip connection described in [D.1], 10 GN blocks and at most 10 neighbours per node. The results can be found in [Fig 33], where we observe that 128 and 256 filters result in the same accuracy, which is higher than the accuracy for 64 filters, 80% compared to 72%. Additionally, the training time for the GNN with 256 filters is 12.7 hours with a maximum memory requirement of 15.4 GB, while for the 128-filter GNN the training was completed in 5.8 hours with a maximum memory requirement of 7.7 GB. By choosing 128 filters in the MLPs we achieve the same accuracy as with 256 filters but with a 54% decrease in training time and a 50% decrease in memory requirements.
D.3 Independent Encoder
As already discussed in [3.6], in the first layer of the GNN we choose to encode the edge and node features independently into the latent dimension, as suggested by [SanchezGonzalez et al., 2020; Pfaff et al., 2021; Mylonas et al., 2022]. To test whether this choice results in an improved GNN, we perform an experiment with 2 GNNs with the same parameters, where one uses a GN block to encode the inputs into the latent dimension and the other encodes them independently. Both GNNs have 10 GN blocks, 128 filters in the MLPs, the skip connection described in [D.1] and at most 10 neighbours per node. The results can be found in [Fig 34], where we observe that both GNNs have similar accuracy curves and both reach a final accuracy of 80%. Nevertheless, in the independent encoder case the accuracy starts increasing sooner, at epoch 50, whereas in the GN block case it starts increasing later, at epoch 120. Lastly, the independent encoder version involves slightly fewer calculations and results in a 5% decrease in training time. Consequently, we choose the independent encoder GNN.
D.4 Number of GN blocks
Another important parameter that affects both the memory requirements and the training time is the number of residual GN blocks. Because we are using residual connections, we do not expect a decrease in accuracy as we add more GN blocks, but we do expect a saturation point beyond which the GNN no longer benefits from additional GN blocks and only performs unnecessary calculations, resulting in increased computational cost. To validate this assumption and determine the most appropriate number of GN blocks, we train 3 GNNs with the same parameters but different numbers of GN blocks. All the GNNs have 128 filters in the MLPs, the skip connection described in [D.1], an independent encoder, and each node has at most 10 neighbours. The results can be found in [Fig 35], where we observe that the GNN with only 3 GN blocks presents a 10% decrease in test accuracy compared to the other two. We can also see that the GNNs with 5 and 10 GN blocks have similar test accuracy curves and both reach a test accuracy of 80%. We opt for the GNN with 5 GN blocks, which presents optimum accuracy without additional computational cost. This results in a training time of 3.1 hours and a maximum memory of 4.3 GB, a decrease of 45% in training time and 44% in memory requirements compared to the GNN with 10 GN blocks.
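The reason additional residual blocks are expected not to hurt accuracy is that each block refines, rather than replaces, the latent state: the update has the form h ← h + f(h), so the identity path is always preserved. A minimal sketch, using a single linear layer with ReLU as a stand-in for a full GN block (hypothetical sizes and weights):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def gn_block(h, w):
    # Stand-in for one GN block update on the node latents
    # (a single linear layer + ReLU instead of full message passing).
    return np.maximum(h @ w, 0.0)

h = rng.normal(size=(5, dim))  # latent features of 5 nodes
weights = [rng.normal(size=(dim, dim)) * 0.1 for _ in range(5)]

for w in weights:
    # Residual connection: each block adds a correction on top of the
    # current latent state, so extra blocks cannot erase information.
    h = h + gn_block(h, w)

print(h.shape)  # (5, 8)
```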
D.5 Number of neighbours
Lastly, we want to study the effect of the maximum number of neighbours on the GNN training. A small maximum number of neighbours will result in a graph where disconnected areas of the FE mesh do not share an edge and thus cannot exchange information, for instance between microscale and macroscale features. We train 3 GNNs with the same parameters apart from the maximum number of neighbours, which takes the values 5, 10 and 15. All the GNNs have 5 GN blocks, 128 filters in the MLPs, the skip connection described in [D.1] and an independent encoder. The results can be found in [Fig 36], where we observe that 10 and 15 neighbours result in the same accuracy, which is higher than the accuracy for 5 neighbours, 81% compared to 75%. Additionally, the training time for the GNN with 15 neighbours is 27% higher and the memory requirements 31% higher compared to the GNN with 10 neighbours. We conclude that in our case there is no reason to use more than 10 neighbours, although in denser meshes this number could be different.
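Limiting each node to a fixed number of nearest neighbours can be sketched in pure NumPy (an illustrative construction, not necessarily the graph-building routine used in this work):

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(20, 3))  # node coordinates of a small mesh
k = 10                          # maximum number of neighbours per node

# Pairwise Euclidean distances between all nodes.
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)     # a node is not its own neighbour

# For each node keep only the k nearest nodes as graph edges.
neigh = np.argsort(d, axis=1)[:, :k]
edges = [(i, j) for i in range(len(pos)) for j in neigh[i]]
print(len(edges))  # 20 nodes * 10 neighbours = 200 directed edges
```

Increasing k adds longer-range edges (and hence more message-passing work per GN block), which is consistent with the growth in training time and memory observed above.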
D.6 Geodesic and Euclidean Convolutions
We study the effect of the joint convolutions by training 6 GNNs with different numbers of Geodesic and Euclidean filters. We keep the total number of filters constant at 128 but change the ratio between Geodesic and total (Euclidean + Geodesic) filters. We investigate ratios of 0% (only Euclidean), 25%, 50%, 75%, 87.5% and 100% (only Geodesic). All the GNNs have 5 GN blocks, a residual connection, an independent encoder, 128 filters and a maximum of 20 neighbours per node. The results can be found in [Fig 37]. We observe that for both the accuracy and the loss the worst case is a ratio of 0%, where no Geodesic convolutions are present. Both metrics improve (accuracy increases and loss decreases) as we increase the ratio up to 75%. Beyond that, no further improvement is observed: the accuracy and loss curves for ratios of 75% and 87.5% coincide. On the contrary, when the ratio reaches 100% there is a substantial drop in performance, which is expected since no Euclidean convolutions are performed. We conclude that a 75% ratio is the most beneficial in this case.
D.7 Full stress field vs. maximum stress
We investigate the option of directly predicting the maximum Von Mises stress on the surface of the dogbone structure. To this end, we train 2 otherwise identical GNNs on patches extracted from 100 FE simulations and evaluate them on a different set of patches extracted from 100 FE simulations. One of the GNNs directly predicts the maximum Von Mises stress and the other the full stress tensor. Both GNNs have 5 GN blocks, an independent encoder, 128 filters (32 Euclidean and 96 Geodesic) and a maximum of 20 neighbours per node. The results can be found in [Fig 38]. We observe that the 2 GNNs have similar accuracy, but the one predicting the full stress performs slightly better. We conclude that predicting the full stress is beneficial, since it not only results in better prediction of the maximum Von Mises stress but also provides the full stress tensor, which can be used, for instance, for the online bias corrections.
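When the full stress tensor is predicted, the maximum Von Mises stress follows by simple post-processing. A minimal NumPy sketch of the standard Von Mises formula (illustrative, not the implementation used in this work):

```python
import numpy as np

def von_mises(s):
    # Von Mises equivalent stress from the full 3x3 Cauchy stress tensor:
    # sqrt(0.5[(s11-s22)^2 + (s22-s33)^2 + (s33-s11)^2]
    #      + 3(s12^2 + s23^2 + s31^2))
    return np.sqrt(0.5 * ((s[0, 0] - s[1, 1]) ** 2
                          + (s[1, 1] - s[2, 2]) ** 2
                          + (s[2, 2] - s[0, 0]) ** 2)
                   + 3.0 * (s[0, 1] ** 2 + s[1, 2] ** 2 + s[0, 2] ** 2))

# Sanity check: under uniaxial tension the Von Mises stress
# equals the applied stress.
s = np.zeros((3, 3))
s[0, 0] = 100.0
print(von_mises(s))  # 100.0
```

Taking the maximum of this quantity over the surface nodes then recovers the scalar target of the other GNN, while retaining the full tensor for downstream use.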
D.8 Patches vs. full structure
Lastly, we want to evaluate to what degree our choice to train the GNN using patches of the geometry affects the quality of the results. This choice stems from the fact that defects only locally affect the macroscale stress field, and distant areas do not offer significant information. On the one hand, training on patches has the benefit of making the GNN unaware of the specific structure, encouraging it to focus on predicting how the microscale features affect the global stress field and thus leading to improved generalisation. On the other hand, the size of the ROI and the patch must be chosen carefully so that the network has all the necessary information to make predictions in the ROI. Training the GNN on the entire structure alleviates this problem. We perform an experiment to compare the two strategies. We train two GNNs with the same parameters, namely 5 GN blocks, a residual connection, an independent encoder, 128 filters (32 Euclidean and 96 Geodesic) and a maximum of 20 neighbours per node. The first one is trained using patches from 400 FE simulations (10 patches from each simulation) and the second one is trained directly on the 400 FE simulation results. We evaluate both GNNs on a separate set of patches extracted from 100 FE simulations. The results can be found in [Fig 39]. We observe that for the GNN trained on patches the maximum Von Mises stress in the ROI of the patches is less scattered around the identity line (true value). The coefficient of determination for the patch case is 0.79 compared to 0.71 for the full case, and the accuracy drops significantly from 70% for the patch case to 55% for the full dogbone case. Thus we conclude that our choice to train the GNN with patches indeed leads to improved generalisation. We also compare the Von Mises stress distribution on the surface of a dogbone predicted by the GNN trained with patches with that of the GNN trained with the entire structure.
In [Fig 40] we see that the GNN that was trained using patches more accurately predicts the high stress area compared to the GNN that was trained with the full structure.
Appendix E Ensemble Kalman Method
E.1 Kalman method for data assimilation
The Kalman method can be used to update the $n$-dimensional state vector of a system, $u \in \mathbb{R}^n$, after observing some data, $y$. We assume that the state vector follows a Gaussian distribution, $u \sim \mathcal{N}(m, C)$, that we call the prior. This distribution has a mean
$$ m = \mathbb{E}[u] $$
and a variance
$$ C = \mathbb{E}\left[(u - m)(u - m)^T\right]. \quad (17) $$
The data, $y$, are assumed to be noisy. Specifically, we assume that the data follow a Gaussian distribution, centred around $Hu$, which is the value of the data if no noise is present, and with a variance $\Gamma$ that accounts for the noise. The matrix $H$ is called the observation matrix. Consequently, the data likelihood can be expressed as
$$ p(y \mid u) \propto \exp\left(-\tfrac{1}{2}(y - Hu)^T \Gamma^{-1} (y - Hu)\right). \quad (18) $$
Using Bayes' theorem, we can obtain the posterior distribution, $u \mid y \sim \mathcal{N}(m', C')$:
$$ p(u \mid y) \propto p(y \mid u)\, p(u), \quad (19) $$
where the posterior mean is calculated as
$$ m' = m + K(y - Hm) \quad (20) $$
and the posterior variance is calculated as
$$ C' = (I - KH)\,C. \quad (21) $$
The term $K$ is called the Kalman gain matrix and is calculated as
$$ K = CH^T\left(HCH^T + \Gamma\right)^{-1}. \quad (22) $$
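The update in Eqs. (20)-(22) can be sketched in a few lines of NumPy (an illustrative implementation, not the one used in this work):

```python
import numpy as np

def kalman_update(m, C, H, Gamma, y):
    # Kalman gain (Eq. 22), posterior mean (Eq. 20)
    # and posterior covariance (Eq. 21).
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + Gamma)
    m_post = m + K @ (y - H @ m)
    C_post = (np.eye(len(m)) - K @ H) @ C
    return m_post, C_post

# Scalar example: prior N(0, 1), identity observation with unit noise,
# observed y = 2; the exact posterior is N(1, 0.5).
m, C = np.zeros(1), np.eye(1)
H, Gamma = np.eye(1), np.eye(1)
m_post, C_post = kalman_update(m, C, H, Gamma, np.array([2.0]))
print(m_post, C_post)  # [1.] [[0.5]]
```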
E.2 Ensemble Kalman method for data assimilation
The ensemble Kalman method is an approximate version of the Kalman method, where the state distribution is represented by a sample (or an ensemble) of the distribution of size $N$. This leads to the creation of a prior ensemble matrix, $U$, whose columns correspond to the ensemble samples:
$$ U = \left[u^{(1)}, \ldots, u^{(N)}\right]. \quad (23) $$
The state covariance, $C$, is replaced with the ensemble covariance, $\bar{C}$:
$$ \bar{C} = \frac{1}{N-1} \sum_{j=1}^{N} \left(u^{(j)} - \bar{u}\right)\left(u^{(j)} - \bar{u}\right)^T, \qquad \bar{u} = \frac{1}{N} \sum_{j=1}^{N} u^{(j)}. \quad (24) $$
A data matrix, $Y$, is created such that
$$ Y = \left[y^{(1)}, \ldots, y^{(N)}\right], \quad (25a) $$
$$ y^{(j)} = y + \eta^{(j)}, \quad (25b) $$
$$ \eta^{(j)} \sim \mathcal{N}(0, \Gamma). \quad (25c) $$
The new Kalman gain matrix is calculated as
$$ \bar{K} = \bar{C}H^T\left(H\bar{C}H^T + \Gamma\right)^{-1}. \quad (26) $$
The posterior ensemble, $U'$, is calculated columnwise as
$$ u'^{(j)} = u^{(j)} + \bar{K}\left(y^{(j)} - Hu^{(j)}\right), \qquad j = 1, \ldots, N. \quad (27) $$
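A minimal scalar sketch of Eqs. (23)-(27), taking the observation matrix as the identity for simplicity (an assumption made here only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                               # ensemble size

# Prior ensemble sampled from N(0, 1) (Eq. 23).
X = rng.normal(0.0, 1.0, size=(1, N))

# Perturbed data matrix (Eqs. 25a-25c): each column is the observation
# plus an independent draw of the observation noise.
y, Gamma = 2.0, 1.0
Y = y + rng.normal(0.0, np.sqrt(Gamma), size=(1, N))

# Ensemble covariance (Eq. 24) and Kalman gain (Eq. 26), with H = I.
C = np.cov(X)
K = C / (C + Gamma)

# Posterior ensemble (Eq. 27).
X_post = X + K * (Y - X)
print(X_post.mean())  # close to the exact posterior mean, 1.0
```

For this linear-Gaussian setup the ensemble statistics converge to the exact Kalman posterior (mean 1, variance 0.5) as N grows.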
E.3 Observation-matrix-free implementation
We can further simplify the calculations by avoiding an explicit definition of the observation matrix. Instead, we can define a function that provides the noise-free value of the data. This function is called the observation function and is of the form
$$ y = \mathcal{G}(u). \quad (28) $$
The posterior can be written as
$$ u'^{(j)} = u^{(j)} + C^{u\mathcal{G}}\left(C^{\mathcal{G}\mathcal{G}} + \Gamma\right)^{-1}\left(y^{(j)} - \mathcal{G}(u^{(j)})\right), \quad (29a) $$
where
$$ C^{u\mathcal{G}} = \frac{1}{N-1} \sum_{j=1}^{N} \left(u^{(j)} - \bar{u}\right)\left(\mathcal{G}(u^{(j)}) - \bar{\mathcal{G}}\right)^T, \quad (29b) $$
$$ C^{\mathcal{G}\mathcal{G}} = \frac{1}{N-1} \sum_{j=1}^{N} \left(\mathcal{G}(u^{(j)}) - \bar{\mathcal{G}}\right)\left(\mathcal{G}(u^{(j)}) - \bar{\mathcal{G}}\right)^T, \quad (29c) $$
$$ \bar{u} = \frac{1}{N} \sum_{j=1}^{N} u^{(j)}, \quad (29d) $$
$$ \bar{\mathcal{G}} = \frac{1}{N} \sum_{j=1}^{N} \mathcal{G}(u^{(j)}). \quad (29e) $$
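A scalar sketch of the observation-matrix-free update of Eqs. (28)-(29e), using a hypothetical nonlinear observation function chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000

def G(x):
    # Hypothetical nonlinear observation function (Eq. 28); a simple
    # cubic-plus-linear map, not one used in this work.
    return x + 0.1 * x ** 3

# Prior ensemble and perturbed observations.
X = rng.normal(0.0, 1.0, size=N)
y, Gamma = 2.0, 1.0
Y = y + rng.normal(0.0, np.sqrt(Gamma), size=N)

# Sample covariances built from G(x) instead of H x (Eqs. 29b-29e).
Gx = G(X)
C_xG = np.sum((X - X.mean()) * (Gx - Gx.mean())) / (N - 1)
C_GG = np.sum((Gx - Gx.mean()) ** 2) / (N - 1)

# Update each ensemble member (Eq. 29a) without ever forming
# an observation matrix.
X_post = X + C_xG / (C_GG + Gamma) * (Y - Gx)
print(X_post.mean())
```

The posterior ensemble mean moves from the prior mean (about 0) towards the observation, with a gain determined entirely by the sample covariances of the mapped ensemble.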