1 Introduction
Computer Aided Engineering (CAE) tools are integral parts of industrial product development today. Among these, Computational Fluid Dynamics (CFD), addressing the simulation and prediction of fluid and air flows as well as their effects, is the fastest growing domain. One of the major limitations of CFD is its demand in terms of computational resources. The accuracy of the prediction depends strongly on the specific mesh, e.g. it requires a very fine mesh in regions where boundary-layer separation sets in or turbulent structures develop. Identifying an optimal mesh is a highly complex task and particularly depends on the specific flow situation. Years of experience are typically required to be able to generate efficient computational meshes.
Several strategies have been proposed in the past to determine optimal computational meshes [23]. Among these, refinements based on goal-oriented error estimators [9] have been the most successful ones. Given a simulation goal, e.g. drag calculation, they determine which regions are likely to contribute the biggest error and refine the mesh in these regions correspondingly. Performing this task iteratively, Pareto-optimal meshes in terms of accuracy and degrees of freedom can be found. However, this calculation itself is highly computationally demanding and only available in a limited number of computational tools.
In the past years, a number of machine learning technologies have been proposed to accelerate the process of flow simulations, either by learning from corresponding simulations and predicting the results by means of neural networks [16], or by combining both approaches and thus accelerating simulations [20]. However, in the latter case, the identification of optimal computational meshes is not resolved. Furthermore, a major concern with these approaches in the context of industrial use is that any machine-learning-based approach will be hard to validate.
Within this contribution, we adopt a different approach. Instead of accelerating the flow simulation itself, we focus on a machine-learning-based solution to accelerate optimal grid generation. Following the approach of [19], we determine optimal meshes for a large number of specific situations using a commercial solver (Simcenter STAR-CCM+ [24]). Based on a large set of examples / data points, we train a convolutional neural network to predict optimal mesh distributions. The predicted optimal mesh densities can be used as an input for any CFD meshing / simulation tool. Here, we focus on the task of identifying an optimal mesh for a two-dimensional channel / wind-tunnel-like flow with arbitrary geometries inside. A generalization to other examples is straightforward.
This kind of machine-learning-based mesh refinement is a field with a high likelihood of industrial adoption. First, compared to many previous works combining machine learning and simulations, the approach is unproblematic with respect to validation and verification. Second, while non-optimal meshes can lead to bad convergence behavior, they still result in correct predictions. Third and finally, only a limited set of industrial simulation tools provides solutions for the adjoint problem, while many adaptive mesh refinement strategies rely on them. The predicted mesh, trained using adjoint solutions, is computationally inexpensive to create and can be used directly or provide an initial starting guess for further refinement.
In the following sections we present our new approach for machine-learning-based optimal mesh generation. In section 2 we discuss the current state of the art, giving a recap of adaptive mesh refinement as well as a brief history of how machine learning is used in CFD. In section 3 we present how we create the data which is later used for training the convolutional neural network introduced in section 4. Afterwards, we show in section 5 results for the training as well as predictions for two sample geometries, one from the test set and one randomly generated geometry which has not been seen during training or validation. We close this paper with a discussion in section 6 on how this approach can be used in CFD.
2 State of the Art
Current CFD methodologies allow for very accurate results to be produced at the price of high computational cost. This can lead to simulation times lasting days or even weeks for large and complex geometries. As a result, design optimization studies are often too costly and slow to be feasible.
Simulation time can be reduced by accepting a higher error or coarser resolutions, but this can lead to non-physical behaviour in the simulation. The goal of introducing adaptive mesh refinement, and a corresponding machine learning version of it, is to achieve a speedup of the overall process without sacrificing accuracy or resolution. Therefore, in the next sections we give a review of the adaptive mesh literature and of approaches for machine learning in CFD.
2.1 Adaptive Grid Refinement
Since the first days of continuum mechanics simulations, adaptive mesh refinement has been a focus of research. The computation time of corresponding simulations scales at least linearly with the size of the mesh, i.e. the number of vertices or degrees of freedom, and in most cases even quadratically. Since in many situations the physical quantity of interest is heavily influenced by local phenomena, an effective non-homogeneous adaptive grid with appropriate local refinement, i.e. adaptivity in mesh size, is key for economic simulations. Choosing an appropriate computational mesh very often depends on the experience of the simulation expert using the corresponding tools, though over the years quite a number of heuristic approaches have been developed [23].
One approach to refine the mesh is to use classical error estimation methods, e.g. estimating errors in global (energy) norms exploiting variational formulations of the underlying problems [2]. However, these kinds of global bounds are often agnostic of the actual physical quantities of interest, e.g. the drag of an object or a physical value at a specific location. That is, bounds on the errors of the quantities of interest cannot be related directly to the global estimates. This makes it difficult to obtain indications of where and how to refine the computational mesh.
Dual Weighted Residual (DWR) error estimators take a different approach [9]: they estimate local residuals of the numerical solution and estimate their effect on the physical quantities of interest. The latter is achieved by solving an adjoint problem identifying the sensitivities of the quantities of interest with respect to the local error, i.e. they allow one to estimate and localize the error. Thereby, the functional of interest can be not only a norm but also highly specific values such as integral or point quantities. Realizing a feedback loop, the computational grid can be iteratively refined to achieve an optimal mesh. This approach not only allows for efficient a posteriori error control of the physical quantities of interest but at the same time also results in highly economical meshes with optimal efficiency for the functional of interest.
The DWR method goes back to Becker and Rannacher [8], is based on the pioneering work of Babuška and Rheinboldt [4, 3], and was later refined by many others such as Eriksson, Estep, Hansbo and Johnson [12]. For a complete overview we refer e.g. to the following surveys [27, 1]. Though originally introduced for finite element methods, the DWR method is versatile and can be generalized to other discretizations as well, e.g. finite volume methods [6]. Corresponding functionality is available in academic as well as industrial simulation tools, e.g. deal.II [5], FEniCS [21], Simcenter STAR-CCM+ [24], or Ingrid Cloud [17], and has been applied to a large number of different applications, including fluid dynamics [7], structural dynamics, as well as complex multiphysics problems like chemically reactive flows [10] or fluid-structure interactions [15].
The usage of these methods, however, is rather limited in industrial practice today despite the opportunity of allowing for optimal meshes, i.e. highly efficient computations, and reliable error estimates. They are inherently based on the solution of adjoint problems, which requires dedicated functionality in the corresponding simulation tools. In particular, only a limited set of industrial simulation tools provides such a capability. To circumvent this, we propose to learn optimal meshes trained on solvers with adjoint functionality; the learned meshes can then be used as an initial starting point in other simulation tools, independent of the specific numerical method. In the next section, we review how machine learning techniques are currently applied in CFD.
2.2 Machine Learning in CFD
Many methodologies exist for CFD. Similarly, machine learning techniques can be incorporated in a multitude of ways. The most straightforward method is to replace the CFD simulation step altogether: training a network using the geometry and flow parameters in order for it to output the resulting flow.
The results of these replacement networks are highly dependent on the amount and quality of the data used for training as well as the flow setup they are trained for. For example, Guo et al. [16] were quite successful in deploying this technique for steady-state laminar flows. They report a speedup of at least two orders of magnitude while keeping the overall error increase moderate for most scenarios. However, this result does not hold for all flow setups.
Employing machine learning techniques for turbulent flows, however, has proved challenging; e.g., the straightforward idea of deploying one network to replace an entire turbulent flow solver was not successful [25, 11]. There are, nevertheless, approaches that yield approximate results with higher throughput by replacing the most expensive solver step, the pressure step, with a neural network [26]. Another example is Ling et al. [20], who used a modified RANS solver that includes a network predicting the Reynolds stress anisotropy tensor, with encouraging results. Furthermore, Kutz [19] suggests that especially deep neural networks can yield high-accuracy and high-throughput results as they are better suited for the high-dimensional input data and could potentially capture very complex flow phenomena in the deeper layers of the network. These are only a few examples in a rapidly growing field. For a more extensive summary we refer to the recent review [11].
3 Data Generation Pipeline
One of the most important requirements for proper training is a representative set of training data. In this work, we employ supervised learning. Hence, we need to generate input-output pairs with ground truth results. This is accomplished via a data generation pipeline consisting of several key parts: First, we generate random geometries, as described in subsection 3.1. Then, an iterative refinement process is employed using repeated primal (subsection 3.2) and adjoint (subsection 3.3) solves. The results of the adjoint solves are used to refine the mesh at the end of each iteration, using the process described in subsection 3.4. The data generated by the pipeline can then be processed for training.
3.1 Random Geometry Generation
The aim of geometry generation is to generate a randomized set of geometries that is representative of the full space of possible CFD simulations within the selected constraints. The corresponding simulations will be the basis for generating datasets to be used for the machine learning part. We divide the general geometry creation process into several phases, each with a randomized component. These phases are illustrated as diamonds in the flowchart in figure 1. Additionally, we show an example of the occurring transformations in the right-hand column, starting from two primitives: a triangle and a square. Note that in this work, we restrict the geometry to a square region within the larger flow domain.
The first phase in geometry creation is the primitives phase. In this phase, the number of primitives is randomly determined. We use five basic primitives: a triangle, square, pentagon, hexagon, and duodecagon. This extends the space of possible created geometries since relying on a single geometry cannot produce representative results. Currently, the pipeline only supports the use of one or two primitives.
The second phase is primitive type selection. Our approach mirrors the methodology applied in [20], where several primitives were used for random geometry creation when training a neural network on a similar problem. Unlike [20], we omit the circle and ellipse; instead, we smooth some of the sharp corners of the primitives by replacing the lines between vertices with splines. We employ a uniform distribution, i.e. there is an equal chance that any one primitive type is selected.
The third phase is the transform phase. In this phase, several transformations are performed on the base primitives to create a randomized shape. We begin with a uniform, unit primitive, such as a unit square. Then, two extensions, one rotation, and a displacement are performed. An extension has two random variables: the axis of extension and the extension factor. The axis of extension is an axis that passes through the centroid of the base primitive at an angle between 0° and 180° from the horizontal plane. Each vertex is then extended by the extension factor in the direction normal to the axis of extension. While the axis of extension has a uniform distribution between 0° and 180°, the extension factor is normally distributed between 1 and 5, with a mean of 3 and a sigma of 1.5.
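The extension step described above can be sketched in a few lines of numpy; this is an illustrative reading of the transform phase, assuming the extension scales each vertex's component normal to the chosen axis (function and variable names are ours, not the pipeline's):

```python
import numpy as np

def extend(vertices, angle_deg, factor):
    """Scale the component of each vertex normal to an axis through the
    centroid at `angle_deg` from the horizontal by `factor`."""
    c = vertices.mean(axis=0)                          # centroid of the primitive
    theta = np.deg2rad(angle_deg)
    axis = np.array([np.cos(theta), np.sin(theta)])    # axis of extension
    normal = np.array([-np.sin(theta), np.cos(theta)]) # direction of extension
    rel = vertices - c
    along = rel @ axis                                 # component along the axis
    across = rel @ normal                              # component normal to the axis
    return c + np.outer(along, axis) + factor * np.outer(across, normal)

rng = np.random.default_rng(0)
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # unit square
angle = rng.uniform(0.0, 180.0)                        # uniform axis angle
factor = np.clip(rng.normal(3.0, 1.5), 1.0, 5.0)       # truncated normal, mean 3, sigma 1.5
stretched = extend(square, angle, factor)
```

With a horizontal axis (0°) and factor 2, only the vertical extent of the square doubles about its centroid.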
In the fourth phase, the positioning phase, the geometry is randomly rotated and then positioned within the specified bounds. The first primitive is positioned in the middle of the allowed bounds to avoid any shift of the flow towards boundaries. The second primitive (if one is used) is positioned randomly until both overlap. Thereby a uniform distribution is used to determine the random position of the second primitive.
The fifth phase is the spline creation phase. We use splines in place of straight lines between vertices to reduce the number of sharp edges. We implemented this by placing midpoints randomly between vertices and drawing splines through the new set of points. There are two random aspects to this operation: the number of midpoints and their random distribution between the two vertices. Here, we use 0-5 midpoints with a uniform distribution.
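The midpoint placement can be sketched as follows; a minimal numpy illustration assuming the midpoints are drawn at uniform positions along the segment (the spline fit through the resulting points is omitted, and the names are ours, not the pipeline's):

```python
import numpy as np

def insert_midpoints(p0, p1, rng):
    """Place 0-5 randomly positioned midpoints on the segment between
    two vertices, returning the enlarged point list."""
    k = rng.integers(0, 6)                        # number of midpoints: 0-5
    t = np.sort(rng.uniform(0.0, 1.0, size=k))    # uniform positions along segment
    mids = p0 + t[:, None] * (p1 - p0)
    return np.vstack([p0, mids, p1])

rng = np.random.default_rng(2)
points = insert_midpoints(np.array([0.0, 0.0]), np.array([1.0, 0.0]), rng)
```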
The final phase in random geometry creation is to merge the transformed primitives by performing a boolean operation. This phase only takes place if two primitives are used. There are two possible boolean operations, union and intersection. A union combines the two primitives, while an intersection uses only the region where the two primitives overlap. Note that the intersection operation introduces some complexity. Firstly, there is a general tendency to form slivers. Secondly, there is no control on the minimum size of the intersected region. To prevent arbitrarily small intersecting regions, we extend all intersected regions by a random factor uniformly chosen between 3 and 5 after the intersection is performed.
3.2 CFD Simulation: Primal Solve
After having a pipeline to generate random geometries, the next step is to set up the primal CFD simulation to allow for convergence with a wide variety of random geometries. This section gives an overview of the inputs for the primal solver, the adjoint solver, and the resulting mesh refinement. We employ Simcenter STAR-CCM+ [24], a commercial CFD software developed by Siemens Digital Industry Software, for the simulations.
3.2.1 Computational Domain
Since turbulence effects extend far beyond the base geometry, we place the generated geometry in a square region positioned well away from the inlet, the outlet, and the upper and lower boundaries, as illustrated in figure 2. The resulting blockage ratio is small enough to justify the free-slip boundary conditions. The flow is fully within the turbulent regime, such that the setup serves as a starting point for similar projects involving fast-moving medium-sized objects, like moving vehicles or wind-tunnel-like applications.
The boundary conditions applied to the flow problem are depicted on the left side of figure 2.
- At the inlet boundary, the flow is defined as subsonic and incompressible, with a fixed inflow velocity perpendicular to the boundary wall.
- At the outlet boundary, the flow regime is also defined as subsonic, with a prescribed static pressure.
- The upper and lower wall boundaries are considered as far-field and defined as free-slip walls.
- The randomly generated geometry, or obstacle, is defined as a no-slip wall with the wall roughness set to smooth.
3.2.2 Initial Mesh Setup
Though the goal of this work is to learn the mesh sensitivity with respect to certain input geometries, it is still necessary to have a mesh with good quality to improve the robustness of the mesh refinement process.
A meshing example is illustrated in figure 2. We employ polygonal mesh elements as the Simcenter STAR-CCM+ solver is optimized for this element type. Additionally, two wake refinements are defined to capture the eddies behind the obstacle. Prism layers are employed at the obstacle boundary. Detailed parameters for the setup of the mesh, including prism layers and wake refinement, can be found in appendix A.1.
3.2.3 Primal Solver
To make our CFD runs as fast but also as robust as possible we set the following solver specifications. We employ a coupled implicit scheme due to its robustness. Furthermore, the Algebraic Multigrid (AMG) method is used to accelerate the convergence. To speed up the convergence further, we use a large CFL number of 200. To avoid instability in the initial iteration steps, we employ linear CFL ramping for the first 50 solver iterations, starting with a CFL number of 0.25. Additionally, we use grid sequencing to provide a better "guess" of the flow solution. The detailed parameters for grid sequencing initialization can be found in appendix A.2.
To ensure an efficient use of the computing resources, we furthermore define several stopping criteria to allow the easy automation of running CFD simulations. The goal of these criteria is to stop simulations once we reach the desired accuracy or if the simulation does not converge. There are three situations where the simulation should terminate: reaching steady state, reaching sufficient accuracy, or reaching the maximum iteration number. The three cases as well as the associated parameters are detailed in appendix A.2.
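The three termination cases can be sketched as a single predicate evaluated after each solver iteration; the thresholds and the monitored quantities below are illustrative assumptions, the actual parameters are listed in appendix A.2:

```python
def should_stop(history, tol_steady=1e-6, tol_accuracy=1e-4, max_iter=3000):
    """history: list of (iteration, residual, drag) tuples from the solver.
    Returns True on any of the three stopping criteria (assumed thresholds)."""
    it, residual, drag = history[-1]
    if it >= max_iter:
        return True                                  # maximum iteration number reached
    if residual < tol_accuracy:
        return True                                  # sufficient accuracy reached
    if len(history) >= 50:
        drags = [d for _, _, d in history[-50:]]
        if max(drags) - min(drags) < tol_steady:     # drag no longer changes: steady state
            return True
    return False
```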
3.3 CFD Simulation: Adjoint Solve
Once the primal solution converges, the adjoint solver is employed. In general, the adjoint solution is an efficient way to compute gradients of a quantity of interest with respect to the input parameters. For a brief introduction we refer to appendix B and to [14]. In our specific case, we use the resulting gradients to compute sensitivities of the quantity of interest with respect to the cell densities of the current mesh. This helps us to identify areas where mesh refinement is needed to reduce the numerical error of the computed drag force. We refer to the calculated sensitivity as the estimated adjoint error or mesh sensitivity.
First, the cell residuals of the Navier-Stokes equations are recomputed with a higher-order discretization scheme. The difference between the primal solution residual and the higher-order residual is taken as an estimate of the true numerical error. Next, the field of residual differences is multiplied with the adjoint solution of the given cost function to obtain the mesh sensitivity. Thereby, adjoint methods significantly reduce the cost of calculating parametric sensitivities by selecting a specific Lagrange multiplier; for a short explanation we refer to appendix B. The main savings of the adjoint method stem from the fact that only a single additional solve is required: assume that solving the primal problem has a complexity of C_p and solving the adjoint equations has a complexity of C_a. If C_a ≈ C_p, then, when investigating a set of x parameters, implementing the adjoint equations reduces the complexity of the sensitivity calculation from x · C_p to C_p + C_a [13].
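The saving can be illustrated on a toy linear system (not the CFD setup; all names and sizes are illustrative): for A(p)·u = b with cost J = g·u, a single adjoint solve Aᵀλ = g yields every sensitivity dJ/dp_k = −λ·(dA/dp_k)·u, whereas finite differences need one extra primal solve per parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A0 = np.eye(n) * 5 + rng.normal(size=(n, n)) * 0.1       # system matrix at p = 0
dA = [rng.normal(size=(n, n)) * 0.01 for _ in range(3)]  # dA/dp_k for 3 parameters
b = rng.normal(size=n)
g = rng.normal(size=n)                                   # cost: J = g . u

def J(p):
    A = A0 + sum(pk * dAk for pk, dAk in zip(p, dA))
    return g @ np.linalg.solve(A, b)

u = np.linalg.solve(A0, b)        # one primal solve
lam = np.linalg.solve(A0.T, g)    # one adjoint solve
grad_adjoint = np.array([-lam @ (dAk @ u) for dAk in dA])

# Finite differences: one additional primal solve per parameter.
eps = 1e-6
grad_fd = np.array([(J(eps * np.eye(3)[k]) - J(np.zeros(3))) / eps for k in range(3)])
```

Both gradients agree, but the adjoint route needed two solves in total instead of four.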
In this contribution, the drag force on the obstacle surface is chosen as the cost function. Since the magnitude of the error varies from geometry to geometry, we normalize the adjoint error by the drag force. An example of mesh sensitivity is shown in figure 3.
3.4 CFD Simulations with Adaptively Refined Meshes
After obtaining the adjoint error, the estimated error per cell is compared to a threshold value. If the error is above this value, we refine the affected cell by reducing the base size of the cells in the local region. We note that when a cell is refined, the neighbouring cells are also adapted to generate a continuous mesh based on the growth ratio. This ratio determines the maximum change in neighbouring cell sizes and was chosen to ensure smooth transitions in the domain after refinement.
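The per-cell rule can be sketched as follows; a hypothetical numpy illustration on a 1-D row of cells, with an assumed shrink factor and growth-limiting sweep (the actual remesher works on the polygonal 2-D mesh):

```python
import numpy as np

def refine_sizes(cell_size, adjoint_error, threshold, shrink=0.5):
    """Flag cells whose adjoint error exceeds `threshold` and halve their
    target size (`shrink` is an assumed refinement factor)."""
    return np.where(adjoint_error > threshold, cell_size * shrink, cell_size)

def limit_growth(sizes, growth_ratio):
    """Smooth a 1-D row of target sizes so that neighbouring cells differ
    by at most `growth_ratio`, mimicking the continuous-mesh constraint."""
    sizes = sizes.copy()
    for i in range(1, len(sizes)):              # forward sweep
        sizes[i] = min(sizes[i], sizes[i - 1] * growth_ratio)
    for i in range(len(sizes) - 2, -1, -1):     # backward sweep
        sizes[i] = min(sizes[i], sizes[i + 1] * growth_ratio)
    return sizes
```

Refining one flagged cell then propagates smaller sizes to its neighbours until the growth ratio is respected.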
After refining the mesh, we obtain a new adjoint solution by repeating the primal and adjoint solves on the newly refined mesh. For the purpose of data generation, we repeat this refinement process until either the adjoint solution reaches a steady state or the adjoint solver no longer converges. This means that both a new primal solve and a new adjoint solve are required for each refinement step. An example of a mesh after multiple refinements can be seen in figure 4.
4 Neural Network Training Pipeline
CFD simulations are usually based on non-rectilinear mesh structures with varying cell sizes across the simulation domain. The spatial irregularity of the mesh poses a difficult prediction target. As the number of cells and their sizes vary between simulations, predicting individual edge or node positions via a neural network approach is infeasible. Fortunately, the user is typically not interested in the exact position of a specific edge in the mesh, but rather in a typical cell size in a given area of the simulation domain. We therefore chose the average size of cells in a region as the prediction target, essentially corresponding to the refinement level of the mesh. This quantity can easily be described on a rectilinear grid covering the domain of interest and is not concerned with individual cells, of which there can be hundreds of thousands. The regular nature of this grid also lends itself to techniques used in image processing. We therefore chose to export the cell sizes from our CFD solver as an image, with the grayscale channel indicating the relative size of cells. When these images are blurred and downsampled as described in subsection 4.2, the resulting 128x128 images are a representation of mesh refinement and are easily fed into a convolutional neural network as prediction targets.
In the following, we detail the overall processes and architectures chosen in this work. The architecture of choice, the Staircase U-Net architecture, is introduced in subsection 4.1. The preprocessing of data for training is explored in subsection 4.2.
4.1 Neural Network Architecture
The U-Net architecture with skip-connections used in [25] is also used in this work. This is a convolutional neural network consisting of multiple layers. In each layer, input data passes through a convolution and an activation function, followed by down- or upsampling. Additional steps can be taken for improved performance, such as batch normalisation or dropout, but these will be discussed later when addressing hyperparameter tuning. A basic U-Net architecture without skip-connections is shown in figure 5 in blue. Here, the number of convolutional channels in each layer increases as the data is downsampled and decreases as it is upsampled. As the size of our input and output data is the same, we consider only symmetric U-Net architectures.
In addition to the basic U-Net structure, the authors of [25] also utilized skip-connections and successfully predicted flow-based quantities. These connections can be seen in figure 5, represented by light red lines.
Skip-connections allow information to be transferred between certain input and output layers in a neural network [22]. They are important because the magnitudes of the coefficients in the initial layers of a neural network are often much smaller compared to those in the final layers. As a result, the gradients of the loss in these initial layers are often very small, which typically results in a poorly conditioned optimization problem with slow convergence. Skip-connections alleviate this problem by equalizing the magnitudes of the gradients: they reduce the effective distance between output and input layers, thereby improving the training of much deeper networks [22].
In this work, we modify the U-Net model with skip-connections, creating a Staircase U-Net architecture. This structure adds additional layers at each depth of the network, such that additional convolutions are performed before up- or downsampling occurs. A visual representation of this architecture can be seen in figure 5.
With this architecture, we aim to preserve more information when passing data through skip-connections. This is best understood by examining figure 5. In the staircase architecture, connecting the last sub-layers at a given depth is analogous to the basic skip-connection seen in figure 5, because the combined data from these two layers is immediately upsampled. However, if we instead connect earlier sub-layers at the same depth, the resulting combined data undergoes another full convolutional layer before upsampling; the idea being that upsampling could destroy features that would otherwise prove useful for training. Henceforth, we call the former, where information is immediately upsampled, an outer skip-connection, and the latter, which allows for an additional convolution prior to upsampling, an inner skip-connection.
4.2 Data Preparation
4.2.1 Smoothing
Our input images often contain sharp edges where the values of adjacent cells differ greatly. Additionally, capturing individual cells, which in practical CFD applications can have arbitrary shapes and sizes, is difficult in machine learning. Hence, a Gaussian blur (commonly used in graphics and computer vision) is used to smooth the edges and improve training. That is, we convolve the image with a Gaussian function

G(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²)),   (1)

which depends on the parameter σ and the kernel radius r. Thus, the Gaussian blur of a pixel at position (i, j) is defined as follows:

b(i, j) = (1/w(i, j)) · Σ_{x=−r}^{r} Σ_{y=−r}^{r} G(x, y) · p(i + x, j + y),   (2)

w(i, j) = Σ_{x=−r}^{r} Σ_{y=−r}^{r} G(x, y),   (3)

where p(i, j) is the pixel value in the original image. When performing the blur, we apply Neumann boundary conditions and assume that the domain beyond the picture is of the same color as that on the boundary.
Additionally, we cannot apply this blur to geometry pixels as this would destroy geometry boundary information. Hence, we apply a selective blur that ignores data on the geometry. This is done by modifying the convolution kernel for each pixel such that it excludes geometry data, thus restricting smoothing to non-geometry pixels:

b(i, j) = (1/w(i, j)) · Σ_{x=−r}^{r} Σ_{y=−r}^{r} I(i + x, j + y) · G(x, y) · p(i + x, j + y),   (4)

w(i, j) = Σ_{x=−r}^{r} Σ_{y=−r}^{r} I(i + x, j + y) · G(x, y),   (5)

where I represents an indicator function of the geometry: it returns 0 if a pixel is inside the geometry and 1 otherwise. For the following steps, fixed values of σ and r were chosen.
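The selective blur of equations (4)-(5) can be sketched directly in numpy; this is an illustrative, unoptimized reference implementation (the production version runs on CUDA), with edge padding standing in for the Neumann boundary condition:

```python
import numpy as np

def masked_gaussian_blur(img, mask, sigma, r):
    """Gaussian blur that gives zero weight to geometry pixels (mask == 0)
    and leaves geometry pixels themselves untouched."""
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))        # Gaussian kernel G(x, y)
    padded = np.pad(img, r, mode="edge")               # Neumann-style edge extension
    m_pad = np.pad(mask.astype(float), r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            w = g * m_pad[i:i + 2*r + 1, j:j + 2*r + 1]   # mask out geometry data
            s = w.sum()
            out[i, j] = (w * padded[i:i + 2*r + 1, j:j + 2*r + 1]).sum() / s if s > 0 else img[i, j]
    return np.where(mask > 0, out, img)                # keep geometry pixels as-is
```

A uniform image stays uniform under this blur, and values inside the geometry never leak into their neighbourhood.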
4.2.2 Downsampling
We perform downsampling to reduce the resolution of the image to a size acceptable for the neural network. The transformation is also applied to the geometry image. There are several ways this step could be performed, such as Fourier or wavelet transforms; however, we use averaging. Initially, images exported from the CFD solver have a resolution of 4000x4000 with a single grayscale channel, before being cropped to 3840x3840. The images are then downsampled and blurred at the same time by applying the kernel from equation (4) with a size of 30 and a stride of 30, without padding. This leads to a processed image size of 128x128. Both input and output of the network operate on this size. Due to the large number of images and the high initial resolution, this postprocessing was implemented using CUDA, increasing throughput by 2-3 orders of magnitude.
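Ignoring the blur, the resolution reduction alone amounts to average pooling over non-overlapping 30x30 blocks; a minimal numpy sketch (3840 / 30 = 128):

```python
import numpy as np

def downsample(img, k=30):
    """Average non-overlapping k x k blocks (kernel size k, stride k, no padding)."""
    h, w = img.shape
    assert h % k == 0 and w % k == 0
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

small = downsample(np.ones((3840, 3840), dtype=np.float32))  # -> shape (128, 128)
```

In the actual pipeline the Gaussian weights from equation (4) replace the plain average inside each block, so blurring and downsampling happen in one pass.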
4.3 Training
4.3.1 Hyperparameters
After identifying the best performing networks (cf. subsection 4.1), a systematic hyperparameter sweep is performed. The models in the preliminary sweep used the parameters listed in appendix C. We note from these initial results that networks with larger kernels and a larger number of convolutional channels performed better. This is unsurprising, as larger kernels can capture more complex or larger features. Other than network and kernel size, the following parameters seem to be the most promising:

- Beta parameters for the Adam optimizer
- Scalar or tensor skip-connections
- Skip-connection constraints
- Skip-connection location
These are explained in greater detail in the following sections.
Beta Parameters
The Adam optimizer adds an effect akin to "momentum" when updating the weights of the neural network; more specifically, it includes the unbiased first and second moment estimates of previous weight updates. This creates some form of "inertia": if many past updates tend to move in a specific direction, the current and future updates will be biased to also move in a similar direction. The Adam optimizer is seen as combining benefits of both the AdaGrad and RMSProp extensions of stochastic gradient descent: it is able to deal with sparse data, due to the bias correction when calculating the moving averages, and with noisy, non-stationary data, respectively [18].
The two parameters tuned in this phase, β1 and β2, control the decay rates of these moving averages. We find that the performance of the networks is highly sensitive to changes in both β1 and β2, and that minor changes to either can result in wildly varying performance when the locations of skip-connections are modified. Therefore, we are unable to determine very general rules for good values. However, one general trend is that, with skip-connections at depths 1-3, lower values produce good results.
Scalar and Tensor SkipConnections
In this contribution, skip-connections are implemented in two different ways: scalar (or simple) skip-connections and tensor skip-connections. In reference to figure 5, the connected layers x and y are both tensors of shape (n, n, c), where n is the side length of the downsampled input and c is the number of convolutional channels used in that particular layer. The scalar skip-connection combines the data using a single scalar α as follows:

z = α · x + (1 − α) · y.

The tensor skip-connection allows greater degrees of freedom, with a separate scalar value α_c for each channel c, and the data is combined as follows:

z_c = α_c · x_c + (1 − α_c) · y_c.
We experiment with both tensor and scalar skip-connections, but there is no clear conclusion on which skip-connection type produces better results.
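Both variants can be sketched as a convex combination of the skipped and the upsampled activations (an assumed form, consistent with the weights being constrained to [0, 1]); the tensor version simply broadcasts one weight per channel:

```python
import numpy as np

def scalar_skip(x, y, alpha):
    """Scalar skip-connection: one weight alpha blends skipped activation x
    with upsampled activation y (both of shape (n, n, c))."""
    return alpha * x + (1.0 - alpha) * y

def tensor_skip(x, y, alpha):
    """Tensor skip-connection: alpha has shape (c,), one weight per channel,
    broadcast over the spatial dimensions."""
    return alpha * x + (1.0 - alpha) * y
```

With alpha = [1, 0], the tensor variant passes channel 0 straight from the skip path and channel 1 straight from the upsampled path.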
Constrained SkipConnections
Recalling that the values in scalar or tensor skip-connections are constrained between 0 and 1, we can implement this constraint in multiple ways. The simplest method is to clip the values of the skip-connections if they exceed 1 or drop below 0; we refer to these skip-connections as constrained. Another method of constraining the values is through the use of a sigmoidal function, which maps from the space of real values to the closed interval [0, 1]. We refer to these skip-connections as unconstrained. Here, we found that architectures with unconstrained skip-connections tended to display better results.
Skip-Connection Location
One more variable we experiment with is the location of skip-connections, i.e. the depth at which they are placed. A higher skip-connection (at depth 1 or 2) passes larger chunks of information, as it connects the earlier layers to the last few layers. Skip-connections at lower depths pass smaller amounts of data and bypass a smaller number of convolutional layers.
Our experimentation indicats that placing skip connects at a higher depth typically improves performance. Interestingly, not all skip connections see use during training. There are cases where only two out of three skip connections pass any data, i.e., the parameter is close to 0. The best performing architecture has skip connections at depths 2, 3, 4, and 5.
4.3.2 Loss Function
A custom loss function is used in the training of our neural network. To ensure that the network does not learn inside the geometries, we mask the loss function with the geometry. For example, we compute the L2 loss as follows:

(6)  L₂ = (1/N) ∑_i m_i (ŷ_i − y_i)²

where m is an indicator function that is zero inside the geometry and prism layer, ŷ is the prediction, and y the ground truth. As we cannot iteratively refine the prism layers using the Simcenter STAR-CCM+ software, we also mask the prism layers, as seen in figure 6, to prevent the model from learning the extremely fine mesh density along the prism layers.
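A plain-Python stand-in for the masked loss of equation (6); normalizing by the number of unmasked cells (rather than all cells) is our assumption here, and all names are illustrative:

```python
# Sketch of a masked L2 loss: cells inside the geometry or prism layer
# (mask = 0) are excluded so the network is not trained on those regions.

def masked_l2(pred, target, mask):
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = max(sum(mask), 1)      # avoid division by zero on an empty mask
    return num / den

pred   = [1.0, 2.0, 5.0]
target = [1.0, 1.0, 0.0]
mask   = [1, 1, 0]               # third cell lies inside the geometry
masked_l2(pred, target, mask)    # the masked cell's error is ignored
```

Without the mask, the large error on the third cell would dominate the loss even though that region is never meshed.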
5 Results
5.1 Big Data
Training our neural networks largely relied on the integrity and quality of the data produced in our simulation pipeline. To achieve the amounts of data needed, the process of data acquisition, preprocessing and training had to be mostly automated. Figure 7 shows the overall flow of raw data produced on LRZ compute clusters (https://www.lrz.de/english/), moving through different stages of preprocessing to finally being used as training input for the network architectures.
The pipeline design has 4 stages divided by different compute needs. Raw simulation data is produced with heavy use of parallelism on CPU clusters, with individual runs taking between 30 and 60 minutes on 10 cores (Figure 7, red). Exported data is uploaded to the central element of the pipeline, our locally hosted MongoDB instance, which handles all data operations from all stages. Generalized preprocessing (Figure 7, blue), including assessment of mesh quality, is done on local CPU compute units, selecting which refinement iterations to use further down the pipeline. Image preprocessing (Figure 7, green) is handled by local GPU compute units using CUDA-accelerated code for the selected iterations and also stored back in the database. Finally, all data necessary for training is combined in a single file. This file is then distributed to machines equipped with suitable GPUs training different neural network setups, which in return upload the results in the form of the final loss and accuracy of the tested architecture (Figure 7, orange).

In total, 65,000 simulations were performed on different clusters. At peak production rate, 50 concurrent simulations saturated the IvyMUC cluster at LRZ (https://doku.lrz.de/display/PUBLIC/IvyMUC), containing 496 Ivy Bridge CPUs. With an average of 4.3 refinement iterations per simulation, 280,000 different meshes were created and their quality assessed. For each mesh, multiple quantities were exported for potential future use, which led to a database size of 1.2 TB containing 3.6 million exported fields. Full simulation files allowing for simulation restarts were not permanently saved, as they amounted to another 56 TB. A full dataset containing processed geometries, masks, SDF fields and mesh targets at 128x128 resolution amounts to 3.3 GB in compressed state.
5.2 Neural Network Performance
For final training, a reduced set containing simulations that had been refined at least five times was used. This set holds the binary geometry input to the network, a mask removing geometry and prism layer from the loss function as described in subsubsection 4.3.2, and the target mesh distribution. All three were used at 128x128 resolution. The reduced set amounts to 11,059 samples, split 90% / 10% for training and validation. Training was run with batch size 128 and an initial learning rate decayed exponentially by a factor of 0.89 every 650 steps. Figure 8 shows the training and validation loss for the final network training run. Both show a continuous decrease until reaching a plateau after 25,000 training steps, which is equivalent to 300 epochs.
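The stated schedule can be sketched as follows; the initial rate lr0 is a placeholder, as its value is not reproduced here:

```python
# Sketch of the training schedule: exponential decay by a factor of 0.89
# every 650 steps. lr0 is an assumed placeholder, not the paper's value.

def learning_rate(step, lr0, decay=0.89, every=650):
    return lr0 * decay ** (step / every)

lr0 = 1e-3                 # illustrative initial rate
learning_rate(650, lr0)    # after one decay period: lr0 * 0.89
learning_rate(25000, lr0)  # rate at the observed 25,000-step plateau
```

With these constants, the rate drops to roughly 1% of its initial value by the plateau, which is consistent with training having effectively converged there.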
The best performing neural network is a Staircase U-Net with tensor skip connections. This network has a depth of 8 and 2 convolutional layers at each depth, with inner tensor skip connections applied at depths 2, 3, 4, and 5. It was trained with the Adam optimizer, using the tuned β values and the exponentially decaying learning rate, and obtained the highest validation accuracy of all tested architectures. In Figure 9 we illustrate four example geometries with their given reference, the resulting prediction for the velocity magnitude and the corresponding relative error.
5.3 Sample Prediction Validation
Section 5.2 shows the ability of the network to learn refinement maps around a random geometry at a 128x128 scale. For future industrial use, the final step would be to automate a meshing process using these refinement maps to produce a mesh directly, e.g. in Simcenter STAR-CCM+. Here, we present an example of how this process might work, as well as the quantitative performance of the resulting mesh.

The neural network output predicts an average mesh cell size per pixel. Instead of creating the theoretically available 16384 volume controls (one per pixel), we choose to extract 45 isosurfaces from the mesh prediction and use those for volume control zones in Simcenter STAR-CCM+, as seen in figure 10.
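A sketch of how such iso-levels could be extracted from the per-pixel cell-size prediction; the band count and thresholds below are illustrative assumptions, not the values used with Simcenter STAR-CCM+:

```python
# Sketch: bin a per-pixel cell-size field into a small number of iso-level
# bands, each of which could serve as one volume-control zone.

def iso_bands(field, levels):
    """Assign each pixel the index of the first level its value falls below."""
    def band(v):
        for i, lvl in enumerate(levels):
            if v <= lvl:
                return i
        return len(levels)
    return [[band(v) for v in row] for row in field]

field = [[0.1, 0.4],
         [0.7, 0.9]]              # predicted average cell size per pixel
iso_bands(field, [0.25, 0.5, 0.75])
```

Grouping pixels into a few dozen bands keeps the number of volume controls manageable while preserving the smooth size gradients the network predicts.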
The network is clearly capturing the general wake behind the geometry as well as the shear layers, which leads to mesh refinement in those critical areas. Given the limitations in capturing the prism layers as mentioned in subsubsection 4.3.2, and the limitations imposed by working with downsampled 128x128 data, the network is not predicting the prism layer and smallest cells directly attached to the geometry. This is clearly visible when comparing the adjoint error estimate between the mesh produced using the network prediction and a mesh produced by the iterative refinement discussed in subsection 3.4.

Figures 11 and 12 compare the adjoint error estimate on drag force per cell between a mesh iteratively refined by our macro pipeline used for training data production on the left, and the mesh generated from the neural network prediction on the right. Figure 11b shows the lack of refinement close to the geometry stemming from the masking of prism layers from our dataset. Compared to the iterative mesh in figure 11a, the error especially in front of the geometry is reduced and the wake and shear layers captured. The large error outside the viewport is a result of the default growth rate behaviour of the mesher, as no further control volumes were set outside the neural network viewport. Table 10(c) shows similar values for overall drag, as well as overall error in the viewport and total domain with the maximum cell error actually lower for the neural network mesh.

Figure 12 shows a similar behaviour for a different geometry, which has not been seen during training or validation. Again the error in front of the object is reduced and the shear layers better resolved. The adjoint error estimate in table 11(c) in this case suggests a significantly reduced error in the viewport and overall domain for the neural network mesh. A common issue between the two meshes is the large error in cells around sharp bends around the object. This most likely is a result of the iterative refinement strategy not affecting prism layer settings, so both the iterative mesh and the network prediction based on training data with the same strategy share this weak point.
Overall, the meshes generated using the output of the neural network converge well, with error estimates comparable to or better than those of the iterative approach for the two samples investigated. Generally, the network outputs mesh refinement maps with very smooth gradients in cell size, while the iterative approach used to generate the training data often results in meshes with very localized refinement zones. This noisiness in the training data is a major contributor to the accuracy limit observed during network training. That the network does not try to reproduce this noise, but instead favours larger structures and smooth gradients across the field, bodes well for the usability of the predicted meshes.
6 Conclusions and Outlook
Within this contribution, we presented a novel machine-learning-based approach for optimal mesh generation in computational fluid dynamics (CFD) applications. Our basis was the commercial CFD simulator Simcenter STAR-CCM+ [24], which we expanded by an iterative mesh refinement loop to generate meshes optimized with respect to the overall error. This computational pipeline was based on an appropriate error estimator leveraging adjoint sensitivities (corresponding functionality is meanwhile available in the standard Simcenter STAR-CCM+ solution). To achieve optimal results in a machine learning approach, we produced over 60,000 simulations and 6 TB of data, and used about 50 years of serial CPU compute time.
Based on a subset of this dataset, we tested and fine-tuned different neural network architectures relying on three different base architectures, all trained on the same data. Automated testing of different variants was performed on multiple GPUs in what amounts to roughly 2 months of continuous single-GPU training time. Several hundred variants were trained and tested, and multiple very accurate architectures were found. It was shown that the modified U-Net architecture, both with and without skip connections, is able to predict refined mesh densities for a random geometry with high accuracy. Figure 9 illustrates the predictions obtained by the best performing network.
Producing a neural network of such quality required considerable effort with regard to data creation, processing and training, made possible only by the full automation of the entire process from CFD to neural network. The trained neural network that came out of this pipeline could give inexperienced CFD users a reasonable first guess for a mesh refinement given their geometry. This could reduce the need for tedious hand-tuning of the mesh, as it should predict a mesh that is able to produce a convergent result. This use of neural networks in the general CFD pipeline is one of the most risk-averse ways of using machine learning in this context. Other research focuses on replacing parts of the actual CFD solver with neural networks to speed up the simulation. However, this can lead to inaccurate results when dealing with edge cases, as the networks themselves have no knowledge of the underlying physics. Our approach can at worst produce a bad mesh, which is easily spotted by a lack of convergence of the simulation. In the best case, it can significantly speed up the initial phases of setting up a CFD simulation for an inexperienced engineer and produce usable results directly after the first run. A single prediction takes a matter of seconds on consumer hardware and is definitely worth the wait.
So far we have restricted ourselves to a very specific setup, i.e. a fixed flow speed and geometries within a certain size range. Future studies will address the potential to generalize the concept, e.g. predictions for geometry sizes of different orders of magnitude or for different flows. Scaling the approach from 2D to 3D would also be of high interest, though this might require a very large amount of training data. Leveraging machine learning in the overall CFD workflow of an engineer can speed up tasks that are highly dependent on personal experience, such as mesh creation. Even more importantly, machine learning in the context of mesh generation has little influence on the validation and verification of simulation methods as required in typical industrial use cases.
Acknowledgements: The authors gratefully acknowledge the contributions of the 2018/19 Bavarian Graduate School of Computational Engineering honours project team (Peer Breier, Qunsheng Huang, Moritz Krügener, Oleksander Voloshyn, Mengjie Zhao; https://www.bgce.de/curriculum/projects/honoursproject20182019machinelearningandsimulationforgridoptimization), the support of the Bavarian Graduate School of Computational Engineering (https://www.bgce.de), the Leibniz Supercomputing Centre for providing compute resources, NVIDIA for providing GPUs through their GPU grant program, as well as Siemens AG and Siemens Digital Industry Software for providing extensive support throughout the project.
References
 [1] (1993) A unified approach to a posteriori error estimation using element residual methods. Numerische Mathematik 65 (1), pp. 23–50. Cited by: §2.1.
 [2] (1987) A feedback finite element method with a posteriori error estimation: Part I. The finite element method and some basic properties of the a posteriori error estimator. Computer Methods in Applied Mechanics and Engineering 61 (1), pp. 1–40. Cited by: §2.1.
 [3] (1978) A-posteriori error estimates for the finite element method. International Journal for Numerical Methods in Engineering 12 (10), pp. 1597–1615. Cited by: §2.1.
 [4] (1978) Error estimates for adaptive finite element computations. SIAM Journal on Numerical Analysis 15 (4), pp. 736–754. Cited by: §2.1.
 [5] (2007) deal.II: a general-purpose object-oriented finite element library. ACM Transactions on Mathematical Software (TOMS) 33 (4), pp. 24–es. Cited by: §2.1.
 [6] (2005) A posteriori error estimation and mesh adaptivity for finite volume and finite element methods. In Adaptive Mesh Refinement: Theory and Applications, pp. 183–202. Cited by: §2.1.
 [7] (2002) An optimal control approach to adaptivity in computational fluid mechanics. International journal for numerical methods in fluids 40 (12), pp. 105–120. Cited by: §2.1.
 [8] (1996) Weighted a posteriori error control in FE methods. IWR. Cited by: §2.1.
 [9] (2001) An optimal control approach to a posteriori error estimation in finite element methods. Acta numerica 10, pp. 1–102. Cited by: §1, §2.1.
 [10] (2004) Adaptive computation of reactive flows with local mesh refinement and model adaptation. In Numerical mathematics and advanced applications, pp. 159–168. Cited by: §2.1.
 [11] (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52, pp. 477–508. Cited by: §2.2, §2.2.
 [12] (1995) Introduction to adaptive methods for differential equations. Acta numerica 4, pp. 105–158. Cited by: §2.1.
 [13] (2008) Adjoint methods for shape optimization. In Optimization and computational fluid dynamics, pp. 79–108. Cited by: §3.3.
 [14] (2000) An introduction to the adjoint approach to design. Flow, Turbulence and Combustion 65 (3), pp. 393–415. External Links: Document, ISBN 15731987, Link Cited by: §3.3.
 [15] (2006) Goal-oriented error estimation in the analysis of fluid flows with structural interactions. Computer Methods in Applied Mechanics and Engineering 195 (41-43), pp. 5673–5684. Cited by: §2.1.
 [16] (2016) Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 481–490. Cited by: §1, §2.2.
 [17] Ingrid Cloud. External Links: Link Cited by: §2.1.
 [18] (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §4.3.1.
 [19] (2017) Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, pp. 1–4. Cited by: §1, §2.2.
 [20] (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics 807, pp. 155–166. Cited by: §1, §2.2, §3.1.
 [21] (2012) Automated solution of differential equations by the finite element method: The FEniCS Book. Vol. 84, Springer Science & Business Media. Cited by: §2.1.
 [22] (2016) Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pp. 2802–2810. Cited by: §4.1.
 [23] (2005) Adaptive mesh refinement: theory and applications. Springer. Cited by: §1, §2.1.
 [24] Siemens Digital Industry Software. External Links: Link Cited by: §1, §2.1, §3.2, §6.
 [25] (2018) Well, how accurate is it? A study of deep learning methods for Reynolds-averaged Navier-Stokes simulations. arXiv preprint arXiv:1810.08217. Cited by: §2.2, §4.1, §4.1.
 [26] (2017) Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3424–3433. Cited by: §2.2.
 [27] (1994) A posteriori error estimation and adaptive mesh-refinement techniques. Journal of Computational and Applied Mathematics 50 (1-3), pp. 67–83. Cited by: §2.1.
Appendix
Appendix A CFD solver parameters
A.1 Mesh Setup
Table 1 states the parameters and their value for the base mesh domain while detailed setups of the prism layer and the wake refinements can be found in table 2 and table 3, respectively.
Default controls  Values  [unit] 

Base size  1  m 
Target surface size  100  Percentage of base 
Minimum surface size  10  Percentage of base 
Growth rate  1.05 
Obstacle surface control parameters  Values  [unit] 

Number of prism layers  10  
Prism layer total thickness  100  Percentage of base 
Prism layer stretching  1.3  
Minimum surface size  0.5  Percentage of base 
Wake refinement 1  Wake refinement 2  [unit]  

Isotropic size  20  40  Percentage of base 
Distance  30  70  m 
Growth rate  1.2  1.2 
A.2 Primal Solver
The parameters for grid sequencing initialization can be found in table 4.
Grid sequencing parameters  Values 

Maximum grid levels  10 
Maximum iterations per level  50 
Convergence tolerance per level  0.01 
CFL number  50 
To ensure an efficient usage of compute resources as well as easy automation, we consider the following three situations in which the simulation should terminate:

Reaching Steady State: The ideal case is when the drag force and residuals reach a steady state and the solution does not change with more iterations. During automation, we only consider the continuity residual when checking for steady state as it is representative of the other residual quantities. This case is visualized in figure 12(a) and figure 12(b) where we plot the residuals. We take a moving average of the latest 100 residual values and if this moving average remains within a given tolerance, we accept that the simulation has converged.

Reaching Sufficient Accuracy: Since the adjoint solver only requires a sufficiently converged result, the simulation can also be stopped when the residual falls below a certain value. Similar to the asymptotic stopping criterion, only the continuity residual is considered, for which we set a fixed threshold. One example of reaching this criterion can be found in figure 12(c), where the continuity residual (red line) touches the x-axis, indicating that the threshold has been reached.

Reaching Maximum Iteration Number: Since a large number of converged simulations is required to form a sufficiently large data set, it is more efficient to drop a simulation that does not converge after a long time and to restart. Therefore, a maximum iteration number is set to avoid wasting resources. It is set to 5000 since, after running several batches of random geometries, we observed that stopping criteria 1 and 2 most often took effect before 3000 iterations. Hence, we chose 5000 to include a safety factor that avoids ending viable simulations prematurely. One example of reaching the maximum iteration number can be found in figure 12(d), where the residuals do not reach the required threshold.
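The three criteria can be sketched as a single stopping check over the continuity-residual history; the tolerance and threshold values below are placeholders, not the production settings:

```python
# Sketch of the three stopping criteria: a flattened 100-value moving average
# of the continuity residual (steady state), an absolute residual threshold
# (sufficient accuracy), and a maximum iteration count.

def should_stop(residuals, window=100, tol=1e-4, threshold=1e-6, max_iter=5000):
    it = len(residuals)
    if it >= max_iter:
        return "max_iterations"
    if residuals and residuals[-1] < threshold:
        return "sufficient_accuracy"
    if it >= 2 * window:
        recent = sum(residuals[-window:]) / window
        previous = sum(residuals[-2 * window:-window]) / window
        if abs(recent - previous) < tol:   # moving average has flattened
            return "steady_state"
    return None

should_stop([1e-7])          # stops immediately: residual below threshold
should_stop([1e-3] * 5000)   # stops: iteration cap reached
```

Comparing two consecutive window averages is one simple way to implement the "moving average remains within a given tolerance" test described for criterion 1.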
Appendix B Explanation of the Adjoint Method
A simple explanation of the adjoint method is as follows. Given a solution u of the governing equations R(u, a) = 0 and a cost function J(u, a), both of which depend on an arbitrary input parameter a and the position x, we can write the variations of the two quantities with respect to changes in u and a:

(7)  δJ = (∂J/∂u) δu + (∂J/∂a) δa

(8)  δR = (∂R/∂u) δu + (∂R/∂a) δa = 0

Based on equations (7) and (8), we can formulate the adjoint equation as in equation (9), with Lagrange multiplier λ:

(9)  δJ = (∂J/∂u) δu + (∂J/∂a) δa − λ^T [(∂R/∂u) δu + (∂R/∂a) δa]

We then select the Lagrange multiplier λ such that the coefficient of δu, i.e. the first term, vanishes:

(10)  ∂J/∂u − λ^T (∂R/∂u) = 0

The resulting equation for the sensitivity of the cost function to changes in the parameter a is then

(11)  dJ/da = ∂J/∂a − λ^T (∂R/∂a)
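The identity in equation (11) can be checked numerically on a scalar model problem; everything below (the model R, the cost J, and all names) is illustrative and unrelated to the CFD equations solved in the paper:

```python
# Numerical sanity check of the adjoint sensitivity on a scalar model:
# governing equation R(u, a) = k*u - b(a) = 0, cost J(u) = c*u.
# The adjoint variable solves k*lam = c (eq. 10), and dJ/da = lam * db/da
# (eq. 11, since J has no explicit a-dependence here).

k, c = 2.0, 3.0

def b(a):            # right-hand side depending on the design parameter a
    return a * a

def solve(a):        # primal solve: u = b(a) / k
    return b(a) / k

def J(a):            # cost evaluated on the primal solution
    return c * solve(a)

a = 1.5
lam = c / k                      # adjoint solve
dJ_da_adjoint = lam * 2 * a      # adjoint sensitivity, db/da = 2a

h = 1e-6                         # central finite-difference reference
dJ_da_fd = (J(a + h) - J(a - h)) / (2 * h)
```

The adjoint value and the finite-difference value agree, illustrating why one adjoint solve yields the sensitivity without re-solving the primal problem for each parameter perturbation.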
Appendix C HyperParameter Tuning: Phase 1
No.  Input  Mask  Reference  Depth  Channels  Kernels  Loss  ReLU  Batchnorm  Skip Type  Skip Location
1  sdf  prism  normal  6  64  16x16  L2  .2  down only middle  none  none
2  sdf  prism  normal  6  256  4x4  L2  .05  middle  scalar  all
3  sdf  prism  normal  7  64  4x4  L2  .2  down only middle  none  none
4  sdf  prism  normal  7  64  2: 8x8  L2  .2  middle  none  none
5  sdf  prism  normal  7  64  3: 6x6  L2  .2  middle  scalar  1, 2
6  sdf  prism  normal  7  128  4x4  L2  .05  down only middle  tensor  1, 2
7  sdf  prism  normal  8  64  4x4  L2  .2  down only middle  none  none
8  sdf  prism  normal  8  64  8x8  L2  .2  middle  none  none
9  sdf  prism  normal  8  64  3: 6x6  L2  .2  middle  none  none
10  sdf  prism  normal  8  64  4x4  L2  .05  down only middle  scalar  1, 2
11  geo  prism  normal  6  64  16x16  L2  .2  down only middle  none  none
12  geo  prism  normal  6  256  4x4  L2  .05  middle  scalar  all
13  geo  prism  normal  7  64  4x4  L2  .2  down only middle  none  none
14  geo  prism  normal  7  64  2: 8x8  L2  .2  middle  none  none
15  geo  prism  normal  7  64  3: 6x6  L2  .2  middle  scalar  1, 2
16  geo  prism  normal  7  128  4x4  L2  .05  down only middle  tensor  1, 2
17  geo  prism  normal  8  64  4x4  L2  .2  down only middle  none  none
18  geo  prism  normal  8  64  8x8  L2  .2  middle  none  none
19  geo  prism  normal  8  64  3: 6x6  L2  .2  middle  none  none
20  geo  prism  normal  8  64  4x4  L2  .05  down only middle  scalar  1, 2
21  sdf  dillute_prism  normal  6  64  16x16  L2  .2  down only middle  none  none
22  sdf  dillute_prism  normal  6  256  4x4  L2  .05  middle  scalar  all
23  sdf  dillute_prism  normal  6  64  4x4  L2  0  none  tensor  all
24  sdf  dillute_prism  normal  6  64  4x4  L1  0  none  tensor  all
25  sdf  dillute_prism  normal  7  64  4x4  L2  .2  down only middle  none  none
26  sdf  dillute_prism  normal  7  64  2: 8x8  L2  .2  middle  none  none
27  sdf  dillute_prism  normal  7  64  3: 6x6  L2  .2  middle  scalar  1, 2
28  sdf  dillute_prism  normal  7  128  4x4  L2  .05  down only middle  tensor  1, 2
29  sdf  dillute_prism  normal  7  32  4: 6x6  L2  .2  middle  scalar  1, 2
30  sdf  dillute_prism  normal  7  64  4x4  L2  .2  middle  scalar  3, 4, 5, 6
31  sdf  dillute_prism  normal  7  64  4x4  L2  0  none  tensor  all
32  sdf  dillute_prism  normal  7  64  4x4  L1  0  none  tensor  all
33  sdf  dillute_prism  normal  8  64  4x4  L1  0  none  tensor  all
34  sdf  dillute_prism  normal  8  64  4x4  L2  .2  down only middle  none  none
35  sdf  dillute_prism  normal  8  32  4: 6x6  L2  .2  middle  scalar  1, 2
36  sdf  dillute_prism  normal  8  64  4x4  L2  .05  down only middle  none  none
37  sdf  dillute_prism  normal  8  64  8x8  L2  .2  middle  none  none
38  sdf  dillute_prism  normal  8  64  3: 6x6  L2  .2  middle  none  none
39  sdf  dillute_prism  normal  8  64  4x4  L2  .05  down only middle  scalar  1, 2
40  geo  dillute_prism  normal  6  64  16x16  L2  .2  down only middle  none  none
41  geo  dillute_prism  normal  6  256  4x4  L2  .05  middle  scalar  all
42  geo  dillute_prism  normal  6  64  4x4  L2  0  none  tensor  all
43  geo  dillute_prism  normal  6  64  4x5  L1  0  none  tensor  all
44  geo  dillute_prism  normal  7  64  4x4  L2  .2  down only middle  none  none
45  geo  dillute_prism  normal  7  64  2: 8x8  L2  .2  middle  none  none
46  geo  dillute_prism  normal  7  64  3: 6x6  L2  .2  middle  scalar  1, 2
47  geo  dillute_prism  normal  7  128  4x4  L2  .05  down only middle  tensor  1, 2
48  geo  dillute_prism  normal  7  32  4: 6x6  L2  .2  middle  scalar  1, 2
49  geo  dillute_prism  normal  7  64  4x4  L2  .2  middle  scalar  3, 4, 5, 6
50  geo  dillute_prism  normal  7  64  4x4  L2  0  none  tensor  all
51  geo  dillute_prism  normal  7  64  4x4  L1  0  none  tensor  all
52  geo  dillute_prism  normal  8  64  4x4  L1  0  none  tensor  all
53  geo  dillute_prism  normal  8  64  4x4  L2  .2  down only middle  none  none
54  geo  dillute_prism  normal  8  32  4: 6x6  L2  .2  middle  scalar  1, 2
55  geo  dillute_prism  normal  8  64  4x4  L2  .05  down only middle  none  none
56  geo  dillute_prism  normal  8  64  8x8  L2  .2  middle  none  none
57  geo  dillute_prism  normal  8  64  3: 6x6  L2  .2  middle  none  none
58  geo  dillute_prism  normal  8  64  4x4  L2  .05  down only middle  scalar  1, 2
59  sdf geo  dillute_prism  -  7  64  4x4  L2  .2  down only middle  none  none
60  sdf geo  dillute_prism  -  7  64  2: 8x8  L2  .2  down only middle  none  none
61  sdf geo  dillute_prism  -  7  128  4x4  L2  .05  down only middle  none  none
Appendix D Hyper-Parameter Tuning: Phase 2
No.  Skip Type  Beta 1  Beta 2  Constrained
1  scalar  0.9  0.99  yes
2  tensor  0.9  0.99  yes
3  scalar  0.85  0.99  yes
4  tensor  0.85  0.99  yes
5  scalar  0.95  0.99  yes
6  tensor  0.95  0.99  yes
7  scalar  0.9  0.95  yes
8  tensor  0.9  0.95  yes
9  scalar  0.85  0.95  yes
10  tensor  0.85  0.95  yes
11  scalar  0.95  0.95  yes
12  tensor  0.95  0.95  yes
13  scalar  0.9  0.999  yes
14  tensor  0.9  0.999  yes
15  scalar  0.85  0.999  yes
16  tensor  0.85  0.999  yes
17  scalar  0.95  0.999  yes
18  tensor  0.95  0.999  yes
19  scalar  0.9  0.99  no
20  tensor  0.9  0.99  no
21  scalar  0.85  0.99  no
22  tensor  0.85  0.99  no
23  scalar  0.95  0.99  no
24  tensor  0.95  0.99  no
25  scalar  0.9  0.95  no
26  tensor  0.9  0.95  no
27  scalar  0.85  0.95  no
28  tensor  0.85  0.95  no
29  scalar  0.95  0.95  no
30  tensor  0.95  0.95  no
31  scalar  0.9  0.999  no
32  tensor  0.9  0.999  no
33  scalar  0.85  0.999  no
34  tensor  0.85  0.999  no
35  scalar  0.95  0.999  no
36  tensor  0.95  0.999  no