Machine Learning-Based Optimal Mesh Generation in Computational Fluid Dynamics

02/25/2021 · by Keefe Huang et al. · Siemens AG, Technische Universität München

Computational Fluid Dynamics (CFD) is a major sub-field of engineering. Corresponding flow simulations are typically characterized by heavy computational resource requirements. Often, very fine and complex meshes are required to resolve physical effects appropriately. Since all CFD algorithms scale at least linearly with the size of the underlying mesh discretization, finding an optimal mesh is key for computational efficiency. One methodology used to find optimal meshes is goal-oriented adaptive mesh refinement. However, this is typically computationally demanding and only available in a limited number of tools. Within this contribution, we adopt a machine learning approach to identify optimal mesh densities. We generate optimized meshes using classical methodologies and propose to train a convolutional network that predicts optimal mesh densities for arbitrary geometries. The proposed concept is validated on 2D wind-tunnel simulations with more than 60,000 simulations. Using a training set of 20,000 simulations, we achieve accuracies of more than 98.7%. The corresponding predictions of optimal meshes can be used as input for any mesh generation and CFD tool. Thus, without complex computations, any CFD engineer can start their predictions from a high-quality mesh.







1 Introduction

Computer Aided Engineering (CAE) tools are integral parts of industrial product development today. Among these, Computational Fluid Dynamics (CFD), addressing the simulation and prediction of fluid and air flows as well as their effects, is the fastest growing domain. One of the major limitations of CFD is its demand in terms of computational resources. The accuracy of the prediction depends very much on the specific mesh, e.g. requiring a very fine mesh in regions where boundary layer separation sets in or turbulent structures develop. Identifying an optimal mesh is a highly complex task and particularly depends on the specific flow situation. Years of experience are typically required to be able to generate efficient computational meshes.

Several strategies have been proposed in the past to determine optimal computational meshes [23]. Among these, refinements based on goal-oriented error estimators have been the most successful. Given a simulation goal, e.g. drag calculation, they determine which regions are likely to contribute the biggest error and refine the mesh in these regions correspondingly. By performing this task iteratively, meshes that are Pareto optimal in terms of accuracy and degrees of freedom can be found. However, this calculation itself is highly computationally demanding and only available in a limited number of computational tools.

In the past years, a number of machine learning technologies have been proposed to accelerate flow simulations, either by learning from corresponding simulations and predicting the results by means of neural networks [16], or by combining both approaches and thus accelerating simulations [20]. However, in the latter case the identification of optimal computational meshes is not resolved. Furthermore, a major concern with these approaches in an industrial context is that any machine-learning-based approach will be hard to validate.

Within this contribution, we adopt a different approach. Instead of accelerating the flow simulation itself, we focus on a machine-learning-based solution to accelerate optimal grid generation. Following the approach of [19], we determine optimal meshes for a large number of specific situations using a commercial solver (Simcenter STAR-CCM+ [24]). Based on this large set of examples, or data points, we train a convolutional neural network to predict optimal mesh distributions. The predicted optimal mesh densities can be used as input for any CFD meshing/simulation tool. Here, we focus on the task of identifying an optimal mesh for a 2-dimensional channel/wind-tunnel-like flow with arbitrary geometries inside. A generalization to other examples is straightforward.

This kind of machine-learning-based mesh refinement is a field with a high likelihood of industrial adoption. First, compared to many previous works combining machine learning and simulations, the approach is not sensitive with respect to validation and verification. Second, while non-optimal meshes can lead to bad convergence behavior, they still result in correct predictions. Third and finally, only a limited set of industrial simulation tools provides solutions for the adjoint problem, while many adaptive mesh refinement strategies rely on them. The predicted mesh, trained using adjoint solutions, is computationally inexpensive to create and can be used directly or provide an initial starting guess for further refinement.

In the following sections we present our new approach for machine-learning-based optimal mesh generation. In section 2 we discuss the current state of the art, recapping adaptive mesh refinement and giving a brief history of how machine learning is used in CFD. In section 3 we present how we create the data that is later used to train the convolutional neural network introduced in section 4. Afterwards, in section 5 we show results for the training as well as predictions for two sample geometries: one from the test set and one randomly generated geometry that was not seen during training or validation. We close this paper with a discussion in section 6 on how this approach can be used in CFD.

2 State of the Art

Current CFD methodologies allow for very accurate results to be produced at the price of high computational cost. This can lead to simulation times lasting days or even weeks for large and complex geometries. As a result, design optimization studies are often too costly and slow to be feasible.

Simulation time can be reduced by accepting a higher error or a coarser resolution, but this can lead to non-physical behaviour in the simulation. The goal of introducing adaptive mesh refinement, and a corresponding machine learning version, is to speed up the overall process without sacrificing accuracy or resolution. Therefore, in the next sections we review the adaptive mesh refinement literature and approaches to machine learning in CFD.

2.1 Adaptive Grid Refinement

Since the first days of continuum mechanics simulations, adaptive mesh refinement has been a focus of research. The computation time of corresponding simulations scales at least linearly with the size of the mesh, i.e. the number of vertices or degrees of freedom, and in most cases even quadratically. Since in many situations the physical quantity of interest is heavily influenced by local phenomena, an effective non-homogeneous adaptive grid with appropriate local refinement, i.e. adaptivity in mesh size, is key for economical simulations. Choosing an appropriate computational mesh very often depends on the experience of the simulation expert using the corresponding tools, though over the years quite a number of heuristic approaches have been developed.


One approach to refining the mesh is to use classical error estimation methods, e.g. estimating errors in global (energy) norms by exploiting variational formulations of the underlying problems [2]. However, these kinds of global bounds are often agnostic of the actual physical quantities of interest, e.g. the drag of an object or a physical value at a specific location. That is, bounds on the errors of the quantities of interest cannot be related directly to the global estimates. This makes it difficult to obtain indications of where and how to refine the computational mesh.

Dual Weighted Residual (DWR) error estimators take a different approach [9]: they estimate local residuals of the numerical solutions and their effect on the physical quantities of interest. The latter is achieved by solving an adjoint problem that identifies the sensitivities of the quantities of interest to the local error, i.e. it allows the error to be estimated and localized. Thereby, the functional of interest can not only be a norm but also a highly specific value such as any integral or point quantity. By realizing a feedback loop, the computational grid can be iteratively refined to achieve an optimal mesh. This approach not only allows for efficient a posteriori error control of the physical quantities of interest but at the same time also results in highly economical meshes with optimal efficiency for the functional of interest.

The DWR method goes back to Becker and Rannacher [8], is based on the pioneering work of Babuška and Rheinboldt [4, 3], and was later refined by many others such as Eriksson, Estep, Hansbo and Johnson [12]. For a complete overview we refer e.g. to the following surveys [27, 1]. Though originally introduced for Finite Element Methods, the DWR is versatile and can be generalized to other discretizations as well, e.g. Finite Volume Methods [6]. Corresponding functionality is available in academic as well as industrial simulation tools, e.g. deal.II [5], FEniCS [21], Simcenter STAR-CCM+ [24], or Ingrid Cloud [17], and has been applied to a large number of different applications, including fluid dynamics [7], structural dynamics, as well as complex multi-physics problems like chemically reactive flows [10] or fluid–structure interactions [15].

The usage of these methods, however, is rather limited in industrial practice today despite the opportunity they offer of optimal meshes, i.e. highly efficient computations, and reliable error estimates. They are inherently based on the solution of adjoint problems, which requires dedicated functionality in the corresponding simulation tools. In particular, only a limited set of industrial simulation tools provides such a capability. To circumvent this, we propose to learn optimal meshes trained on solvers with adjoint functionality; the result can then be used as an initial starting point in other simulation tools, independent of the specific numerical method. In the next section, we review how machine learning techniques are currently applied in CFD.

2.2 Machine Learning in CFD

Many methodologies exist for CFD. Similarly, machine learning techniques can be incorporated in a multitude of ways. The most straightforward method is to replace the CFD simulation step altogether: training a network on the geometry and flow parameters so that it outputs the resulting flow.

The results of these replacement networks are highly dependent on the amount and quality of data used for training as well as the flow setup they are trained for. For example, Guo et al. [16] were quite successful in deploying this technique for steady-state laminar flows. They report a speedup of at least two orders of magnitude while keeping the overall error increase moderate for most scenarios. However, this result does not hold for all flow setups.

Employing machine learning techniques for turbulent flows, however, has proved challenging; e.g., the straightforward idea of deploying one network to replace an entire turbulent flow solver was not successful [25, 11]. There are, nevertheless, approaches that yield approximate results with higher throughput by replacing the most expensive solver step, the pressure step, with a neural network [26]. Another example is Ling et al. [20], who modified a RANS solver to include a network predicting the Reynolds stress anisotropy tensor, with encouraging results. Furthermore, Kutz [19] suggests that especially deep neural networks can yield high-accuracy, high-throughput results as they are better suited for high-dimensional input data and could potentially capture very complex flow phenomena in the deeper layers of the network.

These are only a few examples in a rapidly growing field. For a more extensive summary we refer to the recent review [11].

3 Data Generation Pipeline

One of the most important requirements for proper training is a representative set of training data. In this work, we employ supervised learning. Hence, we need to generate input-output pairs with ground truth results. This is accomplished via a data generation pipeline consisting of several key parts: First, we generate random geometries, as described in subsection 3.1. Then, an iterative refinement process is employed using repeated primal (subsection 3.2) and adjoint (subsection 3.3) solves. The results of the adjoint solves are used to refine the mesh at the end of each iteration, using the process described in subsection 3.4. The data generated by the pipeline can then be processed for training.

3.1 Random Geometry Generation

The aim of geometry generation is to generate a randomized set of geometries that is representative of the full space of possible CFD simulations within the selected constraints. Corresponding simulations will be the basis for generating data-sets to be used for the machine learning part. We divide the general geometry creation process into several phases—each with a randomized component. These phases are illustrated as diamonds in the flowchart in figure 1. Additionally, we show an example for the occurring transformations in the column on the right starting from two primitives—a triangle and a square. Note that in this work, we restrict the geometry to a square region within the larger flow domain.

Figure 1: Geometry creation pipeline

The first phase in geometry creation is the primitives phase. In this phase, the number of primitives is randomly determined. We use five basic primitives: a triangle, square, pentagon, hexagon, and dodecagon. Using several primitives extends the space of possible geometries, since relying on a single one cannot produce representative results. Currently, the pipeline only supports the use of one or two primitives.

The second phase is the primitive type selection. Our approach mirrors the methodology applied in [20], where several primitives were used for random geometry creation when training a neural network on a similar problem. Unlike [20], we omit circles and ellipses; instead, we smooth some of the sharp corners of the primitives by replacing lines between vertices with splines. We employ a uniform distribution, i.e. each primitive type is selected with equal probability.

The third phase is the transform phase. In this phase, several transformations are performed on the base primitives to create a randomized shape. We begin with a uniform, unit primitive, such as a unit square. Then, two extensions, one rotation, and one displacement are performed. An extension has two random variables: the axis of extension and the extension factor. The axis of extension passes through the centroid of the base primitive at an angle between 0° and 180° from the horizontal. Each vertex is then extended by the extension factor in the direction normal to the axis of extension. While the axis of extension is uniformly distributed between 0° and 180°, the extension factor follows a normal distribution truncated between 1 and 5, with a mean of 3 and a sigma of 1.5.

In the fourth phase, the positioning phase, the geometry is randomly rotated and then positioned within the specified bounds. The first primitive is positioned in the middle of the allowed bounds to avoid any shift of the flow towards the boundaries. The second primitive (if one is used) is positioned randomly, re-sampling until the two primitives overlap. Thereby, a uniform distribution is used to determine the random position of the second primitive.

The fifth phase is the spline creation phase. Splines are used in place of straight lines between vertices to reduce the number of sharp edges. We implement this by placing midpoints randomly between vertices and drawing splines through the new set of points. There are two random aspects to this operation: the number of midpoints and their random distribution between the two vertices. Here, we use 0-5 midpoints with a uniform distribution.

The final phase in random geometry creation is to merge the transformed primitives by performing a boolean operation. This phase only takes place if two primitives are used. There are two possible boolean operations: union and intersection. A union combines the two primitives, while an intersection uses only the region where the two primitives overlap. Note that the intersection operation introduces some complexity. Firstly, there is a general tendency to form slivers. Secondly, there is no control over the minimum size of the intersected region. To prevent arbitrarily small intersecting regions, we extend all intersected regions by a random factor uniformly chosen between 3 and 5 after the intersection is performed.
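The transform and positioning phases above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the actual pipeline code: all function names are hypothetical, and the spline smoothing and boolean union/intersection steps are omitted (they would require a geometry library such as Shapely).

```python
import numpy as np

def unit_polygon(n_vertices: int) -> np.ndarray:
    """Regular unit polygon centred at the origin (triangle, square, ...)."""
    angles = 2 * np.pi * np.arange(n_vertices) / n_vertices
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def extend(verts: np.ndarray, axis_angle: float, factor: float) -> np.ndarray:
    """Stretch all vertices by `factor` normal to an axis through the centroid."""
    c = verts.mean(axis=0)
    n = np.array([-np.sin(axis_angle), np.cos(axis_angle)])  # axis normal
    d = (verts - c) @ n                                      # signed distance to axis
    return verts + np.outer(d * (factor - 1.0), n)

def random_primitive(rng: np.random.Generator) -> np.ndarray:
    """One randomized primitive: two extensions, a rotation, a displacement."""
    verts = unit_polygon(rng.choice([3, 4, 5, 6, 12]))
    for _ in range(2):                                    # two random extensions
        axis = rng.uniform(0.0, np.pi)                    # 0° .. 180°
        factor = np.clip(rng.normal(3.0, 1.5), 1.0, 5.0)  # truncated normal
        verts = extend(verts, axis, factor)
    theta = rng.uniform(0.0, 2 * np.pi)                   # random rotation
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return verts @ rot.T + rng.uniform(-1.0, 1.0, size=2) # rotate + displace
```

A union or intersection of two such vertex arrays would then be delegated to a polygon clipping library, which also handles the sliver problem mentioned above.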

3.2 CFD Simulation: Primal Solve

With a pipeline to generate random geometries in place, the next step is to set up the primal CFD simulation such that it converges for a wide variety of random geometries. This section gives an overview of the inputs for the primal solver, the adjoint solver, and the resulting mesh refinement. We employ Simcenter STAR-CCM+ [24], a commercial CFD software developed by Siemens Digital Industries Software, for the simulations.

3.2.1 Computational Domain

Since turbulence effects extend far beyond the base geometry, we place the generated geometry in a square region positioned well away from the inlet, the upper and lower boundaries, and the outlet, as illustrated in figure 2. The resulting domain has a blockage ratio small enough to justify the free-slip boundary conditions. For this experiment, the reference length and domain are chosen such that the flow is fully within the turbulent regime, so that the setup serves as a starting point for similar projects involving fast-moving medium-sized objects, such as vehicles or wind-tunnel-like applications.

The boundary conditions applied to the flow problem are depicted on the left side of figure 2.

  1. At the inlet boundary, the flow is defined as subsonic and incompressible, with a fixed inflow velocity perpendicular to the boundary wall.

  2. At the outlet boundary, the flow regime is also defined as subsonic, with a fixed static pressure.

  3. The upper and lower wall boundaries are considered as far-field and defined as free-slip walls.

  4. The randomly generated geometry or obstacle is defined as a no-slip wall with the wall roughness set to smooth.

3.2.2 Initial Mesh Setup

Though the goal of this work is to learn the mesh sensitivity with respect to certain input geometries, it is still necessary to start from a good-quality mesh to improve the robustness of the mesh refinement process.

Figure 2: Global mesh overview of a random geometry: complete setup and close-up of the geometry and grid. (Note that the illustration is not to scale, in order to also illustrate the mesh refinement.)

A meshing example is illustrated in figure 2. We employ polygonal mesh elements as the Simcenter STAR-CCM+ solver is optimized for this element type. Additionally, two wake refinements are defined to capture the eddies behind the obstacle. Prism layers are employed at the obstacle boundary. Detailed parameters for the setup of the mesh including prism layers and wake refinement can be found in appendix A.1.

3.2.3 Primal Solver

To make our CFD runs as fast and as robust as possible, we use the following solver specifications. We employ a coupled implicit scheme due to its robustness. Furthermore, the Algebraic Multigrid (AMG) method is used to accelerate convergence. To speed up convergence further, we use a large CFL number of 200. To avoid instability in the initial iteration steps, we employ linear CFL ramping for the first 50 solver iterations, starting with a CFL number of 0.25. Additionally, we use grid sequencing to provide a better initial "guess" of the flow solution. The detailed parameters for the grid sequencing initialization can be found in appendix A.2.

To ensure an efficient use of computing resources, we furthermore define several stopping criteria to allow easy automation of the CFD simulations. The goal of these criteria is to stop a simulation once it reaches the desired accuracy or if it does not converge. There are three situations in which a simulation should terminate: reaching a steady state, reaching sufficient accuracy, or reaching the maximum iteration number. The three cases as well as the associated parameters are detailed in appendix A.2.
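A driver loop with these three stopping criteria might look as follows. This is a hedged sketch: `step` is a hypothetical stand-in for one solver iteration returning a residual norm, and the tolerances are placeholders, not the values used with Simcenter STAR-CCM+.

```python
def run_solver(step, max_iters=2000, accuracy_tol=1e-6, steady_tol=1e-10):
    """Drive a solver until one of three stopping criteria fires.

    `step` is a hypothetical callable performing one solver iteration and
    returning the current residual norm; the tolerances are placeholders.
    """
    prev = float("inf")
    for it in range(1, max_iters + 1):
        res = step()
        if res < accuracy_tol:
            return "accuracy", it           # sufficient accuracy reached
        if abs(prev - res) < steady_tol:
            return "steady_state", it       # residual stagnated: steady state
        prev = res
    return "max_iterations", max_iters      # simulation did not converge
```

Returning the reason for termination makes it easy to filter out non-converged runs before they enter the training set.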

3.3 CFD Simulation: Adjoint Solve

Once the primal solution converges, the adjoint solver is employed. In general, the adjoint solution is an efficient way to compute gradients of a quantity of interest with respect to input parameters. For a brief introduction we refer to appendix B and to [14]. In our specific case, we use the resulting gradients to compute sensitivities of the cost function with respect to the cell densities of the current mesh. This helps us to identify areas where mesh refinement is needed to reduce the numerical error of the computed drag force. We refer to the calculated sensitivity as the estimated adjoint error or mesh sensitivity.

First, the cell residuals of the Navier-Stokes equations are recomputed with a higher-order discretization scheme. The difference between the primal solution and the higher-order residual is taken as an estimate of the true numerical error. Next, the field of residual differences is multiplied with the adjoint solution of the given cost function to obtain the mesh sensitivity. Thereby, adjoint methods significantly reduce the cost of calculating parametric sensitivities by selecting a specific Lagrange multiplier. For a short explanation we refer to appendix B. The main savings of the adjoint method stem from the lower complexity of solving the adjoint equation. Let us assume that solving the primal problem has a complexity of C_p and that solving the adjoint equations has a complexity of C_a. If C_a is comparable to C_p, then when investigating a set of n parameters, the adjoint approach reduces the complexity of the sensitivity calculation from roughly n·C_p to C_p + C_a [13].
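The cost argument can be illustrated with a linear toy problem A x = b(p) and a cost function J = gᵀx: the direct approach needs one linear solve per parameter, while the adjoint approach needs a single solve with Aᵀ. This is only an illustration of the principle, not the solver's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_params, n_dof = 50, 200

# Hypothetical discrete problem: A x = b(p) with b(p) = B p, cost J(x) = g . x
A = np.eye(n_dof) + 0.01 * rng.standard_normal((n_dof, n_dof))
B = rng.standard_normal((n_dof, n_params))
g = rng.standard_normal(n_dof)

# Direct (tangent) sensitivities: one linear solve per parameter -> n_params solves.
grad_direct = np.array([g @ np.linalg.solve(A, B[:, i]) for i in range(n_params)])

# Adjoint sensitivities: a single solve with A^T, then cheap inner products.
lam = np.linalg.solve(A.T, g)   # adjoint solution (the Lagrange multiplier)
grad_adjoint = lam @ B          # dJ/dp_i = lam . (db/dp_i)
```

Both routes yield identical gradients, but the adjoint route replaces n_params linear solves with one.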

In this contribution, the drag force on the obstacle surface is chosen as the cost function. Since the magnitude of the error varies from geometry to geometry, we normalize the adjoint error by the drag force. An example of mesh sensitivity is shown in figure 3.

Figure 3: Mesh sensitivity estimation

3.4 CFD Simulations with Adaptively Refined Meshes

After obtaining the adjoint error, the estimated error per cell is compared to a threshold value. If the error is above this threshold, we refine the affected cell by reducing the base size of the cells in the local region. We note that when a cell is refined, the neighbouring cells are also adapted to generate a continuous mesh based on the growth ratio, which determines the maximum change in neighbouring cell sizes. The growth ratio was chosen to ensure smooth transitions in the domain after refinement.

After refining the mesh, we obtain a new adjoint solution by repeating the primal and adjoint solves on the newly refined mesh. For the purpose of data generation, we repeat this refinement process until either the adjoint solution reaches a steady state or the adjoint solver no longer converges. This means that both a new primal solve and a new adjoint solve are required for each refinement step. An example of a mesh after multiple refinements can be seen in figure 4.

Figure 4: Top: No mesh refinement. Bottom: After several refinement steps
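The threshold-and-growth-ratio refinement step described above can be sketched on a one-dimensional strip of cells. The function name, the halving rule, and the default values are illustrative assumptions, not the refinement rule used by the solver.

```python
import numpy as np

def refine_once(cell_size, error, threshold=1e-3, growth_ratio=1.3):
    """Halve cells whose estimated adjoint error exceeds the threshold, then
    limit neighbouring cells so sizes never jump by more than the growth ratio."""
    size = np.where(error > threshold, cell_size / 2.0, cell_size).astype(float)
    for _ in range(len(size)):                 # sweep until no violations remain
        left = np.concatenate(([size[0]], size[:-1]))
        right = np.concatenate((size[1:], [size[-1]]))
        limited = np.minimum(size, np.minimum(left, right) * growth_ratio)
        if np.allclose(limited, size):
            break
        size = limited
    return size
```

Flagging a single cell thus also shrinks its neighbours, producing the smooth size transitions mentioned above.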

4 Neural Network Training Pipeline

CFD simulations are usually based on non-rectilinear mesh structures with varying cell sizes across the simulation domain. The spatial irregularity of the mesh poses a difficult prediction target. As the number of cells and their size vary between simulations, predicting individual edge or node positions via a neural network approach is infeasible. Fortunately, the user is typically not interested in the exact position of a specific edge in the mesh, but rather in a typical cell size in a given area of the simulation domain. We therefore chose the average size of cells in a region as the prediction target, essentially corresponding to the refinement level of the mesh. This quantity can easily be described on a rectilinear grid covering the domain of interest and is not concerned with individual cells, of which there can be hundreds of thousands. The regular nature of this grid also lends itself to techniques used in image processing. We therefore chose to export the cell sizes from our CFD solver as an image, with the grayscale channel indicating the relative size of cells. When these images are blurred and downsampled as described in subsection 4.2, the resulting 128x128 images are a representation of mesh refinement and are easily fed into a convolutional neural network as prediction targets.

In the following, we detail the overall processes and architectures chosen in this work. The architecture of choice, the Staircase UNet Architecture, is introduced in subsection 4.1. The pre-processing of data for training is explored in subsection 4.2.

4.1 Neural Network Architecture

The UNet architecture with skip-connections used in [25] is also used in this work. This is a simple convolutional neural network consisting of multiple layers. In each layer, the input data passes through a convolution and an activation function, followed by down- or up-sampling. Additional steps can be taken for improved performance, such as batch normalisation or dropout, but these will be discussed later when addressing hyperparameter tuning. A basic UNet architecture without skip-connections is shown in blue in figure 5. Here, the number of convolutional channels in each layer increases as the data is down-sampled and decreases as it is up-sampled. As the size of our input and output data is the same, we consider only symmetric UNet architectures.

In addition to the basic UNet structure, the authors of [25] also utilized skip-connections and successfully predicted flow-based quantities. These connections can be seen in figure 5, represented by light red lines.

Figure 5: Left: UNet architecture (blue) with skip-connections (red); Right: Staircase UNet architecture with skip-connections

Skip-connections allow information to be transferred between certain input and output layers in a neural network [22]. They are important because the magnitudes of the coefficients in the initial layers of a neural network are often much smaller than those in the final layers. As a result, the gradients of the loss with respect to these initial layers are often very small, which typically results in a poorly-conditioned optimization problem with slow convergence. Skip-connections alleviate this problem by equalizing the magnitudes of the gradients: they reduce the effective distance between output and input layers, thereby improving the training of much deeper networks [22].

In this work, we modify the UNet-with-skip-connections model, creating a Staircase UNet architecture. This structure adds additional layers at each depth of the network, such that additional convolutions are performed before up- or down-sampling occurs. A visual representation of this architecture can be seen in figure 5.

With this architecture, we aim to preserve more information when passing data through skip-connections. This is best understood by examining figure 5. In the staircase architecture, a connection whose combined data is immediately upsampled is analogous to the basic skip-connection seen in figure 5. However, if we instead connect layers such that the combined data undergoes another full convolutional layer before upsampling, features that the upsampling step might otherwise destroy can still contribute to training. Henceforth, when referring to these two types of skip-connections, we call the former, where information is immediately upsampled, an outer skip-connection, and the latter, which allows for an additional convolution prior to upsampling, an inner skip-connection.
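A minimal PyTorch sketch of a symmetric UNet in this spirit, with two convolutions per depth and one learnable, sigmoid-bounded scalar per skip-connection, is given below. The class is a simplified stand-in, not the trained architecture; channel counts and depth are placeholders.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Two convolutions at one depth ("staircase": extra conv before re-sampling)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class StaircaseUNet(nn.Module):
    """Minimal symmetric UNet with learnable scalar skip-connections."""
    def __init__(self, channels=(1, 16, 32, 64)):
        super().__init__()
        self.down = nn.ModuleList(Block(channels[i], channels[i + 1])
                                  for i in range(len(channels) - 1))
        self.up = nn.ModuleList(Block(channels[i + 1], channels[i])
                                for i in reversed(range(1, len(channels) - 1)))
        self.head = nn.Conv2d(channels[1], channels[0], 1)
        self.pool = nn.MaxPool2d(2)
        # one learnable mixing scalar per skip-connection
        self.alpha = nn.Parameter(torch.full((len(self.up),), 0.5))

    def forward(self, x):
        skips = []
        for i, blk in enumerate(self.down):
            x = blk(x)
            if i < len(self.down) - 1:
                skips.append(x)          # remember features for the skip path
                x = self.pool(x)
        for i, blk in enumerate(self.up):
            x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
            x = blk(x)
            a = torch.sigmoid(self.alpha[i])   # keep mixing weight in (0, 1)
            x = a * skips.pop() + (1.0 - a) * x
        return self.head(x)
```

Input and output share the same spatial resolution, matching the symmetric setup described above.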

4.2 Data Preparation

4.2.1 Smoothing

Our input images often contain sharp edges where the values of adjacent cells differ greatly. Additionally, capturing individual cells, which in practical CFD applications can have arbitrary shapes and sizes, is difficult in machine learning. Hence, a Gaussian blur (commonly used in graphics and computer vision) is used to smooth the edges and improve training. That is, we convolve the image with a Gaussian function

G(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²)),

which depends on the standard deviation σ and the radius r of the stencil. Thus, the Gaussian blur of a pixel at position (i, j) is defined as follows:

B(i, j) = Σ_{x=−r}^{r} Σ_{y=−r}^{r} G(x, y) · p(i + x, j + y),

where p is the pixel value in the original image. When performing the blur, we apply Neumann boundary conditions and assume that the domain beyond the picture is of the same color as that on the boundary.

Additionally, we cannot apply this blur to geometry pixels as this would destroy geometry boundary information. Hence, we apply a selective blur that ignores data on the geometry. This is done by modifying the convolution kernel for each pixel such that it excludes geometry data, thus restricting smoothing to non-geometry pixels:

B(i, j) = ( Σ_{x=−r}^{r} Σ_{y=−r}^{r} G(x, y) · χ(i + x, j + y) · p(i + x, j + y) ) / ( Σ_{x=−r}^{r} Σ_{y=−r}^{r} G(x, y) · χ(i + x, j + y) ),

where χ represents an indicator function of the geometry: it returns 0 if a pixel is inside the geometry and 1 otherwise. For the following steps, fixed values of σ and r were chosen.
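The selective blur can be sketched directly in NumPy. The function names are hypothetical and the loop implementation is deliberately naive for clarity; a production version would vectorize or parallelize it, as the CUDA implementation mentioned in the next subsection does.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """2-D Gaussian stencil G(x, y) of shape (2r+1, 2r+1)."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def selective_blur(img, inside_geometry, radius=2, sigma=1.0):
    """Gaussian blur that ignores pixels inside the geometry.

    `inside_geometry` is a boolean mask; masked pixels contribute no data and
    the kernel weights are renormalized over the remaining pixels.
    """
    G = gaussian_kernel(radius, sigma)
    H, W = img.shape
    pad = radius
    # Neumann-like boundary: replicate edge values beyond the image.
    img_p = np.pad(img, pad, mode="edge")
    keep_p = np.pad(~inside_geometry, pad, mode="edge").astype(float)
    out = img.copy()
    for i in range(H):
        for j in range(W):
            if inside_geometry[i, j]:
                continue                       # never blur geometry pixels
            w = G * keep_p[i:i + 2*pad + 1, j:j + 2*pad + 1]
            out[i, j] = (w * img_p[i:i + 2*pad + 1, j:j + 2*pad + 1]).sum() / w.sum()
    return out
```

Geometry pixels keep their original value, so the geometry boundary stays sharp while the surrounding cell-size field is smoothed.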

4.2.2 Downsampling

We perform downsampling to reduce the resolution of the images to a size acceptable for the neural network. The transformation applied to the image is also applied to the geometry. There are several ways this step could be performed, e.g. via Fourier or wavelet transforms; however, we use averaging. Initially, images exported from the CFD solver have a resolution of 4000x4000 with a single grayscale channel, and are cropped to 3840x3840. The images are then downsampled and blurred at the same time by applying the selective Gaussian kernel described above with a size of 30 and a stride of 30, without padding. This leads to a processed image size of 128x128. Both the input and the output of the network operate at this size. Due to the large number of images and the high initial resolution, this post-processing was implemented in CUDA, increasing throughput by 2-3 orders of magnitude.
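Blurring and downsampling in one pass, i.e. applying a kernel with a stride equal to its own size and no padding, can be sketched as follows. This is a plain NumPy illustration of the operation, not the CUDA implementation.

```python
import numpy as np

def blur_downsample(img, kernel):
    """Blur and downsample in one pass: apply `kernel` with a stride equal to
    its size and no padding, so each output pixel is the weighted average of
    one non-overlapping tile of the input."""
    k = kernel.shape[0]
    H, W = img.shape
    w = kernel / kernel.sum()              # normalize the kernel weights
    out = np.empty((H // k, W // k))
    for i in range(H // k):
        for j in range(W // k):
            out[i, j] = (w * img[i*k:(i+1)*k, j*k:(j+1)*k]).sum()
    return out
```

With a 30x30 kernel and stride 30, a 3840x3840 input reduces to exactly 128x128, the size the network operates on.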

4.3 Training

4.3.1 Hyperparameters

After identifying the best-performing networks (cf. subsection 4.1), a systematic hyperparameter sweep is performed. The models in the preliminary sweep used the parameters listed in appendix C. We note from these initial results that networks with larger kernels and a larger number of convolutional channels performed better. This is unsurprising, as larger kernels can capture more complex or larger features. Other than network and kernel size, the following parameters seem to be the most promising:

  1. Beta parameters for the Adam optimizer

  2. Scalar or tensor skip-connection

  3. Skip-connection constraints

  4. Skip-connection location

These are explained in greater detail in the following sections.

Beta Parameters

The Adam optimizer adds an effect akin to "momentum" when updating the weights of the neural network; more specifically, it includes the bias-corrected first and second moment estimates of previous weight updates. This creates a form of "inertia": if many past updates tend to move in a specific direction, the current and future updates will be biased to also move in a similar direction. The Adam optimizer is seen as having benefits similar to both the AdaGrad and RMSProp extensions of stochastic gradient descent: it is able to deal with sparse data, due to the bias correction when calculating the moving averages, and with noisy, non-stationary data, respectively.


The two parameters tuned in this phase, beta 1 and beta 2, control the decay rates of these moving averages. We find that the performance of the networks is highly sensitive to changes in both parameters, and that minor changes to either can result in wildly varying performance when the locations of the skip connections are modified. We are therefore unable to derive very general rules for good values. One general trend, however, is that with skip connections at depths 1-3, lower values produce good results.

Scalar and Tensor Skip-Connections

In this contribution, skip-connections are implemented in two different ways: scalar (or simple) skip-connections and tensor skip-connections. With reference to figure 5, the connected layers are tensors whose side length equals that of the downsampled input at the given depth and whose third dimension is the number of convolutional channels used in that particular layer. The scalar skip-connection blends the skipped and main-path tensors using a single scalar weight, whereas the tensor skip-connection allows greater freedom, with a separate scalar weight for each channel. We experiment with both tensor and scalar skip-connections, but there is no clear conclusion as to which type produces better results.
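The exact blending formula is not reproduced here; a plausible sketch, assuming the skipped and main-path tensors are combined as a convex blend with weights in [0, 1] (consistent with the constraint discussed below), looks as follows:

```python
import numpy as np

def scalar_skip(x_skip, x_main, alpha: float):
    """Scalar skip: one blending weight for the whole (l, l, c) tensor.

    Assumed convex-blend form: alpha in [0, 1] interpolates between the
    skipped activation and the main-path activation.
    """
    return alpha * x_skip + (1 - alpha) * x_main

def tensor_skip(x_skip, x_main, alpha_c):
    """Tensor skip: one blending weight per channel, alpha_c of shape (c,).

    Broadcasting applies each channel weight across the (l, l) plane.
    """
    return alpha_c * x_skip + (1 - alpha_c) * x_main

# Example: 4x4 feature maps with 3 channels.
x1 = np.ones((4, 4, 3))
x0 = np.zeros((4, 4, 3))
y = tensor_skip(x1, x0, np.array([1.0, 0.0, 0.5]))
```

With alpha close to 0 a connection passes no data, which is exactly the "unused skip connection" behaviour observed during training (see below).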

Constrained Skip-Connections

Recall that the weights in scalar or tensor skip-connections are constrained to lie between 0 and 1. This constraint can be implemented in multiple ways. The simplest method is to clip the value of the skip-connection weight if it exceeds 1 or drops below 0; we refer to these skip-connections as constrained. Alternatively, the values can be constrained through a sigmoidal function, which maps the real line to the interval (0, 1); we refer to these skip-connections as unconstrained, since the underlying parameter itself remains unconstrained. We found that architectures with unconstrained skip-connections tended to produce better results.
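The two constraint variants can be sketched as follows (illustrative; parameter names are ours):

```python
import numpy as np

def alpha_constrained(theta):
    """'Constrained' variant: clip the raw weight into [0, 1].
    Outside [0, 1] the gradient with respect to theta is zero."""
    return np.clip(theta, 0.0, 1.0)

def alpha_unconstrained(theta):
    """'Unconstrained' variant: theta ranges over all reals and a sigmoid
    maps it smoothly into (0, 1), so gradients never vanish abruptly."""
    return 1.0 / (1.0 + np.exp(-theta))
```

The smooth sigmoid parameterization is a common reason such "unconstrained" weights train more reliably than hard clipping.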

Skip Connection Location

One more variable we experiment with is the location of the skip connections, i.e., the depth at which they are placed. A higher skip connection (at depth 1 or 2) passes larger amounts of information, as it connects the earlier layers to the last few layers. Skip connections at lower depths pass smaller amounts of data and bypass fewer convolutional layers.

Our experiments indicate that placing skip connections at a higher depth typically improves performance. Interestingly, not all skip connections see use during training: there are cases where only two out of three skip connections pass any data, i.e., the blending weight is close to 0. The best performing architecture has skip connections at depths 2, 3, 4, and 5.

4.3.2 Loss Function

A custom loss function is used in the training of our neural network. To ensure that the network does not learn mesh densities inside the geometries, we mask the loss function with the geometry: the L2 loss between prediction and ground truth is multiplied by an indicator function that is zero inside the geometry and prism layer and one elsewhere. As we cannot iteratively refine the prism layers using the Simcenter STAR-CCM+ software, we also mask the prism layers, as seen in figure 6, to prevent the model from learning the extremely fine mesh density along the prism layers.
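A minimal sketch of such a masked L2 loss (the normalization by the number of unmasked pixels is an assumption; the paper's exact form may differ):

```python
import numpy as np

def masked_l2_loss(pred, target, mask):
    """L2 loss restricted to valid pixels.

    mask is 1 outside the geometry and prism layer, 0 inside, so errors
    in masked regions contribute nothing to the gradient.
    """
    diff = mask * (pred - target)
    return (diff ** 2).sum() / mask.sum()

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.zeros((2, 2))
mask = np.array([[1.0, 0.0], [1.0, 1.0]])   # top-right pixel is masked out
loss = masked_l2_loss(pred, target, mask)    # (1 + 9 + 16) / 3
```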

(a) Geometry w/o prism layer
(b) Geometry w/ prism layer (gray)
Figure 6: Geometry masks for loss calculation

5 Results

5.1 Big Data

Figure 7: Dataflow from Simulation to trained neural network. Top: LRZ Compute Clusters, Center: Locally hosted MongoDB, Rest: Local Compute Units.

Training our neural networks relied heavily on the integrity and quality of the data produced in our simulation pipeline. To obtain the required amounts of data, the processes of data acquisition, pre-processing and training had to be largely automated. Figure 7 shows the overall flow of raw data produced on LRZ compute clusters, moving through different stages of pre-processing to finally being used as training input for the network architectures.

The pipeline is divided into 4 stages with different compute needs. Raw simulation data is produced with heavy use of parallelism on CPU clusters, with individual runs taking 30-60 minutes on 10 cores (figure 7, red). Exported data is uploaded to the central element of the pipeline, a locally hosted MongoDB instance handling all data operations for all stages. Generalized pre-processing (figure 7, blue), including assessment of mesh quality, is done on local CPU compute units and selects which refinement iterations to use further down the pipeline. Image pre-processing of the selected iterations (figure 7, green) is handled by local GPU compute units using CUDA-accelerated code, with results stored back in the database. Finally, all data necessary for training is combined into a single file. This file is distributed to machines equipped with suitable GPUs, which train different neural network setups and in return upload the results in the form of the final loss and accuracy of the tested architecture (figure 7, orange).

In total, 65,000 simulations were performed on different clusters. At peak production rate, 50 concurrent simulations saturated the IvyMUC cluster at LRZ, which contains 496 Ivy Bridge CPU cores. With an average of 4.3 refinement iterations per simulation, 280,000 different meshes were created and their quality assessed. For each mesh, multiple quantities were exported for potential future use, leading to a database of 1.2 TB containing 3.6 million exported fields. Full simulation files allowing for simulation restarts were not kept permanently, as they amounted to another 5-6 TB. A full dataset containing processed geometries, masks, SDF fields and mesh targets at 128x128 resolution amounts to 3.3 GB in compressed form.

5.2 Neural Network Performance

For the final training, a reduced set containing only simulations that had been refined at least 5 times was used. This set holds the binary geometry input to the network, a mask removing the geometry and prism layer from the loss function as described in subsubsection 4.3.2, and the target mesh density distribution, all at 128x128 resolution. The reduced set amounts to 11,059 samples, split 90%/10% into training and validation. Training was run with batch size 128 and an initial learning rate subject to an exponential decay by a factor of 0.89 every 650 steps. Figure 8 shows the training and validation loss for the final training run. Both decrease continuously until reaching a plateau after 25,000 training steps, equivalent to 300 epochs.
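The decay schedule can be sketched as follows (assuming a staircase schedule, i.e. the factor 0.89 is applied as a discrete multiplication every 650 steps):

```python
def lr_at(step: int, lr0: float, decay: float = 0.89, every: int = 650) -> float:
    """Staircase exponential decay: multiply the initial learning rate
    by `decay` once per completed block of `every` steps."""
    return lr0 * decay ** (step // every)

# After 25,000 steps (where the loss plateaus), the learning rate has
# been decayed 38 times.
final_lr_factor = lr_at(25_000, 1.0)
```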

Figure 8: Loss during training of best-performing network excluding geometry and prism layer regions. No overfitting is visible. Both losses have reached a plateau.

The best performing neural network is a Staircase UNet with Skip Connections, using tensor skip connections. This network has a depth of 8 with 2 convolutional layers at each depth, and inner tensor skip connections at depths 2, 3, 4 and 5. It was trained with the Adam optimizer using the tuned beta values and the exponentially decaying learning rate, and obtained an accuracy of more than 98.7% on the validation dataset. In figure 9 we illustrate four example geometries with their given reference, the resulting prediction for the velocity magnitude and the corresponding relative error.

Figure 9: Predictions from neural network, visualized are velocity magnitudes and relative errors between the reference and prediction for different validation data sets.

5.3 Sample Prediction Validation

Section 5.2 demonstrates the ability of the network to learn refinement maps around a random geometry at a 128x128 scale. For future industrial use, the final step would be to automate a meshing process using these refinement maps to produce a mesh directly, e.g. in Simcenter STAR-CCM+. Here, we present an example of how this process might work, as well as the quantitative performance of the resulting mesh.

(a) Extracted isosurfaces.
(b) Resulting mesh with volume control zones and growth factor applied.
Figure 10: Extracted isosurfaces from mesh creation using sample geometry refinement prediction with neural network viewport outlined in blue.

The neural network output predicts an average mesh cell size per pixel. Instead of creating the theoretically available 16,384 volume controls (one per pixel), we extract 4-5 isosurfaces from the mesh prediction and use those as volume control zones in Simcenter STAR-CCM+, as seen in figure 10.
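Binning the predicted cell-size map into a handful of zones can be sketched as follows (illustrative; the use of quantile thresholds is our assumption, and the actual control volumes are set up in Simcenter STAR-CCM+):

```python
import numpy as np

def extract_refinement_zones(pred: np.ndarray, n_zones: int = 5) -> np.ndarray:
    """Discretize a predicted cell-size map (e.g. 128x128) into n_zones
    refinement zones via n_zones - 1 quantile thresholds, instead of
    creating one volume control per pixel (128 * 128 = 16,384)."""
    thresholds = np.quantile(pred, np.linspace(0.0, 1.0, n_zones + 1)[1:-1])
    return np.digitize(pred, thresholds)  # zone index 0 .. n_zones - 1

rng = np.random.default_rng(0)
zones = extract_refinement_zones(rng.random((128, 128)))
```

Each zone boundary corresponds to one extracted isosurface, which then becomes a volume control zone in the mesher.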

The network clearly captures the general wake behind the geometry as well as the shear layers, leading to mesh refinement in those critical areas. Given the limitations in capturing the prism layers mentioned in subsubsection 4.3.2, and the limitations imposed by working with downsampled 128x128 data, the network does not predict the prism layer and the smallest cells directly attached to the geometry. This is clearly visible when comparing the adjoint error estimates between the mesh produced from the network prediction and a mesh produced by the iterative refinement discussed in subsection 3.4.

(a) Iteratively refined mesh.
(b) Mesh generated from NN prediction.
a) Macro refinement b) NN refinement
Drag force (N) 75.2 75.4
Sum estimated error in viewport -1.4 -1.9
Sum estimated error in domain -1.4 -1.9
Max absolute error in viewport 0.13 0.094
(c) Absolute and error value comparison between meshes.
Figure 11: Adjoint error estimate for drag with the neural network viewport shown by square.

Figures 11 and 12 compare the per-cell adjoint error estimate for the drag force between a mesh iteratively refined by the macro pipeline used for training data production (left) and the mesh generated from the neural network prediction (right). Figure 11b shows the lack of refinement close to the geometry, stemming from the masking of prism layers in our dataset. Compared to the iterative mesh in figure 11a, the error in front of the geometry in particular is reduced, and the wake and shear layers are captured. The large error outside the viewport is a result of the default growth-rate behaviour of the mesher, as no further control volumes were set outside the neural network viewport. Table 10(c) shows similar values for overall drag, as well as for the overall error in the viewport and the total domain, with the maximum cell error actually lower for the neural network mesh.

(a) Iteratively refined mesh.
(b) Mesh generated from NN prediction.
a) Macro refinement b) NN refinement
Drag force (N) 426.9 445.6
Sum estimated error in viewport -13.3 -5.0
Sum estimated error in domain -15.0 -8.0
Max absolute error in viewport 0.52 3.2
(c) Absolute and error value comparison between meshes.
Figure 12: Adjoint error estimate for drag with the neural network viewport shown by square.

Figure 12 shows similar behaviour for a different geometry, which was not seen during training or validation. Again the error in front of the object is reduced and the shear layers are better resolved. The adjoint error estimate in table 11(c) suggests in this case a significantly reduced error in the viewport and the overall domain for the neural network mesh. A common issue of the two meshes is the large error in cells around sharp bends of the object. This is most likely a result of the iterative refinement strategy not affecting prism layer settings, so both the iterative mesh and the network prediction, based on training data produced with the same strategy, share this weak point.

Overall, the meshes generated using the output of the neural network converge well and show error estimates comparable to or better than those of the iterative approach for the two samples investigated. Generally, the network outputs mesh refinement maps with very smooth gradients in cell size, while the iterative approach used to generate the training data often results in meshes with very localized refinement zones. This noisiness in the training data is a major contributor to the accuracy limit observed during network training. That the network does not attempt to reproduce this noise, but rather favors larger structures and smooth gradients across the field, bodes well for the usability of the predicted meshes.


6 Conclusions and Outlook

Within this contribution, we presented a novel machine-learning-based approach for optimal mesh generation in computational fluid dynamics (CFD) applications. The basis was the commercial CFD simulator Simcenter STAR-CCM+ [24], which was extended by an iterative mesh refinement loop to generate meshes optimized with respect to the overall error. This computational pipeline was based on an appropriate error estimator leveraging adjoint sensitivities (corresponding functionality is meanwhile available in the standard Simcenter STAR-CCM+ solution). To achieve optimal results with a machine learning approach, we produced over 60,000 simulations and 6 terabytes of data, and used about 50 years of serial CPU compute time.

Based on a subset of this dataset, we tested and fine-tuned different neural network architectures derived from three base architectures, all trained on the same dataset. Automated testing of the different variants was performed on multiple GPUs, amounting to roughly 2 months of continuous single-GPU training time. Several hundred variants were trained and tested, and several very accurate architectures were found. We showed that the modified U-Net architecture, both with and without skip connections, is able to predict refined mesh densities for a random geometry with an accuracy of more than 98.7%. Figure 9 illustrates the predictions obtained by the best performing network.

Producing a neural network of this quality required considerable effort in data creation, processing and training, made possible only by the full automation of the entire process from CFD to neural network. The trained network that resulted from this pipeline can give inexperienced CFD users a reasonable first guess for a mesh refinement given their geometry. This can reduce the need for tedious hand-tuning of the mesh, as it predicts a mesh that is able to produce a convergent result. This use of neural networks in the general CFD pipeline is one of the most risk-averse ways of applying machine learning in this context. Other research focuses on replacing parts of the actual CFD solver with neural networks to speed up the simulation; however, this can lead to inaccurate results for edge cases, as the networks themselves have no knowledge of the underlying physics. Our approach can at worst produce a bad mesh, which is easily spotted by a lack of convergence of the simulation. In the best case, it can significantly speed up the initial phases of setting up a CFD simulation for an inexperienced engineer and produce usable results directly after the first run. The compute resources needed for a single prediction amount to a matter of seconds on consumer machines and are definitely worth the wait.

So far we have restricted ourselves to a very specific setup, i.e. a fixed flow speed and geometries within a certain size range. Future studies will address the potential to generalize the concept, e.g. predictions for geometry sizes differing by orders of magnitude or for different flows. Scaling this approach from 2D to 3D would also be of high interest, though it might require a very large amount of training data. Leveraging machine learning in the overall CFD workflow of an engineer can speed up tasks that depend strongly on personal experience, such as mesh creation. Even more importantly, machine learning in the context of mesh generation has only little influence on the validation and verification of simulation methods as required in typical industrial use cases.

Acknowledgements: The authors gratefully acknowledge the contributions of the 2018/19 Bavarian Graduate School of Computational Engineering team (Peer Breier, Qunsheng Huang, Moritz Krügener, Oleksander Voloshyn, Mengjie Zhao), the support of the Bavarian Graduate School of Computational Engineering, the Leibniz Supercomputing Centre for providing compute resources, NVIDIA for providing GPUs through their GPU grant program, as well as Siemens AG and Siemens Digital Industry Software for providing extensive support throughout the project.


  • [1] M. Ainsworth and J. T. Oden (1993) A unified approach to a posteriori error estimation using element residual methods. Numerische Mathematik 65 (1), pp. 23–50. Cited by: §2.1.
  • [2] I. Babuška and A. Miller (1987) A feedback finite element method with a posteriori error estimation: part i. the finite element method and some basic properties of the a posteriori error estimator. Computer Methods in Applied Mechanics and Engineering 61 (1), pp. 1–40. Cited by: §2.1.
  • [3] I. Babuška and W. C. Rheinboldt (1978) A-posteriori error estimates for the finite element method. International Journal for Numerical Methods in Engineering 12 (10), pp. 1597–1615. Cited by: §2.1.
  • [4] I. Babuška and W. C. Rheinboldt (1978) Error estimates for adaptive finite element computations. SIAM Journal on Numerical Analysis 15 (4), pp. 736–754. Cited by: §2.1.
  • [5] W. Bangerth, R. Hartmann, and G. Kanschat (2007) deal.II: a general-purpose object-oriented finite element library. ACM Transactions on Mathematical Software (TOMS) 33 (4), pp. 24–es. Cited by: §2.1.
  • [6] T. J. Barth (2005) A posteriori error estimation and mesh adaptivity for finite volume and finite element methods. In Adaptive Mesh Refinement-Theory and Applications, pp. 183–202. Cited by: §2.1.
  • [7] R. Becker, V. Heuveline, and R. Rannacher (2002) An optimal control approach to adaptivity in computational fluid mechanics. International journal for numerical methods in fluids 40 (1-2), pp. 105–120. Cited by: §2.1.
  • [8] R. Becker and R. Rannacher (1996) Weighted a posteriori error control in fe methods. IWR. Cited by: §2.1.
  • [9] R. Becker and R. Rannacher (2001) An optimal control approach to a posteriori error estimation in finite element methods. Acta numerica 10, pp. 1–102. Cited by: §1, §2.1.
  • [10] M. Braack and A. Ern (2004) Adaptive computation of reactive flows with local mesh refinement and model adaptation. In Numerical mathematics and advanced applications, pp. 159–168. Cited by: §2.1.
  • [11] S. L. Brunton, B. R. Noack, and P. Koumoutsakos (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52, pp. 477–508. Cited by: §2.2, §2.2.
  • [12] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson (1995) Introduction to adaptive methods for differential equations. Acta numerica 4, pp. 105–158. Cited by: §2.1.
  • [13] K. C. Giannakoglou and D. I. Papadimitriou (2008) Adjoint methods for shape optimization. In Optimization and computational fluid dynamics, pp. 79–108. Cited by: §3.3.
  • [14] M. B. Giles and N. A. Pierce (2000) An introduction to the adjoint approach to design. Flow, Turbulence and Combustion 65 (3), pp. 393–415. External Links: Document, ISBN 1573-1987, Link Cited by: §3.3.
  • [15] T. Grätsch and K. Bathe (2006) Goal-oriented error estimation in the analysis of fluid flows with structural interactions. Computer methods in applied mechanics and engineering 195 (41-43), pp. 5673–5684. Cited by: §2.1.
  • [16] X. Guo, W. Li, and F. Iorio (2016) Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 481–490. Cited by: §1, §2.2.
  • [17] Ingrid Cloud - Smart Wind Simulations Ingrid Cloud. External Links: Link Cited by: §2.1.
  • [18] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §4.3.1.
  • [19] J. N. Kutz (2017) Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, pp. 1–4. Cited by: §1, §2.2.
  • [20] J. Ling, A. Kurzawski, and J. Templeton (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics 807, pp. 155–166. Cited by: §1, §2.2, §3.1.
  • [21] A. Logg, K. Mardal, and G. Wells (2012) Automated solution of differential equations by the finite element method: the fenics book. Vol. 84, Springer Science & Business Media. Cited by: §2.1.
  • [22] X. Mao, C. Shen, and Y. Yang (2016) Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in neural information processing systems, pp. 2802–2810. Cited by: §4.1.
  • [23] T. Plewa, T. Linde, V. G. Weirs, et al. (2005) Adaptive mesh refinement-theory and applications. Springer. Cited by: §1, §2.1.
  • [24] Simcenter STAR-CCM+ Siemens Digital Industry Software. External Links: Link Cited by: §1, §2.1, §3.2, §6.
  • [25] N. Thuerey, K. Weissenow, H. Mehrotra, N. Mainali, L. Prantl, and X. Hu (2018) Well, how accurate is it? a study of deep learning methods for reynolds-averaged navier-stokes simulations. arXiv preprint arXiv:1810.08217. Cited by: §2.2, §4.1, §4.1.
  • [26] J. Tompson, K. Schlachter, P. Sprechmann, and K. Perlin (2017) Accelerating eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3424–3433. Cited by: §2.2.
  • [27] R. Verfürth (1994) A posteriori error estimation and adaptive mesh-refinement techniques. Journal of Computational and Applied Mathematics 50 (1-3), pp. 67–83. Cited by: §2.1.


Appendix A CFD solver parameters

a.1 Mesh Setup

Table 1 states the parameters and their values for the base mesh domain, while the detailed setups of the prism layer and the wake refinements can be found in table 2 and table 3, respectively.

Default controls Values [unit]
Base size 1 m
Target surface size 100 Percentage of base
Minimum surface size 10 Percentage of base
Growth rate 1.05
Table 1: Global mesh setup overview
Obstacle surface control parameters Values [unit]
Number of prism layers 10
Prism layer total thickness 100 Percentage of base
Prism layer stretching 1.3
Minimum surface size 0.5 Percentage of base
Table 2: Obstacle prism layer setup overview
Wake refinement 1 Wake refinement 2 [unit]
Isotropic size 20 40 Percentage of base
Distance 30 70 m
Growth rate 1.2 1.2
Table 3: Wake refinement setup overview

a.2 Primal Solver

The parameters for grid sequencing initialization can be found in table 4.

Grid sequencing parameters Values
Maximum grid levels 10
Maximum iterations per level 50
Convergence tolerance per level 0.01
CFL number 50
Table 4: Grid sequencing setup

To ensure efficient usage of compute resources as well as easy automation, we consider the following three situations in which the simulation should terminate:

  1. Reaching Steady State: The ideal case is when the drag force and residuals reach a steady state and the solution no longer changes with further iterations. During automation, we only consider the continuity residual when checking for steady state, as it is representative of the other residual quantities. This case is visualized in figure 12(a) and figure 12(b), where we plot the residuals. We take a moving average of the latest 100 residual values; if this moving average remains within a given tolerance, we accept that the simulation has converged.

  2. Reaching Sufficient Accuracy: Since the adjoint solver only requires a sufficiently converged result, the simulation can also be stopped when the residual falls below a certain value. As with the asymptotic stopping criterion, only the continuity residual is considered, and we set a fixed threshold for it. An example of reaching this minimum criterion can be found in figure 12(c), where the continuity residual (red line) touches the x-axis, indicating that the residual has reached the threshold.

  3. Reaching Maximum Iteration Number: Since a large number of converged simulations is required to form a sufficiently large dataset, it is more efficient to drop a simulation that does not converge after a long time and to restart. A maximum iteration number is therefore set to avoid wasting resources. It is set to 5000 since, after running several batches of random geometries, we observed that stopping criteria 1 and 2 most often took effect before reaching 3000 iterations; 5000 thus includes a safety factor to avoid ending viable simulations prematurely. An example of reaching the maximum iteration number can be found in figure 12(d), where the residuals do not reach the required threshold.
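The moving-average check in criterion 1 can be sketched as follows (our interpretation of "remains within a given tolerance": the moving average changes by less than the tolerance between consecutive iterations):

```python
import numpy as np

def has_converged(residuals, window: int = 100, tol: float = 1e-3) -> bool:
    """Asymptotic stopping criterion (sketch): compare the moving average
    of the last `window` continuity residuals against the previous one."""
    if len(residuals) < window + 1:
        return False
    prev = np.mean(residuals[-window - 1:-1])
    curr = np.mean(residuals[-window:])
    return abs(curr - prev) < tol
```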

(a) Drag force asymptotic stopping criteria
(b) Continuity asymptotic stopping criteria
(c) Continuity minimum criteria
(d) Maximum iteration number criteria
Figure 13: Visualization of stopping criteria

Appendix B Explanation Adjoint Method

A simple explanation of the adjoint method is as follows. Consider a solution u of the governing equations R(u, α) = 0 and a cost function J(u, α), both of which vary with an arbitrary input parameter α. The sensitivities of the two quantities with respect to changes in α are

    dJ/dα = ∂J/∂α + (∂J/∂u)(∂u/∂α)          (7)
    dR/dα = ∂R/∂α + (∂R/∂u)(∂u/∂α) = 0      (8)

Based on equations (7) and (8), we can formulate the adjoint equation as in equation (9), with Lagrange multiplier λ:

    dJ/dα = ∂J/∂α + (∂J/∂u)(∂u/∂α) + λᵀ [∂R/∂α + (∂R/∂u)(∂u/∂α)]      (9)

We then select the Lagrange multiplier λ such that the terms containing ∂u/∂α vanish:

    (∂R/∂u)ᵀ λ = −(∂J/∂u)ᵀ      (10)

The resulting equation for the sensitivity of the cost function to changes in the parameter α is then

    dJ/dα = ∂J/∂α + λᵀ (∂R/∂α)      (11)

We see that in equation (11) the sensitivity with respect to α no longer involves ∂u/∂α, so the adjoint solution λ can be reused when investigating different parameters. Additionally, the adjoint equation (10) is linear and typically far cheaper to solve than computing ∂u/∂α directly.
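The cost advantage can be checked numerically on a linear model problem (an assumed example with sign conventions folded into the definitions): governing equations R(u, a) = A u − b(a) = 0 and cost J = cᵀu.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned system matrix
c = rng.normal(size=n)                        # cost gradient: J = c^T u
db_da = rng.normal(size=n)                    # d b / d a for one parameter a

# Direct sensitivity: solve A (du/da) = db/da, i.e. one linear solve
# per parameter, then dJ/da = c^T (du/da).
du_da = np.linalg.solve(A, db_da)
dJ_da_direct = c @ du_da

# Adjoint sensitivity: solve the adjoint system A^T lam = c ONCE, then
# dJ/da = lam^T (db/da) for any number of parameters a.
lam = np.linalg.solve(A.T, c)
dJ_da_adjoint = lam @ db_da

print(np.isclose(dJ_da_direct, dJ_da_adjoint))  # True
```

With many design parameters, the single adjoint solve replaces one forward solve per parameter, which is the efficiency exploited by the adjoint error estimator in this work.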

Appendix C Hyper-Parameter Tuning: Phase 1

Columns: input, mask, reference, depth, channels, kernels, loss, relu, batchnorm, skip type, skip location.

1 sdf prism normal 6 64 16x16 L2 .2 down only middle none none
2 sdf prism normal 6 256 4x4 L2 .05 middle scalar all
3 sdf prism normal 7 64 4x4 L2 .2 down only middle none none
4 sdf prism normal 7 64 2: 8x8 L2 .2 middle none none
5 sdf prism normal 7 64 3: 6x6 L2 .2 middle scalar 1, 2
6 sdf prism normal 7 128 4x4 L2 .05 down only middle tensor 1, 2
7 sdf prism normal 8 64 4x4 L2 .2 down only middle none none
8 sdf prism normal 8 64 8x8 L2 .2 middle none none
9 sdf prism normal 8 64 3: 6x6 L2 .2 middle none none
10 sdf prism normal 8 64 4x4 L2 .05 down only middle scalar 1, 2
11 geo prism normal 6 64 16x16 L2 .2 down only middle none none
12 geo prism normal 6 256 4x4 L2 .05 middle scalar all
13 geo prism normal 7 64 4x4 L2 .2 down only middle none none
14 geo prism normal 7 64 2: 8x8 L2 .2 middle none none
15 geo prism normal 7 64 3: 6x6 L2 .2 middle scalar 1, 2
16 geo prism normal 7 128 4x4 L2 .05 down only middle tensor 1, 2
17 geo prism normal 8 64 4x4 L2 .2 down only middle none none
18 geo prism normal 8 64 8x8 L2 .2 middle none none
19 geo prism normal 8 64 3: 6x6 L2 .2 middle none none
20 geo prism normal 8 64 4x4 L2 .05 down only middle scalar 1, 2
21 sdf dillute_prism normal 6 64 16x16 L2 .2 down only middle none none
22 sdf dillute_prism normal 6 256 4x4 L2 .05 middle scalar all
23 sdf dillute_prism normal 6 64 4x4 L2 0 none tensor all
24 sdf dillute_prism normal 6 64 4x4 L1 0 none tensor all
25 sdf dillute_prism normal 7 64 4x4 L2 .2 down only middle none none
26 sdf dillute_prism normal 7 64 2: 8x8 L2 .2 middle none none
27 sdf dillute_prism normal 7 64 3: 6x6 L2 .2 middle scalar 1, 2
28 sdf dillute_prism normal 7 128 4x4 L2 .05 down only middle tensor 1, 2
29 sdf dillute_prism normal 7 32 4: 6x6 L2 .2 middle scalar 1, 2
30 sdf dillute_prism normal 7 64 4x4 L2 .2 middle scalar 3, 4, 5, 6
31 sdf dillute_prism normal 7 64 4x4 L2 0 none tensor all
32 sdf dillute_prism normal 7 64 4x4 L1 0 none tensor all
33 sdf dillute_prism normal 8 64 4x4 L1 0 none tensor all
34 sdf dillute_prism normal 8 64 4x4 L2 .2 down only middle none none
35 sdf dillute_prism normal 8 32 4: 6x6 L2 .2 middle scalar 1, 2
36 sdf dillute_prism normal 8 64 4x4 L2 .05 down only middle none none
37 sdf dillute_prism normal 8 64 8x8 L2 .2 middle none none
38 sdf dillute_prism normal 8 64 3: 6x6 L2 .2 middle none none
39 sdf dillute_prism normal 8 64 4x4 L2 .05 down only middle scalar 1, 2
40 geo dillute_prism normal 6 64 16x16 L2 .2 down only middle none none
41 geo dillute_prism normal 6 256 4x4 L2 .05 middle scalar all
42 geo dillute_prism normal 6 64 4x4 L2 0 none tensor all
43 geo dillute_prism normal 6 64 4x5 L1 0 none tensor all
44 geo dillute_prism normal 7 64 4x4 L2 .2 down only middle none none
45 geo dillute_prism normal 7 64 2: 8x8 L2 .2 middle none none
46 geo dillute_prism normal 7 64 3: 6x6 L2 .2 middle scalar 1, 2
47 geo dillute_prism normal 7 128 4x4 L2 .05 down only middle tensor 1, 2
48 geo dillute_prism normal 7 32 4: 6x6 L2 .2 middle scalar 1, 2
49 geo dillute_prism normal 7 64 4x4 L2 .2 middle scalar "3
50 geo dillute_prism normal 7 64 4x4 L2 0 none tensor all
51 geo dillute_prism normal 7 64 4x4 L1 0 none tensor all
52 geo dillute_prism normal 8 64 4x4 L1 0 none tensor all
53 geo dillute_prism normal 8 64 4x4 L2 .2 down only middle none none
54 geo dillute_prism normal 8 32 4: 6x6 L2 .2 middle scalar 1, 2
55 geo dillute_prism normal 8 64 4x4 L2 .05 down only middle none none
56 geo dillute_prism normal 8 64 8x8 L2 .2 middle none none
57 geo dillute_prism normal 8 64 3: 6x6 L2 .2 middle none none
58 geo dillute_prism normal 8 64 4x4 L2 .05 down only middle scalar 1, 2
59 sdf geo dillute_prism 7 64 4x4 L2 .2 down only middle none none
60 sdf geo dillute_prism 7 64 2: 8x8 L2 .2 down only middle none none
61 sdf geo dillute_prism 7 128 4x4 L2 .05 down only middle none none

Appendix D Hyper-Parameter Tuning: Phase 2

Columns: skip type, beta 1, beta 2, constrained.

1 scalar 0.9 0.99 yes
2 tensor 0.9 0.99 yes
3 scalar 0.85 0.99 yes
4 tensor 0.85 0.99 yes
5 scalar 0.95 0.99 yes
6 tensor 0.95 0.99 yes
7 scalar 0.9 0.95 yes
8 tensor 0.9 0.95 yes
9 scalar 0.85 0.95 yes
10 tensor 0.85 0.95 yes
11 scalar 0.95 0.95 yes
12 tensor 0.95 0.95 yes
13 scalar 0.9 0.999 yes
14 tensor 0.9 0.999 yes
15 scalar 0.85 0.999 yes
16 tensor 0.85 0.999 yes
17 scalar 0.95 0.999 yes
18 tensor 0.95 0.999 yes
19 scalar 0.9 0.99 no
20 tensor 0.9 0.99 no
21 scalar 0.85 0.99 no
22 tensor 0.85 0.99 no
23 scalar 0.95 0.99 no
24 tensor 0.95 0.99 no
25 scalar 0.9 0.95 no
26 tensor 0.9 0.95 no
27 scalar 0.85 0.95 no
28 tensor 0.85 0.95 no
29 scalar 0.95 0.95 no
30 tensor 0.95 0.95 no
31 scalar 0.9 0.999 no
32 tensor 0.9 0.999 no
33 scalar 0.85 0.999 no
34 tensor 0.85 0.999 no
35 scalar 0.95 0.999 no
36 tensor 0.95 0.999 no