1 Introduction
When designing urban spaces, there is a need to understand the wind environment before the buildings are built to ensure a comfortable environment for the inhabitants. This topic has received significant attention in wind engineering and the scientific literature. Today there are two main methods to analyze the wind environment: experiments in a wind tunnel and simulations using CFD [BLOCKEN201215, JANSSEN2013547, BLOCKEN2009255]. Performing experiments is considered the most accurate way to evaluate the wind environment. CFD simulations are increasingly being accepted as a viable alternative, thanks to extensive efforts in improving the methodology and validating results against wind tunnel experiments and full-scale measurements. However, even though simulations can be cheaper than experiments, they still represent a significant cost.
When designing a new building, it is desirable to see how small changes in the design change the wind flow patterns. Moving a structure closer to its neighbor or adjusting its height would require repeatedly rerunning the whole CFD simulation, or multiple simulations from several wind directions. In the early stages of the design process, an architect might accept slightly lower accuracy in the wind predictions if this means a faster design iteration time.
In that case, less time-consuming methods from the field of machine learning, and in particular deep learning, can potentially allow a more interactive evaluation of the wind environment.
There are several examples of the combination of CFD and deep learning in recent literature. CFDNet [2020cfdnet] introduces a coupled physical-simulation and deep-learning framework for accelerating RANS simulations by adding an iterative refinement stage, consisting of a CNN, in between the warm-up and refinement stages of a physical solver. In this way, they significantly accelerate the convergence of the overall scheme. The model is tested on different geometries unseen during training to evaluate how well their method generalizes. Experiments showed that CFDNet still performed well and was able to make accurate predictions. This work indicates that combining physical models with data-driven machine learning models could be a promising approach for accelerating simulations. In addition, [Thuerey_2020] compare the accuracy of physical solvers with surrogate models using a modified version of the U-Net architecture [ronneberger2015unet] trained in a supervised manner. In contrast to CFDNet, their method is an end-to-end surrogate entirely driven by the neural network. Leaving out the physical constraints is intentional; they instead choose to focus on working with state-of-the-art CNNs and execute detailed evaluations. One problem with models like these is that they cannot guarantee that the predictions meet the necessary constraints of the traditional physics-based algorithms. On the other hand, this makes it a more generic approach, applicable to various other equations beyond RANS. Similarly, [Bhatnagar_2019] show how they use the SDF as input for a CNN autoencoder to train a surrogate model for CFD simulations around differently shaped airfoils. SDF works efficiently with neural networks for shape learning and is widely used in applications such as rendering, segmentation, and extracting structural information of different geometries, essentially providing a universal representation of the different shapes.
All the methods mentioned have in common a multi-channel output consisting of velocity and pressure.
In recent years, Generative Adversarial Networks [goodfellow2014generative] have been shown to be an efficient unsupervised method for learning the underlying distribution of a given dataset. Different extensions of the original architecture have been proposed. StyleGAN [stylegan], for example, is capable of learning to generate fake human faces indistinguishable from real ones. One could imagine that there exists a function that maps any geometrical shape to its corresponding flow field. The surrogate model ffsGAN presented in [supercriticalcfd] tries to do just that. The authors propose a model that leverages the properties of cGANs combined with CNNs to directly establish a one-to-one mapping from a given supercritical airfoil to its corresponding flow field structure. Unlike other methods that use an encoder to map the input to a lower-dimensional space, they have a way to parameterize the airfoils as a 14-dimensional vector. Extending this line of work, FlowGAN
[flowgan] customizes U-Net as the generator to include the Reynolds number and angle of attack. The flow parameters are concatenated with the geometry parameters extracted by the encoder of U-Net before being passed to an MLP network that performs a nonlinear input-output mapping. The output of the MLP network is then decoded by the generator network. In this way, they provide a method for generating solutions to flow fields under various conditions based on observations rather than retraining.
Traditional CFD methods produce high-accuracy results, but they are computationally expensive and do not work well in the design process of new prototypes in a given domain. Obtaining results often takes several hours or days, depending on the prototype's complexity. Deep learning methods can help create an interactive tool for testing new designs, even when they become computationally hard for physical solvers. The experiments in [Guo2016ConvolutionalNN] demonstrate that CNNs can work as surrogate models for physical solvers when given both discrete 2D and 3D bluff shapes. These experiments differ from the other papers mentioned, as they focus more on the interactive application aspect of prototype design for different kinds of bluff shapes. In the 3D domain, [neurips_3d] proposes an architecture based on residual CNNs for CFD prediction using 3D convolutions, enabling the authors to offer designers an interactive tool for prototyping. The dataset they use consists of various geometries representing samples of urban structures. In this way, they can create an interactive tool suitable for city planners. A unique feature of their tool is a network trained in reverse, where you input the target wind flow and it outputs the urban volumes that will produce it. One of their limitations is that only one wind flow direction is considered; usually, multiple directions should be considered to create a representative forecast.
Lastly, [regressioncfd] proposes a regression model using Gaussian Processes to predict, interactively, how fluid flows around three-dimensional objects. In general, it is challenging to handle detailed 3D shapes in a data-driven manner using machine learning approaches, as it requires a consistent parameterization of the input and output of the model. To do this, they propose a PolyCube-map-based parameterization that can be computed at high rates, allowing their method to work efficiently even during interactive design and optimization of prototypes.
In our work, we formulate the wind flow prediction as an image-to-image translation problem [isola2018imagetoimage], and we explore the potential of the most advanced GAN-based architectures for such a task.
The main contributions of this work are as follows:

We rephrase the problem from computing 3D flow fields using computational fluid dynamics to a 2D image-to-image translation problem on the building footprints, predicting the flow field at pedestrian height level.

We propose state-of-the-art GAN architectures for the image-to-image translation process. The generator can produce realistic-looking wind flow assessments conditioned on a given geometry input containing a varying number of buildings.

We perform a systematic comparison of the most advanced GANs, experimenting on several new datasets of various bluff-shaped buildings generated using computational fluid dynamics methods. Further, we perform an experimental study on buildings with different heights, as well as a systematic generalization experiment in which we optimize a model on one of the datasets and investigate how it performs on the others.

We propose a novel extension of known image-to-image translation methods where we inject different positional information into the architectures using the signed distance function and the coordinates of the Cartesian space seen by the convolutional filters.

We conduct an ablation study, through experiments, to test the effect of different attention mechanisms on airflow prediction.

We optimize models for a real scenario, using buildings from a built-up city environment. This allows us to analyze the problem at a more applied and complex scale than previous work.
We organize the paper as follows. In section 2 we clarify the problem we are trying to solve while introducing the architectures’ implementation details and their corresponding objectives. In section 3 we introduce the dataset and metrics used for training and evaluating the models. Experimental results and discussion are given in section 4, and the conclusion is drawn in section 5.
2 Methods
This section clarifies the problem we are trying to solve while introducing the network architectures being compared and their corresponding objectives. We then introduce the two kinds of positional information we propose to add and describe the optimization and training details. Lastly, we examine spectral normalization and the attention mechanism.
2.1 Problem formulation
Given a building’s 3D geometry, we simplify it to a 2D image using grayscale to represent building height. With this, we can formalize the CFD prediction task as an image-to-image translation problem as in [isola2018imagetoimage]. Given our two domains X and Y, building geometry and CFD flow field respectively, we want to learn the mapping function G: X → Y between the image pairs (x, y), having x ∈ X and y ∈ Y. This mapping is visualized in Figure 1. We will compare methods performing this translation using both cGANs and autoencoders. We denote our data as {(x_i, y_i)}, where x_i ∈ X and y_i ∈ Y. To capture the conditional mapping between the image pairs, not only between the two domains, the GAN receives the building geometry as a condition for the CFD flow field to be generated, i.e., G: x ↦ y. As the mapping between geometry and CFD simulation is one-to-one, we would like our model to be deterministic; therefore, we do not give our generators a random noise vector as proposed in [isola2018imagetoimage].
The generator G is optimized to generate outputs that are indistinguishable from the real simulations together with an adversarially trained discriminator D, which is trained to discriminate between real CFD simulations, y, and outputs from the generator, G(x). The adversarial training procedure is shown in Figure 2.
We will investigate data-driven CFD prediction by defining three main frameworks based on state-of-the-art methods in computer vision: 1) the Pix2Pix architecture [isola2018imagetoimage], 2) CycleGAN [zhu2020unpaired], and 3) a U-Net-based autoencoder [ronneberger2015unet].
2.2 Network architectures
Generative modeling is an unsupervised learning task where the model learns the patterns in the input data in order to generate outputs similar to data from the original dataset it was trained on. Generative adversarial networks are an approach to generative modeling using deep learning methods. What sets GANs apart from other generative models is that they learn patterns in datasets in an unsupervised manner by having one part of the model generate the data and another part classify it as real or fake. GANs were first introduced in 2014 by Ian Goodfellow [goodfellow2014generative], and one year later, Alec Radford introduced a more stable version using Deep Convolutional Generative Adversarial Networks (DCGANs) [radford2016unsupervised]. Models like these have advanced from generating low-quality grayscale images of faces to high-resolution 1024 × 1024 images, nearly impossible to distinguish from real faces.
A GAN (see Figure 3) consists of a generative network G, which tries to capture the underlying data distribution of the dataset the network is trained on, and a discriminative model D, which estimates the probability of a sample coming from the real distribution rather than being output from G. The procedure for training a GAN corresponds to a minimax two-player game. Minimax is a term from game theory and describes a strategy for making decisions in a game where the players try to minimize the possible loss from a worst-case scenario. In GANs, this principle is used in the network’s training procedure, where the generator’s task is to maximize the probability of D making a mistake. The value function for this game can be defined as:
(1) \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where p_{\text{data}} is the distribution of the real dataset, while G(z) is the generated output from G based on the input noise vector z. The vector z is randomly drawn from a Gaussian distribution and is what makes G produce different outputs.
2.2.1 Conditional GANs
In an unconditioned generative model, there is no control over what type of data is being generated. A cGAN [mirza2014conditional] can be constructed by simply feeding the data we wish to condition on, y, to both the generator and the discriminator. Such conditioning could be based on class labels or any other auxiliary information. The additional information is combined with the prior noise vector when passed to the generator and discriminator. This requires some modifications to the loss in Equation 1, where we need to include the condition, as we do in Equation 2.
(2) \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]
Image-to-image translation is a graphics problem where the goal is to learn a mapping between an input image and an output image using a dataset built up of pairs of images. This can be learned using a cGAN, where the conditional information is the image you want to translate. For the generator to handle this, the image has to be encoded to a one-dimensional vector, as the generator expects. To perform the image encoding, it is common to use a CNN [isola2018imagetoimage], as we will see in the next section.
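As an illustrative sketch (not the training code used in this work), the conditional value function of Equation 2 can be estimated from batches of discriminator outputs; the function name and interface are our own for illustration:

```python
import numpy as np

def cgan_value(d_real, d_fake):
    """Monte-Carlo estimate of the conditional GAN value function (Equation 2).
    d_real: discriminator outputs D(x | y) on real samples, values in (0, 1);
    d_fake: discriminator outputs D(G(z | y)) on generated samples."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

The discriminator is trained to maximize this value, while the generator is trained to minimize the second term; an undecided discriminator (all outputs 0.5) yields the equilibrium value -2 log 2.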
2.2.2 Pix2Pix
Inspired by cGANs [mirza2014conditional], Pix2Pix [isola2018imagetoimage] uses conditional adversarial networks as a general-purpose solution to image-to-image translation problems, where instead of conditioning on labels, it conditions on an input image and generates a corresponding output image.
One of the contributions of that work is to demonstrate the use of conditional GANs on various problems; here, we show the effectiveness of a Pix2Pix-based architecture for predicting the wind flow. The GAN is built from a U-Net [ronneberger2015unet] generator and a PatchGAN [isola2018imagetoimage] discriminator. The objective for training the GAN is based on the one presented for cGANs, in combination with the L1 distance for regularization:
(3) G^* = \arg \min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)
where \lambda is the constant weight for the L1 distance term.
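A minimal sketch of the generator's side of this objective, assuming the common non-saturating adversarial term; the function and its arguments are illustrative stand-ins, not our implementation:

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Generator objective from Equation 3: adversarial term plus weighted L1.
    d_fake: discriminator outputs D(x, G(x)), values in (0, 1);
    fake: generated image G(x); target: ground-truth image y;
    lam: the constant L1 weight (the Pix2Pix paper uses 100)."""
    adv = -np.mean(np.log(d_fake))        # non-saturating adversarial term
    l1 = np.mean(np.abs(target - fake))   # L1 distance regularization
    return adv + lam * l1
```

The large default weight reflects that the L1 term, not the adversarial term, carries most of the supervision toward the paired target image.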
U-Net [ronneberger2015unet] was first introduced for biomedical image segmentation (see Figure 4). It has an autoencoder structure consisting of an encoder that contracts the input using convolutions and max-pooling, and a decoder that expands the encoded output using upsampling operators. To localize high-resolution features from the input, features from the encoder are combined with the features in the decoder’s upsampling phase. These are called skip-connections and essentially concatenate the channels of an encoder layer with those of the corresponding decoder layer. As a result, the decoder is more or less symmetric to the encoder, yielding a u-shaped architecture, hence the name U-Net. A network like this, with skip-connections, makes sense for an image-to-image translation task like ours because it requires substantial information flow through the layers, including the bottleneck between the encoder and decoder.
PatchGAN. PatchGAN is used as the discriminator of the network and only penalizes the structure of the images at the scale of patches. The discriminator effectively tries to classify whether each N × N patch of an image is real or fake, creating an output matrix consisting of probabilities of whether each patch is real or not. The authors show that N can be much smaller than the image size and still produce high-quality results. Smaller patches give fewer parameters, which run faster, and they can be used on arbitrarily sized images, making it a more general approach. The discriminator runs these patches convolutionally over the whole image, averaging all the responses to provide D’s final output.
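The effective patch size N is simply the receptive field of the convolutional discriminator. As a small sketch, it can be computed from the layer configuration; the layer list shown is the widely used 70 × 70 PatchGAN setup (three stride-2 and two stride-1 4 × 4 convolutions), given here as an illustration:

```python
def receptive_field(layers):
    """Receptive field of one output unit of a stack of conv layers.
    layers: list of (kernel_size, stride) tuples, from first to last layer."""
    rf = 1
    for kernel, stride in reversed(layers):
        # each earlier layer widens the field by its kernel and stride
        rf = (rf - 1) * stride + kernel
    return rf

# Common 70x70 PatchGAN discriminator configuration (illustrative).
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
```

Running `receptive_field(patchgan)` recovers the 70-pixel patch size, showing how a fully convolutional discriminator implicitly defines N.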
2.2.3 CycleGAN
CycleGAN [zhu2020unpaired] presents an alternative way of learning such translations that no longer requires pairs of images, i.e., it works when the training data are not paired. The goal is to learn the mapping functions between two domains X and Y, given samples from both. What makes CycleGAN different is that it includes two mappings, G: X → Y and F: Y → X, in comparison to Pix2Pix, which only includes one. It also introduces two discriminators, D_X and D_Y, one for each domain. For these models to work together, the loss function includes two terms: the adversarial losses for matching the distributions of the generated images to the real distributions, and a cycle consistency loss that prevents the mappings G and F from contradicting each other.
Architecture. For CycleGAN to translate between the two domains X and Y, it uses two generators and two discriminators, one for each translation. The generator is similar to the one in Pix2Pix but appends several residual blocks [he2015deep] between the encoding and decoding blocks. Residual blocks tackle the vanishing/exploding gradient problem, making it possible to build an even deeper generator network. Residual blocks are very similar to the skip-connections in U-Net, but instead of the features being concatenated as new channels before the convolution that combines them, they are added directly to the convolution’s output. This kind of generator was proposed in [johnson2016perceptual] for neural style transfer and super-resolution. As discriminators, the authors used PatchGAN [isola2018imagetoimage] as introduced in Pix2Pix. The full objective for training this architecture is defined as:
(4) \mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)
where \mathcal{L}_{GAN} is the least-squares adversarial loss, \mathcal{L}_{cyc} is the cycle consistency loss, and \lambda controls the relative importance of the two objectives.
Least Squares Adversarial Loss. In the adversarial loss described earlier, we showed the use of the negative log-likelihood as the objective. CycleGAN replaces this with a least-squares loss [mao2017squares], as it has been shown to be more stable during training and to generate higher-quality results. The new adversarial loss function for the network is expressed as:
(5) \mathcal{L}_{LSGAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[(D_Y(y) - 1)^2] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[D_Y(G(x))^2]
Cycle Consistency Loss. Adversarial losses alone cannot guarantee that the learned function maps an individual input x to a desired output y. The CycleGAN authors argue that the learned mapping functions should be cycle-consistent, as shown in Figure 5. The image translation cycle x → G(x) → F(G(x)) ≈ x, from Figure 5(b), shows that the cycle should be able to bring x back and output a result as similar as possible to the original x. This is what we call forward cycle consistency, while the cycle starting from y is called backward cycle consistency. CycleGAN includes both, as the model consists of two generators. The cycle consistency loss uses the L1 loss, as defined here:
(6) \mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{\text{data}}(y)}[\lVert G(F(y)) - y \rVert_1]
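A batch-wise sketch of Equation 6, with the two generators passed in as plain callables for illustration:

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 cycle consistency loss from Equation 6 for mappings G: X -> Y, F: Y -> X.
    G and F are callables acting on arrays; x_batch, y_batch are sample batches."""
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))   # x -> G(x) -> F(G(x)) ≈ x
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))  # y -> F(y) -> G(F(y)) ≈ y
    return forward + backward
```

The loss is exactly zero when G and F invert each other, which is the behavior the cycle terms push the two generators toward.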
2.3 Positional information
We propose to augment the presented architectures by injecting positional information in the form of extra channels. In more detail, as convolutional filters are translation-equivariant, we investigate whether adding positional embeddings with regard to the buildings affects the methods’ performance. Below we define the different kinds of positional information that will be used in our experiments.
2.3.1 SDF - Signed Distance Function
SDF is widely used for rendering and segmentation and works efficiently with neural networks for shape learning [Bhatnagar_2019]. [Guo2016ConvolutionalNN] reports the effectiveness of SDF in representing geometry shapes for CNNs.
A mathematical formulation of the signed distance function of a point p from the boundary \partial\Omega of a set of objects \Omega is:
(7) \text{SDF}(p) = \begin{cases} d(p, \partial\Omega) & \text{if } p \notin \Omega \\ -d(p, \partial\Omega) & \text{if } p \in \Omega \end{cases}
where \Omega denotes the objects and d(p, \partial\Omega) measures the shortest distance of a given point p from the boundary points of the objects. The SDF thus provides a measure of whether a point is inside or outside an object, and how close it is to the closest object. Figure 6 illustrates the contour plot of the SDF for a sampled geometry (see the dataset definitions later). A visualization of the implemented SDF layer can be found in Figure 7.
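As an illustration (not the exact implementation behind Figure 7, and assuming the negative-inside sign convention of Equation 7), a brute-force SDF over a binary footprint grid can be computed as:

```python
import numpy as np

def sdf_grid(mask):
    """Brute-force signed distance function on a 2D grid.
    mask: boolean array, True for cells inside a building footprint.
    Returns positive distances outside the objects and negative inside,
    measured to the nearest boundary cell."""
    H, W = mask.shape
    # boundary cells: object cells with at least one non-object 4-neighbour
    pad = np.pad(mask, 1, constant_values=False)
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    boundary_pts = np.argwhere(mask & ~interior).astype(float)
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # distance from every grid cell to its nearest boundary cell
    dist = np.sqrt(((pts[:, None, :] - boundary_pts[None, :, :]) ** 2).sum(-1))
    d = dist.min(axis=1).reshape(H, W)
    return np.where(mask, -d, d)
```

A production implementation would use a distance transform instead of the quadratic brute-force search, but the result is the same dense positional channel that is concatenated to the network input.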
2.3.2 CoordConv
The second kind of positional information is related to the coordinates of the Cartesian space.
Convolutions are widely used in modern deep learning architectures. One of their strengths is translation equivariance: regardless of where a feature is present in an image, the same filter can be applied. [liu2018coordconv] proposes an extension of vanilla convolutions, allowing filters to know where they are in an image. This is achieved by adding two additional channels that contain the coordinates of the Cartesian space seen by the convolutional filters. This extension is visualized in Figure 8. More precisely, the i coordinate channel is a rank-1 matrix with its first row filled with 0’s, its second row with 1’s, its third with 2’s, etc. The j coordinate channel is similar, but with columns filled with constant values instead of rows [liu2018coordconv]. The values are then normalized before concatenating the channels. [liu2018coordconv] proposes using the CoordConv architecture for GANs by replacing the first convolutional layer in both the generator and discriminator. Similarly to their approach, we propose to add those two channels as input in the proposed image-to-image translation frameworks. The results of these experiments are illustrated in subsection 4.3.
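The two coordinate channels described above can be built in a few lines; this sketch normalizes to [-1, 1], one common choice:

```python
import numpy as np

def coord_channels(h, w):
    """Build the two CoordConv channels for an h-by-w feature map.
    The i channel varies along rows, the j channel along columns;
    both are normalized to [-1, 1] before concatenation."""
    i = np.repeat(np.arange(h, dtype=float)[:, None], w, axis=1)  # rows 0,1,2,...
    j = np.repeat(np.arange(w, dtype=float)[None, :], h, axis=0)  # cols 0,1,2,...
    i = i / (h - 1) * 2.0 - 1.0
    j = j / (w - 1) * 2.0 - 1.0
    return np.stack([i, j])  # concatenated to the input as two extra channels
```

Because the channels are deterministic functions of position, they add no learnable parameters beyond the slightly wider first convolution.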
2.4 Spectral Normalization
A persistent challenge in the training of GANs is the performance control of the discriminator [miyato2018spectral]. One of the most significant challenges when training GANs is the lack of stability when updating the generator’s and discriminator’s weights. Spectral normalization is a weight normalization technique used to stabilize the discriminator’s training. It is computationally light and easy to incorporate into existing GAN architectures. Compared to other regularization techniques like weight normalization, weight clipping, and gradient penalty, spectral normalization has been shown to work better. Using spectral normalization, one controls the Lipschitz constant, the maximum absolute value of the derivative, of the discriminator. It does this by normalizing each network layer’s weights W with the spectral norm \sigma(W). By doing this, the Lipschitz constant of the discriminator equals one, as shown in Equation 8.
(8) \bar{W}_{SN}(W) = \frac{W}{\sigma(W)}, \qquad \sigma(\bar{W}_{SN}(W)) = 1
A Lipschitz constant of one means that the maximum absolute value of the derivative is one. Giving this property to the discriminator makes it more stable during the training of the whole GAN. By constraining the derivative, one makes sure that the discriminator does not learn too fast compared to the generator. One then avoids manually finding a proper balance for updating the two adversarial networks, and essentially facilitates better feedback from the discriminator to the generator.
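In practice, \sigma(W) is estimated cheaply with power iteration rather than a full SVD. A standalone numpy sketch of the normalization step (the deep-learning layer wrapper is omitted):

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Divide W by its largest singular value sigma(W), estimated with
    power iteration, so that sigma of the result is 1."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v = v / np.linalg.norm(v)
        u = W @ v
        u = u / np.linalg.norm(u)
    sigma = u @ W @ v  # Rayleigh-quotient estimate of the spectral norm
    return W / sigma
```

In GAN training only one power-iteration step per update is typically needed, since the weights change slowly between updates.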
2.5 Attention
In the context of neural networks, attention is a technique that imitates how cognitive attention works: the process of concentrating on specific parts of information while ignoring less essential elements. We differentiate between soft and hard attention [showandtellattention], a distinction that is necessary to understand when optimizing attention-based neural networks. When training a neural network, you want the model to be smooth and differentiable. Creating a differentiable model requires the weights of the attention layer to be assigned to all areas of an image, as opposed to selecting only one patch of the image. Soft attention is when we assign weights to all parts of the whole picture. Hard attention, on the other hand, only selects one patch at a time and can only be trained using methods like reinforcement learning, as it is non-differentiable. Using soft attention, the network can essentially filter out the less essential parts of an image and focus its predictions on achieving detailed results in the more critical areas.
This paper focuses on different forms of soft attention used in CNNs to better attend to essential parts of the imagetoimage translation process.
2.5.1 Selfattention
The self-attention mechanism relates various positions of the input to compute an attentive representation of the same sequence. [zhang2019selfattention] proposes SAGAN, a GAN architecture that introduces a self-attention mechanism into convolutional GANs. The self-attention module is complementary to convolutions and helps with modeling long-range, multi-level dependencies across image regions. The self-attention feature maps are calculated from the image features of the previous layer and multiplied by a learnable scale parameter, which allows the model to gradually learn to assign more weight to the attention maps. These maps are then added back to the initial input feature maps to filter which regions are more important to attend to.
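A minimal numpy sketch of this mechanism on a flattened feature map; the projection matrices stand in for the learned 1 × 1 convolutions of SAGAN, and the names are ours:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, Wq, Wk, Wv, gamma):
    """SAGAN-style self-attention over a flattened (C, N) feature map
    (N = H * W spatial positions). Wq, Wk, Wv stand in for the learned
    1x1-convolution projections; gamma is the learnable scale parameter."""
    q, k, v = Wq @ feat, Wk @ feat, Wv @ feat
    attn = softmax(q.T @ k, axis=1)  # (N, N): each position attends to all others
    out = v @ attn.T                 # attention-weighted mix of value features
    return feat + gamma * out        # residual add, scaled by gamma
```

With gamma initialized to zero, the layer starts as an identity and gradually learns how much attention to mix in, matching the training behavior described above.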
2.5.2 Convolutional Block Attention Module
As we investigate the effects of attention for our problem, we looked at another attention module, the Convolutional Block Attention Module (CBAM), proposed in [woo2018cbam] for feed-forward convolutional neural networks. Given an intermediate feature map F, the attentional module sequentially infers attention maps along two separate dimensions, channel and spatial. The attention maps (M_c, M_s) are then multiplied with the input feature map for adaptive feature refinement, summarized as:
(9) F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'
where \otimes denotes element-wise multiplication and F'' is the final refined output.
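The two refinement steps of Equation 9 can be sketched as follows; this is a simplified stand-in (the shared MLP is a single linear layer and the paper's 7 × 7 spatial convolution is omitted), not the exact CBAM implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, w_channel):
    """Minimal CBAM sketch on a (C, H, W) feature map. Channel attention
    pools avg+max descriptors through a shared linear layer w_channel;
    spatial attention pools avg+max over channels."""
    # channel attention M_c(F): one weight per channel, shape (C, 1, 1)
    m_c = sigmoid(w_channel @ feat.mean(axis=(1, 2))
                  + w_channel @ feat.max(axis=(1, 2)))[:, None, None]
    f1 = m_c * feat                                        # F' = M_c(F) ⊗ F
    # spatial attention M_s(F'): one weight per pixel, shape (1, H, W)
    m_s = sigmoid(f1.mean(axis=0) + f1.max(axis=0))[None]
    return m_s * f1                                        # F'' = M_s(F') ⊗ F'
```

The sequential order (channel first, then spatial) follows the configuration the CBAM authors found to work best.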
3 Experimental Setup
In this section, we present our experiments. First, we describe our datasets and how they were created. Second, we give our experimental plan and the evaluation metrics used to measure the effectiveness of our proposed framework for wind flow prediction. Finally, we define all optimization and training details.
3.1 Datasets
To explore our proposed prediction architecture’s generality for CFD airflow simulations, we test the method on various datasets of different complexity. The datasets consist of image pairs of building geometries and CFD simulations. The 3D problem of simulating flow fields for three-dimensional buildings is translated to a 2D problem, where we see the buildings from above. More specifically, the CFD simulation images show the magnitude of each cell’s velocity vector in the flow field. Each dataset can then be formalized as a set of image pairs (x_i, y_i), with x_i ∈ X and y_i ∈ Y. To simplify the problem, we bucketize the CFD result into a discrete set of velocity values. The following list details all datasets used in our experiments:

Wall  : The dataset contains geometries of walls, with respective CFD simulations. The geometries have different center-offsets , in addition to an angle in relation to the wind inlet direction. The dimensions of our input and output are .

Single building  : The dataset contains geometries of single buildings, with a fixed height. The geometries have the same parameterizations as , in addition to varying length and width. The input and output dimensions are equal to .

Two buildings  : The dataset contains geometries of two buildings, with a fixed height. Each building in each geometry has a different center offset , while having the same angle to the wind direction. This positioning is chosen because nearby buildings are often placed symmetrically. The height and width also vary between the buildings in the same geometry. The input and output dimensions are equal to and .

Two buildings with varying height  : The dataset contains geometries with the same parameterization as , except that the height of the buildings is provided as an additional channel. As we now have two scales, building height and airflow velocity magnitude, we need to distinguish between them. Therefore, we input the geometries with an additional channel for building height and use a single-dimension output, caring only about the velocity magnitude, which is the desired target. The input and output dimensions are and , respectively. See Figure 10 for a visualization of the construction of the model input.

Real urban city environment  : The dataset contains 287 geometries from the city center of Oslo, Norway. Compared to the geometries in , these geometries are more complex in shape and allow single buildings to have multiple height values. The dataset was generated by performing simulations on 600 patches of Oslo and using a cropped, centered circle of 300 for the training data. The resulting simulations’ flow fields have a diameter of 300 meters, with a maximum building height of 130 meters, and contain wind velocities up to 15 m/s. They represent actual buildings from the more urban and built-up areas of Norway. The flow field is encapsulated as a circle to allow rotation of geometries and calculation of comfort maps. The geometries are represented equivalently to , having the same input and output dimensions.
Examples from each dataset are visualized in Figure 9.
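The velocity bucketization applied to all datasets above can be sketched as follows; the bin edges shown are hypothetical examples, not the values used to generate our data:

```python
import numpy as np

# Hypothetical velocity bin edges in m/s; the actual bucket count used for
# the datasets is fixed by our post-processing, not by these example values.
EDGES = np.array([1.0, 2.0, 4.0, 6.0, 9.0, 12.0])

def bucketize(velocity_magnitude):
    """Map continuous velocity magnitudes to discrete bucket indices."""
    return np.digitize(velocity_magnitude, EDGES)
```

Discretizing the target this way turns the continuous velocity field into a fixed set of grayscale levels, which simplifies both training and visual comparison of predictions.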
3.2 Generation of training data
The training data for learning is generated using CFD simulations. The simulations for the single-building and two-building datasets are performed in the commercial CFD software Simcenter STAR-CCM+ from Siemens PLM Software. The urban city environment is simulated using the open-source software OpenFOAM v7.
The chosen model solves the incompressible, three-dimensional, steady Navier-Stokes equations governing fluid flow, using the finite volume method on an unstructured grid. An example of the computational grid used for the simulations is shown in Figure 11.
The simulation setup is based on best practice guidelines for CFD simulations of urban flows [franke2007cost, tominaga2008aij]. The turbulence model used is the realizable k-epsilon model.
For the simulations of single and dual buildings, a geometry model with one or two buildings is automatically generated with varying dimensions, origin and orientation. The full 3D velocity field is then obtained from the CFD simulation, and the velocity magnitude in a slice above the ground is extracted for the training data. For the urban city environment, more information on the OpenFOAM simulation setup can be found in [hagbo].
3.3 Experimental plan
To investigate the models’ generality for CFD prediction, we evaluate the methods on the various datasets of different complexity listed above. All experiments are performed using Pix2Pix, CycleGAN, and U-Net, training each model for a total of 70 epochs.
3.3.1 Experiments
We perform experiments on all datasets separately. Qualitative results are shown in Figures [12, 13, 15, 16, 17], while quantitative measurements are listed in Tables [1, 2, 3, 4, 5, 6, 7, 8]. Training times for the proposed method on each dataset are listed in Table 2. On all our datasets, training can be very fast. For example, the results shown in Figure 12 took less than 2 hours of training per model on an Nvidia Tesla V100 32 GB. At inference time, the models perform a forward pass in well under a second.
3.3.2 Investigating generalization
To investigate how well the models generalize the mapping from building geometry to CFD flow field, we perform experiments training on more complex data and evaluating on simpler data, and vice versa. This experimental design was employed because we want to examine whether a model can handle multiple buildings even though it has only been trained on single buildings; occlusion of buildings is one scenario we want to investigate here. Conversely, we want to see if a model trained on two buildings can generalize the mapping well enough to predict the airflow around single buildings. We have executed the two following experiments:
(a) Training the model on single buildings (), evaluating on two buildings (). This allows us to see whether the model can generalize the domain transfer function to a more complex scenario.
(b) Training on two buildings (), evaluating on single buildings (). This experiment measures how well the method generalizes to a simpler task, a single-building CFD simulation.
3.3.3 Stabilizing GANs
As mentioned in subsection 2.4, one of the most significant challenges of training GANs is the lack of stability when updating the generator and discriminator’s weights. Spectral normalization is used to maintain this stability. By constraining the Lipschitz constant of the discriminator to be less than one, the training process should be more stable. Considering the above, we would like to explore what effect spectral normalization has on Pix2Pix and investigate the consequences of keeping the generator and discriminator at a similar skill level throughout training.
Spectral normalization is applied to each layer of the discriminator, and the implementation details are as thoroughly described in [miyato2018spectral].
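The core of the technique can be illustrated with a small power-iteration sketch in NumPy. This is not the PyTorch implementation from [miyato2018spectral], merely an illustration of how a weight matrix is rescaled by its estimated largest singular value so that its spectral norm is approximately one:

```python
import numpy as np

def spectral_normalize(W, n_iter=30):
    """Estimate the largest singular value of W by power iteration
    and divide W by it, so the normalized matrix has spectral norm
    ~1 (the Lipschitz constraint placed on each discriminator layer)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # estimate of the leading singular value
    return W / sigma

W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
W_sn = spectral_normalize(W)  # largest singular value of W_sn is ~1
```

In the actual models, this normalization is re-applied to every discriminator weight at each training step.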
3.3.4 SDF and CoordConv
In subsection 2.3 we propose to augment the introduced architectures by injecting positional information through extra channels. This information could help the network determine which parts of the input are essential for accurate prediction. Therefore, we perform a quantitative analysis to explore the effect of SDF and CoordConv on three different neural networks: Pix2Pix, CycleGAN, and UNet.
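The construction of the extra channels can be sketched as follows. The brute-force distance field below is unsigned (a full SDF would additionally assign negative distances inside the geometry), and the normalization choices are illustrative simplifications:

```python
import numpy as np

def positional_channels(mask):
    """Build extra input channels for an h x w building mask
    (1 = building, 0 = free space).

    Returns a normalized distance field (zero on the buildings) and
    the two CoordConv coordinate channels, each scaled to [-1, 1]."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1)                  # building pixels
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gy.ravel(), gx.ravel()], axis=1)
    # Brute-force distance to the nearest building pixel
    # (fine for small illustrative grids).
    d = np.sqrt(((grid[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).min(1)
    sdf = d.reshape(h, w)
    sdf = sdf / (sdf.max() + 1e-12)  # normalize to [0, 1)
    yy = 2 * gy / (h - 1) - 1        # CoordConv row coordinate
    xx = 2 * gx / (w - 1) - 1        # CoordConv column coordinate
    return sdf, yy, xx

mask = np.zeros((8, 8))
mask[3:5, 3:5] = 1                   # a single square building
sdf, yy, xx = positional_channels(mask)
```

The three arrays are stacked with the geometry mask as additional input channels to the networks.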
3.3.5 Attention
If a model attends to the most critical parts of an input image, the attention mechanism is forcing the model to focus on the details in these specific regions. Given our problem, essential areas could be the wake area immediately behind a building, or turbulent flows around building corners. Based on this, we explore self-attention and CBAM, two attention mechanisms, as described in subsection 2.5. The experiments conduct an ablation study of the attention mechanisms in Pix2Pix’s generator and discriminator.
The implementation details for self-attention and CBAM are described in [zhang2019selfattention, woo2018cbam]. For both attention mechanisms, attention is implemented in the deconvolutional blocks of the UNet generator and is not present during downsampling. The discriminator, on the other hand, applies attention only in the 2nd and 3rd convolutional blocks of the network.
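The idea behind CBAM’s two-stage attention can be sketched on a raw feature map. The spatial branch below uses simple average/max maps instead of the learned 7x7 convolution of [woo2018cbam], and the weights are random placeholders, so this is an illustration of the mechanism rather than the exact module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, W1, W2):
    """Minimal CBAM-style forward pass on a feature map x of shape
    (c, h, w). W1 (c/r, c) and W2 (c, c/r) form the shared MLP of the
    channel-attention branch with reduction ratio r."""
    # Channel attention: shared MLP over avg- and max-pooled descriptors.
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    ca = sigmoid(W2 @ np.maximum(W1 @ avg, 0) + W2 @ np.maximum(W1 @ mx, 0))
    x = x * ca[:, None, None]
    # Spatial attention: avg and max over channels, combined and squashed.
    sa = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * sa[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
W1 = rng.standard_normal((2, 8)) * 0.1   # reduction ratio r = 4
W2 = rng.standard_normal((8, 2)) * 0.1
y = cbam(x, W1, W2)
```

Because both attention maps lie in (0, 1), the module can only re-weight features, never amplify them, which is what lets it emphasize regions such as building wakes.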
3.3.6 Pedestrian comfort in urban areas
Compared to real scenarios from an urban city environment, the datasets used so far could be considered simple. Hence, as our last experiment, we explore how our models perform on a much more complex dataset generated, as mentioned earlier, from parts of the most built-up areas of Oslo. It therefore provides a more realistic scenario for designing an interactive tool for prototyping and city planning.
3.4 Evaluation Details
For evaluations on each dataset, we use a test set containing 20% of the images for the final assessment. We include a set of metrics for evaluating our models’ predictions against the physics solver solution. We denote by $y_i$ and $\hat{y}_i$ the $i$th pixel intensity in the target simulation and the predicted flow field, respectively.
MAE The MAE is calculated for all predicted images produced by the models. The metric is widely used to quantify the difference between predicted and true values in accuracy validation for a model. The lower the MAE score, the better the model is at recreating the corresponding flow field, given a building geometry. The metric is used in [supercriticalcfd] for evaluating CFD airflow predictions. The metric can be formalized as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (10)$$
RMSE We calculate the RMSE of all predicted values against the real simulated flow fields. This score provides a squared-error measure of how well our model can recreate each pixel value. RMSE has the benefit of penalizing large errors more than small ones; for example, an error of 10 contributes four times as much to the sum of squares as an error of 5. The metric is widely used and quantifies CFD prediction errors in [wang2020physicsinformed]. We can mathematically formalize this metric as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (11)$$
MRE This metric allows us to quantify the relative error of velocity magnitude between a model’s predictions and the physics solver solution of the flow field. The metric is scale- and range-invariant; therefore, it can be seen as a better indicator of the quality of a prediction. The metric is commonly used to quantify relative residuals in accuracy validation, is used by [2020cfdnet, Guo2016ConvolutionalNN, Thuerey_2020, flowgan, Bhatnagar_2019], and can be found in the CFD literature [cfdliterature]. MRE is defined as
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{\left|y_i\right|} \qquad (12)$$
These three metrics are the evaluation metrics we apply to our predicted wind flows. To further analyze where our predictions have the most significant errors, we take random samples from the test set and investigate the absolute pixel difference between the simulations and predictions.
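A reference implementation of the three metrics could look as follows. The per-pixel normalization in the MRE and the epsilon guard against zero-velocity pixels are assumptions about the exact form used:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error between target and predicted fields."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean squared error; penalizes large residuals more."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mre(y, y_hat, eps=1e-12):
    """Mean relative error; eps guards against zero-velocity pixels."""
    return np.mean(np.abs(y - y_hat) / (np.abs(y) + eps))

# Toy example on three "pixels"
y = np.array([1.0, 2.0, 4.0])       # target simulation
p = np.array([1.5, 1.5, 5.0])       # model prediction
```

In practice the functions are applied to the full 2D velocity-magnitude fields and averaged over the test set.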
3.4.1 Models
We compare three state-of-the-art models for image-to-image translation on CFD airflow prediction.

Pix2Pix [isola2018imagetoimage]. For each iteration, we update both the generator and the discriminator. The generator uses a UNet architecture [ronneberger2015unet]. For our problem, the input and output might differ in surface appearance, but both consist of the same underlying structure, so their structure is roughly aligned. The generator is designed around this consideration, having skip-connections between each down- and upsampling layer. This way, we circumvent the bottleneck layer: low-level information is passed directly through the skip connections. Each skip connection concatenates all channels at layer i with those at layer n − i, where n is the total number of layers. We use the leaky rectified linear unit (LeakyReLU) and instance normalization in each UNet downsample block, and the rectified linear unit (ReLU) and instance normalization in each upsample block. Using instance normalization has been demonstrated to be effective at image generation tasks [ulyanov2017instance]. To reduce the chance of overfitting, the implementation also uses dropout [stava2014dropout]. The generator itself has 54.4 million parameters. The discriminator consists of a PatchGAN, which tries to classify each patch of the input image as real or fake. It consists of 5 convolutional layers, interleaved with leaky ReLU activations and instance normalization. This discriminator is applied convolutionally across the image, averaging all responses to provide the final output. The discriminator has 2.7 million trainable parameters, giving a total of 57.1 million parameters for the proposed network architecture.
CycleGAN [zhu2020unpaired] consists of two generators, each with nine residual blocks and two deconvolutional layers, interleaved with ReLU, dropout, and instance normalization. The model has two discriminators, each with five convolutional layers with leaky ReLU and instance normalization. In total, the generators and discriminators hold 28.3 million parameters. The model is optimized using the objectives detailed in subsubsection 2.2.3, where details about the architecture and optimization are also given.

UNet [ronneberger2015unet]: An autoencoder-like architecture for image translation. The architecture has the same specifications as described for the UNet generator in model (1). The autoencoder has a total of 54.4 million trainable parameters, as it is identical to the generator of the Pix2Pix model. It is optimized using the L1 distance to the target simulation.
The UNet architecture used in models (1) and (3) is described in detail in subsubsection 2.2.2.
3.5 Optimization and training details
All models are optimized with the Adam solver [kingma2017adam] with a batch size of 1, a learning rate of 0.0002, and momentum parameters β1 = 0.5, β2 = 0.999. We keep the same learning rate for the first 50 epochs and linearly decay the rate to zero over the next 20 epochs.
While optimizing our networks, we update both the generator and the discriminator in each training iteration. The Pix2Pix model is optimized end to end with respect to the objective in Equation 3, CycleGAN with respect to the objective in subsubsection 2.2.3, and UNet with the L1 loss. With CycleGAN, we also use an image pool of size 50. As suggested in [isola2018imagetoimage], we divide the discriminator loss in half to slow its learning relative to the generator.
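The learning-rate schedule can be expressed as a simple function of the epoch; the base rate of 2e-4 in this sketch is an assumption (the common Pix2Pix default):

```python
def lr_at_epoch(epoch, base_lr=2e-4, n_const=50, n_decay=20):
    """Learning rate kept constant for the first n_const epochs,
    then decayed linearly to zero over the next n_decay epochs."""
    if epoch < n_const:
        return base_lr
    frac = (epoch - n_const) / n_decay
    return base_lr * max(0.0, 1.0 - frac)
```

The same schedule is applied to all models so that their optimization budgets are directly comparable.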
All models are trained on an Nvidia Tesla V100 GPU 32 GB using NTNU’s computing cluster IDUN [idun]. See Table 2 for each baseline’s training time.
At inference time, we use the generator, in evaluation mode, without dropout and instance normalization, as opposed to [isola2018imagetoimage]. For our given problem, we want the model to be deterministic concerning the conditional output.
4 Results & Discussion
This section will present our results and discussion. Firstly, we will analyze the overall results for the different architectures on the datasets. Then, we will explore the impact of adding spectral normalization, attention and additional positional information to the input data. Finally, we evaluate our models on a more complex dataset generated from actual buildings in the city of Oslo and discuss how our networks would work in an interactive tool for wind flow predictions.
4.1 Neural networks for wind flow prediction
Table 1 compares Pix2Pix, CycleGAN and UNet in terms of MAE, RMSE and MRE. We visualize randomly selected predictions from the test set in Figure 12; for more samples, see Figures 21, 22 and 23.
To quantify our findings, we have listed all metrics for all models in Table 1. We see that for all datasets, UNet yields lower residuals than Pix2Pix and CycleGAN. In terms of MRE, Pix2Pix performs considerably worse than UNet on every dataset, and the CycleGAN model performs over ten times worse than UNet on one of the datasets. We suspect a reason for this might be that the UNet model is optimized using only the L1 loss. As we can see in Figure 12, its predictions are more continuous than those of Pix2Pix and CycleGAN; hence its residuals are lower than they would be if it enforced only the 20 discrete velocity values present in the actual simulation. Also, we see that Pix2Pix outperforms the CycleGAN architecture on this task. This difference could be due to CycleGAN’s additional objectives: the inverse mapping from prediction back to geometry and a cycle-consistency loss. These objectives are not necessary for the given task of CFD prediction and could be why it performs worse than the other models.
Model architecture
Dataset  Metric  Pix2Pix  CycleGAN  UNet
MAE  0.0139 ± .0004  0.1651 ± .0151  0.0090 ± .0001
RMSE  0.0329 ± .0009  0.2642 ± .0105  0.0290 ± .0005
MRE  0.1261 ± .0002  1.1806 ± .1452  0.0841 ± .0006
MAE  0.0482 ± .0014  0.0944 ± .0253  0.0345 ± .0003
RMSE  0.0828 ± .0014  0.1795 ± .0773  0.0701 ± .0007
MRE  0.2553 ± .0080  0.4785 ± .0831  0.1941 ± .0010
MAE  0.0554 ± .0009  0.1022 ± .0048  0.0438 ± .0002
RMSE  0.0971 ± .0013  0.1678 ± .0041  0.0847 ± .0003
MRE  0.2889 ± .0053  0.5612 ± .0260  0.2400 ± .0017
To investigate where our models’ predictions have the largest errors, we illustrate in Figure 13 the absolute difference between the airflow simulation and the predicted flow fields for Pix2Pix, CycleGAN, and UNet. We see that CycleGAN performs much worse than the other two models throughout the entire flow field. Comparing Pix2Pix and UNet, we see that UNet has a smaller average residual. Also, we see that Pix2Pix has residuals around where the airflow velocity magnitude changes bin value, which is less present for UNet, as its prediction is more continuous than the GAN architectures’ predictions.
Model  Time per epoch  # of parameters (M) 
Pix2Pix  60 sec  57.1 
CycleGAN  236 sec  28.3 
UNet  75 sec  54.4 
All models generate predictions in well under a second at inference time, but training times and model sizes vary. In Table 2 we see that Pix2Pix and UNet are similar in training time and model size, while CycleGAN, consisting of two generators and two discriminators, has a longer training time of 236 seconds per epoch but is smaller in size. The size difference might also explain why the models perform differently. Both the Pix2Pix and UNet models rely heavily on the UNet architecture, with skip-connections to pass low-level information between the down- and upsampling layers. These skip-connections are not present in CycleGAN and might be a factor in this model’s reduced performance.
We see that during training, on the validation set, Pix2Pix and UNet slowly converge to an MAE and MRE close to each other, while CycleGAN’s residuals are more fluctuating and less stable throughout the training. For the training loss (Figure 25), we see that the generator and L1 losses converge rather quickly, while the discriminator losses decrease throughout the whole training period, yielding a high discriminator accuracy.
4.2 The impact of Spectral Normalization
As described in subsection 2.4, a persisting challenge in the training of GANs is the performance control of the discriminator [24]. In the initial phase of hyperparameter tuning, we found it hard to find a good ratio between updating the generator and the discriminator. As a result, our discriminator could almost perfectly distinguish the target model distribution early in the training process, which essentially stopped the GAN from learning. GANs can use spectral normalization to handle problems like this, and in Table 3 we compare the results when training a Pix2Pix model with and without spectral normalization. In the model utilizing spectral normalization, the technique is applied at every layer of the discriminator. From the result table, we see a relatively significant decrease in error across all metrics. On the most complex of the datasets, we see a 16%, 9%, and 10% drop in error for MAE, RMSE, and MRE, respectively.
Dataset  Metric  Pix2Pix  Pix2Pix w/SN  Improvement (%)
MAE  0.0139 ± .0004  0.0099 ± .0005  28.78 ± 5.5
RMSE  0.0329 ± .0009  0.0256 ± .0015  22.19 ± 7.6
MRE  0.1261 ± .0002  0.0915 ± .0040  27.44 ± 3.3
MAE  0.0482 ± .0014  0.0384 ± .0007  20.33 ± 3.7
RMSE  0.0828 ± .0014  0.0735 ± .0009  11.23 ± 2.5
MRE  0.2553 ± .0080  0.2178 ± .0028  14.69 ± 3.7
MAE  0.0554 ± .0009  0.0464 ± .0004  16.25 ± 2.1
RMSE  0.0971 ± .0013  0.0882 ± .0004  9.17 ± 1.6
MRE  0.2889 ± .0053  0.2578 ± .0033  10.76 ± 2.7
Pix2Pix uses PatchGAN as a discriminator, as explained in subsubsection 2.2.2. The discriminator predicts, for each patch, whether it thinks the patch is real or fake. To calculate the accuracy of the PatchGAN, we average the predictions over all patches; when the average is over 0.5, we consider it as predicting the image to be real. The two graphs in Figure 14 visualize the discriminator’s accuracy during training. This metric is a good indication of how strong the discriminator is compared to the generator; as noted, a discriminator that perfectly distinguishes the target model distribution is not learning anything new. Comparing the two graphs, we observe that the Pix2Pix model with spectral normalization holds a lower discriminator accuracy during training than the one without. We believe this gives better feedback to the generator and is the reason for the drop in error across all metrics shown in Table 3.
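The accuracy computation can be sketched as follows, assuming the discriminator outputs per-patch probabilities in [0, 1]:

```python
import numpy as np

def patchgan_accuracy(real_patches, fake_patches):
    """Accuracy of a PatchGAN discriminator. Each argument is a batch
    of patch-probability maps; an image is counted as classified
    'real' when its patch average exceeds 0.5."""
    real_ok = real_patches.mean(axis=(1, 2)) > 0.5   # should be real
    fake_ok = fake_patches.mean(axis=(1, 2)) <= 0.5  # should be fake
    return np.concatenate([real_ok, fake_ok]).mean()

real = np.array([[[0.9, 0.8], [0.7, 0.6]]])  # patch avg 0.75 -> real, correct
fake = np.array([[[0.6, 0.6], [0.6, 0.6]]])  # patch avg 0.60 -> real, wrong
acc = patchgan_accuracy(real, fake)
```

A persistently near-perfect accuracy is the symptom of the over-strong discriminator that spectral normalization mitigates.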
4.3 The impact of SDF and CoordConv
To investigate the effect of SDF and CoordConv, both of which provide positional input to the model, we evaluated the different models with these features implemented both separately and combined. In Table 4 we see that the results are quite close to each other; however, the models having an additional channel with a normalized SDF for the geometry perform better than the vanilla architectures.
Using CoordConv for the first convolutional layer in both the generator and discriminator also yields a lower average residual than the vanilla model. When combining both SDF and CoordConv, the residuals are lower than when either is applied separately. More quantitatively, if we investigate the MAEs, we see that the injected positional information results in a significant performance improvement. For CycleGAN, we see an MAE reduction of 6.4% for SDF and 16.2% for CoordConv; combining them both, we see a gain of 21.7%. This trend holds for both Pix2Pix and UNet, where injecting the two types of positional information yields an improvement of 7.2% and 3.7%, respectively.
We can conclude that these methods positively affect the architectures in predicting CFD airflow velocities and that positional information may be helpful for RANS prediction.
Positional information  
Model  Metric  None  SDF  CoordConv  SDF & CoordConv 
Pix2Pix  MAE  0.0554 ± .0009  0.0522 ± .0011  0.0524 ± .0001  0.0514 ± .0002
RMSE  0.0971 ± .0013  0.0928 ± .0009  0.0949 ± .0004  0.0918 ± .0006
MRE  0.2889 ± .0053  0.2719 ± .0048  0.2799 ± .0010  0.2709 ± .0029
Pix2Pix w/SN  MAE  0.0464 ± .0004  0.0449 ± .0003  0.0458 ± .0001  0.0451 ± .0008
RMSE  0.0882 ± .0004  0.0870 ± .0004  0.0883 ± .0001  0.0868 ± .0010
MRE  0.2578 ± .0033  0.2512 ± .0032  0.2604 ± .0027  0.2524 ± .0046
CycleGAN  MAE  0.1022 ± .0048  0.0957 ± .0240  0.0856 ± .0108  0.0800 ± .0263
RMSE  0.1678 ± .0041  0.1560 ± .0289  0.1491 ± .0163  0.1430 ± .0409
MRE  0.5612 ± .0260  0.5528 ± .1374  0.4822 ± .0563  0.4581 ± .1415
UNet  MAE  0.0438 ± .0002  0.0427 ± .0002  0.0440 ± .0004  0.0422 ± .0003
RMSE  0.0847 ± .0003  0.0829 ± .0004  0.0850 ± .0006  0.0821 ± .0004
MRE  0.2400 ± .0017  0.2322 ± .0024  0.2434 ± .0034  0.2298 ± .0036
4.4 The influence of attention
We have shown in earlier sections that both spectral normalization and embedded positional information positively affect our models. Hence, we retain these features as we experiment with attention in this section. The results when applying self-attention and CBAM are presented in Table 5. Here we test attention in the generator and the discriminator both jointly and separately. We also perform one additional experiment for each method, taking the best result for each attention mechanism and embedding the positional information; we embed both CoordConv and SDF, as this has been shown to give the best results in earlier experiments. All the experiments use Pix2Pix with spectral normalization.
From the results in Table 5, we see that adding attention only in the discriminator has a negative effect for both self-attention and CBAM. It is hard to say why the attention maps seem to disturb the discriminator in evaluating the predictions. Still, one could argue that all information given to the discriminator is essential, and trying to filter out the less critical parts works against its purpose in this case. On the contrary, attention in the generator seems to work better and gives results similar to what we obtained earlier when embedding the positional information. We then tried to combine attention in the generator and discriminator. This combination did not positively affect the predictions, which makes sense given how the attention mechanism affected the discriminator. From this, we concluded that adding the attention mechanism to just the generator was the best approach.
For the final experiment with attention, we embedded positional information together with the attention mechanisms. The addition of positional information also had a positive effect when combined with attention. Comparing the two attention mechanisms, the best results are obtained with CBAM, which shows a lower error on all metrics and is more stable during training. Given that we embedded the positional information in separate input channels, we believe CBAM performed best because it infers attention maps along two separate dimensions, both channel- and spatial-wise.
When we compare these results with those previously obtained on the same dataset without attention in Table 4, we see a slight drop in error: a 3.4% decrease in MAE, a 2.0% reduction in RMSE, and a 2.1% decrease in MRE when applying the CBAM attention mechanism in addition to spectral normalization, CoordConv, and SDF on the Pix2Pix model, compared to the same model without CBAM. We conclude that adding attention could help the model make better predictions for simple geometries.
Pix2Pix w/ SN  MAE  RMSE  MRE 
0.0575 ± .0056  0.0998 ± .0084  0.3516 ± .0501
0.0464 ± .0019  0.0887 ± .0018  0.2568 ± .0103
0.0587 ± .0083  0.1028 ± .0106  0.3660 ± .0773
, Coord & SDF  0.0446 ± .0017  0.0873 ± .0022  0.2518 ± .0076
0.0551 ± .0018  0.0984 ± .0022  0.3053 ± .0134
0.0468 ± .0008  0.0872 ± .0010  0.2614 ± .0029
0.0527 ± .0013  0.0954 ± .0011  0.2931 ± .0099
, Coord & SDF  0.0436 ± .0006  0.0851 ± .0006  0.2472 ± .0023
4.5 Experiment on buildings with varying height
To evaluate how CFD prediction performs for buildings with varying heights, we have experimented with the proposed architectures on this dataset. The resulting metrics, describing the prediction-simulation residuals, can be found in Table 6. As we can see, the models are able to generalize the target mapping between geometry and flow fields for buildings with varying heights. Comparing the results in Table 6 and Table 1, the metrics are quite similar to the earlier experiments. All models yield satisfactory residuals, leading us to believe that the models can generalize the target mapping for a more complex dataset with buildings of different heights.
Model  MAE  RMSE  MRE
Pix2Pix  0.0450 ± .0005  0.0821 ± .0006  0.1045 ± .0014
CycleGAN  0.0813 ± .0045  0.1329 ± .0067  0.2105 ± .0195
UNet  0.0390 ± .0001  0.0760 ± .0003  0.0922 ± .0006
Looking at Figure 16, we see that the models capture the different building heights. Furthermore, the velocity around the buildings is greater for the taller buildings, which is what one would expect.
4.6 Generalization between data of varying complexity
We wanted to investigate how well our models generalize to predicting airflow velocities around building geometries of different complexity from those they were trained on. Samples from the generalization experiment are found in Figure 17. We can see that both the Pix2Pix and UNet models can somewhat predict the airflow for single buildings, even though they have been trained on two buildings. The results in Table 7 show the mean absolute error and standard deviation over test sets of 20 geometry-simulation pairs. In Table 7 we see, for experiment (b), that UNet performs marginally better than Pix2Pix for this generalization task, while CycleGAN yields the highest residual.
Pix2Pix  CycleGAN  UNet
(a)  0.1093 ± .031  0.1137 ± .032  0.1115 ± .032
(b)  0.0897 ± .026  0.1338 ± .009  0.0861 ± .026
For the other task (a), i.e., predicting airflow for two buildings while optimized on single buildings, we see that Pix2Pix has a smaller absolute residual than CycleGAN and UNet. For this task, CycleGAN performs comparatively well, with an absolute difference close to that of UNet.
4.7 Interactive tool for wind flow assessment in urban areas
We perform experiments on the more complex urban dataset with the Pix2Pix model. We want to determine whether or not a neural-network-based architecture, trained in a data-driven manner, is capable of generating accurate enough wind flow predictions for an interactive tool for city planning. As introduced, the dataset was generated by performing simulations on multiple 600 m patches of Oslo, Norway, keeping a 300 m centered crop. In addition to containing fewer examples, this problem presents a much more realistic scenario and drastically increases the complexity compared to the earlier datasets.
Table 8 displays the results of our experiments. We observe that the injection of positional information does not impact the results as significantly as before: there is no clear pattern showing that the positional information improves predictions for the more complex scenario, and if there is an improvement, it is not significant. The conditional input contains drastically more building area, which is both more complex and more scattered than before. This suggests that the positional information is less informative than it was for the simpler datasets.
Additionally, we have done experiments with both attention and spectral normalization. As before, we see an improvement when applying spectral normalization to the discriminator, which is expected to benefit from stable training. When the attention mechanism is applied, we observe a slight increase in error. This decline in performance could indicate that attention does not necessarily improve the model when the building geometries get more complicated. Comparing the most accurate Pix2Pix model with UNet, we observe that Pix2Pix is unable to yield lower residuals than UNet. While the difference in performance might be insignificant, the training process for UNet is more straightforward than the training cycle for GANs, and the model has fewer parameters as it does not include a discriminator. This could suggest that the GAN architecture is not necessarily the best architecture for this problem.
One of the most crucial things when building an interactive tool is that the predictions are fast and accurate enough. A benefit of using neural networks for this is that pretrained models can be saved, loaded, and served on a server and produce predictions in a matter of seconds.
A pedestrian wind comfort map illustrates the annual wind conditions pedestrians experience at ground level at a site; see Figure 18 for an example. To produce these maps, one combines wind simulations, statistical weather data, and a set of defined comfort criteria [teigen2020influence]. The comfort map shown in Figure 18 is computed by producing wind predictions for eight different wind inlet directions. Each pixel is then categorized using weather data for the specific location, in the form of a wind rose, together with the comfort criteria. As in Lawson’s wind comfort criteria [windcriteria], we have five classes: sitting, standing, strolling, business walking, and uncomfortable. Each class has its own wind speed range and is represented in Figure 18. Since comfort maps need wind velocities for at least eight different wind inlet directions, sometimes up to 36, this supports our claim that the tool benefits from fast-predicting models.
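The per-pixel classification step can be sketched as follows. The wind-speed thresholds below are hypothetical placeholders, and Lawson-style criteria are defined via exceedance frequencies rather than a frequency-weighted mean speed, so this is a simplification of the actual procedure:

```python
import numpy as np

# Hypothetical wind-speed thresholds (m/s) separating the five
# comfort classes; the real criteria values differ.
THRESHOLDS = [2.5, 4.0, 6.0, 8.0]
CLASSES = ["sitting", "standing", "strolling",
           "business walking", "uncomfortable"]

def comfort_class(speeds, frequencies):
    """Classify one pixel from its predicted wind speed for each of
    the eight inlet directions, weighted by wind-rose frequencies."""
    mean_speed = float(np.dot(speeds, frequencies))
    idx = int(np.searchsorted(THRESHOLDS, mean_speed))
    return CLASSES[idx]

freqs = np.full(8, 1 / 8)                       # uniform wind rose
calm = comfort_class(np.full(8, 1.0), freqs)    # mean speed 1.0 m/s
windy = comfort_class(np.full(8, 9.0), freqs)   # mean speed 9.0 m/s
```

Applying such a classification to every pixel of the eight directional predictions yields the comfort map.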
Figure 19 presents the wind rose used in the production of the maps in Figures 18 and 24; Figure 24 displays examples with predictions and comfort maps. The wind rose is calculated by interpolating historical hourly wind statistics from the last five years from nearby areas, given a flow field’s geographical coordinate, but has been altered to have stronger winds. This is done to enable the usage of all comfort classes, allowing comparison of the maps. The wind rose is divided into eight wind directions and displays the frequencies of different wind velocities for each direction.
The cloud solution shown in Figure 20 predicts comfort maps through an interactive map. The tool performs eight predictions for the selected area, one for each of the eight wind directions in the wind rose above. To predict the wind flow from different wind directions, we rotate the conditioning input, as the models are trained on simulations where the wind inlet always comes from the left. While the application is at an early stage, it has demonstrated substantial promise as an interactive tool capable of delivering accurate predictions for CFD analysis in an urban environment.
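The rotation of the conditioning input can be sketched for the four cardinal directions; the four diagonal directions would additionally require an interpolating image rotation, which is omitted in this sketch:

```python
import numpy as np

def rotate_geometry(mask, direction_deg):
    """Rotate the conditioning input so that the chosen wind direction
    is mapped to the left-to-right inlet the models were trained on.
    np.rot90 handles multiples of 90 degrees exactly."""
    assert direction_deg % 90 == 0, "only cardinal directions in this sketch"
    return np.rot90(mask, k=direction_deg // 90)

mask = np.array([[1, 0],
                 [0, 0]])
rot = rotate_geometry(mask, 90)  # counter-clockwise quarter turn
```

After prediction, the output field is rotated back before the directional results are combined into a comfort map.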
Positional information  
Model  Metric  None  SDF  CoordConv  SDF & CoordConv 
Pix2Pix  MAE  0.0732 ± .0004  0.0749 ± .0019  0.0732 ± .0006  0.0746 ± .0009
RMSE  0.1389 ± .0007  0.1412 ± .0053  0.1380 ± .0015  0.1405 ± .0031
MRE  0.1280 ± .0011  0.1316 ± .0018  0.1291 ± .0019  0.1317 ± .0011
Pix2Pix w/SN  MAE  0.0692 ± .0006  0.0707 ± .0002  0.0691 ± .0005  0.0708 ± .0004
RMSE  0.1304 ± .0010  0.1344 ± .0009  0.1304 ± .0006  0.1343 ± .0019
MRE  0.1224 ± .0011  0.1260 ± .0005  0.1228 ± .0016  0.1266 ± .0008
Pix2Pix w/SN & CBAM  MAE  0.0727 ± .0011  0.0777 ± .0066  0.0717 ± .0005  0.0771 ± .0019
RMSE  0.1353 ± .0011  0.1425 ± .0118  0.1333 ± .0005  0.1396 ± .0042
MRE  0.1298 ± .0022  0.1428 ± .0142  0.1284 ± .0006  0.1442 ± .0042
CycleGAN  MAE  0.2746 ± .2657  0.0962 ± .0080  0.6507 ± .0444  0.1101 ± .0110
RMSE  0.4127 ± .3762  0.1796 ± .0177  0.9720 ± .0430  0.2027 ± .0148
MRE  0.4146 ± .3016  0.1724 ± .0157  0.8415 ± .0697  0.2069 ± .0237
UNet  MAE  0.0673 ± .0002  0.0682 ± .0004  0.0669 ± .0002  0.0679 ± .0006
RMSE  0.1272 ± .0001  0.1290 ± .0008  0.1268 ± .0001  0.1287 ± .0007
MRE  0.1200 ± .0003  0.1241 ± .0014  0.1202 ± .0007  0.1237 ± .0021
5 Conclusion
We investigated two adversarial networks, Pix2Pix and CycleGAN, along with a UNet autoencoder, to perform image-to-image translation from conditional building geometries to their corresponding wind flows. The presented results show that the models can produce realistic outputs conditioned on the input for all the different datasets. Also, the models made predictions in a significantly shorter time than traditional CFD methods. Furthermore, our experimental study of injecting positional information about the buildings showed that SDF and CoordConv can help the network make accurate predictions. More precisely, by combining both, we got a performance improvement of 7.2%, 21.7%, and 3.7% for Pix2Pix, CycleGAN, and UNet, respectively. Observing this, we conclude that the injection of positional information can benefit the airflow prediction task. Our results have also demonstrated a 10% and 4% drop in MRE on the most complex synthetic dataset and on the urban dataset, respectively, when applying spectral normalization to stabilize training. Moreover, on the simpler datasets, models implementing attention scored better than those without it.
We cannot conclude that GANs are better suited for this domain than other kinds of neural networks, as we saw promising results when experimenting with UNet. While the performances are almost equivalent, the training process for GANs is more complex than that of the UNet, as it involves a second network. This could suggest that the GAN is not necessarily the best-suited architecture for this problem.
It is hard to say whether the models learn the underlying Reynolds-averaged Navier-Stokes equations. Still, by looking at the absolute-difference plots, we observe that the areas downstream of the buildings have the highest error. Errors in these areas indicate that our models might not be accurate enough to provide final simulations in the most critical regions. However, our experiments on the urban city environment showed that we could use a GAN as the underlying model for an interactive design tool. We consider the results accurate enough, especially when the goal is to produce comfort maps that classify the velocities as in Lawson’s wind comfort criteria; such maps are at least as accurate as the raw airflow velocity predictions due to the scaling, averaging, and binning of the velocities.
5.1 Further Work
CFD lets us solve the governing equations of fluid dynamics for complex engineering problems. CFD is today used in a wide range of industries; some examples are air resistance for airplanes and cars, wind and wave loads on buildings and marine structures, and heat and mass transfer in chemical processing plants. These simulations provide a detailed understanding of the fluid flow, but they are complex and computationally costly. This complexity currently makes processes like generative design and optimization cumbersome, and interactive design impossible. This thesis rephrased the problem from computing 3D flow fields using CFD to a 2D image-to-image translation problem. Another approach is to compute the aerodynamic forces on a given geometry. Such 3D geometries are often fed into the CFD software via a surface triangulation encoded in .STL or .OBJ file formats, which are supported by many software packages and widely used for rapid prototyping. An approach like this would require some modifications to the underlying models performing the wind predictions.
Since the simpler UNet model trained in a supervised way scored better on several of the metrics listed, further work should look into other architectures in addition to GANs. An architecture that has grown in popularity in the last couple of years is the graph neural network (GNN). DeepMind showed in some of their latest work how to learn to simulate complex physics with graph networks [sanchezgonzalez2020learning] in various physical domains, including fluid dynamics. Incorporating more of the physical equations into the methods could help optimize the deep learning model, verify the results, and perform uncertainty estimation of the generated output.
Furthermore, solving more complex inputs like whole cities probably requires a different approach than conditioning on the entire geometry at once. One opportunity could be to iterate over the prediction area in a more hierarchical way, where the geometries condition on slices earlier in the flow field of the city.