1 Introduction
Machine learning techniques have become pervasive in recent years due to numerous algorithmic advances and the accessibility of computational power. Accordingly, they have been adopted for many applications in graphics, such as generating terrains [(GDG17)], highresolution faces synthesis [(KALL17)] and cloud rendering [(KMM17)]. In fluid simulation, machine learning techniques have been used to replace [(LJS15)], speed up [(TSSP17)] or enhance existing solvers [(XFCT18)].
Given the amount of available fluid simulation data, datadriven approaches have emerged as attractive solutions. Subspace solvers [(TLP06)], fluid resimulators [(KD13)] and basis compressors [(JSK16)]
are examples of recent efforts in this direction. However, these methods usually represent fluids using linear basis functions, e.g., constructed via Singular Value Decomposition (SVD), which are less efficient than their nonlinear counterparts. In this sense, deep generative models implemented by convolutional neural networks (CNNs) show promise for representing data in reduced dimensions due to their capability to tailor nonlinear functions to input data.
In this paper, we propose the first generative neural network that fully constructs dynamic Eulerian fluid simulation velocities from a set of reduced parameters. Given a set of discrete, parameterizable simulation examples, our deep learning architecture generates velocity fields that are incompressible by construction. In contrast to previous subspace methods [(KD13)], our network achieves a wide variety of fluid behaviors, ranging from turbulent smoke to gooey liquids (Figure 1).
The Deep Fluids CNN enhances the state of the art of reducedorder methods (ROMs) in four ways: efficient evaluation time, a natural nonlinear representation for interpolation, data compression capability and a novel approach for latent space simulations. Our CNN can generate a full velocity field in constant time, contrasting with previous approaches which are only efficient for sparse reconstructions [(TLP06)]. Thanks to its 700 speedup compared to regular simulations, our approach is particularly suitable for animating physical phenomena in realtime applications such as games, VR and surgery simulators.
Our method is not only capable of accurately and efficiently recovering learned fluid states, but also generates plausible velocity fields for input parameters that have no direct correspondence in the training data. This is possible due to the inherent capability of deep learning architectures to learn representative features of the data. Having a smooth velocity field reconstruction when continuously exploring the parameter space enables various applications that are particularly useful for the prototyping of expensive fluid simulations: fast construction of simulations, interpolation of fluids with different parameters, and time resampling. To handle applications with extended parameterizations such as the moving smoke scene shown in Section 5.2, we couple an encoder architecture with a latent space integration network. This allows us to advance a simulation in time by generating a sequence of suitable latent codes. Additionally, the proposed architecture works as a powerful compression algorithm for velocity fields with compression rates that outperform previous work [(JSK16)] by two orders of magnitude.
To summarize, the technical contributions of our work include:

The first generative deep learning architecture that fully synthesizes plausible and fully divergencefree 2D and 3D fluid simulation velocities from a set of reduced parameters.

A generative model for fluids that accurately encodes parameterizable velocity fields. Compression rates of 1300 are achieved, as well as 700 performance speedups compared to using the underlying CPU solver for resimulation.

An approach to encode simulation classes into a latent space representation through an autoencoder architecture. In combination with a latent space integration network to advance time, our approach allows flexible interactions with flow simulations.

A detailed analysis of the proposed method, both when reconstructing samples that have a direct correspondence with the training data and intermediate points in the parameter space.
2 Related Work
Reducedorder Methods
Subspace solvers aim to accelerate simulations by discovering simplified representations. In engineering, these techniques go back decades [(Lum67)], but were introduced to computer graphics by Treuille2006 and Gupta:2007 . Since then, improvements have been made to make them modular Wicke2009 , consistent with widelyused integrators Kim2013 , more energypreserving Liu:2015 and memoryefficient Jones2016
. A related "Laplacian Eigenfunctions” approach
deWitt2012 has also been introduced and refined Gerszewski:2015 , removing the need for snapshot training data when computing the linear subspace. The basis functions used by these methods are all linear however, and various methods are then used to coerce the state of the system onto some nonlinear manifold. Exploring the use of nonlinear functions, as we do here, is a natural evolution. One wellknown limitation of reducedorder methods is their inability to simulate liquids because the nonlinearity of the liquid interface causes the subspace dimensionality to explode. For example, in solidfluid coupling, usually the fluid is computed directly while only the solid uses the reduced model Lu:2016:TCF . Graphbased methods for precomputing liquid motions Stanton:2014 have had some success, but only under severe constraints, e.g. the user viewpoint must be fixed. In contrast, we show that the nonlinearity of a CNNbased approach allows it to be applied to liquids as well.Machine Learning & Fluids
Combining fluid solvers with machine learning techniques was first demonstrated by Ladicky2015 . By employing Regression Forests to approximate the NavierStokes equations on a Lagrangian system, particle positions and velocities were predicted with respect to input parameters for a next time step. Regression Forests are highly efficient, but require handcrafted features that lack the generality and abstraction power of CNNs. An LSTMbased method for predicting changes of the pressure field for multiple subsequent time steps has been presented by Wiewel2018 , resulting in significant speedups of the pressure solver. For a single time step, a CNNbased pressure projection was likewise proposed Tompson2016 ; Yang2016 . In contrast to our work, these models only replace the pressure projection stage of the solver, and hence are specifically designed to accelerate the enforcement of divergencefreeness. To visually enhance low resolution simulations, Chu2017 synthesized smoke details by looking up precomputed patches using CNNbased descriptors, while Xie2018
proposed a GAN for super resolution smoke flows. Other works enhance FLIP simulations with a learned splash model
Um2017 , while the deformation learning proposed by Prantl2017 shares our goal of capturing sets of fluid simulations. However, it only partially employs CNNs and focuses on signed distance functions, while our work targets the velocity spaces of a broad class of fluid simulations. LatticeBoltzmann steadystate flow solutions are recovered by CNN surrogates using signed distance functions as input boundary conditions in Guo2016 . Farimani2017 use Generative Adversarial Networks (GANs) GoodFellow2014 to train steady state heat conduction and steady incompressible flow solvers. Their method is only demonstrated in 2D and the interpolation capabilities of their architecture are not explored. For both methods, the simulation input is a set of parameterized initial conditions defined over a spatial grid, and the output is a single steady state solution. Recently, Umetani2018 developed a datadriven technique to predict fluid flows around bodies for interactive shape design, while Ma et al. maDrl2018 have demonstrated deep learning based fluid interactions with rigid bodies.Machine Learning & Physics
In the physics community, neural networks and deep learning architectures for approximating, enhancing and modeling solutions to complex physics problems are gaining attention. A few recent examples are Carleo2016
using reinforcement learning to reduce the complexity of a quantum manybody problem,
Ling2016employing deep neural networks to synthesize Reynolds average turbulence anisotropy tensors from highfidelity simulation data, and
Paganini2017 modeling calorimeter interactions with electromagnetic showers using GANs. GANs have also been employed to generate Ravanbakhsh2016 and deconvolve Schawinski2017 galaxy images, and reconstruct threedimensional porous media Mosser2017 . As we focus on generative networks for known parameterizations, we will not employ learned, adversarial losses. Rather, we will demonstrate that a high quality representation can be learned by constructing a suitable direct loss function.3 A Generative Model For Fluids
Fluids are traditionally simulated by solving the inviscid momentum and mass conservation equations, where and are the fluid velocity and pressure, is the material derivative and represents external forces. The viscosity can be included, but simulations for visual effects usually rely on numerical dissipation instead. For a given set of simulated fluid examples, our goal is to train a CNN that approximates the original velocity field data set. By minimizing loss functions with subsequent convolutions applied to its input, CNNs organize the data manifold into shiftinvariant feature maps.
Numerical fluid solvers work by advancing a set of fully specified initial conditions. By focusing on scenes that are initially parameterizable by a handful of variables, such as the position of a smoke source, we are able to generate samples for a chosen class of simulations. Thus, the inputs for our method are parameterizable data sets, and we demonstrate that accurate generative networks can be trained in a supervised way.
3.1 Loss Function for Velocity Reconstruction
The network’s input is characterized by a pair , where
is a single velocity vector field frame in
dimensions (i.e. for 2D and for 3D) with height , width and depth (1 for 2D), generated using the solver’s parameters . For the 2D example in Figure 2, is the combination of position and width of the smoke source, and the current time of the frame. Due to the inherent nonlinear nature of the NavierStokes equations, these three parameters (i.e. position, width, and time) yield a vastly different set of velocity outputs.For fitting fluid samples, our network uses velocityparameter pairs and updates its internal weights by minimizing a loss function. This process is repeated by random batches until the network minimizes a loss function over all the training data. While previous works have proposed loss functions for natural images, e.g., norms, MSSSIM ZhaoGFK15 , and perceptual losses Johnson2016 ; Ledig2016 , accurate reconstructions of velocity fields have not been investigated. For fluid dynamics it is especially important to ensure conservation of mass, i.e., to ensure divergencefree motion for incompressible flows. We therefore propose a novel stream function based loss function defined as
(1) 
is the network output and is a simulation sample from the training data. The curl of the model output is the reconstruction target, and it is guaranteed to be divergencefree by construction, as . Thus, implicitly learns to approximate a stream function (i.e. for 2D and for 3D) that corresponds to a velocity sample .
While this formulation is highly suitable for incompressible flows, regions with partially divergent motion, such as extrapolated velocities around the free surface of a liquid, are better approximated with a direct velocity inference. For such cases, we remove the curl term from Equation (1), and instead use
(2) 
where the output of represents a velocity field with .
In both cases, simply minimizing the distance of a highorder function approximation to its original counterpart does not guarantee that their derivatives will match. Consider the example shown in the inset image: given a function (black line, gray circles), two approximations (red line, top image; blue line, bottom image) of it with the same average distances are shown. In the upper image derivatives do not match, producing a jaggy reconstructed behavior; in the bottom image both values and derivatives of the distance are minimized, resulting in matching derivatives. With a sufficiently smooth data set, highfrequency features of the CNN are in the null space of the distance minimization and noise may occur. We discuss this further in the supplemental material.
Thus, we augment our loss function to also minimize the difference of the velocity field gradients. The velocity gradient is a secondorder tensor that encodes vorticity, shearing and divergence information. Similar techniques, as image gradient difference loss Mathieu2015 , have been employed for improving frame prediction on video sequences. However, to our knowledge, this is the first architecture to employ gradient information to improve velocity field data. Our resulting loss function is defined as
(3) 
where for incompressible flows and for compressible flows, and and are weights used to emphasize the reconstruction of either the velocities or their derivatives. In practice, we used for normalized data (see Section 6.1). The curl of the velocity and its gradients are computed internally by our architecture and do not need to be explicitly included in the data set.
3.2 Implementation
For the implementation of our generative model we adopted and modified the network architecture from Berthelot2017 . As illustrated in Figure 3, our generator starts by projecting the initial parameters into an dimensional vector of weights via fully connected layers. The dimension of depends on the network output and on a custom defined parameter . With , is calculated by , meaning that the minimum supported number of cells in one dimension is 8. Additionally, we constrain all our grid dimensions to be divisible by . Since we use a fixed number of feature maps per layer, the number of dimensions of is and those will be expanded to match the original output resolution.
The dimensional vector is then reshaped to a tensor. As shown in Figure 3, the generator component is subdivided into small () and big blocks (). For small blocks, we perform (most of our examples used
) flat convolutions followed by Leaky Rectified Linear Unit (LReLU) activation functions
maas2013lrelu . We substituted the Exponential Liner Unit (ELU) activation function in the original method from Berthelot2017 by the LReLU as it yielded sharper outputs when minimizing the loss function. Additionally, we employ residual skip connections He2015 , which are an elementwise sum of feature maps of input and output from each . While the concatenative skip connections employed by Berthelot2017 are performed between the first hidden states and the consecutive maps with doubling of the depth size to 256, ours are applied to all levels of s with a fixed size of 128 feature maps. After the following upsample operation, the dimension of the output from a after passes is . Our experiments showed that performing these reductions to the feature map sizes with the residual concatenation improved the network training time without degradation of the final result.4 Extended Parameterizations
Scenes with a large number of parameters can be challenging to parameterize. For instance, the dynamically moving smoke source example (Figure 11) can be parameterized by the history of control inputs, i.e., , where and represent the smoke source position and the reconstructed velocity field at time , respectively. In this case, however, the number of parameters grows linearly with the number of frames tracking user inputs. As a consequence, the parameter space would be infeasibly large for datadriven approaches, and would be extremely costly to cover with samples for generating training data.
To extend our approach to these challenging scenarios, we add an encoder architecture to our generator of Section 3, and combine it with a second smaller network for time integration (Section 4.1), as illustrated in Figure 4. In contrast to our generative network, the encoder architecture maps velocity field frames into a parameterization , in which is a reduced latent space that models arbitrary features of the flow in an unsupervised way and is a supervised parameterization to control specific attributes Kulkarni2015 . Note that this separation makes the latent space sparser while training, which in turn improves the quality of the reconstruction. For the moving smoke source example in Section 5.2, and encodes positions used to control the position of the smoke source.
The combined encoder and generative networks are similar to Deep Convolutional autoencoders Vincent2010 , where the generative network acts as a decoder. The encoding architecture is symmetric to our generative model, except that we do not employ the inverse of the curl operator and the last convolutional layer. We train both generative and encoding networks with a combined loss similar to Equation (3), as
(4) 
where is the part of the latent space vector constrained to represent control parameters , and is a weight to emphasize the learning of supervised parameters. Note that the
distance is applied to control parameters unlike vector field outputs, as it is a standard cost function in linear regression. As before, we used
for all our normalized examples (Section 6.1). With this approach we can handle complex parameterizations, since the velocity field states are represented by the remaining latent space dimensions in . This allows us to use latent spaces which do not explicitly encode the time dimension as a parameter. Instead, we can use a second latent space integration network that generates a suitable sequence of latent codes.4.1 Latent Space Integration Network
The latent space only learns a diffuse representation of time by the velocity field states . Thus we propose a latent space integration network for advancing time from reduced representations. The network takes an input vector which is a concatenation of a latent code at current time and a control vector difference between user input parameters . The parameter has the same dimensionality as the supervised part of our latent space, and serves as a transition guidance from latent code to . The output of is the residual between two consecutive states. Thus, a new latent code is computed with as seen in Figure 4.
For improved accuracy we let look ahead in time, by training the network on a window of sequential latent codes with an loss function:
(5) 
where is recursively computed from to . Our window loss Equation (5) is designed to minimize not only errors on the next single step integration but also errors accumulated in repeated latent space updates. We found that yields good results, and a discussion of the effects of different values of is provided in the supplemental material.
We realize
as a multilayer perceptron (MLP) network. The rationale behind choosing MLP instead of LSTM is that
is designed to be a navigator on the manifold of the latent space, and we consider these integrations as controlled individual steps rather than physically induced ones. The network consists of three fully connected layers coupled with ELU activation functions. We employ batch normalization and dropout layers with probability of 0.1 to avoid overfitting. Once the networks
and are trained, we use Algorithm 1 to reconstruct the velocity field for a new simulation. The algorithm starts from an initial reduced space that can be computed from an initial incompressible velocity field. The main loop consists of concatenating the reduced space and the position update into ; then the latent space integration network computes , which is used to update to . Finally, the generative network reconstructs the velocity field by evaluating .5 Results
In the following we demonstrate that our Deep Fluids CNN can reliably recover and synthesize dynamic flow fields for both smoke and liquids. We refer the reader to the supplemental video for the corresponding animations. For each scene, we reconstruct velocity fields computed by the generative network and advect densities for smoke simulations or surfaces for liquids. Vorticity confinement or turbulence synthesis were not applied after the network’s reconstruction, but such methods could be added as a postprocessing step. We trained our networks using the Adam optimizer kingma2014adam for 300,000 iterations with varying batch sizes to maximize GPU memory usage (8 for 2D and 1 for 3D). For the time network , we use 30,000 iterations. The learning rate of all networks is scheduled by a cosine annealing decay Loshchilov2016 , where we use the learning range from Smith2015 . Scene settings, computation times and memory consumptions are summarized in Table 1. Fluid scenes were computed with Mantaflow mantaflow using an Intel i76700K CPU at 4.00 GHz with 32GB memory, and CNN timings were evaluated on a 8GB NVIDIA GeForce GTX 1080 GPU. Networks are trained on a 12GB NVIDIA Titan X GPU.
5.1 2D Smoke Plume
A sequence of examples which portray varying, rising smoke plumes in a rectangular container is shown in Figure 2, where advected densities for different initial source positions (top) and widths (bottom) are shown. Since visualizing the advected smoke may result in blurred flow structures, we display vorticities instead, facilitating the understanding of how our CNN is able to reconstruct and interpolate between samples present in the data set. Additionally, we use the hat notation to better differentiate parameters that do not have a direct correspondence with the ground truth data (e.g., for an interpolated position on the xaxis). Our training set for this example consists of the combination of samples with varying source widths and samples with varying positions . Each simulation is computed for frames, using a grid resolution of and a domain size of . The network is trained with a total of unique velocity field samples.
Reconstruction with Direct Correspondences to the Data Set
To analyze the reconstruction power of our approach, we compare generated velocities for parameters which have a direct correspondence to the original data set, i.e. the ground truth samples. Figure 5 shows vorticity plots comparing the ground truth (G.t.) and our CNN output for two different frames. The CNN shows a high reconstruction quality, where coarse structures are almost identically reproduced, and fine structures are closely approximated.
Sampling at Interpolated Parameters
We show the interpolation capability of our approach in Figure 6. Left and right columns show the CNN reconstructions at ground truth correlated positions and , while the second column shows a vorticity plot interpolated at . The third column shows the simulated ground truth for the same position. For positions not present in the original data, our CNN synthesizes plausible new motions that are close to ground truth simulations.
5.2 3D Smoke Examples
Smoke & Sphere Obstacle
Figure 7 shows a 3D example of a smoke plume interacting with a sphere computed on a grid of size . The training data consists of ten simulations with varying sphere positions, with the spaces between spheres centroid samples consisting of 0.06 in the interval . The left and right columns of Figure 7 show the CNNreconstructed simulations at positions and , while the second column presents the interpolated results using our generative network at . Even with a sparse and hence challenging training data set, flow structures are plausibly reconstructed and compare favorably with ground truth simulations (third column) that were not present in the original data set.
Smoke Inflow and Buoyancy
A collection of simulations with varying inflow speed (columns) and buoyancy (rows) is shown in Figure 1 for the ground truth (left) and our generative network (right). We generated 5 inflow velocities (in the range ) along with 3 different buoyancy values (from to ) for 250 frames. Thus, the network was trained with unique velocity fields. Figure 8 demonstrates an interpolation example for the buoyancy parameter. The generated simulations on the left and right (using a buoyancy of and ) closely correspond to the original data set samples, while the second simulation is reconstructed by our CNN using an interpolated buoyancy of . We show the ground truth simulation on the third image for a reference comparison. Our method recovers structures accurately, and the plume shape matches the reference ground truth.
Rotating Smoke
We trained our autoencoder and latent space integration network for a smoke simulation with a periodically rotating source using 500 frames as training data. The source rotates in the plane with a period of 100 frames. This example is designed as a stress test for extrapolating time using our latent space integration network. In Figure 9, we show that our approach is able to correctly capture the periodicity present in the original data set. Moreover, the method successfully generates another 500 frames, resulting in a simulation that is longer than the original data.
Moving Smoke
A smoke source is moved in the plane along a path randomly generated using Perlin noise. We sampled 200 simulations on a grid of size for 400 frames  a subset is shown in Figure 10  and used them to train our autoencoder and latent space integration networks. In Figure 11, we show a moving smoke source whose motion is not part of the training data and was computed by integrating in the latent space. We extrapolate in time to increase the simulation duration by (i.e., 800 frames). The network generates a plausible flow for this unseen motion over the full course of the inferred simulation. Although the results shown here were rendered offline, the high performance of our trained model would allow for interactive simulations.
5.3 3D Liquid Examples
Spheres Dropping on a Basin
We demonstrate that our approach can also handle splashing liquids. We use a setup for two spheres dropping on a basin, which is parameterized by the initial distance of the spheres, as well as by the initial drop angles along the plane relative to the basin. We sample velocity field sequences by combining 5 different distances and 10 different angles; Figure 12 shows of the training samples. With frames in time, the network is trained with unique velocity fields. We used a singlephase solver and extrapolated velocities from the liquid to the air phase before training (extrapolation size cells). Figure 13, middle, shows our result for an interpolated angle of and a sphere distance of , given two CNNreconstructed input samples on the left () and right (). Our results demonstrate that the bulk of the liquid dynamics are preserved well. Small scale details such as highfrequency structures and splashes, however, are particularly challenging and deviate from the reference simulations.
Viscous Dam Break
In this example, a dam break with four different viscosity strengths was used to train the network. Our method can reconstruct simulations with different viscosities accurately, and also interpolate between different viscosities with high fidelity. In Figure 14, the CNNreconstructed, greencolored liquids have direct correspondences in the original dataset; the pinkcolored simulations are interpolation results between the two nearest green samples. Additionally, it is also possible to increase the viscosity over time as shown in Figure 15. The results show that this works reliably although the original parameterization does neither support timevarying viscosities nor do the training samples represent such behavior.
Slow Motion Fluids
Our supplemental video additionally shows an interesting use case that is enabled by our CNNbased interpolation: the generation of temporally upsampled simulations. Based on a trained model we can create slowmotion effects, which we show for the liquid drop and dam break examples.
6 Evaluation and Discussion
6.1 Training
Our networks are trained on normalized data in the range
. In case of velocity fields, we normalize them by the maximum absolute value of the entire data set. We found that batch or instance normalization techniques do not improve our velocity fields output, as the highest (lowest) pixel intensity and mean deviation might vary strongly within a single batch. Frames from imagebased data sets have a uniform standard deviation, while velocity field snapshots can vary substantially. Other rescaling techniques, such as standardization or histogram equalization, could potentially further improve the training process.
Convergence of the Training
The presented architecture is very stable and all our tests have converged reliably. Training time highly depends on the example and the targeted reconstruction quality. Generally, 3D liquid examples require more training iterations (up to 100 hours of training) in order to get high quality surfaces, while our smoke examples finished on average after 72 hours of training.
Figure 16 shows a convergence plot of the 2D smoke example, with training iterations on the axis and error on the axis. The superimposed images show clearly how quality increases along with training iterations. After about iterations, the smoke plume is already reconstructed with good accuracy. This corresponds to roughly hours of training on our hardware. Scaling tests with various 2D grid resolutions (, , , ) have shown that the training speed scales proportionally with the resolution, while keeping the mean absolute error at a constant level.
6.2 Performance Analysis
Table 1 summarizes the statistics of all presented examples. In terms of wallclock time, the proposed CNN approach generates velocity fields up to 700
faster than resimulating the data with the underlying CPU solver. Some care must be taken when interpreting this number because our Tensorflow network runs on the GPU, while the original Mantaflow code runs on the CPU. Fluid simulations are known to be memory bandwidthlimited
Kim:2008:HAO , and the bandwidth discrepancy between a GTX 1080 (320 GB/s) and our Intel desktop (25.6 GB/s) is a factor of 12.5. However, even if we conservatively normalize by this factor, our method achieves a speedup of up to . Thus, the core algorithm is still at least an order of magnitude faster. To facilitate comparisons with existing subspace methods, we do not include the training time of our CNN when computing the maximum speedup, as precomputation times are customarily reported separately. Instead, we include them in the discussion of training times below.Contrary to traditional solvers, our approach is able to generate multiple frames in time independently. Thus, we can efficiently concatenate CNN queries into a GPU batch, which then outputs multiple velocity fields at once. Adding more queries increases the batch size (Table 1, 5th column, number in brackets), and the maximum batch size depends on the network size and the hardware’s memory capacity. Since we are using the maximum batch size possible for each scene, the network evaluation time scales inversely with the maximum batch size supported by the GPU. Due to the inherent capability of GPUs to efficiently schedule floating point operations, the time for evaluating a batch is independent of its size or the size of the network architecture. Additionally, our method is completely oblivious to the complexity of the solvers used to generate the data. Thus, more expensive stream function Ando2015 or energypreserving Mullen2009 solvers could potentially be used with our approach, yielding even larger speedups.
In contrast, computing the linear basis using traditional SVDbased subspace approaches can take between 20 Kim2013 and 33 Wicke2009 hours. The process is noniterative, so interrupting the computation can yield a drastically inferior result, i.e. the most important singular vector may not have been discovered yet. Stanton:2013:NGP reduced the precomputation time to 12 hours, but only by using a 110node cluster. In contrast, our iterative training approach is fully interruptible and runs on a single machine.
Compression
The memory consumption of our method is at most 30 MB, which effectively compresses the input data by up to 1300. Previous subspace methods Jones2016 only achieved ratios of , hence our results improve on the stateoftheart by two orders of magnitude. We have compared the compression ability of our network to FPZIP lindstrom2006fast , a data reduction technique often used in scientific visualization meyer2017data . FPZIP is a predictionbased compressor using Lorenzo predictor, which is designed for large floatingpoint data sets in arbitrary dimension. For the two data sets of Section 5.1 and Section 5.2, we reached a compression of and , respectively. In comparison, FPZIP achieves a compression of for both scenes with 16 bits of precision (with a mean absolute error comparable to our method). When allowing for a larger mean absolute error, FPZIP achieves a and compression. I.e., the data is more than larger and has a reduced quality compared to our encoding. Thus, our method outperforms commonly used techniques for compressing scientific data. Details can be found in the supplemental material.
Grid  Simulation  Eval. Time  Speed Up  Data Set  Network  Compression  Training  

Scene  Resolution  # Frames  Time (s)  (ms) [Batch]  ()  Size (MB)  Size (MB)  Ratio  Time (h) 
Smoke Plume  21,000  0.033  0.052 [100]  635  2064  12  172  5  
Smoke Obstacle  6,600  0.491  0.999 [5]  513  31143  30  1038  74  
Smoke Inflow  3,750  0.128  0.958 [5]  128  10322  29  356  40  
Liquid Drops  7,500  0.172  1.372 [3]  125  39813  30  1327  134  
Viscous Dam  600  0.984  1.374 [3]  716  2389  29  82  100  
Rotating Smoke  500  0.08  0.52 [10]  308  995  38  26  49  
Moving Smoke  80,000  0.08  0.52 [10]  308  159252  38  49 
* We optimize the network for subspace simulations rather than the quality of reconstruction, so we do not take this number into account when evaluating the maximal compression ratio.
6.3 Quality of Reconstruction and Interpolation
Training Data
Several factors affect the reconstruction and interpolation quality. An inherent problem of machine learning approaches is that quality strongly depends on the data used for training. In our case, the performance of our generative model for interpolated positions is sensitive to the input sampling density and parameters. If the sampling density is too coarse, or if the output abruptly changes with respect to the variation of parameters, errors may appear on reconstructed velocity fields. These errors include the inability to accurately reconstruct detailed flow structures, artifacts near obstacles, and especially ghosting effects in the interpolated results. An example of ghosting is shown in the inset image where only 11 training samples are used (left), instead of the 21 (right) from Section 5.1.
Target Quantities
We have also experimented with training directly with density values (inset image, left) instead of the velocity fields (inset image, right). In case of densitytrained networks, the dynamics fail to recover the nonlinear nature of momentum conservation and artifacts appear. Advecting density with the reconstructed velocity field yields significantly better results. A more detailed discussion about the quality of the interpolation regarding the number of input samples and discrepancies between velocity and density training is presented in the supplemental material.
Velocity Loss
Boundary Conditions
The proposed CNN is able to handle immersed obstacles and boundary conditions without additional modifications. Figure 18 shows sliced outputs for the scene from Figure 7 which contains a sphere obstacle. We compare velocity (top) and vorticity magnitudes (bottom). The first and last images show the reconstruction of the CNN for positions that have correspondences in the training data set. The three images in the middle show results from linearly blending the closest velocity fields, our CNN reconstruction and the ground truth simulation, from left to right respectively. In the case of linearly blended velocity fields, ghosting arises as features from the closest velocity fields are superimposed Thuerey2016 , and the nonpenetration constraints for the obstacle are not respected, as velocities are present inside the intended obstacle positions. In Figure 19, we plot the resulting velocity penetration errors. Here we compute the mean absolute values of the velocities inside the voxelized sphere, normalized by the mean sum of the velocity magnitudes for all cells around a narrow band of the sphere. Boundary errors are slightly higher for interpolated parameter regions (orange line in Figure 19), since no explicit constraint for the object’s shape is enforced. However, the regularized mean error still accounts for less than 1% of the maximum absolute value of the velocity field. Thus, our method successfully preserves the nonpenetration boundary conditions.
Liquidair Interface
Due to the separate advection calculation for particles in our FLIP simulations, smaller splashes can leave the velocity regions generated by our CNNs, causing surfaces advected by reconstructed velocities to hang in midair. Even though the reconstructed velocity fields closely match the ground truth samples, liquid scenes are highly sensitive to such variations. We removed FLIP particles that have smaller velocities than a threshold in such regions, which was sufficient to avoid hanging particles artifacts.
6.4 Extrapolation and Limitations
Extrapolation with Generative Model
We evaluated the extrapolation capabilities for the case where only the generative part of our Deep Fluids CNN (Section 3) is used. Generally, extrapolation works for sufficiently small increments beyond the original parameter space. Figure 20 shows an experiment in which we used weights that were up to 30% of the original parameter range (). The leftmost images show the vorticity plot for the maximum value of the range for the position (top), inflow size (middle), and time (bottom) parameters of the 2D smoke plume example. The rightmost images show the maximum variation of parameters, in which the simulations deteriorate in quality. In practice, we found that up to of extrapolation still yielded plausible results.
Limitations
Our Deep Fluids CNN is designed to generate velocity fields for parameterizable scenes. As such, our method is not suitable for reconstructing arbitrary velocity fields of vastly different profiles by reduction to a shared latent representation. As discussed in Section 6.3, there is also no enforcement of physical constraints such as boundary conditions for intermediate interpolated parameters. Thus, the capability of the network to reconstruct physically accurate samples on interpolated locations depends on the proximity of the data samples in the parameter space. Additionally, the reconstruction quality of the autoencoder and latent space integration networks are affected by the size of the latent space , and there is a possible issue of temporal jittering because of lack of the gradient loss on Equation (5). We provide an extended discussion regarding the reconstruction quality and the latent space size on our supplemental material.
7 Conclusion
We have presented the first generative deep learning architecture that successfully synthesizes plausible and divergencefree 2D and 3D fluid simulation velocities from a set of reduced parameters. Our results show that generative neural networks are able to construct a wide variety of fluid behaviors, from turbulent smoke to viscous liquids, that closely match the input training data. Moreover, our network can synthesize physically plausible motion when the input parameters are continuously varied to intermediate states that were not present during training. In addition, we can handle complex parameterizations in a reduced latent space, enabling flexible latent space simulations by using a latent space integration network.
Our solver has a constant evaluation time and is considerably faster (up to 700) than simulations with the underlying CPU solver, which makes our approach attractive for resimulation scenarios where input interactions can be parameterized. These performance characteristics immediately suggest applications in games and virtual environments. As high resolution fluid simulations are also known to demand large disk and memory budgets, the compression characteristics of our algorithm (with up to 1300) render the method attractive for movie productions as well.
Our CNN architecture was carefully designed to achieve high quality fluid simulations, which is why the loss function considers both the velocity field and its gradient. Over the course of evaluating many alternatives, we found that the most important factor to simulation quality was the amount of training data. If the data sets are too sparse, artifacts appear, and important flow structures are missing. We address this issue by simply increasing the number of training samples, but in scenarios where data was directly captured or the simulation times are prohibitive, this may not be feasible. Improving the reconstruction quality of interpolated states over sparsely sampled data sets is an open direction for future work.
Overall, we found that the proposed CNN is able to reproduce velocity fields accurately. However, for smallscale details or near discontinuities such as boundary conditions, the network can sometimes smooth out fine flow structures. A possible future research direction is the exploration of generative adversarial networks (GANs), partial convolutions Liu2018 , joint training with other fields such as SDF, or alternative distance measures to enhance the accuracy for fine structures in the data. Additionally, it is an interesting direction to improve the latent space integration network in various ways, such as using an endtoend training with the autoencoder or the gradient loss for temporal smoothness.
Generally, combining fluid simulations and machine learning is largely unexplored, and more work is needed to evaluate different architectures and physicsinspired components. Our CNNbased algorithm is the first of its kind in the fluids community, and we believe that the speed, interpolation and compression capabilities can enable a variety of future applications, such as interactive liquid simulations Prantl2017 or implementation of fluid databases.
8 Acknowledgments
This work was supported by the Swiss National Science Foundation (Grant No. 200021_168997) and ERC Starting Grant 637014.
References
 (ATW15) Ando R., Thuerey N., Wojtan C.: A stream function solver for liquid simulations. ACM Transactions on Graphics 34, 4 (2015), 53:1–53:9.
 (BSM17) Berthelot D., Schumm T., Metz L.: BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv:1703.10717.
 (CT17a) Carleo G., Troyer M.: Solving the quantum manybody problem with artificial neural networks. Science 355, 6325 (2017).
 (CT17b) Chu M., Thuerey N.: Datadriven synthesis of smoke flows with CNNbased feature descriptors. ACM Transactions on Graphics 36, 4 (2017), 1–14.
 (DLF12) De Witt T., Lessig C., Fiume E.: Fluid simulation using Laplacian eigenfunctions. ACM Trans. Graph. 31, 1 (2012), 1–11.
 (FGP17) Farimani A. B., Gomes J., Pande V. S.: Deep Learning the Physics of Transport Phenomena. arXiv:1709.02432.
 (GDG17) Guerin E., Digne J., Galin E., Peytavie A., Wolf C., Benes B., Martinez B.: Interactive ExampleBased Terrain Authoring with Conditional Generative Adversarial Networks. ACM Transactions on Graphics (proceedings of Siggraph Asia 2017) 36, 6 (2017).
 (GKSB15) Gerszewski D., Kavan L., Sloan P.P., Bargteil A. W.: Basis Enrichment and Solidfluid Coupling for Modelreduced Fluid Simulation. Computer Animation and Virtual Worlds 26 (2015).
 (GLI16) Guo X., Li W., Iorio F.: Convolutional Neural Networks for Steady Flow Approximation. In Proc. of Knowledge Discovery and Data Mining (2016), pp. 481–490.
 (GN07) Gupta M., Narasimhan S. G.: Legendre fluids: a unified framework for analytic reduced space modeling and rendering of participating media. In Symp. on Computer Anim. (2007), pp. 17–25.
 (GPAM14) Goodfellow I., PougetAbadie J., Mirza M., Xu B., WardeFarley D., Ozair S., Courville A., Bengio Y.: Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27. 2014, pp. 2672–2680.
 (HZRS16) He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In CVPR (2016), pp. 770–778.

(JAFF16)
Johnson J., Alahi A., FeiFei L.:
Perceptual losses for realtime style transfer and superresolution.
In
European Conference on Computer Vision
(2016), Springer, pp. 694–711.  (JSK16) Jones A. D., Sen P., Kim T.: Compressing Fluid Subspaces. In Symposium on Computer Animation (2016), pp. 77–84.
 (KA15) Kingma D., Adam J. B.: A method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015), vol. 5.
 (KALL17) Karras T., Aila T., Laine S., Lehtinen J.: Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196.
 (KD13) Kim T., Delaney J.: Subspace fluid resimulation. ACM Transactions on Graphics 32, 4 (2013), 1.
 (Kim08) Kim T.: Hardwareaware Analysis and Optimization of Stable Fluids. In Interactive 3D Graphics and Games (2008), pp. 99–106.
 (KMM17) Kallweit S., Müller T., McWilliams B., Gross M., Novák J.: Deep scattering: Rendering atmospheric clouds with radiancepredicting neural networks. ACM Trans Graph 36, 6 (2017).
 (KWKT15) Kulkarni T. D., Whitney W. F., Kohli P., Tenenbaum J.: Deep convolutional inverse graphics network. In Advances in neural information processing systems (2015), pp. 2539–2547.
 (LH16) Loshchilov I., Hutter F.: SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv:1608.03983.
 (LI06) Lindstrom P., Isenburg M.: Fast and efficient compression of floatingpoint data. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2006), 1245–1250.
 (LJF16) Lu W., Jin N., Fedkiw R.: Twoway Coupling of Fluids to Reduced Deformable Bodies. In SCA (2016), pp. 67–76.
 (LJS15) Ladický L., Jeong S., Solenthaler B., Pollefeys M., Gross M.: Datadriven fluid simulations using regression forests. ACM Transactions on Graphics 34, 6 (2015), 1–9.
 (LKT16) Ling J., Kurzawski A., Templeton J.: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics 807 (2016), 155–166.
 (LMH15) Liu B., Mason G., Hodgson J., Tong Y., Desbrun M.: Modelreduced variational fluid simulation. ACM Transactions on Graphics 34, 6 (2015), 244.
 (LRS18) Liu G., Reda F. A., Shih K. J., Wang T., Tao A., Catanzaro B.: Image inpainting for irregular holes using partial convolutions. arXiv:1804.07723.
 (LTH17) Ledig C., Theis L., Huszár F., Caballero J., Cunningham A., Acosta A., Aitken A. P., Tejani A., Totz J., Wang Z., et al.: Photorealistic single image superresolution using a generative adversarial network. In CVPR (2017), vol. 2, p. 4.
 (Lum67) Lumley J. L.: The Structure of Inhomogeneous Turbulent Flows. In Atmospheric turbulence and radio propagation. Nauka, 1967, pp. 166–178.
 (MCL15) Mathieu M., Couprie C., LeCun Y.: Deep multiscale video prediction beyond mean square error. arXiv:1511.05440.
 (MCP09) Mullen P., Crane K., Pavlov D., Tong Y., Desbrun M.: Energypreserving integrators for fluid animation. ACM Transactions on Graphics 28, 3 (2009), 1.
 (MDB17) Mosser L., Dubrule O., Blunt M. J.: Reconstruction of threedimensional porous media using generative adversarial neural networks. Physical Review E 96, 4 (2017), 043309.
 (MHN13) Maas A., Hannun A., Ng A.: Rectifier nonlinearities improve neural network acoustic models. In ICML (2013), vol. 30.
 (MTP18) Ma P., Tian Y., Pan Z., Ren B., Manocha D.: Fluid directed rigid body control using deep reinforcement learning. ACM Transactions on Graphics (SIGGRAPH Proceedings) (2018).
 (MTV17) Meyer M., Takahashi S., Vilanova A.: Data reduction techniques for scientific visualization and data analysis. STAR 36, 3 (2017).
 (PBT17) Prantl L., Bonev B., Thuerey N.: Precomputed Liquid Spaces with Generative Neural Networks and Optical Flow. arXiv:1704.07854.
 (PdON18) Paganini M., de Oliveira L., Nachman B.: Accelerating science with generative adversarial networks: an application to 3d particle showers in multilayer calorimeters. Physical review letters 120, 4 (2018), 042003.
 (RLM17) Ravanbakhsh S., Lanusse F., Mandelbaum R., Schneider J. G., Poczos B.: Enabling dark energy science with deep generative models of galaxy images. In AAAI (2017), pp. 1488–1494.
 (Smi17) Smith L. N.: Cyclical learning rates for training neural networks. In Applications of Computer Vision (2017), pp. 464–472.
 (SSW13) Stanton M., Sheng Y., Wicke M., Perazzi F., Yuen A., Narasimhan S., Treuille A.: Nonpolynomial galerkin projection on deforming meshes. ACM Trans Graph 32, 4 (2013), 86:1.
 (Sta14) Stanton M.: DataDriven Methods for Interactive Simulation of Complex Phenomena. PhD thesis, Carnegie Mellon Univ., 2014.
 (SZZ17) Schawinski K., Zhang C., Zhang H., Fowler L., Santhanam G. K.: Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Mon Not R Astron Soc.: Letters 467, 1 (2017), L110–L114.
 (Thu17) Thuerey N.: Interpolations of smoke and liquid simulations. ACM Transactions on Graphics (TOG) 36, 1 (2017), 3.
 (TLP06) Treuille A., Lewis A., Popović Z.: Model reduction for realtime fluids. ACM Transactions on Graphics 25, 3 (2006), 826.
 (TP18) Thuerey N., Pfaff T.: MantaFlow, 2018. mantaflow.com.
 (TSSP17) Tompson J., Schlachter K., Sprechmann P., Perlin K.: Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of Machine Learning Research (Aug 2017), vol. 70, pp. 3424–3433.
 (UB18) Umetani N., Bickel B.: Learning threedimensional flow for interactive aerodynamic design. ACM Transactions on Graphics (SIGGRAPH Proceedings) (2018).
 (UHT18) Um K., Hu X., Thuerey N.: Liquid splash modeling with neural networks. Computer Graphics Forum 37, 8 (2018), 171–182.

(VLL10)
Vincent P., Larochelle H., Lajoie I., Bengio Y., Manzagol P.A.:
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.
J. Mach. Learn. Res. 11 (2010), 3371–3408.  (WBT18) Wiewel S., Becher M., Thuerey N.: Latentspace Physics: Towards Learning the Temporal Evolution of Fluid Flow. arXiv:1802.10123.
 (WST09) Wicke M., Stanton M., Treuille A.: Modular bases for fluid dynamics. In ACM SIGGRAPH (2009).
 (XFCT18) Xie Y., Franz E., Chu M., Thuerey N.: tempogan: A temporally coherent, volumetric gan for superresolution fluid flow. ACM Transactions on Graphics (TOG) 37, 4 (2018), 95.
 (YYX16) Yang C., Yang X., Xiao X.: Datadriven projection method in fluid simulation. CAVW 27, 34 (2016), 415–424.
 (ZGFK17) Zhao H., Gallo O., Frosio I., Kautz J.: Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging 3, 1 (2017), 47–57.
Comments
There are no comments yet.