Recent work has demonstrated the effectiveness of using neural networks for fluid simulations. Among other work, convolutional neural networks (CNNs) have been used to improve the computational efficiency by either replacing the pressure computation [TSSP17] or predicting the fluid motion with an LSTM network [WBT19]. A generative deep neural network for parameterized fluid simulations was introduced that reconstructs the velocity field based on a few input parameters [KAT19]. This can be used for data compression or to generate simulations by interpolating in the parameter space. A main limitation of this previous approach (and related neural fluid solvers) is the difficulty of reconstructing high-frequency flow structures, which can be attributed to the $L_1$ loss function that is used on the velocity and its gradient.
Loss functions and their influence on reconstruction quality have been studied in the image processing literature. Natural images are known to have an expected power spectrum following an inverse power law: the magnitudes of spectral decompositions are mostly concentrated in lower frequencies. However, high frequencies with lower magnitudes tend to carry perceptually important details. Thus, a mean squared error loss will often be more effective in matching the lower frequencies, which have higher magnitudes and dominate the power spectrum [WB09].
Rahaman et al. [RBA19] used Fourier analysis to investigate which frequencies neural networks tend to learn and reported a phenomenon they call the spectral bias when experimenting with ReLU networks. The authors found that neural networks favor low frequencies, even though they are theoretically universal approximators [Hor91]. Other work has arrived at similar conclusions [XZX19, XZL19], reporting that dominant low-frequency components are captured quickly during training and high-frequency ones only slowly thereafter (F-Principle). It is argued that this acts as an intrinsic regularizer and filter for noisy input data. Complementary losses were used to improve high-frequency reconstruction, such as perceptual losses [JAFF16] and adversarial losses (GAN) [GPAM14, BSM17, KALL18].
Previous perceptual-based loss functions were matched by selectively focusing on frequencies through a weighted wavelet per-band loss combined with a term that penalizes low magnitude values [HHST17]. The wavelet domain was also employed in several image super-resolution approaches. A CNN to predict the missing details of wavelet coefficients (sub-bands) of the LR images was used in [GMVM17], and a loop architecture to better explore statistical relationships among wavelet coefficients in different bands was applied in [GCZ18]. Previous work also used clique convolutional blocks [ZSY18], and scatter maps were employed as input to standard CNNs [GX16]. Wavelet super-resolution approaches were also augmented with adversarial losses to better reconstruct the distribution of frequencies [HHST19, ZWY19].
Using neural networks for fluid simulations is largely unexplored, and the input data is intrinsically different from images. Our work therefore aims at better understanding the characteristics and reconstruction properties of fluid simulation data. We use the generative approach of [KAT19] as a baseline, and present a frequency-aware loss function using the Fourier transform. Our results show that reconstruction quality can be improved by considering the different frequency bands in the optimization.
2 Generative Network for Fluid Flow Reconstruction
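Our work is based on the generative network for fluid simulations presented by Kim et al. [KAT19]. Their generator reconstructs a velocity field $\hat{\mathbf{u}}_{\mathbf{c}} \in \mathbb{R}^{H \times W \times D \times V}$ (height $H$, width $W$, depth $D$, dimension of velocity field $V$) from a small set of parameters $\mathbf{c}$, which can be represented as a function $G(\mathbf{c})$ mapping the parameters to the velocity field. The values in $\mathbf{c}$ parameterize the scene that was used to create the training data. As an example, some of the values could be the position and the width of a smoke source, while another represents the simulation time step. The network is trained with the baseline loss $L_G$ that includes $L_1$-losses on the velocity field and its gradient:

$$L_G(\mathbf{c}) = \lambda_{\mathbf{u}} \lVert \mathbf{u}_{\mathbf{c}} - \hat{\mathbf{u}}_{\mathbf{c}} \rVert_1 + \lambda_{\nabla \mathbf{u}} \lVert \nabla \mathbf{u}_{\mathbf{c}} - \nabla \hat{\mathbf{u}}_{\mathbf{c}} \rVert_1,$$

where $\mathbf{u}_{\mathbf{c}}$ is the simulated ground truth velocity field, $\hat{\mathbf{u}}_{\mathbf{c}} = G(\mathbf{c})$, and $\lambda_{\mathbf{u}}$ and $\lambda_{\nabla \mathbf{u}}$ serve as weighting factors between the two loss terms. A divergence-free velocity field can be enforced by changing the generator to $\hat{\mathbf{u}}_{\mathbf{c}} = \nabla \times G(\mathbf{c})$, which changes the network to learn a stream function (a scalar field for 2D and a three-component vector field for 3D).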
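As an illustration of the two ingredients described above, the following NumPy sketch evaluates an $L_1$ loss on a 2D velocity field and its gradient, and derives a velocity field from a scalar stream function. It is a minimal sketch, not the implementation of [KAT19]: the function names, the forward-difference gradient, the use of the mean instead of the sum, and the convention that axis 0 is $y$ and axis 1 is $x$ are all assumptions made for this example.

```python
import numpy as np

def spatial_gradient(u):
    """Forward-difference gradient of a 2D velocity field u of shape (H, W, 2)."""
    # The last row and column are dropped so both differences share shape (H-1, W-1, 2).
    du_dy = u[1:, :-1] - u[:-1, :-1]
    du_dx = u[:-1, 1:] - u[:-1, :-1]
    return np.stack([du_dx, du_dy], axis=-1)

def baseline_loss(u_true, u_pred, lambda_u=1.0, lambda_grad=1.0):
    """Weighted sum of the mean absolute errors of the velocity and of its gradient."""
    loss_u = np.mean(np.abs(u_true - u_pred))
    loss_grad = np.mean(np.abs(spatial_gradient(u_true) - spatial_gradient(u_pred)))
    return lambda_u * loss_u + lambda_grad * loss_grad

def curl_2d(psi):
    """Velocity (u_x, u_y) = (d psi / dy, -d psi / dx) of a stream function psi of shape (H, W)."""
    dpsi_dy, dpsi_dx = np.gradient(psi)  # derivatives along axis 0 (y) and axis 1 (x)
    return np.stack([dpsi_dy, -dpsi_dx], axis=-1)
```

With central differences, the discrete divergence of `curl_2d(psi)` vanishes in the interior of the grid up to floating-point error, which is the property the stream-function formulation relies on.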
2.1 Analysis of the Reconstruction Quality
Although the generative network reliably reconstructs the global structure of the flow, it is apparent that small-scale flow details are lost. To verify and analyze this observation quantitatively, we first transform 100 ground truth and reconstructed velocity field samples into the Fourier domain and plot the distribution of the log-magnitude values. Figure 1 (left) shows that the distribution of the reconstruction has smaller variance than the ground truth, which can be attributed to the $L_1$ loss: to achieve a low error over a large number of training examples, the network strives for universally appropriate values and therefore settles for magnitudes close to the mean of the ground truth data. This property might not be a problem for low frequencies since their magnitudes generally stay within a small interval, but higher frequencies usually have high variance. This assumption is supported by Figure 1 (right), which shows the standard deviation of the reconstructed magnitudes separated by bands relative to the standard deviation of the ground truth. Between bands 20 and 30 we observe that the reconstructed standard deviation is only around 60% of the true standard deviation, which results in the observed clustering around the mean value.
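The per-band comparison of standard deviations can be sketched as follows. This is an illustrative NumPy version under several assumptions that the text does not state: bands are taken to be concentric rings of integer radius around the zero frequency, each sample is a single scalar component of the velocity field, and `log1p` is used in place of a plain logarithm to avoid taking the log of zero. All function names are hypothetical.

```python
import numpy as np

def band_index(shape):
    """Integer radial band index of every FFT coefficient (band 0 is the zero frequency)."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]  # integer frequencies along y
    fx = np.fft.fftfreq(shape[1]) * shape[1]  # integer frequencies along x
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return np.floor(radius).astype(int)

def log_magnitudes(fields):
    """log(1 + |FFT|) for a batch of scalar fields of shape (N, H, W)."""
    return np.log1p(np.abs(np.fft.fft2(fields, axes=(1, 2))))

def std_ratio_per_band(truth, recon, num_bands):
    """Standard deviation of reconstructed log-magnitudes relative to the ground truth, per band."""
    bands = band_index(truth.shape[1:])
    log_t, log_r = log_magnitudes(truth), log_magnitudes(recon)
    ratios = np.empty(num_bands)
    for b in range(num_bands):
        mask = bands == b
        # Pool all samples and all coefficients that fall into band b.
        ratios[b] = log_r[:, mask].std() / log_t[:, mask].std()
    return ratios
```

A ratio below one in a band indicates that the reconstructed magnitudes cluster more tightly than those of the ground truth in that band.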
2.2 Frequency-Aware Loss Function
Based on the observations discussed above, we propose a frequency-aware method that learns individual frequency bands separately as illustrated in Figure 2. Both the ground truth $\mathbf{u}_{\mathbf{c}}$ and the output of the generator $\hat{\mathbf{u}}_{\mathbf{c}}$ are transformed to the Fourier domain ($\mathcal{F}$) and split into different bands. Then, we compute the $L_1$-norm of the difference of each band and aggregate them into a weighted sum.
We compute the error directly in the Fourier domain to avoid expensive computations of inverse transforms and gradients. We define the total loss as $L = L_G + L_F$, where $L_G$ is the baseline loss, with the Fourier loss given as

$$L_F = \sum_i w_i \left\lVert \, \left| \mathcal{F}_i(\mathbf{u}_{\mathbf{c}}) - \mathcal{F}_i(\hat{\mathbf{u}}_{\mathbf{c}}) \right| \, \right\rVert_1,$$

where $\mathcal{F}_i(\mathbf{u})$ is the Fourier transform of $\mathbf{u}$ filtered by band $i$, $w_i$ is the weight of band $i$, $|\cdot|$ is the element-wise complex norm, and $\lVert \cdot \rVert_1$ is the $L_1$-norm averaged over all pixels in the image. Note that the loss could also be defined on the phase (as in [MDM18]) or magnitude, but we observed poor convergence of the phase loss and argue that a magnitude loss does not provide enough information for accurate reconstructions.
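The band-wise Fourier loss described above can be sketched in a few lines of NumPy. The sketch assumes an $L_1$-type norm, ring-shaped bands of integer radius (with all frequencies beyond the last ring merged into the highest band), and a single scalar field as input; none of these details is fixed by the text, and the function names are hypothetical.

```python
import numpy as np

def band_masks(shape, num_bands):
    """Boolean masks that partition the FFT grid into radial bands."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Frequencies beyond the last ring are assigned to the highest band.
    idx = np.minimum(np.floor(radius).astype(int), num_bands - 1)
    return [idx == b for b in range(num_bands)]

def fourier_loss(u_true, u_pred, weights):
    """Weighted sum over bands of the band-filtered spectral difference."""
    masks = band_masks(u_true.shape, len(weights))
    # Element-wise complex norm of the difference, computed directly in the Fourier domain.
    diff = np.abs(np.fft.fft2(u_true) - np.fft.fft2(u_pred))
    # Each band term is averaged over all pixels of the image, not only over the band.
    return sum(w * (diff * m).sum() / diff.size for w, m in zip(weights, masks))
```

Because every band term is averaged over the whole image, a band that covers few pixels contributes little unless its weight compensates for it, which is one way to motivate weights that depend on the band size.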
Defining weights is a non-trivial process, as we cannot simply assign higher weights to higher bands. We consider the following observations: 1) Low frequency bands are visually very important as they define the global flow structure, and we want to retain the high reconstruction quality of the baseline approach in these bands. 2) Very high frequency bands contain a lot of noise, which negatively affects the overall convergence of the optimization. 3) Bands higher than 30 are visually less relevant, as can be seen in Figure 3. We therefore reduce the influence of the higher bands, following the intuition that this will eventually lead to a better reconstruction of mid-level frequencies (around bands 10-30). We use a simple heuristic that shifts the weight from the highest frequencies towards lower ones, referred to as shift-towards-low (STL). The weight of each band is inversely proportional to the number of pixels the band is covering, relative to the total number of pixels, and carries an additional scaling factor that is the same for all but the highest band.
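A weighting of this kind could look as follows. The sketch only illustrates the structure of the heuristic: the ring-shaped band layout, the function names, and in particular the value of `highest_band_scale` are placeholders chosen for this example and are not the values used for STL.

```python
import numpy as np

def band_pixel_counts(shape, num_bands):
    """Number of FFT coefficients in each radial band (the highest band collects the remainder)."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    idx = np.minimum(np.floor(radius).astype(int), num_bands - 1)
    return np.bincount(idx.ravel(), minlength=num_bands)

def stl_like_weights(shape, num_bands, highest_band_scale=0.5):
    """Weights inversely proportional to the band size, with a reduced highest band.

    highest_band_scale is a hypothetical placeholder value.
    """
    counts = band_pixel_counts(shape, num_bands)
    weights = counts.sum() / counts  # total pixels divided by pixels per band
    weights[-1] *= highest_band_scale  # shift weight away from the highest band
    return weights
```

For a 64 x 64 field and 32 bands, band 0 contains only the zero frequency, so its weight equals the total pixel count of 4096, while large bands receive correspondingly small weights.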
3 Evaluation and Results
3.1 MRE and Parameter Search
For a quantitative analysis we use the mean relative error (MRE) of the norm that is defined as the sum of differences in complex norm between the reconstructed and the ground truth samples, normalized by the sum of magnitudes of the ground truth. Figure 4 shows the MRE for different bands for the baseline model [KAT19] (black) and the frequency-aware STL (red). It can be seen that the largest improvement of up to 10% is achieved for mid-frequency bands (15-25), while for low and high frequencies the resulting MRE is similar to the baseline.
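The per-band MRE can be written down compactly. The sketch below reads "differences in complex norm" as the complex norm of the difference of the Fourier coefficients, which is one possible interpretation; the ring-shaped bands, the batch layout `(N, H, W)`, and the function names are likewise assumptions for this example.

```python
import numpy as np

def band_index(shape, num_bands):
    """Radial band index per FFT coefficient, clipped to num_bands - 1."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return np.minimum(np.floor(radius).astype(int), num_bands - 1)

def mre_per_band(truth, recon, num_bands):
    """Sum of spectral errors divided by the sum of ground truth magnitudes, per band."""
    bands = band_index(truth.shape[-2:], num_bands)
    f_true = np.fft.fft2(truth, axes=(-2, -1))
    f_rec = np.fft.fft2(recon, axes=(-2, -1))
    err = np.abs(f_rec - f_true)  # complex norm of the difference
    mag = np.abs(f_true)          # magnitudes of the ground truth
    return np.array([
        err[..., bands == b].sum() / mag[..., bands == b].sum()
        for b in range(num_bands)
    ])
```

Under this definition, a reconstruction that uniformly underestimates every coefficient by 10% yields an MRE of 0.1 in every band.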
We performed an extensive grid and random search around the parameters of STL. In the grid search each parameter was set to 50%, 100%, 150%, and 200% of its STL value, except for the highest band which did not include 150%. For the random search we sampled from the standard normal distribution for each parameter. If the resulting value was negative we divided the original STL value by the magnitude, otherwise we multiplied. For better comparison we normalized the final list of values for each run. Figure 4 shows the resulting MRE plots for 1482 runs (768 grid search, 714 random search). Although we tested a wide range of different parameter combinations, none of them seems to significantly outperform STL over the whole range of frequencies.
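The two search procedures can be sketched as follows. The count of 768 grid runs is consistent with four parameters taking four factors each and one parameter (the highest band) taking three, since $4^4 \cdot 3 = 768$; the resulting total of five parameters is an inference from that number, not something stated explicitly. Normalizing the values so that they sum to one is also an assumption, as are the function names.

```python
import itertools
import numpy as np

def grid_factors(num_params=5):
    """All combinations of scaling factors; the last (highest-band) parameter skips 150%."""
    factors = [0.5, 1.0, 1.5, 2.0]
    highest = [0.5, 1.0, 2.0]
    return list(itertools.product(*([factors] * (num_params - 1) + [highest])))

def perturb(stl_values, z):
    """Scale each STL value by a standard normal draw z, then normalize the result.

    Negative draws divide the value by |z|, non-negative draws multiply it by z.
    """
    stl_values = np.asarray(stl_values, dtype=float)
    z = np.asarray(z, dtype=float)
    divisor = np.where(z < 0, np.abs(z), 1.0)  # only used where z is negative
    values = np.where(z < 0, stl_values / divisor, stl_values * z)
    return values / values.sum()  # assumed normalization: values sum to one

# Usage: one random-search candidate around five placeholder STL values.
rng = np.random.default_rng(0)
candidate = perturb(np.ones(5), rng.standard_normal(5))
```

Dividing by the magnitude for negative draws keeps every candidate value positive while still allowing both increases and decreases relative to the STL value.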
3.2 Reconstruction Quality
The lower error in the mid-frequency bands improves the reconstructed velocity field as shown in Figure 5 (from left to right: baseline, STL and ground truth) and accordingly the resulting density field that is advected with the reconstructed velocity in Figure 6. Although some fine structures are better captured with STL than with the baseline, there is still a discrepancy between the reconstructions and the ground truth.
3.3 Comparison with GAN
We compared our approach with a generative adversarial network (GAN), as GANs are known to perform well for reconstructing details in images. We used PatchGAN [IZZE16] in our implementation. Figure 7 shows that the GAN can generate impressively detailed flow structures that resemble the ground truth data. However, these generated structures only mimic flow details and in fact are less physical than the corresponding result with the baseline approach. This is also reflected by the higher MRE of GANs compared to the baseline, which is shown in Figure 8 (left). Moreover, we observed that GANs are more sensitive to parameters and have an increased training time compared to our STL approach, which is comparable to the baseline performance.
3.4 Histogram Analysis
When investigating the histogram of the log-magnitudes of the Fourier transform in Figure 8 (right), one can notice the ability of the GAN training to better match the underlying distribution, since the discriminator learns to distinguish features that are not present in the generated data. In contrast, STL moves the curve only slightly towards the ground truth distribution. Due to this observation, we also evaluated a histogram loss [RWB17], but we found large discrepancies between the resulting velocity field and the ground truth, even when the histograms match better. We presume that matching a global histogram of the magnitudes does not capture the characteristics of flow data well, and also that the interplay between magnitude and phase might need to be considered in the optimization.
Our evaluation has shown that the baseline method does not minimize the error efficiently for higher frequency bands of the input data, and that a loss function is needed that can better discriminate between low and high frequencies of the input. Results indicate that the inclusion of spectral approaches is a promising direction to improve the reconstruction quality in mid-frequency bands. However, reconstructions still deviate from the ground truth. More research is needed to evaluate the shape and number of bands used in the Fourier loss to develop a definite conclusion on the strengths and weaknesses of the proposed loss. Future work could also include the evaluation of perceptual losses and wavelet approaches. An interesting finding is also that generative adversarial networks can better approximate the ground truth histogram distribution of the Fourier transform’s log-magnitudes and impressively improve the perceived quality. However, the reconstructed flow is non-physical, leading to higher MREs in all bands compared to the baseline and preventing applications related to flow data compression.
This work was supported by the Swiss National Science Foundation (Grant No. 200021_168997).
- [BSM17] Berthelot D., Schumm T., Metz L.: BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv:1703.10717.
- [GCZ18] Geng C., Chen L., Zhang X., Zhou P., Gao Z.: A Wavelet-based Learning for Face Hallucination with Loop Architecture. In Visual Communications and Image Processing (2018), pp. 1–4.
- [GMVM17] Guo T., Mousavi H. S., Vu T. H., Monga V.: Deep Wavelet Prediction for Image Super-Resolution. In Computer Vision and Pattern Recognition Workshops (2017), IEEE, pp. 1100–1109.
- [GPAM14] Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y.: Generative adversarial nets. In NIPS 27. 2014, pp. 2672–2680.
- [GX16] Gao X., Xiong H.: A hybrid wavelet convolution network with sparse-coding for image super-resolution. In International Conference on Image Processing (2016), pp. 1439–1443.
- [HHST17] Huang H., He R., Sun Z., Tan T.: Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution. In International Conference on Computer Vision (2017).
- [HHST19] Huang H., He R., Sun Z., Tan T.: Wavelet Domain Generative Adversarial Network for Multi-scale Face Hallucination. International Journal of Computer Vision 127, 6-7 (2019), 763–784.
- [Hor91] Hornik K.: Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 2 (1991), 251–257.
- [IZZE16] Isola P., Zhu J.-Y., Zhou T., Efros A. A.: Image-to-image translation with conditional adversarial networks. Computer Vision and Pattern Recognition (CVPR) (2016), 5967–5976.
- [JAFF16] Johnson J., Alahi A., Fei-Fei L.: Perceptual losses for real-time style transfer and super-resolution. In ECCV (2016).
- [KALL18] Karras T., Aila T., Laine S., Lehtinen J.: Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR (2018).
- [KAT19] Kim B., Azevedo V. C., Thuerey N., Kim T., Gross M., Solenthaler B.: Deep Fluids: A Generative Network for Parameterized Fluid Simulations. Computer Graphics Forum (Proceedings of Eurographics) 38, 2 (2019), 59–70.
- [MDM18] Meyer S., Djelouah A., McWilliams B., Sorkine-Hornung A., Gross M., Schroers C.: PhaseNet for Video Frame Interpolation. In Computer Vision and Pattern Recognition (2018).
- [RBA19] Rahaman N., Baratin A., Arpit D., Draxler F., Lin M., Hamprecht F., Bengio Y., Courville A.: On the spectral bias of neural networks. In Proceedings of Machine Learning Research (2019), vol. 97, pp. 5301–5310.
- [RWB17] Risser E., Wilmot P., Barnes C.: Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses. arXiv:1701.08893.
- [TSSP17] Tompson J., Schlachter K., Sprechmann P., Perlin K.: Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th ICML Vol. 70 (2017), JMLR.org, pp. 3424–3433.
- [WB09] Wang Z., Bovik A. C.: Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures. IEEE Signal Processing Magazine 26, 1 (2009), 98–117.
- [WBT19] Wiewel S., Becher M., Thuerey N.: Latent-space Physics: Towards Learning the Temporal Evolution of Fluid Flow. Computer Graphics Forum (Proceedings of Eurographics) 38, 2 (2019).
- [XZL19] Xu Z.-Q. J., Zhang Y., Luo T., Xiao Y., Ma Z.: Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks. arXiv:1901.06523.
- [XZX19] Xu Z.-Q. J., Zhang Y., Xiao Y.: Training behavior of deep neural network in frequency domain. In International Conference on Neural Information Processing (2019), Springer, pp. 264–274.
- [ZSY18] Zhong Z., Shen T., Yang Y., Lin Z., Zhang C.: Joint sub-bands learning with clique structures for wavelet domain super-resolution. In NIPS 31. 2018, pp. 165–175.
- [ZWY19] Zhang Q., Wang H., Yang S.: Image Super-Resolution Using a Wavelet-based Generative Adversarial Network. arXiv:1907.10213.