Deep Learning of Turbulent Scalar Mixing

11/17/2018
by   Maziar Raissi, et al.

Based on recent developments in physics-informed deep learning and deep hidden physics models, we put forth a framework for discovering turbulence models from scattered and potentially noisy spatio-temporal measurements of the probability density function (PDF). The models are for the conditional expected diffusion and the conditional expected dissipation of a Fickian scalar described by its transported single-point PDF equation. The discovered models are appraised against the exact solution derived from the amplitude mapping closure (AMC)/Johnson-Edgeworth translation (JET) model of binary scalar mixing in homogeneous turbulence.

1 Introduction

The problem of turbulent scalar mixing has been the subject of widespread investigation for several decades now O’Brien (1960); Brodkey (1975); Pope (2000); Haworth (2010). The problem is explicitly exhibited in the transported probability density function (PDF) description of turbulence in Reynolds-averaged Navier-Stokes (RANS) simulations. With the single-point PDF descriptor, the effects of mixing of a Fickian scalar appear in unclosed forms of the conditional expected dissipation and/or the conditional expected diffusion terms Pope (2000). A similar closure problem is encountered in large eddy simulation (LES) via the probabilistic filtered density function (FDF) Nouri et al. (2017). Development of closures for these terms has been, and continues to be, an area of active research; see e.g. Refs. Frolov et al. (2004); Haworth (2010); Ansari et al. (2011); Pope (2013) for reviews. The overarching goal of turbulence modeling is to find accurate closures for the unclosed terms that appear in PDF/FDF transport equations. As is common practice in turbulence modeling, the unclosed terms are expressed in terms of closed (known) quantities. The form of this closure is based on physical inspection of the problem at hand and is inherently error-prone; this is the major source of modeling uncertainty in turbulence closure.

In this paper, we introduce a new paradigm for turbulent scalar mixing closure, in which the unclosed terms are learned from high-fidelity observations. Such observations may come from direct numerical simulation (DNS), e.g. Girimaji and Zhou (1996); Jaberi et al. (1996); Christie and Domaradzki (1994), or space-time resolved experimental measurements, e.g. Eckstein and Vlachos (2009); Pereira et al. (2000). Obviously, in DNS, the unclosed terms can be extracted directly from the simulated results. However, for most realistic applications, performing DNS is cost prohibitive. On the other hand, finding the closure from experimental data involves taking derivatives of the measurements in space-time and in composition space (in some cases high-order derivatives), which is nontrivial and, even if possible, introduces new uncertainties in the closure depending on the space-time resolution of the measurements. Our ultimate goal is to develop a closure discovery framework that learns the closure from sparse high-fidelity data, such as experimental measurements. The proposed framework replaces the guesswork often involved in such model development with a data-driven approach that uncovers the closure from data in a systematic fashion. Our approach draws inspiration from the early and contemporary contributions in deep learning for partial differential equations Psichogios and Ungar (1992); Lagaris et al. (1998); Sirignano and Spiliopoulos (2018); Weinan et al. (2017); Long et al. (2017); Baymani et al. (2010); Chiaramonte and Kiener (2018) and data-driven modeling strategies Rudy et al. (2017, 2018); Pan and Duraisamy (2018), and in particular relies on recent developments in physics-informed deep learning Raissi et al. (2018) and deep hidden physics models Raissi (2018).

As a demonstration example, we consider the binary scalar mixing problem, which has been very useful for PDF closure developments Dopazo (1973); Janicka et al. (1979); Pope (1982); Kosály and Givi (1987); Givi and McMurtry (1988); Norris and Pope (1991); Girimaji (1992a, b); Jaberi and Givi (1995); Subramaniam and Pope (1998); Pope (2013). The problem is typically considered in the setting of a spatially homogeneous flow in which the temporal transport of the scalar PDF is considered. In this setting, the development of a closure which can accurately predict the evolution of the PDF is the primary objective. The relative simplicity of the problem makes it well suited for both DNS and laboratory experiments. The literature is rich with a wealth of data obtained by these means; see e.g. Refs. Girimaji and Zhou (1996); Jaberi et al. (1996); Tavoularis and Corrsin (1981); Eswaran and Pope (1988); McMurtry and Givi (1989); Christie and Domaradzki (1993, 1994); Solomon and Gollub (1991); Thoroddsen and Van Atta (1992); Jayesh and Warhaft (1991, 1992). We will demonstrate that our proposed framework rediscovers the conditional expected dissipation and diffusion.

2 Binary Scalar Mixing

We consider the mixing of a Fickian passive scalar $\phi = \phi(\mathbf{x}, t)$ ($t$ denotes time and $\mathbf{x}$ is the position vector), with diffusion coefficient $\Gamma$, from an initially symmetric binary state within the bounds $-1 \le \phi \le +1$. Therefore, the single-point PDF of $\phi$ at the initial time is $P(\psi, 0) = \frac{1}{2}\left[\delta(\psi+1) + \delta(\psi-1)\right]$, where $\psi$ denotes the composition sample space for $\phi$. Thus $\langle\phi\rangle = 0$, $\sigma^2(0) = \langle\phi^2\rangle(0) = 1$, where $\langle\cdot\rangle$ indicates the probability mean (average), and $\sigma^2$ denotes the variance. In homogeneous turbulence, the PDF transport is governed by

$$\frac{\partial P(\psi, t)}{\partial t} = -\,\frac{\partial^2 \left[ E(\psi,t)\, P(\psi,t) \right]}{\partial \psi^2}, \qquad (1)$$

where $E(\psi,t)$ represents the expected value of the scalar dissipation $\epsilon = \Gamma\,\nabla\phi\cdot\nabla\phi$, conditioned on the value of the scalar,

$$E(\psi,t) = \left\langle \Gamma\,\nabla\phi\cdot\nabla\phi \,\middle|\, \phi(\mathbf{x},t)=\psi \right\rangle, \qquad (2)$$

where the vertical bar denotes the conditional value. Equation (1) is also expressed by

$$\frac{\partial P(\psi, t)}{\partial t} = -\,\frac{\partial \left[ D(\psi,t)\, P(\psi,t) \right]}{\partial \psi}, \qquad (3)$$

where $D(\psi,t)$ denotes the conditional expected diffusion

$$D(\psi,t) = \left\langle \Gamma\,\nabla^2\phi \,\middle|\, \phi(\mathbf{x},t)=\psi \right\rangle. \qquad (4)$$

The closure problem in the PDF transport is associated with the unknown conditional expected dissipation, $E(\psi,t)$, and/or the conditional expected diffusion, $D(\psi,t)$. At the single-point level none of these conditional averages are known; neither are their unconditional (total) mean values

$$\langle E\rangle(t) = \int_{-1}^{+1} E(\psi,t)\,P(\psi,t)\,d\psi, \qquad \langle D\rangle(t) = \int_{-1}^{+1} D(\psi,t)\,P(\psi,t)\,d\psi. \qquad (5)$$
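For concreteness, the short sketch below (not from the original paper) shows how Eqs. (1), (3), and (5) can be checked numerically on gridded data. The arrays `t`, `psi`, `P`, `E`, and `D` are hypothetical placeholders standing in for tabulated DNS or experimental estimates; for a consistent data set both residuals should be close to zero.

```python
import numpy as np

# Hypothetical tabulated fields on a uniform (t, psi) grid; in practice these
# would come from DNS or experimental estimates of the PDF and of the two
# conditional statistics. Shapes: (n_t, n_psi).
t   = np.linspace(0.01, 1.0, 200)          # time samples (placeholder)
psi = np.linspace(-1.0, 1.0, 401)          # composition sample space
P   = np.ones((t.size, psi.size))          # placeholder PDF data
E   = np.zeros_like(P)                     # placeholder conditional dissipation
D   = np.zeros_like(P)                     # placeholder conditional diffusion

# Residual of Eq. (1): dP/dt + d^2(E P)/dpsi^2 (vanishes for exact data)
dP_dt     = np.gradient(P, t, axis=0)
d_EP_dpsi = np.gradient(E * P, psi, axis=1)
res_eq1   = dP_dt + np.gradient(d_EP_dpsi, psi, axis=1)

# Residual of Eq. (3): dP/dt + d(D P)/dpsi
res_eq3   = dP_dt + np.gradient(D * P, psi, axis=1)

# Unconditional (total) means of Eq. (5) via trapezoidal quadrature over psi
mean_E = np.trapz(E * P, psi, axis=1)
mean_D = np.trapz(D * P, psi, axis=1)
```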

3 Deep Learning Solution

Given data on the PDF $P(\psi, t)$, we are interested in inferring the unknown conditional expected dissipation $E(\psi,t)$ and conditional expected diffusion $D(\psi,t)$ by leveraging Eqs. (1) and (3), respectively, and consequently solving the closure problem. The data may be obtained from DNS or experimental measurements.

3.1 Conditional Expected Diffusion

Inspired by recent developments in physics-informed deep learning Raissi et al. (2018) and deep hidden physics models Raissi (2018), we propose to approximate the mapping $(t,\psi) \mapsto \left(P, D\right)$ by a deep neural network taking as inputs $t$ and $\psi$ while outputting $P(\psi,t)$ and $D(\psi,t)$. This choice is motivated by modern techniques for solving forward and inverse problems involving partial differential equations, where the unknown solution is approximated either by a Gaussian process (Raissi et al., 2018; Raissi and Karniadakis, 2018; Raissi et al., 2017a, b; Raissi, 2017; Perdikaris et al., 2017; Raissi and Karniadakis, 2016; Gulian et al., 2018) or a neural network (Raissi et al., 2018; Raissi, 2018; Raissi et al., 2018a, b). Moreover, placing a prior on the solution itself is fully justified by the similar approach pursued in the past century by classical methods of solving partial differential equations such as finite elements, finite differences, or spectral methods, where one would expand the unknown solution in terms of an appropriate set of basis functions. However, the classical methods suffer from the curse of dimensionality mainly due to their reliance on spatio-temporal grids. In contrast, modern techniques avoid the tyranny of mesh generation, and consequently the curse of dimensionality Raissi (2018); Weinan et al. (2017), by approximating the unknown solution with a neural network Raissi et al. (2017a, b); Raissi (2018) or a Gaussian process. This transforms the problem of solving a partial differential equation into an optimization problem. This is enabling as it allows us to solve forward, backward (inverse), data-assimilation, data-driven discovery, and control problems (in addition to many other classes of problems of practical interest) using a single unified framework. On the flip side of the coin, this perspective can help us design physics-informed learning machines.

Figure 1: Conditional Expected Diffusion Network: A plain vanilla densely connected (physics uninformed) neural network, with 10 hidden layers and 50 neurons per hidden layer per output variable (i.e., 100 neurons per hidden layer), takes the input variables $t$ and $\psi$ while outputting $P(\psi,t)$ and $D(\psi,t)$. As for the activation functions, we use $x \cdot \mathrm{sigmoid}(x)$, known in the literature as Swish. For illustration purposes only, the network depicted in this figure comprises 2 hidden layers and 5 neurons per hidden layer. We employ automatic differentiation to obtain the required derivatives to compute the residual (physics informed) network $e(t,\psi)$. The total loss function is composed of the regression loss of the probability density function $P$ and the loss imposed by the partial differential equation (3). Here, $I$ denotes the identity operator, and the differential operators $\partial/\partial t$ and $\partial/\partial \psi$ are computed using automatic differentiation and can be thought of as “activation operators”. Moreover, the gradients of the loss function are back-propagated through the entire network to train the neural network parameters using the Adam optimizer.
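As a rough illustration of the architecture described in the caption above, the following Keras sketch (an assumed implementation, not the authors' released code) builds a densely connected network with two inputs $(t,\psi)$, two outputs $(P, D)$, 10 hidden layers of 100 neurons, and Swish activations.

```python
import tensorflow as tf

# Minimal sketch of the "physics uninformed" network in Fig. 1:
# inputs (t, psi), outputs (P, D), 10 hidden layers of 100 neurons,
# Swish activations f(x) = x * sigmoid(x).
def make_diffusion_net(hidden_layers=10, width=100):
    inputs = tf.keras.Input(shape=(2,))            # columns: [t, psi]
    x = inputs
    for _ in range(hidden_layers):
        x = tf.keras.layers.Dense(width, activation=tf.nn.swish)(x)
    outputs = tf.keras.layers.Dense(2)(x)          # columns: [P, D]
    return tf.keras.Model(inputs, outputs)

model = make_diffusion_net()
model.summary()
```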

The aforementioned prior assumption along with Eq. (3) allows us to obtain the following physics informed neural network (see Fig. 1):

$$e := \frac{\partial P}{\partial t} + \frac{\partial}{\partial \psi}\left[ D\, P \right].$$

We obtain the required derivatives to compute the residual network $e(t,\psi)$ by applying the chain rule for differentiating compositions of functions using automatic differentiation Baydin et al. (2015). It is worth emphasizing that automatic differentiation is different from, and in several respects superior to, numerical or symbolic differentiation, the two commonly encountered techniques of computing derivatives. In its most basic description Baydin et al. (2015), automatic differentiation relies on the fact that all numerical computations are ultimately compositions of a finite set of elementary operations for which derivatives are known. Combining the derivatives of the constituent operations through the chain rule gives the derivative of the overall composition. This allows accurate evaluation of derivatives at machine precision with ideal asymptotic efficiency and only a small constant factor of overhead. In particular, to compute the required derivatives we rely on TensorFlow Abadi et al. (2016), which is a popular and relatively well documented open source software library for automatic differentiation and deep learning computations.
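A minimal TensorFlow example of the idea (ours, for illustration only): reverse-mode automatic differentiation returns the derivative of a composition of elementary operations exactly, up to floating-point round-off.

```python
import tensorflow as tf

# Tiny illustration of reverse-mode automatic differentiation with TensorFlow:
# derivatives of a composition of elementary operations are exact to machine
# precision, unlike finite differences.
x = tf.constant(0.3)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.sin(x) * tf.exp(-x**2)       # a composition of elementary ops
dy_dx = tape.gradient(y, x)

# Analytical check: dy/dx = (cos(x) - 2 x sin(x)) exp(-x^2)
exact = (tf.cos(x) - 2.0 * x * tf.sin(x)) * tf.exp(-x**2)
print(float(dy_dx), float(exact))
```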

Parameters of the neural networks $P(t,\psi)$ and $D(t,\psi)$ can be learned by minimizing the following loss function

$$\mathcal{L} := \sum_{n=1}^{N}\left[P(t^n,\psi^n) - P^n\right]^2 + \sum_{n=1}^{N}\left[e(t^n,\psi^n)\right]^2,$$

where $\{t^n, \psi^n, P^n\}_{n=1}^{N}$ represents the data on the probability density function $P(\psi,t)$. Here, the first summation corresponds to the training data on the probability density function while the second summation enforces the structure imposed by Eq. (3) at a finite set of measurement points whose number and locations are taken to be the same as the training data. However, it should be pointed out that the number and locations of the points on which we enforce the set of partial differential equations could be different from the actual training data. Although not pursued in the current work, this could significantly reduce the required number of training data on the probability density function.
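The sketch below (an assumed implementation under the notation above, not the authors' code) combines the two terms of this loss, the regression loss on PDF observations and the sum of squared PDE residuals of Eq. (3), and performs a single Adam update; the toy model, data, and hyperparameters are placeholders.

```python
import tensorflow as tf

# Toy stand-in for the (P, D) network of Fig. 1.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation=tf.nn.swish, input_shape=(2,)),
    tf.keras.layers.Dense(100, activation=tf.nn.swish),
    tf.keras.layers.Dense(2),                      # outputs: [P, D]
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def pde_residual(t, psi):
    # e = dP/dt + d(D P)/dpsi, computed with automatic differentiation
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(t)
        tape.watch(psi)
        P, D = tf.split(model(tf.concat([t, psi], axis=1)), 2, axis=1)
        DP = D * P
    e = tape.gradient(P, t) + tape.gradient(DP, psi)
    del tape
    return P, e

@tf.function
def train_step(t, psi, P_obs):
    with tf.GradientTape() as outer:
        P_pred, e = pde_residual(t, psi)
        loss = tf.reduce_sum(tf.square(P_pred - P_obs)) \
             + tf.reduce_sum(tf.square(e))
    grads = outer.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# One illustrative step on random placeholder data
t     = tf.random.uniform((32, 1))
psi   = tf.random.uniform((32, 1), minval=-1.0, maxval=1.0)
P_obs = tf.ones((32, 1))
print(float(train_step(t, psi, P_obs)))
```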

3.2 Conditional Expected Dissipation

Figure 2: Conditional Expected Dissipation Network: A plain vanilla densely connected (physics uninformed) neural network, with 10 hidden layers and 50 neurons per hidden layer per output variable (i.e., 100 neurons per hidden layer), takes the input variables $t$ and $\psi$ while outputting $P(\psi,t)$ and $E(\psi,t)$. As for the activation functions, we use $x \cdot \mathrm{sigmoid}(x)$, known in the literature as Swish. For illustration purposes only, the network depicted in this figure comprises 2 hidden layers and 5 neurons per hidden layer. We employ automatic differentiation to obtain the required derivatives to compute the residual (physics informed) network $e(t,\psi)$. If a term does not appear in the blue boxes, its coefficient is assumed to be zero. It is worth emphasizing that unless the coefficient in front of a term is non-zero, that term is not going to appear in the actual “compiled” computational graph and is not going to contribute to the computational cost of a feed-forward evaluation of the resulting network. The total loss function is composed of the regression loss of the probability density function $P$ and the loss imposed by the differential equation (1). Here, $I$ denotes the identity operator, and the differential operators $\partial/\partial t$ and $\partial^2/\partial \psi^2$ are computed using automatic differentiation and can be thought of as “activation operators”. Moreover, the gradients of the loss function are back-propagated through the entire network to train the neural network parameters using the Adam optimizer.

Alternatively, one could proceed by approximating the mapping $(t,\psi) \mapsto \left(P, E\right)$ by a deep neural network taking as inputs $t$ and $\psi$ while outputting $P(\psi,t)$ and $E(\psi,t)$. This prior assumption along with Eq. (1) allows us to obtain the following physics informed neural network (see Fig. 2):

$$e := \frac{\partial P}{\partial t} + \frac{\partial^2}{\partial \psi^2}\left[ E\, P \right].$$

We use automatic differentiation Baydin et al. (2015) to acquire the required derivatives to compute the residual network $e(t,\psi)$. Parameters of the neural networks $P(t,\psi)$ and $E(t,\psi)$ can be learned by minimizing the following loss function

$$\mathcal{L} := \sum_{n=1}^{N}\left[P(t^n,\psi^n) - P^n\right]^2 + \sum_{n=1}^{N}\left[e(t^n,\psi^n)\right]^2,$$

where $\{t^n, \psi^n, P^n\}_{n=1}^{N}$ represents the data on the probability density function $P(\psi,t)$. Here, the first summation corresponds to the training data on the probability density function while the second summation enforces the structure imposed by Eq. (1) at a finite set of measurement points whose number and locations are taken to be the same as the training data.
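For the dissipation form, the residual involves a second $\psi$-derivative, which can be obtained with nested gradient tapes. The snippet below is a minimal sketch under the same assumptions as before; the toy `model` standing in for the $(P, E)$ network is hypothetical.

```python
import tensorflow as tf

# Toy stand-in for the (P, E) network of Fig. 2.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation=tf.nn.swish, input_shape=(2,)),
    tf.keras.layers.Dense(2),                      # outputs: [P, E]
])

def dissipation_residual(t, psi):
    # e = dP/dt + d^2(E P)/dpsi^2; the second derivative uses nested tapes.
    with tf.GradientTape() as outer:
        outer.watch(psi)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch(t)
            inner.watch(psi)
            P, E = tf.split(model(tf.concat([t, psi], axis=1)), 2, axis=1)
            EP = E * P
        dP_dt    = inner.gradient(P, t)
        dEP_dpsi = inner.gradient(EP, psi)
        del inner
    d2EP_dpsi2 = outer.gradient(dEP_dpsi, psi)
    return dP_dt + d2EP_dpsi2

t   = tf.random.uniform((16, 1))
psi = tf.random.uniform((16, 1), minval=-1.0, maxval=1.0)
e = dissipation_residual(t, psi)
```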

4 Assessment

To assess the performance of our deep learning algorithms, we consider the amplitude mapping closure (AMC) Kraichnan (1989); Chen et al. (1989); Pope (1991). This provides the external closure for the PDF transport in an implicit manner. This is done by mapping the random field of interest, $\phi(\mathbf{x},t)$, to a stationary Gaussian reference field, $\theta(\mathbf{x})$, via a transformation $\phi = X(\theta, t)$. Once this relation is established, the PDF of the random variable $\phi$, $P(\psi,t)$, is related to that of a Gaussian distribution. In a domain with fixed upper and lower bounds ($-1 \le \phi \le +1$), the solution for a symmetric field with zero mean, $\langle\phi\rangle = 0$, is represented in terms of an unknown time $\tau(t)$, where

$$P(\psi, t) = \frac{\tau(t)}{2}\,\exp\!\left\{\left[1-\tau^2(t)\right]\left[\mathrm{erf}^{-1}(\psi)\right]^2\right\}. \qquad (6)$$

The AMC captures many of the basic features of the binary mixing problem, namely the inverse diffusion of the PDF in the composition domain from a double-delta distribution to an asymptotic, approximately Gaussian distribution centered around $\langle\phi\rangle = 0$, as the variance goes to zero (or $t \to \infty$). There are other means of “driving” the PDF toward Gaussianity (or any other distribution) in a physically acceptable manner. The Johnson-Edgeworth translation (JET) Miller et al. (1993) involves the transformation of the random physical field $\phi(\mathbf{x},t)$ to a fixed standard Gaussian (or any other) reference field $\theta(\mathbf{x})$ by means of a translation of the form

$$\phi(\mathbf{x}, t) = G\!\left(\frac{\theta(\mathbf{x})}{\sqrt{2}\,\tau(t)}\right).$$

The function $G$ here plays a role similar to that of the mapping $X$ in the AMC. With an appropriate form for the function $G$, the scalar PDF is determined. In this manner, many distributions can be generated. The AMC, for example, is recovered by the translation $G = \mathrm{erf}$; the resulting PDF can thus also be labeled as the $\mathrm{erf}^{-1}$-Normal distribution. Recognizing this translation, the relation between $\tau(t)$ and the physical time $t$ can be determined through knowledge of the higher-order statistics. For example, the normalized variance

$$\sigma^2(t) = \frac{\langle\phi^2\rangle(t)}{\langle\phi^2\rangle(0)} = \frac{2}{\pi}\,\sin^{-1}\!\left[\frac{1}{1+\tau^2(t)}\right] \qquad (7)$$

determines $\tau(t)$ through specification of the total mean dissipation $\langle E\rangle(t)$. With the knowledge of this dissipation, all of the conditional statistics are determined Miller et al. (1993); Jiang et al. (1992):

$$E(\psi, t) = \frac{2}{\pi}\,\frac{1}{\tau\left(1+\tau^2\right)}\frac{d\tau}{dt}\,\exp\!\left\{-2\left[\mathrm{erf}^{-1}(\psi)\right]^2\right\}, \qquad (8)$$

$$D(\psi, t) = -\frac{2}{\sqrt{\pi}}\,\frac{1}{\tau}\frac{d\tau}{dt}\;\mathrm{erf}^{-1}(\psi)\,\exp\!\left\{-\left[\mathrm{erf}^{-1}(\psi)\right]^2\right\}. \qquad (9)$$
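Under the reconstruction of Eqs. (6)-(9) given above, the exact AMC statistics used for the assessment can be evaluated as in the sketch below; the function names and the choice of $\tau$ are illustrative, and in practice $\tau(t)$ and $d\tau/dt$ follow from the specified total mean dissipation via Eq. (7).

```python
import numpy as np
from scipy.special import erfinv

# Exact AMC/JET statistics of Eqs. (6)-(9) for a prescribed mapping time tau(t)
# and its rate dtau/dt (both assumed known here).
def amc_pdf(psi, tau):
    eta = erfinv(psi)
    return 0.5 * tau * np.exp((1.0 - tau**2) * eta**2)              # Eq. (6)

def amc_variance(tau):
    return (2.0 / np.pi) * np.arcsin(1.0 / (1.0 + tau**2))          # Eq. (7)

def amc_conditional_dissipation(psi, tau, dtau_dt):
    eta = erfinv(psi)
    return (2.0 / np.pi) * dtau_dt / (tau * (1.0 + tau**2)) \
           * np.exp(-2.0 * eta**2)                                  # Eq. (8)

def amc_conditional_diffusion(psi, tau, dtau_dt):
    eta = erfinv(psi)
    return -(2.0 / np.sqrt(np.pi)) * (dtau_dt / tau) \
           * eta * np.exp(-eta**2)                                  # Eq. (9)

psi = np.linspace(-0.999, 0.999, 501)   # stay inside the open interval (-1, 1)
P   = amc_pdf(psi, tau=1.5)
E   = amc_conditional_dissipation(psi, tau=1.5, dtau_dt=1.0)
D   = amc_conditional_diffusion(psi, tau=1.5, dtau_dt=1.0)
```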

5 Results

All data and codes used in this manuscript will be publicly available on GitHub at https://github.com/maziarraissi/DeepTurbulence.

Figure 3: Conditional Expected Diffusion: The exact probability density function alongside the learned one is depicted in the top panels, while the exact and learned conditional expected diffusion are plotted in the bottom panels. It is worth highlighting that the algorithm has seen no data whatsoever on the conditional expected diffusion.
Figure 4: Conditional Expected Diffusion: The relative errors of the learned probability density function and of the learned conditional expected diffusion, computed over the composition domain, are depicted as functions of time.

In the following, the AMC (or the $\mathrm{erf}^{-1}$-Normal distribution) is utilized to assess the performance of our deep learning framework. In particular, Fig. 3 depicts the exact and the learned conditional expected diffusion $D(\psi,t)$. It is worth highlighting that the algorithm has seen no data whatsoever on the conditional expected diffusion. To obtain the results reported in this figure, we approximate $P(\psi,t)$ and $D(\psi,t)$ by a deep neural network consisting of 10 hidden layers with 100 neurons per hidden layer (see Fig. 1). As for the activation functions, we use $x \cdot \mathrm{sigmoid}(x)$, known in the literature Ramachandran et al. (2017) as the Swish activation function. The smoothness of Swish and its similarity to ReLU make it a suitable candidate for an activation function while working with physics informed neural networks Raissi (2018). In general, the choice of a neural network’s architecture (e.g., number of layers/neurons and form of activation functions) is crucial and in many cases still remains an art that relies on one’s ability to balance the trade-off between expressivity and trainability of the neural network (Raghu et al., 2016). Our empirical findings so far indicate that deeper and wider networks are usually more expressive (i.e., they can capture a larger class of functions) but are often more costly to train (i.e., a feed-forward evaluation of the neural network takes more time and the optimizer requires more iterations to converge). In this work, we have tried to choose the neural networks’ architectures in a consistent fashion throughout the manuscript by setting the number of hidden layers to 10 and the number of neurons to 50 per output variable. Consequently, there might exist other architectures that improve some of the results reported here.

As for the training procedure, our experience so far indicates that while training deep neural networks, it is often useful to reduce the learning rate as the training progresses. Specifically, the results reported here are obtained after four consecutive stages of training with the Adam optimizer Kingma and Ba (2014), each stage run for a fixed number of epochs with a successively smaller learning rate. Each epoch corresponds to one pass through the entire dataset. The total number of iterations of the Adam optimizer is therefore given by the total number of epochs times the number of data points divided by the mini-batch size. Training is performed in mini-batches on a single NVIDIA Titan X GPU card. The algorithm is capable of accurately reconstructing the probability density function as well as the unknown conditional expected diffusion; the corresponding relative errors, computed over the composition domain, are depicted as functions of time in Fig. 4. The relative error is highest at small times due to the singularity of the exact PDF at the initial time (the double-delta initial condition). However, at larger times, the error decreases as the effect of the initial singularity weakens.
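A schematic version of this staged schedule is sketched below (illustrative stage lengths, learning rates, batch size, and placeholder data; the loss here keeps only the regression term for brevity, whereas the full loss of Section 3 adds the PDE residual).

```python
import tensorflow as tf

# Staged Adam training with progressively smaller learning rates, iterating
# over mini-batches of (t, psi, P) training data. All numbers are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation=tf.nn.swish, input_shape=(2,)),
    tf.keras.layers.Dense(2),
])

# Placeholder training data standing in for PDF measurements on a (t, psi) grid.
N = 1024
t_d   = tf.random.uniform((N, 1))
psi_d = tf.random.uniform((N, 1), minval=-1.0, maxval=1.0)
P_d   = tf.ones((N, 1))
data  = tf.data.Dataset.from_tensor_slices((t_d, psi_d, P_d))

for epochs, lr in [(10, 1e-3), (10, 1e-4), (10, 1e-5), (10, 1e-6)]:
    # A fresh optimizer per stage is a simplification used in this sketch.
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(epochs):
        for t_b, psi_b, P_b in data.shuffle(N).batch(64):
            with tf.GradientTape() as tape:
                P_pred = model(tf.concat([t_b, psi_b], axis=1))[:, 0:1]
                loss = tf.reduce_sum(tf.square(P_pred - P_b))  # data term only
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
```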

Figure 5: Conditional Expected Dissipation: The exact probability density function alongside the learned one is depicted in the top panels, while the exact and learned conditional expected dissipation are plotted in the bottom panels. It is worth highlighting that the algorithm has seen no data whatsoever on the conditional expected dissipation.
Figure 6: Conditional Expected Dissipation: The relative errors of the learned probability density function and of the learned conditional expected dissipation, computed over the composition domain, are depicted as functions of time.

Figure 5 depicts the exact and the learned conditional expected dissipation $E(\psi,t)$. It is worth highlighting that the algorithm has seen no data whatsoever on the conditional expected dissipation. To obtain the results reported in this figure, we approximate $P(\psi,t)$ and $E(\psi,t)$ by a deep neural network outputting two variables and consisting of 10 hidden layers with 100 neurons per hidden layer (see Fig. 2). As for the activation functions, we again use Swish. The training procedure is the same as the one explained above. The algorithm is capable of accurately reconstructing the probability density function as well as the unknown conditional expected dissipation; the corresponding relative errors are depicted as functions of time in Fig. 6.

6 Concluding Remarks

In this paper we present a data-driven framework for learning unclosed terms for turbulent scalar mixing. In the presented framework, the unclosed terms are learned by (i) incorporating the physics, i.e., the PDF transport equation, and (ii) leveraging high-fidelity observations of the PDF. We envision that the presented framework can be straightforwardly extended to high-dimensional cases involving the mixing of multiple species. Early evidence of this claim can be found in Raissi (2018); Weinan et al. (2017), in which the authors circumvent the tyranny of numerical discretization and devise algorithms that are scalable to high dimensions. A similar technique can be applied here while taking advantage of the fact that the data points lie on a low-dimensional manifold, simply because the composition field is a function from a low-dimensional space (i.e., space and time) to the possibly high-dimensional space of species. Moreover, the approach advocated in the current work is also highly scalable to the big data regimes routinely encountered while studying turbulence, simply because the data will be processed in mini-batches.

Acknowledgements

The work at Brown University is supported by the DARPA EQUiPS grant N66001-15-2-4055 and by the AFOSR grant FA9550-17-1-0013. All data and codes used in this manuscript will be publicly available on GitHub at https://github.com/maziarraissi/DeepTurbulence.

References