The problem of turbulent scalar mixing has been the subject of widespread investigation for several decades O’Brien (1960); Brodkey (1975); Pope (2000); Haworth (2010). The problem is explicitly exhibited in the transported probability density function (PDF) description of turbulence in Reynolds-averaged Navier-Stokes (RANS) simulations. With the single-point PDF descriptor, the effects of mixing of a Fickian scalar appear in unclosed form through the conditional expected dissipation and/or the conditional expected diffusion terms Pope (2000). A similar closure problem is encountered in large eddy simulation (LES) via the probabilistic filtered density function (FDF) Nouri et al. (2017). Development of closures for these terms has been, and continues to be, an area of active research; see e.g. Refs. Frolov et al. (2004); Haworth (2010); Ansari et al. (2011); Pope (2013) for reviews. The overarching goal of turbulence modeling is to find accurate closures for the unclosed terms that appear in the PDF/FDF transport equations. As is common practice in turbulence modeling, the unclosed terms are expressed in terms of closed quantities. The form of such a closure is based on physical inspection of the problem at hand and is inherently error prone. This is the major source of modeling uncertainty in turbulence closure.
In this paper, we introduce a new paradigm for turbulent scalar mixing closure, in which the unclosed terms are learned from high-fidelity observations. Such observations may come from direct numerical simulation (DNS), e.g. Girimaji and Zhou (1996); Jaberi et al. (1996); Christie and Domaradzki (1994), or space-time resolved experimental measurements, e.g. Eckstein and Vlachos (2009); Pereira et al. (2000). In DNS, the unclosed term can be extracted directly from the simulated results; however, for most realistic applications, performing DNS is cost prohibitive. On the other hand, finding the closure from experimental data involves taking derivatives of the data in space-time and in composition space (in some cases high-order derivatives), which is nontrivial and, even if possible, introduces new uncertainty into the closure depending on the space-time resolution of the measurements. Our ultimate goal is to develop a closure discovery framework that learns the closure from sparse high-fidelity data, such as experimental measurements. The proposed framework replaces the guesswork often involved in such model development with a data-driven approach that uncovers the closure from data in a systematic fashion. Our approach draws inspiration from the early and contemporary contributions in deep learning for partial differential equations Psichogios and Ungar (1992); Lagaris et al. (1998); Sirignano and Spiliopoulos (2018); Weinan et al. (2017); Long et al. (2017); Baymani et al. (2010); Chiaramonte and Kiener (2018) and data-driven modeling strategies Rudy et al. (2017, 2018); Pan and Duraisamy (2018), and in particular relies on recent developments in physics-informed deep learning Raissi et al. (2018) and deep hidden physics models Raissi (2018).
As a demonstration example, we consider binary scalar mixing, which has proven very useful for PDF closure developments Dopazo (1973); Janicka et al. (1979); Pope (1982); Kosály and Givi (1987); Givi and McMurtry (1988); Norris and Pope (1991); Girimaji (1992a, b); Jaberi and Givi (1995); Subramaniam and Pope (1998); Pope (2013). The problem is typically considered in the setting of a spatially homogeneous flow in which the temporal transport of the scalar PDF is considered. In this setting, development of a closure which can accurately predict the evolution of the PDF is the primary objective. The relative simplicity of the problem makes it well suited for both DNS and laboratory experiments, and the literature contains a wealth of data obtained by these means; see e.g. Refs. Girimaji and Zhou (1996); Jaberi et al. (1996); Tavoularis and Corrsin (1981); Eswaran and Pope (1988); McMurtry and Givi (1989); Christie and Domaradzki (1993, 1994); Solomon and Gollub (1991); Thoroddsen and Van Atta (1992); Jayesh and Warhaft (1991, 1992). We will demonstrate that our proposed framework rediscovers the conditional expected dissipation and diffusion.
2 Binary Scalar Mixing
We consider the mixing of a Fickian passive scalar $\phi = \phi(t, \mathbf{x})$ ($t$ denotes time and $\mathbf{x}$ is the position vector), with diffusion coefficient $\Gamma$, from an initially symmetric binary state within the bounds $-1 \le \phi \le +1$. Therefore, the single-point PDF of $\phi$ at the initial time is $P(\psi, 0) = \tfrac{1}{2}\left[\delta(\psi + 1) + \delta(\psi - 1)\right]$, where $\psi$ denotes the composition sample space variable for $\phi$. Thus $\langle \phi \rangle = 0$ and $\sigma^2(0) = 1$, where $\langle \cdot \rangle$ indicates the probability mean (average), and $\sigma^2 = \langle \phi^2 \rangle - \langle \phi \rangle^2$
denotes the variance. In homogeneous turbulence, the PDF transport is governed by
\[
\frac{\partial P(\psi,t)}{\partial t} = -\frac{\partial^2 \left[ E(\psi,t)\, P(\psi,t) \right]}{\partial \psi^2}, \qquad (1)
\]
where $E(\psi,t)$ represents the expected value of the scalar dissipation $\varepsilon = \Gamma\, \nabla\phi \cdot \nabla\phi$, conditioned on the value of the scalar,
\[
E(\psi,t) = \left\langle \varepsilon(t,\mathbf{x}) \mid \phi(t,\mathbf{x}) = \psi \right\rangle, \qquad (2)
\]
where the vertical bar denotes conditioning. Equation (1) is also expressed by
\[
\frac{\partial P(\psi,t)}{\partial t} = -\frac{\partial \left[ D(\psi,t)\, P(\psi,t) \right]}{\partial \psi}, \qquad (3)
\]
where $D(\psi,t)$ denotes the conditional expected diffusion
\[
D(\psi,t) = \left\langle \Gamma \nabla^2 \phi(t,\mathbf{x}) \mid \phi(t,\mathbf{x}) = \psi \right\rangle. \qquad (4)
\]
The closure problem in the PDF transport is associated with the unknown conditional expected dissipation, $E(\psi,t)$, and/or the conditional expected diffusion, $D(\psi,t)$. At the single-point level, none of these conditional averages are known; neither are their unconditional (total) mean values.
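Before introducing the learning framework, it is useful to note how such conditional averages are extracted when resolved data are available. The sketch below, with synthetic samples and a made-up dissipation model standing in for DNS fields, estimates the conditional mean of the dissipation by bin-averaging joint samples over composition space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic joint samples standing in for resolved-field data:
# phi plays the role of the scalar, eps of its dissipation.
phi = rng.uniform(-1.0, 1.0, size=100_000)
eps = (1.0 - phi**2) + 0.1 * rng.standard_normal(phi.size)  # toy model

def conditional_mean(phi, eps, n_bins=20, lo=-1.0, hi=1.0):
    """Bin-averaged estimate of <eps | phi = psi> over composition space."""
    edges = np.linspace(lo, hi, n_bins + 1)
    idx = np.clip(np.digitize(phi, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=eps, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

psi, cond_eps = conditional_mean(phi, eps)
```

In DNS one has such joint samples of the scalar and its gradients directly; the closure problem arises precisely because single-point statistics do not supply them.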
3 Deep Learning Solution
Given data on the PDF $P(\psi,t)$, we are interested in inferring the unknown conditional expected dissipation $E(\psi,t)$ and diffusion $D(\psi,t)$ by leveraging Eqs. (1) and (3), respectively, and consequently solving the closure problem. The data may be obtained from DNS or experimental measurements.
3.1 Conditional Expected Diffusion
We proceed by approximating the unknown solution by a deep neural network taking as inputs $\psi$ and $t$ while outputting $P(\psi,t)$ and $D(\psi,t)$. This choice is motivated by modern techniques for solving forward and inverse problems involving partial differential equations, where the unknown solution is approximated either by a Gaussian process (Raissi et al., 2018; Raissi and Karniadakis, 2018; Raissi et al., 2017a, b; Raissi, 2017; Perdikaris et al., 2017; Raissi and Karniadakis, 2016; Gulian et al., 2018) or a neural network (Raissi et al., 2018; Raissi, 2018; Raissi et al., 2018a, b). Moreover, placing a prior on the solution itself is fully justified by the similar approach pursued over the past century by classical methods for solving partial differential equations, such as finite elements, finite differences, and spectral methods, where the unknown solution is expanded in terms of an appropriate set of basis functions. However, the classical methods suffer from the curse of dimensionality, mainly due to their reliance on spatio-temporal grids. In contrast, modern techniques avoid the tyranny of mesh generation, and consequently the curse of dimensionality Raissi (2018); Weinan et al. (2017), by approximating the unknown solution with a neural network Raissi et al. (2017a, b); Raissi (2018) or a Gaussian process. This transforms the problem of solving a partial differential equation into an optimization problem. This is enabling, as it allows us to solve forward, backward (inverse), data-assimilation, data-driven discovery, and control problems (in addition to many other classes of problems of practical interest) using a single unified framework; viewed another way, it helps us design physics-informed learning machines.
We obtain the required derivatives to compute the residual network
\[
e(\psi,t) := \frac{\partial P}{\partial t} + \frac{\partial \left[ D\, P \right]}{\partial \psi}
\]
by applying the chain rule for differentiating compositions of functions using automatic differentiation Baydin et al. (2015). It is worth emphasizing that automatic differentiation is different from, and in several respects superior to, numerical or symbolic differentiation, the two commonly encountered alternatives for computing derivatives. In its most basic description Baydin et al. (2015), automatic differentiation relies on the fact that all numerical computations are ultimately compositions of a finite set of elementary operations for which derivatives are known. Combining the derivatives of the constituent operations through the chain rule gives the derivative of the overall composition. This allows accurate evaluation of derivatives at machine precision with ideal asymptotic efficiency and only a small constant factor of overhead. In particular, to compute the required derivatives we rely on TensorFlow Abadi et al. (2016), a popular and relatively well documented open-source software library for automatic differentiation and deep learning computations.
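This principle can be illustrated with a minimal forward-mode sketch (pedagogical only; Tensorflow's reverse mode differs in implementation but follows the same chain-rule logic): each elementary operation carries its own known derivative, and composition propagates it exactly.

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0; `dot` tracks the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __radd__, __rmul__ = __add__, __mul__

def sin(x):
    # Chain rule for an elementary operation with a known derivative
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def grad(f, x):
    """Derivative of f at x, exact to machine precision."""
    return f(Dual(x, 1.0)).dot

# d/dx [x*sin(x) + x] = sin(x) + x*cos(x) + 1
g = grad(lambda x: x * sin(x) + x, 1.0)
```

No finite-difference step size appears anywhere: the derivative of each elementary operation is exact, so the composed derivative is exact up to floating-point roundoff.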
Parameters of the neural networks $P(\psi,t)$ and $D(\psi,t)$ can be learned by minimizing the following loss function
\[
SSE = \sum_{n=1}^{N} \left| P(\psi^n, t^n) - P^n \right|^2 + \sum_{n=1}^{N} \left| e(\psi^n, t^n) \right|^2,
\]
where $\{\psi^n, t^n, P^n\}_{n=1}^{N}$ represents the data on the probability density function $P(\psi,t)$. Here, the first summation corresponds to the training data on the probability density function, while the second summation enforces the structure imposed by Eq. (3) at a finite set of measurement points whose number and locations are taken to be the same as those of the training data. It should be pointed out, however, that the number and locations of the points at which we enforce the partial differential equation could differ from those of the actual training data. Although not pursued in the current work, this could significantly reduce the required number of training data on the probability density function.
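To make the structure of this two-term loss concrete, the following sketch evaluates both terms for Eq. (3), with hypothetical closed-form surrogates standing in for the two network outputs and finite differences standing in for automatic differentiation. The chosen pair (a translating Gaussian with unit drift) satisfies Eq. (3) exactly, so the loss is numerically zero:

```python
import numpy as np

def residual(p, d, psi, t, h=1e-5):
    """e = dP/dt + d(D*P)/dpsi for Eq. (3); derivatives here by central
    differences as a stand-in for automatic differentiation."""
    dp_dt = (p(psi, t + h) - p(psi, t - h)) / (2 * h)
    dflux = (d(psi + h, t) * p(psi + h, t)
             - d(psi - h, t) * p(psi - h, t)) / (2 * h)
    return dp_dt + dflux

def loss(p, d, psi_n, t_n, p_n):
    """SSE = data misfit on P + PDE residual at the same points."""
    data_term = np.sum((p(psi_n, t_n) - p_n) ** 2)
    pde_term = np.sum(residual(p, d, psi_n, t_n) ** 2)
    return data_term + pde_term

# Hypothetical surrogates: P(psi,t) = f(psi - t) with D = 1 solves Eq. (3).
p_hat = lambda psi, t: np.exp(-(psi - t) ** 2 / 2) / np.sqrt(2 * np.pi)
d_hat = lambda psi, t: np.ones_like(psi)

psi_n = np.linspace(-1.0, 1.0, 50)
t_n = np.full(50, 0.5)
sse = loss(p_hat, d_hat, psi_n, t_n, p_hat(psi_n, t_n))
```

In the actual framework, `p_hat` and `d_hat` are the outputs of the trained neural network, the derivatives inside `residual` come from automatic differentiation, and the minimization is over the network parameters.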
3.2 Conditional Expected Dissipation
Alternatively, one could proceed by approximating the unknown solution by a deep neural network taking as inputs $\psi$ and $t$ while outputting $P(\psi,t)$ and $E(\psi,t)$.
We use automatic differentiation Baydin et al. (2015) to acquire the required derivatives to compute the residual network
\[
e(\psi,t) := \frac{\partial P}{\partial t} + \frac{\partial^2 \left[ E\, P \right]}{\partial \psi^2}.
\]
Parameters of the neural networks $P(\psi,t)$ and $E(\psi,t)$ can be learned by minimizing the following loss function
\[
SSE = \sum_{n=1}^{N} \left| P(\psi^n, t^n) - P^n \right|^2 + \sum_{n=1}^{N} \left| e(\psi^n, t^n) \right|^2,
\]
where $\{\psi^n, t^n, P^n\}_{n=1}^{N}$ represents the data on the probability density function $P(\psi,t)$. Here, the first summation corresponds to the training data on the probability density function, while the second summation enforces the structure imposed by Eq. (1) at a finite set of measurement points whose number and locations are taken to be the same as those of the training data.
4 Amplitude Mapping Closure
To assess the performance of our deep learning algorithms, we consider the amplitude mapping closure (AMC) Kraichnan (1989); Chen et al. (1989); Pope (1991). This provides the external closure for the PDF transport in an implicit manner, by mapping the random field of interest, $\phi(t,\mathbf{x})$, to a stationary Gaussian reference field, $\theta(\mathbf{x})$, via a transformation $\phi = X(\theta, t)$. Once this relation is established, the PDF of the random variable, $P(\psi,t)$, is related to that of a Gaussian distribution. In a domain with fixed upper and lower bounds, the solution for a symmetric field with zero mean is represented in terms of an unknown function of time, $\tau(t)$.
The AMC captures many of the basic features of the binary mixing problem, namely the inverse diffusion of the PDF in the composition domain from a double-delta distribution to an asymptotic, approximately Gaussian distribution centered around $\langle\phi\rangle = 0$ as the variance goes to zero. There are other means of “driving” the PDF toward Gaussianity (or any other distribution) in a physically acceptable manner. The Johnson-Edgeworth translation (JET) Miller et al. (1993) involves the transformation of the random physical field $\phi(t,\mathbf{x})$ to a fixed standard Gaussian (or any other) reference field $\theta$ by means of a translation of the form $\phi = G(\theta)$.
The function $G$ here plays a role similar to that of $X$ in the AMC. With an appropriate form for the function $G$, the scalar PDF is determined; in this manner, many families of distributions can be generated. The AMC, for example, is recovered by the translation $\phi = \mathrm{erf}(\theta)$, so the AMC PDF can also be labeled as the erf-Normal distribution. Recognizing this translation, the relation between $\tau$ and the physical time $t$ can be determined through knowledge of the higher-order statistics, for example the normalized variance $\sigma^2(t)/\sigma^2(0)$.
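The translation mechanics can be sketched by direct sampling (a toy illustration; the scale parameter `alpha` below is a hypothetical stand-in for the actual time dependence of the translation): pushing a standard Gaussian reference field through a monotone bounded map yields a bounded scalar whose PDF follows from the usual change of variables.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
erf = np.vectorize(math.erf)

def jet_samples(g, n=200_000):
    """Johnson-Edgeworth translation: push a standard Gaussian
    reference field through the monotone map phi = g(theta)."""
    theta = rng.standard_normal(n)
    return g(theta)

# erf translation (the AMC case); alpha is a hypothetical stand-in
# for the time-dependent scale that drives the PDF toward Gaussianity.
alpha = 1.5
phi = jet_samples(lambda th: erf(th / alpha))

# Moments of the translated field: zero mean by symmetry,
# variance below unity since erf maps into (-1, 1).
mean, var = phi.mean(), phi.var()
```

A histogram of `phi` then approximates the translated PDF, and sweeping `alpha` traces the evolution from a near-binary shape toward a near-Gaussian one.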
5 Results
All data and codes used in this manuscript will be publicly available on GitHub at https://github.com/maziarraissi/DeepTurbulence.
In the following, the AMC (or the erf-Normal distribution) is utilized to assess the performance of our deep learning framework. In particular, Fig. 3 depicts the exact and the learned conditional expected diffusion $D(\psi,t)$. It is worth highlighting that the algorithm has seen no data whatsoever on the conditional expected diffusion. To obtain the results reported in this figure, we approximate $P(\psi,t)$ and $D(\psi,t)$ by a deep neural network consisting of 10 hidden layers with 100 neurons per hidden layer (see Fig. 1). As for the activation functions, we use $f(x) = x \cdot \mathrm{sigmoid}(x)$, known in the literature Ramachandran et al. (2017) as the Swish
activation function. The smoothness of Swish and its similarity to ReLU make it a suitable candidate for an activation function when working with physics-informed neural networks Raissi (2018). In general, the choice of a neural network’s architecture (e.g., number of layers/neurons and form of activation functions) is crucial and in many cases still remains an art that relies on one’s ability to balance the trade-off between the expressivity and trainability of the neural network (Raghu et al., 2016). Our empirical findings so far indicate that deeper and wider networks are usually more expressive (i.e., they can capture a larger class of functions) but are often more costly to train (i.e., a feed-forward evaluation of the neural network takes more time and the optimizer requires more iterations to converge). In this work, we have tried to choose the neural networks’ architectures in a consistent fashion throughout the manuscript by setting the number of hidden layers to 10 and the number of neurons to 50 per output variable. Consequently, there might exist other architectures that improve some of the results reported here.
As for the training procedure, our experience so far indicates that while training deep neural networks, it is often useful to reduce the learning rate as the training progresses. Specifically, the results reported here are obtained after four consecutive sets of epochs of the Adam optimizer Kingma and Ba (2014), with the learning rate decreased at each stage. Each epoch corresponds to one pass through the entire dataset, and the total number of iterations of the Adam optimizer is therefore the total number of epochs times the number of data points divided by the mini-batch size. All computations were carried out on a single NVIDIA Titan X GPU card. The algorithm is capable of accurately reconstructing the probability density function as well as the unknown conditional expected diffusion; the corresponding relative errors in space as a function of time are depicted in Fig. 4. The relative error is largest at small times, due to the singularity of the double-delta distribution at the initial time; at larger times, the error decreases as the effect of the initial singularity weakens.
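The staged schedule can be sketched with a minimal NumPy implementation of the Adam update Kingma and Ba (2014) on a toy quadratic (the step counts and learning rates below are illustrative placeholders, not the values used for the reported results):

```python
import numpy as np

def adam_stages(grad, w, stages, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam with a piecewise-constant, decreasing learning rate.
    `stages` is a list of (num_steps, learning_rate) pairs."""
    m = np.zeros_like(w)  # first-moment (mean) estimate
    v = np.zeros_like(w)  # second-moment (uncentered variance) estimate
    t = 0
    for steps, lr in stages:
        for _ in range(steps):
            t += 1
            g = grad(w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g ** 2
            m_hat = m / (1 - beta1 ** t)   # bias correction
            v_hat = v / (1 - beta2 ** t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy objective: ||w - 1||^2, minimized at w = 1.
grad = lambda w: 2.0 * (w - 1.0)
w = adam_stages(grad, np.array([5.0, -3.0]),
                stages=[(2000, 1e-1), (2000, 1e-2), (2000, 1e-3)])
```

The large-rate stage moves quickly toward the minimizer but oscillates around it; each subsequent stage shrinks the oscillation, mirroring the annealed schedule used for the networks above.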
Figure 5 depicts the exact and the learned conditional expected dissipation $E(\psi,t)$. It is worth highlighting that the algorithm has seen no data whatsoever on the dissipation. To obtain the results reported in this figure, we approximate $P(\psi,t)$ and $E(\psi,t)$ by a deep neural network outputting two variables and consisting of 10 hidden layers with 100 neurons per hidden layer (see Fig. 2). As for the activation functions, we again use Swish. The training procedure is the same as the one explained above. The algorithm is capable of accurately reconstructing the probability density function as well as the unknown conditional expected dissipation; the corresponding relative errors in space as a function of time are depicted in Fig. 6.
6 Concluding Remarks
In this paper we presented a data-driven framework for learning the unclosed terms of turbulent scalar mixing. In the presented framework, the unclosed terms are learned by (i) incorporating the known physics, i.e., the PDF transport equation, and (ii) leveraging high-fidelity observations of the PDF. We envision that the presented framework can be straightforwardly extended to high-dimensional cases involving the mixing of multiple species. Early evidence of this claim can be found in Raissi (2018); Weinan et al. (2017), in which the authors circumvent the tyranny of numerical discretization and devise algorithms that are scalable to high dimensions. A similar technique can be applied here while taking advantage of the fact that the data points lie on a low-dimensional manifold, simply because the PDF is a function from a low-dimensional space (i.e., composition sample space and time) to the possibly high-dimensional space of species. Moreover, the approach advocated in the current work is also highly scalable to the big-data regimes routinely encountered while studying turbulence, simply because the data are processed in mini-batches.
The work at Brown University is supported by the DARPA EQUiPS grant N66001-15-2-4055 and by the AFOSR Grant FA9550-17-1-0013.
- O’Brien (1960) E. E. O’Brien, On the Statistical Behavior of a Dilute Reactant in Isotropic Turbulence, Ph.D. Thesis, The Johns Hopkins University, Baltimore, MD, 1960.
- Brodkey (1975) R. S. Brodkey (Ed.), Turbulence in Mixing Operation, Academic Press, New York, NY, 1975.
- Pope (2000) S. B. Pope, Turbulent Flows, Cambridge University Press, Cambridge, UK, 2000.
- Haworth (2010) D. C. Haworth, Progress in probability density function methods for turbulent reacting flows, Prog. Energ. Combust. 36 (2010) 168–259.
- Nouri et al. (2017) A. G. Nouri, M. B. Nik, P. Givi, D. Livescu, S. B. Pope, A self-contained filtered density function, Phys. Rev. Fluids 2 (2017).
- Frolov et al. (2004) S. Frolov, V. Frost, D. Roekaerts, Micromixing in Turbulent Reactive Flows, Torus Press, Moscow, Russia, 2004.
- Ansari et al. (2011) N. Ansari, F. A. Jaberi, M. R. H. Sheikhi, P. Givi, Filtered density function as a modern CFD tool, in: R. S. Maher (Ed.), Engineering Applications of CFD, volume 1 of Fluid Mechanics and Its Applications, International Energy and Environment Foundation, Al-Najaf, Iraq, 2011, pp. 1–22.
- Pope (2013) S. B. Pope, Small scales, many species and the manifold challenges of turbulent combustion, Proc. Combust. Inst. 34 (2013) 1–31.
- Girimaji and Zhou (1996) S. S. Girimaji, Y. Zhou, Analysis and modeling of subgrid scalar mixing using numerical data, Phys. Fluids 8 (1996) 1224–1236.
- Jaberi et al. (1996) F. A. Jaberi, R. S. Miller, C. K. Madnia, P. Givi, Non-Gaussian scalar statistics in homogeneous turbulence, J. Fluid Mech. 313 (1996) 241–282.
- Christie and Domaradzki (1994) S. L. Christie, J. A. Domaradzki, Scale dependence of the statistical character of turbulent fluctuations in thermal convection, Phys. Fluids 6 (1994) 1848–1855.
- Eckstein and Vlachos (2009) A. Eckstein, P. P. Vlachos, Digital particle image velocimetry (DPIV) robust phase correlation, Measurement Science and Technology 20 (2009) 055401.
- Pereira et al. (2000) F. Pereira, M. Gharib, D. Dabiri, D. Modarress, Defocusing digital particle image velocimetry: A 3-component 3-dimensional DPIV measurement technique. Application to bubbly flows, Experiments in Fluids 29 (2000) S078–S084.
- Psichogios and Ungar (1992) D. C. Psichogios, L. H. Ungar, A hybrid neural network-first principles approach to process modeling, AIChE Journal 38 (1992) 1499–1511.
- Lagaris et al. (1998) I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE transactions on neural networks 9 (1998) 987–1000.
- Sirignano and Spiliopoulos (2018) J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics (2018).
- Weinan et al. (2017) E. Weinan, J. Han, A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics and Statistics 5 (2017) 349–380.
- Long et al. (2017) Z. Long, Y. Lu, X. Ma, B. Dong, PDE-Net: Learning PDEs from data, arXiv preprint arXiv:1710.09668 (2017).
- Baymani et al. (2010) M. Baymani, A. Kerayechian, S. Effati, Artificial neural networks approach for solving stokes problem, Applied Mathematics 1 (2010) 288.
- Chiaramonte and Kiener (2018) M. Chiaramonte, M. Kiener, Solving differential equations using neural networks, Machine Learning Project (2018).
- Rudy et al. (2017) S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3 (2017) e1602614.
- Rudy et al. (2018) S. Rudy, A. Alla, S. L. Brunton, J. N. Kutz, Data-driven identification of parametric partial differential equations, arXiv preprint arXiv:1806.00732 (2018).
- Pan and Duraisamy (2018) S. Pan, K. Duraisamy, Data-driven discovery of closure models, arXiv preprint arXiv:1803.09318 (2018).
- Raissi et al. (2018) M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics (2018).
- Raissi (2018) M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, Journal of Machine Learning Research 19 (2018) 1–24.
- Dopazo (1973) C. Dopazo, Non-Isothermal Turbulent Reactive Flows: Stochastic Approaches, Ph.D. Thesis, Department of Mechanical Engineering, State University of New York at Stony Brook, Stony Brook, NY, 1973.
- Janicka et al. (1979) J. Janicka, W. Kolbe, W. Kollmann, Closure of the transport equation for the probability density function of turbulent scalar field, J. Non-Equil. Thermodyn. 4 (1979) 47–66.
- Pope (1982) S. B. Pope, An improved turbulent mixing model, Combust. Sci. Technol. 28 (1982) 131–145.
- Kosály and Givi (1987) G. Kosály, P. Givi, Modeling of turbulent molecular mixing, Combust. Flame 70 (1987) 101–118.
- Givi and McMurtry (1988) P. Givi, P. A. McMurtry, Non-premixed reaction in homogeneous turbulence: Direct numerical simulations, AIChE J. 34 (1988) 1039–1042.
- Norris and Pope (1991) A. T. Norris, S. B. Pope, Turbulent mixing model based on ordered pairing, Combust. Flame 83 (1991) 27–42.
- Girimaji (1992a) S. S. Girimaji, On the modeling of scalar diffusion in isotropic turbulence, Phys. Fluids A. 4 (1992a) 2529–2537.
- Girimaji (1992b) S. S. Girimaji, A mapping closure for turbulent scalar mixing using a time-evolving reference field, Phys. Fluids A. 4 (1992b) 2875–2886.
- Jaberi and Givi (1995) F. A. Jaberi, P. Givi, Inter-layer diffusion model of scalar mixing in homogeneous turbulence, Combust. Sci. Technol. 104 (1995) 249–272.
- Subramaniam and Pope (1998) S. Subramaniam, S. B. Pope, A mixing model for turbulent reactive flows based on Euclidean minimum spanning trees, Combust. Flame 115 (1998) 487–514.
- Pope (2013) S. B. Pope, A Model for Turbulent Mixing Based on Shadow-Position Conditioning, Phys. Fluids 25 (2013) 110803.
- Tavoularis and Corrsin (1981) S. Tavoularis, S. Corrsin, Experiments in nearly homogenous turbulent shear flow with a uniform mean temperature gradient. Part 1, J. Fluid Mech. 104 (1981) 311–347.
- Eswaran and Pope (1988) V. Eswaran, S. B. Pope, Direct numerical simulations of the turbulent mixing of a passive scalar, Phys. Fluids 31 (1988) 506–520.
- McMurtry and Givi (1989) P. A. McMurtry, P. Givi, Direct numerical simulations of mixing and reaction in a nonpremixed homogeneous turbulent flow, Combust. Flame 77 (1989) 171–185.
- Christie and Domaradzki (1993) S. L. Christie, J. A. Domaradzki, Numerical evidence for the nonuniversality of the soft/hard turbulence classification for thermal convection, Phys. Fluids A 5 (1993) 412–421.
- Solomon and Gollub (1991) T. H. Solomon, J. P. Gollub, Thermal boundary layers and heat flux in turbulent convection: The role of recirculating flows, Phys. Rev. A 43 (1991) 6683–6693.
- Thoroddsen and Van Atta (1992) S. T. Thoroddsen, C. W. Van Atta, Exponential tails and skewness of density-gradient probability density functions in stably stratified turbulence, J. Fluid Mech. 244 (1992) 547–566.
- Jayesh and Warhaft (1991) Jayesh, Z. Warhaft, Probability distribution of a passive scalar in grid-generated turbulence, Phys. Rev. Lett. 67 (1991) 3503–3506.
- Jayesh and Warhaft (1992) Jayesh, Z. Warhaft, Probability distribution, conditional dissipation, and transport of passive temperature fluctuations in grid-generated turbulence, Phys. Fluids A 4 (1992) 2292–2307.
- Raissi et al. (2018) M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM Journal on Scientific Computing 40 (2018) A172–A198.
- Raissi and Karniadakis (2018) M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, Journal of Computational Physics 357 (2018) 125–141.
- Raissi et al. (2017a) M. Raissi, P. Perdikaris, G. E. Karniadakis, Inferring solutions of differential equations using noisy multi-fidelity data, Journal of Computational Physics 335 (2017a) 736–746.
- Raissi et al. (2017b) M. Raissi, P. Perdikaris, G. E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, Journal of Computational Physics 348 (2017b) 683 – 693.
- Raissi (2017) M. Raissi, Parametric Gaussian process regression for big data, arXiv preprint arXiv:1704.03144 (2017).
- Perdikaris et al. (2017) P. Perdikaris, M. Raissi, A. Damianou, N. D. Lawrence, G. E. Karniadakis, Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling, Proc. R. Soc. A 473 (2017) 20160751.
- Raissi and Karniadakis (2016) M. Raissi, G. Karniadakis, Deep multi-fidelity Gaussian processes, arXiv preprint arXiv:1604.07484 (2016).
- Gulian et al. (2018) M. Gulian, M. Raissi, P. Perdikaris, G. Karniadakis, Machine learning of space-fractional differential equations, arXiv preprint arXiv:1808.00931 (2018).
- Raissi et al. (2018) M. Raissi, A. Yazdani, G. E. Karniadakis, Hidden fluid mechanics: A Navier-Stokes informed deep learning framework for assimilating flow visualization data, arXiv preprint arXiv:1808.04327 (2018).
- Raissi (2018) M. Raissi, Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations, arXiv preprint arXiv:1804.07010 (2018).
- Raissi et al. (2018a) M. Raissi, P. Perdikaris, G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236 (2018a).
- Raissi et al. (2018b) M. Raissi, Z. Wang, M. S. Triantafyllou, G. E. Karniadakis, Deep learning of vortex induced vibrations, arXiv preprint arXiv:1808.08952 (2018b).
- Raissi et al. (2017a) M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics informed deep learning (part II): Data-driven discovery of nonlinear partial differential equations, arXiv preprint arXiv:1711.10566 (2017a).
- Raissi et al. (2017b) M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations, arXiv preprint arXiv:1711.10561 (2017b).
- Raissi (2018) M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, arXiv preprint arXiv:1801.06637 (2018).
- Baydin et al. (2015) A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, arXiv preprint arXiv:1502.05767 (2015).
- Abadi et al. (2016) M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, arXiv preprint arXiv:1603.04467 (2016).
- Kraichnan (1989) R. H. Kraichnan, Closures for probability distributions, Bull. Amer. Phys. Soc. 34 (1989) 2298.
- Chen et al. (1989) H. Chen, S. Chen, R. H. Kraichnan, Probability distribution of a stochastically advected scalar field, Phys. Rev. Lett. 63 (1989) 2657–2660.
- Pope (1991) S. B. Pope, Mapping closures for turbulent mixing and reaction, Theor. Comp. Fluid Dyn. 2 (1991) 255–270.
- Miller et al. (1993) R. S. Miller, S. H. Frankel, C. K. Madnia, P. Givi, Johnson-Edgeworth translation for probability modeling of binary scalar mixing in turbulent flows, Combust. Sci. Technol. 91 (1993) 21–52.
- Jiang et al. (1992) T.-L. Jiang, F. Gao, P. Givi, Binary and trinary scalar mixing by Fickian diffusion-Some mapping closure results, Phys. Fluids A 4 (1992) 1028–1035.
- Ramachandran et al. (2017) P. Ramachandran, B. Zoph, Q. V. Le, Searching for activation functions, CoRR abs/1710.05941 (2017).
- Raghu et al. (2016) M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein, On the expressive power of deep neural networks, arXiv preprint arXiv:1606.05336 (2016).
- Kingma and Ba (2014) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).