Machine learning and particularly deep learning has achieved significant success in a wide range of commercial domain applications such as image recognition, audio recognition, and natural language processingnasrabadi2007pattern ; krizhevsky2012imagenet ; lecun2015deep ; goodfellow2016deep ; neal2012bayesian . In recent years, machine learning has been widely adopted in scientific applications, leading to a field recently referred to as physics-informed machine learning or more broadly, scientific machine learningwang2017physics ; wu2018physics ; ling2016reynolds ; geneva2019quantifying and multiphase flows ma2015using ; ma2016using and to compute rock permeability directly from CT scan images wu2018seeing
. They have also been used to discover ordinary and partial differential equations (ODEs and PDEs) from databrunton2016discovering ; rudy2017data ; long2017pde ; long2019pde . Finally, neural networks have been used to solve exactly specified PDEs sirignano2018dgm ; he2019mgnet ; berg2018unified ; tripathy2018deep and partially known PDEs by incorporating available data raissi2019physics ; raissi2019deep ; raissi2018multistep ; zhang2019quantifying
. The scientific applications reviewed above mostly involve supervised learning, which consists of three steps:
postulate a model that maps inputs (features) to outputs (labels), controlled by a set of adjustable model parameters;
learn the parameters from training data (labeled examples of input–output pairs); and
use the fitted model to predict the responses for new inputs that were not included in the training data.
1.1 Physical applications of GANs: progress and challenges
Recently, a dramatically different type of machine learning models referred to as generative models have found wide-spread applications in commercial and scientific domains. In particular, among these models the generative adversarial networks (GANs) goodfellow2014generative
have the best performance in many applications. Generative models such as GANs construct mappings from a generic (e.g., uniform or Gaussian) probability distribution to the data distribution. This is in stark contrast to supervised learning, which constructs functional mappings from input features to outputs based on labeled data. Once trained, such models can generate new samples that are not in the training database but conform to the data distribution. As such, the generated samples are “realistic”. As the training only uses unlabeled data, generative models belong to unsupervised learning. In the past few years, GANs have achieved major successes in commercial domain applications such as computer graphics. It also quickly found applications in scientific domains. For example, GANs have been successfully used to synthesize CT-scan images of rocksmosser2017reconstruction ; mosser2018stochastic and to generate realistic, statistics-conforming turbulent flow fields king2017creating ; xie2018tempogan . They have also been used to generate solutions of given algebraic equations or ODEs stinis2019enforcing , PDEs farimani2017deep , and stochastic differential equations yang2018physics by learning from the corresponding examples solutions.
The successful applications to physics prompt a critical question on the capability of GANs: do they generate samples that conform to the underlying physical constraints? These constraints are implicitly embedded in the training data to certain accuracy, because the data are obtained either by solving the equations that describe these constraints or by directly observing the physical system that obey
such constraints. However, the constraints are not explicitly encoded in the GANs. In the above-mentioned applications, the generated output are physical fields that often reside in high-dimensional spaces. This is in contrast to supervised learning, where the output are multi-class labels or low-dimensional scalar and vectors (e.g., the permeability or velocity at a point). Taking the GANs based PDE-emulator for examplefarimani2017deep , the output temperature field discretized on a mesh of grid points has a dimension of . Nevertheless, the physical laws expressed in the form of PDEs place heavy constraints on the admissible solutions. For example, velocity fields of fluid flows must be divergence-free due to mass conservation; temperature fields are typically smooth due to the Laplace operator in the governing equation. Such constraints dictated that admissible (i.e., physical and realistic) solutions must lie on a low-dimensional manifold embedded in a high-dimensional space. Therefore, it imperative to ensure their constraint-respecting properties of GANs before using them as trusted emulators for physical systems.
Fortunately, it has been theoretically proven that GANs are capable of preserving all the constraints and statistics of the training data if global optimum is achieved in the training goodfellow2014generative . However, it is also well-known that traditional GANs has difficulty in convergence and lack robustness in the training arjovsky2017wasserstein ; radford2015unsupervised . Consequently, numerous efforts have been made to improve the stability and robustness in training GANs, leading to a number of GANs variants. For example, researchers proposed different loss functions and divergence formulations to remedy the vanishing gradients associated with the loss function used in the standard GANs lin2017softmax ; mao2017least ; nowozin2016f ; arjovsky2017wasserstein ; gulrajani2017improved . In contrast, the present work investigates the feasibility of using physical constraints to improve the training of GANs used to emulate physical systems. It is expected that our work will complement existing works in regularizing the training of physics-emulating GANs.
1.2 Enforcing physical constraints in GANs
The current work advocates using physical constraints to improve the robustness of training GANs. In our perspective, the physical constraints are considered regularization for the training of GANs to find the global minimum, and not merely as extra requirements to satisfy. We further demonstrate that even the approximate constraint significantly improve the training of GANs. Such approximate constraints are common in practical systems as they may arise from incomplete knowledge of the system properties or from reduced-order modeling of complex system dynamics.
Constraints in physical systems manifested themselves in a wide range of forms ranging from simple algebraic equations and ODEs to nonlinear, coupled PDEs, some of which even contain unknown parameters or terms. Physical systems from a microscopic particle to turbulent flows in the ocean and galaxies in the universe have been found to obey a small set of common constraints and symmetries. Such constraints include conservation of mass, momentum, energy, and electric charges. For simple systems of a few degrees of freedom, such conservation laws can be described by simple algebraic equations or ODEs. Such constraints can be incorporated into neural networks straightforwardly by including them as penalty terms in the loss functions. Stinis et al.stinis2019enforcing performed an extensive exploration on enforcing physical constraints in GANs in a wide range of physical systems and application scenarios. They also pointed out a number of promising directions where further investigations are needed. One of these topics is enforcing approximate constraints on GANs, which is the focus of the present work.
Practical systems in science and engineering often have properties that are only known partially or with significant uncertainties stinis2019enforcing , which leads to approximate or uncertain physical constraints. For example, a particle immersed in a fluid of unknown viscosity can be described by a momentum conservation equation with an uncertain dissipation term due to the viscous drag; a complex fluid of unknown constitutive relation can be described by the Navier–Stokes equations, but the stress term is inevitably approximate. In addition to unknown system properties, approximate constraints may also arise due to reduced-order modeling of complex systems. For example, the mean velocity and pressure fields of turbulent flows can be described by Reynolds averaged Navier–Stokes equations, which has a similar form to the Navier–Stokes equations but with a Reynolds stress term that needs to be modeled pope00turbulent . If one is tasked to generate mean flow fields, the RANS equations can serve as valuable constraints, although the governing equations now contain unknown terms and thus serve only as approximate constraints. Enforcing approximate physical constraints arising in GANs is a major focus of the current paper.
1.3 Scope and contributions of present work
We propose a method to incorporate deterministic yet approximate constraints into GANs and study the effects of such constraints on the training and generative performance of GANs. Compared to the recent work of Stinis et al. stinis2019enforcing studying physical constraints in GANs, the current work differs in two critical aspects: (i) that we focused on approximate, conservation-law-like constraints and (ii) that the constraints are imposed on the generator. This work is complementary to the work presented in a companion paper that studied enforcing statistical constraints wu2019enforcing .
Complex systems can be subject to two distinct types of constraints: deterministic constraints and statistical constraints, which stem from the same set of conservation laws. For example, the Navier–Stokes equations describing the motion and dynamics of incompressible fluid flows originally directly from the mass and momentum conservation, which are referred to as deterministic constraints
in this paper. However, the solutions and properties of such PDEs exhibit sophisticated patterns and statistics due to the complex dynamics. For example, velocity increments in turbulent flows obey non-Gaussian distributions; the turbulent kinetic energy spectrum exhibit a decay rate ofin the universal range of the wavenumber space pope00turbulent . Such statistical constraints describe the properties of an ensemble of system states rather than an individual state. Specifically, given an ensemble of instantaneous velocity and pressure fields of a turbulent flow, each sample must satisfy the deterministic constraints such as the divergence-free condition (mass conservation) and the momentum conservation. In addition, the ensemble as a whole must also satisfy the statistical constraints of turbulent flows (e.g., energy spectrum and velocity increment distribution).
A companion paper of Wu et al. wu2019enforcing investigated the effects of enforcing statistical constraints on GANs. It was found that enforcing statistical constraints improve the convergence rate and robustness (with respect to algorithmic parameters) in the training. Moreover, it was demonstrated that enforcing low-order statistics such as covariance caused GANs to generate samples that better conform to the high-order statistics as well. In the current work, we focus on incorporating deterministic yet approximate constraints into GANs and investigate the effects of such constraints.
This rest of the paper is organized as follows. In Section 2, a brief overview of GANs is given, and the proposed methodology of imposing physical constraints is presented. In Section 3, a test case with simple geometric constraints and another test case of fluid flows with differential constraints are presented to demonstrate the merit of the constrained GANs. Finally, Section 4 concludes the paper.
Generative adversarial networks use a two-player competitive game strategy to train a generator that maps a generic (e.g., Gaussian) distribution in the latent space to a distribution in the data space goodfellow2014generative . Once trained, the generator can produce realistic examples conforming to the data distribution by drawing samples from the generic distribution. The current work focuses on enforcing deterministic, conservation-law-like physical constraints on the generated samples. In this section, we first describe the general idea of GANs architecture and their training procedure. Then, we introduce a generic representation of deterministic yet approximate physical constraints and explain why they are enforced on the loss function of the generator. Finally, we discuss a few practical implementation issues in stabilizing the training.
2.1 Architecture and training of GANs
Generative adversarial networks (GANs) were first proposed by Goodfellow et al. goodfellow2014generative , which features a novel neural network architecture consisting of two competing networks: a generator and a discriminator . The loss function of standard GANs is:
where and denote the probability distributions of training data and latent space vector , respectively, and indicates expectation. The generator takes a vector in the latent space and maps it to an element in the data space. For example, for the task of flow field synthesis,
can be a vector drawn from Gaussian or uniform distributions, which is mapped to the space of flow fields as illustrated in Fig.1. The discriminator takes a sample (either a training sample or a generated sample) in the data space as input and outputs a label to indicate whether the input is drawn from the training data (1 for true and 0 for false).
The training of GANs involves solving the “min–max” optimization problem, where the generator and the discriminator compete with each other to achieve a Nash equilibrium goodfellow2014generative . Specifically, the discriminator is first optimized (with kept fixed) to correctly distinguishes a synthesized sample by generator from a sample from the training data as accurate as possible (i.e., and ), which maximizes both expectations in the loss function. Then, the generator is optimized (with kept fixed) to generate a sample as realistic as possible in the viewpoint of the discriminator. The objective is to trick the discriminator to consider the sample drawn from the training data (i.e., ) as much as possible, which minimizes the expectation of in the loss function. This two-steps process is repeated until the discriminator correctly distinguishes generated samples from training samples with a probability of 50%, where the Nash equilibrium is achieved. Intuitively, in this scenario the generated samples are realistic enough that the discriminator is completely incapable of distinguishing them from training samples.
Goodfellow et al. goodfellow2014generative proved that the global optimum of the “min-max” problem in Eq. (1) is achieved if and only if the generator can exactly mimic the probability distribution of the training data, i.e., . Such a statement builds upon the proof that, with a fixed generator , the solution
minimize the loss function in Eq. (1). That is, the Nash equilibrium is achieved if and only if
. That is, under Nash equilibrium the generator is guaranteed to reproduce all the statistics of the training data, including statistical moments and correlations of any order.
While the theoretical capability of GANs in reproducing data statistics is reassuring, the ideal scenario requires the global optimum is achieved in the training. Unfortunately, standard GANs are difficult to train. One of the reasons is that the gradient vanishes as the discriminator approaches the optimum, which prevents the generator being further optimized. As it is unlikely that both the discriminator and the generator would achieve optimum simultaneously, the vanishing gradient can lead to an unsatisfactory generator.
In light of the vanishing gradient problem of the standard loss function, numerous researchers have proposed alternative loss functions to improve the training of GANs. Arjovsky et al.arjovsky2017wasserstein used the Earth-Mover (Wasserstein-1) distance , with which non-zero gradient of the loss function still exists even with an optimal discriminator . The corresponding loss function is:
where denotes the set of all 1-Lipschitz functions. The network weights of generator are clipped within a compact space to satisfies Lipschitz constraint empirically. Gulrajani et al. gulrajani2017improved pointed out that a poorly designed compact space would result in vanishing or exploding gradient, and they proposed using a gradient penalty (GP) term to enforce the Lipschitz constraint. The new loss function is
where is the sample uniformly generated between the generated data and the training sample , and is the weight of gradient penalty. In this work we used WGAN-GP to achieve improved stability in the training.
In many applications, it is desirable to impose some conditions on the generated samples, e.g., generating velocity fields that correspond to specified boundary conditions or conform to specified statistics. Such conditioning can be achieved by augmenting the latent space vector in the standard GANs with an auxiliary variable that encodes the specified conditions. This modification leads to an architecture referred to as conditional GANs (cGANs) mirza2014conditional . The loss function in Eq. (1) for standard GANs can be modified for cGANs as follows:
where random vectors and conform to their respective distributions while is deterministic. Here, both the generator and the discriminator explicitly depends on the auxiliary variables encoding the specified conditions. Subsequently, when using cGANs to generate samples, the conditioning-encoding vector and the latent space vector as both provided as inputs to the generator . This ensures the generator to output samples that both conform to the data distribution and satisfy the specified conditions. Recently, cGANs have been successfully used to generate solutions for PDEs with specified boundary conditions describing heat diffusion and fluid flows problems farimani2017deep . It can be seen from Eqs. (1) and (5) that the GANs and cGANs formulations are similar. Therefore, it is straightforward to extend the definitions of the GANs variants in Eqs. (3) and (4) to the corresponding cGANs. The extensions are omitted here for brevity.
2.2 Representation of generic physical constraints
Deterministic physical constraints exhibit themselves in a wide range of forms from simple algebraic equation to nonlinear integro-differential equations and inequalities. First, for simpler systems the conservation laws can be directly expressed as algebraic expression. For example, the conservation of potential energy (PE) and kinetic energy (KE) of an idealized pendulum is simply written as ; the energy conservation of inviscid, irrotational flows is , where is the pressure, is the velocity magnitude, is density, is the gravitational acceleration, and is elevation, respectively. Second, the most common physical principles such as conservation laws take the form of differential (sometimes integro-differential) equations , where denotes differential operators. Common examples in computational mechanics include Navier–Stokes equations describing mass and momentum equations of fluid flows and equations of linear elasticity describing the force balance of solids. After linearization and numerical discretization with, e.g., finite-element or finite-volume methods, they can be transformed to linear systems of the form:
where is the matrix resulted from the discretization of differential operator , is the vector of the field to be solved for (e.g., velocity or displacement), and is the forcing vector. Finally, some physics laws take the form of inequality equations, possibly involving differential operators. For example, the second law of thermodynamics states that the total entropy of a closed system can only increase or stays the same, i.e., where , , and denotes entropy increment, heat addition, and surrounding temperature, respectively.
In view of the diverse examples physical laws enumerated above, we write the constraints in a generic form as follows karpatne2017physics :
where denotes an algebraic operator, which can involve nonlinearity and can be obtained from numerical discretization of the integro-differential operators in the original physics constraints. As an example, the linear equation (6) above can be cast into an inequality as , where indicates the Euclidean vector norm.
2.3 Embedding constraints into the generator of GANs
In order to impose deterministic physical constraints in GANs, in this work we add a penalty term to the loss function of the generator based on the generic representation of constraints in Eq. (7). That leads to the following loss function for physics-constrained GANs:
where denotes the penalty coefficient, and is the loss function of the baseline GANs to be constrained. As the physical constraint term is independent of the discriminator , its gradient is non-zero even if the discriminator is close to optimum. Therefore, adding physical constraints to the generator loss function provides an alternative approach for combating the vanishing gradient problem in the training of GANs. Noted that Eq. (8) is generally applicable to different variants of GANs. The general concept of physics-constrained GAN is illustrated in Fig. 2, with the approximate physical constraints explained in Sec. 2.4 in more details.
The physical constraints are imposed on the generator rather than on the discriminator because such constraints helps the weaker party (generator) in the two-players game, which in turn accelerates the convergence to Nash equilibrium between the generator and the discriminator during the training of GANs. In the two-players game, the gradient vanishes for the standard GANs if the discriminator achieves its optimum, and thus the generator would not be able to further update its weights. Recall that the physical constraints embedded into the generator reduce the dimensionality of the weight space during the training by restraining the search to a low-dimensional manifold. That is, the constraints both introduces additional gradient when the discriminator achieves its optimum first and makes the training of the generator easier (by performing optimization within a lower dimensional manifold). Therefore, it is better to embed the physical constraints into the generator, by helping both the generator and the discriminator achieve equilibrium more synchronously.
2.4 Implementation considerations and approximate constraints
In many practical applications, the physical knowledge is commonly limited, i.e., the physical constraints can only be described approximately instead of being specified exactly. This type of physical constraints demands a modified definition of physics-constrained term accordingly. In this work, such approximate constraints are defined as
where is threshold that denotes the extent of uncertainty on the corresponding physical constraints and the tolerance of enforcing such constraints. With such a definition, we enforce an approximate constraint when training the constrained GANs.
In order to facilitate training of the constrained GANs, it is desirable to keep the penalty terms related to the physical constraint of the same order of magnitude as the term associated with the baseline GANs. To this end, the term in the loss function (Eq. (10)) is further replaced by
in our implementation. This modification is proposed to address two practical difficulties. First, in the initial phase of the training, the generated data often have dramatic departure from the physical constraints, leading to very large values of the loss term . Such a disparity in the loss function where one completely dominates the other will lead to difficulty in training. Second, in the final phase of the training, the physical constraint related loss function is much smaller than the standard GANs value term. This makes it difficult to enforce the physical constraint. By using the logarithm of the physical constraint penalty, the two difficulties above can be addressed simultaneously. Note that the constant is added to ensure the strict positivity of the logarithm and to avoid singularity (i.e., ).
The physics-constrained GANs are implemented based on the open-source software library
TensorFlowabadi2016tensorflow . Our implementation is built upon the standard, baseline GANs implementation by Kristiadi et al. kristiadi2019collection . The source code for the physics-constrained GANs and the example cases presented in Section 3 below are publicly available in a GitHub repository zeng2019enforcing .
In this section we use two cases to demonstrate the performance of the proposed method to constrain GANs and to investigate the effects of such enforcing constraints. The constraints are chosen to be representative of, yet much simpler than, the physical constraints in sciences and engineering. Consider the example scenario of using GANs to generate velocity fields of turbulent flows on a mesh of grid points. Apparently, each instantaneous flow field (snapshot) consists of degrees of freedom, which is analogous to a picture of pixels with three channels (corresponding to the three velocity components here) in each pixel. However, the number of intrinsic degrees of freedom is much smaller, because the velocities at the grid points are not completely independent. For example, the divergence of the velocity must be zero everywhere. Moreover, the turbulent kinetic energy distribution at different wavenumbers also determines the smoothness of the field, which further constrains the intrinsic degrees of freedom.
Inspired by the example of turbulent flows above, we propose the following two test cases. The first case has a geometrical constraint. We use GANs to generate circles, each of which is represented by 100 points. As each point is determined by its - and -coordinates, the apparent degree of freedom of each sample (a circle) is 200 for GANs. However, intrinsically each sample has only one degree of freedom, i.e., its radius. The simple example is convenient to visualize and analyze, but it has a strong constraint that is representative of those in many physical applications. The second case has a differential constraint stemming from the divergence-free condition of the velocity field. The training samples consist of velocity fields on a mesh generated from a complex potential parameterized by three parameters. Again, the underlying physics forms a strong constraint, leading to a dramatic difference between the apparent and intrinsic degrees of freedom.
3.1 Geometrical constraint: generating circles
Samples for the first case are two-dimensional circles on the – plane with different radii. The main objectives include (i) illustrating the merit of incorporating known constraints into the generator and (ii) demonstrating the effects of incorporating approximate constraints as proposed in Eq. (10).
3.1.1 Problem description
The training samples are the circles with different radii. The parametric constrained function of circles is given as
where is the angle in polar coordinate, and is the radius. denotes the points on the circle in Cartesian coordinates, where is the index of point of a generated sample. The training samples consist of circles whose radius is drawn from a uniform distribution between 0.4 and 0.8. Some examples are shown in Fig. 3. The goal is to first train GANs with the training dataset of circles and then use the trained GANs to generate samples. We train standard GANs, standard cGANs, and constrained cGANs proposed here with the same settings as presented in Table 1. The approximate constraint is defined as follow:
where indicate Euclidean norm, denotes the specified radius, and denotes the tolerance of the specified constraint. The approximate constraint is schematically illustrated in Fig. 4.
|Loss function for baseline GAN||WGAN-GP|
|Dimension of latent space|
|Activation functions||LeakyReLU and tanh|
|Epochs||up to 200|
Notes: (a) LeakyReLU maas2013rectifier activation function is used for both the generator and discriminator except the last layer. The slope of the LeakyReLU activation function is 0.2 for negative inputs.
(b) The tanh activate function is used in the last layer of the generator to normalize the output of the generator within the range .
In order to evaluate and compare the performances of different GANs, three metrics are used to evaluate the generated samples. The first metric is the magnitude of the original loss function, which shows the convergence of different GANs in training. The second metric is the deviation of generated samples from the circle with an optimal radius obtained from least-square fitting, which is defined as follow:
where indicates absolute value, is the number of points used to represent the circle, and is the radius obtained through least-square fitting of all 100 points in the sample. To assess the overall deviation of the generated samples, we compute the sample mean by averaging that of all generates samples, where indicates ensemble averaging. For standard cGANs and constrained cGANs,
can be regarded as the estimation of auxiliary variable, i.e., the radius in this case. There will be a bias between the estimated radius and the given radius , and we further define the bias as another metric based on the deviation from the specified radius:
Sampling averaging is performed similarly as above to obtain .
3.1.2 Results and discussions
We first set to test the performance of the baseline GANs with exact constraint, including both the standard GANs and standard cGANs. Specifically, 5000 circles are generated by using standard GANs, standard cGANs and constrained cGANs, with some generated examples shown in Fig. 5. By training on perfect circles with different radius, it can be seen in Fig. 5a that even standard GANs can generate circle-like samples, although noticeable noises exist in the generated samples.
Noted that standard GANs can only generate samples randomly, i.e., the radius of the generated sample cannot be controlled in this simple test case. In order to generated samples with specified conditions, cGANs are also studied in this work. The solid circles represent the truth with the corresponding radius as the condition label in Figs. 5b and 5c. It can be seen that the constrained cGAN generates samples that have much better agreement with the truth as compared to the standard cGANs. Moreover, although the noise level of standard cGANs results is slightly lower than that in standard GANs, a clear deviation from perfect circle can still be seen in the generated samples from standard cGAN in Fig.5b. In contrast, the generated samples from the constrained cGANs demonstrate a nearly perfect agreement with the truth, confirming the merit of introducing constraints to GANs.
Furthermore, the evaluation metrics defined above are examined to provide a more quantitative assessment of the constrained versus the baseline GANs. The results are presented in Fig.6. It can be seen in Figs. 6a and 6b that the loss functions of constrained cGAN converge faster, at approximately 125 epochs, while the loss functions of standard GANs and standard cGANs needed about 200 epochs to converge. Also note that the smaller noise level of the generator loss in Fig. 6a suggests that the generated samples are less noisy. One epoch indicates that all training data has been used once, as each step of gradient descent in the training (weight optimization) uses only a small fraction (referred to as “minibatch”) of the training data. Larger epoch numbers correspond to longer training time. It can be seen that constrained GAN and cGAN have faster convergence. This is because incorporating constraints generally restricts the solution space of the optimization problem onto a lower-dimensional manifold.
However, the loss functions do not provide a direct quantification about the quality of generated samples. In order to better compare the quality of generated samples, comparisons of deviation and bias are presented in Figs. 7. It can be seen in Fig. 7a that the deviation (a metric that quantifies the noise level of a generated sample) is slightly smaller for standard cGAN than standard GAN, and the deviation is about one order of magnitude smaller for the constrained cGAN. For the generated samples from cGANs, the biases as defined in Eq. 15 between the standard and constrained cGANs are compared in Fig. 7b. It can be seen that the biases of the generated sample from constrained cGAN are also noticeably smaller, indicating that the radius (the condition label) is better conformed to in the generated samples when the constraint is imposed.
The effects of imposing approximate constraints (see Eq. (13) and Fig. 4) are evaluated by setting positive tolerance parameters . We test two different tolerance values, and , to show that introducing approximate constraint can improve the training performance of GANs. As can be seen in Fig. 8, the generated samples better represent circles than the standard cGAN results in Fig. 5b. The improvements are observed in both tolerance values. In addition, noises in the generated samples increase as the tolerance parameter increase, which can be seen by comparing Fig. 5c (corresponding to ), Fig. 8a (), and Fig. 8c (). Such a trend is expected as the constraints are imposed less accurately with increasing tolerance . Indeed, the standard cGANs (Fig. 5b) corresponds to .
Quantitative metrics are further examined to illustrate the effect of adding approximate constraints, with the results presented in Fig. 9. It can be seen in Figs. 9a and 9b that the loss functions for both the generator and the discriminator demonstrate similar behaviors regardless of whether exact constraint () or approximate constraints () are imposed. This observation suggests that both types of constraints lead to similar improvements in terms of the convergence when training GANs. However, the quality of generated samples become more similar to the standard cGAN results as shown in Figs. 9c and 9d as the tolerance increases.
We further examine the convergence of cGANs with approximate constraints by analyzing samples during different stages (epochs) during the training, which are shown in Fig. 10. Recall that the approximate constraint is only active when generated points are located outside the shaded annulus (see Fig. 4). The generated points scatter randomly at the beginning of training (at epoch 1) as shown in Fig. 10a. During this stage of the training, the approximate constraint term plays an important role as most points fall outside of the shaded annulus. The effect of the approximate constraint term then gradually decay as the training proceeds, with more points falling within the shaded region. At epoch 21, the points start to concentrate toward the region within the annulus region (Fig. 10b). The generated points further concentrate more within the shaded annulus region at epoch 41 (Fig.10c). Eventually, almost all points concentrate within the annulus region, and most of them stick close to the circles of specified radii as indicated by black lines (Fig.10d). At this stage, most generated points fall within the shaded region and the approximate constraint becomes negligible. As a result, the standard cGAN loss becomes dominant again and contributes further optimization of generated points within the shaded region.
3.2 Differential constraint: generating potential flow fields
In this test case, we aim to use GANs to generate velocity fields of fluid flows. This test case is motivated by current applications of GANs to generate inflow boundary conditions for large eddy simulations king2018deep . The quality of inflow turbulence generation plays a critical role on influencing the performance of the LES. In fact, they are probably even more important than the sub-grid scale models. GANs also have potential applications to hybrid LES/RANS simulations, where turbulence fluctuations must be generated in the grey area. A grey area refers to an LES region that is located immediately downstream of a RANS region and does not have sufficient instability to become turbulence. In all such applications, being able to satisfy divergence-free condition in the generated velocity fields is the first metric. Further requirements such as statistical constraints are studied in a companion paper wu2019enforcing .
In this work we use samples of potential flow velocity fields as training data. The mass conservation (divergence-free condition for the velocity fields) is thus built into the data. In the constrained GANs, the approximate constraint based on the divergence-free condition is embedded in the generator as described in Section 2.4. The trained GANs, including both the standard GANs and the constrained GANs, are then evaluated by examining their generated samples in terms of (i) the degree to which they conform to the divergence-free condition and (ii) the smoothness of the generated velocity fields (because the training samples are smooth velocity fields).
3.2.1 Problem description
Two-dimensional velocity field training samples are generated by using the following analytic function currie2002fundamental :
where . The complex potential in Eq. (16) above is a superposition of a uniform flow potential (first term on the RHS) and a source flow potential (the second term on the RHS). The complex potential is parameterized by four parameters: determines the magnitude of the uniform flow velocity, determines the direction of the uniform flow, determines the strength of the source, and dictates the location of source. Without loss of generality, can be specified as the origin , so is no longer a free parameter hereafter. By sampling the parameter vector , we can obtain various velocity fields. Specifically, denoting the real part of as (referred to as the velocity potential), the velocity fields can be obtained by taking the gradient of , i.e., .
The parameters , and are sampled from three independent Gaussian distributions as follows:
denoting the mean and variance of the Gaussian distribution. Each parameter combinationcorresponds to a unique velocity field. The generated velocity are represented on a grid of mesh points in the domain of in the horizontal (-) and vertical (-) directions. The origin of the presented domain is co-located with the source as specified above, i.e., . The discretized velocity fields on the mesh are used as input to the GANs for training. We emphasize that the GANs are not aware of the parameterization of the velocity. All they see are the data in its high-dimensional representation, i.e., velocity field on the mesh. We used 20000 velocity fields as training samples of the GANs, from which four random yet representative samples are shown in Fig. 11.
The approximate constraint ensures the velocity divergence-free condition up to an accuracy , which is enforced via the following penalty term based on Eq. 10:
where controls the tolerance of the divergence-free condition, denotes the Frobenius norm of the velocity divergence field in the entire computational domain. Other set up and parameters used for the GAN is presented in Table 2. Convolutional and deconvolutional neural networks with four layers are used for the discriminator and generator, respectively. The detailed architectures of the two networks are presented in Table 3.
|Loss function for baseline GAN||WGAN-GP|
|Dimension of latent space|
|Learning rate ()||0.005, 0.0005|
Notes: (a) The slope of the LeakyReLU activation function is 0.2 for negative inputs.
|Layer 1||Deconvolutional, C:64, KS: , BN, LeakyReLU||Convolutional, C:64, KS: , BN, LeakyReLU|
|Layer 2||Deconvolutional, C:64, KS: , BN, LeakyReLU||Convolutional, C:64, KS:, BN, LeakyReLU|
|Layer 3||Deconvolutional, C:64, KS:, BN, LeakyReLU||Convolutional, C:64, KS:, BN, LeakyReLU|
|Layer 4||Deconvolutional, C:3, KS:||Convolutional, C:1, KS:|
Detailed network architecture for the generator and discriminator. Acronyms used: C, number of channels; KS, kernel size; BN, batch normalization.
The samples of velocity fields generated by the trained generator are evaluated by the following two metrics:
the degree to which they conform to these constraints, which is quantified by the norm of the velocity divergence in the field, i.e.,
A scalar metric is obtained above by taking the Frobenius norm of the velocity divergence in the entire domain, i.e., by summing the divergence of all cells. As in the geometric constraint case, we compute the sample mean by averaging that of all generates samples, where indicates ensemble averaging. Field contours are also presented to show the spatial distribution of the divergence.
the non-smoothness of a generated velocity field as measured by the gradient of the velocity magnitude:
where is the velocity magnitude, and the Frobenius is similarly defined as above. Similarly, the mean of the above quantity is used to assess the quality of the generated samples.
3.2.2 Results and discussion
The generated velocity fields by the standard GAN and constrained GAN are presented in Fig. 12 at different epochs of training. A learning rate of and a tolerance value of are used. It can be observed that the quality of the velocity fields improves as the training proceeds for both the standard GAN and the constrained GAN. This is evident from the improved smoothness of the velocity magnitude contours downstream of the source. From the visual inspection, one can hardly see any qualitative differences among the samples generated by the standard GAN and the constrained GAN. It is encouraging to see that both seem to generate “realistic” potential flow velocity fields that are qualitatively similar to the training samples (Fig 11a). It is worth emphasizing again that the training data consists of flow fields (velocity and pressure) on a discretized mesh of grid points, which apparently reside in a high dimensional space (, where the number 2 corresponds to the two velocity components). Recall the fact that all fields are controlled by three parameters and are generated through the potential in Eq. (16) is transparent to the GAN, which only sees the mesh-based field instead. From this perspective, it is encouraging that the GANs (either standard or constrained) learned the correct distribution of the data (i.e., a superposition of a uniform flow and a source flow), at least approximately. This observation supports the theoretical statement that the generator is able to replicate all statistics in the training data if global optimum is achieved in the training goodfellow2014generative .
The generated velocity fields are further analyzed by computing the means of the velocity divergence and the non-smoothness in order to assess the generated samples more quantitatively. The results are presented in Fig. 12(b). The quantitative assessment suggests that the constrained GAN outperforms the standard GAN. For both standard GAN and constrained GAN, the velocity divergence decreases as the training proceeds (i.e., as the epoch number increases). However, even after convergence (at about 100 epochs) the standard GAN sample still have divergence almost one order of magnitude larger than that of the training samples (see Fig. 12(b)a). In contrast, the constrained GAN reaches the same levels of divergence as that of the training samples. This is expected since the divergence-free constraint is embedded in the generator loss function. Nevertheless, an unexpected incidental consequence is that the constrained GAN also generated smoother velocity fields than the standard GAN, although still not as smooth as the training velocity fields (see Fig. 12(b)). Finally, in these tests we use two learning rates, and and the results are found to be similar with both settings after convergence. Therefore, the observations above are insensitive to choice of the algorithmic parameter. However, the well-tuned learning rate () does lead to faster convergence for both the standard GAN and the constrained GAN.
In summary, Figs. 12 and 13 suggest while both standard GAN and constrained GAN can capture the overall distribtuion of training data and training convergence is robust to the tuning parameter (learning rate ) for both of GANs. On the other hand, the constrained GAN clearly outperforms standard GAN in the following aepects:
the constrained GAN conforms better to the physical constraint (thanks to the embedded constraint in generator loss function),
the constrained GAN generated smoother (and thus more realistic) velocity fields, even though the smoothness is not explicitly enforced in the loss function, and
the constrained GAN has faster convergence than the standard GAN.
The physics-constrained GAN as proposed in this work aims to satisfy two objectives simultaneously with the two term in the loss function in Eq. (8), i.e., to generate sample that (i) conform to the training data distribution and (ii) satisfy the physical constraint. The relative priority is determined by the relative weight of the terms as controlled by parameter . A suitable parameter must be chosen or satisfy both objectives as outlined above. Otherwise, one term may completely dominate the other and generated sample would either become irrelevant or violate the physical constraints. We found a good rule of thumb is to choose so that the two terms in the loss function (standard GAN loss and physical constraint loss) are of comparable magnitude, which should be dynamically monitored. We have tested three different choices of weight parameters, , , and . The mean divergence at different epochs of training is presented in Fig. 15. The generated samples have a divergence comparable to the training data with the optimal weight . It can be seen that the divergence-free constraint is still somewhat satisfied with a smaller weight . In this case the loss function is dominated by the standard GAN loss and the physical constraint is not effectively enforced. As expected, samples generated by the standard GAN (corresponding to ) depart from the divergence-free condition even further. On the other hand, with a large weight of , the divergence is an order of magnitude smaller even than the training sample. However, generated velocity fields (not shown here) look dramatically different from a potential velocity field, which is clearly undesirable. Therefore, it is essential to choose the weight parameter such that the two terms in the loss function are balanced. In this way, the generated samples will both conform to the data distribution and satisfy the physical constraint.
A similar parametric study is performed on the constraint tolerance with three values (, , and ) investigated. The mean divergence of the velocities corresponding to these parameter choices are presented in Fig. 16. In this case the standard GAN corresponds to an infinitely large tolerance . Both and lead to very well satisfied divergence-free condition. At , the divergence-free condition is not as well satisfied, while the standard GANs generate fields with even larger divergence. This is similar to what was observed for the weight parameter, where a larger weight parameter places more emphasis on the physical constraint, except that in this case it is not possible to dominate the standard GAN loss term with the smallest tolerance . Overall, it can be concluded from Fig. 16 that the approximate constraint can improve the training of GANs and make the generated samples conform to the specified physical constraint.
Significant fluctuations are observed in Fig. 16 in the time history of divergence for the case of . This indicates that the training process is unstable. This can be explained by the discontinuous nature of the loss function Eq. (18): it is zero if the divergence magnitude and otherwise it is . In other words, it is active only when the divergence exceeds . Given that the mean divergence of the training velocity fields is 0.64 (indicated as horizontal black line in Fig. 16) due to the numerical discretization error, the tolerance value is a rather soft constraint. Consequently, the loss function is active very intermittently, which causes discontinuous changes of the loss function and its derivatives and thus it leads to instability. In contrast, in other cases, e.g., (exact constraint), (approximate constraint), or (the standard GAN), the constraint is either almost always active or almost always inactive, and thus they are free from such intermittency-induced instabilities. It can be seen that the regularization terms in loss functions can influence the training of neural networks in unexpected ways, and further research on enforcing constraints in GANs is warranted.
In this work, we proposed a general approach of embedding constraints in GANs for emulating physical systems. The physical constraints are embedded to the loss function of the generator to help the weaker party in the two-players game, which help the training to achieve equilibrium faster. This is motivated by the observation that the constraints reduce the effective dimension of the search space for the weight optimization and thus accelerate the training convergence. Moreover, we extend the proposed framework of constrained GANs to incorporate approximate constraints. This is motivated by the fact that many physical constraints are not known exactly in practical applications. Two simple yet representative test cases are used to demonstrate the merits of physics-constrained GANs: (i) generating circle on a plane, which features geometrical constraints and (ii) generating potential flow velocity fields, which have differential constraints. We show that enforcing constraints in the generator both accelerates the convergence of GANs training and improves the quality of generated samples. The proposed method is generally applicable to different physical systems and will facilitate the application of using GANs to emulate complex physical systems.
The computational resources used for this project were provided by the Advanced Research Computing (ARC) of Virginia Tech, which is gratefully acknowledged.
N. M. Nasrabadi, Pattern recognition and machine learning, Journal of Electronic Imaging 16 (4) (2007) 049901.
- (3) Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.
- (4) I. Goodfellow, Y. Bengio, A. Courville, Y. Bengio, Deep learning, Vol. 1, MIT press Cambridge, 2016.
- (5) R. M. Neal, Bayesian learning for neural networks, Vol. 118, Springer Science & Business Media, 2012.
- (6) J.-X. Wang, J.-L. Wu, H. Xiao, Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data, Physical Review Fluids 2 (3) (2017) 034603.
- (7) J.-L. Wu, H. Xiao, E. Paterson, Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework, Physical Review Fluids 3 (7) (2018) 074602.
- (8) J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155–166.
- (9) N. Geneva, N. Zabaras, Quantifying model form uncertainty in Reynolds-averaged turbulence models with bayesian deep neural networks, Journal of Computational Physics 383 (2019) 125–147.
- (10) M. Ma, J. Lu, G. Tryggvason, Using statistical learning to close two-fluid multiphase flow equations for a simple bubbly system, Physics of Fluids 27 (9) (2015) 092101.
- (11) M. Ma, J. Lu, G. Tryggvason, Using statistical learning to close two-fluid multiphase flow equations for bubbly flows in vertical channels, International Journal of Multiphase Flow 85 (2016) 336–347.
- (12) J.-L. Wu, X. Yin, H. Xiao, Seeing permeability from images: fast prediction with convolutional neural networks, Science Bulletin 63 (18) (2018) 1215–1222.
- (13) S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932–3937.
- (14) S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3 (4) (2017) e1602614.
- (15) Z. Long, Y. Lu, X. Ma, B. Dong, PDE-Net: Learning PDEs from data, arXiv preprint arXiv:1710.09668.
- (16) Z. Long, Y. Lu, B. Dong, PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network, Journal of Computational Physics (2019) 108925.
- (17) J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics 375 (2018) 1339–1364.
- (18) J. He, J. Xu, MgNet: A unified framework of multigrid and convolutional neural network, Science China Mathematics (2019) 1–24.
- (19) J. Berg, K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing 317 (2018) 28–41.
- (20) R. K. Tripathy, I. Bilionis, Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification, Journal of Computational Physics 375 (2018) 565–588.
- (21) M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
- (22) M. Raissi, Z. Wang, M. S. Triantafyllou, G. E. Karniadakis, Deep learning of vortex-induced vibrations, Journal of Fluid Mechanics 861 (2019) 119–137.
- (23) M. Raissi, P. Perdikaris, G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236.
- (24) D. Zhang, L. Lu, L. Guo, G. E. Karniadakis, Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems, Journal of Computational Physics 397 (2019) 108850.
- (25) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
- (26) L. Mosser, O. Dubrule, M. J. Blunt, Reconstruction of three-dimensional porous media using generative adversarial neural networks, Physical Review E 96 (4) (2017) 043309.
- (27) L. Mosser, O. Dubrule, M. J. Blunt, Stochastic reconstruction of an oolitic limestone by generative adversarial networks, Transport in Porous Media 125 (1) (2018) 81–103.
- (28) R. King, P. Graf, M. Chertkov, Creating turbulent flow realizations with generative adversarial networks, in: APS Meeting Abstracts, 2017.
Y. Xie, E. Franz, M. Chu, N. Thuerey, tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow, ACM Transactions on Graphics (TOG) 37 (4) (2018) 95.
P. Stinis, T. Hagge, A. M. Tartakovsky, E. Yeung, Enforcing constraints for interpolation and extrapolation in generative adversarial networks, Journal of Computational Physics 397 (2019) 108844.
- (31) A. B. Farimani, J. Gomes, V. S. Pande, Deep learning the physics of transport phenomena, arXiv preprint arXiv:1709.02432.
- (32) L. Yang, D. Zhang, G. E. Karniadakis, Physics-informed generative adversarial networks for stochastic differential equations, arXiv preprint arXiv:1811.02033.
- (33) M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, arXiv preprint arXiv:1701.07875.
- (34) A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434.
- (35) M. Lin, Softmax GAN, arXiv preprint arXiv:1704.06191.
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, S. P. Smolley, Least squares generative adversarial networks, in: 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, pp. 2813–2821.
- (37) S. Nowozin, B. Cseke, R. Tomioka, f-GAN: Training generative neural samplers using variational divergence minimization, in: Advances in Neural Information Processing Systems, 2016, pp. 271–279.
- (38) I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. C. Courville, Improved training of Wasserstein GANs, in: Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
- (39) S. B. Pope, Turbulent Flows, Cambridge University Press, Cambridge, 2000.
- (40) J.-L. Wu, K. Kashinath, A. Albert, D. Chirila, Prabhat, H. Xiao, Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems, arXiv preprint arXiv:1905.06841.
- (41) M. Mirza, S. Osindero, Conditional generative adversarial nets, arXiv preprint arXiv:1411.1784.
- (42) A. Karpatne, W. Watkins, J. Read, V. Kumar, Physics-guided neural networks (PGNN): An application in lake temperature modeling, arXiv preprint arXiv:1710.11431.
- (43) M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., Tensorflow: Large-scale machine learning on heterogeneous distributed systems, arXiv preprint arXiv:1603.04467.
- (44) A. Kristiadi, A collection of generative models with GAN, https://github.com/wiseodd/generative-models/tree/master/GAN (2019).
- (45) Y. Zeng, J.-L. Wu, H. Xiao, Source codes for enforcing deterministic constraints on generative adversarial networks for emulating physical systems, https://github.com/zengyang7/ConstrainedGANs.git (2019).
- (46) A. L. Maas, A. Y. Hannun, A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proceedings of ICML, Vol. 30, 2013, p. 3.
- (47) R. King, O. Hennigh, A. Mohan, M. Chertkov, From deep to physics-informed learning of turbulence: Diagnostics, arXiv preprint arXiv:1810.07785.
- (48) I. G. Currie, Fundamental Mechanics of Fluids, CRC Press, 2002.