Informing decision-makers about uncertainties in local climate impacts requires ensemble models. Ensemble models solve the climate model for a distribution of parameters and initial conditions to generate a distribution of local climate impacts (Gneiting_2005). The physics of most oceanic processes can be modeled well at high resolution, but generating large ensembles is computationally too expensive: high-resolution ocean models resolve the ocean at fine horizontal resolution and require multiple hours or days per run on a supercomputer (Fuhrer_2018). Recent works leverage physics-informed deep learning to build “surrogate models”, i.e., computationally lightweight models that interpolate expensive simulations of ocean, climate, or weather models (Rasp_2018; Brenowitz_2020; Yuval_2020b; Kurth_2018; Runge_2019). These lightweight models substantially accelerate the simulations (Yuval_2020b; Rackauckas_2020). Building lightweight surrogate models could therefore enable the computation of large ensembles.
The incorporation of domain knowledge from the physical sciences into deep learning has recently achieved significant success (Raissi_2019; Brunton_2019; Rasp_2018). Within physics-informed deep learning, one could adapt the neural network architecture to incorporate physics as: inputs (Reichstein_2019), training loss (Raissi_2019), the learned representation (Lusch_2018; Greydanus_2019; Bau_2020), hard output constraints (Mohan_2020), or evaluation function (Lutjens_2021; Lesort_2019). Alternatively, one could embed the neural network in differential equations (Rackauckas_2020), for example, as: parameters (Garcia_2006; Raissi_2019), dynamics (Chen_2018), residual (Karpatne_2017; Yuval_2021), differential operator (Raissi_2018; Long_2019), or solution (Raissi_2019). We embed a neural network architecture in the solution, which will enable fast propagation of parameter uncertainties and sensitivity analysis.
While most work in physics-informed deep learning has focused on deterministic methods, recent methods explore the extension to stochastic differential equations (Zhang_2019; Dandekar_2021; Yang_2020a; Zhu_2019; Yang_2020b). In particular, Zhang_2019 achieves lightweight surrogate models for parameter estimation and uncertainty propagation by combining physics-informed neural networks (Raissi_2019) with arbitrary polynomial chaos (Wan_2006). We use the simpler polynomial chaos expansion (PCE) (Smith_2013) instead of the arbitrary polynomial chaos expansion, and focus on the task of uncertainty propagation from (Zhang_2019). Further, we are the first to apply the combination of polynomial chaos and neural networks to the stochastic local advection-diffusion equation (ADE). Methods of uncertainty quantification have been demonstrated extensively on the local ADE (Smith_2013); the advantage of neural networks is their ability to estimate PCE coefficients in high-dimensional spaces. The local ADE, also called the horizontally averaged Boussinesq equation, is more challenging than the 1D stochastic diffusion equation from (Zhang_2019) and illustrates the application to ocean modeling.
In summary, our work contributes PCE-PINNs, the first method for fast propagation of parameter uncertainties with physics-informed neural networks in ocean modeling.
We define the initial value problem of solving a stochastic partial differential equation over a spatial domain, a temporal domain, and a random space, with a domain boundary, a nonlinear operator, and Dirichlet boundary conditions.
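One common way to write such a problem is sketched below. All symbols are assumptions chosen for illustration: $u$ for the solution, $k$ for the stochastic parameter, $\mathcal{N}_{x,t}$ for the nonlinear operator, $D_x$ and $D_t$ for the spatial and temporal domains, $\Omega$ for the random space, $\partial D_x$ for the domain boundary, and $g$ for the prescribed Dirichlet boundary values.

```latex
\begin{aligned}
\mathcal{N}_{x,t}\left[ u(x,t;\omega);\, k(x,t;\omega) \right] &= 0,
  && x \in D_x,\ t \in D_t,\ \omega \in \Omega, \\
u(x,t;\omega) &= g(x,t),
  && x \in \partial D_x,\ t \in D_t .
\end{aligned}
```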
2.1 Defining the local advection-diffusion equation
We are given the local advection-diffusion equation, which models the temperature distribution in a vertical ocean column over time. The equation relates the temperature to height, time, a source term, noise, a stochastic diffusivity, and a constant vertical velocity.
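For reference, a local advection-diffusion equation built from the quantities listed above can be sketched as follows. The symbols are assumptions made for this sketch: $T$ for temperature, $z$ for height, $t$ for time, $w$ for the constant vertical velocity, $k$ for the stochastic diffusivity, $f$ for the source, $\epsilon$ for the noise, and $\omega$ for a point in the random space. Placing the source and noise on the right-hand side, and letting the diffusivity depend on height only, are also assumptions.

```latex
\frac{\partial T(z,t;\omega)}{\partial t}
+ w \, \frac{\partial T(z,t;\omega)}{\partial z}
- \frac{\partial}{\partial z}\left( k(z;\omega) \, \frac{\partial T(z,t;\omega)}{\partial z} \right)
= f(z,t) + \epsilon
```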
We assume that the distribution over the diffusivity is known, for example, through data assimilation or Bayesian parameter estimation. Specifically, the diffusivity is assumed to follow an exponential Gaussian process (GP). The GP is defined by its mean, correlation length, variance, exponent, and a covariance kernel that is similar to the non-smooth Ornstein-Uhlenbeck kernel.
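A minimal sketch of how diffusivity samples could be drawn from such a process is given below. It assumes that the diffusivity is the exponential of a GP and that the kernel has the form `variance * exp(-(|z1 - z2| / corr_length) ** exponent)`, which reduces to the Ornstein-Uhlenbeck kernel for `exponent=1`. The function name, the parameter values, and the grid are hypothetical.

```python
import numpy as np

def sample_log_gp_diffusivity(z, mean=0.0, variance=1.0, corr_length=0.3,
                              exponent=1.0, n_samples=5, seed=0):
    """Draw samples k(z) = exp(Y(z)), where Y is a Gaussian process.

    ASSUMPTION: the covariance kernel is
        C(z1, z2) = variance * exp(-(|z1 - z2| / corr_length) ** exponent),
    which is the Ornstein-Uhlenbeck kernel for exponent = 1.
    """
    z = np.asarray(z, dtype=float)
    dist = np.abs(z[:, None] - z[None, :])
    cov = variance * np.exp(-(dist / corr_length) ** exponent)
    # A small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(z.size))
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((z.size, n_samples))
    gp = mean + chol @ xi           # shape (n_z, n_samples)
    return np.exp(gp).T             # shape (n_samples, n_z), strictly positive

# Hypothetical vertical grid with 32 points.
z_grid = np.linspace(0.0, 1.0, 32)
k_samples = sample_log_gp_diffusivity(z_grid)
```

Taking the exponential guarantees positive diffusivity values, which is one plausible reason for using an exponential GP here.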
2.2 Polynomial chaos expansion in neural networks
In practice, computing ensembles of differential equations such as Equation 1 for a distribution of parameter inputs is often computationally prohibitive. Hence, we aim to learn a copy, or fast surrogate model, of the differential equation solver, assuming a known parameter distribution and a set of ground-truth solutions from the solver.
The polynomial chaos expansion (PCE) approximates arbitrary stochastic functions by a linear combination of polynomials (Smith_2013). The polynomials capture the stochasticity by applying a nonlinear function to typically simple distributions, and the coefficients capture the spatio-temporal dependencies (Smith_2013). PCE has been widely adopted in the computational fluid dynamics (CFD) community because it offers fast inference time, analytical statistical summaries, such as the mean and standard deviation, and the theoretical guarantees of polynomials (Smith_2013). However, the computation of the PCE coefficients is analytically complex, because the computation differs among problems, and computationally expensive, because the computation involves integrals over the random space (Smith_2013). Hence, we leverage neural networks to learn the PCE coefficients directly from observations of the solution.
The polynomial chaos expansion then approximates the solution as a sum, over a set of multi-indices, of the NN-based PCE coefficients multiplied by the corresponding polynomials. Each multi-index is a vector of polynomial degrees, with one entry per stochastic dimension, and is bounded by the maximum polynomial degree. The polynomials are defined by a set of multivariate orthogonal Gaussian-Hermite polynomials, each of which is a product of one-term (monic) univariate polynomials of the given polynomial degree. We choose the random vector of each stochastic dimension to be Gaussian and use the associated probabilists’ Hermite polynomials. The polynomial degrees are given by the total-degree multi-index set. The number of terms is determined by the maximum polynomial degree and the number of stochastic dimensions.
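The construction of the basis can be illustrated with a short sketch. It builds a total-degree multi-index set, evaluates products of probabilists' Hermite polynomials with NumPy's `hermeval`, and sums coefficients times polynomials. The function names and the toy sizes (two stochastic dimensions, maximum degree two) are hypothetical. For a total-degree set with maximum degree `n` in `l` dimensions, the number of terms is the binomial coefficient `(n + l)! / (n! l!)`.

```python
import itertools
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def total_degree_multi_indices(n_dims, max_degree):
    """All multi-indices alpha with alpha_1 + ... + alpha_l <= max_degree."""
    return [alpha
            for alpha in itertools.product(range(max_degree + 1), repeat=n_dims)
            if sum(alpha) <= max_degree]

def hermite_basis(xi, multi_indices):
    """Evaluate every multivariate polynomial at one random vector xi.

    Each polynomial is a product of univariate probabilists' Hermite
    polynomials He_{alpha_i}(xi_i).
    """
    xi = np.asarray(xi, dtype=float)
    values = []
    for alpha in multi_indices:
        term = 1.0
        for xi_i, degree in zip(xi, alpha):
            coeffs = np.zeros(degree + 1)
            coeffs[degree] = 1.0        # selects He_degree in the series
            term *= hermeval(xi_i, coeffs)
        values.append(term)
    return np.array(values)

def pce_evaluate(coefficients, xi, multi_indices):
    """Sum over alpha of C_alpha * Psi_alpha(xi).

    coefficients has shape (..., n_terms), e.g. (n_t, n_z, n_terms).
    """
    return coefficients @ hermite_basis(xi, multi_indices)

# Hypothetical toy setting: 2 stochastic dimensions, maximum degree 2.
indices = total_degree_multi_indices(n_dims=2, max_degree=2)
n_terms = len(indices)   # matches math.comb(2 + 2, 2)
```

In this sketch the coefficients would come from the neural network, while the polynomial values depend only on the sampled random vector.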
The neural network (NN) then jointly approximates all PCE coefficients. The NN is trained to approximate the PCE coefficients while only using the limited number of measurements of the solution as target data. The mean-squared error (MSE) loss function is defined per episode over one batch, where the realizations of the random vectors are shared between the target and the approximated solution. The batch size is chosen to fit one solution sample on the full t-grid and z-grid.
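A minimal sketch of such a loss for one solution sample is shown below. It assumes that the network outputs one coefficient vector per grid point and that the polynomial values are evaluated at the same random realization that produced the target. The function name, array shapes, and toy data are hypothetical.

```python
import numpy as np

def pce_mse_loss(coefficients, basis_values, target):
    """MSE between a PCE prediction and one target solution sample.

    coefficients: (n_t, n_z, n_terms) network outputs on the space-time grid.
    basis_values: (n_terms,) polynomial values at the same random
        realization that generated `target`.
    target: (n_t, n_z) solver output for that realization.
    """
    prediction = coefficients @ basis_values       # shape (n_t, n_z)
    return float(np.mean((prediction - target) ** 2))

# Hypothetical toy data: 4 time steps, 3 heights, 2 PCE terms.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((4, 3, 2))
basis = np.array([1.0, 0.5])
target = coeffs @ basis

loss_exact = pce_mse_loss(coeffs, basis, target)           # zero by construction
loss_shifted = pce_mse_loss(coeffs, basis, target + 1.0)   # close to one
```

Sharing the random realization between prediction and target is what lets a pointwise MSE supervise the coefficients, even though the coefficients themselves are never observed.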
The PCE-PINNs can approximately match the mean (line) and standard deviation (shade) of the target solution in Fig. 1(a). Importantly, the approximated standard deviation also captures the growing trend towards the center location and with growing time.
Figure 2 shows that the PCE-PINNs in Fig. 1(b) can successfully approximate the mean and standard deviation of the target solution in Fig. 1(a). Importantly, the explicit formulation as a polynomial chaos expansion allows us to compute the mean and standard deviation without any sampling, as a function of the PCE coefficients. We note that the PCE-PINN-approximated standard deviation captures the growing trend towards the center location and with increasing time. Quantitative analysis shows that the mean error, summed over the full spatio-temporal domain, is low.
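The sampling-free statistics can be sketched as follows. The sketch assumes unnormalized probabilists' Hermite polynomials of independent standard Gaussian variables, for which the mean is the coefficient of the constant polynomial and the variance is the sum of squared remaining coefficients, each weighted by the product of factorials of its multi-index. The function name and the toy coefficients are hypothetical.

```python
import math
import numpy as np

def pce_mean_and_std(coefficients, multi_indices):
    """Mean and standard deviation of a Hermite PCE without sampling.

    ASSUMPTION: unnormalized probabilists' Hermite polynomials of
    independent standard Gaussians, so E[Psi_alpha^2] = prod_i alpha_i!.
    coefficients: (..., n_terms); multi_indices: one tuple per term.
    """
    coefficients = np.asarray(coefficients, dtype=float)
    norms = np.array([math.prod(math.factorial(a) for a in alpha)
                      for alpha in multi_indices], dtype=float)
    is_constant = np.array([sum(alpha) == 0 for alpha in multi_indices])
    mean = coefficients[..., is_constant].sum(axis=-1)
    variance = (coefficients[..., ~is_constant] ** 2
                * norms[~is_constant]).sum(axis=-1)
    return mean, np.sqrt(variance)

# One stochastic dimension, degrees 0, 1, 2 with coefficients 1, 2, 3.
mean, std = pce_mean_and_std([1.0, 2.0, 3.0], [(0,), (1,), (2,)])
```

Because the leading axes of `coefficients` are preserved, the same call would return mean and standard deviation fields over a full space-time grid.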
We observe that the PCE-PINNs slightly overestimate the uncertainty of the initial state (blue), have a marginal positive bias towards the right boundary, and have lower curvature during the initial steps. Future work will explore stronger constraints on satisfying the underlying physics equations and a broader choice of neural network hyperparameters to further reduce the error.
We used a fully-connected neural network with ReLU activation, specified by its number of layers and units per layer. The network was trained with the Adam optimizer, given a learning rate and a number of epochs. The target data was generated on a grid of temporal and spatial points from samples of the solved differential equation. The chosen maximum polynomial degree fixes the number of PCE coefficients.
Leveraging neural network-based surrogate models can reduce not only computational complexity but also storage complexity, because only the network weights, stored as floats, need to be kept.
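The storage footprint of a fully-connected network follows directly from its layer sizes, as the sketch below illustrates. The layer sizes and the 32-bit float assumption are hypothetical and are not the values used in the experiments.

```python
def mlp_weight_count(layer_sizes):
    """Number of weights and biases in a fully-connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def storage_bytes(n_weights, bytes_per_float=4):
    """Storage for the weights, assuming 32-bit floats by default."""
    return n_weights * bytes_per_float

# Hypothetical sizes: inputs (z, t), two hidden layers, 10 PCE coefficients.
n_weights = mlp_weight_count([2, 64, 64, 10])
n_bytes = storage_bytes(n_weights)
```

The weight count does not depend on the number of ensemble members, whereas storing every solved sample on the full grid does.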
4 Discussion and future work
We have demonstrated a novel technique for fast uncertainty propagation with physics-informed neural networks on the local advection-diffusion equation. PCE-PINNs use neural networks to learn the spatio-temporal coefficients of the polynomial chaos expansion, reducing the analytical and computational complexity of previous methods. Our method learned a lightweight surrogate model of the local advection-diffusion equation and successfully quantified the output uncertainties, given known parameter uncertainties.
We note that our results show room for improvement. Future work will explore stronger constraints on satisfying the physical laws in Equation 1, e.g., via physics-based regularization terms (Raissi_2018) or hard physics constraints (Beucler_2021). Further, computational resources were limited during this experiment, and future work will further optimize the choice of hyperparameters for the neural network. Lastly, the proposed approach requires computing a training dataset of solved differential equations for a set of parameter samples, which can quickly become computationally expensive. Future work will explore self-supervised learning approaches to enable learned surrogate models without the use of expensive training data.
The authors greatly appreciate the discussions with Chris Hill, Hannah Munguia-Flores, Brandon Leshchinskiy, Nicholas Mehrle, Yanni Yuval, Paul O’Gorman, and Youssef Marzouk.
Research was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.