Training neural networks augmented with additional physical information has been shown to improve their generalization capabilities, particularly when predicting physical processes. In the Physics Informed Neural Network (PINN) framework (Karpatne et al., 2017, 2018; Tartakovsky et al., 2018; Raissi et al., 2019; Wang et al., 2020), the neural network prediction is defined to be an explicit function of space and time . Furthermore, calculations of respective derivatives, such as and
, are required for formulating the loss function. However, when the available training data is concentrated on a single locationor time , the approximation of the derivatives and in current techniques deteriorates due to (a) insufficient information provided by the data and (b) the lack of structural explainability of the framework itself. To address these issues from a structural point of view, several works have been conducted in the literature recently. One architecture, namely the Distributed Spatiotemporal Artificial Neural Network (DISTANA, Karlbauer et al., 2019), uses translational invariant Prediction Kernels (PKs) and Transition Kernels (TKs) to process the temporal and spatial correlation of the data, respectively. Another method, called the Universal Differential Equation method (UDE, Rackauckas et al., 2020)
, combines Convolutional Neural Networks(CNNs, LeCun et al., 1999)
with Neural Ordinary Differential Equations(NODEs, Chen et al., 2018), for learning spatiotemporal data. Despite promising results shown by these methods, they still suffer from unreliable flux handling (i.e. the physical fluxes are not guaranteed to be conservative). Consequently, the methods mentioned above lack means to properly treat different types of boundary conditions.
To handle fluxes more robustly and improve generalization within a physical domain, we propose a Finite Volume Neural Network (FINN). The FINN method is a hybrid model, which capitalizes on the structural knowledge of the well-known Finite Volume Method (FVM, Moukalled et al., 2016), and the flexibility as well as the learning abilities of Artificial Neural Networks (ANNs), more specifically NODEs. As a consequence, the FINN structure can properly treat different types of boundary conditions and ensures conservation of the quantities of interest. Moreover, we show that FINN is able to reconstruct the full field solution (for all and ) even when trained with only partial information (e.g. at a single point or ). Additionally, the structure of FINN facilitates learning and extracting constitutive relationships and/or reaction terms, and, consequently, shows exceptional generalization capabilities and develops explainable knowledge structures.
In this work, we focus on modeling spatiotemporal physical processes, namely processes that scientists try to model with Partial Differential Equations (PDEs), such as diffusion-type problems. They can be generally written mathematically as follows:
where is the quantity of interest, is time, is the diffusion coefficient and is the source/sink term. Usually, the FVM discretizes Equation 1 implicitly in space and explicitly in time, leading to a simplified definition:
where denotes at control volume and time step . In other words, the time derivative of depends on the current value of , the values of at the neighboring control volumes, and time. For brevity, we drop the time index in later definitions.
In FINN, we introduce Flux Kernels , which take the input of , , and to approximate the divergence part (first term on the right hand side) in Equation 1 for each control volume :
The Flux Kernels consist of subkernels that calculate fluxes at the boundary surfaces between control volume and its neighboring control volumes (see Figure 1), being learned by and , which are subcomponents of the function with parameters . The function in each is equivalent to a linear layer that takes and one of its neighbors (i.e. or ) as inputs and learns the numerical FVM stencil with its weights that should amount to , which corresponds to and (i.e. simple difference between neighboring control volumes) in ideal one-dimensional diffusion problems. If the diffusion coefficient depends on according to a hidden constitutive relationship, the function in each learns and approximates this function . Otherwise, will be only a scalar value , which can also be set as a learnable parameter. Next, the output of is multiplied with the output of to obtain the flux approximation at each boundary. When the fluxes at all boundary surfaces are integrated in each control volume , the summation of the numerical stencil will lead to the classical one-dimensional numerical Laplacian with corresponding to if Equation 1 is true.
This structure of the Flux Kernel enables straightforward handling of different types of boundary conditions. With a Dirichlet boundary condition , we can set either or at the corresponding domain boundary. With a Neumann boundary condition , we can easily set the output of at the corresponding domain boundary to be equal to . With a Cauchy boundary condition, we can calculate the derivative approximation at the corresponding domain boundary and set it as either or .
The State Kernels then take the output of the Flux Kernels and approximate while also learning the source/sink term as a function of (whenever necessary), which is also learnable by the State Kernels. Formally, each State Kernel can be written as an approximation of the time derivative in Equation 1 for each control volume :
where is parameterized by . The outputs of State Kernels are then integrated by an ODE solver to solve for , which will be used recursively for calculation of the subsequent time steps. The benefits of using an ODE solver are twofold: (a) it allows for adaptive time stepping, which in turn leads to better numerical stability in explicit schemes, and (b) it enables handling of unevenly spaced time series, which is very common in real observation data.
One of the benefits of State Kernels is to enable separate calculations for different quantities of interest, while the divergence (flux) can be calculated based on the same variable. This ensures that each quantity of interest follows its own conservation law. In short, FINN consists of Flux Kernels that handle the spatial correlation, and State Kernels that handle the temporal correlation of the data.
For demonstration purposes, we choose an application with a subsurface contaminant transport problem. We assess the performance of FINN not only using synthetic simulation data, but also real experimental data. The contaminant transport is characterized by the non-linear diffusion-sorption equation in a fluid-filled homogeneous porous medium:
where denotes the concentration of trichloroethylene (TCE) dissolved in the fluid, denotes the effective TCE diffusion coefficient, and denotes the retardation factor (representing sorption), which is a function of . Equation 5 is subject to a Dirichlet boundary condition at and a Cauchy boundary condition at (i.e. the top and bottom ends of the field) and an initial condition . Using the definition of retardation factors, we can also calculate the total TCE concentration (both in the fluid and adsorbed in the solid)
where is the porosity of the medium (i.e. the core samples). More detailed information on the experiment and its numerical simulation can be found elsewhere (Nowak & Guthke, 2016).
4 Results and Discussion
As the first step, we train and test FINN using numerically generated synthetic datasets. Both train and test datasets are discretized into control volumes and time steps. We train FINN using the whole spatial domain, with time steps (i.e. days) of the train dataset. Here, FINN receives only the initial condition, i.e. initial values of and , along with the Dirichlet boundary condition value at the top boundary. The bottom boundary is subject to a Cauchy boundary condition, and therefore is solution dependent. FINN is then trained in a closed loop system, using predicted values of and at time step as input for the calculation at time step .
For this synthetic data application, we set and to be learnable. Additionally, for the calculation of , we set to be a feedforward neural network that approximates in Equation 5, allowing us to extract information about the learned retardation factor as a function of the contaminant concentration
. This neural network is constructed with 3 hidden layers, each containing 15 hidden nodes. Each hidden layer uses hyperbolic tangent as the activation function, and the output layer uses the sigmoid activation function, multiplied with a learnable scaling factor to ensure that the approximation ofis strictly positive.
To test the generalization capability, we use the trained network to extrapolate until time step ( days). Additionally, we test the trained FINN prediction against a completely unseen test dataset obtained at different boundary conditions. The Dirichlet boundary condition values at the top boundary are kg/m3 and kg/m3 for the train and test dataset, respectively. We also compare the train and test Mean Squared Error (MSE) value with other known methods, such as TCN (Kalchbrenner et al., 2016), ConvLSTM (Shi et al., 2015), and DISTANA (Karlbauer et al., 2019).
|Method||Training||Extrapolated training||Test unseen||Parameters|
The comparison in Table 1 shows that FINN appropriately generalizes when tested against a different boundary condition. Even though all methods have comparable performance during training, the predictions of TCN, ConvLSTM, and DISTANA deteriorate when the knowledge gained from the training data needs to be extrapolated and to predict unseen test data. This appears to be caused by the fact that they lack proper boundary condition handling. More detailed information can be found in the appendices regarding the benchmark test dataset (Appendix A), the training and test results (Appendix B), as well as the model setup (Appendix C).
As the second step, we apply FINN to real experimental data. The experimental data were collected from three different core samples, namely core samples #1, #2, and #2B (see Appendix D). In this setup, we train FINN using data that originate exclusively from core sample #2. For this experimental data application, FINN is set up the same way as the synthetic data setup. For the dissolved concentration calculation, we also set to be a feedforward neural network that takes as the input to learn the retardation factor function. However, we assume that we know the diffusion coefficient values for all core samples. The main setup difference lies in the available data used to train FINN. More specifically, we only use the breakthrough curve data of in the tailwater. This means that the data provides partial information and constrains the FINN prediction only at .
The results show that the trained FINN has higher accuracy with compared to the calibrated (least squares) numerical PDE model with . Further, we test and validate the trained model using different core samples (i.e. #1 and #2B), which originate from the same geographical area and therefore can be assumed to have similar soil parameters. In Figure 2, we show that the predictions match the experimental data with reasonable accuracy. We also compare the predictions with the output of the numerical model, with the retardation factor also calibrated using the data from the same core sample #2. The plots show that FINN’s prediction accuracy is comparable to the numerical model, even beating it in some instances. For core sample #1, the of FINN prediction is , while the numerical model underestimates the breakthrough curve with .
Specifically for core sample #2B, which is significantly longer than the other samples, to model the diffusion-sorption process, we can assume a zero-flux Neumann boundary condition at the bottom of the core. As a consequence, there are no breakthrough curve data available anymore. Instead, we compare the prediction against the total concentration profile data obtained from a destructive sampling (i.e. core slicing) at the end of the experiment. Here, FINN produces predictions with , while the numerical model overestimates the total concentration profile with . The results show that, even when applied to a different type of boundary condition, FINN’s predictions remain accurate. Moreover, we can extract the retardation factor as a function of using FINN, which is plotted in Figure 2, thus explaining a soil property.
5 Conclusion and Future Work
We have shown that including physical knowledge in the form of the Finite Volume structure produces excellent generalization capabilities and improves the explainability of the applied neural network structure. Our novel Finite Volume Neural Network (FINN) permits proper calculations for conservative fluxes and for different types of boundary condition. FINN outperforms other neural network methods for spatiotemporal modeling such as Temporal Convolutional Network, Convolutional LSTM, and DISTANA, especially when tested with different boundary conditions. Moreover, we show that FINN is suitable for experimental data processing, rendering it relevant as a data-driven modeling tool.
In this work, we assumed spatial homogeneity for the soil in the simulation domain due to the small size of the actual experimental domain. For real field applications, where the scale is significantly larger, the homogeneity assumption might not hold. We are interested in enhancing FINN to handle spatially heterogeneous parameters. One way to achieve this is by defining a spatially correlated parameter to model a learnable diffusion coefficient in a spatially heterogeneous system akin to Karlbauer et al. (2020)
. Additionally, we are also interested in quantifying uncertainties of our model by implementing a Bayesian Neural Network. A concrete application example in the contaminant transport modeling domain is to produce confidence intervals for both the concentration function and the learned retardation factor function. Overall, recent development in fusing numerical methods with deep learning to aid physical processes simulation shows promising results to keep continuing the trend in this direction. It is very exciting to see how far we can push the boundaries between numerical methods and deep learning and to see the benefit when combining both approaches.
This work is funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2075 – 390740016 as well as EXC 2064 – 390727645. We acknowledge the support by the Stuttgart Center for Simulation Science (SimTech). Moreover, we thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Matthias Karlbauer. Codes and data that are used for this paper can be found in the repository https://github.com/timothypraditia/finn.
- Chen et al. (2018) Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31, 2018.
- Kalchbrenner et al. (2016) Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. Neural machine translation in linear time. arXiv preprint, 2016. URL https://arxiv.org/abs/1610.10099.
- Karlbauer et al. (2019) Matthias Karlbauer, Sebastian Otte, Hendrik P. A. Lensch, Thomas Scholten, Volker Wulfmeyer, and Martin V. Butz. A distributed neural network architecture for robust non-linear spatio-temporal prediction. arXiv preprint, 2019. URL https://arxiv.org/abs/1912.11141.
- Karlbauer et al. (2020) Matthias Karlbauer, Tobias Menge, Sebastian Otte, Hendrik P. A. Lensch, Thomas Scholten, Volker Wulfmeyer, and Martin V. Butz. Hidden latent state inference in a spatio-temporal generative model, 2020.
- Karpatne et al. (2017) A. Karpatne, G. Atluri, J.H. Faghmous, M. Steinbach, A. Banerjee, A.R. Ganguly, S. Shekhar, N.F. Samatova, and V. Kumar. Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data. arXiv preprint, 2017. URL http://arxiv.org/abs/1612.08544v2.
- Karpatne et al. (2018) A. Karpatne, W. Watkins, J. Read, and V. Kumar. Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. arXiv preprint, 2018. URL http://arxiv.org/abs/1710.11431v2.
- LeCun et al. (1999) Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. Object recognition with gradient-based learning, pp. 319–345. Springer, 1999. ISBN 978-3-540-46805-9. doi: 10.1007/3-540-46805-6˙19.
- Moukalled et al. (2016) F. Moukalled, L. Mangani, and M. Darwish. The Finite Volume Method in Computational Fluid Dynamics. Springer, 1 edition, 2016. ISBN 9783319168746. doi: 10.1007/978-3-319-16874-6.
- Nowak & Guthke (2016) W. Nowak and A. Guthke. Entropy-based experimental design for optimal model discrimination in the geosciences. Entropy, 18(11), 2016. doi: 10.3390/e18110409.
- Rackauckas et al. (2020) Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan, and Alan Edelman. Universal differential equations for scientific machine learning. arXiv preprint, 2020. URL https://arxiv.org/abs/2001.04385.
- Raissi et al. (2019) M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. doi: 10.1016/j.jcp.2018.10.045.
- Shi et al. (2015) Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional lstm network: A machine learning approach for precipitation nowcasting. arXiv preprint, 2015. URL https://arxiv.org/abs/1506.04214.
- Tartakovsky et al. (2018) A.M. Tartakovsky, C.O. Marrero, P. Perdikaris, G.D. Tartakovsky, and D. Barajas-Solano. Learning Parameters and Constitutive Relationships with Physics Informed Deep Neural Networks. arXiv preprint, 2018. URL https://arxiv.org/abs/1808.03398.
- Wang et al. (2020) Nanzhe Wang, Dongxiao Zhang, Haibin Chang, and Heng Li. Deep learning of subsurface flow via theory-guided neural network. Journal of Hydrology, 584:124700, 2020. doi: 10.1016/j.jhydrol.2020.124700.
Appendix A Soil Parameters and Simulation Domains for the Benchmark Test
In this appendix, we present the soil parameters and the simulation domains used to generate the numerical benchmark dataset. Table 2 summarizes the parameters used to generate the training and test dataset. The retardation factor function is generated with the Freundlich sorption isotherm, written mathematically as
|Soil parameters||Simulation domain|
Here, is the effective diffusion coefficient, is the porosity, is the bulk density, is the Freundlich K coefficient, is the Freundlich exponent, is the length of the sample, is the discrete control volume size, is the simulation time, and is the numerical time step.
For the training dataset, the upper boundary condition () is set to be a Dirichlet boundary condition, with the maximum solubility of TCE kg/m3. The testing dataset is generated with the same soil parameters and simulation domain, but with upper boundary condition kg/m3. The lower boundary condition () is set to be a Cauchy boundary condition according to , where is the flow rate in the bottom reservoir. In the benchmark dataset, we assume that . Details on geometries, boundary conditions, and simulation can be found in (Nowak & Guthke, 2016).
Appendix B Benchmark Test Results
In this appendix, we present the results and compare different methods for the benchmark test results. For each method, we train 10 models with different initialization. The MSE values of the predictions are then calculated compared with the training dataset at time steps (i.e. days), the extrapolated training dataset at time steps (i.e. days), and the whole unseen test dataset (at all time steps, i.e. ). Detailed information of the test MSE for every individual model is shown in Table 3 for seen data and in Table 4 for unseed data.
The prediction mean and confidence interval are plotted in Figure 3, Figure 4, Figure 5, and Figure 6. Confidence intervals are obtained from repeated (ten times) training with random initialization. Even though the prediction mean of each method is not far from the synthetic data, clear instabilities and inconsistencies can be seen from the wide range of confidence intervals in the TCN, ConvLSTM, and DISTANA predictions. This instability is mainly caused by the improper handling of boundary conditions by these methods. FINN, on the other hand, produces very precise prediction along with high accuracy.
Appendix C Model Details of TCN, ConvLSTM and DISTANA
In this appendix, we provide additional information about the TCN, ConvLSTM and DISTANA models: bias neurons were used in all layers of all architectures and ADAM was used for optimization with a learning rate of. As fair comparison, FINN results for the benchmark test case are obtained also using the ADAM optimizer with the same learning rate. While TCN, ConvLSTM and DISTANA are always provided with the real data in the first ten timesteps (i.e. teacher forcing), FINN only receives an initial condition in the first timestep along with the information about the boundary in all timesteps. For better comparison, we also provide boundary condition information for TCN, ConvLSTM, and DISTANA. Note that in this experiment, TCN, ConvLSTM and DISTANA are provided with more information than FINN, which nevertheless outperforms the other models.
Two input channels are followed by two hidden layers with four and eight channels, respectively, which are processed by two output channels. A convolution kernel size of was chosen and the standard dilation rate of TCN was applied (, where is the index of the layer), leading to a time horizon of 28 time steps. Code was taken and modified from 111https://github.com/locuslab/TCN.
Two input feature maps, followed by ten channels in one hidden layer and two output channels, were used. The convolution kernel size was set to222https://github.com/ndrplz/ConvLSTM_pytorch and adapted to be applicable to spatially one-dimensional data.
Two input channels map to four preprocessing convolution channels, which feed forward into a ConvLSTM layer with eight channels which are processed by two postprocessing convolution channels. The lateral information processing convolution layer was set to two channels.
Appendix D Soil Parameters and Simulation Domains for the Experiment
In this appendix, we present the soil parameters and the simulation domains of the core samples used in the experiment. Table 5 summarizes the parameters of core sample #1, #2, and #2B.
|Parameter||Unit||Core #1||Core #2||Core #2B|
|Parameter||Unit||Core #1||Core #2||Core #2B|
For all experiments, the core samples are subjected to a constant TCE concentration at the top , which amounts to a Dirichlet boundary condition. Notice that, for core sample #2, we set to be slightly higher to compensate for the fact that there might be fractures at the top of core sample #2, so that the TCE can break through the core sample faster.
For core samples #1 and #2, is the flow rate of clean water at the bottom reservoir that determines the Cauchy boundary condition at the bottom of the core samples. For core sample #2B, note that the sample length is significantly longer than the other samples. Therefore, for this particular sample, given to be approximately in the same order with the other samples, we assume the bottom boundary condition to be a no-flow Neumann boundary condition.