Finite Volume Neural Network: Modeling Subsurface Contaminant Transport

04/13/2021, by Timothy Praditia et al., University of Stuttgart

Data-driven modeling of spatiotemporal physical processes with general deep learning methods is a highly challenging task. It is further exacerbated by the limited availability of data, leading to poor generalizations in standard neural network models. To tackle this issue, we introduce a new approach called the Finite Volume Neural Network (FINN). The FINN method adopts the numerical structure of the well-known Finite Volume Method for handling partial differential equations, so that each quantity of interest follows its own adaptable conservation law, while it concurrently accommodates learnable parameters. As a result, FINN enables better handling of fluxes between control volumes and therefore proper treatment of different types of numerical boundary conditions. We demonstrate the effectiveness of our approach with a subsurface contaminant transport problem, which is governed by a non-linear diffusion-sorption process. FINN does not only generalize better to differing boundary conditions compared to other methods, it is also capable to explicitly extract and learn the constitutive relationships (expressed by the retardation factor). More importantly, FINN shows excellent generalization ability when applied to both synthetic datasets and real, sparse experimental data, thus underlining its relevance as a data-driven modeling tool.




1 Introduction

Training neural networks augmented with additional physical information has been shown to improve their generalization capabilities, particularly when predicting physical processes. In the Physics Informed Neural Network (PINN) framework (Karpatne et al., 2017, 2018; Tartakovsky et al., 2018; Raissi et al., 2019; Wang et al., 2020), the neural network prediction u is defined to be an explicit function of space x and time t. Furthermore, calculations of the respective derivatives, such as ∂u/∂t and ∂u/∂x, are required for formulating the loss function. However, when the available training data is concentrated on a single location x or time t, the approximation of the derivatives in current techniques deteriorates due to (a) insufficient information provided by the data and (b) the lack of structural explainability of the framework itself. To address these issues from a structural point of view, several works have recently appeared in the literature. One architecture, namely the Distributed Spatiotemporal Artificial Neural Network (DISTANA, Karlbauer et al., 2019), uses translation-invariant Prediction Kernels (PKs) and Transition Kernels (TKs) to process the temporal and spatial correlation of the data, respectively. Another method, called the Universal Differential Equation method (UDE, Rackauckas et al., 2020), combines Convolutional Neural Networks (CNNs, LeCun et al., 1999) with Neural Ordinary Differential Equations (NODEs, Chen et al., 2018) for learning spatiotemporal data. Despite the promising results shown by these methods, they still suffer from unreliable flux handling (i.e. the physical fluxes are not guaranteed to be conservative). Consequently, the methods mentioned above lack means to properly treat different types of boundary conditions.

To handle fluxes more robustly and improve generalization within a physical domain, we propose the Finite Volume Neural Network (FINN). The FINN method is a hybrid model, which capitalizes on the structural knowledge of the well-known Finite Volume Method (FVM, Moukalled et al., 2016) as well as on the flexibility and learning abilities of Artificial Neural Networks (ANNs), more specifically NODEs. As a consequence, the FINN structure can properly treat different types of boundary conditions and ensures conservation of the quantities of interest. Moreover, we show that FINN is able to reconstruct the full field solution (for all x and t) even when trained with only partial information (e.g. at a single location x or time t). Additionally, the structure of FINN facilitates learning and extracting constitutive relationships and/or reaction terms, and, consequently, shows exceptional generalization capabilities and develops explainable knowledge structures.

2 Methods

In this work, we focus on modeling spatiotemporal physical processes, namely processes that scientists try to model with Partial Differential Equations (PDEs), such as diffusion-type problems. They can be generally written mathematically as follows:


∂u/∂t = ∂/∂x (D(u) ∂u/∂x) + q(u)    (1)

where u is the quantity of interest, t is time, x is the spatial coordinate, D is the diffusion coefficient, and q is the source/sink term. Usually, the FVM discretizes Equation 1 implicitly in space and explicitly in time, leading to a simplified definition:


∂u_i/∂t = f(u_i, u_{i−1}, u_{i+1}, t)    (2)

where u_i denotes u at control volume i (and u_i^n its value at time step n). In other words, the time derivative of u_i depends on the current value of u_i, the values of u at the neighboring control volumes, and time. For brevity, we drop the time index n in later definitions.
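The semi-discrete form above can be sketched numerically as follows (a minimal numpy illustration, not the FINN code; Dirichlet values on both ends and explicit Euler stepping, with all names and values illustrative):

```python
import numpy as np

def fvm_rhs(u, D, dx, u_left, u_right):
    """Semi-discrete FVM right-hand side du/dt for 1D diffusion.

    u       : values in the N control volumes
    D       : diffusion coefficient
    dx      : control volume size
    u_left  : Dirichlet value at the left boundary
    u_right : Dirichlet value at the right boundary
    """
    # Pad with boundary (ghost) values so every volume has two neighbours.
    u_pad = np.concatenate(([u_left], u, [u_right]))
    # Flux balance per volume: D * (u_{i-1} - 2 u_i + u_{i+1}) / dx^2
    return D * (u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / dx**2

# Explicit Euler stepping as the simplest "explicit in time" scheme
u = np.zeros(10)
for _ in range(100):
    u = u + 0.01 * fvm_rhs(u, D=0.1, dx=0.5, u_left=1.0, u_right=0.0)
```

With a left boundary value of 1 and a right boundary value of 0, the profile diffuses in from the left and decays monotonically across the domain.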

In FINN, we introduce Flux Kernels F_i, which take u_{i−1}, u_i, and u_{i+1} as inputs to approximate the divergence part (the first term on the right-hand side) of Equation 1 for each control volume i:

F_i = f_{i−}(u_{i−1}, u_i) + f_{i+}(u_i, u_{i+1})    (3)

Figure 1: Illustration of the Flux and State Kernels in FINN.

The Flux Kernels F_i consist of subkernels f_{i−} and f_{i+} that calculate the fluxes at the boundary surfaces between control volume i and its neighboring control volumes (see Figure 1), each learned by two modules, ℛ and 𝒟, which are subcomponents of the function f with learnable parameters. The module ℛ in each subkernel is equivalent to a linear layer that takes u_i and one of its neighbors (i.e. u_{i−1} or u_{i+1}) as inputs and learns the numerical FVM stencil with its weights, which should amount to [−1, 1], corresponding to (u_{i−1} − u_i) and (u_{i+1} − u_i) (i.e. a simple difference between neighboring control volumes) in ideal one-dimensional diffusion problems. If the diffusion coefficient depends on u according to a hidden constitutive relationship, the module 𝒟 in each subkernel learns and approximates this function D(u). Otherwise, 𝒟 is only a scalar value D, which can also be set as a learnable parameter. Next, the output of ℛ is multiplied with the output of 𝒟 to obtain the flux approximation at each boundary surface. When the fluxes at all boundary surfaces are integrated in each control volume i, the summation of the numerical stencil leads to the classical one-dimensional numerical Laplacian, corresponding to D(u)(u_{i−1} − 2u_i + u_{i+1}) if Equation 1 holds.
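A stripped-down sketch of one Flux Kernel (illustrative names; a plain function and a fixed stencil stand in for the learnable modules ℛ and 𝒟) could look like:

```python
def flux_kernel(u_im1, u_i, u_ip1, d_module, stencil=(-1.0, 1.0)):
    """Sketch of one Flux Kernel F_i.

    The 'stencil' pair plays the role of the linear-layer weights that
    should be learned to amount to [-1, 1] (a neighbour difference);
    d_module(u) stands in for the learnable diffusion module D(u).
    """
    # Flux across the left (i-1/2) and right (i+1/2) boundary surfaces
    f_left = d_module(u_i) * (stencil[0] * u_i + stencil[1] * u_im1)
    f_right = d_module(u_i) * (stencil[0] * u_i + stencil[1] * u_ip1)
    # Summing both surface fluxes recovers the 1D numerical Laplacian stencil
    return f_left + f_right

# With a constant D, the kernel reduces to D * (u_{i-1} - 2 u_i + u_{i+1})
out = flux_kernel(1.0, 2.0, 4.0, d_module=lambda u: 0.5)
```

Here 0.5 · (1 − 2·2 + 4) = 0.5, matching the classical stencil evaluated by hand.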

This structure of the Flux Kernel enables straightforward handling of different types of boundary conditions. With a Dirichlet boundary condition u = u_b, we can set either u_{i−1} = u_b or u_{i+1} = u_b at the corresponding domain boundary. With a Neumann boundary condition prescribing a flux ν, we can simply set the output of the subkernel at the corresponding domain boundary to be equal to ν. With a Cauchy boundary condition, we can calculate the derivative approximation at the corresponding domain boundary and set the resulting value as either u_{i−1} or u_{i+1}.
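The boundary handling described above can be sketched as follows (an illustrative helper, not the FINN API; the Cauchy case is omitted for brevity):

```python
def boundary_flux(kind, u_inner, D, dx, value):
    """Sketch of boundary handling at a domain boundary.

    kind = 'dirichlet': 'value' is the prescribed boundary value, used in
    place of the missing neighbour of the boundary control volume.
    kind = 'neumann': 'value' is the prescribed flux itself, so the
    subkernel output is simply fixed to it.
    """
    if kind == "dirichlet":
        # Evaluate the usual surface flux against the prescribed value.
        return D * (value - u_inner) / dx
    if kind == "neumann":
        return value
    raise ValueError(f"unknown boundary kind: {kind}")
```

For example, a Dirichlet value of 1.0 next to an interior value of 0.0 with D = 0.1 and dx = 0.5 yields an inward flux of 0.2, while a Neumann condition just passes the prescribed flux through.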

The State Kernels S_i then take the output of the Flux Kernels and approximate ∂u_i/∂t, while also learning the source/sink term q as a function of u (whenever necessary). Formally, each State Kernel can be written as an approximation of the time derivative in Equation 1 for each control volume i:

∂u_i/∂t ≈ S_i = F_i + q(u_i)    (4)

where q is parameterized by learnable parameters. The outputs of the State Kernels are then integrated by an ODE solver to solve for u_i, which is used recursively for the calculation of the subsequent time steps. The benefits of using an ODE solver are twofold: (a) it allows for adaptive time stepping, which in turn leads to better numerical stability in explicit schemes, and (b) it enables handling of unevenly spaced time series, which are very common in real observation data.
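Both benefits can be illustrated with the following sketch (illustrative names; scipy's adaptive solver stands in for the NODE solver used in FINN, and a zero source term and a zero-gradient right boundary are assumed for simplicity):

```python
import numpy as np
from scipy.integrate import solve_ivp

def state_rhs(t, u, D, dx, u_left):
    """State Kernel sketch: flux divergence plus a (here zero) source term."""
    # Left ghost value = Dirichlet boundary; right ghost value = u[-1]
    # (a simple zero-gradient closure for this illustration).
    u_pad = np.concatenate(([u_left], u, [u[-1]]))
    flux_div = D * (u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / dx**2
    source = 0.0  # q(u) would be a learnable module in FINN
    return flux_div + source

# Adaptive time stepping + unevenly spaced output times,
# as is common in real observation data.
t_obs = [0.0, 0.3, 1.0, 2.5]
sol = solve_ivp(state_rhs, (0.0, 2.5), y0=np.zeros(8),
                t_eval=t_obs, args=(0.1, 0.5, 1.0))
```

The solver chooses its internal step sizes adaptively while still returning the solution exactly at the requested, unevenly spaced observation times.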

One of the benefits of the State Kernels is that they enable separate calculations for different quantities of interest, while the divergence (flux) can be calculated based on the same variable. This ensures that each quantity of interest follows its own conservation law. In short, FINN consists of Flux Kernels that handle the spatial correlation and State Kernels that handle the temporal correlation of the data.

3 Experiment

For demonstration purposes, we choose an application to a subsurface contaminant transport problem. We assess the performance of FINN not only on synthetic simulation data, but also on real experimental data. The contaminant transport is characterized by the non-linear diffusion-sorption equation in a fluid-filled homogeneous porous medium:


∂c/∂t = D/R(c) · ∂²c/∂x²    (5)

where c denotes the concentration of trichloroethylene (TCE) dissolved in the fluid, D denotes the effective TCE diffusion coefficient, and R(c) denotes the retardation factor (representing sorption), which is a function of c. Equation 5 is subject to a Dirichlet boundary condition at x = 0 and a Cauchy boundary condition at x = L (i.e. the top and bottom ends of the field, respectively), along with an initial condition c(x, 0). Using the definition of the retardation factor, we can also calculate the total TCE concentration c_tot (both dissolved in the fluid and adsorbed in the solid):


∂c_tot/∂t = φ D · ∂²c/∂x²    (6)

where φ is the porosity of the medium (i.e. of the core samples). More detailed information on the experiment and its numerical simulation can be found elsewhere (Nowak & Guthke, 2016).
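To make the role of the retardation factor concrete, here is a minimal numpy sketch (not the FINN implementation; all names and values are illustrative) of the right-hand side of Equation 5, with a Dirichlet value at the top and a simplified zero-gradient closure standing in for the Cauchy condition at the bottom:

```python
import numpy as np

def diffusion_sorption_rhs(c, D, dx, R, c_top):
    """dc/dt = D / R(c) * d2c/dx2 with a Dirichlet value c_top at x = 0.

    R is any callable returning the retardation factor; the bottom
    boundary uses a zero-gradient closure for this illustration.
    """
    c_pad = np.concatenate(([c_top], c, [c[-1]]))
    lap = (c_pad[2:] - 2.0 * c_pad[1:-1] + c_pad[:-2]) / dx**2
    # Larger R(c) (stronger sorption) slows down the concentration change.
    return D / R(c) * lap

# A retardation factor of 2 halves the effective diffusion rate.
rhs = diffusion_sorption_rhs(np.zeros(5), D=0.1, dx=0.5,
                             R=lambda c: 2.0, c_top=1.0)
```

Starting from a zero profile, only the topmost control volume sees a non-zero rate, driven by the Dirichlet value and scaled down by R.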

4 Results and Discussion

As a first step, we train and test FINN using numerically generated synthetic datasets. Both the train and test datasets are discretized into control volumes and time steps. We train FINN on the whole spatial domain, but only on the first portion of the simulated time span of the train dataset. Here, FINN receives only the initial condition, i.e. the initial values of c and c_tot, along with the Dirichlet boundary condition value at the top boundary. The bottom boundary is subject to a Cauchy boundary condition and is therefore solution dependent. FINN is then trained in a closed-loop system, using the predicted values of c and c_tot at one time step as input for the calculation at the subsequent time step.

For this synthetic data application, we set the relevant physical coefficients to be learnable. Additionally, for the calculation of c, we set one of the learnable modules to be a feedforward neural network that approximates R(c) in Equation 5, allowing us to extract information about the learned retardation factor as a function of the contaminant concentration c. This neural network is constructed with 3 hidden layers, each containing 15 hidden nodes. Each hidden layer uses the hyperbolic tangent as activation function, and the output layer uses the sigmoid activation function, multiplied by a learnable scaling factor to ensure that the approximation of R(c) is strictly positive.
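As a sketch of such an architecture (a pure numpy forward pass with random, untrained weights; the scaling-factor value is an illustrative assumption), the construction could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    return rng.normal(0.0, 0.1, (n_out, n_in)), np.zeros(n_out)

# 1 input (c) -> three hidden layers of 15 tanh units -> 1 sigmoid output
layers = [init_layer(1, 15), init_layer(15, 15), init_layer(15, 15),
          init_layer(15, 1)]
scale = 5.0  # learnable scaling factor (illustrative value)

def retardation(c):
    """R(c) approximation: sigmoid output times a positive scale => R > 0."""
    h = np.atleast_1d(c).astype(float)
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)
    W, b = layers[-1]
    return scale / (1.0 + np.exp(-(W @ h + b)))  # scale * sigmoid(...)
```

Because the sigmoid output lies strictly in (0, 1), the predicted retardation factor is guaranteed to stay in (0, scale), i.e. strictly positive, regardless of the learned weights.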

To test the generalization capability, we use the trained network to extrapolate beyond the training time span. Additionally, we test the trained FINN against a completely unseen test dataset obtained with different boundary conditions: the Dirichlet boundary condition values at the top boundary (in kg/m³) differ between the train and test datasets. We also compare the train and test Mean Squared Error (MSE) values with those of other known methods, namely TCN (Kalchbrenner et al., 2016), ConvLSTM (Shi et al., 2015), and DISTANA (Karlbauer et al., 2019).

Method    Training MSE    Extrapolated training MSE    Unseen test MSE    Parameters
TCN 1 386
ConvLSTM 1 496
FINN 528
Table 1: Comparison of MSE values between different deep learning architectures.
Figure 2: Breakthrough curve prediction of the FINN method (blue line) during training using data from core sample #2 (top left), during testing using data from core sample #1 (top right), and total concentration profile prediction using data from core sample #2B (bottom left). The predictions are compared with the experimental data (red circles) and the results obtained using the physical model (orange dashed line). The extracted retardation factor as a function of c is shown on the bottom right.

The comparison in Table 1 shows that FINN generalizes appropriately when tested against a different boundary condition. Even though all methods show comparable performance during training, the predictions of TCN, ConvLSTM, and DISTANA deteriorate when the knowledge gained from the training data has to be extrapolated or applied to unseen test data. This appears to be caused by their lack of proper boundary condition handling. More detailed information can be found in the appendices regarding the benchmark test dataset (Appendix A), the training and test results (Appendix B), and the model setup (Appendix C).

As a second step, we apply FINN to real experimental data, collected from three different core samples, namely core samples #1, #2, and #2B (see Appendix D). In this setup, we train FINN using data that originate exclusively from core sample #2. For this experimental data application, FINN is set up in the same way as for the synthetic data. For the dissolved concentration calculation, we again use a feedforward neural network that takes c as input to learn the retardation factor function. However, we assume that the diffusion coefficient values are known for all core samples. The main difference to the synthetic setup lies in the available training data. More specifically, we only use the breakthrough curve data of c in the tailwater. This means that the data provide only partial information and constrain the FINN prediction only at x = L.

The results show that the trained FINN reaches a higher accuracy than the calibrated (least squares) numerical PDE model. Further, we test and validate the trained model using the other core samples (i.e. #1 and #2B), which originate from the same geographical area and can therefore be assumed to have similar soil parameters. In Figure 2, we show that the predictions match the experimental data with reasonable accuracy. We also compare the predictions with the output of the numerical model, whose retardation factor was likewise calibrated using the data from core sample #2. The plots show that FINN's prediction accuracy is comparable to that of the numerical model, even beating it in some instances. For core sample #1, FINN's prediction is more accurate, while the numerical model underestimates the breakthrough curve.

Core sample #2B is significantly longer than the other samples, so to model the diffusion-sorption process we can assume a zero-flux Neumann boundary condition at the bottom of the core. As a consequence, no breakthrough curve data are available. Instead, we compare the prediction against the total concentration profile data obtained from destructive sampling (i.e. core slicing) at the end of the experiment. Here, FINN produces accurate predictions, while the numerical model overestimates the total concentration profile. These results show that, even when applied to a different type of boundary condition, FINN's predictions remain accurate. Moreover, we can extract the retardation factor as a function of c using FINN, which is plotted in Figure 2, thus explaining a soil property.

5 Conclusion and Future Work

We have shown that including physical knowledge in the form of the Finite Volume structure produces excellent generalization capabilities and improves the explainability of the applied neural network structure. Our novel Finite Volume Neural Network (FINN) permits proper calculation of conservative fluxes and proper treatment of different types of boundary conditions. FINN outperforms other neural network methods for spatiotemporal modeling, such as the Temporal Convolutional Network, the Convolutional LSTM, and DISTANA, especially when tested with different boundary conditions. Moreover, we show that FINN is suitable for processing experimental data, rendering it relevant as a data-driven modeling tool.

In this work, we assumed spatial homogeneity of the soil in the simulation domain due to the small size of the actual experimental domain. For real field applications, where the scale is significantly larger, the homogeneity assumption might not hold. We are therefore interested in enhancing FINN to handle spatially heterogeneous parameters. One way to achieve this is by defining a spatially correlated parameter to model a learnable diffusion coefficient in a spatially heterogeneous system, akin to Karlbauer et al. (2020). Additionally, we are interested in quantifying the uncertainties of our model by implementing a Bayesian Neural Network. A concrete application example in the contaminant transport modeling domain is to produce confidence intervals for both the concentration function and the learned retardation factor function. Overall, recent developments in fusing numerical methods with deep learning to aid the simulation of physical processes show promising results that encourage continuing in this direction. It is very exciting to see how far we can push the boundaries between numerical methods and deep learning, and to see the benefits of combining both approaches.


This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC 2075 – 390740016 and EXC 2064 – 390727645). We acknowledge the support of the Stuttgart Center for Simulation Science (SimTech). Moreover, we thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Matthias Karlbauer. The code and data used for this paper can be found in the accompanying repository.


Appendix A Soil Parameters and Simulation Domains for the Benchmark Test

In this appendix, we present the soil parameters and the simulation domains used to generate the numerical benchmark dataset. Table 2 summarizes the parameters used to generate the training and test datasets. The retardation factor function is generated with the Freundlich sorption isotherm, written mathematically as

R(c) = 1 + (1 − φ)/φ · ρ_s · K_f · n_f · c^(n_f − 1)

Soil parameters                        Simulation domain
Parameter  Unit          Value         Parameter  Unit   Value
D          m²/day                      L          m
φ          –                           Δx         m
ρ_s        kg/m³                       T          days
K_f        (m³/kg)^n_f                 Δt         days
n_f        –             0.874
Table 2: Soil parameters and simulation domains for training and testing dataset generation.

Here, D is the effective diffusion coefficient, φ is the porosity, ρ_s is the bulk density, K_f is the Freundlich coefficient, n_f is the Freundlich exponent, L is the length of the sample, Δx is the discrete control volume size, T is the simulation time, and Δt is the numerical time step.
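Assuming the standard Freundlich retardation form given above, the function can be evaluated as follows (parameter values in the usage line are illustrative, not the benchmark values):

```python
def freundlich_retardation(c, phi, rho_s, k_f, n_f):
    """Retardation factor from the Freundlich isotherm s = K_f * c**n_f:

    R(c) = 1 + (1 - phi)/phi * rho_s * K_f * n_f * c**(n_f - 1)
    """
    return 1.0 + (1.0 - phi) / phi * rho_s * k_f * n_f * c ** (n_f - 1.0)

# Illustrative evaluation (NOT the benchmark parameter values)
r = freundlich_retardation(0.5, phi=0.3, rho_s=1500.0, k_f=1e-4, n_f=0.874)
```

Since n_f < 1, the retardation factor decreases with increasing concentration while always staying above 1, which is the qualitative shape the network-learned R(c) is expected to recover.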

For the training dataset, the upper boundary condition (at x = 0) is set to be a Dirichlet boundary condition at the maximum solubility of TCE (in kg/m³). The testing dataset is generated with the same soil parameters and simulation domain, but with a different upper Dirichlet boundary value. The lower boundary condition (at x = L) is set to be a Cauchy boundary condition determined by the flow rate Q in the bottom reservoir, which is assumed fixed in the benchmark dataset. Details on geometries, boundary conditions, and simulation can be found in Nowak & Guthke (2016).

Appendix B Benchmark Test Results

In this appendix, we present the results and compare the different methods on the benchmark test. For each method, we train 10 models with different initializations. The MSE values of the predictions are then calculated with respect to the training dataset (the seen time steps), the extrapolated training dataset (the later time steps), and the whole unseen test dataset (at all time steps). We train the models with noisy data; the noise is normally distributed with zero mean. Detailed test MSE values for every individual model are shown in Table 3 for seen data and in Table 4 for unseen data.

The prediction means and confidence intervals are plotted in Figure 3, Figure 4, Figure 5, and Figure 6. Confidence intervals are obtained from repeated (ten times) training with random initialization. Even though the prediction mean of each method is not far from the synthetic data, clear instabilities and inconsistencies can be seen in the wide confidence intervals of the TCN, ConvLSTM, and DISTANA predictions. This instability is mainly caused by the improper handling of boundary conditions in these methods. FINN, on the other hand, produces very precise predictions with high accuracy.
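The mean and confidence band over repeated runs can be computed as follows (a generic sketch; the array shape and the 1.96 factor for an approximate 95% normal interval are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def run_statistics(predictions):
    """Mean and ~95% band across repeated training runs (axis 0 = run)."""
    preds = np.asarray(predictions)
    mean = preds.mean(axis=0)
    std = preds.std(axis=0, ddof=1)  # sample standard deviation over runs
    return mean, mean - 1.96 * std, mean + 1.96 * std

# e.g. 10 runs over 26 spatial positions (shapes are illustrative)
runs = np.random.default_rng(1).normal(0.0, 1.0, (10, 26))
mean, lo, hi = run_statistics(runs)
```

A narrow band (FINN) indicates that repeated trainings converge to nearly the same prediction, whereas a wide band (TCN, ConvLSTM, DISTANA) signals the instability discussed above.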

Table 3: Test MSE on seen data (extrapolated training) from ten different training runs for each model
Table 4: Test MSE on unseen data coming from ten different training runs for each model
Figure 3: Dissolved concentration profile prediction mean (with confidence interval) at days compared with the extrapolated training dataset obtained using TCN (top left), ConvLSTM (top right), DISTANA (bottom left), and FINN (bottom right).
Figure 4: Total concentration profile prediction mean (with confidence interval) at days compared with the extrapolated training dataset obtained using TCN (top left), ConvLSTM (top right), DISTANA (bottom left), and FINN (bottom right).
Figure 5: Dissolved concentration profile prediction mean (with confidence interval) at days compared with the test dataset obtained using TCN (top left), ConvLSTM (top right), DISTANA (bottom left), and FINN (bottom right).
Figure 6: Total concentration profile prediction mean (with confidence interval) at days compared with the test dataset obtained using TCN (top left), ConvLSTM (top right), DISTANA (bottom left), and FINN (bottom right).

Appendix C Model Details of TCN, ConvLSTM and DISTANA

In this appendix, we provide additional information about the TCN, ConvLSTM, and DISTANA models: bias neurons were used in all layers of all architectures, and ADAM was used for optimization. For a fair comparison, the FINN results for the benchmark test case were also obtained using the ADAM optimizer with the same learning rate. While TCN, ConvLSTM, and DISTANA are always provided with the real data in the first ten time steps (i.e. teacher forcing), FINN only receives an initial condition in the first time step, along with information about the boundary in all time steps. For better comparability, we also provide boundary condition information to TCN, ConvLSTM, and DISTANA. Note that in this experiment, TCN, ConvLSTM, and DISTANA are provided with more information than FINN, which nevertheless outperforms the other models.


TCN: Two input channels are followed by two hidden layers with four and eight channels, respectively, which are processed by two output channels. The convolution kernel size and the standard dilation rate of TCN (d = 2^i, where i is the index of the layer) were chosen such that the resulting time horizon covers 28 time steps. The code was taken and modified from a publicly available implementation.


ConvLSTM: Two input feature maps, followed by ten channels in one hidden layer and two output channels, were used. The convolution kernel size was fixed, and zero-padding was applied to preserve the data dimensions. PyTorch code was taken from a publicly available implementation and adapted to be applicable to spatially one-dimensional data.


DISTANA: Two input channels map to four preprocessing convolution channels, which feed into a ConvLSTM layer with eight channels, which in turn is processed by two postprocessing convolution channels. The lateral information processing convolution layer was set to two channels.

Appendix D Soil Parameters and Simulation Domains for the Experiment

In this appendix, we present the soil parameters and the simulation domains of the core samples used in the experiment. Table 5 summarizes the parameters of core samples #1, #2, and #2B.

Soil parameters
Parameter  Unit     Core #1  Core #2  Core #2B

Simulation domain
Parameter  Unit     Core #1  Core #2  Core #2B
           m                          N/A
Q          m³/day                     N/A
Table 5: Soil and experimental parameters of core samples #1, #2, and #2B.

For all experiments, the core samples are subjected to a constant TCE concentration at the top, which amounts to a Dirichlet boundary condition. Notice that, for core sample #2, we set this concentration slightly higher to compensate for possible fractures at the top of the sample, through which the TCE can break through faster.

For core samples #1 and #2, Q is the flow rate of clean water in the bottom reservoir that determines the Cauchy boundary condition at the bottom of the core samples. For core sample #2B, note that the sample is significantly longer than the others. Therefore, for this particular sample, given a flow rate of approximately the same order as for the other samples, we assume the bottom boundary condition to be a no-flow Neumann boundary condition.