Recent advances in deep learning have shown an increased use of neural networks to create surrogate models for physics-related problems. In particular, Convolutional Neural Networks (CNN) have been employed in a wide variety of applications, leveraging their efficient parameter sharing property and their ability to capture long-range spatial correlations. However, most of the existing works limit themselves to one particular problem setup, keeping the same boundary conditions (BCs) throughout the entire training data[Lee2019a], thus being unable to be generalized to other types of boundary conditions. Ideally, a flexible neural network framework should be able to work with several types of boundary conditions, without the need of retraining the network for each new problem setup. It is thus crucial to understand how boundary conditions are treated by data-driven CNNs in order to improve their generalization capabilities. The general theme of border effect for CNNs has been broadly studied in the image processing community [Liu2008, Hamey2015]. Still today, such border effects can have a strong influence in state-of-the-art architectures employed in image segmentation [Lupton2021]. The usual zero-padding strategy leads to border pixel artifacts and blind spots where the network accuracy drops. Solutions have been proposed to treat borders through separate filters [Innamorati2020] for the edges, corners and inner pixels or to consider the padded pixels as missing information, through the use of Partial Convolution strategies [Liu2018a]. Other works demonstrate how CNNs implicitly learn spatial position [Kayhan, Islam2020], using the padded pixels to serve as anchors, i.e., as a reference for filter activation in border regions. The use of circular convolution [Schubert2019] eliminates such border effects, but can only be employed on “panoramic” datasets.
Yet, the previous works have been devised for image segmentation/classification tasks and it is still unclear whether such studies can be directly transposed for regression and recurrent tasks, as usually encountered when modeling physics. In such contexts, a small error in the boundary prediction may lead to large errors elsewhere in the computational domain. A typical example of such a phenomenon is the development of non-reflecting boundary conditions in computational acoustics [Poinsot1992], which avoids the unphysical reflection of waves back into the computational domain. If left untreated, undesired reflections can pollute the calculated solution. To the author’s knowledge, there is a lack of clear results regarding what is the optimal strategy for the treatment of such boundary conditions when employing CNNs for on spatio-temporal evolving problems, for which BC errors can propagate and contaminate the whole computational domain. Indeed, only few works specifically focus on the boundary treatment problem. Some employ explicit rules in relatively simple cases, such as in periodic domains, as in the case of turbulence modeling [Mohan2020]. For more complex boundary treatments, previous works are found in the context of Physical-Informed Neural Networks (PINN) [Raissi2019a] employed in combination with CNN [Gao2021]
, where hard constraints are imposed through the padding mechanism. However, only simple Dirichlet or Neumann conditions on static problems are considered, leaving dynamical conditions out of the study. In the case of spatio-temporal modeling, CNNs and Recurrent Neural Networks (RNN) architectures have been employed indistinctly. Mathieu et al.[Mathieu2016] designed a Multi-Scale CNN for video prediction, which was later employed in physics-related applications [Lee2019a, Alguacil2020b]. Fotiadis et al. [Fotiadis2020] compared recurrent and convolutional approaches in physics-based applications and found that CNNs can be successfully employed as spatio-temporal predictors, with greater accuracy than RNNs and lesser training costs. In all previous cases, the treatment of boundary conditions was not explicitly studied, leaving the effect of BC treatments on such spatio-temporal evolving systems as an open question.
In this paper, the effects of several boundary condition treatments are characterized when modeling a space-time evolving problem using Convolutional Neural Networks. Three strategies are employed for handling the boundary conditions: (i) an implicit treatment through padding only, (ii) adding some extra spatial context to the network input and finally, (iii) explicitly encoding the boundary condition rules into the network output. These strategies are tested on a series of datasets with varying boundary conditions, modeling the two dimensional wave and heat equations.
The main contributions are the following:
For problems with simple Neumann boundary conditions, padding allows CNNs to efficiently model the physics; yet, mimicking the actual semantics of the dataset makes it only possible for problems with fixed BCs;
The addition of an extra spatial context makes the neural network less sensitive to the choice of padding;
Explicit coding of neural networks gives the best accuracy in the more challenging test cases where dynamical effects are needed to model BCs, allowing more robust predictions by first enforcing physics.
This section describes the methodology to predict the spatio-temporal dynamics of physics-related quantities using a convolutional neural network. The trained network follows a typical auto-regressive strategy [Geneva2019]
to produce time series of high-dimensional state vectors. The focus is put on the treatment of boundary conditions in the context of convolutional networks, in order to reproduce the desired physics. Several algorithms for the treatment of such BCs are presented and later evaluated in Section4
2.1 Learning an auto-regressive model
Dynamical systems can be modeled through a discrete time-invariant model acting on a delayed state vector composed of discrete temporal states which may lie on a high dimensional space. Formally, the discretized time-dynamics read :
where corresponds to the next state in the time series.
In order to generate an approximate model for , a neural network , parametrized by its weights and biases , is trained on a dataset composed of input-target tuples through a supervised optimisation problem, based on an error metric , such that
Once an approximate solution is obtained, any time state can be reached by employing an auto-regressive iterative strategy on the learned model, namely:
where is the function composition operator.
2.2 Neural network convolutional architecture
The auto-regressive strategy can be employed to create surrogate models for physics-based quantities. In traditional fluid solvers, it is common to discretize both the time and space dimensions of physical quantities, such as pressure or velocity fields. This results in high dimensional state where the modeled equations are solved for each degree of freedom. To reduce the computational costs associated with training a neural network surrogate on such high-dimensional states, a convolutional neural network is employed due to its weight-sharing capabilities. In order to efficiently treat the intrinsic multi-scale features of fluid flows, a Multi-Scale fully convolutional neural network[Mathieu2016, Lee2019a] is employed, as shown in Fig. 1. This architecture is similar to U-Net [Ronneberger2015], and both architectures have been employed in other spatio-temporal auto-regressive physics-based applications [Lee2019a, Fotiadis2020]. The input state is composed of four consecutive vectors , in order to provide the network with additional temporal context.
The usage of a pure convolutional approach for the temporal regression problem instead of a Recurent Architecture (RNN, LSTMs etc) is justified because of the direct prediction of full states, which only require a short temporal span for their accurate time-stepping (in traditional PDE solvers, the discretization of the time-derivatives). A comparison of the fully convolutional approach with LSTM approaches performed in [Fotiadis2020] confirms this observation, as the convolutional architecture achieves better accuracies on such types of problems.
2.3 Boundary condition treatment
In traditional solvers, the proper modeling of boundary conditions is key to accurate numerical resolution of the partial differential equations. Therefore, understanding the boundary condition treatment is fundamental if convolution neural networks are to be employed as surrogates for physics-based models. Boundary conditions are intrinsically linked to the concept of padding in fully convolutional networks: additional information must be created at the borders before each convolution in order to keep the same image input resolution at the output. However, the value of the additional pixel information is not known a priori, and several padding strategies are available to encode this information. Zero padding, where the additional information is filled with zero values; replication padding, where the values at border pixels are replicated multiple times into the padding area; reflection padding, where an axial symmetry is performed along boundary edges; and circular padding (or periodic) which wraps values from the opposite boundary in the same spatial direction.
While padding is the most straightforward strategy to impose BCs in CNNs, it is unclear how the padding type affects the predictions, neither how an optimal choice is connected to the physical BC type (Dirichlet, Neumann, etc.). Moreover, padding is fixed for the entire database considered, which lacks of versatility when targeting physical systems with multiple possible BCs. Consequently, the objective of this work is to study if an optimal boundary treatment strategy can be found when employing CNNs for physics-based regression. Three types of strategies are considered:
Implicit: It consists in applying exclusively padding to the input and successive feature maps. This is the most common approach in convolutional networks and forces the network to implicitly learn the boundary physics. It is up to the network designer to chose an adequate padding strategy which usually results in a long trial-and-error process until finding an optimal solution. It constitutes the baseline method of this work (Fig. 1a) and the four padding strategies presented above are investigated (zeros, replication, reflection and circular).
Spatial context: The second strategy consists in concatenating an additional channel to the network input, which is consisting in a Boolean mask indicating the position of border pixels. This is formalized as follows:
where represents the domain of interest, its boundaries and are the spatial coordinate. The motivation for such a strategy is to provide the network with an increased spatial context. Figure 1b depicts such a strategy. Note that padding is still employed in order to maintain the size of feature maps. This strategy is inspired by other works such as Liu et al. [Liu2018], where giving explicit spatial information to CNNs is shown to crucially improve their generalization capability. Yet, the correlation between the spatial extended context, the padding type, and the actual physical BC is still unclear, thus being studied here.
Explicit encoding: Finally, the third strategy (Fig. 1c) consists in explicitly encoding the boundary condition. In practice, the boundary pixel values are imposed after the network output [Gao2021], before stepping into the optimization step during the training phase. The way the value is imposed depends on the mathematical modeling of the boundary (Dirichlet, Neumann condition, etc).
All three strategies are compared in this work, by combining them with the four types of padding previously mentioned.
2.4 Loss function
The loss function for training the aforementioned Multi-Scale network is defined as:
where and , where a classical mean square metric is employed, both for the state vector and its spatial derivatives, denoted as Gradient Difference Loss (GDL) as in [Mathieu2016]. This loss drives the optimization towards achieving sharper predictions, and compensates for the smoothing of the predicted signal in the long term prediction of spatio-temporal series.
Note that the training focuses only on the next time-step prediction. Therefore, the auto-regressive prediction of a long time series of state vectors is a generalization problem.
3 Applications: time-evolving PDEs
The studied modeled is applied to create data-driven surrogates of spatio-temporal evolving partial differential equations. In practice, two applications are investigated: an hyperbolic PDE (acoustic wave propagation) and parabolic PDE (heat equation). The emphasis is put on the influence of the boundary conditions on the ability of surrogate data-driven models to reproduce accurately the underlying dynamics. Thus, several types of boundary conditions are studied for each case, which are detailed next.
3.1 Acoustic propagation of Gaussian pulses:
The first application corresponds to the surrogate modelling of a 2D acoustic wave equation in a quiescent medium with speed of sound , written in terms of the acoustic density , with Gaussian density pulses as initial conditions:
where is the Laplacian operator, , and represent respectively the spatial positions of the center, the amplitude and the half-width of the initial pulse.
3.1.1 Boundary conditions:
Three cases of boundary conditions are considered, representative of typical configurations found in acoustics. Each one of the BC constitutes a dataset to be employed for training a surrogate model:
Reflecting walls (Dataset 1 - D1): Hard-reflecting walls, representing interior acoustics, which is modeled with a Neumann boundary condition: .
Periodic walls (Dataset 2 - D2): Periodic conditions to model infinitely repeating domains.
Absorbing walls (Dataset 3 - D3): Radiation boundary conditions, modeling propagation of waves into the far-field (exterior acoustics). The challenge is to avoid spurious reflections that can pollute the computational domain.
3.2 Diffusion of temperature spots:
Second, the diffusion of temperature spots is studied, modeled by the following heat equation on the temperature with Gaussian density pulses as initial conditions:
where denotes the thermal diffusivity of the medium. The intial conditions are identical as those employed in Eq. (6b) (Gaussian temperature spots).
3.2.1 Boundary conditions:
Here, an additional dataset is generated:
Adiabatic walls (Dataset 4 - D4): Zero-flux adiabatic walls, modeled as Neumann boundary conditions .
3.3 Datasets generation and parameters
The datasets of input-target fields is generated offline with the multi-physics open-source Palabos Lattice-Boltzmann Method (LBM)[Latt2020] numerical solver. We solve equations (6) and (7) for a duration T with a time-step of . A two-dimensional square domain is considered.
Acoustic datasets (D1 to D3)
: Each set is composed of LBM simulations, each with discrete time snapshots. Only the acoustic density fields are recorded in square domains of physical length , discretized with cells per spatial direction and a spatial step of . The LB time step is set to , with is the speed of sound. The four input density fields fed into the Neural network are equally spaced in time with . A random number of Gaussian pulses in the range are used as initial conditions, with fixed amplitude and half-width, and . The initial location of the pulses
is also randomly sampled inside the domain following an uniform distribution. Atraining/validation random split is employed for the simulations.
Temperature datasets (D4)
: Each set is composed of LBM simulations, each with discrete time snapshots. The temperature fields are recorded in square domains of physical length , discretized with cells per spatial direction and a spatial step of . The heat diffusivity is set to , where the LB time step is set to . The four input temperature fields fed into the Neural network are sampled so that . The same initial conditions as in D1-D3 are employed. A training/validation random split is employed for the simulations.
|Centered IC||Random IC|
To perform the evaluation of the presented boundary condition treatment, a Multi-Scale network with million parameters is trained on the four proposed datasets. As discussed in 2.4 the network is trained to minimize Eq. 5 for a single-step prediction. The Adam optimizer is employed, with a learning rate initially set to , and decaying by each time the loss reaches a plateau. The loss weights are set to and
. Data augmentation is employed through the random rotation of input-target tuples, and input fields are normalized by their mean and standard deviation. Batch size is kept at 32. Trainings are performed on a NVIDIA V100 GPU and convergence is achieved at about 1000 epochs for each run.
4.0.1 Wave equation with reflecting boundary:
The first case of study corresponds to Dataset 1. Density waves are fully reflected back into the domain after the interaction with boundaries. Therefore, the waves stay in the computational domain for infinitely long times, as no viscous dissipation is present. To test the presented approaches, 24 initial conditions with 1 to 5 Gaussian pulses randomly located in the initial domain are used as inputs for the auto-regressive model. A 25th initial condition is also tested, with the particular case of a Gaussian pulse initially centered in the domain, whose solution leads to strong symmetric solutions and is thus challenging for the neural network. For each initial condition (generated with the LBM solver), the auto-regressive strategy can recursively predict the density fields over a time horizon of iterations, by using the previous prediction as a new input. In order to improve the neural network robustness versus the error accumulation over time, an a posteriori correction is employed to correct the predictions after each time step [Alguacil2020b], based on the conservation of acoustic energy over time in this particular application.
The error is evaluated in terms of relative root mean square error at each neural network iteration, namely , where is the high-dimensional density prediction at iteration . Here, the time horizon is set at iterations. For the explicit method, Neumann boundary conditions on the density fields are used to model such conditions. A first-order finite difference discretization is employed.
Results are qualitatively evaluated in Fig. 3. For two different initial conditions (centered pulse and 3 randomly sampled pulses), the LBM reference (top row) is compared to several of the proposed approaches. For the implicit case (i.e., only padding) strategies, the best model regarding the employed padding is shown in the second row, corresponding to the replication padding. For both initial conditions, the auto-regressive strategy follows closely the ground truth data. In the third row, the implicit strategy employs circular padding and shows a good agreement in the case of the initially centered Gaussian pulse. However, for the other initial condition, the prediction diverges after some iterations. At iteration , the pulse arriving on the left wall is re-injected at the right wall, mimicking the behavior of periodic boundaries instead of the reflecting ones, on which the network has been trained.
. For the implicit strategy, results show that choosing the optimal padding crucially depends on the data physics. With circular padding the network is incapable of reproducing the desired physics except for some particular initial conditions, illustrated by the increased variance area signaling the presence of outliers. This is due to the artifacts discussed previously. For the rest of available padding (zeros, replication and reflection), the replication padding solution performs better than the other two even if error levels remain acceptable (belowrelative error).
Such observations agree with other studies performed in image segmentation [Lupton2021]: circular padding limits the CNNs ability to encode position information and can only be used with spatially periodic data. To further investigate this claim, an additional spatial context channel is employed, while maintaining circular padding. As observed in Fig. 4 (middle plot, red curve), the error is significantly lower than the one obtained with the implicit strategy. This suggests that the additional input serves as an spatial anchor for the CNN to encode the hard-reflecting wall, which was not possible using only circular padding. The last row in Fig. 3 demonstrates this improvement, even though the prediction error is higher than the one obtained with other padding methods. Furthermore, the addition of the spatial context channel reduces the overall variability of the chosen padding effects. While the error slightly increases for the replication case versus the implicit strategy, all three padding methods converge to similar error evolutions. This suggests that the additional context channel may force the network to explicitly learn similar convolutional kernels for boundary treatment, while this is not guaranteed by the implicit case.
For the explicit case, results in Fig. 4 (right) show that the replicate and reflection padding cases achieve the lowest errors, while zero and circular paddings have larger errors. Note that the network is only trained for a one-step prediction and the explicit enforcing of the boundary is performed after each prediction. Thus, the explicit enforcing is only processed by the network in the auto-regressive context. Results show whether the employed padding strategy is compatible with the enforced boundary values: reflecting and replication padding can be though as first-order finite difference approximations of spatial derivatives, for 1-pixel padding. Zero and circular paddings are on the other hand not compatible with the enforced boundary values, performing worse in both cases.
This behavior highlights the possible benefits of employing explicitly boundary rules as long as the subsequent padding follows the same logic. It also calls to directly enforce such explicit rules in the input padding mechanism, which is left for future work.
4.0.2 Wave equation with periodic BCs
: The second case of study corresponds to Dataset 2, where all four wall boundaries are set as periodic walls in the training data. The objective is that the neural network reproduces the wave propagation in an infinitely-repeating domain.
The relative error evolution over time is depicted in Fig.4a, for the implicit and spatial context methods. The explicit method is not employed here as it is equivalent to the implicit one: physical solvers employ additional “ghost cells” to wrap values from one boundary to another [Mohan2020]. Also, reflect padding is not shown as it behaves very similarly to the replicate strategy. Results show that for both implicit and spatial context strategies, only circular padding is able to achieve acceptable error levels, with an average relative error of for the implicit case and at for the spatial context. The use of a padding strategy other than circular produces unphysical behavior at the boundaries, as the network is incapable of copying by itself the values arriving at one border to the opposite one.
4.0.3 Wave equation with absorbing BCs:
The third test case corresponds to the neural network trained on dataset 3, with non-reflecting (absorbing) boundary conditions. This case is significantly more challenging than the two previous cases: the initial Gaussian pulse is expected to propagate into the far field and completely leave the computational domain. Thus, the underlying data distribution is changing over time: the initial acoustic energy tends towards as . The challenge for the network is to correctly propagate the initial pulses outside the domain without spurious reflections at boundaries. As the signal energy tends towards zero, the error is now calculated relatively to the initial density, i.e., . Similarly to previous experiments, 25 different initial conditions are employed for the auto-regressive tests.
For the explicit method, a local one-dimensional (LODI) non-reflecting equation is used to impose the values a the boundaries, which reads [Poinsot1992]. First order finite differences are employed to discretize both the spatial and temporal derivatives.
The evolution of the error for auto-regressive iterations is shown in Fig. 4b. For both the implicit and spatial context cases, the results show a high variability between the different cases. While the implicit strategy with zero and reflect padding manage to produce low-error results, the other two padding strategies lead to diverging simulations. In contrast, when the spatial context is employed, the circular padding performs better, while the other three methods diverge. This unstable network behavior shows the complexity of the non-reflecting case in comparison to the previous ones. Instabilities can be directly related to the appearance of artifacts when the pulse impinges the walls, as can be seen in Fig. 6: before the pulse arrival at the wall (), all methods show stable and accurate predictions, while larger errors are shown after the first interaction with the BCs (. Such artifacts lead in some cases to the unbounded growth of the density amplitude, leading to instabilities.
Interestingly, the explicit encoding of boundaries has an important stabilizing effect: all four padding methods show a very similar error evolution. Figures 6 show the main differences between implicit, spatial context and explicit approaches for the same padding (replicate): the first two cases initially damp the outgoing waves more efficiently. However, some unphysical reflections (shown by the asymmetry of density the fields) lead to artifacts remaining in the computational domain, eventually leading to instabilities in the implicit case. In the explicit case, even though reflections also exist, they follow a symmetric pattern, which corresponds very closely to the one found for reflecting walls, as seen in Fig. 3. This suggests that the explicit encoding adds physical consistency to the network BC treatment. The error in that approach arises mainly from the lack of proper BC modeling, as the aforementioned LODI hypothesis cannot handle efficiently two-dimensional effects, typical of Gaussian pulses, for example at corners. Future research should reach out towards the treatment of such transverse effects [Poinsot1992]. This demonstrates that blending prior physics knowledge when handling boundary conditions can improve CNNs accuracy and robustness.
4.0.4 Heat equation with adiabatic boundary :
The last studied application corresponds to the diffusion of temperature pulses, modeled by the heat equation (7), when employing Neumann boundary conditions in the temperature (adiabatic walls). The same testing strategy is employed, by simulating 25 initial conditions for neural network recurrences. Two metrics are employed to analyze the network, the relative mean square error and the evolution of the average spatial temperature that tends to a constant value in time, as no heat flux exists at the adiabatic walls.
Results in Figs. 7 and 8 confirm some of the previously made observations for Neumann BCs (reflecting acoustic case): the implicit padding strategy seems appropriate to correctly reproduce the dynamics, when those remain simple, such as in the diffusion of temperature fields. However, the error evolution shows that the use of padding that mimics the physical boundary condition (e.g. replicate) results in a lower long-term error, as well as a closer approximation to the constant level. Furthermore, the use of additional spatial context reduces the variability of the different padding choices, thus confirming the interest of employing such an additional input to make the predictions more robust to boundary condition effects. Here the explicit strategy does not results in an improved accuracy with respect to the baseline methods. The use of such an additional a-priori encoding of boundary conditions may only be justified in the presence of complex conditions, such as the one presented previously for the non-reflecting acoustics case.
This paper presents an exhaustive comparison between several available methods to treat boundary conditions in fully convolutional neural networks for spatio-temporal regression, in the context of hyperbolic and parabolic PDEs. Such temporal regression tasks are highly sensitive to the well-posedness of boundary conditions, as small localized errors can propagate in time and space, producing instabilities in some cases. The characterization of such boundaries is crucial to improve the neural network accuracy.
The main outcomes are summarized next: employing padding alone yields accurate results only when the chosen padding is compatible with the underlying data. The addition of a spatial context channel seems to increase the robustness of the network in simple cases (Neumann boundaries), but fails for the more complex non-reflecting boundary case. Finally, the explicit encoding of boundaries, which enforces some physics constraints on border pixels, clearly demonstrates its superiority in such cases, allowing to design more robust neural networks. Such an approach should be further investigated in order to understand its coupling with the neural network behavior, and its extension to problems with several types of boundary conditions.