Collocation Polynomial Neural Forms and Domain Fragmentation for Initial Value Problems

by Toni Schneidereit, et al.

Several neural network approaches for solving differential equations employ trial solutions with a feedforward neural network. There are different means to incorporate the trial solution in the construction; for instance, one may include it directly in the cost function. Used within the corresponding neural network, the trial solutions define the so-called neural form. Such neural forms represent general, flexible tools by which one may solve various differential equations. In this article we consider time-dependent initial value problems, which require the neural form framework to be set up adequately. The neural forms presented up to now in the literature for such a setting can be considered as first order polynomials. In this work we propose to extend the polynomial order of the neural forms. The novel collocation-type construction includes several feedforward neural networks, one for each order. Additionally, we propose the fragmentation of the computational domain into subdomains. The neural forms are solved on each subdomain, whereas the interfacing grid points overlap in order to provide initial values over the whole fragmentation. We illustrate in experiments that the combination of collocation neural forms of higher order and the domain fragmentation allows one to solve initial value problems over large domains with high accuracy and reliability.




1 Introduction

Over the last decades several neural network approaches for solving differential equations have been developed [1, 2, 3]. The application and extension of these approaches is a topic of recent research, including work on different network architectures like Legendre [4] and polynomial neural networks [5] as well as computational studies [6, 7].

One of the early proposed methods [8] introduced a trial solution (TS) in order to define a cost function using one feedforward neural network. The TS is supposed to satisfy given initial or boundary values by construction. It is also referred to as neural form (NF) in this context [8, 9], a term which we will adopt from here on. Let us note that such NFs represent a general tool that enables one to solve ordinary differential equations (ODEs), partial differential equations (PDEs) and systems of ODEs/PDEs alike. We will refer here to this approach as the trial solution method (TSM). Later, the initial method from [8] has been extended by a NF with two feedforward neural networks, which allows one to deal with boundary value problems for irregular boundaries [10] and yields broader possibilities for constructing the TS [9]. In the latter context, let us also mention [11], where an algorithm is proposed in order to create a TS based on grammatical evolution. Focusing on initial value problems (IVPs), one approach learns solution bundles [12], making the trained neural forms reusable for enquired initial values.

A technique related to TSM that avoids the explicit construction of trial solutions has been proposed in [13]. The given initial or boundary values from the underlying differential equation are included in the cost function as additional terms, so that the NF can be set to equal the neural network output. We will refer to this approach as modified trial solution method (mTSM).

The fact that the neural network output computation resembles a linear combination of basis functions leads to a network architecture as presented in [14] (for PDEs). In that work one hidden layer incorporates two sets of activation functions, one of which is supposed to satisfy the PDE and the second dealing with boundary conditions. The basis function coefficients are set to be the connecting weights from the hidden layer to the output neuron, and the sum over all basis functions and coefficients makes up the NF.

Motivated by the construction principle of collocation methods in numerical analysis, we propose in this paper a novel extension of the NF approach. Our neural form extension is based on the observation, that the NF using one feedforward neural network as employed in [8] may be interpreted as a first order collocation polynomial. We propose to extend the corresponding polynomial order of the neural form. The novel construction includes several feedforward neural networks, one for each order. Compared to a collocation method from standard numerics, the networks take on the role of coefficients in the collocation polynomial expansion.

Furthermore, we aim to approximate initial value problems on fairly large domains. Therefore, and based on the NF structures, we also propose a fragmentation of the computational domain into subdomains. In each subdomain, we solve the initial value problem with a collocation neural form. This is done proceeding in time from one domain fragment to the adjacent subdomain. The interfacing grid points in any subdomain provide the initial value for the next subdomain. At first glance one may think of similarities to domain decomposition methods for PDEs in numerical analysis, cf. [15, 16]. We also show how to combine the domain fragmentation with the newly developed collocation polynomial neural forms.

2 Setting up the Neural Form (NF)

In this section, we first recall the TSM and its modified version mTSM, respectively, compare [8, 13]. Then we proceed with details on the feedforward neural networks we employ, followed by a description of the novel collocation-based neural form and the subdomain approach.

2.1 Construction of the Neural Form

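Consider an initial value problem written in a general form as

$$G(t, u(t), \dot{u}(t)) = 0, \quad t \in D \subset \mathbb{R}, \qquad (1)$$

with given initial value $u(t_0) = u_0$. In order to connect $G$ with a neural network, several approaches introduce a NF as a differentiable function $\tilde{u}(t, \vec{p})$, where the vector $\vec{p}$ contains the network weights. With the collocation method we discretise the domain $D$ by a uniform grid with $n+1$ grid points $t_i$ ($t_0 < t_1 < \ldots < t_n$), so that the initial value problem (1) leads to the formulation

$$G(t_i, \tilde{u}(t_i, \vec{p}), \dot{\tilde{u}}(t_i, \vec{p})) = \vec{0}. \qquad (2)$$

Let us note that, in a slight abuse of notation, we identify $G(t_i, \tilde{u}(t_i, \vec{p}), \dot{\tilde{u}}(t_i, \vec{p}))$ with the vector of corresponding entries, since this allows us to give many formulae a more elegant, compact notation.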

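In order to satisfy the given initial value, TSM [8] employs the NF as a sum of two terms

$$\tilde{u}(t, \vec{p}) = A(t) + F(t, N(t, \vec{p})), \qquad (3)$$

where $A(t)$ is supposed to match the initial condition (with the simplest choice being $A(t) = u(t_0)$), while $F(t, N(t, \vec{p}))$ is constructed to eliminate the impact of the neural network output $N(t, \vec{p})$ at $t_0$. The choice of $F$ determines the influence of $N(t, \vec{p})$ over the domain.

Since the NF as used in this work satisfies given initial values by construction, we define the corresponding cost function incorporating Eq. (3) as

$$E[\vec{p}] = \frac{1}{2(n+1)} \left\| G(t_i, \tilde{u}(t_i, \vec{p}), \dot{\tilde{u}}(t_i, \vec{p})) \right\|_2^2. \qquad (4)$$

Let us now turn to the mTSM approach after [13]. The mTSM approach chooses the NF to be equivalent to the neural network output directly:

$$\tilde{u}(t, \vec{p}) = N(t, \vec{p}). \qquad (5)$$

Since no condition is imposed by the initial value on the NF in this way, the conditions are added to the cost function when relying on Eq. (5):

$$E[\vec{p}] = \frac{1}{2(n+1)} \left\| G(t_i, \tilde{u}(t_i, \vec{p}), \dot{\tilde{u}}(t_i, \vec{p})) \right\|_2^2 + \frac{1}{2} \left( N(t_0, \vec{p}) - u(t_0) \right)^2. \qquad (6)$$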
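The difference between the two cost functions can be illustrated with a minimal Python sketch. It assumes the simple TSM choice of the initial value plus the network output multiplied by the distance to the initial point, and a normalisation by twice the number of grid points. The function names, the callables `N`, `dN` and `residual`, and the toy problem are illustrative assumptions, not taken from the method description.

```python
def tsm_cost(N, dN, grid, u0, residual):
    """TSM-type cost: the trial solution satisfies u(t0) = u0 by construction."""
    t0 = grid[0]
    total = 0.0
    for t in grid:
        u = u0 + N(t) * (t - t0)          # trial solution
        du = dN(t) * (t - t0) + N(t)      # its time derivative (product rule)
        total += residual(t, u, du) ** 2
    return total / (2 * len(grid))        # assumed normalisation


def mtsm_cost(N, dN, grid, u0, residual):
    """mTSM-type cost: the network is the solution; the initial value is a penalty."""
    t0 = grid[0]
    total = sum(residual(t, N(t), dN(t)) ** 2 for t in grid)
    return total / (2 * len(grid)) + 0.5 * (N(t0) - u0) ** 2


# Hypothetical toy problem: u'(t) = 1 with u(0) = 0, exact solution u(t) = t.
residual = lambda t, u, du: du - 1.0
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Stand-ins for trained networks (plain functions instead of neural networks).
exact_tsm = tsm_cost(lambda t: 1.0, lambda t: 0.0, grid, 0.0, residual)
exact_mtsm = mtsm_cost(lambda t: t, lambda t: 1.0, grid, 0.0, residual)
shifted_mtsm = mtsm_cost(lambda t: t + 1.0, lambda t: 1.0, grid, 0.0, residual)
print(exact_tsm, exact_mtsm, shifted_mtsm)
```

In the last call the stand-in satisfies the differential equation but misses the initial value by one, so only the penalty term contributes to the mTSM cost.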


2.2 Neural Network Architecture

In this section we will describe how a feedforward neural network with one hidden layer operates in our setting. Specific variants will be addressed in the corresponding sections.

We employ one hidden layer, with $H$ hidden layer neurons supplemented by one bias neuron. Having in addition one bias neuron in the input layer and a linear output layer neuron, the neural network output reads

$$N(t, \vec{p}) = \sum_{j=1}^{H} \rho_j\, \sigma(z_j) + \gamma. \qquad (7)$$

Thereby $\sigma_j = \sigma(z_j) = 1/(1 + e^{-z_j})$ represents the sigmoid activation function with the weighted sum $z_j = \nu_j t + \eta_j$. Here, $\nu_j$ (input layer neuron), $\eta_j$ (input layer bias neuron), $\rho_j$ (hidden layer neurons) and $\gamma$ (hidden layer bias neuron) denote the weights which are stored in the weight vector $\vec{p}$. The input layer passes the domain data $t$ (that is in practice $t_i$), weighted by $\nu_j$ and $\eta_j$, to the hidden layer for processing. The neural network output $N(t, \vec{p})$ is again a weighted sum of the values $\sigma(z_j)$. With $N(t, \vec{p})$ given, the neural forms and cost functions in Eqs. (4), (6) are obtained.

As usual, the cost function gradient is used to update $\vec{p}$ in order to find a (local) minimum in the weight space. One training cycle is called an epoch and consists of a full iteration over all training data points.
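As a minimal sketch of the architecture described above (one hidden layer of sigmoid neurons, bias neurons in the input and hidden layer, linear output), the network output and its derivative with respect to the input can be written in a few lines of Python. The argument names `nu`, `eta`, `rho`, `gamma` and the numerical weight values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(t, nu, eta, rho, gamma):
    """Output: sum_j rho_j * sigmoid(nu_j * t + eta_j) + gamma."""
    return sum(r * sigmoid(n * t + e) for n, e, r in zip(nu, eta, rho)) + gamma


def network_dt(t, nu, eta, rho):
    """Derivative of the output w.r.t. t, using sigmoid' = sigmoid * (1 - sigmoid)."""
    total = 0.0
    for n, e, r in zip(nu, eta, rho):
        s = sigmoid(n * t + e)
        total += r * n * s * (1.0 - s)
    return total


# Hypothetical weights for five hidden layer neurons (illustrative values only).
nu = [0.1, -0.2, 0.3, -0.4, 0.5]
eta = [0.0, 0.1, -0.1, 0.2, -0.2]
rho = [1.0, -1.0, 0.5, -0.5, 0.25]
gamma = 0.05

print(network(0.7, nu, eta, rho, gamma), network_dt(0.7, nu, eta, rho))
```

The derivative is needed because the cost functions contain the time derivative of the neural form; here it is written out analytically rather than obtained by automatic differentiation.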

If a weight update is performed after the gradient computation for a single grid point, we call this method single batch training (SBtraining) here. An alternative procedure, which performs the weight update after a complete iteration over all grid points and averages the cost function gradient, is denoted here as full batch training (FBtraining).
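The two update schedules can be contrasted in a short sketch. For brevity it uses plain gradient descent on a single scalar weight instead of ADAM on a full weight vector; the function names and the toy per-point gradient are assumptions made for illustration only.

```python
def train_epoch_single_batch(p, grid, grad_at_point, lr):
    """One epoch with a weight update after every single grid point."""
    for t in grid:
        p = p - lr * grad_at_point(p, t)
    return p


def train_epoch_full_batch(p, grid, grad_at_point, lr):
    """One epoch with one weight update using the averaged gradient."""
    g = sum(grad_at_point(p, t) for t in grid) / len(grid)
    return p - lr * g


# Toy per-point cost (p - t)^2 with gradient 2 * (p - t); purely illustrative.
toy_grad = lambda p, t: 2.0 * (p - t)
grid = [0.0, 1.0, 2.0]

p_sb = train_epoch_single_batch(0.0, grid, toy_grad, lr=0.5)
p_fb = train_epoch_full_batch(0.0, grid, toy_grad, lr=0.5)
print(p_sb, p_fb)
```

Starting from the same weight, the two schedules end the epoch at different values, because the single batch variant already uses the updated weight when it visits the next grid point.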

Let us comment in some detail on the relation between grid points and training data. Our setting is an unsupervised learning framework, where grid points are used for domain discretisation, and where the unknowns are values of the ODE solution at exactly these grid points. Thus, in our setting, the grid points are identical with the training data. Let us stress in this context, that the grid points by themselves stay fixed during network optimisation.

For optimising the cost function we consider here ADAM (adaptive moment estimation) [17], which is a stochastic gradient descent method using adaptive learning rates for every weight.

3 The Novel Collocation Neural Form (CNF)

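Making Eq. (3) precise for our grid-based setting, a suitable choice for the neural form of TSM is

$$\tilde{u}(t, \vec{p}) = u(t_0) + N(t, \vec{p})\,(t - t_0), \qquad (8)$$

where $t$ will be evaluated at the grid points $t_i$. Comparing this with a first order polynomial, one finds similarities in the structure. Motivated by the expansion of an $m$-th order collocation polynomial [18]

$$\tilde{u}(t) = u(t_0) + \sum_{k=1}^{m} c_k\,(t - t_0)^k, \qquad (9)$$

we are led to set up our collocation-based NF (CNF) approach for TSM:

$$\tilde{u}_C(t, P_m) = u(t_0) + \sum_{k=1}^{m} N_k(t, \vec{p}_k)\,(t - t_0)^k. \qquad (10)$$

The weight vector of the $k$-th neural network is denoted by $\vec{p}_k$, and we define the matrix of weight vectors $P_m = (\vec{p}_1, \ldots, \vec{p}_m)$.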

The use of higher order monomial powers $(t - t_0)^k$ as in (10) not only generalises previous methods, but may also enable better stability and accuracy properties, as we show in this paper. Let us also observe that the neural networks take on the roles of coefficient functions for the values of $(t - t_0)^k$. We conjecture at this point that this construction makes sense, since in this way several possible multipliers (not only $(t - t_0)$ as in (8)) are included in the neural form construction. It is important to mention that the new neural form construction (10) fulfils the initial condition, since every monomial factor vanishes at $t = t_0$.

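Let us stress that the proposed ansatz (10) includes $m$ neural networks, where $N_k(t, \vec{p}_k)$ represents the $k$-th neural network, with its weights indexed per network in analogy to Eq. (7):

$$N_k(t, \vec{p}_k) = \sum_{j=1}^{H} \rho_{j,k}\, \sigma(\nu_{j,k}\, t + \eta_{j,k}) + \gamma_k. \qquad (11)$$

The corresponding cost function is then given as in Eq. (4).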

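We extend the mTSM method in a similar way as we obtained the TSM extension in Eq. (10):

$$\tilde{u}_C(t, P_m) = N_1(t, \vec{p}_1) + \sum_{k=2}^{m} N_k(t, \vec{p}_k)\,(t - t_0)^{k-1}. \qquad (12)$$

Thereby the first neural network $N_1(t, \vec{p}_1)$ is set to learn the initial condition in the same way as stated in Eq. (6).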
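The two collocation neural forms can be sketched in Python as follows. The sketch assumes one network per monomial order for TSM, and for mTSM a first network without a monomial factor followed by networks multiplied by increasing powers. The networks are passed as plain callables, and all names as well as the constant stand-in networks are illustrative.

```python
def cnf_tsm(t, t0, u0, networks):
    """TSM-type CNF: u0 + sum_k N_k(t) * (t - t0)**k for k = 1..m."""
    return u0 + sum(N(t) * (t - t0) ** k for k, N in enumerate(networks, start=1))


def cnf_tsm_dt(t, t0, networks, d_networks):
    """Time derivative of the TSM-type CNF (product rule per term)."""
    total = 0.0
    for k, (N, dN) in enumerate(zip(networks, d_networks), start=1):
        total += dN(t) * (t - t0) ** k + k * N(t) * (t - t0) ** (k - 1)
    return total


def cnf_mtsm(t, t0, networks):
    """mTSM-type CNF: N_1(t) + sum_k N_k(t) * (t - t0)**(k - 1) for k = 2..m."""
    return sum(N(t) * (t - t0) ** (k - 1) for k, N in enumerate(networks, start=1))


# Constant stand-ins for two trained networks (CNFo = 2), illustrative only.
nets = [lambda t: 2.0, lambda t: 3.0]
d_nets = [lambda t: 0.0, lambda t: 0.0]

print(cnf_tsm(1.0, 1.0, 5.0, nets))   # at t0 the TSM form returns u0
print(cnf_tsm(2.0, 1.0, 5.0, nets))   # 5 + 2 * 1 + 3 * 1**2
print(cnf_mtsm(1.0, 1.0, nets))       # at t0 only the first network remains
```

The first and last lines of the usage example show the structural difference: the TSM form returns the initial value for any weights, whereas the mTSM form returns whatever the first network has learned at the initial point.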

From now on we will refer to the number of neural networks in the neural form as the collocation neural form order (CNFo).

4 The Novel Subdomain Collocation Neural Form (SCNF)

The previously described TSM and mTSM approaches use the IVP structure together with the given initial value in order to train the neural networks on a certain domain. In a prior experimental study [7] we found that especially TSM tends to struggle with approximating the solution on larger domains. However, on small domains the numerical error tends to remain small. Since the domain variable $(t - t_0)$ effectively acts as a scaling of $N(t, \vec{p})$, we conjecture that a large domain size variation may introduce the need for a higher amount of training data or the use of a more complex neural network architecture.

These circumstances motivate us to introduce a second stage of discretising the domain. That is, we split the solution domain $D$ into subdomains $D_l$, $l = 1, \ldots, h$, with $n+1$ grid points $t_{i,l}$ in each subdomain. Now the CNF is solved separately in each subdomain. The interfacing grid points overlap, i.e. the computed value $\tilde{u}_C(t_{n,l-1}, P_{m,l-1})$ at the last grid point of any subdomain $D_{l-1}$ is set to be the new initial value $\tilde{u}_C(t_{0,l}, P_{m,l})$ for the next subdomain $D_l$.

Since the CNF for TSM is constructed in order to satisfy the given initial values, we force the subdomain CNF (SCNF) to also hold that characteristic. Therefore the SCNF is constructed to satisfy the new initial values in each fragment, namely

$$\tilde{u}_C(t_{i,l}, P_{m,l}) = \tilde{u}_C(t_{0,l}, P_{m,l}) + \sum_{k=1}^{m} N_k(t_{i,l}, \vec{p}_{k,l})\,(t_{i,l} - t_{0,l})^k. \qquad (13)$$

The neural networks are now scaled by $(t_{i,l} - t_{0,l})^k$, which in fact may avoid higher scaling factors, depending on the subdomain size.

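Proceeding to mTSM, we also adopt the CNF approach and set the first neural network to learn the new initial values in each subdomain. That is, the SCNF reads

$$\tilde{u}_C(t_{i,l}, P_{m,l}) = N_1(t_{i,l}, \vec{p}_{1,l}) + \sum_{k=2}^{m} N_k(t_{i,l}, \vec{p}_{k,l})\,(t_{i,l} - t_{0,l})^{k-1}, \qquad (14)$$

and the corresponding cost function

$$E_l[P_{m,l}] = \frac{1}{2(n+1)} \left\| G(t_{i,l}, \tilde{u}_C(t_{i,l}, P_{m,l}), \dot{\tilde{u}}_C(t_{i,l}, P_{m,l})) \right\|_2^2 + \frac{1}{2} \left( N_1(t_{0,l}, \vec{p}_{1,l}) - \tilde{u}_C(t_{n,l-1}, P_{m,l-1}) \right)^2, \qquad (15)$$

where $u(t_0)$ takes the place of the previous subdomain value for $l = 1$.

In order to keep an overview of all terms and indices, we sum them up again: The $i$-th grid point in the $l$-th subdomain is denoted by $t_{i,l}$, while $t_{0,l}$ is the initial point in the subdomain $D_l$ with the initial value $\tilde{u}_C(t_{0,l}, P_{m,l})$. That is, $t_{n,l-1}$ and $t_{0,l}$ are overlapping grid points. In $D_1$, $\tilde{u}_C(t_{0,1}, P_{m,1}) = u(t_0)$ holds. The matrix $P_{m,l}$ contains the set of the neural network weights in the corresponding subdomain. Finally, $N_k(t_{i,l}, \vec{p}_{k,l})$ denotes the $k$-th neural network in $D_l$.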
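The fragmentation and the hand-over of initial values can be sketched as follows. The sketch only shows the bookkeeping: the training of a collocation neural form in each subdomain is replaced by a hypothetical stand-in solver, and all function names are assumptions made for illustration.

```python
def fragment_domain(a, b, h, n):
    """Split [a, b] into h subdomains with n + 1 uniform grid points each.

    Neighbouring subdomains share their interface point exactly.
    """
    edges = [a + l * (b - a) / h for l in range(h + 1)]
    edges[-1] = b
    grids = []
    for left, right in zip(edges[:-1], edges[1:]):
        step = (right - left) / n
        grids.append([left + i * step for i in range(n)] + [right])
    return grids


def march(grids, u0, solve_subdomain):
    """Solve subdomain by subdomain; the last value becomes the next initial value."""
    solutions, init = [], u0
    for grid in grids:
        values = solve_subdomain(grid, init)
        solutions.append(values)
        init = values[-1]
    return solutions


def toy_solver(grid, init):
    """Stand-in for training a neural form: exact solution of the toy IVP u' = 1."""
    return [init + (t - grid[0]) for t in grid]


grids = fragment_domain(0.0, 6.0, 3, 2)
solutions = march(grids, 1.0, toy_solver)
print(grids)
print(solutions)
```

In an actual computation `toy_solver` would be replaced by a routine that trains the subdomain neural form on the given grid and returns its values at the grid points.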

5 Experiments and Results

This section is divided into experiments on the collocation neural form (CNF), followed by experiments on the subdomain collocation neural form (SCNF). Prior to this, we will provide detailed information about how the weight initialisation for the different neural networks is realised. The discussion of constant weight initialisation is also one of the main subjects in the experimental section. As stated before, the specific neural network configurations will be addressed in the subsequent experiments.

Constant weight initialisation applies to all corresponding neural networks, so that they use the same initial values. For random weight initialisation, increasing the CNFo works systematically. For CNFo=1, a set of random weights for the neural network is generated. For CNFo=2 (now with two neural networks), the first neural network is again initialised with the generated weights from CNFo=1, while for neural network number two a new set of weights is generated. This carries over to all higher orders of CNFo in all experiments. To achieve comparability, the same randomly initialised weights are used in all experiments.
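One way to realise such a nested initialisation is a seeded generator that draws the weights network by network. In the following sketch the function name, the seed, the default range and the count of 16 weights per network (three weights per hidden neuron for five neurons, plus one hidden layer bias weight) are assumptions made for illustration.

```python
import random


def init_weights(cnfo, n_weights=16, mode="random", const=0.0,
                 low=-0.5, high=0.5, seed=0):
    """Return one weight list per neural network for a given CNFo.

    With a fixed seed, the first networks of a higher CNFo start from the
    same random weights as in the lower CNFo.
    """
    if mode == "constant":
        return [[const] * n_weights for _ in range(cnfo)]
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_weights)]
            for _ in range(cnfo)]


w1 = init_weights(1)
w2 = init_weights(2)
w_const = init_weights(3, mode="constant", const=-10.0)
print(w2[0] == w1[0])
```

Because the generator is re-seeded on every call and the networks are filled in order, the first network for CNFo=2 starts from exactly the weights generated for CNFo=1.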

For optimisation we use ADAM, whose parameters are fixed, as employed in [17], to $\alpha$=1e-3, $\beta_1$=9e-1, $\beta_2$=9.99e-1 and $\epsilon$=1e-8.
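For reference, a single ADAM update with these four values can be sketched as follows. The sketch follows the standard formulation of the method; assigning the four listed values to step size, the two decay rates and the stabilising constant, in that order, is an assumption, as are the function and argument names.

```python
import math


def adam_step(p, grad, m, v, step,
              alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a list of weights; step counts from 1."""
    new_p, new_m, new_v = [], [], []
    for p_i, g_i, m_i, v_i in zip(p, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g_i          # first moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g_i * g_i    # second moment estimate
        m_hat = m_i / (1.0 - beta1 ** step)              # bias corrections
        v_hat = v_i / (1.0 - beta2 ** step)
        new_p.append(p_i - alpha * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


p, m, v = adam_step([1.0], [2.0], [0.0], [0.0], step=1)
print(p, m, v)
```

In the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly the step size regardless of the gradient magnitude.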

5.1 Experiments on the Collocation Neural Form (CNF)

In this section, we want to test our novel CNF approach with the initial value problem


which has the analytical solution and is solved over the entire domain (without domain fragmentation). The Eq. (17) involves a damping mechanism, making this a simple model for stiff phenomena [19].

The numerical error shown in subsequent diagrams in this section is defined as the -norm of the difference between the exact solution and the corresponding CNF


If we do not say otherwise, the fixed computational parameters in the subsequent experiments are: 1 input layer bias, 1 hidden layer with 5 sigmoid neurons, 1e5 training epochs, 10 training data points, and weight initialisation values of -10 for constant initialisation and from the range [-10.5, -9.5] for random initialisation.

Weight Initialisation

Let us comment in some more detail on weight initialisation. The weight initialisation plays an important role and determines the starting point for gradient descent. If it is poorly chosen, the optimisation method may fail to find a suitable local minimum. The initial neural network weights are commonly chosen as small random values [20]. Let us note that this is sometimes considered as a computational characteristic of the stochastic gradient descent optimisation. Another option is to choose the initialisation to be constant. This method is not commonly used for the optimisation of neural networks, since random weight initialisation may lead to better results. However, constant initialisation reliably returns results of reasonable quality if the computational parameters in the network remain unchanged.

No.   random weights   constant weights
1     5.7148e-6        2.6653e-6
2     7.5397e-6        2.6653e-6
3     3.7249e-5        2.6653e-6
4     1.1894e-5        2.6653e-6
5     7.7956e-6        2.6653e-6

Table 1: Results for five different realisations during optimisation (mTSM, CNFo=2)

As previous experiments have documented [7, 8, 13], both TSM and mTSM are able to solve differential equations up to a certain degree of accuracy. However, an example illustrating the accuracy of five computations with random and with constant weights shows that the quality of approximations may vary considerably, see Table 1. As observed in many experiments, even a small discrepancy in the initialisation with several sets of random weights in the same range may lead to a significant difference in accuracy. On the other hand, the network initialisation with constant values very often gives reliable results with the proposed novel approach. This motivates us to study in detail the effects of constant network initialisations.

5.1.1 CNF Experiment: Number of Training Epochs

Figure 1: Experiment in 5.1.1 Number of Training Epochs, (blue) CNFo=1, (orange) CNFo=2, (yellow) CNFo=3, (purple) CNFo=4, (green) CNFo=5

The first experiment shows for different CNFo how the numerical error behaves depending on the number of training epochs. The diagrams only display every hundredth data point.

In Fig. 1(a) with TSM and results for CNFo=1 (blue) do not provide any useful approximation, independent of the batch training method selected. With a second neural network for CNFo=2 (orange) in the neural form, approximately lowers by one order of magnitude so that we now obtain a solution which can be considered to rank at the lower end of reliability. However, the most interesting result in Fig. 1(a) is CNFo=5 (green) with the best accuracy at the end of the optimisation process but with the drawback of occurring oscillations. These may arise by the chosen optimisation method.

For mTSM with SBtraining and , already CNFo=1 converges to a solution accuracy that can be considered reliable. However, we observe within Fig. 1(b) that only the transition from CNFo=1 (blue) to CNFo=2 (orange) affects with increasing accuracy, while heavy oscillations start to occur.

In not documented results with , CNFo has only minor influence on the accuracy. Especially FBtraining for mTSM shows the same trend for both initialisation methods with only minor differences in the last epochs.

Let us note that the displayed results show the best approximations using constant or random initialisation. This means that we obtain the best results for TSM with FBtraining and CNFo=5 (green), and for mTSM with SBtraining and CNFo≥2, respectively.

Concluding this experiment, we were able to get better results with constant over random weight initialisation. Increasing the CNFo to at least order five seems to be a good option for TSM and FBtraining, whereas a further increase of CNFo may provide even better approximations. For mTSM we cannot observe benefits for CNFo above order 2.

Moreover, we see especially that the increase in the order of the neural form in (10) appears to have a similar impact on solution accuracy as the discretisation order in classical numerical analysis.

5.1.2 CNF Experiment: Domain Size Variation

Figure 2: Experiment in 5.1.2 Domain Size Variation, (blue) CNFo=1, (orange) CNFo=2, (yellow) CNFo=3, (purple) CNFo=4, (green) CNFo=5

Investigating the methods concerning different domain sizes provides information on the reliability of computations on larger domains. The domains in this experiment read as and we directly compare in this experiment with .

In Fig. 2(a), 2(b), we observe TSM from around =3.5 to incrementally plateau to unreliable approximations. Increasing CNFo improves on small domains and shifts the observable step-like accuracy degeneration towards larger domains.

However, even with CNFo=5 (green) the results starting from domain size =3.5 towards larger sizes are unreliable. Previous to the first plateau higher CNFo provide significant better for , while there are only minor changes for for the TSM method. This holds for both SBtraining and FBtraining, and one can say that in this experiment TSM works better with , even without increasing CNFo.

Turning to the mTSM extension, we observe in Fig. 2(c) with SBtraining the existence of a certain point from where different CNFo return equal values, whereas FBtraining returns (close to) equal results for all the investigated domain sizes. However, we see some evidence for the use of CNFo=2 (orange) over CNFo=1 (blue) to show an overall good performance. A further increase of CNFo is not necessary with this approach, confirming results from Experiment 5.1.1.

Let us also note that, with mTSM we find that a small domain seems to favour which then provides better results than .

5.1.3 CNF Experiment: Number of Training Data Variation

Figure 3: Experiment in 5.1.3 Number of Training Data Variation, (blue) CNFo=1, (orange) CNFo=2, (yellow) CNFo=3, (purple) CNFo=4, (green) CNFo=5

The behaviour of numerical methods highly depends on the chosen number of grid points, so that in this experiment we analogously investigate the influence of the amount of training data. In every computation, the domain is discretised by equidistant grid points.

As in the previous experiments, the CNFo shows a major influence on the results with TSM, and the best approximations are provided by with CNFo=5 (green) as seen in Fig. 3(a). An interesting behaviour (observed also in a different context in Fig. 1(a)) is the equivalence between CNFo=3 (yellow) and CNFo=4 (purple). Both converge to almost exactly the same , where one may assume a saturation for the CNFo. However, another increase in the order decreases the numerical error again by one order of accuracy.

Turning to mTSM in Fig. 3(b), we again find a major increase in accuracy after the transition from CNFo=1 (blue) to CNFo=2. For ntD=50, the values for CNFo≥2 converge to the same results as provided by TSM with CNFo=5.

Concluding this experiment, we again find evidence that increasing CNFo in the proposed approach provides an improved accuracy for . However, increasing ntD seems not to improve the accuracy from a certain point on, unlike for numerical methods. But one could argue, that the analogy between the number of grid points for numerical methods here is the number of epochs.

5.2 Experiments on the Subdomain Collocation Neural Form (SCNF)

In Section 5.1, while the test equation is stiff, its solution is at the same time very smooth and the equation is solved on a small domain. However, Fig. 2 in Experiment 5.1.2, shows that TSM does not provide reliable solutions on larger domains. Hence, we want to show that the novel SCNF approach is able to work even on a fairly large domain with a different initial value problem. Therefore we use the following test equation


with the analytical solution


The solution, shown in Fig. 4, incorporates heavily oscillating and increasing characteristics, similar to instabilities.

Figure 4: Analytical solution for initial value problem (19)

The numerical error is now defined as the -norm of the difference between the exact solution and the corresponding SCNF in each subdomain


whereas averages the numerical error of the subdomains


The weight initialisation works as employed in Section 5, and the values are fixed to 0 for constant initialisation and to the range [-0.5, 0.5] for random initialisation. In the subsequent experiments, the solution domain is kept constant and the neural networks are trained for 1e5 epochs.

In addition we use the method of training the neural networks incrementally which has been employed in [13]. That is, we initially train the neural networks for the first grid point, afterwards for the first two grid points. We continue the procedure up to a FBtraining of all grid points in each subdomain. The initial weight initialisation is the same in each subdomain.
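The incremental procedure amounts to a loop over growing prefixes of the grid. The following sketch assumes a hypothetical training routine `train_on_points` that takes the current grid points and weights and returns updated weights; in the usage example it is replaced by a stand-in that only records how many grid points it was given.

```python
def incremental_training(grid, train_on_points, weights):
    """Train on the first grid point, then the first two, up to the full grid."""
    for k in range(1, len(grid) + 1):
        weights = train_on_points(grid[:k], weights)
    return weights


# Stand-in trainer: records the number of points seen and counts the stages.
seen = []

def counting_trainer(points, weights):
    seen.append(len(points))
    return weights + 1


stages = incremental_training([0.0, 0.1, 0.2, 0.3], counting_trainer, 0)
print(seen, stages)
```

The weights are carried over from one stage to the next, so each stage starts from the result of training on the shorter prefix.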

A Scaling Experiment

The original TSM neural form (Eq. (3)) is theoretically capable of approximating every continuous function, according to the universal approximation theorem [21]. However, Table 2 shows results for a TSM neural form with a single neural network. For different domains we scaled the number of hidden layer neurons linearly and averaged ten computations for each domain with the same computational parameters.

domain D Neurons ntD
8.4228e-4 5 10
9.2191e-4 10 20
1.9448e-3 15 30
1.6751e-2 20 40
Table 2: TSM neural form,

The results in Table 2 provide the following message. Increasing the domain size forces the neural network to incorporate more hidden layer neurons and grid points. Indeed, to reach e.g. (averaged) =9.5873e-4 for , learning the neural network required 50 hidden layer neurons and 75 grid points. In general, determining a suitable architecture in terms of the number of hidden layer neurons and training data is a challenging task.

In subsequent experiments we find the SCNF to be able to solve the initial value problem with neural networks including a small fixed amount of hidden layer neurons and training data in each subdomain. At the same time, this allows to define various important parameters in a simple and straightforward way.

5.2.1 SCNF Experiment: CNF versus SCNF

Figure 5: Experiment in 5.2.1 CNF versus SCNF, (orange) analytical solution, (red) CNF solution, (black/dotted) SCNF solution

In the first experiment we compare results of a SCNF with a CNF that is solved over the entire domain. For comparability the total number of training data is constant, namely ntD=1000 for the red line and ntD=10 with 100 subdomains for the black/dotted line. However, the comparison of two CNFs with the same architecture would not be meaningful because the domain size has a significant influence. Therefore we decided to realise the CNF (red) with a neural network incorporating 1 input layer bias and 100 sigmoid neurons with 1 bias. The SCNF (black/dotted) features neural networks with 1 input layer bias and 5 sigmoid neurons with 1 bias per subdomain. Both CNF and SCNF incorporate CNFo=3. In addition we did not increase the domain size incrementally for this experiment, to reduce the number of parameters that prevent comparability.

The CNF solution (red) provides no useful approximation in any of the experiments in Fig. 5. In total, the number of hidden layer neurons and training data that would be needed to obtain a useful approximation seems to be much higher. Nonetheless, the SCNF approach (black/dotted), working with the same number of training data, was able to solve the initial value problem in a satisfactory way. From a qualitative perspective, both TSM and mTSM provide similar results for constant as well as random weight initialisation.

Concluding this experiment, we see that the SCNF method provides a useful solution to the initial value problem. In addition, the incorporated small number of hidden layer neurons enables a much more effective training of the neural networks.

5.2.2 SCNF Experiment: CNFo Variation

Figure 6: Experiment 5.2.2 CNFo Variation, , (orange) analytical solution, (black/dotted) SCNF solution
1 2.7834 4.8758
2 0.5763 0.7478
3 0.0846 0.0848
Table 3: -norm for mTSM interface grid points,

The ability to approximate the initial value problem with SCNF, depending on different CNFo, is subject to this experiment. Here the SCNFs include 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution domain is split into 60 subdomains with 10 grid points in each subdomain. Here, we employ incremental learning in the subdomains.

Results for TSM with CNFo=1 in Fig. 6(a) and mTSM with CNFo=1 in Fig. 6(b) indicate that the original TSM and mTSM methods are not useful over larger domains, even when employing domain fragmentation. However, the SCNF of first order is able to get back on the solution trend, although several subdomains do not provide correct approximations. In total, both solutions for CNFo=1 (especially mTSM) cannot be considered to be reliable.

That changes for CNFo=2, at least for TSM in Fig. 6(c). Here we find, with the exception of some local extreme points, the SCNF to be a reasonable approximation of the initial value problem. This statement however, does not hold for mTSM. Although the general trend now is much closer to the analytical solution, there are still subdomains which do not approximate the solution well.

Results shown in Table 3 represent the -norm of differences between analytical and computed solution, for mTSM as displayed Fig. 6, measured at the last grid points in , namely , and the corresponding initial points in , . We propose to consider this measure, since it indicates how well the solution can be met over the subdomains. We find that increasing CNFo has a major influence on the accuracy.

We conjecture that learning the subdomain initial values becomes easier for mTSM the more neural networks are incorporated. That is mainly because the first neural network can, so to speak, focus on learning the initial values, while the other networks are more engaged with the IVP structure. We think that this conjecture can be confirmed by the decreasing discrepancy between the overlapping values at the interfaces for higher orders of CNFo.

The overall best solutions here are provided by CNFo=3 (Fig. 6(e),6(f)) for both TSM and mTSM in this experiment.

We tend to favor TSM over mTSM, since the initial value in each subdomain is satisfied by the corresponding SCNF (where the learned value at is set to be the initial value for ) and does not have to be learned again.

5.2.3 SCNF Experiment: Number of Subdomain Variation

Figure 7: Experiment in 5.2.3 Number of Subdomain Variation, (blue) , (yellow)

In this experiment we investigate the influence of the total number of subdomains on the numeric error . Fig. 7 shows the behaviour for (blue) and (yellow). The SCNF incorporate CNFo=3, 1 input layer bias, 5 sigmoid neurons with 1 bias and 10 grid points in each subdomain. We again employ incremental learning in the subdomains.

Let us first comment on the SCNF for TSM in Fig. 7(a). Despite minor differences between the solutions corresponding to and for smaller numbers of subdomains, both initialisation methods show a very similar trend. A saturation regime seems to appear for around 350 subdomains with 1e-5.

Turning to mTSM in Fig. 7(b), we again observe a similar behaviour for both weight initialisation methods, although the differences only disappear for larger numbers of subdomains. We find that even at 400 subdomains the numerical error cannot compete with TSM here.

Let us note again that the chosen approach for random weight initialisation (see Section 5) means that the random weights are initialised in the same way in each subdomain. In undocumented tests we observed that the results may show slight to significant variations when the random weights are generated independently for each network over the subdomains. However, the results we have shown here using random initialisation represent a rather typical trend observed in the results.

In conclusion, one can obtain very good approximations with the TSM SCNF approach for both weight initialisation methods. That means that choosing constant over random initialisation has no downsides, which leads us to again support the use of constant weight initialisation.

5.2.4 SCNF Experiment: Numerical Error in the Subdomains

Figure 8: Experiment in 5.2.4 Numerical Error in the Subdomains, (blue) CNFo=1, (orange) CNFo=3, (yellow) CNFo=5

The last experiment investigates the numerical error in each subdomain for different CNFo. Again, the SCNFs feature 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution is computed with 100 subdomains of 10 grid points each, with incremental learning in the subdomains.

Throughout Fig. 8, CNFo=1 gives the poorest results. However, comparing mTSM in Fig. 8(b) with the results for 60 subdomains in Fig. 6(b), increasing the domain fragmentation by 40 subdomains seems to prevent the solution from diverging. Random weight initialisation works better for CNFo=1, especially with TSM.

Solutions provided by CNFo=3 and CNFo=5 are much better than for CNFo=1, and increasing the order clearly tends to increase the accuracy. For TSM with both CNFo=3 and CNFo=5, as well as for mTSM with CNFo=5 from a certain subdomain on, the numerical error saturates. Let us note that the differences in the overall numerical error between the two weight initialisation methods are not significant in these cases.

In this experiment, we again tend to favour TSM with constant weight initialisation. Although CNFo=1 does not work well, the higher orders shown provide good approximations with saturation regimes. The results confirm our preference for constant initialisation, because it does not depend on a good generation of random weights by chance.

6 Conclusion and Future Work

The proposed CNF and SCNF approaches, which merge collocation polynomial basis functions with neural networks and domain fragmentation, show clear benefits over previous neural form constructions. We have studied in detail the constant weight initialisation for our novel CNF approach with a basic stiff initial value problem. Depending on the batch learning method, the collocation-based extension seems to have some benefits for both TSM and mTSM. For the TSM CNF, this effect is more significant than for the mTSM extension.

Focusing on mTSM and the CNF approach, using two neural networks, one for learning the initial value and one multiplied by , seems to have some advantages over other possible mTSM settings. Considering approximation quality as most imperative, we find mTSM with CNFo=2 to provide the overall best results for the investigated initial value problem.

We find that the proposed SCNF approach combines many advantages of the new developments. Employing higher order CNF methods, it is possible to solve initial value problems over large domains with very high accuracy, and at the same time with reasonable optimisation effort. Moreover, many computational parameters can be fixed easily for this setting, which is a significant issue with other TSM and mTSM variations.

As another important conclusion, our experiments show that constant weight initialisation can be favoured over random weight initialisation.

Future research may include work on other possible collocation functions and on combining the networks with other discretisation methods.


This publication was funded by the Graduate Research School (GRS) of the Brandenburg University of Technology Cottbus-Senftenberg. This work is part of the Research Cluster Cognitive Dependable Cyber Physical Systems.


  • [1] Yadav, N., Yadav, A., Kumar, M.: An Introduction to Neural Network Methods for Differential Equations. SpringerBriefs in Applied Sciences and Technology, Netherlands (2015). doi:10.1007/978-94-017-9816-7
  • [2] Maede Jr, A.J., Fernandez, A.A.: The numerical solution of linear ordinary differential equations by feedforward neural networks. Mathematical and Computer Modelling 19(12), 1–25 (1994). doi:10.1016/0895-7177(94)90095-7
  • [3] Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Communications in Numerical Methods in Engineering 10(3), 195–201 (1994). doi:10.1002/cnm.1640100303
  • [4] Mall, S., Chakraverty, S.: Application of Legendre Neural Network for solving ordinary differential equations. Applied Soft Computing 43, 347–356 (2016). doi:10.1016/j.asoc.2015.10.069
  • [5] Zjavka, L., Pedrycz, W.: Constructing general partial differential equations using polynomial and neural networks. Neural Networks 73, 58–69 (2016). doi:10.1016/j.neunet.2015.10.001
  • [6] Famelis, I.T., Kaloutsa, V.: Parameterized neural network training for the solution of a class of stiff initial value systems. Neural Computing and Applications, 1–8 (2020). doi:10.1007/s00521-020-05201-1
  • [7]

    Schneidereit, T., Breuß, M.: Solving Ordinary Differential Equations using Artificial Neural Networks - A study on the solution variance. Proceedings of the Conference Algoritmy, 21–30 (2020).

  • [8] Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9(5), 987–1000 (1998). doi:10.1109/72.712178
  • [9]

    Lagari, P.L., Tsoukalas, L.H., Safarkhani, S., Lagaris, I.E.: Systematic Construction of Neural Forms for Solving Partial Differential Equations Inside Rectangular Domains, Subject to Initial, Boundary and Interface Conditions. International Journal on Artificial Intelligence Tools

    29(5), 2050009 (2020). doi:10.1142/S0218213020500098
  • [10] Lagaris, I.E., Likas, A., Papageorgiou, D.G.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5), 1041–1049 (2000). doi:10.1109/72.870037
  • [11] Tsoulos, I.G., Gavrilis, D., Glavas, E.: Solving differential equations with constructed neural networks. Neurocomputing 72(10), 2385–2391 (2009). doi:10.1016/j.neucom.2008.12.004
  • [12] Flamant, C., Protopapas, P., Sondak, D.: Solving Differential Equations Using Neural Network Solution Bundles. arXiv:2006.14372, (2020).
  • [13]

    Piscopo, M.L., Spannowsky, M., Waite, P.: Solving differential equations with neural networks: Applications to the calculation of cosmological phase transitions. Physical Review D

    100(1), 016002 (2019). doi: 10.1103/PhysRevD.100.016002
  • [14] Rudd, K., Ferrari, S.: A constrained integration (CINT) approach to solving partial differential equations using artificial neural networks. Neurocomputing 155, 277–285 (2015). doi:10.1016/j.neucom.2014.11.058
  • [15] Golovashkin, D.L., Kazanskiy, N.L.: Mesh Domain Decomposition in the Finite-Difference Solution of Maxwell’s Equations. Optical Memory and Neural Networks 18(3), 203–211 (2009). doi:10.3103/S1060992X09030102
  • [16] Jagtap, A.D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering 365, 113028 (2020). doi:10.1016/j.cma.2020.113028
  • [17] Kingma, D.P., Ba, J.: ADAM: A Method for Stochastic Optimization. arXiv preprint:1412.6980, (2017).
  • [18] Antia, H.M.: Numerical methods for scientists and engineers. 1st edn. Hindustan Book Agency, New Delhi (2012).
  • [19] Dahlquist, G.G.: G-stability is equivalent to A-stability. BIT Numerical Mathematics 18(4), 384–401 (1978). doi:10.1007/BF01932018
  • [20] Fernández-Redondo, M., Hernández-Espinosa, C.: Weight initialization methods for multilayer feedforward. ESANN, 119–124 (2001).
  • [21]

    Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems

    2(4), 303–314 (1989). doi:10.1007/BF02551274