1 Introduction
Differential equations (DEs) are important models in many areas of science and engineering, as they often represent real-world phenomena [1]. A special class of DEs are initial value problems, which describe the time evolution of a system. The variety of neural network approaches for solving DEs has grown over recent years and decades [2, 3, 4]. These approaches mostly focus on deriving a cost function from the DE structure and the given initial or boundary conditions. The cost function in this context connects the DE with the neural network framework [5, 6]. This may be achieved with so-called neural forms (NFs), which are trial solutions satisfying the given conditions [7, 8, 9]. The neural forms approach results in an unsupervised learning framework. In the end, the neural form represents the solution of the DE.
Other neural approaches combine unsupervised and supervised training, where the neural network output is compared to true (known) data at certain domain points [10, 11, 12]. Typically, the unsupervised part arises from the DE structure, while given initial or boundary conditions are added directly to the cost function and treated in a supervised way [13]. The resulting difference, the error, is then used for learning the neural network parameters. In such approaches, the neural network itself represents the solution of the DE after training.
Turning to classical numerical methods for solving all kinds of differential equations, one may consider, e.g., Runge–Kutta methods [14, 15] for time integration or the finite element method (FEM) [16]. In order to obtain high accuracy and robustness, many numerical schemes feature adaptive mechanisms regarding, e.g., step size control [1, 14] or mesh refinement [17, 18, 19]. That is, certain areas of the solution domain may require more elements or grid points, in other words a refined mesh, for improved reliability and accuracy. Such adaptive mesh refinement techniques enable the mesh to be locally refined based on a suitable error estimate.
Several works offer neural network based strategies to generate optimal meshes or mesh refinements for use with the finite element method [20, 21]. However, these approaches do not combine the neural solution of DEs with mesh adaptivity; they also stick to a traditional mesh and do not explore the approximation capability of neural networks. Predicting areas of interest, in the sense of a required mesh refinement, using neural networks is the objective of [22]. There, a time-series analysis is employed to predict element-wise solution gradient values. The underlying neural network yields an indicator based on local gradient values in space and time, which is then used to predict whether a mesh refinement or a coarsening may be suitable. While in this method the mesh refinement indicator is realised by a neural network, the FEM is used for solving the PDE examples. Complementary to the latter approach, in [23] a learning strategy is developed which keeps the mesh fixed but selects the numerical scheme that gives locally high accuracy based on local gradients.
The most relevant related article in the context of our work may be the adaptive neural approach in [24], so let us discuss this work in more detail. It features a feedforward neural network in a framework combining both supervised and unsupervised terms, similar to [10, 12]. The training process includes several evolution steps, each consisting of the optimisation over the training points combined with an evaluation of the results at a finer grid. The latter is realised with the same set of neural network parameters obtained from the training step. It is proposed to start with a coarse grid and to perform local grid refinement whenever the resulting network errors differ. The method is developed for boundary value problems arising with stationary PDEs, such as the Poisson equation. Results indicate that more complex neural network architectures (w.r.t. the number of layers and neurons) or more training points may increase the accuracy.
Let us stress that in the discussed work [24], the mesh is refined but treated in a global fashion and not decomposed into subdomains. However, the combination of domain decomposition with neural networks [12] is also related to our method. There, so-called physics-informed neural networks may be defined on predefined, discrete subdomains which may feature different network sizes. At the subdomain interfaces, a physical continuity condition holds and the average solution of the two corresponding neural networks at each interface is enforced. Several neural networks learn the governing DE locally instead of using a single one for the entire domain, which results overall in a small network error.
Our contributions. We propose the well-engineered adaptive neural domain refinement (ANDRe), which builds upon collocation (polynomial) neural forms. These are optimised repeatedly over the domain. The domain itself is allowed to split into subdomains which may locally decrease in size whenever the network error is not sufficiently small. We thereby combine the advantageous characteristics of domain decomposition and adaptive mesh refinement, and we embed into this approach the collocation neural forms framework recently developed in [5]. Furthermore, we embed into the described process a means to adapt the number of neurons used for optimisation in a subdomain. This may be considered equivalent to an increase in reliability and accuracy of the approximation. Thus we also combine adaptive refinement of the domain with adaptivity in the neural sense.
The following section introduces details on the collocation neural forms, the architecture and optimisation of the incorporated neural networks, together with the domain fragmentation approach. Based on the latter, we proceed to propose the adaptive domain refinement algorithm. Afterwards, this approach is applied to three example initial value problems. The results are discussed in detail, and we finish the paper with a conclusion and an outlook to future work.
2 The Methods
The overall aim is to solve initial value problems (IVPs) of the form
(1) $\dot{u}(t) = G\big(t, u(t)\big), \quad t \in D = [t_0, T],$
with given initial value $u(t_0) = u_0$. We identify $\dot{u}(t)$ as the time derivative of $u(t)$. Let us note at this point that Eq. (1) may also denote a system of IVPs. In the following we will first focus on IVPs with only one equation and later provide the necessary information in order to extend the approach to systems of IVPs.
2.1 The Subdomain Collocation Neural Form (SCNF)
The neural forms approach [7] seeks to replace the solution function $u(t)$ with a differentiable trial solution
(2) $\tilde{u}(t, p) = A(t) + F(t)\, N(t, p),$
which connects the given IVP with a neural network term $N(t, p)$, incorporating the weight vector $p$. In Eq. (2), both $A(t)$ and $F(t)$ are problem specific and have to be constructed under the requirement of fulfilling the initial condition. Besides replacing the solution function, its time derivative is expressed as well, by differentiating the constructed neural form. One of the possible neural form constructions, to which we will refer as the classic neural form and which has been proposed in [7], is to set $A(t) = u_0$ and $F(t) = t - t_0$. This configuration removes the impact of $N(t, p)$ at the initial point $t_0$, whereas $\tilde{u}(t_0, p)$ then equals the initial value $u_0$ in Eq. (1).
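To make the construction concrete, here is a minimal sketch of such a classic neural form with a tiny one-hidden-layer network. The weight values and network size are arbitrary illustrations, not the configuration used in this work:

```python
import math

def network(t, weights):
    """One-hidden-layer feedforward network with sigmoid activation."""
    nu, w, b, d = weights  # output weights, input weights, hidden biases, output bias
    hidden = (1.0 / (1.0 + math.exp(-(w_j * t + b_j))) for w_j, b_j in zip(w, b))
    return sum(nu_j * h_j for nu_j, h_j in zip(nu, hidden)) + d

def classic_neural_form(t, t0, u0, weights):
    # F(t) = t - t0 vanishes at t0, so the trial solution satisfies
    # the initial condition u(t0) = u0 regardless of the network weights.
    return u0 + (t - t0) * network(t, weights)
```

By construction, the trial solution returns exactly the initial value at the initial point, no matter how the network is trained.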
The subdomain collocation neural form (SCNF) approach now extends the classic NF (Eq. (2)) in two directions: (i) by increasing the polynomial degree of the term $F(t)\,N(t, p)$ as explained below, and (ii) by introducing domain fragmentation, a novel approach that solves the initial value problem on subdomains in order to increase the numerical accuracy.
Since $A(t) = u_0$ is a constant, we find the resulting NF in Eq. (2) to resemble a first order polynomial in $t$. Hence, we propose to extend the polynomial order of the classic neural form, inspired by collocation polynomials [1]. Therefore we transform Eq. (2) into
(3) $\tilde{u}_C(t, P) = u_0 + \sum_{k=1}^{m} N_k(t, p_k)\,(t - t_0)^k.$
This polynomial extension adds more flexibility to the approach. However, this is achieved in a different way than just increasing the number of hidden layer neurons in a single neural network, since additional networks arise that are connected to the factors $(t - t_0)^k$, see [5]. Thereby, the weight matrix $P$ stores the weight vectors $p_k$ as column vectors.
We thus discretise the domain by the collocation method, employing a uniform grid with $n$ grid points $t_i$ ($i = 1, \dots, n$), so that our novel collocation neural forms approach for the IVP (1) leads to the cost function formulation
(4) $E[P] = \dfrac{1}{n} \sum_{i=1}^{n} \Big[ \dfrac{\partial}{\partial t}\tilde{u}_C(t_i, P) - G\big(t_i, \tilde{u}_C(t_i, P)\big) \Big]^2.$
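A compact sketch of the collocation neural form and its residual cost may look as follows. The networks are stand-in callables, and the time derivative is approximated here by a central difference purely for illustration; the actual approach differentiates the neural form analytically:

```python
def scnf(t, t0, u0, nets):
    """Collocation neural form: u0 plus networks scaled by (t - t0)^k."""
    return u0 + sum(N(t) * (t - t0) ** k for k, N in enumerate(nets, start=1))

def cost(G, t0, u0, nets, grid, h=1e-6):
    """Mean squared residual of du/dt - G(t, u) over the collocation grid."""
    total = 0.0
    for t in grid:
        # central-difference stand-in for the analytic time derivative
        du = (scnf(t + h, t0, u0, nets) - scnf(t - h, t0, u0, nets)) / (2.0 * h)
        r = du - G(t, scnf(t, t0, u0, nets))
        total += r * r
    return total / len(grid)
```

As a sanity check, for the toy problem $\dot{u} = 1$ a "network" returning the constant 1 in the first-order term makes the residual vanish, since the neural form then equals the exact solution $u_0 + (t - t_0)$.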
Since the domain variable $t$ in the factors $(t - t_0)^k$ effectively acts as a scaling of the networks $N_k$, we conjecture that a large domain size may introduce the need for a higher number of training points or the use of a more complex neural network architecture. Having this in mind, it appears very natural to couple the collocation neural forms with a technique that refines the computational domain. To this end we first consider the non-adaptive version of domain fragmentation, which splits the domain into a fixed number of equidistant subdomains.
That said, we split the solution domain $D$ into subdomains $D_\ell$ ($\ell = 1, \dots, \ell_{\max}$), with $n$ grid points per domain fragment. Now the neural form is resolved separately in each subdomain. The interfacing grid points overlap, i.e., the computed value at the last grid point of any subdomain $D_\ell$ is set to be the new initial value for the next subdomain $D_{\ell+1}$. The general idea is visualised in Fig. 1, where the black/vertical marks represent the equidistantly distributed subdomain boundaries for the solution of an example IVP.
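The handover of initial values between overlapping fragments can be sketched as follows, with a hypothetical `solve` callable standing in for training one SCNF on a fragment and returning its trial solution:

```python
def solve_on_fragments(boundaries, u0, solve):
    """Solve an IVP fragment by fragment. `solve(t_left, t_right, u_left)`
    trains one SCNF on [t_left, t_right] and returns its trial solution."""
    solutions, u_left = [], u0
    for t_left, t_right in zip(boundaries[:-1], boundaries[1:]):
        u_tilde = solve(t_left, t_right, u_left)
        solutions.append(u_tilde)
        # overlapping grid point: the value at the right boundary becomes
        # the initial value of the next fragment
        u_left = u_tilde(t_right)
    return solutions
```

With a toy `solve` that integrates $\dot{u} = 1$ exactly, the value passed on at each interface is simply the endpoint value of the previous fragment.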
Summarising the construction up to now, the SCNF satisfies the new initial values $u_{0,\ell}$ in each domain fragment, namely
(5) $\tilde{u}_C(t, P_\ell) = u_{0,\ell} + \sum_{k=1}^{m} N_k(t, p_{k,\ell})\,(t - t_{0,\ell})^k.$
The neural networks are thus now scaled by $(t - t_{0,\ell})^k$, which may avoid the large scaling factors arising in the formulation over the entire domain, depending on the subdomain size. Incorporated in Eq. (4), the SCNF time derivative reads
(6) $\dfrac{\partial}{\partial t}\tilde{u}_C(t, P_\ell) = \sum_{k=1}^{m} \Big[ \dfrac{\partial}{\partial t} N_k(t, p_{k,\ell})\,(t - t_{0,\ell})^k + k\, N_k(t, p_{k,\ell})\,(t - t_{0,\ell})^{k-1} \Big].$
In order to keep an overview of all terms and indices, let us summarise them: the $i$th grid point in the $\ell$th subdomain is denoted by $t_{i,\ell}$, while $t_{0,\ell}$ is the initial point of the subdomain $D_\ell$ with the initial value $u_{0,\ell}$. That is, the last grid point of $D_\ell$ and the initial point of $D_{\ell+1}$ are overlapping grid points. In $D_1$, $u_{0,1} = u_0$ holds. The matrix $P_\ell$ contains the set of neural network weights in the corresponding subdomain. Finally, $N_k(t, p_{k,\ell})$ denotes the $k$th neural network in $D_\ell$.
The cost function employing the SCNF aims to minimise for each subdomain the energy
(7) $E_\ell[P_\ell] = \dfrac{1}{n} \sum_{i=1}^{n} \Big[ \dfrac{\partial}{\partial t}\tilde{u}_C(t_{i,\ell}, P_\ell) - G\big(t_{i,\ell}, \tilde{u}_C(t_{i,\ell}, P_\ell)\big) \Big]^2.$
Analogously to Eq. (4), the cost function now incorporates $\tilde{u}_C(t, P_\ell)$ instead of $\tilde{u}_C(t, P)$.
If $G$ in Eq. (1) represents a system of IVPs, each solution function requires its own NF, and the cost function derives from the sum over separate norm terms, i.e. one for each equation involved. We will address this extension in detail in the corresponding example later on.
2.2 Neural Network Architecture and Optimisation
Let us first give an overview of the neural network architecture, as displayed exemplarily in Fig. 2. The term $N_k(t, p_{k,\ell})$, c.f. Eq. (5), represents in general a feedforward neural network with one input layer neuron for the discretised domain data, $H$ hidden layer neurons and one output layer neuron. In addition, both the input layer and the hidden layer incorporate one bias neuron. The neural network output reads
(8) $N(t, p) = \sum_{j=1}^{H} \nu_j\, \sigma(z_j) + d, \quad z_j = w_j t + b_j.$
Here, $\sigma(z) = (1 + e^{-z})^{-1}$ represents the sigmoid activation function. The weighted sum $z_j$ incorporates the neural network weights $w_j$ (input layer neuron), $b_j$ (input layer bias neuron), $\nu_j$ (hidden layer neurons) and $d$ (hidden layer bias neuron). These weights are stored in the weight vector $p$. The input layer passes the domain data, weighted by $w_j$ and $b_j$, to the hidden layer for processing. The neural network output is again a weighted sum, of the values $\sigma(z_j)$ and the bias $d$. Neural network training is the process of minimising the cost function, c.f. Eq. (7), with respect to the neural network weights. In practice the goal is to find a local minimum in the weight space (or energy landscape), which may feature many extreme points. One epoch of training includes the computation of the gradient of the cost function w.r.t. the network weights, averaged over all training points (grid points). Usually it takes several epochs for successful training. The computed gradient is then used to update the neural network weights by Adam optimisation [25]. With full batch training, the cost function also returns an (averaged) scalar value for each epoch. We will refer to this value as the training error. It provides information about the training status and whether the minimisation of $E_\ell$ can be considered accomplished or not. Let us recall that each subdomain is optimised separately.
Details on the optimisation
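One Adam weight update can be sketched as follows, in a minimal list-of-scalars version with the default hyperparameters from [25]; in practice the gradient `g` would be the averaged gradient of the cost function in Eq. (7):

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for weights w with gradient g at epoch t >= 1."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]       # first moment
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]  # second moment
    m_hat = [mi / (1 - b1 ** t) for mi in m]                    # bias correction
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v
```

After bias correction, the very first step moves each weight by approximately the learning rate against the sign of its gradient.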
Let us consider the example IVP
(9) 
for which we find the corresponding expression, inserted into the cost function Eq. (7), as
(10) 
The minimisation of Eq. (7) aims to bring the residual in Eq. (10) as close to zero as possible. That is, the expression of interest actually reads
(11) 
Here, the values of $t_{i,\ell}$ are predetermined by the domain, whereas the SCNF and its time derivative additionally depend on the neural network weights and their optimisation. Therefore, Eq. (11) can be considered satisfied for various combinations of these quantities, and the results may highly depend on the final location in the weight space.
One reason for this circumstance may be the complexity of the energy landscape, which harbours multiple (local) minima that can lead to several combinations satisfying the left-hand side of Eq. (11). Not all of these combinations must be real or useful solutions for the given domain training points. This issue may occur, e.g., when the initial weights are far away from a minimum suitable for a helpful approximation, or when there is a minimum near the initialisation which, with unfavourable optimisation parameters, traps the optimiser. However, fine-tuning all the incorporated computational parameters is a thankless task, since some of them are not independent of each other [26, 9].
3 The Novel Adaptive Neural Domain Refinement (ANDRe)
We propose in this section the embedding of the previously introduced SCNF approach into an adaptive algorithm. The resulting refinement strategy features two components: (i) the SCNF training status arising from the cost function (Eq. (7)) in each subdomain, serving as an error indicator, and (ii) an algorithmic component performing the domain refinement. In Fig. 3, we consider an artificial example to sketch the principle behind ANDRe in a visual way. The basic idea is to optimise the cost function for a given number of equidistant training points in each subdomain until the resulting training error meets a predefined error bound $\varepsilon$. To obtain the subdomains, the algorithm starts with the cost function optimisation on the entire domain (Fig. 3 (1.)). If the predefined error bound is not fulfilled, the domain is split in half. Now the optimisation task starts again for the left half, since we only know the initial value for this subdomain. In case the training error again fails to go below $\varepsilon$, the current (left) subdomain is reduced in size (Fig. 3 (2.)). A splitting, in contrast, is only performed when the computation takes place in the rightmost subdomain and the error bound is not satisfied. The process of comparing the training error to $\varepsilon$, reducing the current subdomain and starting the optimisation another time, is repeated until the training error goes below $\varepsilon$ (Fig. 3 (3., 4.)).
Algorithm summary
Since we only know that this condition holds for the equidistant training points, the result has to be verified in the corresponding subdomain. That means, the cost function is evaluated at intermediate grid points and the resulting verification error is compared to the training error. If both are sufficiently small, the SCNF in this subdomain is assumed to be learned and remains untouched from now on. In the learned subdomain, the rightmost training point and the corresponding SCNF value (Eq. (5)) serve as the initial point and initial value for the adjacent subdomain, which represents the next time interval.
Now, the whole process starts again with two subdomains and a new initial condition for the new (sub)domain. However, the current (new) subdomain starts at the right boundary of the first (learned) subdomain and ends at the right boundary of the (entire) domain. Therefore, the already learned subdomain is excluded from further computations.
If a subdomain becomes too small or if the error increases after a split, the computational parameters are adjusted in a predefined, automated way (see below).
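The control flow summarised above can be condensed into the following schematic. The `train` and `verify` callables are hypothetical stand-ins for the SCNF optimisation and verification stages, and the sketch deliberately omits the parameter adjustment and the split-versus-shrink distinction:

```python
def andre_sketch(t0, T, train, verify, eps, eps_v, gamma=0.5, max_iter=1000):
    """Schematic ANDRe control flow: shrink the current subdomain until the
    training error passes eps and the verification passes eps_v."""
    boundaries = [t0]        # accepted subdomain boundaries
    left, right = t0, T
    for _ in range(max_iter):
        if train(left, right) <= eps and verify(left, right) <= eps_v:
            boundaries.append(right)              # subdomain learned
            if right == T:
                return boundaries                 # entire domain solved
            left, right = right, T                # move on to remaining interval
        else:
            right = left + gamma * (right - left)  # reduce subdomain size
    raise RuntimeError("maximum number of refinement steps reached")
```

With a toy error measure proportional to the subdomain size, the sketch partitions the domain into fragments that are each small enough to pass both bounds.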
Let us now provide detailed information about ANDRe, which is shown as a flowchart in Fig. 4. The starting point is the choice of the collocation neural form order $m$ and the subdomain adjustment factor $\gamma$, which acts as a size reduction whenever a decrease is necessary. For optimisation we use equidistant training points TP. After each complete optimisation, the training error is compared to $\varepsilon$ to check whether it is sufficiently small. Another important constant is the error bound $\varepsilon_v$, used to verify the SCNF solution in the corresponding subdomain. This verification is realised by evaluating the cost function, with the previously learned weights, at intermediate verification points VP. We define $\ell$ as the index of the subdomain in which the SCNF is currently solved, and $\ell_{\max}$ represents the total number of subdomains. The latter is not fixed and will increase throughout the algorithm. Finally, the very first domain is set to be the entire given domain $D$.
Flowchart explanation:
The first processing operation
(12) 
covers setting the initial architecture parameters, such as the number of hidden layer neurons and the Adam learning rate, and initialising the weights.
Afterwards the optimisation problem
(13) 
is solved by training the SCNF at the given equidistant TP over the current domain $D_\ell$.
Then the first decision block compares the training error (after the training process has ended) to the error bound $\varepsilon$:
(14) 

Eq. (14) NO: In case the training error did not go below $\varepsilon$, the size of the current subdomain will be reduced. But first, another decision has to be made, namely whether the optimisation has been performed for the first time on the current subdomain, or in other words, whether the current domain index $\ell$ equals the total number of subdomains $\ell_{\max}$:
(15) 
Eq. (15) YES: That means the right boundary is the end of the domain and we have to split the current subdomain $D_\ell$, which increases the total number of subdomains by one. The boundaries now have to be adjusted: the left one remains unchanged, while the new right boundary of $D_\ell$ results from scaling the subdomain size by $\gamma$, after the former right boundary is set to be the right boundary of the subsequent domain $D_{\ell+1}$. Afterwards, the algorithm leads back to Eq. (12).

Eq. (15) NO: In this case the current subdomain has already been split up. Now the right boundary has to be adjusted in order to decrease the current subdomain size. Beforehand we check a compound condition (highlighted in blue) to ensure that a subdomain does not become too small. Additionally, we also check whether the training error decreased compared to the prior computation on the same subdomain. That is, the algorithm compares the training error from the formerly larger subdomain to that of the current, size-reduced subdomain. The condition itself may come in different shapes; we decided to check for one of the following:
(16)


Eq. (14) YES: In case the training error is smaller than or equal to $\varepsilon$, the current subdomain has been successfully learned in the sense of a sufficiently small training error. However, a small training error can sometimes be misleading, while the numerical error remains unsatisfactory. Hence, a second stage of verification is added to the algorithm (highlighted in purple). The first operation here reads
(19) where VP denotes grid points intermediate to TP. In order to compute Eq. (19), the already learned weights are used. This processing operation is followed by
(20) Both error values are scalars and follow Eq. (7), evaluated at different grid points. The difference between them is then compared to a second error bound $\varepsilon_v$, namely
(21) 
Eq. (21) YES: The current subdomain can be considered as learned.
Now it is necessary to determine whether we are in the last subdomain (the right boundary is the end of the domain) or whether there is still a subdomain left to solve the optimisation problem on, namely
(22) 
Eq. (22) NO: There is at least one subdomain left, and therefore the current subdomain index is updated to $\ell + 1$ in order to solve the optimisation problem on the adjacent subdomain. Additionally, we reset all possibly adjusted parameters to their initial values. Thus we make sure not to overuse the adjusted parameters in regions where the solution can be computed with the initial ones. The algorithm then leads back to Eq. (12).

Eq. (22) YES: All subdomains have been successfully learned and the initial value problem is entirely solved.
We developed ANDRe in three steps, making it a unique and well-engineered adaptive neural algorithm for domain refinement. Excluding the coloured parts in Fig. 4, the black parts represent a fully functional algorithm that can refine the domain in an adaptive way, with the focus lying on the training error, c.f. Eq. (14). However, it turned out that small training errors do not necessarily result in a comparably small numerical error, presumably due to possible overfitting. Therefore we included a verification stage, highlighted in purple in Fig. 4, to reduce the impact of overfitting on the end result. The minimised cost function is evaluated at grid points intermediate to the training points. Thereby we are able to detect possible overfitting whenever the resulting errors do not match. Furthermore, in some examples we observed that the preset neural network architectures may not be flexible enough to learn certain subdomains. Hence, we upgraded ANDRe to incorporate an automated parameter adjustment mechanism, highlighted in blue in Fig. 4. Whenever a subdomain becomes too small, network and optimisation related parameters may be rebalanced in a predefined way. We will later provide experimental evidence to demonstrate the capabilities of ANDRe.
4 Examples and Results
In this section we discuss computational results for three different IVPs, solved by ANDRe. Beforehand, the involved parameters and methods are further specified.
The neural networks for the SCNF follow Eq. (8) and are set to initially incorporate five sigmoid hidden layer neurons, together with one bias each in the input and hidden layer. All networks are initialised with constant weights at the very beginning of the algorithm. Once a subdomain has been learned, the corresponding weights are used to initialise the SCNF in the neighbouring subdomain. The initial learning rate for Adam optimisation is set to 1e-3, with its additional parameters chosen as in [25]. Throughout the algorithm, we do not change the number of training points TP or the number of intermediate verification grid points VP. Additionally, we employ incremental training as in [10].
The training process itself lasts at least 5e4 epochs, up to a maximum of 1e5 epochs. In between, a training error below 1e-4 stops the training, since this condition is directly related to the algorithm in Eq. (14). The condition for verifying the training error is met if the verification measure drops below 1e-3. Experiments are performed with the same SCNF order throughout all examples. We find this sufficient for the proposed ANDRe algorithm to obtain useful results. Lower neural form orders in most cases worked very well in the beginning (sometimes even better), but results soon started to get worse, with a significantly larger number of necessary subdomains. On the other hand, higher orders significantly increase the computational cost. We have documented in [5] that the chosen SCNF order is sufficient to solve IVPs.
Let us now comment on the parameter adjustment, since this part of the algorithm requires a lot of fine-tuning. After several experiments with different parameter adjustment methods, not documented here, we propose to increase the initial Adam learning rate and the number of hidden layer neurons in a predefined order. That is, the initial learning rate can be increased several times before the number of hidden layer neurons is increased by two additional neurons. The latter resets the learning rate to its default value.
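Such a schedule can be written down deterministically. The two-neuron increment and the learning-rate reset follow the description above, while the three concrete learning-rate levels are illustrative assumptions:

```python
def adjusted_parameters(step, n0=5, lr_levels=(1e-3, 1e-2, 1e-1)):
    """Return (learning_rate, hidden_neurons) after `step` adjustments:
    the rate cycles through lr_levels; each completed cycle adds two neurons
    and resets the rate to its first (default) level."""
    cycle, pos = divmod(step, len(lr_levels))
    return lr_levels[pos], n0 + 2 * cycle
```

Step 0 corresponds to the default configuration; after exhausting the rate levels, the network grows and the schedule starts over.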
As for a theoretical justification of this step, let us comment that suitable minima in the weight space are probably located at different positions, depending on the subdomain. Even though we transfer the learned network weights to an adjacent subdomain as initial weights, the existence of a nearby local minimum may be possible, but it is not guaranteed. The higher the initial learning rate, the farther the optimiser can travel in the weight space, adding more flexibility and increasing the chance of finding a suitable minimum.
Once a subdomain has been successfully learned in this way, both parameters are reset to their initial values. Let us recall that the parameter adjustment does not take place during an optimisation cycle; rather, it occurs after completing one. In other words, we do not perturb the neural network training during the optimisation process.
The numerical error is defined as the averaged norm of the difference between the exact solution and the corresponding SCNF in each subdomain,
(23) $err_\ell = \dfrac{1}{n} \sum_{i=1}^{n} \big| u(t_{i,\ell}) - \tilde{u}_C(t_{i,\ell}, P_\ell) \big|,$
whereas the overall error averages the numerical errors of the subdomains,
(24) $\overline{err} = \dfrac{1}{\ell_{\max}} \sum_{\ell=1}^{\ell_{\max}} err_\ell.$
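Assuming the exact solution and the trained SCNF are available as callables, these error measures can be sketched as:

```python
def subdomain_error(exact, scnf, grid):
    """Averaged absolute difference between exact solution and SCNF
    over one subdomain's grid points (c.f. Eq. (23))."""
    return sum(abs(exact(t) - scnf(t)) for t in grid) / len(grid)

def overall_error(per_subdomain_errors):
    """Mean of the per-subdomain errors (c.f. Eq. (24))."""
    return sum(per_subdomain_errors) / len(per_subdomain_errors)
```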
In [5] we have shown that the SCNF with a fixed number of subdomains is capable of solving IVPs on larger domains. Now we want to show that ANDRe is able to determine a suitable number of subdomains automatically.
4.1 Example IVP 1
As a first example, we consider the following IVP with constant coefficients
(25) 
The analytical solution
(26) 
incorporates heavily oscillating and increasing characteristics, similar to instabilities. This example is still relatively simple and serves to demonstrate the main properties of our approach.
In Fig. 5, we show both the analytical solution (orange) and the ANDRe solution (black/dotted). The latter displays the complete set of training points from the required 76 subdomains, resulting in a total numerical error of the order of 1e-3. The training points are not equidistantly distributed, and because of the general trend of the solution, we would expect the density to be higher at the peaks and dips, while declining in between. However, this does not seem to be the case. While at some local extreme points there are only one or two black dots, they are tightly packed around others.
Fig. 7 shows the subdomain distribution related to Fig. 5 at the beginning of the domain. We find the domain size adjustment parameter $\gamma$ to show a significant influence here. It appears to be very important where one subdomain ends, because this may cause the adjacent one to be more difficult to solve. Please note that this statement holds under the consideration of the chosen neural network parameters.
Let us turn to Fig. 6, where the numerical error (blue) is compared to the training error (orange/marked) and the verification error (green) of the successfully learned subdomains. Commenting on the relation between the training and the verification error, we see that both are very similar most of the time. This implies that the corresponding subdomain has been effectively learned up to the desired state. However, the discrepancy between both in the left part of Fig. 6, which is still within the tolerated range defined by the error bound $\varepsilon_v$, seems to have a significant effect on the numerical error.
Although we find a local maximum for the numerical error (blue) in the beginning, it decreases afterwards and remains in a certain region with local minima and maxima. However, we find the decreasing behaviour to be an important characteristic compared to numerical methods, where one would expect the error to accumulate.
In Fig. LABEL:FIGexp3.1.3, the sizes of the learned subdomains are shown. The general trend points to smaller subdomains throughout the computation. However, we witness local differences with bigger or smaller subdomains, and this is what we expected from the algorithm: the subdomain size is reduced until it is sufficiently small, and that can be individual for each part of the solution. Nonetheless, let us compare the numerical error in Fig. 6 and the subdomain sizes in Fig. LABEL:FIGexp3.1.3. In the first ten subdomains there seems to be a certain correlation: a larger size in this range results in a larger numerical error. A smaller verification error bound, dealing with the discrepancy between training and verification error in Fig. 6, may have resulted in another size reduction with better results. However, the statement that a smaller subdomain size implies a smaller numerical error does not hold in general here.
left boundary  |  right boundary  |  verification error  |  training error  |  numerical error  |  learning rate
4.4655  |  15.000  |  13.1031  |  12.4874  |  181.7023  |  1e-3
4.4655  |  9.7327  |  6.9323  |  0.3427  |  178.3671  |  1e-3
4.4655  |  7.0991  |  0.4119  |  3.0768  |  5.5600  |  1e-3
4.4655  |  7.0991  |  0.4545  |  9.9983e-5  |  4.4363  |  3e-3
4.4655  |  5.7823  |  1.5456e-2  |  1.4383e-3  |  4.7809e-3  |  3e-3
4.4655  |  5.7823  |  1.0168e-2  |  4.3802e-4  |  1.0789e-3  |  5e-3
4.4655  |  5.1239  |  3.6418e-3  |  9.9998e-5  |  1.1720e-4  |  5e-3
In Table 1, quantitative results of the solution process for one subdomain are displayed. The left boundary remains constant while the right subdomain boundary is adjusted as in Eq. (18). The error values demonstrate the appearance of non-uniform learning during the solution process and show how important the verification (purple) and the parameter adjustment (blue) components of ANDRe are, c.f. Fig. 4. While the training error decreases (as intended) for the first two boundary reductions, it increases for the third one, which leads to a growth of the initial learning rate. Now, for the same subdomain size, the training error decreases significantly and would already match the condition defined by $\varepsilon$ in Eq. (14). However, trying to verify the result (see Eq. (21)) shows that this solution is not correct, although the training error is small. That circumstance leads to another size reduction and, in the following, to another increase of the learning rate, until both the training and the verification error are sufficiently small.
4.2 Example IVP 2
The second example IVP
(27) 
incorporates non-constant coefficients. The analytical solution
(28) 
includes trigonometric and exponentially increasing terms. In contrast to the previous example IVP in Section 4.1, Eq. (27) is solved on an even larger domain with strongly increasing values. Experiments on this example IVP aim to show that ANDRe is capable of solving time-integration problems on large domains with small neural networks. This is of particular importance, as the domain size has been identified in several works as an intricate parameter of the underlying problem, see also the detailed study in [26].
As displayed in Fig. 8, the ANDRe solution fits the analytical solution (orange) at a qualitatively useful level. In total, the algorithm finished after splitting the solution domain into 32 subdomains, with an average numerical error of the order of 1e-3. In one subdomain, the number of hidden layer neurons increased to 7 after the initial learning rate had been adjusted up to 1e-1. The alternating behaviour, switching between local minima and maxima, compares to the results from Example IVP 1 in Section 4.1.
While Fig. 9(a) shows the ANDRe solution for the entire domain, Figs. 9(b), (c) and (d) are zoomed in to provide a more detailed view of certain areas. We observe the local extreme point in Fig. 9(c) to be covered by approximately equidistant subdomains (on a qualitative level). This, however, does not hold for the extreme points in Figs. 9(b) and (d). Especially in the latter, the grid points are densely packed. This may both indicate and confirm that the domain refinement not only depends on the complexity of a certain region, but also on finding suitable minima in the weight space in order to get the training error below the error bound $\varepsilon$.
Let us comment in some more detail on the behaviour of both training and verification error, displayed in Fig. 9(a). While both match, they undershoot the preset $\varepsilon$ = 1e-4 in many cases by several orders of magnitude. Two adjacent subdomains may have training/verification errors with significant differences; especially towards the end of the integration domain in this example, this effect is maximised. In Fig. 9(b), the subdomain sizes are shown, and while the last two subdomains share approximately the same size, their error values differ a lot. However, the numerical error in Fig. 9(a) remains almost constant, although the other errors become very small and reach a global minimum in subdomain 29.
In Fig. 10 we see the first and second SCNF orders dominate the results. However, the higher orders also contribute to the solution, making our approach indirectly adaptive in the approximation order as well. Let us recall that the factors for the different orders, cf. Eq. (5), dictate the impact of each neural network. Hence, subdomains with a size below 1 imply a smaller influence of the higher SCNF orders. This circumstance can be challenging for IVPs with large solution values. Nonetheless, we see that ANDRe was able to solve this problem, although the number of hidden layer neurons had to be increased in one subdomain. That is, the results justify our decision to enable parameter adjustment when necessary.
From the perspective of employing a condition for the minimisation of the cost function, the question arises how the algorithm outcome is affected by different values of the error bound ε. Table 2 shows the mean values of the numerical error, training error and verification error. The choice of ε has a direct impact on each error value: they all decrease. However, an experiment for ε = 1e-6 did not get beyond a certain subdomain. We terminated the computation after the number of hidden layer neurons crossed fifty-five. In this subdomain, the smallest training error was 3.6656e-6, but the optimisation did not manage to go below 1e-6. This phenomenon may again relate to the complexity of the cost function energy landscape. Either such a local minimum could not be found by the optimiser for various reasons, or even the global minimum is still too shallow for that error bound.
Concluding this example IVP, ANDRe has shown its potential for solving time-dependent ODEs on large domains with strongly increasing solution values. At the same time, the discussion of this example confirms and illustrates the beneficial properties of our approach.
Table 2: Mean values of the numerical, training and verification errors for different error bounds ε.

ε     | numerical error | training error | verification error
1e-1  | 8.7833e-2       | 5.5550e-3      | 5.5408e-3
1e-2  | 3.0016e-2       | 2.0467e-3      | 2.0031e-3
1e-3  | 1.2667e-2       | 2.5357e-4      | 2.4736e-4
1e-4  | 2.6474e-3       | 3.8890e-5      | 3.7961e-5
1e-5  | 5.6461e-4       | 3.9608e-6      | 3.8621e-6
4.3 Example IVP 3
As a third example we employ a system of coupled first order IVPs, namely the Lotka-Volterra equations [27]
(29)  u̇₁(t) = u₁(t) (α − β u₂(t))
(30)  u̇₂(t) = −u₂(t) (γ − δ u₁(t))
with parameters α, β, γ, δ. The initial values u₁(0) and u₂(0) depend on the subsequent experiment; the chosen value for u₂(0) will be explicitly addressed. This example shows that our approach is able to obtain useful results for systems of IVPs.
The above mentioned equations describe the relation between predators u₂ and their prey u₁. The parameter α determines the reproduction rate of the prey, while the population of the predators declines proportionally to γ whenever the other species is not present. In the presence of each other, the population of the prey decreases by β and the number of predators grows by δ. This model includes idealisations, for example an unlimited resource of food for the prey [27].
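The predator-prey dynamics described above can be written down directly as a right-hand side and integrated with the classical Runge-Kutta 4 scheme, which serves as the reference method in this example. The sketch below uses illustrative parameter and initial values, not the ones from the paper.

```python
import numpy as np

# Assumed demo values, not the parameters used in the paper.
alpha, beta, gamma, delta = 1.0, 0.5, 1.0, 0.5

def f(u):
    u1, u2 = u  # u1: prey population, u2: predator population
    return np.array([u1 * (alpha - beta * u2),      # prey dynamics
                     -u2 * (gamma - delta * u1)])   # predator dynamics

def rk4(u0, t_end, n_steps):
    """Integrate u' = f(u) on [0, t_end] with n_steps classical RK4 steps."""
    h = t_end / n_steps
    u = np.array(u0, dtype=float)
    traj = [u.copy()]
    for _ in range(n_steps):
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(u.copy())
    return np.array(traj)

traj = rk4([2.0, 1.0], t_end=20.0, n_steps=2000)
```

A quick sanity check for such an integration is that both populations stay positive and that the well-known Lotka-Volterra invariant δu₁ − γ ln u₁ + βu₂ − α ln u₂ is nearly conserved along the computed trajectory.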
We obtain a cost function from the IVP system in Eqs. (29)–(30) as the sum of the squared norms of the residuals of each equation. We use u₁ = u₁(t) and u₂ = u₂(t) as shortcuts:
(31)  E = ‖u̇₁ − u₁(α − βu₂)‖² + ‖u̇₂ + u₂(γ − δu₁)‖²
This equation is then subject to optimisation.
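A minimal sketch of how such a cost function can be assembled over a set of collocation points follows. The trial solutions here are plain closures standing in for the trained SCNF networks, the time derivative is approximated by central finite differences, and the parameter values are assumptions, not the paper's.

```python
import numpy as np

# Assumed demo values, not the parameters used in the paper.
alpha, beta, gamma, delta = 1.0, 0.5, 1.0, 0.5

def cost(u1, u2, t_points, h=1e-6):
    """Sum of squared residual norms of both equations at t_points."""
    def ddt(f, t):
        # Central finite-difference approximation of the time derivative.
        return (f(t + h) - f(t - h)) / (2.0 * h)
    res1 = ddt(u1, t_points) - u1(t_points) * (alpha - beta * u2(t_points))
    res2 = ddt(u2, t_points) + u2(t_points) * (gamma - delta * u1(t_points))
    return np.sum(res1 ** 2) + np.sum(res2 ** 2)

t = np.linspace(0.0, 1.0, 11)
# At the equilibrium u1 = gamma/delta, u2 = alpha/beta both residuals vanish,
# so constant trial solutions at these values yield zero cost:
c = cost(lambda s: np.full_like(s, gamma / delta),
         lambda s: np.full_like(s, alpha / beta), t)
```

In the actual method, minimising this quantity with respect to the network weights drives the trial solutions towards the solution of the system on the given subdomain.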
In Fig. 11, both (a) the Runge-Kutta 4 solution and (b) the ANDRe solution are shown. The latter took 62 subdomains to successfully finish the computation. For a fair comparison on the quantitative side, both methods should use an equal number of training points, which in this case would arise from the ANDRe solution. However, we are more interested in a qualitative comparison, since Runge-Kutta 4 is known to provide very good results. ANDRe found a useful solution for the Lotka-Volterra equations, as seen in Fig. 11(b), since aside from the plot type there is no visible difference from a qualitative perspective.
In contrast to the previous examples, plotting both solutions in different diagrams enables us to get a better view of the discrete ANDRe solution points. Fig. 12 shows the solution for three different initial values for the predators. Although Fig. 11(b) and Fig. 12(b) only display the solution at the training points, the trained SCNF is capable of evaluating the solution at any arbitrary discrete grid point over the entire domain, which is a tremendous advantage over numerical integration methods. The verification step in ANDRe also helps to reduce possible overfitting.
Turning to Fig. 13 and comparing the training and verification errors for the Lotka-Volterra equations with the results for the other two example IVPs in Fig. 9 and Fig. 6, we observe a pattern. In all three examples, the algorithm-related errors (orange/marked and green) show heavy differences throughout the subdomains. Sometimes the above-mentioned errors of adjacent subdomains differ by several orders of magnitude. In general, this behaviour may be reminiscent of oscillations.
5 Conclusion and future work
The proposed ANDRe is based on two components. First, the training error arising from the underlying subdomain collocation neural form (SCNF) acts as a refinement indicator. The second component is the proposed algorithm, which refines the solution domain in an adaptive way. We find ANDRe to be a dynamic framework adapting to the complexity of a given problem. In this paper, we have shown that the approach is capable of solving time-dependent differential equations incorporating various interesting characteristics, in particular including large domains and extensive variation of solution values.
In contrast to classical numerical methods for solving initial value problems, the numerical error does not inevitably accumulate over the subdomains. It can rather decrease again due to the flexibility of the neural forms approach. A significant advantage of ANDRe is the verification step, which makes sure that the solution is also useful outside of the chosen training points. All this makes ANDRe a unique, conceptually useful and well-engineered framework.
However, several questions remain open for future work. While there seems to be a certain and natural correlation between the training error and the numerical error, in practice this correlation appears to be a sensitive issue at times. It is not yet clear whether some minima in the cost function energy landscape contribute better to the relation between training and numerical error than others. Nevertheless, we find the training error to already serve as a useful error indicator in ANDRe. In addition, we would like the numerical error to correspond proportionally to the training error. If we manage to improve the correlation between both errors, or to understand their relation in more detail on the theoretical level, we think that the ANDRe approach can perform even better in the future.
We also find it relevant to further investigate the computational parameters and to fine-tune the parameter adjustment part of ANDRe. The verification step may be considered as a part of the optimisation process, in order to predict early whether further optimisation in the corresponding subdomain is useful or whether a size reduction is mandatory. This could lower the computational cost, but has to be incorporated and tested carefully so as not to lose any information during the optimisation process.
Since ANDRe represents an additional discretisation in time, the approach should also work for PDEs with both temporal and spatial components, and it appears natural to extend the method to multi-dimensional differential equations in future work.
Acknowledgement
This publication was funded by the Graduate Research School (GRS) of the Brandenburg University of Technology Cottbus-Senftenberg. This work is part of the Research Cluster Cognitive Dependable Cyber Physical Systems.
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
References
[1] H.M. Antia: Numerical Methods for Scientists and Engineers, 1st edition, Hindustan Book Agency, New Delhi (2012).
[2] A.J. Meade Jr., A.A. Fernandez: The numerical solution of linear ordinary differential equations by feedforward neural networks, Mathematical and Computer Modelling 19.12, pp. 1–25 (1994). doi:10.1016/0895-7177(94)90095-7
[3] N. Yadav, A. Yadav, M. Kumar: An Introduction to Neural Network Methods for Differential Equations, SpringerBriefs in Applied Sciences and Technology, Springer Dordrecht, Heidelberg New York London (2015). doi:10.1007/978-94-017-9816-7
[4] M.W.M.G. Dissanayake, N. Phan-Thien: Neural-network-based approximations for solving partial differential equations, Communications in Numerical Methods in Engineering 10.3, pp. 195–201 (1994). doi:10.1002/cnm.1640100303
[5] T. Schneidereit, M. Breuß: Collocation Polynomial Neural Forms and Domain Fragmentation for Initial Value Problems, arXiv:2103.15413 (2021).
[6] S. Mall, S. Chakraverty: Application of Legendre Neural Network for solving ordinary differential equations, Applied Soft Computing 43, pp. 347–356 (2016). doi:10.1016/j.asoc.2015.10.069
[7] I.E. Lagaris, A. Likas, D.I. Fotiadis: Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks 9.5, pp. 987–1000 (1998). doi:10.1109/72.712178
[8] P.L. Lagari, L.H. Tsoukalas, S. Safarkhani, I.E. Lagaris: Systematic Construction of Neural Forms for Solving Partial Differential Equations Inside Rectangular Domains, Subject to Initial, Boundary and Interface Conditions, International Journal on Artificial Intelligence Tools 29.5, pp. 2050009 (2020). doi:10.1142/S0218213020500098
[9] T. Schneidereit, M. Breuß: Solving Ordinary Differential Equations using Artificial Neural Networks – A study on the solution variance, Proceedings of the Conference Algoritmy, pp. 21–30 (2020).
[10] M.L. Piscopo, M. Spannowsky, P. Waite: Solving differential equations with neural networks: Applications to the calculation of cosmological phase transitions, Physical Review D 100.1, pp. 016002 (2019). doi:10.1103/PhysRevD.100.016002
[11] I.E. Lagaris, A. Likas, D.G. Papageorgiou: Neural-network methods for boundary value problems with irregular boundaries, IEEE Transactions on Neural Networks 11.5, pp. 1041–1049 (2000). doi:10.1109/72.870037
[12] A.D. Jagtap, E. Kharazmi, G.E. Karniadakis: Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems, Computer Methods in Applied Mechanics and Engineering 365, pp. 113028 (2020). doi:10.1016/j.cma.2020.113028
[13] J. Blechschmidt, O.G. Ernst: Three ways to solve partial differential equations with neural networks – A review, GAMM-Mitteilungen 44.2, pp. e202100006 (2021). doi:10.1002/gamm.202100006
[14] E. Hairer, S.P. Nørsett, G. Wanner: Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd edition, Springer Series in Computational Mathematics, Springer-Verlag, Berlin Heidelberg (1993). doi:10.1007/978-3-540-78862-1
[15] E. Hairer, G. Wanner: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edition, Springer Series in Computational Mathematics, Springer-Verlag, Berlin Heidelberg (1996). doi:10.1007/978-3-642-05221-7
[16] O.C. Zienkiewicz, R.L. Taylor, J.Z. Zhu: The Finite Element Method: Its Basis and Fundamentals, 7th edition, Elsevier Butterworth-Heinemann, Oxford (2013). doi:10.1016/B978-1-85617-633-0.00019-8
[17] W. Bangerth, R. Rannacher: Adaptive Finite Element Methods for Differential Equations, Lectures in Mathematics, Springer Basel AG, Basel (2003). doi:10.1007/978-3-0348-7605-6
[18] M.J. Berger, J. Oliger: Adaptive mesh refinement for hyperbolic partial differential equations, Journal of Computational Physics 53.3, pp. 484–512 (1984). doi:10.1016/0021-9991(84)90073-1
[19] R. Verfürth: A posteriori error estimation and adaptive mesh-refinement techniques, Journal of Computational and Applied Mathematics 50.1, pp. 67–83 (1994). doi:10.1016/0377-0427(94)90290-9
[20] S. Alfonzetti: A Finite Element Mesh Generator based on Adaptive Neural Network, IEEE Transactions on Magnetics 34.5, pp. 3363–3366 (1998). doi:10.1109/20.717791
[21] J. Bohn, M. Feischl: Recurrent neural networks as optimal mesh refinement strategies, Computers and Mathematics with Applications 97, pp. 61–76 (2021). doi:10.1016/j.camwa.2021.05.018
[22] L. Manevitz, A. Bitar, D. Givoli: Neural network time series forecasting of finite-element mesh adaptation, Neurocomputing 63, pp. 447–463 (2005). doi:10.1016/j.neucom.2004.06.009
[23] M. Breuß, D. Dietrich: Fuzzy Numerical Schemes for Hyperbolic Differential Equations, KI 2009: Advances in Artificial Intelligence, Lecture Notes in Computer Science 5803, Springer, Berlin, Heidelberg, pp. 419–426 (2009). doi:10.1007/978-3-642-04617-9_53
[24] C. Anitescu, E. Atroshchenko, N. Alajlan, T. Rabczuk: Artificial neural network methods for the solution of second order boundary value problems, Computers, Materials and Continua 59.1, pp. 345–359 (2019). doi:10.32604/cmc.2019.06641
[25] D.P. Kingma, J. Ba: Adam: A Method for Stochastic Optimization, arXiv:1412.6980 (2017).
[26] T. Schneidereit, M. Breuß: Computational characteristics of feedforward neural networks for solving a stiff differential equation, arXiv:2012.01867 (2020).
[27] M.C. Anisiu: Lotka, Volterra and their model, Didactica Mathematica 32, pp. 9–17 (2014).