Accelerating numerical methods by gradient-based meta-solving

In science and engineering applications, it is often required to solve similar computational problems repeatedly. In such cases, we can utilize the data from previously solved problem instances to improve the efficiency of finding subsequent solutions. This offers a unique opportunity to combine machine learning (in particular, meta-learning) and scientific computing. To date, a variety of such domain-specific methods have been proposed in the literature, but a generic approach for designing these methods remains under-explored. In this paper, we tackle this issue by formulating a general framework to describe these problems, and propose a gradient-based algorithm to solve them in a unified way. As an illustration of this approach, we study the adaptive generation of parameters for iterative solvers to accelerate the solution of differential equations. We demonstrate the performance and versatility of our method through theoretical analysis and numerical experiments, including applications to incompressible flow simulations and an inverse problem of parameter estimation.


1 Introduction

It is common and important in science and engineering to solve similar computational problems repeatedly. For example, in computational fluid dynamics, many methods involve solving a Poisson equation to compute the pressure field at every time step of the simulation Ferziger2001-kj. Another example is systems biology, where an important but challenging problem is to estimate parameters of mathematical models of biological systems from observation data, and solving this inverse problem often requires many numerical simulations Moles2003-nn; Chou2009-pr. In these situations, we can utilize the data from previously solved problems to solve the next, similar but unseen, problems more efficiently, and machine learning is a natural and effective approach for this.

Thus, in recent years, many learning-based methods have been proposed for the repeated solution of computational problems. One direction of research is selecting algorithms, or parameters of algorithms, from the results of previously solved problems. In Hutter2011-ck, the authors propose a general method to tune parameters of algorithms for sets of problem instances by sequential model-based optimization. In Gupta2017-hf, the problem of application-specific algorithm selection is formulated using the PAC learning framework, and it is studied further in Balcan2021-qk; Mitzenmacher2021-nu. Another direction is directly predicting solutions to problems of interest by machine learning. For example, in Tompson2017-yw; Tang2017-wc; Ajuria_Illarramendi2020-tb; Ozbay2021-qt; Cheng2021-ua, convolutional neural networks are used to predict the solutions of Poisson equations, and Fan2020-ke proposes neural network architectures inspired by linear perturbative analysis to solve electrical impedance tomography problems. In Pfaff2021-uw, a methodology to learn mesh-based simulations using graph neural networks is proposed, and Li2021-bl learns a resolution-invariant mapping from a family of partial differential equations (PDEs) to the solution space by utilizing the fast Fourier transform.

In addition to these directions, more recent studies actively explore the potential of tightly coupling machine learning and numerical methods Baker2019-vs; Brunton2019-sh; Karniadakis2021-tf; Thuerey2021-xr. For example, Ajuria_Illarramendi2020-tb; Ozbay2021-qt; Um2020-zx; Huang2020-oa; Vaupel2020-zu; Venkataraman2021-zv propose to use solutions predicted by machine learning as initial guesses for subsequent traditional numerical methods, and Hsieh2018-ey combines a neural network with an iterative solver to accelerate it while maintaining its convergence guarantee. In Um2020-zx, the authors report a significant error reduction by correcting solutions using neural networks trained jointly with PDE solvers. In this line of research, there are several works where meta-learning approaches are taken to solve computational problems Venkataraman2021-zv; Chen2022-tx; Guo2021-bw; Psaros2022-ym; Liu2021-vw; Feliu-Faba2020-yj. Meta-learning, or learning to learn, leverages previous learning experiences to improve future learning performance Hospedales2021-ib, which fits the motivation of utilizing the data from previously solved equations for the next one. For instance, Chen2022-tx uses meta-learning to generate smoothers of the Multi-grid Network for parametrized PDEs, and Guo2021-bw proposes a meta-learning approach to learn effective solvers based on the Runge-Kutta method for ordinary differential equations (ODEs). In Psaros2022-ym; Liu2021-vw, meta-learning is used to accelerate the training of physics-informed neural networks for solving PDEs.

Although many methods have been proposed in this direction, they are often problem-specific. In other words, there is neither a unified framework to explain them nor a general design pattern for adapting them to new problem settings. On the other hand, in the machine learning literature, there exists a general methodology, gradient-based meta-learning Finn2017-qg, that covers a variety of meta-learning problem settings.

In this paper, we generalize this approach to yield a framework, which we call gradient-based meta-solving (GBMS), that encompasses both learning and computational problems. This offers a general means to understand and develop learning-based algorithms that speed up computation. As an illustration of our approach, we apply GBMS to accelerate the solution of ODEs and PDEs with iterative solvers by generating parameters of the solvers adaptively for each problem instance. We show the advantage of the proposed algorithm over baseline classical and learning-based approaches through theoretical analysis and numerical experiments. Finally, we incorporate the algorithm into practical applications, including incompressible flow simulations and parameter estimation of chemical reactions, and demonstrate its versatility and performance.

2 Gradient-based meta-solving

In this section, we introduce our core idea of gradient-based meta-solving. First, we formulate a class of problems, which we call meta-solving, that includes both ordinary meta-learning and learning-based computational problems. We then propose a general gradient-based algorithm for solving them. Under our formulation, many methods proposed in related works can be regarded as special cases of the GBMS algorithm.

2.1 General formulation of meta-solving

Let us now introduce the general formulation of meta-solving. We first fix the required notation. A task $\tau$ is a tuple $(\mathcal{D}_\tau, \mathcal{U}_\tau, \mathcal{L}_\tau)$ consisting of a dataset $\mathcal{D}_\tau$, a solution space $\mathcal{U}_\tau$, and a loss function $\mathcal{L}_\tau$. A solution space $\mathcal{U}_\tau$ is a set of parametric candidate solutions. A loss function $\mathcal{L}_\tau : \mathcal{U}_\tau \to \mathbb{R}$ is a real-valued function which measures how well the task is solved. To solve a task $\tau$ means to find an approximate solution $\hat{u}_\tau \in \mathcal{U}_\tau$ which minimizes $\mathcal{L}_\tau$. Meta-solving considers the solution of not one task but a distribution of tasks by a learnable solver. Thus, we consider a task space $(\mathcal{T}, P)$ as a probability space that consists of a set of tasks $\mathcal{T}$ and a task distribution $P$, which is a probability measure defined on a suitable $\sigma$-algebra on $\mathcal{T}$. A solver $\Phi$ is a function from $\mathcal{T} \times \Theta$ to $\mathcal{U}$, where $\mathcal{U} = \bigcup_{\tau \in \mathcal{T}} \mathcal{U}_\tau$. Here, $\theta \in \Theta$ is the parameter of $\Phi$, and $\Theta$ is its parameter space; $\theta$ may or may not be trainable, depending on the problem. Then, solving a task $\tau$ by the solver $\Phi$ with a parameter $\theta$ is denoted by $\Phi(\tau; \theta) \in \mathcal{U}_\tau$. A meta-solver $\Psi$ is a function from $\mathcal{T} \times \Omega$ to $\Theta$, where $\omega \in \Omega$ is a parameter of $\Psi$ and $\Omega$ is its parameter space. A meta-solver parametrized by $\omega$ is expected to generate an appropriate parameter $\theta_\tau = \Psi(\tau; \omega)$ for solving a task $\tau$ with a solver $\Phi$. Then, using the notation above, our meta-solving problem is defined as follows:

Definition 1 (Meta-solving problem).

For a given task space $(\mathcal{T}, P)$, solver $\Phi$, and meta-solver $\Psi$, find $\omega \in \Omega$ which minimizes $\mathbb{E}_{\tau \sim P}\left[\mathcal{L}_\tau\big(\Phi(\tau; \Psi(\tau; \omega))\big)\right]$.
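To make the roles of these components concrete, the following minimal Python sketch expresses Definition 1 as plain interfaces; the names and signatures are illustrative choices, not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Task:
    data: Any                          # D_tau: problem data (e.g., right-hand side, boundary values)
    loss: Callable[[Any], float]       # L_tau: maps an approximate solution to a scalar

# Phi(tau, theta): solves the task with solver parameter theta (e.g., an initial guess).
Solver = Callable[[Task, Any], Any]
# Psi(tau, omega): generates a solver parameter for the task from trainable weights omega.
MetaSolver = Callable[[Task, Any], Any]

def meta_solving_objective(tasks, solver: Solver, meta_solver: MetaSolver, omega) -> float:
    """Monte Carlo estimate of E_{tau~P}[ L_tau( Phi(tau; Psi(tau; omega)) ) ]."""
    losses = [task.loss(solver(task, meta_solver(task, omega))) for task in tasks]
    return sum(losses) / len(losses)
```

A solver that ignores its parameter recovers an ordinary, non-adaptive numerical method, while a meta-solver that ignores the task recovers a single globally shared parameter, as in MAML below.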

Note that the formulation of meta-solving problems shares the goal with the framework introduced in Gupta2017-hf, but differs in the following sense. In Gupta2017-hf, the problem of application-specific algorithm selection is formulated as a statistical learning problem using the PAC learning framework, but the basic framework in Gupta2017-hf is defined to select an algorithm for a distribution of problem instances. Although it is extended to algorithm selection per instance, the framework is based on learning performance predictors that predict the performance of algorithms for each instance. On the other hand, our meta-solving framework is designed to configure an algorithm per instance, and directly optimizes the meta-solver without predicting performance.

We present some familiar examples, which can be regarded as special cases of the meta-solving problem. First, we can see that conventional meta-learning problems fall in this formulation.

Example 1 (Few-shot learning).

As an example of a meta-learning problem, we take the few-shot regression problem with MAML Finn2017-qg. The components of the problem are the following. The task $\tau$ is to learn a regression model from data. The dataset is $\mathcal{D}_\tau = \{(x_i, y_i)\}_i$, which satisfies $y_i = f_\tau(x_i)$ for an unknown target function $f_\tau$. $\mathcal{D}_\tau$ is divided into a training set $\mathcal{D}_\tau^{\mathrm{train}}$ and a validation set $\mathcal{D}_\tau^{\mathrm{valid}}$. The solution space $\mathcal{U}_\tau$ is the weight space of a neural network $u(\cdot; w)$ that models $f_\tau$. Note that $\mathcal{U}_\tau = \mathcal{U}$ because the architecture of $u$ is shared across all tasks. The approximate solution is the trained weights of $u$, which are obtained by training on $\mathcal{D}_\tau^{\mathrm{train}}$. The loss function $\mathcal{L}_\tau$ is the mean squared error (MSE) on $\mathcal{D}_\tau^{\mathrm{valid}}$. The task distribution $P$ is determined by the distribution of the target function $f_\tau$ and the distribution of the samples $x_i$. The solver $\Phi$ is a single step of gradient descent to minimize the training loss $\mathcal{L}_\tau^{\mathrm{train}}$. Its parameter $\theta$ is the initial weights of $u$, so $\Theta = \mathcal{U}$. Thus, $\Phi(\tau; \theta) = \theta - \alpha \nabla_\theta \mathcal{L}_\tau^{\mathrm{train}}(\theta)$, where $\alpha$ is a learning rate. The meta-solver $\Psi$ is a constant function that returns its parameter $\omega$ for any task $\tau$. The parameter $\omega$ is expected to be an appropriate initialization for all tasks, from which each task can be fine-tuned easily. Thus, $\Psi(\tau; \omega) = \omega$ and $\Omega = \Theta = \mathcal{U}$. Note that the output of the meta-solver, the initial weights $\theta = \omega$, does not depend on $\tau$. Then, the few-shot learning problem is written as the meta-solving problem

(1)   $\min_{\omega \in \Omega} \; \mathbb{E}_{\tau \sim P}\left[\mathcal{L}_\tau\big(\Phi(\tau; \Psi(\tau; \omega))\big)\right] = \min_{\omega} \; \mathbb{E}_{\tau \sim P}\left[\mathcal{L}_\tau^{\mathrm{valid}}\big(\omega - \alpha \nabla_\omega \mathcal{L}_\tau^{\mathrm{train}}(\omega)\big)\right].$
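In code, Example 1 reduces to one inner gradient step from a shared initialization followed by an outer update of that initialization. The sketch below is a minimal PyTorch-style illustration with a linear model standing in for the neural network; it is not the reference MAML implementation, and each task is assumed to be a dict with 'train' and 'valid' splits of (inputs, targets).

```python
import torch

def inner_solver(task, theta, alpha=0.01):
    """Phi(tau; theta): one gradient-descent step on the task's training loss."""
    x_tr, y_tr = task["train"]
    loss_tr = torch.mean((x_tr @ theta - y_tr) ** 2)          # linear model for brevity
    (grad,) = torch.autograd.grad(loss_tr, theta, create_graph=True)
    return theta - alpha * grad                                # adapted weights, differentiable in theta

def maml_outer_step(tasks, omega, outer_opt):
    """One outer update of the shared initialization omega (= Psi(tau; omega) for every tau)."""
    outer_opt.zero_grad()
    total = 0.0
    for task in tasks:
        adapted = inner_solver(task, omega)                    # solve the task from the shared init
        x_va, y_va = task["valid"]
        total = total + torch.mean((x_va @ adapted - y_va) ** 2)   # L_tau on the validation split
    (total / len(tasks)).backward()                            # gradient flows through the inner step
    outer_opt.step()

# usage: omega = torch.zeros(d, requires_grad=True); outer_opt = torch.optim.Adam([omega], lr=1e-3)
```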

In addition to conventional learning problems, we can regard other computational problems, such as solving a differential equation, as a task of the meta-solving problem. In contrast with MAML, the inner-loop learning is now replaced with a family of solvers for differential equations. This necessitates distinguishing the solver parameter space $\Theta$ from the solution space $\mathcal{U}$. Moreover, the meta-solver has to produce a task-specific parameter $\theta_\tau$ for the inner solver.

Example 2 (Solving differential equations).

Suppose that we need to repeatedly solve similar instances of a class of differential equations with a given numerical solver. The solver has a number of hyper-parameters, which sensitively affect accuracy and efficiency depending on the problem instance. Thus, finding a strategy to optimally select solver hyper-parameters for a given problem instance can be viewed as a meta-solving problem. The components of the problem are the following. The task $\tau$ is to solve a differential equation. In this example, suppose that the target equation is the Poisson equation $-\Delta u = f$ with Dirichlet boundary conditions. The dataset $\mathcal{D}_\tau$ contains the data of the differential equation, namely the right-hand side and the boundary condition. The solution space $\mathcal{U}_\tau$ is a space of discretized approximate solutions. The loss function $\mathcal{L}_\tau$ measures the accuracy of the approximate solution; in this example, the norm of the residual obtained by substituting the approximate solution into the discretized equation can be used. The task distribution $P$ is the joint distribution of the right-hand side and the boundary condition. The solver $\Phi$ is a numerical solver with a parameter $\theta$ for the Poisson equation. For example, suppose that $\Phi$ is the Jacobi method Saad2003-vm and $\theta$ is its initial guess. The meta-solver $\Psi$ is a strategy, characterized by $\omega$, to select the initial guess $\theta_\tau = \Psi(\tau; \omega)$ for each task $\tau$. Note that the output of the meta-solver depends on $\tau$, which is different from the case of Example 1. Then, finding the strategy to select parameters of the numerical solver becomes a meta-solving problem.

The above examples explain why we describe our problem as meta-solving instead of meta-learning. In Example 1, the task is learning from data, and the solver is gradient descent. In Example 2, the task is solving a differential equation, and the solver is an iterative differential equation solver. Regardless of the type of task and algorithm, in both cases the solver $\Phi$ solves the task $\tau$, and we do not distinguish whether $\Phi$ is a learning algorithm or another numerical solver. In other words, learning algorithms such as gradient descent are also a type of numerical solver, and we regard learning as a special case of solving. The same holds for the outer learning algorithm that learns how to solve the task with the solver $\Phi$, so learning to solve is a special case of solving to solve. In this sense, we call it meta-solving.

2.2 Gradient-based meta-solving

We defined meta-solving problems as a generalization of meta-learning problems in Section 2.1. In this section, we propose an efficient method to solve them. For meta-learning problems such as Example 1, general methodologies in the form of gradient-based meta-learning Hospedales2021-ib, e.g. MAML Finn2017-qg, have been proposed. In Example 1, MAML solves the minimization problem Eq. 1 by updating $\omega$ with an outer gradient descent algorithm. MAML is an algorithm to find good initial weights of a neural network for conventional meta-learning problems, but it can be generalized to meta-solving problems. In meta-solving problems, tasks may not be learning problems, $\Phi$ may not be gradient descent, and its parameter $\theta$ may not be the initial weights of a neural network. However, we can still employ a gradient-based update rule similar to MAML, as long as $\mathcal{L}_\tau$, $\Phi$, and $\Psi$ are differentiable. Thus, for differentiable solvers $\Phi$ and meta-solvers $\Psi$, we propose a gradient-based meta-solving algorithm (Algorithm 1) as a generalization of MAML. We note that MAML can be viewed as a special case of the algorithm, where the task is a learning problem, $\Phi$ is gradient descent for a neural network, $\Psi$ is a constant function returning its initial weights, and $\Omega$ is the corresponding weight space.

1: Require: $(\mathcal{T}, P)$: task space, $\Phi$: differentiable solver, $\Psi$: differentiable meta-solver with parameter $\omega$, $C$: stopping criterion, $\mathcal{A}$: gradient-based optimization algorithm
2: while $C$ is not satisfied do
3:      Sample task $\tau \sim P$
4:      Generate parameter $\theta_\tau = \Psi(\tau; \omega)$ of $\Phi$ by $\Psi$ with parameter $\omega$
5:      Solve $\tau$ by $\Phi$ with $\theta_\tau$: $\hat{u}_\tau = \Phi(\tau; \theta_\tau)$
6:      Update $\omega$ by $\mathcal{A}$ to minimize $\mathcal{L}_\tau(\hat{u}_\tau)$
7: end while
Algorithm 1 Gradient-based meta-solving algorithm
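Algorithm 1 translates directly into a standard training loop. The sketch below assumes PyTorch and placeholder components sample_task, meta_solver, and solver; the only requirement is that the loss is differentiable with respect to the meta-solver weights through the solver.

```python
import torch

def gbms_train(sample_task, solver, meta_solver, num_steps=10_000, lr=1e-3):
    """Algorithm 1: optimize the meta-solver weights omega by stochastic gradient descent.

    sample_task() -> task           (a draw from the task distribution P)
    meta_solver(task) -> theta      (differentiable in the meta-solver's parameters)
    solver(task, theta) -> u_hat    (differentiable in theta)
    task.loss(u_hat) -> scalar      (differentiable in u_hat)
    """
    optimizer = torch.optim.Adam(meta_solver.parameters(), lr=lr)
    for _ in range(num_steps):                # while the stopping criterion is not satisfied
        task = sample_task()                  # sample tau ~ P
        theta = meta_solver(task)             # generate solver parameter theta = Psi(tau; omega)
        u_hat = solver(task, theta)           # solve tau: u_hat = Phi(tau; theta)
        loss = task.loss(u_hat)               # L_tau(u_hat)
        optimizer.zero_grad()
        loss.backward()                       # back-propagate through the solver into omega
        optimizer.step()
```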

2.3 Organizing related works

Owing to the general formulation presented above, we can organize several related works on learning-based methods for scientific computing and describe them in a unified way. We highlight the advantages of our method using several examples in this section. Detailed descriptions of how the task, the solver, and the meta-solver are defined in each case are found in Appendix A.

First, let us review Feliu-Faba2020-yj. This work proposes a neural network architecture inspired by the nonstandard wavelet form, combined with a meta-learning approach, to solve equations containing partial differential or integral operators. It can be regarded as a special case of GBMS, where the task $\tau$ is solving the equation, the solver $\Phi$ is the forward computation of the trained neural network, $\theta$ is its weights, and the meta-solver $\Psi$ is the constant function that returns the weights $\omega = \theta$. Note that $\theta$ does not depend on the task $\tau$. We also note that other works where neural networks replace the whole or a part of a numerical method Tompson2017-yw; Tang2017-wc; Ajuria_Illarramendi2020-tb; Ozbay2021-qt; Cheng2021-ua; Pfaff2021-uw; Li2021-bl; Hsieh2018-ey can be organized in the same way. Thus, the meta-solving formulation includes many learning-based methods where meta-learning techniques are not explicitly employed.

We take Psaros2022-ym as another example. In this work, meta-learning is used to learn a loss function of the physics-informed neural network (PINN) Raissi2019-xt for solving PDEs. This can also be considered as a special case of GBMS, where the task $\tau$ is training the PINN, the solver $\Phi$ is the gradient descent used to train the PINN, $\theta$ is the weights of a fully-connected neural network that works as a parametrized loss function for the training of the PINN, and the meta-solver $\Psi$ is the constant function that returns the weights $\omega = \theta$. As in the previous example, $\theta$ does not depend on the task $\tau$. We remark that this example and Example 1 are similar in the sense that $\Phi$ is gradient descent and $\Psi$ is a constant function returning neural network weights in both examples, though the tasks are different. The unified framework sheds light on such similarities among methods for different tasks, which enables us to apply a technique developed for one problem to another easily.

Other than solving equations, Yonetani2021-ez can be considered as a different type of example of GBMS. This work presents Neural A* search, a data-driven search method for path planning problems. The proposed method falls within the GBMS framework, where the task $\tau$ is a path planning problem, the solver $\Phi$ is the differentiable A* search algorithm developed by the authors, and the meta-solver $\Psi$ is a convolutional neural network that encodes an environmental map into a guidance map, which is used as the parameter of the differentiable A* algorithm. The meta-solver is trained with the solver in an end-to-end manner to minimize the difference between the solution of the solver and the ground truth and also to improve the efficiency of the search. This example shows that the GBMS framework can organize a wide range of learning-based algorithms for computational problems beyond solving equations.

A study more relevant to the current work is Chen2022-tx. In this work, meta-learning is used to generate a parameter of PDE-MgNet, a neural network representing the multigrid method, for solving parametrized PDEs. This can also be regarded as a special case of GBMS, where the task $\tau$ is solving a PDE, the solver $\Phi$ is the PDE-MgNet, whose iterative function is implemented by a neural network with weights $\theta$ representing the smoother of the PDE-MgNet, and the meta-solver $\Psi$ is another neural network with weights $\omega$ that generates $\theta$ depending on the task $\tau$. However, in this work, the meta-solver is trained with a single step of the solver but tested with multiple steps of the solver. We will show the importance of this difference in Section 3.1. Furthermore, in Chen2022-tx, the target task is limited to parametrized PDEs, and $\theta$ is a single component of the solver. On the other hand, owing to the general framework of GBMS, we study a variety of problems beyond PDEs and generate multiple components of solvers in Section 3.2.

Finally, we review Um2020-zx, which studies the interaction between machine learning and numerical solvers for solving PDEs and proposes a framework of solution correction using machine learning. As an example of this framework, Um2020-zx studies the inference of initial guesses for the Conjugate Gradient (CG) solver using a neural network. Here, the task $\tau$ is solving a Poisson equation, the solver $\Phi$ is the CG solver, and the meta-solver $\Psi$ is the neural network that outputs the initial guess of the CG solver. This setting is similar to ours in Section 3.1, but Um2020-zx studies the problem in the context of correcting approximate solutions rather than selecting parameters of solvers. On the other hand, we reformulate and study the problem in Section 3.1 as a meta-solving problem with different solver configurations, and expand it in Section 3.2 to generate other solver parameters in addition to initial guesses by using our general framework of GBMS.

The above examples show the generality of our meta-solving formulation, which organizes a variety of methods in a systematic way regardless of their type of algorithm. This overall approach allows us to easily design learning-based methods for new problem settings, as we demonstrate in the next section.

3 GBMS for iterative solvers

In this section, we show how GBMS can be used to yield effective meta-solvers and consequently effective solver parameters for new problem settings. In particular, we consider the problem of accelerating iterative solvers by adaptively choosing their parameters. The performance and versatility of GBMS are demonstrated by theoretical analysis and numerical examples, including linear and nonlinear differential equations with different iterative solvers and practical applications.

Iterative solvers are powerful tools for solving computational problems. For example, the Jacobi method and the SOR (Successive Over-Relaxation) method are used to solve PDEs Saad2003-vm. In an iterative method, an iterative function $g_\tau$, which depends on the task $\tau$, is repeatedly applied to the current approximate solution to bring it closer to the true solution until a stopping criterion, such as a given number of iterations or an error tolerance, is reached. These solvers have parameters, such as initial guesses and relaxation factors, and solver performance is highly affected by these parameters. However, the appropriate parameters depend on the problem and the solver configuration, and in practice it is difficult to know them before solving the problem. In order to overcome this difficulty and speed up the computation, we use GBMS to choose the parameters of the solver adaptively in a data-driven manner.

The meta-solving problem here can be viewed as a generalized version of Example 2. The task $\tau$ is any computational problem that can be solved by iterative methods; for example, $\tau$ is solving a PDE as described in Example 2. The task distribution $P$ is defined according to each problem. The solver $\Phi$ is an iterative solver with a parameter $\theta$, which gives an approximate solution $\hat{u}_\tau = \Phi(\tau; \theta)$. For example, $\theta$ is an initial guess in Section 3.1 and a relaxation factor in Section 3.2. The meta-solver $\Psi$ is a function parametrized by $\omega$, which takes $\tau$ as an input and generates an efficient parameter $\theta_\tau = \Psi(\tau; \omega)$ for the solver $\Phi$. We implement the meta-solver by a neural network, so $\omega$ is its weights. Then, $\Psi$ is trained to generate $\theta_\tau$ that minimizes the expectation of the loss function $\mathcal{L}_\tau$ by a gradient-based optimization algorithm. Since both $\Phi$ and $\Psi$ are implemented in a deep learning framework, the gradient of the loss with respect to $\omega$ can be computed by back-propagation.

3.1 GBMS for initial guesses

First, we apply GBMS to derive a meta-solver that produces effective initial guesses for iterative solvers. More specifically, the meta-solver generates a task-dependent initial guess $\theta_\tau = \Psi(\tau; \omega)$, and the solver solves the task starting from the generated initial guess $\theta_\tau$. In this section, we use the number of iterations as the stopping criterion for the iterative solver, so the solver with $k$ iterations is $\Phi^k(\tau; \theta) = g_\tau^k(\theta)$, where $g_\tau$ is the iterative function.
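In other words, the solver here is just the k-fold composition of the task's iterative function applied to the generated initial guess, and the gradient with respect to that guess is obtained by back-propagating through the unrolled iterations. A minimal sketch, assuming a differentiable update map iter_fn:

```python
import torch

def solve_k_iterations(iter_fn, theta0, k):
    """Phi^k(tau; theta0): apply the task's iterative function k times to the initial guess.

    iter_fn: u -> u_next, the update map g_tau of the iterative method (must be differentiable)
    theta0:  the initial guess generated by the meta-solver (a tensor carrying gradients)
    """
    u = theta0
    for _ in range(k):        # unrolled loop: autograd records every iteration
        u = iter_fn(u)
    return u                  # for k = 0 this is the identity map, as in the text
```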

Note that although Ajuria_Illarramendi2020-tb; Ozbay2021-qt; Huang2020-oa propose to use initial guesses generated by neural networks, these initial guesses are independent of the solvers. On the other hand, our initial guesses are optimized for each solver configuration. We will show its advantage in the following sections.

3.1.1 Toy problem: 1D Poisson equations

To study the property of the proposed algorithm, we consider solving 1D Poisson equations as a toy problem. First, we prove a theorem that guarantees the improvement of the proposed meta-solving approach for the Jacobi method and linear neural networks. Then, we demonstrate that the theorem numerically holds for another iterative method and practical nonlinear neural networks.

Theoretical analysis

Let us recall Example 2 and set the domain of interest to $(0, 1)$. Then, the target equation becomes the following 1D Poisson equation with Dirichlet boundary conditions:

(2)   $-\dfrac{d^2 u}{dx^2}(x) = f(x), \quad x \in (0, 1), \qquad u(0) = b_0, \ u(1) = b_1.$

To solve this equation numerically, it is discretized with the finite difference scheme, and we rewrite it as the matrix equation $A u = f_\tau$, where the domain is discretized into $n$ points, so $A \in \mathbb{R}^{n \times n}$ and $u, f_\tau \in \mathbb{R}^n$ (the boundary values are folded into the right-hand side). Suppose that we solve the equation with the Jacobi method under randomly sampled right-hand sides.
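For reference, the discretized system and the Jacobi update can be set up as follows; the NumPy code below is an illustrative reconstruction, and the grid size and boundary handling are generic choices rather than the paper's exact configuration.

```python
import numpy as np

def discretize_poisson_1d(f, u_left, u_right):
    """Build A u = b for -u'' = f on a uniform grid with Dirichlet boundary values."""
    n = len(f)                       # number of interior grid points
    h = 1.0 / (n + 1)
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    b = f.copy()
    b[0] += u_left / h**2            # fold the boundary values into the right-hand side
    b[-1] += u_right / h**2
    return A, b

def jacobi(A, b, u0, k):
    """Run k Jacobi iterations from the initial guess u0."""
    D = np.diag(A)
    R = A - np.diag(D)               # off-diagonal part of A
    u = u0.copy()
    for _ in range(k):
        u = (b - R @ u) / D          # u_next = D^{-1} (b - (A - D) u)
    return u
```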

Then, the meta-solving problem for solving 1D Poisson equations is defined as follows. The task $\tau$ is to solve $A u = f_\tau$. The dataset is $\mathcal{D}_\tau = \{f_\tau, u^*_\tau\}$, where $u^*_\tau$ is the solution of the equation. The solution space is $\mathcal{U} = \mathbb{R}^n$. The loss function is $\mathcal{L}_\tau(u) = \|u - u^*_\tau\|^2$. The task distribution $P$ is determined by the distribution of $f_\tau$, which is assumed to be centered and normalized, i.e. the mean of $f_\tau$ is 0 and its covariance is the identity matrix. The solver $\Phi^k$ is the Jacobi method with $k$ iterations starting at an initial guess $\theta$. Its iterative function is $g_\tau(u) = G u + c_\tau$, where $G = I - D^{-1}A$, $c_\tau = D^{-1} f_\tau$, and $D$ is the diagonal part of $A$. To summarize, $\Phi^k(\tau; \theta) = g_\tau^k(\theta)$. Note that $\Phi^0$ is the identity map. The meta-solver $\Psi$ is a linear neural network with weights $\omega$; it can be represented as a matrix multiplication $\Psi(\tau; \omega) = W f_\tau$ for some matrix $W$ determined by $\omega$, so $\Omega = \mathbb{R}^{n \times n}$. Here, we assume that $W$ is restricted so that it cannot equal $A^{-1}$, i.e. the meta-solver does not have the capacity to produce the exact solution for each task. Then, the meta-solving problem becomes

(3)   $\min_{\omega \in \Omega} \; \mathbb{E}_{\tau \sim P}\left[\big\| \Phi^k(\tau; \Psi(\tau; \omega)) - u^*_\tau \big\|^2\right].$

As for this meta-solving problem, the following theorem holds. The proof is presented in Appendix B.

Theorem 1 (Guarantee of improvement by meta-solving).

For any $k \ge 0$, Eq. 3 has a unique minimizer $\omega^*_k$. Furthermore, if $k_1 \le k_2$, then for all $m \ge k_2$,

(4)   $\mathbb{E}_{\tau \sim P}\left[\big\| \Phi^m(\tau; \Psi(\tau; \omega^*_{k_2})) - u^*_\tau \big\|^2\right] \;\le\; \mathbb{E}_{\tau \sim P}\left[\big\| \Phi^m(\tau; \Psi(\tau; \omega^*_{k_1})) - u^*_\tau \big\|^2\right],$

where the equality holds if and only if the two minimizers coincide.

Theorem 1 guarantees improvement by meta-solving in the considered setting. For example, suppose $k_1 = 0$ (regular supervised learning, i.e. directly predicting the solution without considering the solver) and $k_2 = k > 0$. Then, the theorem implies that the meta-solver trained with $k$ Jacobi iterations is expected to give a better initial guess than the meta-solver obtained by regular supervised learning for any number of Jacobi iterations of at least $k$. More generally, Theorem 1 shows that meta-solving with a larger number of Jacobi iterations in the inner solver improves the performance, provided one carries out at least that number of Jacobi iterations during inference. This shows the advantage of the meta-solving approach over regular supervised learning.

Numerical examples

The key insight from the theoretical analysis is that meta-solving leverages the properties of the solver and adapts the selection of initial guesses to it. Here, we show through numerical examples that this remains the case in more complex scenarios, involving different task distributions, different iterative solvers, and a practical nonlinear neural network.

The meta-solving problem for the numerical examples is defined as follows. The task is the same as in the theoretical analysis, and the number of discretization points is fixed. The task distribution is determined by the distribution of $f_\tau$. We consider two task distributions, one in which the solutions consist of sine functions and one in which they consist of hyperbolic tangent functions; their details are listed in Section C.1.1. For each distribution, we prepare 30,000 tasks for training, 10,000 for validation, and 10,000 for test. To solve the tasks, we use the Jacobi method and the Red-Black ordering SOR method Saad2003-vm with $k$ iterations starting at an initial guess; note that both are implemented by convolutional layers. We consider two meta-solvers. One is a heuristic initial guess generator that takes the boundary condition as an input and gives the heuristic initial guess, namely the linear interpolation of the boundary condition; it has no trainable parameters and is used as a baseline. The other is a variant of a 1D U-Net Ronneberger2015-an with weights $\omega$, which takes $f_\tau$ and the heuristic initial guess as inputs and generates an initial guess for the solver. The details of the architecture are found in Section C.1.1.
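Since one Jacobi sweep for the discretized 1D Poisson equation replaces each interior value by the average of its two neighbours plus a source term, it can be written as a fixed, non-trainable 1D convolution; this keeps the whole solver differentiable so that gradients flow back to the U-Net's initial guess. The following is a minimal sketch of this implementation idea, not the authors' code.

```python
import torch
import torch.nn.functional as F

def jacobi_conv_step(u, f, h, bc_left, bc_right):
    """One Jacobi sweep for -u'' = f, written as a 1D convolution.

    u: (batch, 1, n) current interior values, f: (batch, 1, n) right-hand side,
    h: grid spacing, bc_left / bc_right: (batch,) Dirichlet boundary values.
    """
    # Pad with the boundary values so that the stencil sees them at the ends.
    left = bc_left.view(-1, 1, 1)
    right = bc_right.view(-1, 1, 1)
    u_pad = torch.cat([left, u, right], dim=2)
    kernel = torch.tensor([[[0.5, 0.0, 0.5]]], dtype=u.dtype, device=u.device)
    neighbour_avg = F.conv1d(u_pad, kernel)        # average of the two neighbours, length n again
    return neighbour_avg + 0.5 * h**2 * f          # Jacobi update for the three-point stencil

# Unrolling k such steps gives the solver Phi^k; because every operation is a differentiable
# tensor op, the loss after k steps can be back-propagated to the U-Net's initial guess.
```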

The meta-solver is trained with solvers of different iteration numbers $k$ using Algorithm 1 with the Adam optimizer Kingma2015-ys. For each setting, the meta-solver is trained six times with different random seeds. The details of the hyper-parameters are found in Section C.1.1. The trained meta-solvers and the baseline are then tested with solvers of different iteration numbers. The performance of each meta-solver is measured by the MSE on the test set and presented in Table 1. Fig. 3 compares the initial guesses obtained by meta-solvers trained with different numbers of iterations, and Fig. 4 is the convergence plot of the meta-solvers trained with the Jacobi solver.

The results show that each meta-solver is optimized for its corresponding solver. In Table 1, the best performance for a given test solver is achieved on the diagonal, where the training and test solvers match. We can investigate this adaptive behavior explicitly by visualizing the meta-solver outputs for a representative problem instance. Fig. 3 shows that the meta-solver trained with more iterations tends to ignore high frequencies and focus more on low frequencies. This is because the Jacobi method converges quickly on high-frequency components but slowly on low-frequency ones Saad2003-vm, and the trained meta-solver compensates for this weakness of the solver. These results show that the meta-solver actively adapts to the nature of the inner-loop solver.

Furthermore, we can observe that Theorem 1 holds numerically in these results. Fig. 4 illustrates that a meta-solver trained with more iterations is always better than those trained with fewer iterations, provided the number of iterations during testing is sufficiently large. In addition, at 10,000 test iterations, the error of the meta-solver trained with the largest number of iterations (MSE of 0.0054) remains approximately 8% better than that of the meta-solver trained with fewer iterations, which shows that GBMS keeps its advantage even for a very large number of iterations. These results support that Theorem 1 holds in various practical settings with significant improvement.

(train \ test; rows: meta-solvers trained with different solver configurations, columns: test solver configurations, both in the same order; last row: the heuristic baseline)
0.222   0.220   0.219   0.214   0.218   0.212   0.193
0.277   0.212   0.210   0.205   0.210   0.203   0.186
1.169   0.231   0.209   0.204   0.221   0.203   0.185
17.684  3.066   0.279   0.198   0.299   0.198   0.179
1.140   0.982   0.753   0.496   0.201   0.195   0.179
25.786  6.145   1.134   0.554   0.392   0.185   0.170
22.451  8.306   2.459   0.744   2.536   0.224   0.169
baseline  17.033  12.184  9.482   7.926   9.083   7.611   6.606
(a) MSE on the first task distribution
(train \ test, as above)
1.049   1.041   1.030   1.005   1.028   0.994   0.922
1.059   1.023   1.006   0.976   1.002   0.964   0.888
1.202   1.049   1.006   0.971   1.000   0.958   0.885
2.017   1.572   1.142   0.940   1.058   0.920   0.839
1.632   1.479   1.383   1.333   0.984   0.938   0.862
3.305   2.597   1.764   1.285   1.305   0.928   0.845
7.683   6.009   3.580   1.417   3.029   1.116   0.756
baseline  10.761  10.744  10.695  10.539  10.681  10.463  9.908
(b) MSE on the second task distribution
Table 1: Average MSE of GBMS for solving Poisson equations. Standard deviations are presented in Table 5.
Figure 3: Comparison of initial guesses generated by meta-solvers trained with different numbers of solver iterations (panels (a) and (b) show the guesses of two such meta-solvers).
Figure 4: Convergence plot of meta-solvers trained with different numbers of iterations; vertical dotted lines mark the numbers of iterations used during training.

3.1.2 Application: Incompressible flow simulations

In this section, we introduce an application of GBMS for iterative solvers. GBMS is developed for repeatedly solving tasks sampled from a given task distribution. One typical situation where GBMS is effective is when similar differential equations are solved repeatedly as a step in solving a time-dependent problem. For example, in many methods for incompressible flow simulation, a Poisson equation is solved to compute the pressure field at every time step, and this process occupies a large part of the computation time of the fluid simulation Ajuria_Illarramendi2020-tb.

Incompressible flow simulation

Let us now describe the incompressible flow simulation setup. We consider a 2D channel flow with a fluctuating inflow. The velocity field and the pressure field follow the incompressible Navier-Stokes equations, consisting of the momentum equation

(5)

and the pressure Poisson equation

(6)

where $\rho$ is the density and $\nu$ is the viscosity. Boundary conditions are given by a no-slip wall and a fluctuating inflow. Using the finite difference scheme, we discretize the equations into 128 by 128 spatial grid points and 1,000 time steps. The velocity and pressure fields are then computed alternately from the discretized equations. Details of the numerical scheme for solving the Navier-Stokes equations follow Barba2018-xs. GBMS is used for solving the pressure Poisson equation Eq. 6.

Problem definition

Let us define the meta-solving problem for the above setting. The task $\tau$ is to solve a pressure Poisson equation Eq. 6. The dataset $\mathcal{D}_\tau$ contains the right hand side of Eq. 6, its solution, and the corresponding quantities from previous time steps. Although the solution is determined by the right hand side alone in theory, the additional features from previous time steps can provide useful information for producing a good initial guess. The solution space is the space of discretized pressure fields. The loss function is the relative $\ell^2$-error. The task distribution is determined by the distribution of the inflow and by the computation process of the simulation; the details of the inflow distribution are found in Section C.1.2. We prepare 20 inflows for training, 10 for validation, and 10 for test. Then, we generate datasets under each inflow by solving the Navier-Stokes equations with the SOR method without GBMS, run to a tight relative error tolerance. The solver is the SOR method with a fixed number of iterations starting at an initial guess. We consider two meta-solvers. One is a baseline that returns as the initial guess the solution of the previous step's Poisson equation; this choice is based on domain knowledge of fluid simulations and is standard in the literature Ferziger2001-kj. The other meta-solver is a variant of a 2D U-Net with weights $\omega$, which takes the dataset as input and generates the initial guess for the solver. The details of the architecture are presented in Section C.1.2. Note that the U-Net trained without solver iterations (i.e. a solver-independent initial guess) is similar to the method in Ajuria_Illarramendi2020-tb and is considered as a data-driven baseline in this paper. We also note that the choice of the U-Net is arbitrary, and any other problem-specific neural network architecture can be used as the meta-solver.

Training

We train the meta-solver with the SOR solver for several fixed numbers of iterations using Algorithm 1. The details of the hyper-parameters used in training are found in Section C.1.2. In addition, we use data augmentation during training to prevent overfitting to the high-accuracy training data. During training, we augment the data by advancing the simulation 10 time steps with the meta-solver currently being trained. Specifically, after computing the loss for one time step, we advance the simulation one step using the current meta-solver and the solver, and compute the loss for the next step using the computed previous-step information; this is repeated 10 times for each mini-batch. Without this augmentation, the inputs of the meta-solver come from the prepared dataset and are always accurate during training. During inference, however, the inputs of the meta-solver are the outputs of the simulation at previous time steps, which can follow a slightly different distribution than the one encountered during training (i.e. a distribution shift) due to numerical errors and generalization gaps. This causes further accuracy problems for the simulation at the next step, and the issue compounds over time, eventually degrading performance if the distribution shift is not dealt with. Data augmentation is used to prevent this. We remark that this training process does not require the whole simulation to be differentiable. This is practically important because we can employ GBMS with established solvers by replacing only the target part, the Poisson solver in this case, and can use the other parts without any modification.
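The augmentation loop can be sketched as follows; the helper callables (building the Poisson inputs, the SOR solve, the reference solution, the time-stepping routine, and the error metric) are placeholders for the corresponding simulation components, not an actual API.

```python
def train_on_rollout(meta_solver, optimizer, poisson_inputs, sor_solve, reference,
                     advance_one_step, rel_l2, states, k_sor, rollout_steps=10):
    """One augmented training pass: roll the simulation forward with the current meta-solver."""
    for _ in range(rollout_steps):
        rhs, features = poisson_inputs(states)       # rhs of Eq. 6 plus previous-step features
        guess = meta_solver(features)                # initial guess from the U-Net
        p = sor_solve(rhs, guess, k_sor)             # differentiable SOR iterations
        loss = rel_l2(p, reference(rhs))             # relative l2-error against an accurate solve
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Advance the flow one step with the current (detached) pressure, so that later
        # inputs follow the distribution the meta-solver will see at inference time;
        # the time-stepping routine itself does not need to be differentiable.
        states = advance_one_step(states, p.detach())
```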

Evaluation

We incorporate the trained meta-solvers into fluid simulations and evaluate them by the relative $\ell^2$-error of the velocity field on the test inflows under a fixed number of SOR iterations. More specifically, the performance metric is the relative $\ell^2$-error between the simulated velocity field and the ground truth, averaged over time steps and test inflows. The ground truth (high-accuracy simulation data) at each step is obtained by using the baseline initial guess and a number of SOR iterations large enough to achieve a tight relative error tolerance, while the evaluated velocity is obtained by using the trained meta-solver and the SOR solver with a fixed number of iterations. To stabilize the comparison, the first 100 steps are excluded, and the simulation with the trained meta-solver starts from step 101.

The results presented in Table 2 demonstrate the advantage of GBMS. Although all trained meta-solvers diverge for the smallest number of test iterations because of error accumulation, for the other solvers the best performance, with significant improvement, is achieved on the diagonal where the training and test solvers match. In particular, the accuracy of the meta-solver trained with the matching number of iterations is far better than regular supervised learning (the meta-solver trained without solver iterations) and than the classical baseline. Furthermore, GBMS keeps its advantage for numbers of iterations larger than the one it is trained with: for the largest number of test iterations, the meta-solver trained with the largest number of iterations has the best performance, followed by the ones trained with progressively fewer iterations, in that order. As for computation time, the GBMS approach (the meta-solver trained and tested with a moderate, matching number of iterations) takes 11 sec to simulate the flow for one second, while the classical baseline requires 105 sec to achieve similar accuracy, and the regular supervised learning approach requires 73 sec. Since it takes approximately 7 hours to train the meta-solver with a GeForce RTX 3090, our approach is effective if we simulate the flow for more than 268 seconds, which can easily be satisfied in practical settings. To summarize, GBMS performs best for a fixed number of iterations and can achieve high accuracy within a smaller number of iterations and shorter computation time than the classical baseline and regular supervised learning.

(train \ test; rows: meta-solvers trained with different numbers of SOR iterations, columns: numbers of SOR iterations at test time, both in the same order; last row: the classical baseline using the previous-step solution)
NaN       0.054861  0.008543  0.001284
NaN       0.001171  0.001625  0.000864
NaN       0.017513  0.000829  0.000723
NaN       NaN       0.005567  0.000312
baseline  1.988177  0.152626  0.014112  0.002941
Table 2: Relative $\ell^2$-error of GBMS on incompressible flow simulations.

3.2 GBMS for relaxation factors

This section presents the GBMS approach for generating relaxation factors of the SOR method. The SOR method is an improved version of the Gauss-Seidel method with relaxation factors enabling faster convergence Saad2003-vm. Its performance is sensitive to the choice of the relaxation factor, but the optimal choice is only accessible in simple cases (e.g. Yang2009-dl) and difficult to obtain beforehand in general. Thus, we aim to choose a good relaxation factor to accelerate the convergence of the SOR method using GBMS. Moreover, leveraging the generality of GBMS, we generate the relaxation factor and initial guess simultaneously and observe its synergy.

This section has three important differences from Section 3.1. First, the GBMS approach is essential for generating the relaxation factors. In Section 3.1, the supervised learning approach (i.e. directly predicting the solution and using it as the initial guess) is possible because we can use pre-computed solutions as the ground truth of the best initial guesses. On the other hand, it is not practical to prepare the dataset of the best relaxation factors, because they do not have closed-form formulas in general and are expensive to compute through search algorithms. Second, the stopping criterion for the iterative solver is a given error tolerance, while that in Section 3.1 is a given number of iterations. In practical applications, the stopping criterion of error tolerance is more common than the number of iterations. This is especially the case when high solution accuracy is required, but the number of iterations required to achieve this accuracy varies widely depending on each problem instance. In such cases, it is difficult to set a fixed number of iterations as the stopping criterion. GBMS can handle this setting by customizing the loss function. Third, the target equation is a nonlinear ODE (Robertson equation), and the presented application is an inverse problem (parameter estimation problem), while those in Section 3.1 are a linear PDE (Poisson equation) and time-dependent problem (incompressible flow simulation). This shows the versatility of the proposed method.

3.2.1 Toy problem: Stiff ordinary differential equations

While our goal is to apply GBMS to the inverse problem, we start with a simplified toy problem: solving many stiff ordinary differential equations.

Robertson equation

Let us consider solving the Robertson equation Robertson1966-ea, a nonlinear ordinary differential equation that models a reaction of three chemical species:

(7)   $\dfrac{dy_1}{dt} = -k_1 y_1 + k_3 y_2 y_3, \qquad \dfrac{dy_2}{dt} = k_1 y_1 - k_2 y_2^2 - k_3 y_2 y_3, \qquad \dfrac{dy_3}{dt} = k_2 y_2^2,$

(8)   $y(0) = (1, 0, 0)^\top,$

where $y_1$, $y_2$, $y_3$ are the concentrations of the chemicals, and $k_1$, $k_2$, $k_3$ are the reaction rate constants. We write $(y_1, y_2, y_3)$ as $y$ and the right hand side of Eq. 7 as $f(y)$. The Robertson equation with the constants $k_1 = 0.04$, $k_2 = 3 \times 10^7$, $k_3 = 10^4$ is often used as an example of stiff equations, which become numerically unstable for certain numerical methods, such as the forward Euler method and the Runge-Kutta method Saad2003-vm, without a very small step size. Thus, stiff equations are usually solved by implicit numerical methods Hairer2010-uw. For simplicity, we use the backward Euler method to solve Eq. 7:

(9)   $y_{n+1} = y_n + h_n f(y_{n+1}),$

where $h_n$ is the step size. Since the solution of the Robertson equation has a quick initial transient followed by a smooth variation Hairer2010-uw, we set the step sizes so that the evaluation points are located log-uniformly in time. Note that GBMS can also be applied to more advanced methods, such as adaptive step size methods, in the same manner. To find $y_{n+1}$, we need to solve the nonlinear algebraic equation

(10)   $F(y_{n+1}) := y_{n+1} - y_n - h_n f(y_{n+1}) = 0$

at every time step.

at every time step. For solving the equation Eq. 10, we use the one-step Newton-SOR method Saad2003-vm. Its iteration rule is

(11)
(12)

where is the Jacobian of , its decomposition , , are diagonal, strictly lower triangular, and strictly upper triangular matrices respectively, and is the relaxation factor of the SOR method. It iterates until the residual of the approximate solution reaches a given error tolerance . Note that we choose to investigate the Newton-SOR method, because it is a fast and scalable method and applicable to larger problems.
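To make the inner solver concrete, the sketch below implements the backward-Euler residual of the Robertson system together with one plausible form of the one-step Newton-SOR update (one SOR sweep per Newton linearization); the exact splitting and sign conventions may differ from the paper, so this is an illustrative reconstruction rather than the authors' implementation.

```python
import numpy as np

def robertson_rhs(y, k1, k2, k3):
    """Right-hand side f(y) of the Robertson kinetics."""
    y1, y2, y3 = y
    return np.array([-k1 * y1 + k3 * y2 * y3,
                      k1 * y1 - k2 * y2**2 - k3 * y2 * y3,
                      k2 * y2**2])

def robertson_jac(y, k1, k2, k3):
    """Jacobian of f with respect to y."""
    y1, y2, y3 = y
    return np.array([[-k1,                 k3 * y3,  k3 * y2],
                     [ k1, -2.0 * k2 * y2 - k3 * y3, -k3 * y2],
                     [0.0,        2.0 * k2 * y2,          0.0]])

def newton_sor_step(x, y_prev, h, k1, k2, k3, relax):
    """One iteration of a one-step Newton-SOR method for F(x) = x - y_prev - h f(x) = 0."""
    F = x - y_prev - h * robertson_rhs(x, k1, k2, k3)     # backward-Euler residual (Eq. 10)
    J = np.eye(3) - h * robertson_jac(x, k1, k2, k3)      # Jacobian of F
    D = np.diag(np.diag(J))
    L = -np.tril(J, -1)                                   # split J = D - L - U (strictly lower part)
    delta = relax * np.linalg.solve(D - relax * L, -F)    # one SOR sweep (from zero) on J delta = -F
    return x + delta
```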

Problem definition

Let us define the meta-solving problem for solving the Robertson equation. The task $\tau$ is to solve the equation Eq. 10 using the Newton-SOR method. The dataset $\mathcal{D}_\tau$ contains the data defining the equation (such as the previous step solution, the step size, and the reaction rate constants); note that it does not contain the solution of Eq. 10, so the training is done in an unsupervised manner. Unlike Section 3.1, we consider all possible sequences of approximate solutions during the Newton-SOR iterations as the solution space, which enables us to build loss functions that account for both accuracy and efficiency. The loss function is designed to minimize the number of iterations needed to achieve a given error tolerance $\epsilon$, while the loss function in Section 3.1 is the residual after a given number of iterations; we discuss the details of this loss function in the next paragraph. As for the task distribution, we sample the reaction rate constants $(k_1, k_2, k_3)$ log-uniformly from fixed ranges. We prepare 10,000 sets of constants and use 2,500 for training, 2,500 for validation, and 5,000 for test. The solver $\Phi$ is the Newton-SOR method with a given error tolerance $\epsilon$. The solver parameter $\theta$ is a pair of a relaxation factor and an initial guess. More specifically, writing the right hand side of Eq. 12 as a map applied to the current iterate, the solver applies this map repeatedly starting from the initial guess, for the minimum number of iterations needed to achieve the given error tolerance $\epsilon$.

We compare four meta-solvers: a classical baseline and three learned meta-solvers implemented by fully-connected neural networks with weights $\omega$. The baseline uses the previous time step solution as the initial guess and a constant relaxation factor. The constant relaxation factor does not depend on the task and is chosen by brute-force search to minimize the average number of iterations needed to reach the tolerance over the whole training data; note that this constant is already a good choice compared to other constant values, some of which require several times more iterations or even lead to divergence on some problem instances. The first learned meta-solver generates an initial guess adaptively in the same way as in Section 3.1 but uses the constant relaxation factor. The second generates a relaxation factor adaptively but uses the previous time step solution as the initial guess. The third generates both an initial guess and a relaxation factor adaptively. The details of their architectures are found in Section C.2.1.

Loss function

To solve the defined meta-solving problem, we carefully design the loss function. A new loss function is necessary because the relaxation factor that minimizes the number of iterations for a given tolerance can be different from the relaxation factor that minimizes the residual after a given number of iterations. Ideally, the loss should be the number of iterations required to reach the tolerance. However, during training this number can become very large, or infinite in the worst case, because the meta-solver can generate a bad relaxation factor, resulting in an enormous number of iterations or even divergence. To avoid this, we limit the maximum number of iterations during training to $m$ and define the loss as the sum of two terms (Eq. 13). The first term (Eqs. 14 and 15) counts the number of iterations until the residual reaches the tolerance; it equals the number of iterations actually required whenever the residual reaches the tolerance within $m$ iterations. The second term is the residual after $m$ iterations, which becomes effective only if the residual does not reach the tolerance within $m$ iterations; a hyper-parameter $\alpha$ balances the two terms, and the same value of $\alpha$ is used throughout Section 3.2. In summary, the loss function is the sum of the iteration-count term and the weighted residual term (Eq. 16).

In addition to the custom loss function, we introduce several training tricks. First, since the loss is not differentiable due to the indicator function in Eq. 15, we implement it so that it behaves as the indicator function in the forward pass and as a sigmoid function in the backward pass, as a surrogate gradient Yin2019-kz. Second, we clip the approximate solution to a fixed range to prevent divergence and make training stable; the range is set loose compared to the true solution range because the approximate solution can leave that range during iterations even in successful cases. Third, we initialize the neural network to generate relatively small relaxation factors to avoid divergence in the early stages of training. Further details of training, including other hyper-parameters, are found in Section C.2.1.
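One common way to realize the forward-indicator / backward-sigmoid surrogate is a straight-through construction. The sketch below is a plausible implementation of this idea together with a loss of the form described above (count of unconverged iterations plus a residual term weighted by alpha); the exact formula used in the paper may differ.

```python
import torch

def soft_indicator(residual, tol, temperature=10.0):
    """Forward pass: hard indicator 1[residual > tol]. Backward pass: gradient of a sigmoid."""
    soft = torch.sigmoid(temperature * (residual - tol))
    hard = (residual > tol).float()
    return soft + (hard - soft).detach()      # straight-through: hard value, soft gradient

def iteration_count_loss(residuals, tol, alpha=1.0):
    """Loss for a training run truncated at m iterations.

    residuals: tensor of shape (m,), residual norm after each iteration.
    The first term approximates the number of iterations still above the tolerance; the
    second penalizes the final residual and is negligible once the tolerance is reached.
    """
    count = soft_indicator(residuals, tol).sum()
    return count + alpha * residuals[-1]
```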

Evaluation

The meta-solvers are trained with the Newton-SOR solver at a fixed error tolerance using Algorithm 1, and the performance is evaluated by the mean of the required number of iterations on the test set. Table 3(a) compares the four meta-solvers with the same solver, and Table 3(b) compares the performance of the adaptive meta-solver trained and tested with different error tolerances.

The significant advantage of the GBMS approach over the baselines is observed in Tables 3(a) and 3(b). All the trained meta-solvers are better than the baseline, and the best meta-solver reduces the number of iterations by 87% compared to the baseline with the same solver.

Moreover, Table 3(b) shows that the proposed method can learn not only the characteristics of the target equation but also the solver configuration. In Table 3(b), we can observe that the meta-solver performs best on the diagonal, where the error tolerance used in training and testing is the same. In other words, the best relaxation factors that minimize the number of iterations differ depending on the error tolerance, and the meta-solvers successfully produce good relaxation factors for each error tolerance. This adaptivity to solvers is an important property of GBMS.

Furthermore, Table 3(a) implies that simultaneously generating the initial guess and the relaxation factor can create synergy. This follows from the observation that the improvements achieved by the combined meta-solver over the two single-adaptation meta-solvers (83% and 29%) are greater than the improvements of those single-adaptation meta-solvers over the baseline (80% and 20%).

(rows: the three learned meta-solvers; last row: the classical baseline)
55.819
14.379
9.506
baseline  70.507
(a) Average number of iterations of the different meta-solvers with the same solver.
(rows: meta-solvers trained with different error tolerances, columns: test error tolerances, both in the same order; last row: the classical baseline)
5.464   10.804  16.810
5.761   9.506   13.684
5.99    9.703   13.539
baseline  37.563  70.507  104.556
(b) Average number of iterations for solvers with different error tolerances and the same type of meta-solver.
Table 3: Average number of iterations of GBMS for the Newton-SOR method. Standard deviations are presented in Table 6.

3.2.2 Application: Parameter estimation of the Robertson equation

In this section, we demonstrate the utility of GBMS by applying it to the parameter estimation problem of chemical reaction systems Moles2003-nn; Chou2009-pr. It is an inverse problem that calibrates a mathematical model of chemical reactions against observation data and can be solved by optimization methods. The optimization involves repeated solutions of the forward problem as in the previous section, and we apply GBMS to accelerate them.

In particular, we estimate the reaction rates of the Robertson equation Eq. 7. In this study, the observation data is simulated by a high-accuracy numerical solution with a tight error tolerance, to which Gaussian noise with a fixed relative standard deviation is added. The optimization problem is

(17)   $\min_{k_1, k_2, k_3 > 0} \ \mathrm{MAPE}(k_1, k_2, k_3).$

Here, the objective function is the mean absolute percentage error,

(18)   $\mathrm{MAPE}(k_1, k_2, k_3) = \dfrac{100}{3N} \sum_{n=1}^{N} \sum_{i=1}^{3} \left| \dfrac{y_{i,n}^{\mathrm{obs}} - y_{i,n}(k_1, k_2, k_3)}{y_{i,n}^{\mathrm{obs}}} \right|,$

where $y_{i,n}^{\mathrm{obs}}$ is the $i$-th element of $y$ at the $n$-th time step of the simulation data, and $y_{i,n}(k_1, k_2, k_3)$ is that of the numerical solution computed using the estimated parameters. The true parameters to be estimated are fixed reference values of $(k_1, k_2, k_3)$. We use a genetic algorithm Fortin2012-qv to solve the optimization problem because local methods often fail to obtain satisfactory solutions due to the ill-conditioned and multimodal nature of parameter estimation problems in chemical systems Moles2003-nn. The genetic algorithm stops when the objective falls below a prescribed threshold, which gives a satisfactory solution. The initial population of candidate parameters is sampled from the same distribution as in Section 3.2.1. We conduct this optimization with each of the baseline and the meta-solvers trained in Section 3.2.1, and it is repeated 10 times with different random seeds. The details of the genetic algorithm we use are found in Section C.2.2.
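For completeness, the following sketch shows the shape of the outer estimation loop: a MAPE objective evaluated by simulating the Robertson system with candidate rate constants, wrapped in a small generic genetic algorithm. The paper uses the DEAP library Fortin2012-qv; the GA below is a simplified stand-in, the simulate routine is a placeholder for the GBMS-accelerated forward solver, and the search ranges are illustrative.

```python
import numpy as np

def mape(y_obs, y_sim):
    """Mean absolute percentage error between observed and simulated trajectories (y_obs nonzero)."""
    return 100.0 * np.mean(np.abs((y_obs - y_sim) / y_obs))

def estimate_rates(y_obs, simulate, pop_size=50, n_gen=100, tol=1.0, seed=0):
    """Minimal GA over log10(k1, k2, k3); simulate(k) returns the trajectory for rate constants k."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low=[-3, 5, 2], high=[0, 9, 6], size=(pop_size, 3))   # log10 ranges, illustrative
    for _ in range(n_gen):
        fitness = np.array([mape(y_obs, simulate(10.0 ** ind)) for ind in pop])
        if fitness.min() <= tol:                       # stop once a satisfactory fit is found
            break
        order = np.argsort(fitness)
        parents = pop[order[: pop_size // 2]]          # truncation selection of the best half
        children = parents + rng.normal(scale=0.1, size=parents.shape)   # Gaussian mutation
        pop = np.vstack([parents, children])
    best = pop[np.argmin([mape(y_obs, simulate(10.0 ** ind)) for ind in pop])]
    return 10.0 ** best
```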

Table 4 shows the total number of iterations needed to find a satisfactory solution to the optimization problem. The best meta-solver reduces the number of iterations by 79% compared to the baseline. This result shows that GBMS can significantly accelerate solving inverse problems. Note that here we used the pre-trained meta-solver from Section 3.2.1, but it is also possible to train the meta-solver from scratch or fine-tune it during the optimization, because the meta-solver can be trained without pre-calculated reference solutions. This can be effective when the task distribution is unknown or shifts during the optimization.

Table 4: The total number of iterations to find a satisfactory solution (standard deviations in brackets).

4 Conclusion

In this paper, we proposed a formulation of meta-solving and a general gradient-based algorithm (GBMS) to solve this class of problems. In the proposed framework, many related works that use neural networks for solving numerical problems can be organized in a unified way and regarded as variants of GBMS. Thus, the GBMS approach offers a general design pattern for developing solution algorithms that blend machine learning and scientific computing. As concrete illustrations, we applied GBMS to iterative methods for solving differential equations, including a linear PDE and a nonlinear ODE, and showed its advantage over both classical numerical methods and regular supervised learning. In particular, in the incompressible flow simulation with a fixed computation budget, the proposed method improved the accuracy by a large margin compared to regular supervised learning and to a classical method using the previous step solution. In addition, in a parameter estimation inverse problem with a fixed error tolerance, it reduced the computation cost by approximately 80% compared to a baseline that used an optimized constant relaxation factor. These results show the high performance and versatility of the proposed approach. In conclusion, GBMS offers a general design pattern for solving numerical problems using machine learning and a promising avenue for the fusion of scientific computing and machine learning.

Acknowledgments

Q. Li is supported by the National Research Foundation, Singapore, under the NRF fellowship (NRF-NRFF13-2021-0005), and S. Arisaka is supported by Kajima Corporation, Japan.

Appendix A Details of examples

Example 3 (Feliu-Faba2020-yj).

The authors in Feliu-Faba2020-yj propose a neural network architecture with a meta-learning approach that solves equations of the form $\mathcal{L}_\eta u = f$ with appropriate boundary conditions, where $\mathcal{L}_\eta$ is a partial differential or integral operator parametrized by a parameter function $\eta$. This work can be described in our framework as follows:

  • Task $\tau$: The task is to solve $\mathcal{L}_\eta u = f$ for $u$:

    • Dataset $\mathcal{D}_\tau$: The dataset is $\mathcal{D}_\tau = \{\eta, f, u^*\}$, where $\eta$, $f$, and $u^*$ are the parameter function, right hand side, and reference solution respectively.

    • Solution space $\mathcal{U}_\tau$: The solution space is a space of discretized approximate solutions $u$.

    • Loss function $\mathcal{L}_\tau$: The loss function is the mean squared error with respect to the reference solution, i.e. $\mathcal{L}_\tau(u) = \|u - u^*\|^2$.

  • Task space $(\mathcal{T}, P)$: The task distribution is determined by the distributions of $\eta$ and $f$.

  • Solver $\Phi$: The solver is implemented by a neural network imitating the wavelet transform, which is composed of three modules with weights $\theta$. In detail, the three modules represent the forward wavelet transform, the mapping from $\eta$ to the coefficient matrix of the wavelet transform, and the inverse wavelet transform, respectively. The solver is the composition of these modules applied to $f$.

  • Meta-solver $\Psi$: The meta-solver is the constant function that returns its parameter $\omega$, so $\theta = \Psi(\tau; \omega) = \omega$ and $\Theta = \Omega$. Note that $\theta$ does not depend on $\tau$ in this example.

Then, the weights $\omega$ are optimized by a gradient-based algorithm as described in Algorithm 1.

Example 4 (Psaros2022-ym).

In Psaros2022-ym, meta-learning is used for learning a loss function of the physics-informed neural network (PINN) Raissi2019-xt. The target equations are of the form

(a)   $\mathcal{N}_\lambda[u](x) = 0, \quad x \in D,$
(b)   $\mathcal{B}_\lambda[u](x) = 0, \quad x \in \partial D,$
(c)   $u(x, 0) = u_0(x),$

where $D$ is a bounded domain, $u$ is the solution, $\mathcal{N}_\lambda$ is a nonlinear operator containing differential operators, $\mathcal{B}_\lambda$ is an operator representing the boundary condition, $u_0$ represents the initial condition, and $\lambda$ is a parameter of the equations.

  • Task $\tau$: The task is to solve a differential equation by PINN:

    • Dataset $\mathcal{D}_\tau$: The dataset is the set of collocation points and, where applicable, the values of $u$ at those points. In detail, it consists of sets of points corresponding to Eq. a, Eq. b, and Eq. c respectively, together with a set of points and observed values at those points. In addition, each dataset is divided into a training set