Composing Scalable Nonlinear Algebraic Solvers

07/14/2016 ∙ by Peter R. Brune, et al. ∙ Argonne National Laboratory The University of Kansas Rice University 0

Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.



There are no comments yet.


page 19

page 23

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Large-scale algebraic solvers for nonlinear partial differential equations (PDEs) are an essential component of modern simulations. Newton-Krylov methods [23] have well-earned dominance. They are generally robust and may be built from preexisting linear solvers and preconditioners, including fast multilevel preconditioners such as multigrid [5, 7, 9, 67, 73] and domain decomposition methods [55, 63, 64, 66]. Newton’s method starts from whole-system linearization. The linearization leads to a large sparse linear system where the matrix may be represented either explicitly by storing the nonzero coefficients or implicitly by various “matrix-free” approaches [11, 42]. However, Newton’s method has a number of drawbacks as a stand-alone solver. The repeated construction and solution of the linearization cause memory bandwidth and communication bottlenecks to come to the fore with regard to performance. Also possible is a lack of convergence robustness when the initial guess is far from the solution. Luckily, a large design space for nonlinear solvers exists to complement, improve, or replace Newton’s method. Only a small part of this space has yet been explored either experimentally or theoretically.

In this paper we consider a wide collection of solvers for nonlinear equations. In direct equivalence to the case of linear solvers, we use a small number of algorithmic building blocks to produce a vast array of solvers with different convergence and performance properties. Two related solver techniques, nonlinear composition and preconditioning, are be used to construct these solvers. The contributions of this paper are twofold: introduction of a systematic approach to combining nonlinear solvers mathematically and in software and demonstration of the construction of efficient solvers for several problems of interest. Implementations of the solvers in this paper are available in the PETSc library and may be brought to bear on relevant applications, whether simulated on a laptop or on a supercomputer.

2 Background

We concern ourselves with the solution of nonlinear equations of the form


for a general discretized nonlinear function and right hand side (RHS) . We define the nonlinear residual as


We retain both notations as certain solvers (Section 5.3) require the RHS to be modified. We will use


to denote the Jacobian of .

The linear system

with residual

is an important special case from which we can derive valuable insight into solvers for the nonlinear problem. Stationary solvers for linear systems repeatedly apply a linear operator in order to progressively improve the solution. The application of a linear stationary solver by defect correction may be written as


where is a linear operator, called a preconditioner, whose action approximates, in some sense, the inverse of . The Jacobi, Gauss-Seidel, and multigrid iterations are all examples of linear stationary solvers.

composition of linear preconditioners and may proceed in two different ways, producing two new stationary solvers. The first is the additive combination

with weights and . The second is the multiplicative combination

compositions consisting of and are an effective acceleration strategy if eliminates a portion of the error space and handles the rest. A now mature theory for these compositions was developed in the 1980s and 1990s in the context of domain decomposition methods [62, 66].

Linear left- and right-preconditioning, when used in conjunction with Krylov iterative methods [58], is standard practice for the parallel solution of linear systems of equations. We write the use of a linear Krylov method as where is the matrix, the initial solution, and the RHS.

Linear left-preconditioning recasts the problem as

while right-preconditioning takes two stages,

where one solves for the preconditioned solution and then transforms using to the solution of the original problem.

3 Nonlinear Composed Solvers

We take the basic patterns from the previous section and apply them to the nonlinear case. We emphasize that unlike the linear case, the nonlinear case requires the approximate solution as well as the residual to be defined both in the outer solver and in the preconditioner. With this in mind, we will show how composition and preconditioning may be systematically transferred to the nonlinear case.

We use the notation for the action of a nonlinear solver. In most cases, but not always, , that is, is simply the most recent residual. It is also useful to consider the action of a solver that is dependent on the previous approximate solutions and the previous residuals and as . Methods that store and use information from previous iterations such as previous solutions, residuals, or step directions will have those listed in the per-iteration inputs as well.

Nonlinear composition consists of a sequence or series of two (or more) solution methods and , which both provide an approximate solution to (2.1). Nonlinear preconditioning, on the other hand, may be cast as a modification of the residual through application of an inner method . The modified residual is then provided to an outer solver which solves the preconditioned system.

An additive composition may be written as


for weights and . The multiplicative composition is


which simply states: update the solution using the current solution and residual with the first solver and then update the solution again using the resulting new solution and new residual with the second solver.. Nonlinear left-preconditioning may be directly recast from the linear stationary solver case:


which we can rewrite as a fixed-point problem


with nonlinear analog


so the equivalent preconditioned nonlinear problem can be reached by subtracting from both sides, giving


Thus, the left-preconditioned residual is given by


Under the certain circumstances the recast system will have better conditioning and less severe nonlinearities than the original system. The literature provides several examples of nonlinear left-preconditioning: nonlinear successive over relaxation (SOR) has been used in place of linear left-applied SOR [19], additive Schwarz-preconditioned inexact Newton (ASPIN) [16] uses an overlapping nonlinear additive Schwarz method to provide the left-preconditioned problem for a Newton’s method solver. Walker and Ni’s fixed-point preconditioned Anderson mixing [69] uses a similar methodology. Many of these methods or variants thereof are discussed and tested later in this paper.

Nonlinear right-preconditioning111Note that nonlinear right preconditioning is a misnomer because it does not simplify to right linear preconditioning in the linear case. It actually results in the new linear system

. However, as with linear right preconditioning, the inner solver is applied before the function (matrix-vector product in the linear case) evaluation, hence the name.

involves a different recasting of the residual. This time, we treat the nonlinear solver as a nonlinear transformation of the problem to one with solution , as

is a linear transformation of

. Accordingly, the nonlinear system in (2.1) can be rewritten as the solution of the system perturbed by the preconditioner


and the solution by outer solver as


followed by

Nonlinear right-preconditioning may be interpreted as putting preconditioner “within striking distance” of the solution. Once can solve for in (3.8) directly, with an outer solver using residual . However, the combination of inner solve and function evaluation is significantly more expensive than computing and should be avoided. We will show that when is a Newton-Krylov solver, (3.9) is equivalent to (3.2). Considering them as having similar mathematical and algorithmic properties is appropriate in general.

In the special case of Newton’s method, nonlinear right-preconditioning is referred to by Cai [15] and Cai and Li [17] as nonlinear elimination [45]. The idea behind nonlinear elimination is to use a local nonlinear solver to fully resolve difficult localized nonlinearities when they begin to cause difficulties for the global Newton’s method; see Section 4.4. Grid sequencing [41, 65] and pseudo-transient [39] continuation methods work by a similar principle, using precursor solves to put Newton’s method at an initial guess from which it has fast convergence. Alternating a global linear or nonlinear step with local nonlinear steps has been studied as the LATIN method [20, 43, 44]. Full approximation scheme (FAS) preconditioned nonlinear GMRES (NGMRES), discussed below, for recirculating flows [72] is another application of nonlinear right-preconditioning, where the FAS iteration is stabilized and accelerated by constructing a combination of several previous FAS iterates.

For solvers based on a search direction, left-preconditioning is much more natural. In fact, general line searches may be expressed as left-preconditioning by the nonlinear Richardson method. Left-preconditioning also has the property that for problems with poorly scaled residuals, the inner solver may provide a tenable search direction when one could not be found based on the original residual. A major difficulty with the left-preconditioned option is that the function evaluation may be much more expensive. Line searches involving the direct computation of at points along the line may be overly expensive given that the computation of the residual now requires nonlinear solves. Line searches over may also miss stagnation of weak inner solvers and must be monitored. One may also base the line search on the unpreconditioned residual. The line search based on is often recommended [37] and is the “correct” one for general left-preconditioned nonlinear solvers.

Our basic notation for compositions and preconditioning is described in Table 1.

Composition Type Symbol Statement Abbreviation
Additive Composite
Multiplicative Composite
Inner Linearization Inversion
Table 1: Nonlinear compositions and preconditioning given outer and inner solver and .

4 Solvers

We now introduce several algorithms, the details of their implementation and use, and an abstract notion of how they may be composed. We first describe outer solution methods and how composition is applied to them. We then move on to solvers used primarily as inner methods. The distinction is arbitrary but leads to the discussion of decomposition methods in Section 5.

4.1 Line Searches

The most popular strategy for increasing robustness or providing globalization in the solution of nonlinear PDEs is the line search. Given a functional , a starting point , and a direction , we compute . The functional may be or a problem-specific objective function. Theoretical guarantees of convergence may be made for many line search procedures if is a descent direction. In practice, many solvers that do not converge when only full or simply damped steps are used converge well when combined with a line search. Different variants of line searches are appropriate for different solvers. We may organize the line searches into two groups based on a single choice taken in the algorithm: whether the full step is likely to be sufficient (for example, with Newton’s method near the solution) or not. If it is likely to be sufficient, the algorithm should default to taking the full step in a performance-transparent way. If not, and some shortening or lengthening of the step is assumed to be required, the line search begins from the premise that it must determine this scaling factor.

In the numerical solution of nonlinear PDEs, we generally want to guarantee progress in the minimization of at each stage. However, is not , but instead . With Newton’s method, the Jacobian has been computed and iteratively inverted before the line search and may be used in the Jacobian-vector product. Also note that the transpose product is not required for the line search, since the slope of in the direction of step may be expressed as the scalar quantity . A cubic backtracking (BT) line search as described in Dennis and Schnabel [25] is used in conjunction with methods based on Newton’s method in this work. The one modification necessary is that it is modified to act on the optimization problem arising from . BT defaults to taking the full step if that step is sufficient with respect to the Wolfe conditions [74], and does no more work unless necessary. The backtracking line search may stagnate entirely for ill-conditioned Jacobian [68]. For a general step BT is not appropriate for a few reasons. First, steps arising from methods other than Newton’s method are more likely to be ill-scaled, and as such the assumption that the full-step is appropriate is generally invalid. For this reason we only use BT in the case of Newton’s method. Second, there are also cases where we lose many of the other assumptions that make BT appropriate. Most of the algorithms described here, for example, do not necessarily assemble the Jacobian and require many more iterations than does Newton’s method. Jacobian assembly, just to perform the line search, in these cases would become extremely burdensome. This situation removes any possibility of using many of the technologies we could potentially import from optimization, including safeguards and guarantees of minimization. However, in many cases one can still assume that the problem has optimization-like qualities. Suppose that is the gradient of some (hidden) objective functional instead of . Trivially, one may minimize the hidden by finding its critical points, which are roots of

by using a secant method. The resulting critical point (CP) line search is outlined in Alg. 1.

1:procedure CP()
3:     for   do
5:     end for
Alg. 1 CP Line Search

CP differs from BT in that it will not minimize over . However, CP shows much more rapid convergence than does L2 for certain solvers and problems. Both line searches may be started from as an arbitrary or problem-dependent damping parameter, and . In practice, one iteration is usually satisfactory. For highly nonlinear problems, however, overall convergence may be accelerated by a more exact line search corresponding to a small number of iterations. CP has also been suggested for nonlinear conjugate gradient methods [61].

If our assumption of optimization-like qualities in the problem becomes invalid, then we can do little besides attempt to find a minimum of . An iterative secant search for a minimum value of is defined in Alg. 2.

1:procedure L2()
3:     for   do
7:     end for
Alg. 2 L2 Line Search

When converged, L2 is equivalent to an optimal damping in the direction of the residual and will forestall divergence. is calculated by polynomial approximation, requiring two additional residual evaluations per application. In practice and in our numerical experiments the number of inner iterations is .

4.2 Nonlinear Richardson (Nrich)

The nonlinear analogue to the Richardson iteration is merely the simple application of a line search. NRICH takes a step in the negative residual direction and scales that step sufficiently to guarantee convergence. NRICH is known as steepest descent [32] in the optimization context where is the gradient of a functional noted above. NRICH is outlined in Alg. 3.

1:procedure NRICH()
3:      determined by line search
Alg. 3 Nonlinear Richardson Iteration

NRICH is often slow to converge for general problems and stagnates quickly. However, different step directions than may be generated by nonlinear preconditioning and can improve convergence dramatically.

1:procedure NRICH()
3:      determined by line search
Alg. 4 Nonlinear Richardson Iteration: Left-Preconditioned by

As shown in Alg. 4, we replace the original residual equation with and apply NRICH to the new problem. There are two choices for in the line search. The first is based on minimizing the original residual, as in the unpreconditioned case; the second minimizes the norm of the preconditioned residual instead. Minimizing the unpreconditioned residual with a preconditioned step is more likely to stagnate, as there is no guarantee that the preconditioned step is a descent direction with respect to the gradient of .

4.3 Anderson Mixing (Anderson)

ANDERSON[2] constructs a new approximate solution as a combination of several previous approximate solutions and a new trial.

1:procedure ANDERSON( )
3:     minimize over
Alg. 5 Anderson Mixing

In practice, the nonlinear minimization problem in Alg. 5 is simplified. The are computed by considering the linearization


and solving the related linear least squares problem


ANDERSON solves (4.2) by dense factorization. The work presented here uses the SVD-based least-squares solve from LAPACK in order to allow for potential singularity of the system arising from stagnation to be detected outside of the subsystem solve. A number of related methods fall under the broad category of nonlinear series accelerator methods. These include Anderson mixing as stated above. Nonlinear GMRES NGMRES [71] is a variant that includes conditions to avoid stagnation and direct inversion in the iterative subspace (DIIS) [54], which formulates the minimization problem in an alternative fashion. Both right- and left-preconditioning can be applied to Anderson mixing. With right-preconditioning the computation of is replaced with and with . This incurs an extra function evaluation, although this is usually included in the nonlinear preconditioner application. Left-preconditioning can be applied by replacing with . Right-preconditioned NGMRES has been applied for recirculating flows [72] using FAS  and in stabilizing lagged Newton’s methods [18, 59]. Simple preconditioning of NGMRES has also been proposed in the optimization context [22]. Preconditioned Anderson mixing has been leveraged as an outer accelerator for the Picard iteration [47, 70].

We can use the same formulation expressed in (4.2) to determine and (or any number of weights) in (3.1), using the solutions and final residuals from a series of inner nonlinear solvers instead of the sequence of previous solutions. This sort of residual-minimizing technique is generally applicable in additive compositions of solvers. All instances of additive composition in the experiments use this formulation.

4.4 Newton-Krylov Methods (NewtK)

In Newton-Krylov methods, the search direction is determined by inexact iterative inversion of the Jacobian applied to the residual by using a preconditioned Krylov method.

1:procedure NEWTK()
2:      approximate inversion by Krylov method
3:      determined by line search
Alg. 6 Newton-Krylov Method

NEWTK cannot be guaranteed to converge far away from the solution, and it is routinely enhanced by a line search. In our formalism, NEWTK with a line search can be expressed as NRICH left-preconditioned by NEWTK  coming from Alg. 4. However, we will consider line-search globalized NEWTK as the standard, and will omit the outer NRICH when using it, leading to Alg. 6. Other globalizations, such as trust-region methods, are also often used in the context of optimization but will not be covered here.

NEWTK is the general-purpose workhorse of a majority of simulations requiring solution of nonlinear equations. Numerous implementations and variants exist, both in the literature and as software [12, 27]. The general organization of the components of NEWTK is shown in Fig. 1. Note that the vast majority of potential customization occurs at the level of the linear preconditioner and that access to fast solvers is limited to that step.

Figure 1: Organization of a Newton-Krylov solver with potential points of customization. Although one can alter the type, parameters, or tolerances of Newton’s method, the Krylov solver, or the line search, we see that the majority of the potential customization and composition occurs at the level of the preconditioner. Possibilities include the use of algebraic preconditioners, such as direct solvers or sparse factorizations, or domain decomposition, multigrid, or composite solvers, which all have subcomponent solvers and preconditioners. These divisions may either be by subdomain, as with the additive Schwarz methods, or by physics field components, as with block preconditioners (for example, one block for pressures and one for velocities).

Nonlinear right-preconditioning of Newton’s method may take two forms. We describe our choice and comment on the alternative. At first glance, right-preconditioning of Newton’s method requires construction or application of the right-preconditioned Jacobian


and we note that the computation of is generally impractical. Consider the application of nonlinear right preconditioning on an iteration of Newton’s method, which would proceed in two steps:

We may nest these two steps as


and by Taylor expansion one may say that

giving us the simplification

In this form, NEWTK is easily implemented, as shown in Alg. 7.

1:procedure NK()
4:      determined by line search
Alg. 7 Right-Preconditioned Newton-Krylov Method

Note that is merely in this form: one solver runs after the other. The connection between composition and preconditioning in Newton’s method provides an inkling of the advantages afforded to NEWTK as an inner composite solver combined with other methods. An alternative approach applies an approximation to (4.3) directly [4]. The approximation is

which is solved for . Right-preconditioning by the approximation requires an application of the nonlinear preconditioner for every inner Krylov iterate and limits our choices of Krylov solver to those tolerant of nonlinearity, such as flexible GMRES [57].

1:procedure NEWTK()
2:      approximate inversion by Krylov method
3:      determined by line search
Alg. 8 Left-Preconditioned Newton-Krylov Method

For nonlinear left-preconditioning, shown in Alg. 8, we replace the computation with and take the Jacobian of the function as an approximation of


where now the Jacobian is a linearization of and impractical to compute in most cases. In the particular case where the preconditioner is the nonlinear additive Schwarz method (NASM) (Section 5.2), this is known as ASPIN. In the case of NASM preconditioning, one has local block Jacobians, so the approximation of the preconditioned Jacobian as


is used instead. The iteration requires only one inner nonlinear iteration and some small number of block solves, per outer nonlinear iteration. By contrast, direct differencing would require one inner iteration at each inner linear iteration for the purpose of repeated approximate Jacobian application. Note the similarity between ASPIN and the alternative form of right-preconditioning in terms of required components, with ASPIN being more convenient in the end.

4.5 Quasi-Newton (Qn)

The class of methods for nonlinear systems of equations known as quasi-Newton methods [24] uses previous changes in residual and solution or other low-rank expansions [40] to form an approximation of the inverse Jacobian by a series of low rank updates. The approximate Jacobian inverse is then used to compute the search direction. If the update is done directly, it requires storing and updating the dense matrix , which is impractical for large systems. One may overcome this limitation by using limited-memory variants of the method [46, 51, 14, 49], which apply the approximate inverse Jacobian by a series of low-rank updates. The general form of QN methods is similar to that of NEWTK and is shown in Alg. 9.

1:procedure QN()
2:      determined by line search
Alg. 9 Quasi-Newton Update

A popular variant of the method, L-BFGS, can be applied efficiently by a two-loop recursion [51]:

1:procedure ()
3:     for   do
6:     end for
8:     for   do
11:     end for
Alg. 10 Two-Loop Recursion for L-BFGS

Note that the update to the Jacobian in Alg. 10 is symmetric and shares many of the same limitations as nonlinear conjugate gradients (NCG) (Section 4.6). It may be perplexing, from an optimization perspective, to see L-BFGS and NCG applied to nonlinear PDEs as opposed to optimization problems. However, their exclusion from our discussion would be as glaring as leaving conjugate gradient methods out of a discussion of Krylov solvers for linear PDEs. In this paper we adhere to the limitations of NCG and QN and use them only for problems with symmetric Jacobians. In the case where L-BFGS is inapplicable, Broyden’s “good” and “bad” methods [13] have efficient limited memory constructions [14, 26, 29] and are recommended instead [52]. The overall performance of all these methods, however, may rival that of Newton’s method [38] in many cases. QN methods also have deep connections with Anderson mixing [28]. Anderson mixing, and the Broyden methods belong to a general class of Broyden-like methods [29].

The initial approximate inverse Jacobian is often taken as the weighted [60] identity. However, one also can take for lagged Jacobian . Such schemes greatly increase the robustness of lagged Newton’s methods when solving nonlinear PDEs [10]. Left-preconditioning for QN is trivial, and either the preconditioned or unpreconditioned residual may be used with the line search.

4.6 Nonlinear Conjugate Gradients

Nonlinear conjugate gradient methods [31] are a simple extension of the linear CG method but with the optimal step length in the conjugate direction determined by a line search rather than the exact formula that works in the linear case. NCG is outlined in Alg. 11. NCG requires storage of one additional vector for the conjugate direction but has significantly faster convergence than does NRICH for many problems.

1:procedure NCG()
5:      determined by line search
Alg. 11 Nonlinear CG

Several choices exist for constructing the parameter [21, 30, 31, 35]. We choose the Polak-Ribière-Polyak [53] variant. The application of nonlinear left-preconditioning is straightforward. Right-preconditioning is not conveniently applicable as is the case with QN in Section 4.5 and linear CG.

NCG has limited applicability because it suffers from the same issues as its linear cousin for problems with non-symmetric Jacobian. A common practice when solving nonlinear PDEs with NCG is to rephrase the problem in terms of the normal equations, which involves finding the root of . Since the Jacobian must be assembled and multiplied at every iteration, however, the normal equation solver is not a reasonable algorithmic choice, even with drastic lagging of Jacobian assembly. The normal equations also have a much worse condition number than does the original PDE. In our experiments, we use the conjugacy-ensuring CP line search.

5 Decomposition Solvers

An extremely important class of methods for linear problems is based on domain or hierarchical decomposition, and so we consider nonlinear variants of domain decomposition and multilevel algorithms. We introduce three different nonlinear solver algorithms based on point-block solves, local subdomain solves, and coarse-grid solves.

These methods require a trade-off between the amount of computation dedicated to solving local or coarse problems and the communication for global assembly and convergence monitoring. Undersolving the subproblems wastes the communication overhead of the outer iteration, and oversolving exacerbates issues of load imbalance and quickly has diminishing returns with respect to convergence. The solvers in this section exhibit a variety of features related to these trade-offs.

The decomposition solvers do not guarantee convergence. While they might converge, the obvious extension is to use them in conjunction with the global solvers as nonlinear preconditioners or accelerators. As the decomposition solvers expose more possibilities for parallelism or acceleration, their effective use in the nonlinear context should provide similar benefits to the analogous solvers in the linear context. A major disadvantage of these solvers is that each of them requires additional information about the local problems, a decomposition, or hierarchy of discretizations of the domain.

5.1 Gauss-Seidel-Newton (Gsn)

Effective methods may be constructed by exact solves on subproblems. Suppose that the problem easily decomposes into small block subproblems. If Newton’s method is applied multiplicatively by blocks, the resulting algorithm is known as Gauss-Seidel-Newton and is shown in Alg. 12. Similar methods are commonly used as nonlinear smoothers [33, 34]. To construct GSN, we define the individual block Jacobians and residuals ; , which restricts to a block; and , which injects the solution to that block back into the overall solution. The point-block solver runs until the norm of the block residual is less than or steps have been taken.

1:procedure GSN()
3:     for   do
5:         while  and  do
7:               direct inversion of block Jacobian
9:         end while
11:     end for
Alg. 12 Gauss-Seidel-Newton


 solves small subproblems with a fair amount of computational work per degree of freedom and thus has high arithmetic intensity. It is typically greater than two times the work per degree of freedom of

NRICH for scalar problems, and even more for vector problems where the local Newton solve is over several local degrees of freedom.

When used as a solver, GSN requires one reduction per iteration. However, stationary solvers, in both the linear and nonlinear case, do not converge robustly for large problems. Accordingly, GSN would typically be used as a preconditioner, making monitoring of global convergence, within the GSN iterations, unnecessary. Thus, in practice there is no synchronization; GSN is most easily implemented with multiplicative update on serial subproblems and additively in parallel.

5.2 Nonlinear Additive Schwarz Method

GSN can be an efficient underlying kernel for sequential nonlinear solvers. For parallel computing, however, an additive method with overlapping subproblems has desirable properties with respect to communication and arithmetic intensity. The nonlinear additive Schwarz methods allow for medium-sized subproblems to be solved with a general method, and the corrections from that method summed into a global search direction. Here we limit ourselves to decomposition by subdomain rather than splitting the problem into fields. While eliminating fields might be tempting, the construction of effective methods of this sort is significantly more involved than in the linear case, and many interesting problems do not have such a decomposition readily available. Using subdomain problems , restrictions , injections , and solvers , Alg. 13 outlines the NASM solver with subdomains.

1:procedure NASM()
2:     for   do
6:     end for
Alg. 13 Nonlinear Additive Schwarz

Two choices exist for the injections in the overlapping regions. We will use NASM to denote the variant that uses overlapping injection corresponding to the whole subproblem step. The second variant, restricted additive Schwarz (RAS), injects the step in a non-overlapping fashion. The subdomain solvers are typically Newton-Krylov, but the other methods described in this paper are also applicable.

5.3 Full Approximation Scheme

FAS [6] accelerates convergence by advancing the nonlinear solution on a series of coarse rediscretizations of the problem. As with standard linear multigrid, the cycle may be constructed either additively or multiplicatively, with multiplicative being more effective in terms of per-iteration convergence. Additive approaches provide the usual advantage of allowing the coarse-grid corrections to be computed in parallel. We do not consider additive FAS in this paper.

Given the smoother at each level, as well as restriction (), prolongation () and injection () operators, and the coarse nonlinear function , the FAS V-cycle takes the form shown in Alg. 14.

1:procedure FAS()
Alg. 14 Full Approximation Scheme

The difference between FAS and linear multigrid is the construction of the coarse RHS . In FAS  it is guaranteed that if an exact solution to the fine problem is found,


The correction is not necessary in the case of linear multigrid. However, FAS is mathematically identical to standard multigrid when applied to a linear problem.

The stellar algorithmic performance of linear multigrid methods is well documented, but the arithmetic intensity may be low. FAS presents an interesting alternative to , since it may be configured with high arithmetic intensity operations at all levels. The smoothers are themselves nonlinear solution methods. FAS-type methods are applicable to optimization problems as well as nonlinear PDEs [8], with optimization methods used as smoothers [50].

5.4 Summary

We have now introduced the mathematical construction of nonlinear composed solvers. We have also described two classes of nonlinear solvers, based on solving either the global problem or some partition of it. The next step is to describe a set of test problems with different limitations with respect to which methods will work for them, construct a series of instructive example solvers using composition of the previously defined solvers, and test them. We have made an effort to create flexible and robust software for composed nonlinear solvers. The organization of the software is in the spirit of PETSc and includes several interchangeable component solvers, including NEWTK; iterative solvers such as ANDERSON and QN that may contain an inner preconditioner; decomposition solvers such as NASM and FAS, which are built out of subdomain solvers; and metasolvers implementing compositions . A diagram enumerating the composed solver framework analogous to that of the preconditioned NEWTK case is shown in Fig. 2.

Figure 2: Organization of the components of a composed nonlinear solver. We discard the difference between solvers and preconditioners and see that the nesting and potential for recursive customization lives at every level of the tree. Possibilities include iterative solvers (including Newton’s method) with nonlinear preconditioning, composite solvers consisting of several subsolvers, or decomposition solvers consisting of subdomain or coarse nonlinear solvers.

6 Experiments

We will demonstrate the efficacy nonlinear composition and preconditioning by experiments with a suite of nonlinear partial differential equations that show interesting behavior in regimes with difficult nonlinearities. These problems are nonlinear elasticity, the lid-driven cavity with buoyancy, and the -Laplacian.

One goal of this paper is to be instructive. We first try to solve the problem efficiently with the standard solvers, then choose a subset of the above solvers for each problem and show how to gain advantage by using composition methods. Limitations of discretization or problem regime are noted, and we discuss how the solvers may be used under these limitations. We also explain how readers may run the examples in this paper and experiment with the solvers both on the test examples shown here and on their own problems. We feel we have no “skin in the game” and are not trying to show that some approaches are better or worse than others.

6.1 Methods

Our set of test algorithms is defined by the preconditioning of an iterative solver with a decomposition solver, or the composition of two solvers with different advantages. In the case of the decomposition solvers, we choose one or two configurations per test problem, in order to avoid the combinatorial explosion of potential methods that already is apparent from the two forms of composition combined with the multitude of methods.

Our primary measure of performance is time to solution, which is both problem and equipment dependent. Since it depends strongly on the relative cost of function evaluation, Jacobian assembly, matrix multiplication, and subproblem or coarse solve, we also record number of nonlinear iterations, linear iterations, linear preconditioner applications, function evaluations, Jacobian evaluations, and nonlinear preconditioner applications. These measures allow us to characterize efficiency disparities in terms of major units of computational work.

We make a good faith effort to tune each method for performance while keeping subsolver parameters invariant. Occasionally, we err on the side of oversolving for the inner solvers, to make the apples-to-apples comparison more consistent between closely related methods. The results here should be taken as a rough guide to improving solver performance and robustness. The test machine is a 64-core AMD Opteron 6274-based [1] machine with 128 GB of memory. The problems are run with one MPI process per core. Plots of convergence are generated with Matplotlib [36], and plots of solutions are generated with Mayavi [56].

6.2 Nonlinear Elasticity

A Galerkin formulation for nonlinear elasticity may be stated as


for all test functions ;

is the deformation gradient. We use the Saint Venant-Kirchhoff model of hyperelasticity with second Piola-Kirchhoff stress tensor

for Lagrangian Green strain and Lamé parameters and , which may be derived from a given Young’s modulus and Poisson ratio. We solve for displacements , given a constant load vector imposed on the structure.

The domain is a cylindrical arch of inner radius m. Homogeneous Dirichlet boundary conditions are imposed on the outer edge of the ends of the arch. The goal is to “snap through” the arch, causing it to sag under the load rather than merely compressing it. To cause the sag, we set and set the Lamé constants consistent with a Young’s modulus of 100 and a Poisson ratio of 0.2. The nonlinearity is highly activated during the process of snapping through and may be tuned to present a great deal of difficulty to traditional nonlinear solvers. The problem is discretized by using hexahedral finite elements on a logically structured grid, deformed to form the arch. The grid is 401x9x9, and therefore the problem has 97,443 degrees of freedom. The problem is a three-dimensional extension of a problem put forth by Wriggers [75]. The unstressed and converged solutions are shown in Fig. 3.

Figure 3: Unstressed and stressed configurations for the elasticity test problem. Coloration indicates vertical displacement in meters.

For 3D hexahedral FEM discretizations, GSN and related algorithms are inefficient because for each degree of freedom each element in the support of the degree of freedom must be visited, resulting in eight visits to an interior element per sweep. Instead, we focus on algorithms requiring only a single visit to each cell, restricting us to function and Jacobian evaluations. Such approaches are also available in the general unstructured FEM case, and these experiments may guide users in regimes where no decomposition solvers are available.

For the experiments, we emphasize the role that nonlinear composition and series acceleration may play in the acceleration of nonlinear solvers. The problem is amenable to NCG, ANDERSON, and NEWTK with an algebraic multigrid preconditioner. We test a number of combinations of these solvers. Even though we have a logically structured grid, we approach the problem as if no reasonable grid hierarchy or domain decomposition were available. The solver combinations we use are listed in Table 2. In all the following tables, Solver denotes outer solver, LPC denotes the linear PC when applicable, NPC denotes the nonlinear PC when applicable, Side denotes the type of preconditioning, Smooth denotes the level smoothers used in the multilevel method (MG/FAS), and LS denotes line search.

Name Solver LPC NPC Side Smooth LS
Table 2: Series of solvers for the nonlinear elasticity test problem.

For the NEWTK methods, we precondition the inner GMRES solve with a smoothed aggregation algebraic multigrid method provided by the GAMG package in PETSc. The relative tolerance for the GMRES solve is . For all instances of the NCG solver, the CP line search initial guess for is the final value from the previous nonlinear iteration. In the case of the inner line search is L2. NGMRES stores up to 30 previous solutions and residuals for all experiments. The outer line search for is a second-order secant approximation rather than first-order as depicted in Alg. 1. For the composite examples, 10 iterations of NCG are used as one of the subsolvers, denoted . The additive composition ’s weights are determined by the ANDERSON minimization mechanism as described in Section 4.3.

The example, SNES ex16.c, can be run directly by using a default PETSc installation. The command line used for these experiments is

./ex16 -da_grid_x 401 -da_grid_y 9 -da_grid_z 9 -height 3 -width 3
  -rad 100 -young 100 -poisson 0.2 -loading -1 -ploading 0
Figure 4: and NCG convergence.
Solver T N. It L. It Func Jac PC NPC
23.43 27 1556 91 27 1618
NCG 53.05 4495 0 8991
Table 3: and NCG results.

The solvers shown in Table 3 converge slowly. takes 27 nonlinear iterations and 1,618 multigrid V-cycles to reach convergence. NCG requires 8,991 function evaluations and since it is unpreconditioned, scales unacceptably with problem size with respect to number of iterations. Note in Fig. 4 that while NCG takes an initial jump and then decreases at a constant rate, is near stagnation until the end, when it suddenly drops into the basin of attraction.

Figure 5: and convergence.
Solver T N. It L. It Func Jac PC NPC
14.92 9 459 218 9 479
16.34 11 458 251 11 477
Table 4: and results.

Results for composition are listed in Table 4. Additive and multiplicative composite combination provides roughly similar speedups for the problem, with more total outer iterations in the additive case. As shown in Fig. 5, neither the additive nor multiplicative methods stagnate. After the same initial jump and convergence at the pace of NCG, the basin of attraction is reached, and the entire iteration converges quadratically as should be expected with NEWTK.

Figure 6: and convergence.
Solver T N. It L. It Func Jac PC NPC
9.65 13 523 53 13 548 13
9.84 13 529 53 13 554 13
Table 5: and results.

Left-preconditioning of NCG and ANDERSON with NEWTKMG provides even greater benefits, as shown in Fig. 6 and Table 5. The number of iterations is nearly halved from the unpreconditioned case, and the number of linear preconditioner applications is only slightly increased compared with the composition cases. In Fig. 6, we see that the Newton’s method-preconditioned nonlinear Krylov solvers never dive as NEWTK did but maintain rapid, mostly constant convergence instead. This constant convergence is a significantly more efficient path to solution than the solver alone.

6.3 Driven Cavity

The driven cavity formulation used for the next set of experiments can be stated as


The fluid motion is driven in part by a moving lid and in part by buoyancy. A Grashof number , Prandtl number , and lid velocity of 100 are used in these experiments. No-slip, rigid-wall Dirichlet conditions are imposed for . Dirichlet conditions are used for , based on the definition of vorticity, , where along each constant coordinate boundary the tangential derivative is zero. Dirichlet conditions are used for on the left and right walls, and insulating homogeneous Neumann conditions are used on the top and bottom walls. A finite-difference approximation with the usual 5-point stencil is used to discretize the boundary value problem in order to obtain a nonlinear system of equations. Upwinding is used for the divergence (convective) terms and central differencing for the gradient (source) terms.

In these experiments, we emphasize the use of ANDERSON, MG or FAS, and GSN. The solver combinations are listed in Table 6. The multilevel methods use a simple series of structured grids, with the smallest being 5x5 and the largest being 257x257. With four fields per grid point, we have an overall system size of 264,196 unknowns. In , the Jacobian is rediscretized on each level. GSN is used as the smoother for FAS and solves the four-component block problem corresponding to (6.2) per grid point in a simple sweep through the processor local part of the domain.

The example, SNES ex19.c, can be run directly by using a default PETSc installation. The command line used for these experiments is

  ./ex19 -da_refine 6 -da_grid_x 5 -da_grid_y 5 -grashof 2e4 -lidvelocity 100 -prandtl 1.0

and the converged solution using these options is shown in Fig. 7.

Name Solver LPC NPC Side MG/FASSmooth. LS
Table 6: Solvers for the lid driven cavity problem.

All linearized problems in the NEWTK iteration are solved by using GMRES with geometric multigrid preconditioning to a relative tolerance of . There are five levels, with ChebychevSOR smoothers on each level. For FAS, the smoother is GSN(5). The coarse-level smoother is five iterations of NEWTK-LU; chosen for robustness. In the case of and , the step is damped by one half. Note that acting alone, this damping would cut the rate of convergence to linear with a rate constant . The ideal step size is recovered by the minimization procedure. is undamped in all other tests.

Figure 7: , , and temperature profiles for the lid driven cavity solution.

In these experiments we emphasize the total number of V-cycles, linear and nonlinear, since they dominate the runtime and contain the lion’s share of the communication and floating-point operations.

Figure 8: and convergence.
Solver T N. It L. It Func Jac PC NPC
7.48 10 220 21 50 231 10
9.83 17 352 34 85 370
Table 7: and results.

In Table 7, we see that Newton’s method converges in 17 iterations, with 370 V-cycles. Using NGMRES instead of a line search provides some benefit, as takes 231 V-cycles and 10 iterations.

Figure 9: , , and FAS convergence.
Solver T N. It L. It Func Jac PC NPC
1.91 24 0 447 83 166 24
3.20 50 0 1180 192 384 50
FAS 6.23 162 0 2382 377 754
Table 8: , , and FAS results.

takes 50 V-cycles, at the expense of three more fine-level function evaluations per iteration. reduces the number of V-cycles to 24 at the expense of more communication.

Figure 10: and convergence.
Solver T N. It L. It Func Jac PC NPC
4.01 5 80 103 45 125
8.07 10 197 232 90 288
Table 9: and results.

Composed nonlinear methods are shown in Table 9. Multiplicative composition consisting of FAS and NEWTKMG reduces the total number of V-cycles to 130. Additive composition using the least-squares minimization is less effective, taking 298 V-cycles. Note in Fig. 10 that both the additive and multiplicative solvers show that combining FAS and NEWTK-MG may speed solution, with multiplicative combination being significantly more effective.

6.4 Tuning the Solvers to Obtain Convergence

We now show how the composed and preconditioned solves may be tuned for more difficult nonlinearities where the basic methods fail to converge using the same model problem. For Grashof number and Prandtl number , Newton’s method converges well:

lid velocity = 100, prandtl # = 1, grashof # = 10000
  0 SNES Function norm 715.271
  1 SNES Function norm 623.41
  2 SNES Function norm 510.225
  6 SNES Function norm 0.269179
  7 SNES Function norm 0.00110921
  8 SNES Function norm 1.12763e-09
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 8
Number of SNES iterations = 8

For higher Grashof number, Newton’s method stagnates

./ex19 -lidvelocity 100 -grashof 5e4 -da_refine 4 -pc_type lu -snes_monitor_short -snes_converged_reason
lid velocity = 100, prandtl # = 1, grashof # = 50000
  0 SNES Function norm 1228.95
  1 SNES Function norm 1132.29
 29 SNES Function norm 580.937
 30 SNES Function norm 580.899

It also fails with the standard continuation strategy from coarser meshes (not shown). We next try nonlinear multigrid, in the hope that multiple updates from a coarse solution are sufficient, using GS as the nonlinear smoother,

./ex19 -lidvelocity 100 -grashof 5e4 -da_refine 4 -snes_monitor_short -snes_converged_reason -snes_type fas
       -fas_levels_snes_type ngs -fas_levels_snes_max_it 6 -snes_max_it 25
lid velocity = 100, prandtl # = 1, grashof # = 50000
  0 SNES Function norm 1228.95
  1 SNES Function norm 574.793
  2 SNES Function norm 513.02
  3 SNES Function norm 216.721
  4 SNES Function norm 85.949
  5 SNES Function norm 108.24
  6 SNES Function norm 207.469
 22 SNES Function norm 131.866
 23 SNES Function norm 114.817
 24 SNES Function norm 71.4699
 25 SNES Function norm 63.5413

But the residual norm just jumps around and the method does not converge. We then accelerate the method with ANDERSON and obtain convergence.

./ex19 -lidvelocity 100 -grashof 5e4 -da_refine 4 -snes_monitor_short -snes_converged_reason -snes_type anderson
       -npc_snes_max_it 1 -npc_snes_type fas  -npc_fas_levels_snes_type ngs -npc_fas_levels_snes_max_it 6
lid vel