1 Introduction
Interior methods for nonlinear optimization [52, 7, 49, 17] are essential in many areas of science and engineering. They are commonly used in model predictive control with applications to robotics [43], autonomous cars [53], aerial vehicles [21], combustion engines [22], and heating, ventilation, and air-conditioning systems [25], to name a few. Interior methods are also used in public health policy strategy development [3, 42], data fitting in physics [36], genome science [4, 50], and many other areas.
Most of the computational cost within interior methods is in solving linear systems of Karush–Kuhn–Tucker (KKT) type [29]. The linear systems are sparse, symmetric indefinite, and usually ill-conditioned and difficult to solve. Furthermore, implementations of interior methods for nonlinear optimization, such as the filter line-search approach in Ipopt [52] and HiOp [34], typically expect the linear solver to provide the matrix inertia (number of positive, negative, and zero eigenvalues) to determine whether the system should be regularized. (Otherwise, interior methods perform curvature tests to ensure descent in a certain merit function [10].) Relatively few linear solvers are equipped to solve KKT systems, and even fewer can run those computations on hardware accelerators such as graphics processing units (GPUs) [45]. At the time of writing, six out of the ten most powerful computers in the world have more than 90% of their compute power in hardware accelerators [47]. Hardware accelerator technologies are becoming ubiquitous in off-the-shelf products as well. To take advantage of these emerging technologies, it is necessary to develop fine-grain parallelization techniques tailored for high-throughput devices such as GPUs.
For such sparse problems, pivoting becomes extremely expensive, as data management takes a large fraction of the total time compared to computation [14]. Unfortunately, LDL factorization is unstable without pivoting. This is why LDL approaches, typically used by interior methods for nonlinear problems on CPU-based platforms [46], have not performed as well on hardware accelerators [45]. Some iterative methods, such as MINRES [30] for general symmetric matrices, can make efficient (albeit memory-bandwidth-limited) use of GPUs because they only require matrix-vector multiplications at each iteration, which can be highly optimized, but they have limited efficiency when the number of iterations becomes large. Another approach for better-conditioned KKT systems is a modified version of the preconditioned conjugate gradient method (PCG) with implicit-factorization preconditioning [12]. In our case, the ill-conditioned nature of our linear problems means that iterative methods alone are not practical [30, 35].

We propose a hybrid direct-iterative method for solving KKT systems that is suitable for execution on hardware accelerators. The method only requires direct solves using a Cholesky factorization, as opposed to LDL, which means it avoids pivoting. We provide preliminary test results that show the practicality of our approach. Our test cases are generated by optimal power flow analysis [8, 23, 27] applied to realistic power grid models that resemble actual grids but do not contain any proprietary data [26]. These systems are extracted from optimal power flow analysis using Ipopt [52] with MA57 as its linear solver. Solving such sequences of linear problems gives us insight into how our linear solver behaves within an interior method. Using these test cases allowed us to assess the practicality of our hybrid approach without interfacing the linear solver with an optimization solver. Power grids are representative of the very sparse and irregular systems commonly found in engineering disciplines.
The paper is organized as follows. Table 1 defines our notation. Section 2 describes the optimization problem being solved. Section 3 defines the linear systems that arise when an interior method is applied to the optimization problem. In Section 4, we derive a novel hybrid direct-iterative algorithm to utilize the block structure of the linear systems, and prove convergence properties for the algorithm. Numerical tests in Section 5 show the accuracy of our algorithm on realistic systems, using a range of algorithmic parameter values. Section 6 compares our C and CUDA implementation to MA57 [13]. Section 7 explains our decision to use a direct solver in the inner loop of our algorithm. In Section 8, we summarize our main contributions and results. Appendix A provides supplemental figures for Section 5.
Variable | Properties | Functions | Meaning
symmetric matrix | , | largest, smallest (most negative) eigenvalues
SPSD matrix | | the smallest nonzero eigenvalue
SPD matrix | | condition number
rectangular matrix | null() | null space
vector, | A diagonal matrix ,
vector | a vector of 1s
2 Nonlinear optimization problem
We consider constrained nonlinear optimization problems of the form equationparentequation
(1a)  
s.t.  (1b)  
(1c)  
(1d) 
where is a vector of optimization parameters, is a possibly nonconvex objective function, defines equality constraints, and defines inequality constraints. (Problems with more general inequalities can be treated in the same way.) Functions , and are assumed to be twice continuously differentiable. Interior methods enforce bound constraints (1d) by adding barrier functions to the objective (1a):
where the inequality constraints (1c) are treated as equality constraints with slack variables . The barrier parameter is reduced toward zero using a continuation method to obtain a solution that is close to the solution of (1) to within a solver tolerance.
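A standard log-barrier subproblem of this kind, written with assumed notation (f for the objective, c_E and c_I for the equality and inequality constraints, s for the slacks, and mu for the barrier parameter; the paper's exact symbols were lost in extraction), is:

```latex
\begin{aligned}
\min_{x,\,s} \quad & f(x) \;-\; \mu \sum_{i} \ln x_i \;-\; \mu \sum_{j} \ln s_j \\
\text{s.t.} \quad & c_E(x) = 0, \qquad c_I(x) - s = 0 .
\end{aligned}
```

As the barrier parameter is driven toward zero, the minimizers of these subproblems trace a path that approaches a solution of (1).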
Interior methods are most effective when exact first and second derivatives are available, as we assume for , , and . We define and as the sparse Jacobians for the constraints. The solution of a barrier subproblem satisfies the nonlinear equations equationparentequation
(2a)  
(2b)  
(2c)  
(2d)  
(2e)  
(2f) 
where and are primal variables, and are Lagrange multipliers (dual variables) for constraints (2c)–(2d), and and are Lagrange multipliers for the bounds and . The conditions , , , and are maintained throughout, and the matrices and are SPD.
Analogously to [52], at each continuation step in we solve nonlinear equations (2) using a variant of Newton’s method. Typically and are eliminated from the linearized version of (2) by substituting the linearized versions of (2e) and (2f) into the linearized versions of (2a) and (2b), respectively, to obtain a smaller symmetric problem. Newton’s method then calls the linear solver to solve a series of linearized systems , , of block form
(3) 
where index denotes optimization solver iteration (including continuation steps in and Newton iterations), each is a KKT-type matrix (with saddle-point structure), vector is a search direction for the primal and dual variables (search directions are defined such that for some line-search steplength ), and is derived from the residual vector for (2) evaluated at the current values of the primal and dual variables (with as the method converges):
With and , the sparse Hessian
and diagonal are matrices, is a diagonal matrix, is a sparse matrix, and is a sparse matrix. We define , , and .
Interior methods may take hundreds of iterations (but typically not thousands) before they converge to a solution. All matrices have the same sparsity pattern, and their nonzero entries change slowly with . An interior method can exploit this by reusing output from linear solver functions across multiple iterations :
- Ordering and symbolic factorization are needed only once, because the sparsity pattern is the same for all .
- Numerical factorizations can be reused over several adjacent Newton iterations, e.g., when an inexact Newton solver is used within the optimization algorithm.
- Operations such as triangular solves have to be executed at each iteration .
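This reuse pattern can be sketched as follows. The dense Cholesky helper and the tiny matrix are illustrative stand-ins only; a real implementation would use a sparse factorization with a fill-reducing ordering computed once up front:

```python
import math

def cholesky(A):
    """Dense Cholesky A = L L^T (illustrative stand-in for a sparse factorization)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def triangular_solves(L, b):
    """Solve L y = b, then L^T x = y: the cheap per-iteration work."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# One numeric factorization serves several "Newton iterations":
# only the triangular solves are repeated for each new right-hand side.
A = [[4.0, 1.0], [1.0, 3.0]]
L = cholesky(A)                       # factorization (reused)
solutions = [triangular_solves(L, b)  # per-iteration triangular solves
             for b in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

The factorization cost is paid once per matrix, while each subsequent right-hand side costs only two triangular solves.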
The workflow of the optimization solver with calls to different linear solver functions is shown in Fig. 1 (where denotes the linear system to be solved at each iteration). The main optimization solver loop is the top feedback loop in Fig. 1. It is repeated until the solution is optimal or a limit on optimization iterations is reached. At each iteration, the residual vector is updated. Advanced implementations have control features to ensure stability and convergence of the optimization solver. The lower feedback loop in Fig. 1 shows linear system regularization by adding a diagonal perturbation to the KKT matrix. One such perturbation removes singularity [52, Sec. 3.1], which happens if there are redundant constraints. The linear solver could take advantage of algorithm control like this and request matrix perturbation when beneficial.
3 Solving KKT linear systems
LDL factorization via MA57 [13] has been used effectively for extremely sparse problems on traditional CPU-based platforms, but is not suitable for the fine-grain parallelization required for GPU acceleration. Parallel and GPU-accelerated direct solver implementations such as SuperLU [24, 2], STRUMPACK [38, 44], and PaStiX [19, 31] exist for general symmetric indefinite systems (although the first two are designed for general systems), but these software packages are designed to take advantage of dense blocks that appear in denser problems and do not perform well on our systems of interest, which do not yield such dense blocks [45, 18].
The fundamental issue with using GPUs for LDL is that this factorization is not stable without pivoting [16]. Pivoting requires considerable data movement and, as a result, a substantial part of the run time is devoted to memory access and communication. Any gains from the hardware acceleration of floating-point operations are usually outweighed by the overhead of permuting the system matrix during pivoting. This is especially burdensome because both rows and columns need to be permuted to preserve symmetry [14]. While either permutation can be performed efficiently on its own with an appropriate data structure for the system's sparse matrix (i.e., compressed sparse row storage for row permutations and, analogously, compressed sparse column storage for column permutations), swapping both rows and columns simultaneously is necessarily costly.
Here we propose a method that uses sparse Cholesky factorization (in particular, a GPU implementation). Cholesky factorization is advantageous for a GPU implementation because it is stable without pivoting and can use GPUs efficiently compared with LDL [37]. Furthermore, the ordering of the unknowns (also known as the symbolic factorization) for the purpose of reducing fill-in in the factors can be established without considering the numerical values, and only once at the beginning of the optimization process; hence its considerable computational cost is amortized over the optimization iterations.
To make the problem smaller, we can eliminate and from (3) to obtain the system [32, Sec. 3.1]
(4) 
where . This reduction requires blockwise Gaussian elimination with block pivot , which is illconditioned when has large elements, as it ultimately does. Thus, system (4) is smaller but more illconditioned. After solving (4), we compute and in turn to obtain the solution of (3).
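The block-wise elimination can be illustrated on a small dense saddle-point system. The names A, B, D below are generic stand-ins for the blocks being eliminated, not the paper's exact notation, and the dense solver is for verification only:

```python
def dense_solve(M, rhs):
    """Gauss-Jordan solve with partial pivoting (verification helper only)."""
    n = len(M)
    T = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(n):
            if r != c:
                f = T[r][c] / T[c][c]
                T[r] = [a - f * b for a, b in zip(T[r], T[c])]
    return [T[i][n] / T[i][i] for i in range(n)]

# Saddle-point system  [A  B^T; B  -D] [x; y] = [a; b]  with D diagonal SPD.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 2.0]]
D = [[2.0]]
a, b = [1.0, 2.0], [3.0]

# Eliminate y:  (A + B^T D^{-1} B) x = a + B^T D^{-1} b,  then  y = D^{-1}(B x - b).
Dinv_b = [b[0] / D[0][0]]
A_red = [[A[i][j] + B[0][i] * B[0][j] / D[0][0] for j in range(2)] for i in range(2)]
rhs_red = [a[i] + B[0][i] * Dinv_b[0] for i in range(2)]
x = dense_solve(A_red, rhs_red)
y = [(B[0][0] * x[0] + B[0][1] * x[1] - b[0]) / D[0][0]]

# The reduced solve reproduces the solution of the full system.
K = [[4.0, 1.0, 1.0], [1.0, 3.0, 2.0], [1.0, 2.0, -2.0]]
full = dense_solve(K, a + b)
```

The reduced system is smaller, but (as noted above) inherits ill-conditioning from the eliminated block pivot when that block has large entries.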
4 A block system solution method
Let be any SPD matrix. Multiplying the second row of (4) by and adding it to the first row gives a system of the form
(5) 
where . The simplest choice is with :
(6) 
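The row operation behind this transformation can be checked numerically on a toy saddle-point system. Here gamma plays the role of the augmentation parameter, H and J are generic stand-ins for the blocks, and D = gamma*I is the simple choice just mentioned; the key point is that the solution is unchanged while the (1,1) block becomes SPD:

```python
def dense_solve(M, rhs):
    """Gauss-Jordan solve with partial pivoting (verification helper only)."""
    n = len(M)
    T = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(n):
            if r != c:
                f = T[r][c] / T[c][c]
                T[r] = [a - f * b for a, b in zip(T[r], T[c])]
    return [T[i][n] / T[i][i] for i in range(n)]

gamma = 10.0
H = [[2.0, 0.0], [0.0, -1.0]]          # indefinite (1,1) block
J = [[1.0, 1.0]]
r1, r2 = [1.0, 2.0], [3.0]

# Original system  [H  J^T; J  0] [dx; dy] = [r1; r2].
K = [[2.0, 0.0, 1.0], [0.0, -1.0, 1.0], [1.0, 1.0, 0.0]]
sol = dense_solve(K, r1 + r2)

# Augmented system: add gamma * J^T times the second block row to the first.
# The (1,1) block becomes H + gamma * J^T J (SPD here); the solution is unchanged.
Hg = [[H[i][j] + gamma * J[0][i] * J[0][j] for j in range(2)] for i in range(2)]
Kg = [[Hg[0][0], Hg[0][1], 1.0], [Hg[1][0], Hg[1][1], 1.0], [1.0, 1.0, 0.0]]
r1g = [r1[i] + gamma * J[0][i] * r2[0] for i in range(2)]
sol_g = dense_solve(Kg, r1g + r2)
```

With gamma = 10 the augmented (1,1) block is [[12, 10], [10, 9]], which is SPD, while both systems yield the same solution.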
When is SPD, its Schur complement is well defined, and (5) is equivalent to
(7)  
(8) 
This is the approach of Golub and Greif [15] for saddle-point systems (which have the structure of the system in (4)). Golub and Greif found experimentally that made SPD and better conditioned than smaller or larger values. We show in Theorems 2 and 4 that for large , the condition number increases as , but converges to as . Corollary 1 shows there is a finite value of that minimizes . This value is probably close to .

Our contribution is to combine the system reduction in [32] (from (3) to (4)) with the method of [15] for changing (4) to (5), to solve an optimization problem consisting of a series of block systems using a GPU implementation of sparse Cholesky factorization applied to . A new method for regularizing is added, and practical choices for parameters are given based on scaled systems. Also, important convergence properties of the method are proven in Theorems 1–4.
If (5) or are poorly conditioned, the only viable option may be to ignore the block structure in (5) and solve (3) with an LDL factorization such as MA57 (likely without help from GPUs). This is the fail-safe option. Otherwise, we require to be SPD (for large enough ) and use its Cholesky factorization to apply the conjugate gradient method (CG) [20] or MINRES [30] to (7), with or without a preconditioner. If the part of the QR factors of is not too dense, we could use as a multiplicative preconditioner. This gives the exact solution if the RHS is orthogonal to null() or is square, which is not possible in our case. In our experiments, solutions with the preconditioner lagged slightly behind those without it, but both take iterations. We proceed without a preconditioner.
4.1 A hybrid solver with minimal regularization
Typically, (4) starts with an SPD and a full-rank . As the optimization solver iterations progress, may become indefinite and 's rank may shrink (at least numerically). This means the system becomes singular and must be regularized. We seek a small regularization to avoid changing the solution of the equivalent system (5) too much.
An SPD guarantees that is SPD on null(), a requirement at the solution of the optimization problem.
Theorem 1.
For large enough () and full row rank , is SPD iff is positive definite on null().
Proof.
Assume is SPD. For any nonzero vector in null(), we have (this direction of the proof does not require to have full row rank).
Conversely, assume is positive definite on null() and has full row rank. For any nonzero vector with in null() and orthogonal to null(),
We have by assumption. Further,
if . ∎
We work with . We use SPD as a proxy for being SPSD on null(), keeping in mind that if it does not hold even for a very large in (6), is singular and needs to be modified. However, cannot be made arbitrarily large without increasing when is rectangular, as in our case. There must be an ideal intermediate value of .
Theorem 2.
If has full row rank with more columns than rows, is symmetric and positive definite on null(), and , there exists such that for , increases linearly with .
Proof.
Corollary 1.
Among s such that is SPD (), is minimized for some .
In practice, the optimizer may provide systems where is not SPD on null(). In this case we can regularize by using instead. Unlike , the introduction of changes the solution of the system, so it is essential to keep as small as possible. If is not SPD, we set , a parameter for some minimum value of regularization. If is still not SPD, we double until it is. This ensures we use the minimal value of regularization (to within a factor of 2) needed to make SPD.
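The doubling strategy can be sketched as follows. Here `try_cholesky` stands in for the sparse factorization attempt on the GPU, and the names `delta_min` and `delta_max` are illustrative parameters, not the paper's exact symbols:

```python
import math

def try_cholesky(A):
    """Attempt a dense Cholesky factorization; return L, or None if A is not SPD."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None          # nonpositive pivot: not SPD
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def regularize(H, delta_min=1e-8, delta_max=1e-2):
    """Double delta until H + delta*I admits a Cholesky factorization."""
    if try_cholesky(H) is not None:
        return 0.0                       # no regularization needed
    delta = delta_min
    while delta <= delta_max:
        n = len(H)
        Hd = [[H[i][j] + (delta if i == j else 0.0) for j in range(n)]
              for i in range(n)]
        if try_cholesky(Hd) is not None:
            return delta                 # minimal (within a factor of 2) delta
        delta *= 2.0
    raise RuntimeError("delta exceeded delta_max; fall back to LDL or report failure")
```

For example, a matrix with a negative eigenvalue of about -1e-6 is first made SPD at delta near 1.28e-6, i.e., within a factor of 2 of the smallest workable value.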
If proves to be large, which can happen before we are close to a solution, it is essential for the optimizer to be informed to allow it to modify the next linear system. When the optimizer nears a solution, will not need to be large.
In our tests, starts at for the next matrix in the sequence, but is set back to its previous value if the factorization fails.
We also set , the maximum allowed before we resort to LDL factorization of or return to the optimization solver.
If has low row rank, in (7) is SPSD. In this case, CG will succeed if (7) is consistent. Otherwise, it will encounter a nearzero quadratic form. We then restart CG on the regularized system . In this way, regularizes the block and regularizes the block.
To ensure we can judge the size of parameters and relative to system (5), we first scale (5) with a symmetrized version of Ruiz scaling [40].
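A symmetrized Ruiz iteration repeatedly scales each row and column by the square root of its infinity norm until all norms are near 1. This sketch assumes a small dense symmetric matrix; the production code would operate on the sparse system:

```python
def ruiz_scale(A, iters=30):
    """Symmetric Ruiz equilibration: returns M = D A D and the diagonal d of D,
    with the infinity norm of every row/column of M driven toward 1."""
    n = len(A)
    d = [1.0] * n
    M = [row[:] for row in A]
    for _ in range(iters):
        s = [max(abs(v) for v in M[i]) ** 0.5 for i in range(n)]
        for i in range(n):
            for j in range(n):
                M[i][j] /= s[i] * s[j]
        for i in range(n):
            d[i] /= s[i]
    return M, d

# A badly scaled symmetric matrix: entries spanning four orders of magnitude.
M, d = ruiz_scale([[100.0, 1.0], [1.0, 0.01]])
```

After scaling, every row of M has infinity norm close to 1, so thresholds on the regularization and augmentation parameters become meaningful relative to the matrix entries.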
Algorithm 1 is a generalization of the Uzawa iteration [48] for KKT systems with a (1,1) block that is not necessarily positive definite. It gives a method for solving a sequence of systems {(7)–(8)} with used in the calculation of . The workflow is similar to Fig. 1 except only is factorized. On lines 16–19, is applied by direct triangular solves with the Cholesky factors , of . Each iteration of the CG solve on line 17 requires multiplication by , applying , multiplying by , and adding a scalar multiple of the original vector. Section 7 shows why complete Cholesky factorization of was chosen.
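The structure of the iterative solve can be sketched as follows: CG runs on the Schur complement S = J K^{-1} J^T, and each matrix-vector product with S uses the cached Cholesky factor of the (1,1) block. All names and the toy data are illustrative stand-ins for the paper's sparse blocks:

```python
import math

def cholesky(A):
    """Dense Cholesky A = L L^T (stand-in for the sparse GPU factorization)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Two triangular solves: L y = b, then L^T x = y."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def cg(matvec, b, tol=1e-12, maxit=100):
    """Plain conjugate gradient on an SPD operator given as a matvec."""
    x = [0.0] * len(b)
    r, p = b[:], b[:]
    rs = sum(v * v for v in r)
    for _ in range(maxit):
        if math.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

K = [[4.0, 1.0], [1.0, 3.0]]     # SPD (1,1) block, factorized once
J = [[1.0, 2.0]]                 # constraint Jacobian (1 x 2)
L = cholesky(K)

def schur_matvec(p):
    """Apply S = J K^{-1} J^T: J^T product, two triangular solves, J product."""
    t = [J[0][j] * p[0] for j in range(2)]
    z = chol_solve(L, t)
    return [J[0][0] * z[0] + J[0][1] * z[1]]

y = cg(schur_matvec, [3.0])      # solve S y = 3 without ever forming S
```

S is never formed explicitly: each CG iteration costs one product with J^T, one pair of triangular solves with the cached factor, and one product with J.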
The rest of the section discusses other important bounds that must obey. However, selecting an ideal in practice is difficult and requires problem heuristics (like in [16]) or trial and error.

4.2 Guaranteed descent direction
Optimization solvers typically use an LDL factorization to solve the system (3) at each step because (with minimal extra computation) it supplies the inertia of the matrix. A typical optimization approach treats each of the four blocks of (3) as one block, accounting for the possible regularization applied to . That is, the (1,1) block is a Hessian of dimension , and the (2,1) block represents the constraint Jacobian and has dimensions . The inertia being implies that (a) is uniformly SPD on null() for all , meaning for all vectors satisfying , iterations , and some positive constant ; and (b) the KKT matrix in (3) is nonsingular. Together these conditions are sufficient to ensure a descent direction and fast convergence in the optimization algorithm [52, 29]. We show that Algorithm 1 ensures properties (a) and (b) without computing the inertia of the KKT matrix.
Theorem 3.
If Algorithm 1 succeeds with , it provides a descent direction for the interior method for large enough .
Proof.
By construction, is uniformly SPD (because the Cholesky factorization was successful for all iterations of the solver until this point) and the KKT system in (5) is nonsingular (because has full row rank, or was added to to shift it from SPSD to SPD). Therefore, Algorithm 1 provides a descent direction for (5) even if regularization is added. The rest of the proof assumes , and we return to the other case later. Since has full row rank, is nonsingular by assumption, all block rows in (3) have full rank internally, and Gaussian elimination cannot cause cancellation of an entire block row, we conclude that (3) is nonsingular. Let (for used in Algorithm 1). For any nonzero vector with and of dimensions and ,
where . If , then , which means and the proof is complete (with ). Otherwise, . So for large enough , . Applying Theorem 1 to (3) with and replacing and shows that is positive definite on null(). ∎
We note that corresponds to the so-called primal regularization of the filter line-search algorithm [52]. Under this algorithm, whenever becomes too large, one can invoke the feasibility restoration phase of the filter line-search algorithm [52] as an alternative to performing the LDL factorization on the CPU. Feasibility restoration "restarts" the optimization at a point with more favorable numerical properties. We also note that when is sufficiently large, the curvature test used in [10] should be satisfied. Hence, inertia-free interior methods have the global convergence property without introducing other regularization from the outer loop.
The regularization is a numerical remedy for a low-rank , caused by redundant equality constraints. This regularization is similar to the so-called dual regularization used in Ipopt [52] and specifically addresses the issue of rank deficiency; however, there is no direct analogue from a regularization in (5) to Ipopt's dual regularization in (3), and neither of the two heuristics guarantees a descent direction for (3). Given the similarity between the two heuristics, we believe that the regularization can be effective within the filter line-search algorithm.
When Algorithm 1 is integrated with a nonlinear optimization solver such as [51] or [33], the while loop (Line 5) in Algorithm 1 can be removed and the computation of and can be decided by the optimization algorithm. The development of robust heuristics that allow Algorithm 1's and regularization within the filter line-search algorithm will be the subject of future work.
4.3 Convergence for large
In Section 4.1 we showed that is SPD for large enough , and that should not be so large that the low-rank term makes ill-conditioned. Here we show that, in order to decrease the number of CG iterations, it is beneficial to increase beyond what is needed for to be SPD.
Theorem 4.
In exact arithmetic, for , the eigenvalues of converge to with an error term that decays as .
Proof.
By definition,
(9) 
Since is nonsingular and has full row rank by assumption, the Searle identity [41] with and gives
For , . The identity , which applies when , gives
(10) 
∎
A different proof of an equivalent result is given by Benzi and Liu [5, Lemma 3.1].
Corollary 2.
The eigenvalues of are well clustered for , and the iterative solve in Algorithm 1 line 16 converges quickly.
5 Practicality demonstration
We demonstrate the practicality of Algorithm 1 using five series of linear problems [26] generated by the Ipopt solver performing optimal power flow analysis on the power grid models summarized in Table 2. We compare our results with a direct solve via MA57’s LDL factorization [13].
Name | | nnz()
South Carolina grid | 56K | 411K
Illinois grid | 4.64K | 21.6K
Texas grid | 55.7K | 268K
US Western Interconnection grid | 238K | 1.11M
US Eastern Interconnection grid | 1.64M | 7.67M
5.1 selection
We use Algorithm 1 with CG as the iterative method on line 16. A larger in may improve CG convergence but make more illconditioned and increase the error in the solution of (3). We therefore run some preliminary tests to find a suitable before selecting and testing other problems, and we require CG to solve accurately (stopping tolerance for the relative residual norm). We start with one of the smaller problems, the Illinois grid, to eliminate some values of and see if the condition number of for each matrix in the sequence is useful in determining the needed iterations or regularization. We run these tests with .
Figure 2(a) shows that for the Illinois grid sequence, values of give CG convergence in approximately 10 iterations for every matrix in the sequence. For the last matrix we found that eigenvalues of in (7) are very well clustered and the condition number of is , as shown in Fig. 2(b). This guarantees and explains the rapid convergence because for CG applied to a general SPD system ,
(11) 
where is the error in an approximate solution at iteration [16].
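The bound in (11) is the classical CG estimate in the energy norm, and its rapid decay for well-conditioned systems is easy to evaluate directly (the condition numbers below are illustrative values, not the paper's measurements):

```python
import math

def cg_error_bound(kappa, i):
    """Relative A-norm error bound for CG after i iterations:
    ||e_i||_A / ||e_0||_A <= 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**i."""
    r = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return 2.0 * r ** i

# A well-clustered spectrum (small kappa) makes the bound shrink quickly;
# compare the bound after 10 iterations for several condition numbers.
bounds = {kappa: cg_error_bound(kappa, 10) for kappa in (2.0, 16.0, 1e4)}
```

The bound is pessimistic when eigenvalues are clustered (CG can converge much faster than the condition number alone suggests), but it captures the observed trend: a small condition number of the Schur complement yields convergence in roughly ten iterations.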
For the last few matrices, is the only value requiring . The final value of was . No important information was gleaned from . For all other values of , for the whole sequence. For a system and an approximate solution , we define the backward error BE as and the relative residual RR as . As is common practice, we use to estimate , which is too expensive to calculate directly. It always provides an upper bound for , and in practice is quite close to the actual value. Note that MA57 always has a BE of order machine precision. Figure 3 shows the (a) BE and (b) RR for system (3) for varying . Results for the BE and RR of system (4) are not qualitatively different and are given in Appendix A. One conclusion is that increasing to reduce CG iterations can be costly for the accuracy of the solution of the full system. Based on the results of this section, in the range – gives reasonable CG iterations and final accuracy. For other matrices, we present a selected in this range that produced the best results.

5.2 Results for larger matrices
Solving larger (and perhaps more poorly conditioned) problems brings about new computational challenges and limits the amount of time any particular task can take. We wish to set small enough to avoid overregularizing the problem, and large enough to eliminate wasteful iterations and numerical issues. We want small enough to recognize that we have overregularized (and should try a different method), but large enough to allow for reasonable regularization. In our numerical tests, we use and large enough so that can increase until is SPD. This informs the parameter selection for the next system.
Figure 4 shows that the South Carolina grid matrices as currently constructed cannot benefit from this method. They need to make SPD, which on a scaled problem means as much weight is given to the regularization as to the actual problem. Algorithm 1 succeeds on the other matrix sequences, at least for certain values of , and needs no regularization ().
For the US Western Interconnection grid, Fig. 5(a) shows a CG convergence graph and (b) shows several types of error. For the US Eastern Interconnection grid, Fig. 6(a) shows a CG convergence graph and (b) shows several types of error. Figures for the Texas grid are given in Appendix A, as they do not provide more qualitative insight. Convergence occurs for all matrix sequences in less than iterations on average. The BE for (3) is consistently less than and, with two exceptions in the US Western Interconnection grid, is close to machine precision. Results for the US Eastern Interconnection grid show that the method does not deteriorate with problem size, but rather there are some irregular matrices in the US Western Interconnection grid.
The results in this section suggest that in the range down to is reasonable for any . There is no clear choice for , but a plausible value would be . This way we are guaranteed that the regularization does not take over the problem, and the number of failed factorizations is limited to , which should be negligible in the total solution time for a series of problems.
5.3 Reordering
The efficiency of sparse Cholesky factorization depends greatly on the row/column ordering defined by permutation . Figure 7 compares the sparsity of , corresponding to of matrix 22 in the Illinois grid sequence, obtained from two choices of : approximate minimum degree (AMD) and nested dissection. (The data in this section are generated via Matlab.) We see that AMD produces a sparser ( nonzeros vs. ).
Reverse Cuthill–McKee and no ordering gave and nonzeros, respectively. Recall that the sparsity structure is identical for matrices from the same family and similar for other matrix families. As expected, AMD produced the sparsest ordering tested for the other matrix families as well. Thus, AMD is our reordering of choice. It is a one-time cost incurred during the optimization problem setup.
6 Comparison with LDL
We compare our method with a direct solve of (3) using MA57’s LDL factorization [13] with default settings. All testing in this section is done using a prototype C/CUDA code on a single GPU device. Reference MA57 solutions were computed on a CPU. Further details on computational platforms used are given in Table 3.
Machine name  Host processor  Accelerator device  Host compiler  Device compiler 

Newell  IBM Power9  NVIDIA Volta 100  GCC 7.4.0  CUDA 10.2 
Deception  AMD EPYC 7502  NVIDIA Ampere 100  GCC 7.5.0  CUDA 11.1 
The factorization density is . We define analogously for the Cholesky factors of in (5), with and dimension . Note that gives the average number of nonzeros in the factorization per row or column. Table 4 shows that the Cholesky factor of is usually less dense than the LDL factor for (3), even though (3) is sparser than (5). Table 5 shows the solve times on the Newell computing cluster [28]. The main trend is that as the problem size increases, GPUs using cuSolver [1] increasingly outperform equivalent computations on a CPU.
Abbreviation  

Illinois  4.64K  94.7K  20.4  2.28K  34.9K  15.3 
Texas  55.7K  2.95M  52.9  25.9K  645K  24.9 
Western US  238K  10.7M  44.8  116K  2.23M  19.2 
Eastern US  1.64M  85.4M  52.1  794K  17.7M  22.3 
Name  MA57  CM CPU  CM GPU  CS CPU  CS GPU 

Illinois  
Texas  
Western US  
Eastern US 
Name  MA57  Hybrid DirectIterative Solver  

Analysis  Factorization  Total solves  
Illinois  
Texas  
Western US  
Eastern US 
Supernodal Cholesky via CHOLMOD in SuiteSparse [9] does not perform well on GPUs for these test cases, but performs better than cuSolver on CPUs. This matches the literature showing that multifrontal and supernodal approaches are not suitable for very sparse and irregular systems, where the dense blocks become too small, leading to an unfavorable ratio of communication to computation [11, 6, 18]. This issue is exacerbated when supernodal or multifrontal approaches are used for fine-grain parallelization on GPUs [45]. Our method's advantage grows as the ratio of LDL to Cholesky factorization time grows, because factorization is the most costly part of these linear solvers and our method performs more (but smaller and less costly) system solves.
Table 6 compares a direct solve of (3) using MA57’s LDL factorization [13] and the full CG solve and direct solves on {(7)–(8)}, broken down into symbolic analysis of , factorization of , and CG on (7) on Deception [28].
When Cholesky is used, symbolic analysis is needed only for the first matrix in the sequence because pivoting is not a concern. As problems grow larger, the solve phase becomes a smaller part of the total run time. Also, our method increasingly outperforms MA57. The run time is reduced by a factor of more than on one matrix from the US Eastern Interconnection grid and more than on the whole series, because the analysis cost can be amortized over the entire optimization problem. This motivates using our method for even larger problems.
Another advantage of our method is that it solves systems that are less than half the size of the original one, though it does have to solve more of them. Notably, the LDL factorization may require pivoting during the factorization, whereas Cholesky does not. With MA57, all our test cases required substantial permutations even with a lax pivot tolerance of 0.01 (and a tolerance as large as 0.5 may be required to keep the factorization stable). This means that for our systems, LDL factorization requires considerable communication and presents a major barrier for GPU programming. On the other hand, the direct solve is generally more accurate by 2–3 orders of magnitude.
7 Iterative vs. direct solve with in Algorithm 1
We may ask whether forming and its Cholesky factorization is worthwhile when it could be avoided by iterative solves with . Systems (7)–(8) require two solves with and an iterative solve with , which includes "inner" solves with until CG converges. As we saw in Section 5, between 6 and 92 solves with are needed in our cases (and possibly more in other cases). Further, inner iterative solves with would degrade the accuracy compared with inner direct solves, or would require an excessive number of iterations. Therefore, for iterative solves with to be viable, the direct solve would have to cause substantial densification of the problem (i.e., the Cholesky factor would have to be very dense). Let be the number of multiplications when is applied as an operator, and the number of multiplications for solving systems with . These values (generated in Matlab) and their ratio are given in Table 7. The ratio is always small and does not grow with problem size, meaning remains very sparse and the factorization is efficient. As the factorization dominates the total time of a direct solve with multiple right-hand sides, this suggests that performing multiple inner iterative solves is not worthwhile.
Name  ratio  

Illinois  20.5K  34.9K  1.70 
Texas  249K  646K  2.59 
Western US  1.05M  2.23M  2.11 
Eastern US  7.23M  17.7M  2.45 
8 Summary
Following the approach of Golub and Greif [15