1 The problem
We are interested in iterative methods for solving the convex quadratic optimization problem

    min_{x in R^n} q(x) := (1/2) x^T A x - b^T x,    (1.1)

where A is an n x n symmetric positive definite matrix and b a given vector in R^n. Such problems are at the centre of efficient methods for a large variety of domains in applied mathematics, the most prominent being large-scale numerical nonlinear optimization and the solution of large symmetric positive-definite linear systems (often derived from applications in partial differential equations). It is thus critical to make the solution of (1.1) as efficient as possible, especially when the problem size grows. Since the cost of most iterative methods for solving (1.1) is often dominated by the (potentially many) computations of products of the form Av for some vector v, it is of interest to investigate whether efficiency gains may be obtained for this 'core' operation. This is the object of this paper.
Two different but converging contexts motivate the analysis presented here. The first is the frequent occurrence of problems involving a hierarchy of model representations, themselves resulting in the ability to use approximate versions of A to compute the product Av. This occurs for instance in discretized applications, possibly in a multi-resolution framework, or in inexactly weighted linear and nonlinear least-squares where the product itself is obtained by applying an iterative procedure. (Our starting point was a nonlinear weighted least-squares problem occurring in large-scale data assimilation for weather forecasting, where the inverse of the weighting matrix cannot be computed; its use thus requires the iterative solution of an innermost linear system.) The second is the increasing importance of computations in multi-precision arithmetic on the new generations of high-performance computers (see  and the many references therein), in which the use of varying levels of accuracy is seen as a key ingredient in obtaining state-of-the-art energy-efficient computer architectures. In both cases, using 'inexact' products (while controlling their inexactness) within Krylov-based iterative methods is a natural option. In what follows, we focus on the analysis of this choice from the point of view of ensuring a prescribed decrease in the objective function q.
Although using inexact products in Krylov-based iterative methods has already been investigated (see [17, 18, 19, 10] for example), the proposed analyses typically focus on the Euclidean norm of the residual, but none of them, to the best of our knowledge, considers its effect on assessing decrease in the objective function of the associated optimization problem. This point of view is however important. In optimization, monitoring the evolution of the objective function or of its model is an obvious concern: ensuring a fraction of the optimal decrease is, for instance, a standard convergence argument in trust-region or inexact Newton methods. In applications arising from elliptic partial differential equations, several authors have argued that monitoring the energy norm of the error leads to better subproblem termination rules, avoiding under- or over-solving (see [1, 3, 5, 2]).
As it turns out, monitoring the residual of the linear system

    Ax = b    (1.2)

provides a handle for monitoring the error in the quadratic q, provided it is considered in the appropriate norm. Indeed, if x_* is the solution of (1.2) and r = b - Ax, then

    q(x) - q(x_*) = (1/2) ||r||_{A^{-1}}^2.

This approach however requires, first, that r or a sufficiently good approximation thereof is available and, second, that its A^{-1}-norm can be estimated, both of which are non-trivial if the products with A are inexact.
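The identity above is easy to check numerically. The following Python sketch (purely illustrative, using a randomly generated SPD matrix rather than anything from the paper) verifies that the error in the quadratic equals half the squared A^{-1}-norm of the residual.

```python
import numpy as np

# Numerical check of q(x) - q(x_*) = (1/2) ||b - A x||_{A^{-1}}^2
# for a small random SPD system.
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD by construction
b = rng.standard_normal(n)

def q(x):
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)        # exact minimizer of q
x = rng.standard_normal(n)            # arbitrary trial point
r = b - A @ x                         # residual of the linear system
gap = q(x) - q(x_star)
dual_norm_sq = r @ np.linalg.solve(A, r)   # ||r||_{A^{-1}}^2
print(abs(gap - 0.5 * dual_norm_sq))       # ~ 0 up to rounding
```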
The contributions of this paper are the derivation of theoretical residual bounds ensuring that the error on q remains suitably small in the presence of inexact products (Section 2), the translation of these results into computable estimates and the definition of the associated algorithms (Section 3), and the demonstration, in Section 4, that very significant efficiency gains can be obtained by the use of these algorithms, both in the case where the accuracy of the products can be varied continuously and in the case where it is restricted to prescribed levels (as is the case in multi-precision arithmetic). We finally provide some conclusions and perspectives in Section 5.
Notations. In the sequel of this paper, ||.|| denotes the standard Euclidean norm for vectors and its induced (spectral) norm for matrices. In addition, if M is symmetric positive definite and x is a vector, ||x||_M = (x^T M x)^{1/2}. The dual norm of ||.||_M with respect to the Euclidean inner product is the norm ||.||_{M^{-1}}. Tr(M) is the trace of the matrix M and e_i is the i-th vector of the canonical basis of R^n.
2 A rigorous error analysis
Standard Krylov-based methods (we will consider here the full orthogonalization method (FOM)  and the conjugate-gradients (CG) algorithm  with and without reorthogonalization, both initialized with the zero vector, i.e. x_0 = 0 and r_0 = b) do provide recurrences for the residual r_k = b - A x_k, where x_k is the approximate solution at iteration k. However, these recurrences rely on the assumption that the products with A computed in the course of the FOM or CG iterations are exact. In our context, where we aim to use inexact products, we therefore need to bound the residual gap, that is the difference between the recurred residual within FOM or CG at iteration k and the true residual b - A x_k, where x_k is the approximate solution at this iteration. If we manage to do this and, at the same time, make the computed residual small (as is typical in the application of FOM or CG), we then obtain the following desirable property.
Lemma 2.1Suppose that, at some iterate of either FOM or CG, one has that for any
Thus the decrease in the quadratic obtained at this iterate is at least a prescribed fraction of the maximum obtainable decrease. This is exactly the type of result we wish for if we are to terminate the quadratic minimization, for instance in a trust-region context (see [6, Theorem 6.3.5]). Because we expect FOM or CG to make the computed residual small eventually, the rest of the section is devoted to analyzing how to enforce the part of (2.1) related to the residual gap.
Because this last condition measures the residual gap in the A^{-1}-norm (i.e. a norm in the dual space), it is natural to use the A-norm in the primal space and to consider the primal-dual matrix norm defined by
We will thus use this norm to measure the size of the backward error made on the matrix-vector product. Note for future reference that
We first derive a useful observation.
Lemma 2.2 Let A be a symmetric and positive definite matrix and E be any symmetric perturbation. Then, if ||E|| < 1 in the norm (2.2), the matrix A + E is symmetric positive definite.
Proof. Assume that ||E|| < 1. From the definition of the norm we deduce that

    ||A^{-1/2} E A^{-1/2}||_2 = ||E|| < 1,

which shows that the eigenvalues of A^{-1/2} E A^{-1/2} lie in (-1, 1). Using

    A + E = A^{1/2} (I + A^{-1/2} E A^{-1/2}) A^{1/2},

we obtain that I + A^{-1/2} E A^{-1/2} is symmetric and positive definite. Sylvester's inertia theorem then yields that A + E is symmetric and positive definite, which completes the proof.
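Lemma 2.2 can be illustrated numerically. The sketch below assumes the norm in (2.2) is the primal-dual norm ||E|| = ||A^{-1/2} E A^{-1/2}||_2 (an assumption consistent with the proof above, not a quotation of the paper's definition); it scales a random symmetric perturbation so this norm equals 0.9 < 1 and checks that A + E remains positive definite.

```python
import numpy as np

# Illustration of Lemma 2.2 under the assumed primal-dual norm
# ||E|| = ||A^{-1/2} E A^{-1/2}||_2: if ||E|| < 1 then A + E stays SPD.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # SPD by construction

# Random symmetric perturbation, rescaled so its primal-dual norm is 0.9.
S = rng.standard_normal((n, n))
S = 0.5 * (S + S.T)
w, V = np.linalg.eigh(A)
A_inv_half = V @ np.diag(w ** -0.5) @ V.T      # A^{-1/2}
norm_E = np.linalg.norm(A_inv_half @ S @ A_inv_half, 2)
E = 0.9 * S / norm_E

eigs = np.linalg.eigvalsh(A + E)
print(eigs.min() > 0)    # True: A + E is symmetric positive definite
```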
2.1 Inexact FOM
We first focus on deriving conditions on the FOM algorithm for which the analysis is simpler. A first version of the inexact FOM algorithm (initialized with the zero vector) can be stated as follows.
In this description, we use the notation . It is also convenient to define
holds. Note that, again for , the approximation , the residual and , the estimated value of the quadratic objective at , are not directly recurred by FOM, but can easily be retrieved from
Note that, unless no error is made in the products, might differ from , just as may differ from . Also note that, because of Step 9 of the algorithm, and thus Step 10 branches out of the outer loop as soon as is small enough. When the error matrices are nonzero, it is easy to verify that (2.8) becomes
We now derive conditions on the error matrices which ensure that the relative residual gap (measured in dual norm) remains suitably small.
Lemma 2.3Let and let be a positive vector such that
because of the definition of in Step 9 of FOM. Introducing and , one has that
It then follows that
where is the first vector of the canonical basis of . Thus, using the second part of (2.9),
The Cauchy-Schwarz inequality then implies that
Now the definition of in (2.13) gives that
We now combine all the above results in a single theorem.
2.2 Inexact CG
We now turn to the conjugate gradient algorithm and derive an analog of Theorem 2.4 for this case. We first state the algorithm itself.
Observe that, at variance with FOM, the values of the iterates and recurred residuals are explicitly available. The value of the quadratic may again be retrieved from (2.9).
We next restate, for clarity, a simple result relating the residual gap to the error matrices for the conjugate gradient algorithm (see ).
Lemma 2.5The residual gap in the inexact CG satisfies
Proof. We proceed inductively. Observe that and . Suppose now that the result is true for iterations . We then have that
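The residual gap described in Lemma 2.5 is easy to observe in practice. The Python sketch below (illustrative only; the random perturbation model and all names are hypothetical, not the paper's error model) runs CG with perturbed products and compares the recurred residual with the true residual b - A x_k.

```python
import numpy as np

# CG where each product A p is replaced by (A + E_k) p for a small random
# symmetric E_k; the recurred residual then drifts away from b - A x_k.
rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # SPD by construction
b = rng.standard_normal(n)

def noisy_product(p, eta):
    """Return (A + E_k) p for a random symmetric E_k with ||E_k||_2 = eta."""
    S = rng.standard_normal((n, n))
    S = 0.5 * (S + S.T)
    return (A + eta * S / np.linalg.norm(S, 2)) @ p

x = np.zeros(n)
r = b.copy()                          # recurred residual
p = r.copy()
for k in range(25):
    Ap = noisy_product(p, 1e-4)
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap            # residual recurrence (now inexact)
    p = r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new

gap = np.linalg.norm((b - A @ x) - r)  # residual gap after 25 iterations
print(gap > 0)                          # the recurrence no longer matches the truth
```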
We are now in a position to derive suitable bounds on the error matrices.
Lemma 2.6Let and let be a positive vector satisfying (2.12). Suppose furthermore that
As above, we summarize the results for the CG algorithm.
Some comments are in order at this stage.
The weights appearing in (2.13) and (2.16) may be considered as an "error management strategy", in that they potentially allow for mitigating the direct effect of the residual norm in the definition of the error bound. A simple choice is to make them all equal, which obviously satisfies (2.12). We will show below that they can be used to further advantage.
2.3 Bounding the error of the computed quadratic value
Neither the FOM nor the CG algorithm updates recursively the value of the quadratic objective function at the current iterate. However, tracking this value may be important in practice (as we will see in Section 3), and, if there is no error in the products with A, this can be done easily without computing additional products with A. Indeed, since the next iterate minimizes q along the current search direction, one verifies that
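With exact products, one standard identity of this kind for CG (offered here as a plausible reading, not necessarily the exact formula used above) is q(x_{k+1}) = q(x_k) - (1/2) alpha_k ||r_k||^2, a consequence of the CG property r_k^T p_k = r_k^T r_k. The Python sketch below confirms it numerically.

```python
import numpy as np

# Track the quadratic value along exact CG via the recurrence
# q(x_{k+1}) = q(x_k) - (1/2) * alpha_k * ||r_k||^2 and compare it
# with direct evaluation of q at each iterate.
rng = np.random.default_rng(3)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # SPD by construction
b = rng.standard_normal(n)

def q(x):
    return 0.5 * x @ A @ x - b @ x

x = np.zeros(n)
r = b.copy()
p = r.copy()
q_rec = q(x)                          # recurred quadratic value
max_err = 0.0
for k in range(15):
    Ap = A @ p                        # exact product
    alpha = (r @ r) / (p @ Ap)
    q_rec -= 0.5 * alpha * (r @ r)    # update without evaluating q directly
    x = x + alpha * p
    r_new = r - alpha * Ap
    p = r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
    max_err = max(max_err, abs(q_rec - q(x)))

print(max_err)   # agrees with direct evaluation up to rounding
```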
In the presence of errors on , this property may no longer hold, and it is of interest to analyze how much differs from . This is important if the decrease in the quadratic objective function is used for other purposes, as is the case, for instance, in trust-region methods where this decrease is a key ingredient in the management of the trust-region radius (see [6, Chapter 6]).
In order to provide an error estimate, we first prove the following backward error property.
Lemma 2.8Let be a symmetric positive-definite matrix and a vector in . Then, for any ,
such that, by construction and
we successively used (2.21), the Cauchy-Schwarz inequality, Lemma 2.8, the Cauchy-Schwarz inequality again and (2.3). But, using the triangle inequality, Lemma 2.3 or (2.19), (2.3) and (2.15), we obtain that
We summarize the above discussion in the following theorem.
3 Computable bounds and practical inexact algorithms
The first two comments at the end of Section 2.2 might, at first sight, suggest that the above approach has to remain of purely theoretical interest. However, the potential of allowing inexact products revealed by (2.13) and (2.16) prompts us to estimate the unavailable quantities, from which practical algorithms may then be derived. We consider these questions in this section, and demonstrate in the next section that substantial benefits may result.
3.1 Managing the inaccuracy budget
An important ingredient of a computable inexact FOM or CG algorithm is the choice of the weights in (2.13) or (2.16). We note that (2.12) limits the sum of the inverses of these weights over all iterations preceding termination. Of course, choosing them all equal to the problem dimension is adequate (assuming FOM or CG will not use more than that many iterations!), but is often suboptimal. Indeed, the expected number of iterations to termination is very commonly, for not too ill-conditioned or correctly preconditioned problems, much lower than the problem's dimension. This can be exploited to make the weights smaller, for instance by choosing them equal to this expected iteration number. This last number may be obtained from the maximum number of iterations for FOM or CG typically imposed by the user, or from
where the two estimates approximate the largest and smallest eigenvalues of A, respectively (see [8, Theorem 10.2.6] for example). Moreover, it is possible to adjust the weights adaptively in the case where the user is able to specify a bound on the committed error when it is smaller than the allowed one. As we will see below, this typically happens when the accuracy of the product cannot be chosen continuously, but is restricted to prescribed levels (such as arithmetic precision). Suppose therefore that, besides the product, one also knows a bound on the error actually committed. We may then decide to add the (potential) "unused inaccuracy" to the available inaccuracy budget for the remaining iterations, thereby allowing larger subsequent errors. Indeed, knowing this bound, one may first compute the root of the associated (linear) equation, that is
and then reset
In practice, this allows maintaining only single running values for these quantities as the iterations proceed.
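The budget-reallocation idea can be sketched as follows, under the assumption that condition (2.12) has the form sum_k 1/omega_k <= 1 (the symbol omega_k and all names in the code are illustrative, not taken from the paper): start from a weight equal to an expected iteration count and, whenever the error actually committed is smaller than the allowed bound, return the unused fraction of the consumed budget to the pool for later iterations.

```python
# Illustrative inaccuracy-budget bookkeeping, assuming (2.12) reads
# sum_k 1/omega_k <= 1. Names (k_est, allowed, committed) are hypothetical.

def returned_slack(omega, allowed, committed):
    """Fraction of the 1/omega budget share returned to the pool when the
    committed error is smaller than the allowed bound (committed <= allowed)."""
    used_fraction = committed / allowed        # in [0, 1]
    return (1.0 - used_fraction) / omega       # unused part of this share

k_est = 20                 # expected number of iterations
omega = float(k_est)       # uniform weight: each iteration may spend 1/omega
budget = 1.0               # total inaccuracy budget from (2.12)
for k in range(5):
    budget -= 1.0 / omega                                  # nominal consumption
    budget += returned_slack(omega, allowed=1e-3, committed=2.5e-4)

print(budget)   # 0.9375: only a quarter of each share was actually spent
```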
We now attempt to estimate the unavailable quantities in the theoretical FOM and CG algorithms. We first consider that the required dual norm may not be available from the application context and note that, from (2.7),
so that a bound on
can be used provided one knows (an approximation of) the smallest eigenvalue of A. We also have to estimate this norm for a variety of vectors (those generated by FOM and by CG, respectively). We can also use the fact that
to derive suitable bounds. However, for ill-conditioned problems, these bounds are likely to be very loose. Another approach is to choose
as an approximation (this would be the average value for vectors with random independent components). Finding an estimate of the optimal quadratic value is slightly more difficult, since this is the value of the quadratic at the solution, the very quantity we seek to compute. As it turns out (despite the weak bound of Theorem 2.9), the best available approximation is the absolute value of the quadratic at the current iterate, and thus we choose
(When this value is zero, which occurs at the first iteration of both FOM and CG, we choose an even more approximate value.) Finally, (2.13) requires an approximation of a further quantity which, unfortunately, depends on the matrix at termination of FOM; its estimation from the current one turns out to be quite poor. We have therefore chosen the safe option to use
and (2.16) by
(The formulae are similar for in FOM and in CG, with replaced by ).
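The averaging argument invoked above can be made concrete under one natural reading (an assumption on our part, not necessarily the paper's exact estimate): for a vector v with independent zero-mean, unit-variance components, E[||v||_{A^{-1}}^2] = Tr(A^{-1}) while E[||v||^2] = n, suggesting the approximation ||v||_{A^{-1}} ~ sqrt(Tr(A^{-1})/n) ||v||. The Monte-Carlo sketch below checks the underlying expectation.

```python
import numpy as np

# Monte-Carlo check that E[v^T A^{-1} v] = Tr(A^{-1}) for v with
# independent standard-normal components (illustrative assumption).
rng = np.random.default_rng(4)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # SPD by construction
A_inv = np.linalg.inv(A)

samples = 20000
V = rng.standard_normal((samples, n))
# v_i^T A^{-1} v_i for each sample row, averaged over all samples
mean_dual_sq = np.mean(np.einsum('ij,jk,ik->i', V, A_inv, V))
rel_err = abs(mean_dual_sq - np.trace(A_inv)) / np.trace(A_inv)
print(rel_err)   # small relative error
```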