1 The problem
We are interested in iterative methods for solving the convex quadratic optimization problem
(1.1) $\min_{x \in \mathbb{R}^n} q(x) := \tfrac{1}{2}\, x^T A x - b^T x,$
where $A$ is an $n \times n$ symmetric positive definite matrix and $b$ a given vector in $\mathbb{R}^n$. Such problems are at the centre of efficient methods for a large variety of domains in applied mathematics, the most prominent being large-scale numerical nonlinear optimization and the solution of large symmetric positive-definite linear systems (often derived from applications in partial differential equations). It is thus critical to make the solution of (1.1) as efficient as possible, especially when the problem size grows. Since the cost of most iterative methods for solving (1.1) is often dominated by the (potentially many) computations of products of the form $Av$ for some vector $v$, it is of interest to investigate whether efficiency gains may be obtained for this 'core' operation. This is the object of this paper.
Two different but converging contexts motivate the analysis presented here. The first is the frequent occurrence of problems involving a hierarchy of model representations, themselves resulting in the ability to use approximate versions of $A$ to compute the product $Av$. This occurs for instance in discretized applications, possibly in a multiresolution framework, or in inexactly weighted linear and nonlinear least-squares, where the product itself is obtained by applying an iterative procedure(1). The second is the increasing importance of computations in multiprecision arithmetic on the new generations of high-performance computers (see [12] and the many references therein), in which the use of varying levels of accuracy is seen as a key ingredient in obtaining state-of-the-art energy-efficient computer architectures. In both cases, using 'inexact' products (while controlling their inexactness) within Krylov-based iterative methods is a natural option. In what follows, we focus on the analysis of this choice from the point of view of ensuring a prescribed decrease in the objective function $q$.
(1) Our starting point was a nonlinear weighted least-squares problem occurring in large-scale data assimilation for weather forecasting [9], where the inverse of the weighting matrix cannot be computed. Its use thus requires the iterative solution of an innermost linear system.
Although using inexact products in Krylov-based iterative methods has already been investigated (see [17, 18, 19, 10] for example), the proposed analyses typically focus on the Euclidean norm of the residual, but none of them, to the best of our knowledge, considers the effect of inexactness on assessing decrease in the objective function of the associated optimization problem. This point of view is however important. In optimization, monitoring the evolution of the objective function or of its model is an obvious concern: ensuring a fraction of the optimal decrease is, for instance, a standard convergence argument in trust-region or inexact Newton methods. In applications arising from elliptic partial differential equations, several authors have argued that monitoring the energy norm of the error leads to better subproblem termination rules, avoiding under- or over-solving (see [1, 3, 5, 2]).
As it turns out, monitoring the residual of the linear system
(1.2) $Ax = b$
provides a handle for monitoring the error in the quadratic $q$, provided it is considered in the appropriate norm. Indeed, if $x_*$ is the solution of (1.2) and $r = b - Ax$, then
(1.3) $q(x) - q(x_*) = \tfrac{1}{2}\, \|x - x_*\|_A^2 = \tfrac{1}{2}\, \|r\|_{A^{-1}}^2.$
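Identity (1.3) is easy to confirm numerically. The following sketch (random test data; NumPy assumed) compares the error in the quadratic at an arbitrary point with half the squared $A^{-1}$-norm of the residual:

```python
import numpy as np

# Numerical check of (1.3): for q(x) = 0.5 x^T A x - b^T x with minimizer
# x_* = A^{-1} b, the error in q equals half the squared A^{-1}-norm of the
# residual r = b - A x.  The matrix and vectors are arbitrary test data.
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

def q(x):
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)         # exact minimizer
x = rng.standard_normal(n)             # arbitrary trial point
r = b - A @ x
lhs = q(x) - q(x_star)
rhs = 0.5 * r @ np.linalg.solve(A, r)  # 0.5 * ||r||_{A^{-1}}^2
print(abs(lhs - rhs) / lhs)            # relative agreement, roundoff level
```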
This approach however requires, first, that $r$ or a sufficiently good approximation thereof is available and, second, that its $A^{-1}$-norm can be estimated, both of which are nontrivial if the products with $A$ are inexact.
The contributions of this paper are the derivation of theoretical residual bounds ensuring that the error on $q$ remains suitably small in the presence of inexact products (Section 2), the translation of these results into computable estimates and the definition of the associated algorithms (Section 3), and the demonstration, in Section 4, that very significant efficiency gains can be obtained by the use of these algorithms, both in the case where the accuracy of the products can be varied continuously and in the case where it is bound to prescribed levels (as is the case in multiprecision arithmetic). We finally provide some conclusions and perspectives in Section 5.
Notations. In the sequel of this paper, $\|\cdot\|$ denotes the standard Euclidean norm for vectors and its induced (spectral) norm for matrices. In addition, if $M$ is symmetric positive definite and $v$ is a vector, $\|v\|_M = \sqrt{v^T M v}$. The dual norm of $\|\cdot\|_M$ with respect to the Euclidean inner product is the norm $\|\cdot\|_{M^{-1}}$. $\mathrm{Tr}(M)$ is the trace of the matrix $M$ and $e_i$ is the $i$th vector of the canonical basis of $\mathbb{R}^n$.
2 A rigorous error analysis
Standard Krylov-based methods (we will consider here the full orthogonalization method (FOM) [16] and the conjugate-gradient (CG) algorithm [11] with and without reorthogonalization, both initialized with the zero vector, i.e. $x_0 = 0$ and $r_0 = b$) do provide recurrences for the residual $r_k = b - A x_k$, where $x_k$ is the approximate solution at iteration $k$. However, these recurrences rely on the assumption that the products computed in the course of the FOM or CG iterations are exact. In our context, where we aim at using inexact products, we therefore need to bound the residual gap, that is the difference between the residual $\bar r_k$ recurred within FOM or CG at iteration $k$ and the true residual $r_k = b - A x_k$, where $x_k$ is the approximate solution at this iteration. If we manage to do this and, at the same time, make the computed residual small (as is typical in the application of FOM or CG), we obtain the following desirable property.
Lemma 2.1
Suppose that, at some iterate $x_k$ of either FOM or CG, one has that
(2.1) $\|\bar r_k\|_{A^{-1}} + \|r_k - \bar r_k\|_{A^{-1}} \le \sqrt{\epsilon}\, \|b\|_{A^{-1}}$
for any $\epsilon \in (0,1)$. Then
$q(x_k) - q(x_*) \le \epsilon\, \big( q(x_0) - q(x_*) \big).$
Thus the decrease in the quadratic obtained at $x_k$ is at least $(1-\epsilon)$ times the maximum obtainable decrease. This is exactly the type of result we wish for if we are to terminate the quadratic minimization, for instance in a trust-region context (see [6, Theorem 6.3.5]). Because we expect FOM or CG to make $\|\bar r_k\|_{A^{-1}}$ small eventually, the rest of the section is now devoted to analyzing how to enforce the part of (2.1) related to the residual gap, that is the condition that $\|r_k - \bar r_k\|_{A^{-1}}$ remains suitably small.
Because this last condition measures the residual gap in the $A^{-1}$-norm (i.e. a norm in the dual space), it is natural to use the $A$-norm in the primal space and to consider the primal-dual matrix norm defined by
(2.5) $\|E\|_{A^{-1},A} := \max_{v \neq 0} \frac{\|E v\|_{A^{-1}}}{\|v\|_A}.$
We will thus use this norm to measure the size of the backward error made on the matrix-vector product. Note for future reference that
(2.6) $\|E\|_{A^{-1},A} = \|A^{-1/2} E A^{-1/2}\|.$
We first derive a useful observation.
Lemma 2.2
Let $A$ be a symmetric and positive definite matrix and let $E$ be any symmetric perturbation. Then, if $\|E\|_{A^{-1},A} < 1$, the matrix $A + E$ is symmetric positive definite.
Proof. Assume that $\|E\|_{A^{-1},A} < 1$. From the definition of the norm we deduce that
(2.7) $\|A^{-1/2} E A^{-1/2}\| = \|E\|_{A^{-1},A} < 1,$
which shows that $I + A^{-1/2} E A^{-1/2}$ is symmetric and positive definite. Using
$A + E = A^{1/2}\, \big( I + A^{-1/2} E A^{-1/2} \big)\, A^{1/2},$
Sylvester's inertia theorem then yields that $A + E$ is symmetric and positive definite, which completes the proof.
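Both the identity (2.6) and the conclusion of Lemma 2.2 are easy to check numerically. The sketch below (random test data; NumPy assumed) scales a symmetric perturbation so that its primal-dual norm is $0.9 < 1$ and verifies that $A + E$ stays positive definite:

```python
import numpy as np

# Numerical check of (2.6) and Lemma 2.2 on random test data: the primal-dual
# norm ||E||_{A^{-1},A} equals ||A^{-1/2} E A^{-1/2}||, and a symmetric
# perturbation with ||E||_{A^{-1},A} < 1 leaves A + E positive definite.
rng = np.random.default_rng(1)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                     # symmetric positive definite

w, U = np.linalg.eigh(A)
A_mhalf = U @ np.diag(w ** -0.5) @ U.T          # A^{-1/2}

S = rng.standard_normal((n, n))
S = 0.5 * (S + S.T)                             # symmetric perturbation
S = 0.9 * S / np.linalg.norm(A_mhalf @ S @ A_mhalf, 2)   # scale to norm 0.9

# Definition (2.5): sampling random directions can only under-estimate the
# maximum, so the sampled ratios must stay below the value given by (2.6).
A_inv = np.linalg.inv(A)
best = 0.0
for _ in range(2000):
    v = rng.standard_normal(n)
    num = np.sqrt((S @ v) @ A_inv @ (S @ v))    # ||E v||_{A^{-1}}
    den = np.sqrt(v @ A @ v)                    # ||v||_A
    best = max(best, num / den)

print(round(np.linalg.norm(A_mhalf @ S @ A_mhalf, 2), 3))  # 0.9, via (2.6)
print(best <= 0.9 + 1e-8)                       # True: sampling stays below
print(np.all(np.linalg.eigvalsh(A + S) > 0))    # True: A + E remains SPD
```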
2.1 Inexact FOM
We first focus on deriving conditions on the FOM algorithm, for which the analysis is simpler. A first version of the inexact FOM algorithm (initialized with the zero vector) can be stated as follows.
Theoretical inexact FOM algorithm
1. Set $\beta = \|b\|$, $v_1 = b/\beta$ and $x_0 = 0$,
2. For k=1, 2, …, do
3. $w_k = (A + E_k)\, v_k$
4. For $i = 1, \ldots, k$ do
5. $h_{i,k} = v_i^T w_k$
6. $w_k = w_k - h_{i,k}\, v_i$
7. EndFor
8. $h_{k+1,k} = \|w_k\|$
9. Solve $H_k y_k = \beta e_1$
10. if $h_{k+1,k}\, |e_k^T y_k|$ is small enough then go to 13
11. $v_{k+1} = w_k / h_{k+1,k}$
12. EndFor
13. $x_k = V_k y_k$
In this description, we use the notation $V_k = (v_1, \ldots, v_k)$. It is also convenient to define $\bar H_k$ as the $(k+1) \times k$ matrix of the scalars $h_{i,j}$ generated by the algorithm, and $H_k$ as its upper $k \times k$ part.
It is well-known [16, 13] that both $\bar H_k$ and $H_k$ are upper Hessenberg matrices and that, if $E_k = 0$ for all $k$, the Arnoldi relation
(2.8) $A V_k = V_{k+1} \bar H_k = V_k H_k + h_{k+1,k}\, v_{k+1} e_k^T$
holds. Note that, again for $E_k = 0$, the approximation $x_k$, the residual $\bar r_k$ and $\bar q_k$, the estimated value of the quadratic objective at $x_k$, are not directly recurred by FOM, but can easily be retrieved from
(2.9) $x_k = V_k y_k, \qquad \bar r_k = -h_{k+1,k}\, (e_k^T y_k)\, v_{k+1}, \qquad \bar q_k = -\tfrac{1}{2}\, \beta\, e_1^T y_k.$
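To make the recurrences concrete, here is a minimal NumPy sketch of the inexact FOM loop above (our own illustrative re-implementation, not the authors' code); the matrix-vector product is supplied as a callback so that inexact variants can be plugged in, and the tolerance and iteration cap are assumptions for the example:

```python
import numpy as np

# Illustrative sketch of the inexact FOM loop.  `matvec` may return a
# perturbed product (A + E_k) v_k; with an exact product this is standard FOM.
def inexact_fom(matvec, b, tol=1e-8, kmax=200):
    beta = np.linalg.norm(b)
    V = [b / beta]
    H = np.zeros((kmax + 1, kmax))
    for k in range(1, kmax + 1):
        w = matvec(V[k - 1])                        # possibly inexact product
        for i in range(k):                          # Arnoldi orthogonalization
            H[i, k - 1] = V[i] @ w
            w = w - H[i, k - 1] * V[i]
        H[k, k - 1] = np.linalg.norm(w)
        rhs = np.zeros(k); rhs[0] = beta
        y = np.linalg.solve(H[:k, :k], rhs)         # Step 9:  H_k y_k = beta e_1
        if H[k, k - 1] * abs(y[-1]) <= tol * beta:  # recurred residual norm
            break
        V.append(w / H[k, k - 1])
    return np.column_stack(V[:k]) @ y               # x_k = V_k y_k

# Usage: with exact products the sketch recovers the solution of A x = b.
rng = np.random.default_rng(2)
n = 40
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = inexact_fom(lambda v: A @ v, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # below the tolerance
```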
Note that, unless no error is made in the products, $\bar r_k$ might differ from $r_k = b - A x_k$, just as $\bar q_k$ may differ from $q(x_k)$. Also note that, because of Step 9 of the algorithm, $\|\bar r_k\| = h_{k+1,k}\, |e_k^T y_k|$, and thus Step 10 branches out of the outer loop as soon as $\|\bar r_k\|$ is small enough. When the error matrices $E_k$ are nonzero, it is easy to verify that (2.8) becomes
(2.10) $A V_k + \mathcal{E}_k = V_{k+1} \bar H_k,$
where
(2.11) $\mathcal{E}_k = \big( E_1 v_1, \; E_2 v_2, \; \ldots, \; E_k v_k \big).$
We now derive conditions on the error matrices which ensure that the relative residual gap (measured in dual norm) remains suitably small.
Lemma 2.3
Let $\epsilon \in (0,1)$ and let $(\varphi_1, \ldots, \varphi_k)$ be a positive vector such that
(2.12) $\sum_{j=1}^{k} \varphi_j^{-1} \le 1.$
Suppose furthermore that, for $j \in \{1, \ldots, k\}$,
(2.13) $\|E_j\|_{A^{-1},A} \le \frac{\sqrt{\epsilon}\, \|b\|_{A^{-1}}}{\varphi_j\, \|v_j\|_A\, \|y_k\|}.$
Then $\|r_k - \bar r_k\|_{A^{-1}} \le \sqrt{\epsilon}\, \|b\|_{A^{-1}}$.
Proof. Using (2.11), (2.10), (2.9) and the triangle inequality, we find that
(2.14) $\|r_k - \bar r_k\|_{A^{-1}} = \|\mathcal{E}_k y_k\|_{A^{-1}} \le \sum_{j=1}^{k} |e_j^T y_k|\; \|E_j\|_{A^{-1},A}\; \|v_j\|_A,$
where $e_j$ is the $j$th vector of the canonical basis of $\mathbb{R}^k$, and where we used that $H_k y_k = \beta e_1$ because of the definition of $y_k$ in Step 9 of FOM. The Cauchy-Schwarz inequality then implies that
$|e_j^T y_k| \le \|y_k\|.$
Thus we deduce from (2.14) and (2.13) that
$\|r_k - \bar r_k\|_{A^{-1}} \le \sqrt{\epsilon}\, \|b\|_{A^{-1}} \sum_{j=1}^{k} \varphi_j^{-1}.$
Now the condition (2.12) on the $\varphi_j$ gives that
$\|r_k - \bar r_k\|_{A^{-1}} \le \sqrt{\epsilon}\, \|b\|_{A^{-1}},$
as requested.
We now combine all the above results in a single theorem.
2.2 Inexact CG
We now turn to the conjugate gradient algorithm and derive an analog of Theorem 2.4 for this case. We first state the algorithm itself.
Theoretical inexact CG algorithm
1. Set $x_0 = 0$, $\bar r_0 = b$, and $p_0 = b$
2. For k=0, 1, …, do
3. $c_k = (A + E_k)\, p_k$
4. $\alpha_k = \|\bar r_k\|^2 / (p_k^T c_k)$
5. $x_{k+1} = x_k + \alpha_k p_k$
6. $\bar r_{k+1} = \bar r_k - \alpha_k c_k$
7. if $\|\bar r_{k+1}\|$ is small enough then stop
8. $\beta_k = \|\bar r_{k+1}\|^2 / \|\bar r_k\|^2$
9. $p_{k+1} = \bar r_{k+1} + \beta_k p_k$
10. EndFor
Observe that, at variance with FOM, the values of the iterates $x_k$ and recurred residuals $\bar r_k$ are explicitly available. The value of $\bar q_k$ may again be retrieved from (2.9). We next restate, for clarity, a simple result relating the residual gap to the error matrices for the conjugate gradient algorithm (see [18]).
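A minimal NumPy sketch of this loop may help fix ideas (our illustrative re-implementation, not the authors' code); the product is supplied by a callback so that an inexact variant, here a product rounded to half precision, can be plugged in, and the tolerance and iteration cap are assumptions:

```python
import numpy as np

# Illustrative sketch of the inexact CG loop: `matvec` may return a perturbed
# product (A + E_k) p_k, e.g. one computed in low-precision arithmetic.
def inexact_cg(matvec, b, tol=1e-10, kmax=500):
    x = np.zeros_like(b)              # x_0 = 0
    r = b.copy()                      # recurred residual, \bar r_0 = b
    p = b.copy()                      # first search direction, p_0 = b
    rho = r @ r
    for _ in range(kmax):
        c = matvec(p)                 # c_k = (A + E_k) p_k, possibly inexact
        alpha = rho / (p @ c)
        x = x + alpha * p
        r = r - alpha * c             # recurred residual update
        rho_new = r @ r
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

# With a product rounded to half precision, CG still reduces the residual,
# but the attainable accuracy is limited by the inexactness of the product.
rng = np.random.default_rng(3)
n = 60
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x_exact = inexact_cg(lambda v: A @ v, b)
x_half = inexact_cg(lambda v: (A @ v).astype(np.float16).astype(np.float64), b)
print(np.linalg.norm(b - A @ x_exact) / np.linalg.norm(b))  # tiny
print(np.linalg.norm(b - A @ x_half) / np.linalg.norm(b))   # stagnates higher
```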
Lemma 2.5
The residual gap in the inexact CG satisfies
$r_k - \bar r_k = \sum_{j=0}^{k-1} \alpha_j\, E_j\, p_j.$
Proof. We proceed inductively. Observe that $r_0 = \bar r_0 = b$ and $x_0 = 0$. Suppose now that the result is true for iterations $0, \ldots, k-1$. We then have that
$r_k - \bar r_k = \big( r_{k-1} - \alpha_{k-1} A p_{k-1} \big) - \big( \bar r_{k-1} - \alpha_{k-1} (A + E_{k-1}) p_{k-1} \big) = \big( r_{k-1} - \bar r_{k-1} \big) + \alpha_{k-1} E_{k-1} p_{k-1} = \sum_{j=0}^{k-1} \alpha_j\, E_j\, p_j.$
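This accumulation of the residual gap can be checked numerically: injecting a fixed symmetric perturbation $E$ into every CG product, the true-minus-recurred gap coincides with the running sum of the terms $\alpha_j E p_j$ (the expression of the gap under our reconstruction of Lemma 2.5). Test data below are random:

```python
import numpy as np

# Numerical check of Lemma 2.5: with a fixed perturbation E_k = E injected
# into every product, the residual gap  r_k - \bar r_k  equals the
# accumulated sum of the terms alpha_j E p_j.  E is scaled to be small.
rng = np.random.default_rng(4)
n = 30
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
S = rng.standard_normal((n, n)); E = 1e-6 * 0.5 * (S + S.T)
b = rng.standard_normal(n)

x = np.zeros(n); rbar = b.copy(); p = b.copy()
gap_sum = np.zeros(n)                    # running sum of alpha_j E p_j
for _ in range(15):
    c = (A + E) @ p                      # inexact product
    alpha = (rbar @ rbar) / (p @ c)
    x = x + alpha * p
    gap_sum += alpha * (E @ p)
    rbar_new = rbar - alpha * c
    p = rbar_new + ((rbar_new @ rbar_new) / (rbar @ rbar)) * p
    rbar = rbar_new

true_gap = (b - A @ x) - rbar            # r_k - \bar r_k
print(np.linalg.norm(true_gap - gap_sum))   # roundoff level
```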
We are now in position to derive suitable bounds on the error matrices.
Lemma 2.6
Let $\epsilon \in (0,1)$ and let $(\varphi_0, \ldots, \varphi_{k-1})$ be a positive vector satisfying (2.12). Suppose furthermore that, for $j \in \{0, \ldots, k-1\}$,
(2.16) $\|E_j\|_{A^{-1},A} \le \frac{\sqrt{\epsilon}\, \|b\|_{A^{-1}}}{\varphi_j\, \alpha_j\, \|p_j\|_A}.$
Then $\|r_k - \bar r_k\|_{A^{-1}} \le \sqrt{\epsilon}\, \|b\|_{A^{-1}}$.
As above, we summarize the results for the CG algorithm.
Some comments are in order at this stage.

The $\varphi_j$ appearing in (2.13) and (2.16) may be considered as an "error management strategy", in that they potentially allow for mitigating the direct effect of the residual norm in the definition of the error bound. A simple choice is to define $\varphi_j = n$ for all $j$, which obviously satisfies (2.12). We will show below that they can be used to further advantage.
2.3 Bounding the error of the computed quadratic value
Neither the FOM nor the CG algorithm updates recursively the value of the quadratic objective function at the current iterate. However, tracking this value may be important in practice (as we will see in Section 3) and, if there is no error in the products, this can be done easily without computing additional products with $A$. Indeed, since $x_{k+1}$ is the minimizer of $q$ in the direction $p_k$, one verifies that
$q(x_{k+1}) = q(x_k) - \tfrac{1}{2}\, \alpha_k\, \|\bar r_k\|^2.$
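In exact arithmetic this recurrence (under our reconstruction, $q(x_{k+1}) = q(x_k) - \tfrac{1}{2}\alpha_k \|r_k\|^2$) is easy to verify along a CG run; a sketch with random SPD test data:

```python
import numpy as np

# Check of the recurrence q(x_{k+1}) = q(x_k) - 0.5 * alpha_k * ||r_k||^2
# along exact CG iterations (no product error), on random SPD test data.
rng = np.random.default_rng(5)
n = 25
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
q = lambda x: 0.5 * x @ A @ x - b @ x

x = np.zeros(n); r = b.copy(); p = b.copy()
q_rec = 0.0                              # recurred value, q(x_0) = 0
for _ in range(10):
    c = A @ p
    alpha = (r @ r) / (p @ c)
    q_rec -= 0.5 * alpha * (r @ r)       # recurrence: no extra product needed
    x = x + alpha * p
    r_new = r - alpha * c
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
print(abs(q_rec - q(x)))                 # agreement up to roundoff
```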
In the presence of errors on the products, this property may no longer hold, and it is of interest to analyze how much $\bar q_k$ differs from $q(x_k)$. This is important if the decrease in the quadratic objective function is used for other purposes, as is the case, for instance, in trust-region methods where this decrease is a key ingredient in the management of the trust-region radius (see [6, Chapter 6]).
In order to provide an error estimate, we first prove the following backward error property.
Lemma 2.8
Let $A$ be a symmetric positive-definite matrix and $b$ a vector in $\mathbb{R}^n$. Then, for any $x$,
Consider the result of applying the FOM or CG methods with inexact products, such that (2.15) and either (2.13) or (2.16) holds. We deduce from the preceding lemma that there exists a quadratic
(2.21) 
such that, by construction and
(2.22) 
where we successively used (2.21), the Cauchy-Schwarz inequality, Lemma 2.8, the Cauchy-Schwarz inequality again and (2.3). But, using the triangle inequality, Lemma 2.3 or (2.19), (2.3) and (2.15), we obtain that
and hence, using (2.15) and (2.3) again,
We summarize the above discussion in the following theorem.
3 Computable bounds and practical inexact algorithms
The first two comments at the end of Section 2.2 might, at first sight, suggest that the above approach has to remain of purely theoretical interest. However, the potential of allowing inexact products revealed by (2.13) and (2.16) prompts us to estimate the unavailable quantities, from which practical algorithms may then be derived. We consider these questions in this section, and demonstrate in the next section that substantial benefits may result.
3.1 Managing the inaccuracy budget
An important ingredient of a computable inexact FOM or CG algorithm is the choice of the $\varphi_j$ in (2.13) or (2.16). We note that (2.12) limits the sum of the inverses of the $\varphi_j$ over all iterations preceding termination. Of course, choosing $\varphi_j = n$ is adequate (assuming FOM or CG will not use more than $n$ iterations!), but is often suboptimal. Indeed, the expected number of iterations to termination is very commonly, for not-too-ill-conditioned or correctly preconditioned problems, much lower than the problem's dimension. This can be exploited to make the $\varphi_j$ smaller, for instance by choosing $\varphi_j$ equal to this expected iteration number. This last number may be obtained from the maximum number of iterations for FOM or CG ($k_{\max}$) typically imposed by the user, or from
(3.1) $k_{\exp} = \Big\lceil \tfrac{1}{2}\, \sqrt{\tilde\kappa}\; \ln\big( 2/\sqrt{\epsilon} \big) \Big\rceil, \qquad \tilde\kappa = \frac{\tilde\lambda_{\max}}{\tilde\lambda_{\min}},$
where $\tilde\lambda_{\max}$ and $\tilde\lambda_{\min}$ approximate $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively (see [8, Theorem 10.2.6] for example). Moreover, it is possible to adjust the $\varphi_j$ adaptively in the case where the user is able to specify a bound on $\|E_j\|_{A^{-1},A}$ when it is smaller than the value allowed by (2.13) or (2.16). As we will see below, this typically happens when the accuracy of the product cannot be chosen continuously, but is bound to prescribed levels (such as arithmetic precision). Suppose therefore that, besides the product, one knows a value $\omega_j$ such that $\|E_j\|_{A^{-1},A} \le \omega_j$. We may then decide to add the (potential) "unused inaccuracy" to the available inaccuracy budget for the remaining iterations, thereby allowing larger subsequent errors. Indeed, knowing $\omega_j$, one may first compute $\hat\varphi_j$ as the root of the (linear) equation obtained by setting the bound of (2.13) or (2.16), viewed as a function of $1/\varphi_j$, equal to $\omega_j$, that is
(3.2)
and then reset the inaccuracy budget for the remaining iterations accordingly. In practice, this allows maintaining only single running values for $\varphi_j$ and $\hat\varphi_j$, for $j$ ranging from $1$ to $k$.
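The budget-management idea can be sketched in a few lines. This is an illustrative interpretation, not the paper's formula (3.2): iteration $j$ is granted the allowance $\epsilon_{\mathrm{total}}/\varphi_j$ with $\varphi_j$ taken as the number of remaining expected iterations, and any unused slack (when the achieved bound $\omega_j$ is below the allowance) is returned to the pool; the function and variable names are ours.

```python
# Sketch of an "inaccuracy budget" manager (illustrative redistribution rule,
# not the paper's exact formula): unused accuracy slack from early iterations
# is returned to the budget, enlarging the allowance of later iterations.
def allowance_schedule(omegas, eps_total, k_exp):
    """omegas[j]: error bound actually achieved at iteration j."""
    budget = eps_total
    allowances = []
    for j, omega_j in enumerate(omegas):
        remaining = max(k_exp - j, 1)
        allow_j = budget / remaining     # current phi_j = remaining iterations
        used = min(omega_j, allow_j)     # cannot consume more than granted
        budget -= used                   # unused slack stays in the pool
        allowances.append(allow_j)
    return allowances

# If early iterations are much more accurate than required (small omega_j),
# later allowances grow, permitting cheaper products near the end.
sched = allowance_schedule([1e-8] * 5 + [1e-3] * 5, eps_total=1e-2, k_exp=10)
print(sched[0] < sched[-1])   # True: the allowance increases over time
```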
3.2 Estimations
We now attempt to estimate the unavailable quantities in the theoretical FOM and CG algorithms. We first consider $\|E_k\|_{A^{-1},A}$, which may not be available from the application context, and note that, from (2.7),
(3.3) $\|E_k\|_{A^{-1},A} \le \frac{\|E_k\|}{\lambda_{\min}(A)},$
so that a bound on $\|E_k\|$ can be used provided one knows (an approximation of) the smallest eigenvalue of $A$. We also have to compute $\|v\|_A$ for a variety of vectors $v$ (the $v_k$ for FOM and the $p_k$ for CG). We can also use the fact that
$\sqrt{\lambda_{\min}(A)}\, \|v\| \le \|v\|_A \le \sqrt{\lambda_{\max}(A)}\, \|v\|$
to derive suitable bounds. However, for ill-conditioned problems, these bounds are likely to be very loose. Another approach is to choose
$\|v\|_A \approx \sqrt{\mathrm{Tr}(A)/n}\; \|v\|$
as an approximation (this would be the average value for vectors with random independent components). Finding an estimation of $\|b\|_{A^{-1}}$ is slightly more difficult, since this is related to the value of the quadratic at the solution, the very quantity we seek to compute. As it turns out (despite the weak bound of Theorem 2.9), the best available approximation is the absolute value of the quadratic at the current iterate, and thus we choose
(3.4) $\|b\|_{A^{-1}} \approx \sqrt{2\, |\bar q_k|}.$
(At $\bar q_k = 0$, which occurs at the first iteration of both FOM and CG, we choose the even more approximate value $\|b\|_{A^{-1}} \approx \|b\|$.) Finally, (2.13) requires an approximation of $\|y_k\|$. Unfortunately, this value depends on the matrix $H_k$ at termination of FOM and its estimation from the current one turns out to be quite poor. We have therefore chosen the safe option to use
(3.5) 
as done in [17]. Combining the above approximations, we suggest approximating (2.13) by the condition
(3.6) 
and (2.16) by
(3.7) 
(The formulae are similar for $v_k$ in FOM and $p_k$ in CG, with $v_k$ replaced by $p_k$.)