 # Minimizing convex quadratic with variable precision Krylov methods

Iterative algorithms for the solution of convex quadratic optimization problems are investigated, which exploit inaccurate matrix-vector products. Theoretical bounds on the performance of a Conjugate Gradients and a Full-Orthormalization methods are derived, the necessary quantities occurring in the theoretical bounds estimated and new practical algorithms derived. Numerical experiments suggest that the new methods have significant potential, including in the steadily more important context of multi-precision computations.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 The problem

We are interested in iterative methods for solving convex quadratic optimization problem

 minx∈\footnotesize IRnq(x)def=12xTAx−bTx (1.1)

where is an symmetric positive definite matrix and a given vector in

. Such problems are at the centre of efficient methods for a large variety of domains in applied mathematics, the most proeminent being large-scale numerical nonlinear optimization and the solution of large symmetric positive-definite linear systems (often derived from applications in partial differential equations). It is thus critical to make the solution of (

1.1) as efficient as possible, especially when the problem size grows. Since the cost of most iterative methods for solving (1.1) is often dominated by the (potentially many) computations of products of the form for some vector , it is then of interest to investigate if efficiency gains may be obtained for this ’core’ operation. This is the object of this paper.

Two different but converging contexts motivate the analysis presented here. The first is the frequent occurence of problems involving a hierachy of model representations themselves resulting in the ability to use approximate versions of to compute the product . This occurs for instance in discretized applications, possibly in a multi-resolution framework, or in inexactly weighted linear and nonlinear least-squares where the product itself is obtained by applying an iterative procedure(1)(1)(1)Our starting point was a nonlinear weighted least-squares problem occuring in large-scale data assimilation for weather forecasting , where the inverse of the weighting matrix can not be computed. It use thus requires the iterative solution of an innermost linear system.. The second is the increasing importance of computations in multi-precision arithmetic on the new generations of high-performance computers (see  and the many references therein), in which the use of varying levels of accuracy is seen as a key ingredient in obtaining state-of-the-art energy-efficient computer architectures. In both cases, using ’inexact’ products (while controlling their inexactness) within Krylov-based iterative methods is natural option. In what follows, we focus on the analysis of this choice from the point of view of ensuring a prescribed decrease in the objective function .

Although using inexact product in Krylov-based iterative methods has already been investigated (see [17, 18, 19, 10] for example), the proposed analysis typically focus on the Euclidean norm of the residual but none of them, to the best of our knowledge, considers its effect on assessing decrease in the objective function of the associated optimization problem. This point of view is however important. In optimization, monitoring the evolution of the objective function or of its model is an obvious concern: ensuring a fraction of the optimal decrease is, for instance, a standard convergence argument in trust-region or inexact Newton methods. In applications arising from elliptic partial differential applications, several authors have argued that monitoring the energy norm of the error leads to better subproblem termination rules, avoiding under- or over-solving (see [1, 3, 5, 2]).

As it turns out, monitoring the residual of the linear system

 Ax=b (1.2)

provides a handle for monitoring the error in the quadratic , provided it is considered in the appropriate norm. Indeed, if is the solution of (1.2) and , then

 12∥r(x)∥2A−1=12(Ax−b)TA−1(Ax−b)=12(x−x∗)TA(x−x∗)=12[xTAx−2xTAx∗+xT∗Ax∗]=q(x)−q(x∗). (1.3)

This approach however requires first that or a sufficiently good approximation thereof is available and, second, that its -norm can be estimated, both of which are non-trivial if the products are inexact.

The contributions of this paper are the derivation of theoretical residual bounds ensuring that the error on remains suitably small in the presence of inexact products (Section 2), the traduction of these results into computable estimates and the definition of the associated algorithms (Section 3), and the demonstration, in Section 4, that very significant efficiency gains can be obtained by the use of these algorithms, both in the case where the accuracy of can be varied continuously and in the case where it is bound to prescribed levels (as is the case in multi-precision arithmetic). We finally provide some conclusions and perspectives in Section 5.

Notations. In the sequel of this paper, denotes the standard Euclidean norm for vectors and its induced (spectral) norm for matrices. In addition, if is symmetric positive definite and is a vector, . The dual norm of with respect to the Euclidean inner product is the norm . Tr is the trace of the matrix and is the -th vector of the canonical basis of .

## 2 A rigorous error analysis

Standard Krylov-based methods (we will consider here the full orthonormalization (FOM) method  and the conjugate-gradients (CG) algorithm  with and without reorthogonalization, both initialized with the zero vector, i.e.  and ) do provide recurrences for the residual , where is the approximate solution at iteration . However, these recurrences rely on the assumption that the products computed in the course of the FOM or CG iterations are exact. In our context, where we aim at using inexact products, we therefore need to bound the residual gap, that is the difference between the residuals within FOM or CG at iteration and the true residual , where is the approximate solution at this iteration. If we manage to do this and, at the same time, make the computed residual small (as is typical in the application of FOM or CG), we therefore obtain the following desirable property.

###### Lemma 2.1
Suppose that, at some iterate of either FOM or CG, one has that for any
(2.1)
Then
(2.2)

• Proof.    First evaluating the quadratic at gives a very useful identity, namely that

 |q(x∗)|=−q(x∗)=12∥b∥2A−1=12∥x∗∥2A=12|bTx∗|. (2.3)

Using this identity, (1.3), the triangle inequality and (2.1) we deduce that

 |q(xk)−q(x∗)|=12∥r(xk)∥2A−1≤12(∥r(xk)−rk∥A−1+∥rk∥A−1)2≤12(√ϵ∥b∥A−1)2=12ϵ∥b∥2A−1=ϵ|q(x∗)|. (2.4)

Thus the decrease in the quadratic obtained at is at least times the maximum obtainable decrease. This is exactly the type of result we wish if we are to terminate the quadratic minimization, for instance in a trust-region context (see [6, Theorem 6.3.5]). Because we expect FOM or CG to make small eventually, the rest of the section is now devoted to analyzing how to enforce the part of (2.1) related to the residual gap, that is the condition that .

Because this last condition measures the residual gap in the -norm (i.e. a norm in the dual space), it is natural to use the -norm in the primal space and to consider the primal-dual matrix norm defined by

We will thus use this norm to measure the size of the backward error made on the matrix-vector product. Note for future reference that

 ∥A∥A−1,A=1. (2.6)

We first derive a useful observation.

###### Lemma 2.2
Let be a symmetric and positive definite matrix and be any symmetric perturbation. Then, if , the matrix is symmetric positive definite.

• Proof.    Assume that . From the definition of the norm we deduce that

 ∥E∥2A−1,A=supx≠0xTETA−1ExxTAx=supu≠0uTA−1/2ETA−1EA−1/2uuTu<1, (2.7)

which shows that . Using

 λmin(I+A−1/2EA−1/2)≥λmin(I)−|λmax(A−1/2EA−1/2)|>0,

we obtain that is symmetric and positive definite. Sylvester’s inertia theorem then yields that is symmetric and positive definite, which completes the proof.

### 2.1 Inexact FOM

We first focus on deriving conditions on the FOM algorithm for which the analysis is simpler. A first version of the inexact FOM algorithm (initialized with the zero vector) can be stated as follows.

Theoretical inexact FOM algorithm 1. Set , and , 2. For k=1, 2, …, do 3.    4.   For do 5.      6.      7.   EndFor 8.    9.    10.   if is small enough then go to 13 11.    12. EndFor 13.

In this description, we use the notation . It is also convenient to define

 ~Hk=[Hkhk+1,keTk] and Vk=[v1,…,vk].

It is well-known [16, 13] that both and are upper Hessenberg matrices and that, if for all , the Arnoldi relation

 AVkyk=Vk+1~Hkyk (2.8)

holds. Note that, again for , the approximation , the residual and , the estimated value of the quadratic objective at , are not directly recurred by FOM, but can easily be retrieved from

 xk=Vkyk,rk=Vk+1~Hkyk−b=Axk−b=r(xk) and qk=−12bTxk=q(xk). (2.9)

Note that, unless no error is made in the products, might differ from , just as may differ from . Also note that, because of Step 9 of the algorithm, and thus Step 10 branches out of the outer loop as soon as is small enough. When the error matrices are nonzero, it is easy to verify that (2.8) becomes

 AVkyk+GkVkyk=Vk+1~Hkyk (2.10)

where

 Gk=(E1v1,…,Ekvk)VTk=k∑j=1EjvjvTj. (2.11)

We now derive conditions on the error matrices which ensure that the relative residual gap (measured in dual norm) remains suitably small.

###### Lemma 2.3
Let and let be a positive vector such that
(2.12)
Suppose furthermore that
(2.13)
Then

• Proof.    Using (2.11), (2.10), (2.9) and the triangle inequality, we find that

 ∥r(xk)−rk∥A−1=∥GkVkyk∥A−1=∥k∑j=1EjvjeTjyk∥A−1≤k∑j=1∥Ej∥A−1,A∥vj∥A|eTjyk|. (2.14)

Furthermore,

 eTjyk=eTjH−1k(βe1)

because of the definition of in Step 9 of FOM. Introducing and , one has that

 eTjH−1k~H0j−1y0j−1=eTjH−1kHky0j−1=0.

It then follows that

 eTjyk=eTjH−1k(βe1−~H0j−1y0j−1)=eTjH−1k[β~e1−~Hj−1yj−10]

where is the first vector of the canonical basis of . Thus, using the second part of (2.9),

 eTjyk = eTjH−1k[VTj(βVj~e1−Vj~Hj−1yj−1)0] = eTjH−1k[VTj(b−Vj~Hj−1yj−1)0] = −eTjH−1k[VTjrj−10].

The Cauchy-Schwarz inequality then implies that

 |eTjyk| ≤∥ej∥2∥H−1k∥2∥VTjrj−1∥2≤∥H−1k∥2∥rj−1∥2

because the columns of are orthonormal. Thus we deduce from (2.14) and (2.13) that

 ∥r(xk)k−rk∥A−1∥b∥A−1≤∥H−1k∥2∑kj=1ωj∥vj∥A∥rj−1∥2∥b∥A−1.

Now the definition of in (2.13) gives that

 ∥r(xk)−rk∥A−1∥b∥A−1≤ϵπk∑j=11ϕj≤ϵπ,

as requested.

We now combine all the above results in a single theorem.

###### Theorem 2.4
Let and suppose that, at iteration of the FOM algorithm,
(2.15)
and the product error matrices satisfy (2.13) with for some positive vector satisfying (2.12). Then (2.2) holds.

• Proof.    Directly results from Lemmas 2.1 and 2.3.

### 2.2 Inexact CG

We now turn to the conjugate gradient algorithm and derive an analog of Theorem 2.4 for this case. We first state the algorithm itself.

Theoretical inexact CG algorithm 1. Set , , and 2. For k=0, 1, …, do 3.    4.    5.    6.    7.   if is small enough then stop 8.    9.    10. EndFor

Observe that, at variance with FOM, the values of the iterates

and recurred residuals are explicitly available. The value of may again be retrieved from (2.9).

We next restate, for clarity, a simple result relating the residual gap to the error matrices for the conjugate gradient algorithm (see ).

###### Lemma 2.5
The residual gap in the inexact CG satisfies

• Proof.    We proceed inductively. Observe that and . Suppose now that the result is true for iterations . We then have that

 r(xk+1)−rk+1 = (Axk+αkApk−b)−rk+rk−rk+1 = r(xk)+αkApk−rk−αk(A+Ek)pk = r(xk)−rk−αkEkpk.

We are now in position the derive suitable bounds on the error matrices.

###### Lemma 2.6
Let and let be a positive vector satisfying (2.12). Suppose furthermore that
(2.16)
Then

• Proof.    First note that (2.16) ensures that . Lemma 2.5, the triangle inequality and (2.5) imply that

 ∥r(xk)−rk∥A−1≤k−1∑j=0∥αjEjpj∥A−1≤k−1∑j=0|αj|∥Ej∥A−1,A∥pj∥A≤k−1∑i=0|αj|ωCGj∥pj∥A. (2.17)

Now, using (2.16),

 αj=∥rj∥22pTj(A+Ej)pj≤∥rj∥22pTjApj−ωCGj∥pj∥2A=∥rj∥22(1−ωCGj)∥pj∥2A.

Substituting this bound in (2.17) and using (2.16) again, we obtain that

 ∥r(xk)−rk∥A−1≤k−1∑j=0ωCGj1−ωCGj∥rj∥22∥pj∥A. (2.18)

But the definition of in (2.16) gives that

 ωCGj1−ωCGj=(ϵπ∥b∥A−1∥pj∥A)(ϕj+1∥rj∥22+ϵπ∥b∥A−1∥pj∥A)(ϕj+1∥rj∥22+ϵπ∥b∥A−1∥pj∥A)ϕj+1∥rj∥22=ϵπ∥b∥A−1∥pj∥Aϕj+1∥rj∥22,

so that (2.18) becomes

 ∥r(xk)−rk∥A−1≤k−1∑j=0ϵπ∥b∥A−1ϕj+1≤ϵπ∥b∥A−1. (2.19)

As above, we summarize the results for the CG algorithm.

###### Theorem 2.7
Let and suppose that (2.15) holds at iteration of the CG algorithm, and that the product error matrices satisfy (2.16) with for some positive vector satisfying (2.12). Then (2.2) holds.

Some comments are in order at this stage.

1. Both (2.13) and (2.16) assume the primal-dual norm is the natural norm for measuring the size of the error matrices . While this may be true in certain applications, it makes these requirements difficult to verify as such in other cases.

2. Even discounting that potential difficulty, (2.13) and (2.16) remain unfortunately impractical, since they involve quantites, such as , or , which cannot be computed readily in the course of the FOM or CG algorithm. Moreover, in (2.15) is also unavailable in practice.

3. Observe that (2.13) allows a growth of the error in while (2.16) allows a growth of the order of instead.

4. In (2.13), the minimum with one is taken to ensure that the relative error remains meaningful (while at the same time allowing to apply Lemma 2.2). This guarantee is automatic in (2.16).

5. The appearing in (2.13) and (2.16) may be considered as an “error management strategy”, in that they potentially allow for mitigating the direct effect of the residual norm in the definition of the error bound. A simple choice is to define for all , which obviously satisfies (2.12). We will show below that they can be used to further advantage.

### 2.3 Bounding the error of the computed quadratic value

Neither the FOM or CG algorithm updates recursively the value of the quadratic objective function at the current iterate. However, tracking this value may be important in practice (as we will see in Section 3), and, if there is no error in the products , can be done easily without computing additional products with . Indeed, since is the minimizer of in the direction , one verifies that

 q(xk)=12xTkAxk−bTxk=−12bTxk.

In the presence of errors on , this property may no longer hold, and it is of interest to analyze how much differs from . This is important if the decrease in the quadratic objective function is used for other purposes, as is the case, for instance, in trust-region methods where this decrease is a key ingredient in the management of the trust-region radius (see [6, Chapter 6]).

In order to provide an error estimate, we first prove the following backward error property.

###### Lemma 2.8
Let be a symmetric positive-definite matrix and a vector in . Then, for any ,
(2.20)

• Proof.    Using (2.5) and (2.6), we rewrite the optimization problem as

 minE{∥A−1/2EA−1/2∥2∣xTEx=xT(b−Ax)}

which, since is positive-definite and setting , is itself equivalent to

 minE{∥A−1/2EA−1/2∥2∣yTA−1/2EA−1/2y=−xTr(x)}.

But, for any symmetric matrix and scalar ,

 minM{∥M∥2∣yTMy=γ}=γ

is attained for . Thus

 E=−(xTr(x))AxxTA∥x∥4A and ∥E∥A−1,A=∥A−1/2EA−1/2∥2=|xTr(x)|∥x∥2A.

Consider the result of applying the FOM or CG methods with inexact products, such that (2.15) and and either (2.13) or (2.16) holds. We deduce from the preceding lemma that there exists a quadratic

 ^q(x)=12xT(A+E)x−bTx (2.21)

such that, by construction and

 |q(x)−q|=|q(x)−^q(x)|=12|xTEx|≤12∥E∥A−1,A∥x∥2A=12|xTr(x)|≤12∥r(x)∥A−1∥x∗∥A∥x∗∥A∥x∥A, (2.22)

we successively used (2.21), the Cauchy-Schwarz inequality, Lemma 2.8, the Cauchy-Schwarz inequality again and (2.3). But, using the triangle inequality, Lemma 2.3 or (2.19), (2.3) and (2.15), we obtain that

 ∥x∥A≤∥x∗∥A+∥x−x∗∥A=∥x∗∥A+∥r(x)∥A−1≤∥x∗∥A+∥r(x)−r∥A−1+∥r∥A−1≤(1+2ϵπ)∥x∗∥A

and hence, using (2.15) and (2.3) again,

 |q(x)−q|≤12∥r(x)∥A−1∥x∥A∥x∥2A≤2ϵπ(1+2ϵπ)|q(x∗)|.

We summarize the above discussion in the following theorem.

###### Theorem 2.9
Let be the result of applying the FOM or CG algorithm with inexact products and suppose that (2.15) and either (2.13) or (2.16) holds with . Then
(2.23)

Remembering that , we see that this bound is considerably weaker than (2.2) but is likely to be pessimistic, as we have not taken into account in (2.22) the fact that the angle between and is expected to be small. This will be numerically confirmed in Section 4.

## 3 Computable bounds and practical inexact algorithms

The first two comments at the end of Section 2.2 might, at first sight, suggest that the above approach has to remain of purely theoretical interest. However, the potential of allowing inexact products revealed by (2.13) and (2.16) prompts us to estimate the unavailable quantities, from which practical algorithms may then be derived. We consider these questions in this section, and demonstrate in the next section that substantial benefits may result.

### 3.1 Managing the inaccuracy budget

An important ingredient of a computable inexact FOM or CG algorithm is the choice of in (2.13) or (2.16). We note that (2.12) limits the sum of the inverse of over all iteration preceding termination. Of course, choosing is adequate (assuming FOM or CG will not use more than iterations!), but is often suboptimal. Indeed, the expected number of iterations to termination is very commonly, for not too-ill-conditioned or correctly preconditioned problems, much lower than the problem’s dimension. This can be exploited to make the smaller, for instance by choosing equal to this expected iteration number. This last number may be obtained from the maximum of iterations for FOM or CG () typically imposed by the user, or from

 kspectralmax=log(ϵ)log(ρ) where ρdef=√λmax/λmin−1√λmax/λmin+1 (3.1)

where and approximate and , respectively (see [8, Theorem 10.2.6] for example). Moreover, it is possible to adjust the adaptively in the case where the user is able to specify a bound on when it is smaller than . As we will see below, this typically happens when the accuracy of the product cannot be chosen continuously, but is bound to prescribed levels (such as arithmetic precision). Suppose therefore that, besides the product , one knows a value such that . We may then decide to add the (potential) “unused inaccuracy” to the available inaccuracy budget for the remaining iterations, thereby allowing larger subsequent errors. Indeed, knowing , one may first compute as the root of the (linear) equation , that is

 ^ϕj=ϵπ∥b∥A−1^ωj∥vj∥A∥H−1k∥2∥rj−1∥2 for FOM, and ^ϕj=(1−^ωj)ϵπ∥b∥A−1∥pj∥A^ωj∥rj∥22 for CG, (3.2)

and then reset

 ϕi=kmax−j1−∑jp=1^ϕ−1p(i=j+1,…,kmax).

In practice, this allows maintaining only single running values for and for ranging from to .

### 3.2 Estimations

We now attempt to estimate unavailable quantities in the theoretical FOM and CG algorithms. We first consider that may not be available from the application context and note that, from (2.7),

 ∥E∥A−1,A=∥A−1/2EA−1/2∥2≤λmin(A)−1∥E∥2, (3.3)

so that abound on

can be used provided one knows (an approximation of) the smallest eigenvalue of

. We also have to compute for a variety of vectors (the for FOM and the for CG). We can also use the fact that

 λmin(A)1/2∥u∥2≤∥u∥A≤∥A∥1/22∥u∥2

to derive suitable bounds. However, for ill-conditioned problem, these bounds are likely to be often very loose. Another approach is to choose

 ∥u∥A≈√1nTr(A)∥u∥2

as an approximation (this would be the average value for vectors with random independent components). Finding an estimation of is slighlty more difficult, since this is the size of value of the quadratic at the solution, the quantity we seek to compute. As it turns out(2)(2)(2)Despite the weak bound of Theorem 2.9., the best available approximation is the absolute value of current value of the quadratic at the current iterate, and thus we choose

 ∥b∥A−1≈√2|q(xk)|≈√2|qk|=√|bTxk|. (3.4)

(At ; which occurs at the first iteration of both FOM and CG), we choose the even more approximate value .). Finally, (2.13) requires an approximation of . Unfortunately, this value depends on the matrix at termination of FOM and its estimation from the current turns out to be quite poor. We have therefore chosen the safe option to use

 ∥H−1k∥2=1σmin(Hk)≈1λmin(A) (3.5)

as done in . Combining the above approximations, we suggest to approximate (2.13) by the condition

 ∥Ej∥2λmin(A)≤min[1,ϵπ√n√2|qj|λmin(A)ϕj∥rj−1∥2√Tr(A)∥vj∥2] for all j∈{2,…,k}, (3.6)

and (2.16) by

 ∥Ej∥2λmin(A)≤ϵπ√2|qj|√Tr(A)∥pj∥2√nϕj+1∥rj∥22+ϵπ√2|qj|√Tr(A)∥pj∥2, for% all j∈{1,…,k−1}. (3.7)

(The formulae are similar for in FOM and in CG, with replaced by ).

It also remains to replace (2.15) by a computable estimate. We have chosen to follow ideas in  ignoring pathological convergence instances and to estimate

 12∥rk∥2A−1=q(xk)−q(x∗)≈q(xk−d)−q(x