 # Alternating Energy Minimization Methods for Multi-term Matrix Equations

We develop computational methods for approximating the solution of a linear multi-term matrix equation in low rank. We follow an alternating minimization framework, where the solution is represented as a product of two matrices, and approximations to each matrix are sought by solving certain minimization problems repeatedly. The solution methods we present are based on a rank-adaptive variant of alternating energy minimization methods that builds an approximation iteratively by successively computing a rank-one solution component at each step. We also develop efficient procedures to improve the accuracy of the low-rank approximate solutions computed using these successive rank-one update techniques. We explore the use of the methods with linear multi-term matrix equations that arise from stochastic Galerkin finite element discretizations of parameterized linear elliptic PDEs, and demonstrate their effectiveness with numerical studies.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

We are interested in computing a low-rank approximate solution of a Kronecker-product structured linear system ,

 (m∑i=0Gi⊗Ki)u=r∑i=0gi⊗fi, (1)

where is symmetric positive definite, is the Kronecker product, , , , and . Systems with such structure arise in the discretization of linear elliptic PDEs in high dimensions [ballani2013projection, kressner2015truncated, kressner2010krylov, kressner2011low] and stochastic Galerkin finite element discretization of parameterized linear elliptic PDEs [ghanem2003stochastic, lee2018stochastic, lord2014introduction, xiu2010numerical]

. The solution vector

consists of subvectors of dimension , i.e., , where . It also has an alternative representation in matrix format, , for which the system equivalent to (1) is the linear multi-term matrix equation [powell2017efficient]

 m∑i=0KiUGTi=B, (2)

where and it is assumed that . The system matrices and obtained from discretization methods are typically sparse and, thus, for moderately large system matrices, Krylov subspace methods [pellissetti2000iterative, powell2009block] and multigrid methods [corveleyn2013iterative, elman2007solving, le2003multigrid] have been natural choices to solve such systems.

The dimensions of the system matrices grow rapidly, however, if a solution is sought on a refined grid or (in the case of stochastic Galerkin methods) if the so-called parametric space is high-dimensional. For large and , direct applications of standard iterative methods may be computationally prohibitive and storing or explicitly forming the matrix may be prohibitive in terms of memory. Instead of computing an exact solution of (2), we are interested in inexpensive computation of an approximate solution of low rank. To achieve this goal, we begin by introducing a factored representation of ,

 U=VWT,

where, if is of full rank , and . Our aim is to find a low-rank approximation to this factored matrix of the form

 Up=VpWTp∈Rn1×n2, (3)

where and and , and we want to derive solution algorithms for computing that operate only on the factors and without explicitly forming .

One such solution algorithm has been developed for matrix completion/sensing [hardt2014understanding, jain2013low], which, at the th iteration, computes and by alternately solving certain minimization problems. Although the algorithm computes highly accurate approximations, it can become very expensive as increases. Another approach is to use successive rank-one approximations and successively compute pairs of vectors , to build the factors and of (3) until a stopping criterion is satisfied. The th iteration starts with and and constructs and as the solutions of certain minimization problems. This approach for solving parameterized PDEs is one component of a methodology known as Proper Generalized Decomposition (PGD) [nouy2007generalized, nouy2010proper, tamellini2014model]. As observed in those works, using only successive rank-one approximations is less expensive but may not be practical because it typically results in approximate solutions with an unnecessarily large value of for satisfying a certain error tolerance.

Our goal in this study is to develop solution algorithms that preserve only the good properties of the above two types of solution strategies, i.e., algorithms that compute an accurate solution in a computationally efficient way. In developing such algorithms, we take our cue from PGD methods, in which, to improve accuracy, the successive rank-one constructions are supplemented with an updating procedure that is performed intermittently during the iteration. Inspired by this approach, we propose a solution algorithm that adaptively computes approximate solutions in an inexpensive way via the successive rank-one approximation method. This is then supplemented by an enhancement procedure, which effectively improves the accuracy of the resulting approximate solutions. We propose two novel enhancement procedures developed by modifying some ideas used for matrix completion problems [jain2013low].

Some other rank-adaptive approaches for approximating solutions of parameterized or high-dimensional PDEs in low-rank format are as follows. A method in [dolgov2014alternating]

uses alternating energy minimization techniques in combination with tensor-train decompositions

[oseledets2011tensor]. One can incrementally compute rank-one solution pairs by solving a residual minimization problem, an approach known as alternating least-squares (ALS) methods, which has been used to compute low-rank approximate solutions of parameterized PDEs in [doostan2009least, doostan2013non], and to solve matrix recovery problems, matrix sensing and completion problems, [haldar2009rank, hardt2014understanding, jain2013low, recht2010guaranteed]. In [powell2017efficient], an adaptive iterative procedure to solve the matrix equation (2) is given, which incrementally computes a set of orthonormal basis vectors for use in representing the spatial part of the solution, . See [simoncini2016computational] for an overview of other computational approaches for solving linear matrix equations.

An outline of the paper is as follows. In Section 2, we introduce and derive alternating energy minimization (AEM) methods using the well-known general projection framework and discuss a collection of methods developed for constructing low-rank approximate solutions of the form (3). In Section 3, we discuss enhancement procedures and derive two new approaches for performing such updates. In Section LABEL:sec:numexp, we measure the effectiveness and the efficiency of the variants of the methods with numerical experiments. Finally, in Section LABEL:sec:conclusions, we draw some conclusions.

## 2 Alternating energy minimization (AEM) methods

In this section, we derive AEM methods for solving the matrix equation (2) from the optimal projection framework, and review two variants of such methods. We first introduce some notation. Capital and small letters are used to denote matrices and vectors, respectively. As a special case, a zero-column matrix is indicated by using a subscript 0, e.g., . An inner product between two matrices is defined as , where tr is the trace operator, and if . The norm induced by is the Frobenius norm . For shorthand notation, we introduce a linear operator for . Using this, we can define the weighted inner product and the induced -norm . Finally, vec denotes a vectorization operator, , where and , where , for .

### 2.1 General projection framework

For the computation of and in (3), we rely on the classical theory of orthogonal (Galerkin) projection methods [saad2003iterative, Proposition 5.2]. Let be a search space in which an approximate solution is sought, and let be a constraint space onto which the residual is projected. Following [saad2003iterative, Proposition 5.2], if the system matrix is symmetric positive definite and , then a matrix is the result of an orthogonal projection onto if and only if it minimizes the -norm of the error over , i.e.,

 U∗p=argminUp∈KJA(Up),

where the objective function is

 JA(Up)=12∥U−Up∥2A. (4)

Because we seek a factored representation of , we slightly modify (4) to give

 JA(Vp,Wp)=12∥U−VpWTp∥2A, (5)

and obtain a new minimization problem

 minVp∈Rn1×p,Wp∈Rn2×pJA(Vp,Wp). (6)

Since is quadratic, gradients with respect to and can be easily obtained as

 ∇VpJA =(A(VpWTp)−B)Wp=m∑i=0(KiVpWTpGTi)Wp−BWp, (7) ∇WpJA =(A(VpWTp)−B)TVp=m∑i=0(KiVpWTpGTi)TVp−BTVp. (8)

Employing the first-order optimality condition on (7)–(8) (i.e., setting (7) and (the transpose of) (8) to be zero) results in the set of equations

 m∑i=0(KiVpWTpGTi)Wp =BWp∈Rn1×p, (9) m∑i=0VTp(KiVpWTpGTi) =VTpB∈Rp×n2. (10)

These equations can be interpreted as projections of the residual onto the spaces spanned by the columns of and , respectively.

Given (9)–(10), a widely used strategy for solving the minimization problem (6) is to compute each component of the solution pair alternately [dolgov2014alternating, doostan2009least, doostan2013non, haldar2009rank, hardt2014understanding, jain2013low]. That is, one can fix and solve the system of equations of order in (9) for , and then one can fix and solve the system of equations of order in (10) for . However, in this approach, suitable choices of for satisfying a fixed error tolerance are typically not known a priori. Thus, adaptive schemes that incrementally compute solution pairs (,) have been introduced [jain2013low, nouy2007generalized, nouy2010proper, tamellini2014model]. All of these schemes are based on alternately solving two systems of equations for two types of variables in an effort to minimize a certain error measure. In this study, we employ alternating methods for minimizing the energy norm of the error (6) and, thus, we refer to approaches of this type as alternating energy minimization (AEM) methods. In the following sections, we present two adaptive variants of AEM methods: a Stage- AEM method and a successive rank-one AEM method.

### 2.2 Stage-p AEM method

An alternating minimization method that entails solving a sequence of least-squares problems whose dimensions increase with was developed in [jain2013low] for solving matrix-recovery problems [haldar2009rank, hardt2014understanding, jain2013low]. We adapt this approach to the energy minimization problem (6) and refer to it as the Stage- AEM method. It is an iterative method that runs until an approximate solution satisfies a stopping criterion (e.g., the relative residual with a user-specified stopping tolerance ). At the th iteration, called a “stage” in [jain2013low], this method seeks -column factors and determining an approximate solution by initializing and solving the following systems of equations in sequence:

 m∑i=0(Ki)V(k)p(W(k−1)pTGiW(k−1)p)T =BW(k−1)p, (11) m∑i=0(V(k)pTKiV(k)p)W(k)pT(GTi) =V(k)pTB, (12)

for , where the superscript indicates the number of alternations between the two systems of equations (11)–(12). Note that the method can also begin by initializing and alternating between (12) and (11). Algorithm 1 summarizes the entire procedure. For the initialization of (line 3

), one step of the singular value projection method

[jain2010guaranteed] is performed with the exact settings from [jain2013low, Algorithm 3]. The CheckConvergence procedure (line 9) is detailed in Section 3.

Systems of equations for “vectorized” versions of the matrix factors and can be derived111The left-hand sides of (13)–(14) are derived using . Note that (14) is derived by first transposing (12) and then vectorizing the resulting equation. In the sequel, vectorized versions of equations for the factor are derived by first taking the transpose. from (11) and (12) as follows

 m∑i=0[(W(k−1)pTGiW(k−1)p)⊗Ki]vec(V(k)p) =vec(BW(k−1)p), (13) m∑i=0[(V(k)pTKiV(k)p)⊗Gi]vec(W(k)p) =vec(BTV(k)p). (14)

Thus, solving (11) and (12) is equivalent to solving coupled linear systems with coefficient matrices of dimensions and , respectively, which are smaller than that of the original system (2) when is small. However, the reduced matrix factors (of size ) are dense, even if the original ones are sparse, and so as increases, the computational costs for solving (11)–(12) increase and the Stage- AEM method may be impractical for large-scale problems.

### 2.3 Successive rank-one AEM method

We now describe a successive rank-one (S-rank-) approximation method which, at each iteration, adds a rank-one correction to the current iterate. This is a basic component of PGD methods [nouy2007generalized, nouy2010proper, tamellini2014model] for solving parameterized PDEs. The method only requires solutions of linear systems with coefficient matrices of size and rather than coupled systems like those in the Stage- AEM method that grow in size with the step counter .

Assume that pairs of solutions are computed, giving and . The next step is to compute a new solution pair by choosing the objective function

 JA(vp,wp)=12∥U−Vp−1WTp−1−vpwTp∥2A,

and solving the following minimization problem

 minvp∈Rn1,wp∈Rn2JA(vp,wp).

The gradients of with respect to and are

 ∇vpJA =(A(vpwTp)+A(Vp−1WTp−1)−B)wp, (15) ∇wpJA =(A(vpwTp)+A(Vp−1WTp−1)−B)Tvp. (16)

Employing the first-order optimality conditions (setting (15) and (the transpose of) (16) to zero) results in systems of equations for which, in a succession of steps , is updated using fixed and then is updated using fixed :

 m∑i=0(Ki)v(k)p(w(k−1)pTGiw(k−1)p)T =Bw(k−1)p−A(Vp−1WTp−1)w(k−1)p, (17) m∑i=0(v(k)pTKiv(k)p)w(k)pT(GTi) =v(k)pTB−v(k)pTA(Vp−1WTp−1). (18)

Algorithm 2 summarizes this procedure, which randomly initializes and then alternately solves (17)–(18). Like the Stage- AEM method, the algorithm can start with either or .

### 2.4 Algebraic interpretation of the methods

Algorithms 1 and 2 both entail an “outer iteration” with counter and an “inner iteration” with counter , and both are designed to minimize the objective function (5). It is instructive to see the difference between the two methods in vectorized format. To this end, let

 Aw(wi,wj)=m∑l=0Kl(wTjGTlwi)∈Rn1×n1,Av(vi,vj)=m∑l=0Gl(vTjKTlvi)∈Rn2×n2,

and let us assume for simplifying the presentation.

Both methods seek solution pairs satisfying the systems of equations (9)–(10), which can be written in a vectorized form:

 =[\tabularcell@hbox\centering$Bw1$\@add@centering\tabularcell@hbox\centering$Bw2$\@add@centering], (19) =[\tabularcell@hbox\centering$BTv1$\@add@centering\tabularcell@hbox\centering$BTv2$\@add@centering]. (20)

In the second outer iteration, the Stage- AEM method alternately solves fully coupled linear systems (11)–(12) specified by and , which can be written in vectorized form as in (19)–(20):

 ⎡⎢ ⎢ ⎢⎣\tabularcell@hbox\centering$Aw(w(k−1)1,w(k−1)1)$\@add@centering\tabularcell@hbox\centering$Aw(w(k−1)1,w(k−1)2)$\@add@centering\tabularcell@hbox\centering$Aw(w(k−1)2,w(k−1)1)$\@add@centering\tabularcell@hbox\centering$Aw(w(k−1)2,w(k−1)2)$\@add@centering\tabularcell@hbox\centering$$\@add@centering⎤⎥ ⎥ ⎥⎦[\tabularcell@hbox\centeringv(k)1\@add@centering\tabularcell@hbox\centeringv(k)2\@add@centering] ⎡⎢ ⎢ ⎢⎣\tabularcell@hbox\centeringAv(v(k)1,v(k)1)\@add@centering\tabularcell@hbox\centeringAv(v(k)1,v(k)2)\@add@centering\tabularcell@hbox\centeringAv(v(k)2,v(k)1)\@add@centering\tabularcell@hbox\centeringAv(v(k)2,v(k)2)\@add@centering\tabularcell@hbox\centering$$\@add@centering⎤⎥ ⎥ ⎥⎦[\tabularcell@hbox\centering$w(k)1$\@add@centering\tabularcell@hbox\centering$w(k)2$\@add@centering] =[\tabularcell@hbox\centering$BTv(k)1$\@add@centering\tabularcell@hbox\centering$BTv(k)2$\@add@centering]. (21)

In contrast, the S-rank- method seeks approximate solutions of (19)–(20) by solving systems of equations associated with only the diagonal blocks. In the first iteration, the method alternates between the following equations to find and :

 =[\tabularcell@hbox\centering$Bw(k−1)1$\@add@centering], =[\tabularcell@hbox\centering$BTv(k)1$\@add@centering].

In the second iteration, the method alternately solves the systems of equations in the second rows of the following equations:

 [\tabularcell@hbox\centering$Aw(w1,w1)$\@add@centering\tabularcell@hbox\centering$$\@add@centering\tabularcell@hbox\centeringAw(w(k−1)2,w1)\@add@centering\tabularcell@hbox\centeringAw(w(k−1)2,w(k−1)2)\@add@centering][\tabularcell@hbox\centeringv1\@add@centering\tabularcell@hbox\centeringv(k)2\@add@centering] =[\tabularcell@hbox\centeringBw1\@add@centering\tabularcell@hbox\centeringBw(k−1)2\@add@centering], [\tabularcell@hbox\centeringAv(v1,v1)\@add@centering\tabularcell@hbox\centering$$\@add@centering\tabularcell@hbox\centering$Av(v(k)2,v1)$\@add@centering\tabularcell@hbox\centering$Av(v(k)2,v(k)2)$\@add@centering][\tabularcell@hbox\centering$w1$\@add@centering\tabularcell@hbox\centering$w(k)2$\@add@centering]

Because and are fixed, the (2,1)-block matrices are multiplied with and and the resulting vectors are moved to the right-hand sides. Then solving the equations associated with the (2,2)-block matrices gives and . As illustrated in this example, the S-rank- AEM method approximately solves (19)–(20) by taking the matrices in the lower-triangular blocks to the right-hand sides and solving only the systems associated with the diagonal blocks, as opposed to solving fully coupled systems as in the Stage- AEM method.

The system matrices that arise in Algorithm 1 have reduced factors that are dense but small (of size ) and their counterpart factors are large but sparse. In Algorithm 2, the system matrices are sparse and of order and (as the reduced factors are of size 1 1). Thus in both cases, we may use Krylov subspace methods to solve the systems. Then, with the iteration counter , the cost of the Stage- AEM method grows quadratically (since the reduced factors are dense), whereas that of the S-rank- AEM method grows linearly with . Thus, using the Stage- AEM method can be impractical for large-scale applications. On the other hand, as the S-rank- AEM method employs only the lower-triangular part of the system matrices, convergence tends to be slow and the level of accuracy that can be achieved in a small number of steps is limited. To overcome these shortcomings, in the next section, we will consider several ways to modify and enhance them to improve accuracy.

###### Remark 1

The Stage- AEM and S-rank- AEM methods can be seen as two extreme versions of AEM methods. The former solves fully coupled systems and the latter sequentially solves systems associated with the diagonal blocks. Although it has not been explored in this study, in an intermediate approach, more than one consecutive pair of solution vectors , with , can be computed in a coupled manner at each outer iteration.

## 3 Enhancements

We now describe variants of the S-rank- AEM method that perform extra computations to improve accuracy. The general strategy is to compute an enhancement of the approximate solution at every outer iterations of the S-rank- AEM method, as specified in Algorithms 35.

We present three enhancement procedures, one taken from the literature and two new ones. These are (i) a procedure adopted from an updating technique developed in [tamellini2014model, Section 2.5], which defines one variant of PGD methods; (ii) a refined version of this approach, which only solves systems associated with the diagonal blocks of the system matrices but incorporates information (upper-triangular blocks) in a manner similar to Gauss-Seidel iterations; and (iii) an adaptive enhancement of the Stage- AEM method that decreases costs with negligible impact on accuracy. In discussing these ideas, we distinguish updated solutions using the notation, , (for vectors), and , (for matrices).

Before we detail each method, we first elaborate on the CheckConvergence procedure in Algorithm 5. This checks the relative difference between the current iterate and the previous iterate in the Frobenius norm.222To compute , we form , where is the Hadamard product, and then sum-up all the elements of X. The product is never explicitly formed. If this condition is met, we apply the Enhancement procedure and check the convergence with the same criterion. The purpose of this extra enhancement is to help prevent Algorithm 3 from terminating prematurely (i.e., the stopping condition can be met when Algorithm 3 stagnates.).

### 3.1 PGD-updated AEM

Suppose the factors and obtained from RankOneCorrection do not satisfy the first-order optimality conditions (9)–(10). An enhancement like that of the PGD update [nouy2007generalized, nouy2010proper, tamellini2014model] modifies one of these factors (e.g., the one corresponding to the smaller dimension or ) by solving the associated minimization problem for (given , when ) or for (given when ) so that one of the first-order conditions holds. We outline the procedure for approximating ; the procedure for is analogous. The basic procedure is to solve the optimization problem every steps. In place of , an orthonormal matrix is used, so that the construction entails solving

 ¯¯¯¯¯¯Wp=argminWp∈Rn2×pJA(˜Vp,Wp), (22)

where is the quadratic objective function defined in (5). The gradient of the objective function with respect to can be computed as

 ∇WpJA =(A(˜VpWTp)−B)T˜Vp=m∑i=0(Ki˜VpWTpGTi)T˜Vp−BT˜Vp.

Thus, solving the minimization problem (22) by employing the first-order optimality condition is equivalent to solving a system of equations similar in structure to (10),

 m∑i=0(˜VTpKi˜Vp)¯¯¯¯¯¯WTp(GTi) =˜VTpB∈Rp×n2. (23)

Compared to the original system (2), the dimension of this matrix is reduced via a “single-sided” reduction; in (23), the reduction is on the side of the first dimension, i.e., is reduced to . The vectorized form of this system, for , is

which has structure like that of the second system in (21) of the Stage-  AEM method. We summarize this single-sided enhancement method in Algorithm LABEL:alg:aem_sr.