On maximum volume submatrices and cross approximation for symmetric semidefinite and diagonally dominant matrices

02/06/2019
by   Alice Cortinovis, et al.
EPFL

The problem of finding a k × k submatrix of maximum volume of a matrix A is of interest in a variety of applications. For example, it yields a quasi-best low-rank approximation constructed from the rows and columns of A. We show that such a submatrix can always be chosen to be a principal submatrix if A is symmetric semidefinite or diagonally dominant. Then we analyze the low-rank approximation error returned by a greedy method for volume maximization, cross approximation with complete pivoting. Our bound for general matrices extends an existing result for symmetric semidefinite matrices and yields new error estimates for diagonally dominant matrices. In particular, for doubly diagonally dominant matrices the error is shown to remain within a modest factor of the best approximation error. We also illustrate how the application of our results to cross approximation for functions leads to new and better convergence results.


1 Introduction

Given a matrix $A \in \mathbb{R}^{m\times n}$ and $k \le \min(m,n)$, the volume of a submatrix $A(I,J)$ for two index sets $I$, $J$ of cardinality $k$ is defined as the absolute value of the determinant, $|\det A(I,J)|$. The problem of finding the $k\times k$ submatrix of maximum volume is connected to a range of applications in discrete mathematics, engineering, and scientific computing; see, e.g., [1, 17, 36]. Our primary motivation is its connection to low-rank approximation. Specifically, if $A(I,J)$ is invertible then the matrix $A(:,J)\,A(I,J)^{-1}A(I,:)$ has rank $k$, where the colon is used to denote that all rows or columns of $A$ are selected. Low-rank approximations that involve columns and rows of the original matrix come in various flavors, such as pseudoskeleton approximation [16], cross approximation [4], CUR approximation [11], or (strong) rank-revealing LU factorizations [27, 29]. If $A(I,J)$ has maximum volume then, by a result of Goreinov and Tyrtyshnikov [15], we have

$$\bigl\|A - A(:,J)\,A(I,J)^{-1}A(I,:)\bigr\|_{\max} \le (k+1)\,\sigma_{k+1}(A), \qquad (1)$$

where $\|\cdot\|_{\max}$ denotes the maximum absolute value of the entries of a matrix. The $(k+1)$th largest singular value of $A$, denoted by $\sigma_{k+1}(A)$, is well known to govern the best rank-$k$ approximation error of $A$ in the spectral norm; see, e.g., [22, Chapter 7.4]. In other words, the result (1) states that volume maximization yields a quasi-best low-rank approximation.

Finding the $k\times k$ submatrix of maximum volume is a difficult problem. In fact, it is NP hard to determine the maximum volume $k\times k$ submatrix of a matrix $A$ [7, 30]. In [9] it is shown that there exists a universal constant $c > 1$ such that it is NP hard to approximate the maximum volume of a $k\times k$ submatrix of a $k\times n$ matrix within a factor $c^k$. By a trivial embedding, this implies that it is also NP hard to approximate the maximum volume of a $k\times k$ submatrix of an $n\times n$ matrix. However, it is important to emphasize that while a good approximation of the maximum volume submatrix yields a good low-rank approximation [15, Theorem 2.2], the converse is generally not true; see also Remark 11 below.

Despite the difficulties associated with volume maximization, this concept has proven fruitful in the development of greedy and randomized algorithms that often yield reasonably good low-rank approximations. In particular, (adaptive) cross approximation [4], a greedy method for volume maximization, can now be regarded as the workhorse for matrices that cannot be stored in memory because, for example, $A$ has too many nonzero entries or it is too expensive to compute all the entries of $A$. This situation occurs frequently for discretized integral operators, and cross approximation plays an important role in accelerating computations within the boundary element method [4] and uncertainty quantification [18]. Low-rank approximations of the form (1) also feature prominently in the Nyström method for kernel-based learning [2] and spectral clustering [13]. Let us also stress that cross approximation is equivalent to Gaussian elimination and (incomplete) LU factorization with complete pivoting; it primarily constitutes a different point of view, with stronger emphasis on low-rank approximation.

In many of the applications mentioned above, the matrix $A$ carries additional structure. For example, if $A$ is the discretization of an integral operator with a positive semidefinite kernel then $A$ is symmetric positive semidefinite. In the first part of this work, we will show that the submatrix of maximum volume is always attained by a principal submatrix if $A$ is symmetric positive semidefinite (SPSD). This has a number of important consequences. For example, it allows us to draw a one-to-one correspondence to the column selection problem considered in [7]. In turn, the maximum volume problem remains NP hard when restricted to SPSD matrices. We also extend this result to diagonally dominant (DD) matrices. Somewhat surprisingly, we have not found such results for SPSD and DD matrices in the existing literature.

In the second part of this work, we derive a priori error bounds for the approximation returned by cross approximation. Although the literature on rank-revealing LU factorizations contains related results, see in particular [12, Corollary 5.3], the non-asymptotic bound of Theorem 6 appears to be new. Our result includes existing work by Harbrecht et al. [18] for SPSD matrices as a special case. For the particular case of doubly DD matrices, we show that the approximation error returned by cross approximation is at most a modest factor, polynomial in $k$, larger than the right-hand side of (1). This class of matrices includes symmetric DD matrices, which play a prominent role in [24, 33]. Our result also allows us to obtain refined bounds for the convergence of cross approximation applied to functions [4, 35].

2 Maximum volume submatrices

In this section, we will prove for two classes of matrices that the submatrix of maximum volume can always be chosen to be a principal submatrix, that is, a submatrix of the form $A(I,I)$.

2.1 Symmetric positive semidefinite (SPSD) matrices

Theorem 1.

Let $A \in \mathbb{R}^{n\times n}$ be SPSD and let $1 \le k \le n$. Then the maximum volume $k\times k$ submatrix of $A$ can be chosen to be a principal submatrix.

Proof.

Let $A(I,J)$ be any $k\times k$ submatrix of $A$. As $A$ is SPSD, it admits a Cholesky decomposition $A = B^{T}B$ and, in turn, $A(I,J) = B(:,I)^{T}B(:,J)$. The singular values of a principal submatrix satisfy

$$\sigma_j\bigl(A(I,I)\bigr) = \sigma_j\bigl(B(:,I)\bigr)^2, \qquad j = 1,\dots,k.$$

Noting that the absolute value of the determinant equals the product of the singular values, we obtain

$$|\det A(I,J)| = \prod_{j=1}^{k} \sigma_j\bigl(B(:,I)^{T}B(:,J)\bigr) \le \prod_{j=1}^{k} \sigma_j\bigl(B(:,I)\bigr)\,\sigma_j\bigl(B(:,J)\bigr) = \sqrt{\det A(I,I)\,\det A(J,J)},$$

where we used [21, Theorem 3.3.4] for the inequality. This implies that the volume of $A(I,J)$ is not larger than the maximum of the volumes of $A(I,I)$ and $A(J,J)$. In turn, $A(I,J)$ can be replaced by a principal submatrix without decreasing the volume. ∎

Trivially, the result of Theorem 1 extends to symmetric negative semidefinite matrices. On the other hand, it does not extend to the indefinite case; consider for example the matrix $\bigl(\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix}\bigr)$ with $k = 1$, whose maximum volume $1\times 1$ submatrix is an off-diagonal entry.
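The statement of Theorem 1 is easy to check numerically by brute force; the following sketch (random SPSD matrix, $k = 3$, all parameters illustrative) compares the best volume over all $k\times k$ submatrices with the best volume over principal submatrices only.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
B = rng.standard_normal((7, 7))
A = B.T @ B                              # SPSD by construction
k = 3

vol = lambda I, J: abs(np.linalg.det(A[np.ix_(I, J)]))
subsets = list(combinations(range(7), k))
best_general = max(vol(I, J) for I in subsets for J in subsets)
best_principal = max(vol(I, I) for I in subsets)
print(best_general, best_principal)      # the two maxima coincide (Theorem 1)
```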

2.1.1 Connection to column selection

The volume of a general rectangular matrix is defined as the product of its singular values. The column selection problem, which is also connected to low-rank approximation [8], is the following:

Given $B \in \mathbb{R}^{m\times n}$ and $k \le n$, select the $m\times k$ submatrix $B(:,J)$, $|J| = k$, of maximum volume.

In [7] it is shown that this problem is NP hard.

Theorem 1 allows us to relate the column selection problem to the classical maximum volume submatrix problem. Given $B \in \mathbb{R}^{m\times n}$, we consider the SPSD matrix $A = B^{T}B$. As $A(J,J) = B(:,J)^{T}B(:,J)$ for any index set $J$, there is a one-to-one correspondence between the principal submatrices of $A$ and the subsets of columns of $B$. Moreover, as seen in the proof of Theorem 1, the volume of $A(J,J)$ is the square of the volume of $B(:,J)$. This shows that $A(J,J)$ has maximum volume if and only if $B(:,J)$ has maximum volume. In turn, this proves that the maximum volume submatrix problem remains NP hard when restricted to the subclass of SPSD matrices.
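The correspondence used above is easy to verify numerically; a minimal sketch (random $B$ and an arbitrary index set, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((9, 6))
A = B.T @ B                                   # SPSD matrix associated with B
J = [0, 2, 5]                                 # an arbitrary index set

vol_cols = np.prod(np.linalg.svd(B[:, J], compute_uv=False))   # volume of B(:,J)
vol_principal = abs(np.linalg.det(A[np.ix_(J, J)]))            # volume of A(J,J)
print(vol_principal, vol_cols**2)             # equal up to rounding errors
```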

2.2 Diagonally dominant (DD) matrices

Definition 2.

A matrix $A \in \mathbb{R}^{n\times n}$ with entries $a_{ij}$ is called (row) diagonally dominant (DD) if

$$|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|, \qquad i = 1,\dots,n. \qquad (2)$$

If (2) holds with strict inequality for every $i$, we call $A$ strictly DD. A matrix $A$ is called doubly DD if both $A$ and $A^{T}$ are DD.

Lemma 3.

Let $U \in \mathbb{R}^{n\times n}$ be a strictly DD, upper triangular matrix. Then $|\det U(I,J)| \le |\det U(I,I)|$ holds for every $I, J \subseteq \{1,\dots,n\}$ with $|I| = |J| = k$ and $1 \le k \le n$.

Proof.

Let $D$ be the diagonal matrix with $d_{ii} = u_{ii}$ and set $\tilde U := D^{-1}U$. Because of

$$\det U(I,J) = \det D(I,I)\,\det \tilde U(I,J) \quad\text{and}\quad \det U(I,I) = \det D(I,I)\,\det \tilde U(I,I),$$

the statement of the lemma holds for $U$ if and only if it holds for $\tilde U$. In turn, this allows us to assume without loss of generality that $U$ has ones on the diagonal. In particular, $\det U(I,I) = 1$.

The statement of the lemma will be proven by induction on $k$. The case $k = 1$ follows immediately from the diagonal dominance of $U$. Suppose now that the statement of the lemma is true for a fixed $k$. To prove the statement for $k+1$, we consider an arbitrary $(k+1)\times(k+1)$ submatrix $U(I,J)$. If $I \ne J$ then there exists an index $i \in I \setminus J$ whose row does not contain a diagonal element of $U$. By diagonal dominance of $U$,

$$\sum_{j\in J} |u_{ij}| \le \sum_{j\ne i} |u_{ij}| < 1. \qquad (3)$$

Denote by $U_{ij}$ the submatrix of $U(I,J)$ obtained from eliminating the row and column of $U(I,J)$ corresponding to the indices $i$ and $j$. By the induction assumption, $|\det U_{ij}| \le 1$. Thus, combining (3) with the Laplace expansion along the row corresponding to $i$ gives

$$|\det U(I,J)| \le \sum_{j\in J} |u_{ij}|\,|\det U_{ij}| \le \sum_{j\in J} |u_{ij}| < 1.$$

In other words, $|\det U(I,J)| < 1 = |\det U(I,I)|$. ∎

Theorem 4.

Let $A \in \mathbb{R}^{n\times n}$ be a diagonally dominant matrix and $1 \le k \le n$. Then the maximum volume $k\times k$ submatrix of $A$ can be chosen to be a principal submatrix.

Proof.

We prove the theorem in the case when $A$ is strictly DD; the DD case follows by a continuity argument, noting that volumes of submatrices are continuous in $A$. Let $A(I,J)$ be a $k\times k$ submatrix of $A$. Also, by applying a suitable symmetric permutation to the rows and columns of $A$, which preserves diagonal dominance, we may assume that $I = \{1,\dots,k\}$. The result of the theorem follows if we can prove

$$|\det A(I,J)| \le |\det A(I,I)|. \qquad (4)$$

For this purpose, we note that the LU factorization $A = LU$ always exists, with $U$ strictly DD; see Theorem 9.9 in [20]. We have that

$$A(I,J) = L(I,I)\,U(I,J), \qquad A(I,I) = L(I,I)\,U(I,I),$$

because the columns $k+1,\dots,n$ of $L(I,:)$ vanish. As $L$ is lower triangular with ones on the diagonal, we obtain

$$|\det A(I,J)| = |\det U(I,J)|, \qquad |\det A(I,I)| = |\det U(I,I)|.$$

Thus, the inequality (4) follows from Lemma 3. ∎

For , the result of Theorem 4 is covered in the proof of Theorem 2.5.12 in [21], while the result of Lemma 3 for follows from Proposition 2.1 in [31].
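Analogously to the SPSD case, Theorem 4 can be checked by brute force on small examples; the construction of the DD test matrix below is an illustrative choice.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, k = 7, 3
A = rng.uniform(-1.0, 1.0, (n, n))
np.fill_diagonal(A, np.sum(np.abs(A), axis=1))   # enforce row diagonal dominance

vol = lambda I, J: abs(np.linalg.det(A[np.ix_(I, J)]))
subsets = list(combinations(range(n), k))
best_general = max(vol(I, J) for I in subsets for J in subsets)
best_principal = max(vol(I, I) for I in subsets)
print(best_general, best_principal)              # coincide, as predicted by Theorem 4
```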

3 Cross Approximation

In the following, we summarize the idea behind Bebendorf's cross approximation algorithm [4]. For this purpose, we first recall that an approximation of the form $A(:,J)\,A(I,J)^{-1}A(I,:)$ is closely connected to an incomplete LU decomposition of $A$. To see this, suppose that $A$ has been permuted such that $I = J = \{1,\dots,k\}$ and partition

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad A_{11} \in \mathbb{R}^{k\times k}.$$

Assume that $A_{11}$ is invertible and admits an LU decomposition $A_{11} = L_{11}U_{11}$, where $L_{11}$ is lower triangular and $U_{11}$ is upper triangular with ones on the diagonal. By setting $L_{21} := A_{21}U_{11}^{-1}$ and $U_{12} := L_{11}^{-1}A_{12}$, we obtain

$$A = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}\begin{pmatrix} U_{11} & U_{12} \\ 0 & S \end{pmatrix}, \qquad (5)$$
$$A - A(:,J)\,A(I,J)^{-1}A(I,:) = \begin{pmatrix} 0 & 0 \\ 0 & S \end{pmatrix}, \qquad (6)$$

with the Schur complement

$$S = A_{22} - A_{21}A_{11}^{-1}A_{12}.$$

This shows that the approximation error is governed by $S$. The factorized form (5) corresponds exactly to what is obtained after applying $k$ steps of the LU factorization to $A$, see, e.g., [14, Chapter 3.2].

Given index sets $I_m = \{i_1,\dots,i_m\}$ and $J_m = \{j_1,\dots,j_m\}$, one step of the greedy method for volume maximization consists of choosing indices $i_{m+1}$, $j_{m+1}$ such that

$$(i_{m+1}, j_{m+1}) \in \arg\max_{i \notin I_m,\ j \notin J_m} \bigl|\det A(I_m \cup \{i\},\, J_m \cup \{j\})\bigr|. \qquad (7)$$

Again, let us assume that $I_m = J_m = \{1,\dots,m\}$ and set $\tilde I = I_m \cup \{i\}$, $\tilde J = J_m \cup \{j\}$ for some $i, j > m$. Then (6) implies

$$\bigl|\det A(\tilde I,\tilde J)\bigr| = \bigl|\det A(I_m,J_m)\bigr|\cdot|s_{i-m,\,j-m}|,$$

where $s_{i-m,\,j-m}$ denotes the corresponding entry of the Schur complement $S$. In other words, the local optimization problem (7) is solved by searching the entry of $S$ that has maximum modulus. This choice leads to Algorithm 1, which is equivalent to applying LU factorization with complete pivoting to $A$.

1:  Initialize $R_0 := A$, $I_0 := \emptyset$, $J_0 := \emptyset$
2:  for $m = 1, 2, \dots, k$ do
3:     $(i_m, j_m) := \arg\max_{i,j} |R_{m-1}(i,j)|$
4:     $I_m := I_{m-1} \cup \{i_m\}$
5:     $J_m := J_{m-1} \cup \{j_m\}$
6:     $R_m := R_{m-1} - R_{m-1}(:,j_m)\,R_{m-1}(i_m,j_m)^{-1}\,R_{m-1}(i_m,:)$
7:  end for
Algorithm 1 Cross approximation with complete pivoting [4]
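A minimal dense-matrix sketch of Algorithm 1 in Python; it forms the full residual explicitly and is meant for illustration only (in the memory-constrained settings mentioned in the introduction one would instead evaluate rows and columns of $A$ on the fly).

```python
import numpy as np

def cross_approximation(A, k):
    """Cross approximation with complete pivoting (illustrative sketch of Algorithm 1)."""
    R = np.array(A, dtype=float)         # residual R_0 = A
    I, J = [], []
    for _ in range(k):
        # Step 3: complete pivoting, i.e. entry of maximum modulus of the residual.
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if R[i, j] == 0.0:
            break                        # numerical rank of A reached
        I.append(i)
        J.append(j)
        # Rank-one update of the residual; one step of LU with complete pivoting.
        R -= np.outer(R[:, j], R[i, :]) / R[i, j]
    A_k = A[:, J] @ np.linalg.solve(A[np.ix_(I, J)], A[I, :])
    return I, J, A_k

# Example usage on a random matrix with rapidly decaying singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((50, 12)))
V, _ = np.linalg.qr(rng.standard_normal((40, 12)))
A = U @ np.diag(2.0 ** -np.arange(12)) @ V.T
I, J, A_k = cross_approximation(A, 6)
print(np.max(np.abs(A - A_k)))           # max-norm error of the rank-6 cross approximation
```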
Remark 5.

Because of (5), the remainder term $R_k$ of Algorithm 1 satisfies $R_k = \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & S \end{smallmatrix}\bigr)$ after a suitable permutation of the indices. Both for SPSD and DD matrices, the element of maximum modulus is on the diagonal. Positive definiteness and diagonal dominance are preserved by taking Schur complements; see, e.g., [38, Chapter 4]. In turn, the search for the pivot element in Step 3 can be restricted to the diagonal for such matrices. This significantly reduces the number of entries of $A$ that need to be evaluated when running Algorithm 1. It also implies that Algorithm 1 returns $I_k = J_k$, which aligns nicely with the results from Section 2. Notice that if $A$ is an SPSD matrix then the cross approximation $A(:,I_k)\,A(I_k,I_k)^{-1}A(I_k,:)$ obtained by Algorithm 1 is SPSD. In contrast, diagonal dominance is generally not preserved by the low-rank approximation returned by Algorithm 1.
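For SPSD and DD matrices, Remark 5 allows the pivot search to be restricted to the diagonal. A hypothetical sketch of this variant is given below; for SPSD matrices the pivot selection essentially coincides with that of Cholesky with diagonal pivoting.

```python
import numpy as np

def cross_approximation_diag(A, k):
    """Algorithm 1 with the pivot search restricted to the diagonal (Remark 5).

    Illustrative sketch for SPSD or DD matrices, where the maximum modulus
    entry of every Schur complement lies on the diagonal.
    """
    R = np.array(A, dtype=float)
    I = []
    for _ in range(k):
        i = int(np.argmax(np.abs(np.diag(R))))   # search the diagonal only
        if R[i, i] == 0.0:
            break
        I.append(i)
        R -= np.outer(R[:, i], R[i, :]) / R[i, i]
    A_k = A[:, I] @ np.linalg.solve(A[np.ix_(I, I)], A[I, :])
    return I, A_k

# Usage on a random SPSD matrix of rank 8.
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 8))
A = B @ B.T
I, A_k = cross_approximation_diag(A, 5)
print(np.max(np.abs(A - A_k)))
```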

3.1 Error analysis for general matrices

Although not desirable, it may happen that the pivots in Algorithm 1 grow. Upper bounds on the growth factor play an important role in the error analysis of Gaussian elimination (see, e.g., [37]). In the setting of complete pivoting, we can define

$$\rho_k := \sup_A\ \max_{1\le m\le k}\ \frac{\max_{i,j}|R_m(i,j)|}{\max_{i,j}|A(i,j)|}, \qquad (8)$$

where the supremum is taken over all matrices $A$ of rank at least $k$; this condition ensures that there is no breakdown in the first $k$ steps of Algorithm 1. By definition, $\rho_1 \le \rho_2 \le \cdots$. Wilkinson [37] proved that

$$\rho_k \le \sqrt{k+1}\,\bigl(2\cdot 3^{1/2}\cdot 4^{1/3}\cdots (k+1)^{1/k}\bigr)^{1/2},$$

but it is known that this bound cannot be attained for $k > 1$. For matrices occurring in practice, it is rare to see any significant growth and it is not unreasonable to consider $\rho_k$ to be of moderate size; we refer to [20, Section 9.4] for more details. Extending the proof of [18, Theorem 3.2], we obtain the following result.

Theorem 6.

Let $A \in \mathbb{R}^{m\times n}$ have rank at least $k$. Then the index sets $I_k$, $J_k$ returned by Algorithm 1 satisfy

$$\bigl\|A - A(:,J_k)\,A(I_k,J_k)^{-1}A(I_k,:)\bigr\|_{\max} \le 4^k\,\rho_k\,\sigma_{k+1}(A). \qquad (9)$$
Proof.

Without loss of generality, we may assume $I_k = J_k = \{1,\dots,k\}$. If $R_k = 0$ there is nothing to prove, so we may perform one more step of Algorithm 1 and consider the relation from (5) for $k+1$, that is, $A(I_{k+1},J_{k+1}) = L_{11}U_{11}$ with $L_{11} \in \mathbb{R}^{(k+1)\times(k+1)}$ lower triangular and $U_{11}$ unit upper triangular. Because of complete pivoting, the element of largest modulus in the $m$th column of $L_{11}$ is on the diagonal and equals the pivot $\delta_m := R_{m-1}(i_m,j_m)$. For such triangular matrices, Theorem 6.1 in [19] gives

$$\|L_{11}^{-1}\|_2 \le \frac{2^k}{\min_{1\le m\le k+1}|\delta_m|},$$

where $\|\cdot\|_2$ denotes the spectral norm of a matrix. Analogously, using that the element of largest modulus in every row of $U_{11}$ is on the diagonal and equals $1$, we obtain $\|U_{11}^{-1}\|_2 \le 2^k$. Hence,

$$\sigma_{k+1}\bigl(A(I_{k+1},J_{k+1})\bigr) = \frac{1}{\|A(I_{k+1},J_{k+1})^{-1}\|_2} \ge \frac{1}{\|U_{11}^{-1}\|_2\,\|L_{11}^{-1}\|_2} \ge \frac{\min_m|\delta_m|}{4^k}.$$

This implies

$$\min_{1\le m\le k+1}|\delta_m| \le 4^k\,\sigma_{k+1}\bigl(A(I_{k+1},J_{k+1})\bigr) \le 4^k\,\sigma_{k+1}(A), \qquad (10)$$

where we used interlacing properties of singular values, see [22, Corollary 7.3.6].

On the other hand, as $R_k$ is the matrix obtained after $k-m+1$ steps of Algorithm 1 applied to the matrix $R_{m-1}$, the definition (8) gives the inequalities

$$|\delta_{k+1}| = \|R_k\|_{\max} \le \rho_{k-m+1}\,\|R_{m-1}\|_{\max} = \rho_{k-m+1}\,|\delta_m| \le \rho_k\,|\delta_m|$$

for $m = 1,\dots,k+1$, with $\rho_0 := 1$. We therefore obtain

$$\bigl\|A - A(:,J_k)\,A(I_k,J_k)^{-1}A(I_k,:)\bigr\|_{\max} = \|R_k\|_{\max} = |\delta_{k+1}| \le \rho_k \min_{1\le m\le k+1}|\delta_m|. \qquad (11)$$

Combined with (10), this shows the result of the theorem. ∎

Because of the factor $4^k\rho_k$, Theorem 6 only guarantees good low-rank approximations when the singular values of $A$ are strongly decaying. This limitation does not correspond to the typical behavior observed in practice; the quantities $\|L_{11}^{-1}\|_2$ and $\|U_{11}^{-1}\|_2$ rarely attain the exponential growth estimates used in the proof of Theorem 6. In turn, the factor $4^k$ usually severely overestimates the error. Nevertheless, there are examples for which the error estimate of Theorem 6 is asymptotically tight; see Section 3.2 below.
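The growth factor in (8) is a worst-case quantity; for a given matrix, the growth actually observed while running Algorithm 1 can be monitored directly, as in the following illustrative sketch (the Gaussian test matrix is an arbitrary choice).

```python
import numpy as np

def observed_growth(A, k):
    """Ratios max|R_m| / max|A| for m = 1, ..., k during Algorithm 1 (illustrative)."""
    R = np.array(A, dtype=float)
    ratios = []
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        R -= np.outer(R[:, j], R[i, :]) / R[i, j]
        ratios.append(np.max(np.abs(R)) / np.max(np.abs(A)))
    return ratios

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 200))
print(observed_growth(A, 10))    # in practice these ratios typically stay close to 1
```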

The matrix norms on the two sides of the estimate (9) do not match. In the following we develop a variant of Theorem 6 in which the best approximation error is also measured in terms of $\|\cdot\|_{\max}$. This will be useful later on, in Section 3.5, when considering the approximation of functions. Let us define the approximation numbers

$$a_{k+1}(A) := \min\bigl\{\|A - B\|_{\max} :\ B \in \mathbb{R}^{m\times n},\ \operatorname{rank}(B) \le k\bigr\}.$$

Because of $\|\cdot\|_{\max} \le \|\cdot\|_2$, we have $a_{k+1}(A) \le \sigma_{k+1}(A)$. If $B$ is invertible then $\min\{\|B - C\|_2 : C \text{ singular}\} = 1/\|B^{-1}\|_2$. This result, relating the distance to singularity to the norm of the inverse, extends to general subordinate matrix norms; see, e.g., [20, Theorem 6.5]. In particular, we have

$$\min\bigl\{\|B - C\|_{\max} :\ C \text{ singular}\bigr\} = \frac{1}{\|B^{-1}\|_{\infty\to 1}}, \qquad (12)$$

with $\|\cdot\|_{\infty\to 1}$ denoting the matrix norm induced by the $\infty$- and $1$-norms. More generally, we set

$$\|B\|_{\alpha\to\beta} := \max_{x\ne 0} \frac{\|Bx\|_{\beta}}{\|x\|_{\alpha}}$$

for vector norms $\|\cdot\|_{\alpha}$, $\|\cdot\|_{\beta}$. Note that $\|B\|_{1\to\infty} = \|B\|_{\max}$.

Theorem 7.

Under the assumptions of Theorem 6 and with the notation introduced above, we have

Proof.

Along the lines of the proof of Theorem 6, we first note that

which can be shown by induction on . Combined with , see [19, Theorem 6.1], we obtain

where we used submultiplicativity [20, Eqn (6.7)]. Using (12), this implies

with the last inequality being a direct consequence of the definition of . The rest of the proof is identical with the proof of Theorem 6. ∎

3.2 Error analysis for SPSD matrices

In the SPSD case, the pivot elements of Algorithm 1 are always non-increasing. Thus, when restricting the supremum in (8) to SPSD matrices of rank at least $k$, one obtains $\rho_k = 1$. In turn, the following result due to Harbrecht et al. [18] is a corollary of Theorem 6.

Corollary 8.

For an SPSD matrix $A$ of rank at least $k$, the bound of Theorem 6 improves to

$$\bigl\|A - A(:,I_k)\,A(I_k,I_k)^{-1}A(I_k,:)\bigr\|_{\max} \le 4^k\,\sigma_{k+1}(A).$$

The bound of Corollary 8 is asymptotically tight, see [18, Remark 3.3] and [23, p. 791]. As the growth factor $\rho_k$, which comes into play in Theorem 6, is small compared to the $4^k$ factor, this also proves that the bound of Theorem 6 is almost tight.

3.3 Error analysis for DD matrices

When restricting the supremum in (8) to DD matrices of rank at least $k$, one obtains $\rho_k \le 2$; see Theorem 13.8 in [20].

Corollary 9.

For a DD matrix $A$ of rank at least $k$, the bound of Theorem 6 improves to

$$\bigl\|A - A(:,I_k)\,A(I_k,I_k)^{-1}A(I_k,:)\bigr\|_{\max} \le 2^{k+1}(k+1)\,\sigma_{k+1}(A).$$

Proof.

It is well known that the $U$ factor in the LU decomposition of a DD matrix is again DD; see [6]. In particular, this implies that the unit upper triangular matrix $U_{11}$ in the proof of Theorem 6 is DD. Then, for every entry of $U_{11}^{-1}$ we have $|(U_{11}^{-1})_{ij}| \le 1$ by [31, Prop. 2.1]. Therefore,

$$\|U_{11}^{-1}\|_2 \le \sqrt{\|U_{11}^{-1}\|_1\,\|U_{11}^{-1}\|_\infty} \le k+1. \qquad (13)$$

This shows that the factor $4^k$ can be reduced to $2^k(k+1)$ in the bound of Theorem 6. Combined with $\rho_k \le 2$, this establishes the desired result. ∎

Corollary 10.

For a doubly DD matrix $A$ of rank at least $k$, the bound of Corollary 9 improves to

$$\bigl\|A - A(:,I_k)\,A(I_k,I_k)^{-1}A(I_k,:)\bigr\|_{\max} \le 2(k+1)^2\,\sigma_{k+1}(A).$$

Proof.

Trivially, $A^{T}$ is DD. By the same arguments as in the proof of Corollary 9, this implies that not only $U_{11}$ but also $M_{11}^{T}$ is DD, where $M_{11} := L_{11}D_{11}^{-1}$ is unit lower triangular and $D_{11} = \operatorname{diag}(\delta_1,\dots,\delta_{k+1})$. Proceeding as in the derivation of (13), we get

$$\|M_{11}^{-1}\|_2 \le k+1 \qquad\text{and hence}\qquad \|L_{11}^{-1}\|_2 \le \frac{k+1}{\min_m|\delta_m|}.$$

This shows that the factor $2^k(k+1)$ of Corollary 9 can be improved to $(k+1)^2$. ∎

Remark 11.

The fact that Algorithm 1 gives a polynomially good low-rank approximation of a doubly DD matrix does not imply that it also gives a polynomially good approximation of the maximum volume submatrix. For instance, let and consider , where . Then Algorithm 1 does not perform any pivoting during its steps and thus the submatrix is selected. Its volume is , while the volume of is exponentially larger, it grows like .

3.4 Tightness of estimates for DD matrices

To study the tightness of the estimates from Section 3.3, it is useful to connect Algorithm 1 to LDU decompositions. Suppose that the application of $k$ steps of Algorithm 1 yields $I_k = J_k = \{1,\dots,k\}$. As in the proof of Theorem 6, we exploit the relation (5) for $k+1$ to obtain the factorization

$$A(I_{k+1},J_{k+1}) = M_{11}\,D_{11}\,U_{11}, \qquad D_{11} = \operatorname{diag}(\delta_1,\dots,\delta_{k+1}),$$

where $M_{11}$ and $U_{11}$ are lower and upper unit triangular matrices, respectively. We recall from (11) that the error of the approximation returned by Algorithm 1 is governed by $|\delta_{k+1}|$.

From now on, let $A$ be a DD matrix. In this case, the pivot growth factor does not exceed $2$ and we have that $|\delta_{k+1}| \le 2\min_m|\delta_m|$. In turn, the ratio between $|\delta_{k+1}|$ and the best rank-$k$ approximation error satisfies

$$\frac{|\delta_{k+1}|}{\sigma_{k+1}(A)} \le 2(k+1)\,\|M_{11}^{-1}\|_2. \qquad (14)$$

Inheriting the diagonal dominance from $A$, the matrix $U_{11}$ is well conditioned; see (13). Therefore, large ratios in (14) require $\|M_{11}^{-1}\|_2$ to become large.

The quantity also plays a prominent role in the stability analysis of LDU decompositions, see [10] and the references therein. In particular, the potential rapid growth of under complete pivoting has motivated the search for alternative pivoting strategies [31]. However, the existing literature is scarce on examples actually exhibiting such rapid growth. The worst example we could find is by Barreras and Peña [3, Sec. 3], which exhibits linear growth. A more rapid growth is attained by the matrix

where is even and each block has size . When applying complete pivoting to this matrix, no interchanges are performed and the LDU factorization satisfies

Note that, for this example, the right-hand side of (14) overestimates the error. This example attains quadratic growth: . This is still far away from the exponential growth estimated in Corollary 9, but closer than the example from [3, Sec. 3], which yields .

For a doubly DD matrix, one obtains linear growth in (14) by considering the lower bidiagonal matrix having $1$ on the diagonal and $-1$ on the first subdiagonal. In this case, no interchanges are performed and all pivots are equal to $1$, and hence the ratio in (14) equals $1/\sigma_{k+1}(A)$, showing that it can grow at least linearly with $k$. We have not found an example exhibiting the quadratic growth estimated by Corollary 10.

3.5 Cross approximation for functions

Let us consider the approximation of a function $f : [-1,1]\times[-1,1] \to \mathbb{R}$ by a sum of separable functions:

$$f(x,y) \approx \sum_{m=1}^{k} g_m(x)\,h_m(y).$$

In the context of cross approximation, the factors are restricted to functions of the form $g_m = f(\cdot\,,y_m)$ and $h_m = f(x_m,\cdot)$, where $x_m$ and $y_m$ are fixed elements of $[-1,1]$. In particular, Micchelli and Pinkus [26] considered interpolating approximations of the following form:

$$f_k(x,y) := \begin{pmatrix} f(x,y_1) & \cdots & f(x,y_k) \end{pmatrix} \begin{pmatrix} f(x_1,y_1) & \cdots & f(x_1,y_k) \\ \vdots & & \vdots \\ f(x_k,y_1) & \cdots & f(x_k,y_k) \end{pmatrix}^{-1} \begin{pmatrix} f(x_1,y) \\ \vdots \\ f(x_k,y) \end{pmatrix}$$

for some points $x_1,\dots,x_k$ and $y_1,\dots,y_k$ in $[-1,1]$. Townsend and Trefethen [35] use a strategy for choosing the interpolation points which is basically equivalent to Algorithm 1, and they prove a convergence result under some analyticity hypotheses on the function $f$. There also exist error analyses for cross approximation of functions when using different pivoting strategies, see, e.g., [5, 32].

Algorithm 2 summarizes cross approximation of functions with complete pivoting.

0:  Input: $f : [-1,1]\times[-1,1] \to \mathbb{R}$ and number of steps $k$
1:  $e_0 := f$
2:  $f_0 := 0$
3:  $(x_1, y_1) := \arg\max_{(x,y)} |e_0(x,y)|$
4:  for $m = 1, 2, \dots, k$ do
5:     $e_m := e_{m-1} - e_{m-1}(\cdot\,,y_m)\,e_{m-1}(x_m,y_m)^{-1}\,e_{m-1}(x_m,\cdot)$
6:     $f_m := f_{m-1} + e_{m-1}(\cdot\,,y_m)\,e_{m-1}(x_m,y_m)^{-1}\,e_{m-1}(x_m,\cdot)$
7:     $(x_{m+1}, y_{m+1}) := \arg\max_{(x,y)} |e_m(x,y)|$
8:  end for
Algorithm 2 Cross approximation of functions [34, Figure 2.1]
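A simple way to experiment with Algorithm 2 is to replace the continuous pivot search by a search over a fine tensor grid, as in the following illustrative Python sketch (grid size, test function, and names are assumptions, not part of the algorithm as stated above).

```python
import numpy as np

def function_cross_approx(f, k, n_grid=400):
    """Cross approximation of f on [-1,1]^2 with the pivot search on a tensor grid."""
    x = np.linspace(-1.0, 1.0, n_grid)
    E = f(x[:, None], x[None, :])            # samples of the current error function e_m
    xs, ys = [], []
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        xs.append(x[i]); ys.append(x[j])
        E -= np.outer(E[:, j], E[i, :]) / E[i, j]   # update of the error function
    return xs, ys, E                          # E contains samples of e_k

f = lambda x, y: 1.0 / (3.0 + x + y)          # a smooth example function
xs, ys, E = function_cross_approx(f, 8)
print(np.max(np.abs(E)))                      # uniform error of f_8 on the grid
```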

We now explain the connection to Algorithm 1. Fix $k$ and consider the points $x_1,\dots,x_{k+1}$ and $y_1,\dots,y_{k+1}$ obtained by the first $k+1$ steps of Algorithm 2. Consider what happens when applying Algorithm 1 to the $(k+1)\times(k+1)$ matrix obtained by evaluating $f$ in the points mentioned above:

$$F := \bigl(f(x_i,y_j)\bigr)_{i,j=1}^{k+1}.$$

The first chosen pivot will be $(1,1)$ because $|f(x_1,y_1)|$ is the largest entry of the matrix. Now observe that the Schur complement obtained after the first step is the matrix obtained by evaluating the error function $e_1$ in the points $x_2,\dots,x_{k+1}$ and $y_2,\dots,y_{k+1}$. At this point, the second pivot chosen by Algorithm 1 will be $(2,2)$ because of how Algorithm 2 chose $(x_2,y_2)$ in line 7. After $k$ steps of Algorithm 1 we will be left with only one nonzero entry, in position $(k+1,k+1)$, and this will be $e_k(x_{k+1},y_{k+1})$. This allows us to estimate $\|e_k\|_\infty = |e_k(x_{k+1},y_{k+1})|$ via Theorem 7:

(15)

The last thing we need is an estimate on $a_{k+1}(F)$ that is uniform in the choice of the points. This will follow from analyticity assumptions on the functions $f(\cdot\,,y)$ for $y \in [-1,1]$.

Definition 12.

The Bernstein ellipse of radius $\rho > 1$ is the ellipse with foci in $-1$ and $1$ and with sum of the semi-axes equal to $\rho$.

Corollary 13.

Let $f$ be such that $f(\cdot\,,y)$ admits an analytic extension, which we again denote by $f(\cdot\,,y)$, in the Bernstein ellipse of radius $\rho$ for each $y \in [-1,1]$, and define

After $k$ steps of Algorithm 2 the error function $e_k := f - f_k$ satisfies

Proof.

Fix $y_1,\dots,y_{k+1}$ and let $g$ be the vector-valued function defined by

$$g(x) := \bigl(f(x,y_1),\ \dots,\ f(x,y_{k+1})\bigr)^{T}, \qquad x \in [-1,1].$$

The analyticity hypothesis allows us to apply standard polynomial approximation results (see, e.g., Corollary 2.2 in [25]) and conclude that there exists an approximation given by

$$g_k(x) := \sum_{m=1}^{k} c_m\,p_m(x), \qquad (16)$$

where $c_1,\dots,c_k$ are constant vectors and $p_1,\dots,p_k$ are polynomials, such that

for any . We can clearly bound .

The matrix $F$ is obtained by sampling $g$ in the points $x_1,\dots,x_{k+1}$, i.e.,

$$F = \bigl(g(x_1)\ \ g(x_2)\ \ \cdots\ \ g(x_{k+1})\bigr)^{T}.$$

Let us define, analogously,

$$F_k = \bigl(g_k(x_1)\ \ g_k(x_2)\ \ \cdots\ \ g_k(x_{k+1})\bigr)^{T}.$$

Notice that $F_k$ has rank at most $k$ because, by (16), each of the columns of $F_k$ is a linear combination of the $k$ vectors $\bigl(p_m(x_1),\dots,p_m(x_{k+1})\bigr)^{T}$, $m = 1,\dots,k$, so

$$a_{k+1}(F) \le \|F - F_k\|_{\max} \le \max_{x\in[-1,1]} \bigl\|g(x) - g_k(x)\bigr\|_{\infty}.$$

The result then follows from Equation (15). ∎

To get convergence of the error function to zero as , in Corollary 13 it is sufficient that the function admits an analytic extension to the Bernstein ellipse with for each , because the factor has subexponential growth. Our result compares favorably to Theorem 8.1 in [35], which requires an analytic extension to the region consisting of all points at a distance from .

Figure 1 compares the two domains and it is evident that the requirement from [35] is significantly more restrictive.
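As a quick illustration of the convergence behaviour discussed above, one can monitor the grid error of the grid-based sketch of Algorithm 2 from Section 3.5 for increasing $k$ on a function that is analytic in a neighbourhood of $[-1,1]^2$; the test function below is an arbitrary choice.

```python
import numpy as np

def grid_cross_approx(f, k, n_grid=400):
    """Same grid-based sketch of Algorithm 2 as in Section 3.5 (illustrative)."""
    x = np.linspace(-1.0, 1.0, n_grid)
    E = f(x[:, None], x[None, :])
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        E -= np.outer(E[:, j], E[i, :]) / E[i, j]
    return E

f = lambda x, y: np.exp(x * y) / (3.0 + x + y)   # analytic near [-1,1]^2
for k in (2, 4, 6, 8, 10):
    print(k, np.max(np.abs(grid_cross_approx(f, k))))   # the uniform grid error decays rapidly
```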