Rank-revealing factorizations have been around for a long time, including [BG65], which introduced the world to QR with pivoting to solve least-squares problems. Since then, many other algorithms have been proposed, among which we mention [CI94], [PT99]; for a more complete list the reader can refer to [GE96]. While they all perform very well most of them time, the only one that stably produces a strong rank-revealing factorization (in the [GE96] sense, defined in the next section) in an arithmetic complexity comparable to QR belongs to Ming Gu and Stanley Eisenstat [GE96].
Recently, the idea of using randomized algorithms for rank approximation (or more generally for low-rank approximations of matrices) has received a lot of attention due to the applications in signal processing and information technology, for example [LSN09] and [Mar14]. For a good overview of the types of algorithms involved, see [HMT11].
We provide here an analysis of the (“Randomized URV”) factorization, or RURV, which will allow us to prove that it has the following three properties.
It is communication-optimal. It uses only QR and and matrix multiplication, and thus both its arithmetic complexity and its communication complexity are asymptotically the same as QR and matrix multiplication.
If the information desired is related to the invariant subspaces, it can be applied to a product of matrices and inverses of matrices without the need to explicitly calculate any products or inverses.
To place these three properties in context, we compare with recent trends in the randomized numerical linear algebra literature. In work focused on sketching, such as the approaches in the overview [HMT11], it is customary to make the assumption that the rank is small, generally much smaller than the size of the matrix. In such cases, the speedups achieved by the algorithms in [HMT11] and others like them over QR
is significant, both in terms of arithmetic complexity and other features, like parallelization; naturally, the results can be achieved only with (arbitrarily) high probability. The downside is that this literature largely focuses on taking advantage of matrices with low numerical rank, and quickly producing low-rank approximations of such matrices. Such developments are insufficient for effective use as a numerical subroutine within the communication optimal generalized eigenvalue algorithm, which at minimum require parts of the first and third properties mentioned above.
Other recent approaches using randomization include [DG17] and [MQOHvdG17]. These works recognize that QR with column pivoting tends to have good rank-revealing properties, but that column pivoting induces extra communication. They build on this realization by using randomization to guide pivot selection during the QR algorithm and introduce block pivoting strategies. However, in contrast to our work and earlier work such as [GE96], theoretical bounds are not provided. We therefore emphasize that our RURV is communication optimal while maintaining the key theoretical properties of strong rank-revealing QR-factorizations. Moreover, RURV is conceptually simple, as it depends only on the existence of communication-avoiding QR algorithms, which were popularized in [DGHL12].
A subset of the authors introduced RURV in [DDH07], for the purpose of using it as a building block for a divide-and-conquer eigenvalue computation algorithm whose arithmetic complexity was shown to be the same as that of matrix multiplication. The analysis of RURV performed at the time was not optimal, and the authors of [DDH07] had not realized that RURV has the third property listed above. This property makes RURV unique among rank-revealing factorizations, as far as we can tell, and it is crucial in the complexity analysis of the aforementioned divide-and-conquer algorithm for nonsymmetric eigenvalue computations [BDD11].
The rest of the paper is structured as follows: in Section 2 we give the necessary definitions and the algorithms we will use, and Section 3 proves some necessary probability results; Section 4 deals with the analysis of RURV, the short Section 5 generalizes the algorithm to work for a product of matrices and inverses, and Section 6 presents some numerical experiments validating the correctness and tightness of the results of Sections 4 and 5.
2 Randomized Rank-Revealing Decompositions
Let be an
matrix with singular values, and assume that there is a “gap” in the singular values at level , that is, .
Informally speaking, a decomposition of the form is called rank revealing if the following conditions are fulfilled:
and are orthogonal/unitary and is upper triangular, with and ;
is a “good” approximation to (at most a factor of a low-degree polynomial in away from it),
is a “good” approximation to (at most a factor of a low-degree polynomial in away from it);
In addition, if is small (at most a low-degree polynomial in ), then the rank-revealing factorization is called strong (as per [GE96]).
, condition estimation[Bis90], etc., as well as in divide-and-conquer algorithms for eigenproblems. For a good survey paper, we recommend [GE96].
In the paper [DDH07], the authors proposed a randomized rank revealing factorization algorithm RURV. Given a matrix , the routine computes a decomposition with the property that is a rank-revealing matrix; the way it does it is by “scrambling” the columns of
via right multiplication by a uniformly random orthogonal (or unitary) matrix
(the uniform distribution over the manifold of unitary/orthogonal matrices is known as Haar). One way to obtain a Haar-distributed random matrix is to start from a matrix of independent, identically distributed normal variables of mean
and variance(denoted, here and throughout the paper, by ), and to perform the QR algorithm on it. The orthogonal/unitary matrix obtained through this procedure is Haar distributed.
It is worth noting that there are in the literature other ways of obtaining Haar-distributed matrices, and some involve using fewer random bits and/or fewer arithmetic operations; we have chosen to use this one because it is simple and communication-optimal (as it only involves one QR operation, which can be performed optimally from a communication perspective both sequentially and in parallel [DGHL12, BDG14, BDG18]. On a practical side, we note that using less randomness would not incur any significant overall savings, as the total arithmetic cost is much higher than the cost of generating
normal random variables.
Performing QR on the resulting matrix yields two matrices, (orthogonal or unitary) and (upper triangular), and it is immediate to check that .
We also define the routine RULV, nearly identical to RURV, which performs the same kind of computation (and obtains a rank revealing decomposition of ), but uses QL instead of QR, and thus obtains a lower triangular matrix in the middle, rather than an upper triangular one. red(Note one can think of the decomposition RULV as being the transpose of the decomposition RURV performed on .)
Given RURV and RULV, we now can give a method to find a randomized rank-revealing factorization for a product of matrices and inverses of matrices, without actually computing any of the inverses or matrix products. This is a very interesting and useful procedure in itself, and at the same time it is crucial in the analysis of a communication-optimal Divide-and-Conquer algorithm for the non-symmetric eigenvalue problem presented in [BDD11].
Suppose we wish to stably find a randomized rank-revealing factorization for the matrix , where are given matrices, and , without actually computing or any of the inverses.
Essentially, the method performs RURV or, depending on the power, RULV, on the last matrix of the product, and then uses a series of QR/RQ to “propagate” an orthogonal/unitary matrix to the front of the product, while computing factor matrices from which (if desired) the upper triangular matrix can be obtained. A similar idea was explored by G.W. Stewart in [Ste95] to perform graded QR; although it was suggested that such techniques can be also applied to algorithms like URV, no randomization was used.
The algorithm is presented in pseudocode below. For the proof of correctness, see Lemma 5.1.
3 Smallest singular value bounds
The estimates for our main theorem are based on the following result, a more general case of which can be found in [Dum12]; in particular, the following is a consequence of Theorem 3.2 and Lemma 3.5.
Let be a random variable denoting the smallest singular value of an corner of an real Haar matrix.
This Proposition allows us to estimate very closely the probability that is small. In particular, the correct scaling for the asymptotics of under andor was proved in [Dum12] to be (that is, almost surely), which means that the kind of upper bounds one should search for are of the form “” for some constant . This constant will depend on how confident we want to be that the inequality holds; if we wish to say that the inequality fails with probability , then we will have as a function of .
Let , ; then the probability that is .
What we essentially need to do here is find an upper bound on which, when integrated over small intervals next to , yields the bound in the Lemma.
We will first upper bound the term in the expression of by .
Secondly, we note that the hypergeometric function has all positive arguments, and hence from its definition, it is monotonically decreasing from to , and so we bound it by its value at . As per [AS70, Formula 15.1.20],
and after some obvious cancellation we obtain that
The following expansion can be derived from Stirling’s formula and is given as a particular case of [AS70, Formula 6.1.47] (with ):
as real and . In particular, means that .
Provided that , we thus have
Note that this last inequality allows us to conclude the following:
As an immediate corollary to Lemma 3.3 we obtain the following result, which is what we will actually use in our calculations.
Let , . Then
4 Analysis for RURV
4.1 Bounding the probability of failure for RURV
It was proven in [DDH07] that, with high probability, RURV computes a good rank revealing decomposition of in the case of real. Specifically, the quality of the rank-revealing decomposition depends on computing the asymptotics of , the smallest singular value of an submatrix of a Haar-distributed orthogonal matrix. All the results of [DDH07] can be extended verbatim to Haar-distributed unitary matrices; however, the analysis employed in [DDH07] is not optimal. Using the bounds obtained for in the previous section, we can improve them here.
We will tighten the argument to obtain one of the upper bounds for . In addition, the result of [DDH07] states only that RURV is, with high probability, a rank-revealing factorization. Here we strengthen these results to argue that it is actually a strong rank-revealing factorization (as defined in the Introduction), since with high probability will be small.
In proving Theorem 4.3, we require two lemmas that we state here and prove afterwards.
Let be an matrix whose SVD is , and with singular values .
Let be the matrix produced by the RURV algorithm on , in exact arithmetic, so that . Then defining ,
where is the upper-left submatrix.
Carrying over the notation of Lemma 4.1,
We are ready for the main result.
Let be an matrix with singular values . Let . Let be the matrix produced by the RURV algorithm on , in exact arithmetic. Assume that .
The factor in the equations (3), (4), matches the best deterministic algorithms up to a constant. When the gap is large enough so that is with some small constant, so that the additional hypothesis applies, (6) also matches the best deterministic algorithms up to a constant. Even when the gap is small, (5) shows the factorization is strong with high probability.
There are two cases of the problem, and . Let be the Haar matrix used by the algorithm. From [GVL12, Theorem 2.4-1], the singular values of when consist of ’s and the singular values of . Thus, the case reduces to the case .
The upper bound in inequality (3) and the lower bound in inequality (4) follow from the Cauchy interlace theorem (see [HJ90, Theorem 7.3.9]). The lower bound in inequality (3) follows immediately from [DDH07, Theorem 5.2] and Corollary 3.4. We provide proofs of the upper bounds of inequalities (4) and (5), below.
Theorem 5.2 from [DDH07] states that
provided that . This upper bound is lax, and we tighten it here. Note that Lemma 4.1 establishes that
where is Haar distributed and is its upper-left submatrix. We conclude with probability , by using Corollary 3.4. This completes the proof of (2).
where again is Haar distributed. We conclude
It remains to show the strengthened bound (6) on when . We use the following notation. Let
be the singular value decomposition of, where and . Let be the random unitary matrix in RURV, so that . Then has the same distribution as , by virtue of the fact that ’s distribution is uniform over unitary matrices.
where is and is . Then
Denote where is an matrix and is . Since is unitary, it is not hard to check that
where is the pseudoinverse of , i.e. .
There are two crucial facts that we need to check here: one is that actually exists, and the other is that the pseudoinverse (as defined above) is well-defined, that is, that is full rank. We start with the second one of these facts.
The matrix is full-rank with probability . This is true due to two reasons: the first one is that the first singular values of , ordered decreasingly on the diagonal of , are strictly positive. The second one is that is Haar distributed, and hence Lemma 3.3 shows that is invertible with probability 1. It follows that is well-defined.
To argue that exists, note that so rank()rank() as is unitary. Since is full-rank, it follows that is invertible.
Having made sure that the equation relating and is correct, we proceed to study the right hand side. From the definition of , we obtain that
We split this into two terms. Let be defined as follows:
where the last equality reflects the factoring out of to the left and of to the right inside the first parenthesis, followed by cancellation. Since is a submatrix of a unitary matrix, , and thus
where the last inequality follows from the fact that for a matrix with , . The right hand side has been obtained by applying norm inequalities and using the fact that . The assumption on can be rearranged to . Combine this with to get
We conclude that
We now apply similar reasoning to the second (remaining) term
to yield that
We return now for the proofs of the two lemmas.
Proof of Lemma 4.1.
Let . We begin by introducing a few block notations naturally suggested by the singular value gap, separating the first coordinates from the final coordinates:
Note that is the upper-triangular factor resulting from QR-factorization of . From this and understanding that the QR-factorization records the Gram-Schmidt orthogonalization process, we see that has the same singular values as , where is any matrix whose columns are a basis for the orthogonal complement of . In particular, .
It is also clear from that if contains elements on the diagonal, then the corresponding rows of are . Therefore we make the assumption that there are no singular values.
We next relate to in order to analyze this matrix. Using a common matrix identity in first equality and the orthogonality of in the third equality, we see that
Thus, we seek to bound .
We will present a coordinate-based bound. First, it is true for any invertible that . We set to be the orthogonal matrix
of right-singular vectors of. Thus,
and by the definition of ,
To keep notation as simple as possible, we denote and partition . Selecting any column from , observe that . Thus we can compute the needed projection by performing Gram-Schmidt on with respect to , which we observe to produce .
We are looking for an upper bound for the latter, so it suffices to bound the quantity away from . Note that if we can restrict the problem to the last coordinates and get a non-trivial lower bound, we have achieved our goal.
For simplicity, we denote the last coordinates of by , of by , and of by . The quantity is a least squares error, specifically,
and on the other hand, trivially,
To complete the proof, we bound the quantity away from in terms of the smallest singular value of the lower right submatrix of the original random matrix . Indeed,
However, also recall that is exactly the lower block of . As is unitary and applied on the right, we see . Finally, it is not difficult to check that the orthogonality of implies that . Therefore, in total, we have shown
Proof of Lemma 4.2.
Let be the upper-triangular result of , where is formed by swapping column of with column . This is equivalent to saying that is the upper-triangular result of , where swaps column with column . This point of view will be more helpful in the proof. From Lemma 3.1 of [GE96],
We are particularly interested in the case when and , as will become apparent below.
Since our bound is going to use the coordinate-based inequality (9), much as in Lemma 4.1, it is again useful to change coordinates to an optimal choice. One can check that , since (9) can be viewed as a generalization of Cramer’s Rule. Therefore for orthogonal matrices of appropriate sizes , ,
Choosing now and to be given by the SVD of in appropriate column order, we can ensure that the norm of is the lower right entry of .
Suppose now that we bounded the entries of in terms of a function of the matrix that is invariant under right multiplication of by a block-orthogonal matrix . Then the bound on the bottom right entry of would apply to the operator norm. Hence, using Equation (9), our the task is to bound
where has resulted from swapping column in with column in .
With the preliminaries over, we introduce notation to assist in the proof. Using Matlab notation, let , , , ; will denote the -th column of , will denote the last column of (also of ). We make use of one projection extensively, and therefore denote it by letter . Finally, projection onto the first coordinates is used, and we denote this by .
We are ready for the main part of the lemma’s proof. First upper bound . Let be the maximal right singular unit-vector of the operator . We have
To analyze this further, decompose where and are unit-vectors orthogonal to each other. We will have to control this term later in the proof. To this end, note that we have the following least squares interpretation:
and by making and taking advantage of the fact that is in the image of so that , we obtain
In the second inequality above we have used that .
Our main upper bound on is then
Due to the presence of in the bound above, we will need to consider two different cases depending on whether this term is small or large; as such, we will need to develop two different lower bounds on . We now build the the first lower bound on .