Refinement of Low Rank Approximation of a Matrix at Sub-linear Cost

by Victor Y. Pan et al.
City University of New York

Low rank approximation (LRA) of a matrix is a hot subject of modern computations. In applications to Big Data mining and analysis the input matrices are so immense that one must apply sub-linear cost algorithms, which only access and process a tiny fraction of the input entries. Under this restriction one cannot compute an accurate LRA of the worst case input matrix, or even of the matrices of some specific small families in our Appendix, but we have recently proved that one can do this with a high probability (whp) for a random input, which is in good accordance with our tests and with more than a decade of worldwide computation of LRA at sub-linear cost by means of Cross-Approximation algorithms. A natural challenge is to complement such computation of LRA with its refinement at sub-linear cost. We take that challenge and propose two algorithms for this task, together with some recipes for a posteriori estimation of the errors of LRA at sub-linear cost.




1 Introduction

(a) Accurate LRA at sub-linear cost: the problem and our recent progress.

Low rank approximation of a matrix is a fundamental subject of Numerical Linear and Multilinear Algebra and Data Mining and Analysis, with applications ranging from machine learning theory and neural networks to term document data and DNA SNP data (see the surveys [HMT11], [M11], and [KS16]). Matrices representing Big Data are usually so immense that realistically one can only access and process a tiny fraction of their entries, but quite typically these matrices admit LRA, that is, are close to low rank matrices,² with which one can operate by using sub-linear arithmetic time and memory space, that is, much fewer flops and memory cells than the matrix has entries.

²Here and throughout we use such concepts as "low", "small", "nearby", etc., as defined in context.

Such an LRA algorithm (running at sub-linear cost) fails on a worst case input and even on the small families of matrices of our Appendix A. In worldwide computational practice and in our extensive tests, however, some sub-linear cost algorithms consistently compute reasonably accurate LRA of matrices admitting LRA, and the papers [PLSZ16], [PLSZ17], [PLSZa], [PLSZb], and [LPSa] provide some formal support for this empirical observation.

In various applications of LRA it is highly important, or sometimes imperative, to obtain an LRA as close as possible, and then again we are challenged to achieve this at sub-linear cost. In this paper we take the challenge and propose two techniques for iterative refinement at sub-linear cost given a crude but reasonably close initial LRA.

Our first approach naturally extends the known methods for iterative refinement as a popular tool for the solution and least squares solution of a linear system of equations (see [S98, Sections 3.3.4 and 4.2.5], [H02, Chapter 12 and Section 20.5], [DHK06], [GL13, Sections 3.5.3 and 5.3.8], [B15, Sections 1.4.6, 2.1.4, and 2.3.7]), also known for other matrix computations, including nonlinear ones (see [S98, page 223 and the references on page 225]), but to the best of our knowledge, it has not been applied to LRA so far, possibly because computation of LRA at sub-linear cost has not been studied as much as it deserves.

Our second approach is dramatically different. It relies on recursive application of the randomized LRA algorithms of [DMM08], which perform random sampling directed by sampling probabilities called leverage scores. These scores are computed from the top right singular space of an input matrix, but their computation allows some leeway, and so one can expect that the scores computed for a given LRA of a matrix can direct the computation of an even closer LRA of that matrix.

All known algorithms for the computation of LRA by means of subspace sampling directed by leverage scores sample a large number of rows and columns of an input matrix, and so does our second algorithm for iterative refinement. This may restrict the practical value of these algorithms (see [TYUC17, Section 1.7.3]), although our Algorithm 3.1 provides a partial remedy at sub-linear cost.

In order to develop our refinement algorithms we supply some recipes for a posteriori error estimation of LRA at sub-linear cost; this can be of some independent value.

Our numerical tests are in good accordance with our formal study.

Our work seems to be a first step into a highly important but long-ignored research area, and we hope to motivate further efforts in this direction.

Organization of our paper. In the next section we recall some background material. In Section 3 we transform any LRA into its SVD at sub-linear cost. We propose some algorithms for iterative refinement of an LRA at sub-linear cost in Sections 4 and 6 and study the problem of error estimation for LRA at sub-linear cost in Section 5. We devote Section 7, the contribution of the second author, to numerical tests. In Appendix A we specify some small families of matrices on which every LRA algorithm fails if it runs at sub-linear cost, even though every matrix of these families admits LRA. In Appendix B we recall the known estimates for the output errors of LRA at sub-linear cost in a very special but important case where an input matrix is filled with i.i.d. values of a single random variable.

2 Some background for LRA

Definition 2.1.

(i) An $m\times n$ matrix $M$ has rank at most $r$, $r \le \min\{m, n\}$, if

$$M = AB \ \text{ for some matrices } A \in \mathbb{C}^{m\times r} \text{ and } B \in \mathbb{C}^{r\times n}. \quad (2.1)$$

(ii) Such a matrix $M$ has $\epsilon$-rank at most $r$ for a fixed tolerance $\epsilon \ge 0$ if $M$ admits its approximation within an error norm $\epsilon$ by a matrix of rank at most $r$, or equivalently if there exist three matrices $A$, $B$, and $E$ such that

$$M = AB + E, \ \ \|E\| \le \epsilon\,\|M\|, \ \ A \in \mathbb{C}^{m\times r}, \ \ B \in \mathbb{C}^{r\times n}. \quad (2.2)$$

Here and hereafter $\|\cdot\|$ unifies the definitions of the spectral norm $\|\cdot\|_2$ and the Frobenius norm $\|\cdot\|_F$.

The $0$-rank is the rank; the $\epsilon$-rank of a matrix $M$ for a small tolerance $\epsilon$ is said to be its numerical rank, hereafter denoted $\operatorname{nrank}(M)$. A matrix admits its close approximation by a matrix of rank at most $r$ if and only if it has numerical rank at most $r$.

A 2-factor LRA $AB$ of $M$ of (2.2) can be generalized to a 3-factor LRA

$$M = XTY + E, \ \ \|E\| \le \epsilon\,\|M\|, \ \ X \in \mathbb{C}^{m\times k}, \ \ T \in \mathbb{C}^{k\times l}, \ \ Y \in \mathbb{C}^{l\times n} \quad (2.3)$$

for $r \le k \le m$ and $r \le l \le n$.

Remark 2.1.

The pairs of the maps $XT \rightarrow A$ and $Y \rightarrow B$ as well as $X \rightarrow A$ and $TY \rightarrow B$ turn a 3-factor LRA of (2.3) into a 2-factor LRA of (2.2).

An important 3-factor LRA of $M$ is its $r$-top SVD $M_r = U_r \Sigma_r V_r^*$, where $\Sigma_r$ is the diagonal matrix of the $r$ top (largest) singular values of $M$ and $U_r$ and $V_r$ are the unitary (orthogonal) matrices of the associated top left and right singular vectors of $M$, respectively. $M_r$ is said to be the $r$-truncation of $M$, obtained from $M$ by setting to zero all its singular values but the $r$ largest ones.

The $r$-top SVD defines an optimal 3-factor LRA of $M$ under both spectral and Frobenius norms.

Theorem 2.1.

[GL13, Theorem 2.4.8]. Write $\tilde\tau_{r+1}(M) := \min_{N:\,\operatorname{rank}(N)\le r} \|M - N\|$. Then $\tilde\tau_{r+1}(M) = \|M - M_r\|$, which is equal to $\sigma_{r+1}(M)$ under the spectral norm and to $\big(\sum_{j > r} \sigma_j^2(M)\big)^{1/2}$ under the Frobenius norm.

Lemma 2.1.

[GL13, Corollary 8.6.2]. For $m \ge n$ and a pair of $m\times n$ matrices $M$ and $M + E$ it holds that $|\sigma_j(M + E) - \sigma_j(M)| \le \|E\|_2$ for $j = 1, \dots, n$.

Lemma 2.2.

[The norm of the pseudo inverse of a matrix product.] Suppose that $M = AB$ and the matrices $A \in \mathbb{C}^{m\times r}$ and $B \in \mathbb{C}^{r\times n}$ have full rank $r \le \min\{m, n\}$. Then $\|M^+\| \le \|A^+\|\,\|B^+\|$.

3 Transform of LRA into its SVD at sub-linear cost

Algorithm 3.1.

[Transform of any LRA into its SVD.]


Input: three matrices $A \in \mathbb{C}^{m\times k}$, $T \in \mathbb{C}^{k\times l}$, and $B \in \mathbb{C}^{l\times n}$.

Output: $\rho = \operatorname{rank}(ATB)$ and three matrices $U \in \mathbb{C}^{m\times\rho}$ (orthogonal), $\Sigma \in \mathbb{R}^{\rho\times\rho}$ (diagonal), and $V \in \mathbb{C}^{n\times\rho}$ (orthogonal) such that $ATB = U \Sigma V^*$.

  1. Compute the matrices $Q_A$, $R_A$, $Q_B$, and $R_B$ being Q and R factors in the thin QR factorizations $A = Q_A R_A$ and $B^* = Q_B R_B$ or (in order to strengthen numerical stability) being Q and R factors in rank-revealing QR factorizations of $A$ and $B^*$.

  2. Compute the $k\times l$ matrix $G = R_A T R_B^*$.

  3. Compute its full SVD $G = U_G \hat\Sigma V_G^*$, where $\hat\Sigma = \operatorname{diag}(\Sigma, O)$ and $\Sigma$ is a $\rho\times\rho$ nonsingular diagonal matrix for $\rho = \operatorname{rank}(G)$.

  4. Compute the matrices $U$ and $V$ by keeping just the first $\rho$ columns of the matrices $Q_A U_G$ and $Q_B V_G$, respectively, and deleting the other columns. Output $\rho$ and the matrices $U$, $\Sigma$, and $V$.

The algorithm runs at sub-linear cost if $k^2 \ll n$ and $l^2 \ll m$, and its correctness is readily verified.
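For illustration, the four steps above can be sketched in NumPy; this is a minimal version using thin QR only (without the rank-revealing option), and the function name and the rank tolerance are our own choices:

```python
import numpy as np

def lra_to_svd(A, T, B):
    """Transform a 3-factor LRA A @ T @ B into its SVD U @ diag(s) @ Vh."""
    Qa, Ra = np.linalg.qr(A)            # step 1: thin QR of A
    Qb, Rb = np.linalg.qr(B.conj().T)   # step 1: thin QR of B*
    G = Ra @ T @ Rb.conj().T            # step 2: small k-by-l core matrix
    Ug, s, Vgh = np.linalg.svd(G)       # step 3: full SVD of the core
    # numerical rank of G (the tolerance below is an illustrative choice)
    tol = (s[0] * max(G.shape) * np.finfo(float).eps) if s.size else 0.0
    rho = int(np.count_nonzero(s > tol))
    U = Qa @ Ug[:, :rho]                # step 4: keep the first rho columns
    V = Qb @ Vgh[:rho, :].conj().T
    return rho, U, s[:rho], V.conj().T
```

Only the small core matrix $G$ is decomposed by a dense SVD, which is where the sub-linear cost claim rests when $k$ and $l$ are small.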

The transition from an LRA $\tilde M = ATB$ to its $r$-truncation $\tilde M_r$ decreases the rank of the LRA to at most $r$ at the price of only a mild increase of the error norm bound. Indeed the triangle inequality implies that

$$\|M - \tilde M_r\| \le \|M - \tilde M\| + \|\tilde M - \tilde M_r\| \le 2\,\|M - \tilde M\| + \|M - M_r\| \quad (3.1)$$

for any matrix $M$ (cf. [TYUC17, Proposition 6.1]).

By using Remark 2.1 we can readily transform any 3-factor LRA into a 2-factor LRA; then by applying Algorithm 3.1 we can transform it further to its SVD.

We can slightly simplify the transformation of a 3-factor LRA $XTY$ into a 2-factor LRA as follows:

(i) compute the SVD $T = U_T \hat\Sigma V_T^*$, where $\hat\Sigma = \operatorname{diag}(\Sigma_T, O)$ and $\Sigma_T$ is a nonsingular $\rho\times\rho$ diagonal matrix for $\rho = \operatorname{rank}(T)$,

(ii) obtain two matrices $\bar U$ and $\bar V$ by keeping the first $\rho$ columns of the matrices $U_T$ and $V_T$, respectively, and deleting the other columns, and

(iii) finally apply the recipes of Remark 2.1 to the 3-factor LRA $(X \bar U)\, \Sigma_T\, (\bar V^* Y)$.

4 Deterministic iterative refinement of an LRA at sub-linear cost

Next we propose an algorithm for iterative refinement of a sufficiently close LRA of $M$ (see Section 6 for an alternative algorithm). At the $i$th step we try to improve the current LRA by applying a fixed LRA algorithm to the current error matrix $E_{i-1} = M - \tilde M_{i-1}$ or to its modification (see Remark 4.3). At every iteration the rank of the new tentative LRA is at least doubled, but we periodically cut it back to the value $r$ (see Remark 4.2).

Algorithm 4.1.

(Iterative refinement of a CUR LRA at sub-linear cost. See Remarks 4.2 and 4.3.)


Input: three integers $m$, $n$, and $r$ such that $0 < r < \min\{m, n\}$, an $m\times n$ matrix $M$ of numerical rank $r$, a Sub-algorithm APPROX($r$), which for a fixed positive integer $r$ computes a rank-$r$ approximation of its input matrix at sub-linear cost, and a Stopping Criterion, which signals when the current candidate LRA is expected to be satisfactory (see the next section).

Initialization: $\tilde M_0 = O$, the $m\times n$ null matrix.

Recursively for $i = 1, 2, \dots$ do:

  1. Apply Sub-algorithm APPROX($r$) to the matrix $E_{i-1} = M - \tilde M_{i-1}$.

  2. Let $\Delta_i$ denote its output matrix of rank at most $r$. Compute a new approximation $\tilde M_i = \tilde M_{i-1} + \Delta_i$ of $M$ and the error matrix $E_i = M - \tilde M_i$.

  3. Replace $i$ by $i + 1$ and repeat stages 1 and 2 until either $i$ exceeds the allowed limit, and then stop and output FAILURE, or the Stopping Criterion is satisfied for some integer $i$, and then stop and output the matrix $\tilde M_i$.
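The refinement loop can be sketched as follows. Here the sub-algorithm APPROX($r$) is a stand-in based on the $r$-truncated SVD of the residual, chosen only to keep the example self-contained (it does not run at sub-linear cost), and the stopping criterion is a hypothetical relative-error threshold:

```python
import numpy as np

def approx_r(E, r):
    """Stand-in for APPROX(r): the r-truncated SVD of E (not sub-linear cost)."""
    U, s, Vh = np.linalg.svd(E, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

def refine_lra(M, r, tol=1e-8, max_iter=20):
    """Iterative refinement: add a rank-r approximation of the residual per step."""
    M_tilde = np.zeros_like(M)          # initialization: the null matrix
    for _ in range(max_iter):
        E = M - M_tilde                 # current error matrix
        if np.linalg.norm(E) <= tol * np.linalg.norm(M):
            return M_tilde              # stopping criterion satisfied
        M_tilde = M_tilde + approx_r(E, r)
    return M_tilde                      # iteration limit reached
```

In the paper's setting APPROX($r$) would be a sub-linear cost LRA algorithm, and the computed LRA would be kept in factored form and periodically compressed as in Remark 4.2.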

Remark 4.1.

Progress in refinement. Write $E_i = M - \tilde M_i$ for all $i$ and observe that $E_i = E_{i-1} - \Delta_i$, and so $\|E_i\| < \|E_{i-1}\|$ if $\Delta_i$ approximates $E_{i-1}$ more closely than the matrix filled with 0s does.

Remark 4.2.

Management of the rank growth. The bound on the rank of the matrices $\tilde M_i$ is doubled at every iteration of Algorithm 4.1; by allowing its further increase we can obtain a more accurate LRA. Fast growth of the rank, however, implies fast growth of the complexity of an iteration, and so we should periodically (possibly for every $i$) compress the computed LRA into its $r$-truncation by applying Algorithm 3.1. The error of the LRA would grow in compression within bound (3.1), that is, less significantly if the LRA is close to an input matrix $M$.

Remark 4.3.

Management of the precision of computing. As usual in iterative refinement, we should apply the mixed precision technique, that is, perform the subtraction at stage 2 with a higher precision than stage 1. Furthermore we can replace the summation $\tilde M_{i-1} + \Delta_i$ by the summation of more than two terms, by representing the matrix $\tilde M_{i-1}$ by the pair of matrices $\tilde M_{i-2}$ and $\Delta_{i-1}$ and possibly similarly representing the matrix $\tilde M_{i-2}$. The basic observation is that we need a lower precision in order to represent each term than their sums.

5 Heuristic a posteriori error estimation for LRA at sub-linear cost

Accurate a posteriori error estimation is impossible at sub-linear cost even for the small input families of Appendix A, but next we discuss some heuristic recipes for this task.

(i) Clearly the absolute value of every entry of the LRA error matrix $E = M - AB$ is a lower bound on the norm of this matrix. One can compute $s$ entries of $E$ by using at most $2rs$ arithmetic operations; this cost is sub-linear if $rs \ll mn$. This deterministic lower bound on the LRA error norm also implies its a posteriori randomized upper bounds if, say, the error matrix is filled with i.i.d. values of a single random variable and has sufficiently many entries, e.g., 100 entries or more (see Appendix B).

(ii) By generalizing this technique we obtain deterministic lower bounds on the error norm from the norms $\|XE\|$, $\|EY\|$, or $\|XEY\|$ for any pair of matrices $X \in \mathbb{C}^{p\times m}$ and $Y \in \mathbb{C}^{n\times q}$ with $\|X\| \le 1$ and $\|Y\| \le 1$. The computation also defines randomized upper bounds on the error norm for random matrices $X$ and/or $Y$ and sufficiently large integers $p$ and/or $q$ (see [F79]); it runs at sub-linear cost if $p \ll m$ and $q \ll n$ and if the matrices $X$ and $Y$ are sufficiently sparse, e.g., are sub-permutation matrices.
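A minimal sketch of recipe (i), assuming a 2-factor LRA $AB$ of $M$: it samples $s$ entries of the error matrix $E = M - AB$ without ever forming $E$, and returns the largest sampled magnitude (a deterministic lower bound on the error norm) together with a randomized Frobenius-norm estimate obtained by rescaling the sampled mean square; the function name and the default sample size are our own choices:

```python
import numpy as np

def sampled_error_bounds(M, A, B, s=100, seed=0):
    """Sample s entries of E = M - A @ B at cost O(r*s), sub-linear in m*n."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    I = rng.integers(0, m, size=s)
    J = rng.integers(0, n, size=s)
    e = np.array([M[i, j] - A[i, :] @ B[:, j] for i, j in zip(I, J)])
    lower = np.abs(e).max()                    # deterministic lower bound on ||E||
    frob_est = np.sqrt(m * n * np.mean(e**2))  # randomized Frobenius estimate
    return lower, frob_est
```

The rescaled estimate is only a heuristic unless the entries of $E$ behave like i.i.d. samples, which is the assumption revisited in Appendix B.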

(iii) Suppose that we have computed a set of LRAs $\tilde M^{(1)}, \dots, \tilde M^{(h)}$ for a matrix $M$ by applying to it various LRA algorithms running at sub-linear cost or the same algorithm with distinct parameters, e.g., the algorithm of [TYUC17] with distinct sparse multipliers. Furthermore suppose that we have computed a median $\tilde M^{(g)}$ for some $g$ in the range $1 \le g \le h$. Then we can consider this median a heuristic LRA and the norms $\|\tilde M^{(j)} - \tilde M^{(g)}\|$ for $j = 1, \dots, h$ heuristic error bounds.

For heuristic computation of a median at sub-linear cost one may select the median index $g$ for the products $u^* \tilde M^{(j)} v$ over all subscripts $j$ in the range $1 \le j \le h$ for a pair of fixed common vectors $u$ and $v$. For a more dependable selection one may first compute such medians for a number of pairs of vectors $u$ and $v$ and then select a median of these medians.
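The median-of-medians selection above can be sketched as follows, for candidates given as 2-factor LRAs $(A_j, B_j)$; each product $u^* A_j B_j v$ costs $O((m+n)r)$ flops, which is sub-linear (the function name and the number of trials are our own choices):

```python
import numpy as np

def median_lra_index(lras, trials=5, seed=0):
    """Pick the candidate whose bilinear form u @ (A @ (B @ v)) is the median,
    then return a median of these medians over several (u, v) pairs."""
    rng = np.random.default_rng(seed)
    m = lras[0][0].shape[0]
    n = lras[0][1].shape[1]
    picks = []
    for _ in range(trials):
        u = rng.standard_normal(m)
        v = rng.standard_normal(n)
        vals = [u @ (A @ (B @ v)) for A, B in lras]  # sub-linear per candidate
        picks.append(int(np.argsort(vals)[len(vals) // 2]))
    return int(np.median(picks))
```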

(iv) In Appendix B we recall the well-known techniques for a posteriori error estimation at sub-linear cost for LRA of a matrix filled with i.i.d. values of a single variable.

All these recipes for a posteriori error estimation at sub-linear cost apply to any LRA. In [PLSZa] and [PLSZb] we deduce such estimates for the LRA output by some specific sub-linear cost algorithms. We refer the reader to [B00], [BR03], and [BG06] for such recipes in the case of Adaptive Cross-Approximation iterations for LRA.

6 Randomized iterative refinement of LRA at sub-linear cost by means of refinement of leverage scores

Given a crude but reasonably close LRA of a matrix $M$, we can try to refine it at sub-linear cost by extending the random subspace sampling algorithms of [DMM08] by Drineas et al. and of various subsequent papers (see [KS16], [BW17], and the references therein). These algorithms compute a CUR LRA of a matrix at sub-linear cost if its top SVD is available. Namely, given the top SVD, the algorithms compute at sub-linear cost the so called leverage scores, which serve as sampling probabilities that define random subspace sampling at sub-linear cost.

Given the $r$-top SVD $M_r = U_r \Sigma_r V_r^*$, with $V_r = (v_{j,i})_{j,i=1}^{n,r}$, one can fix any real $\beta$, $0 < \beta \le 1$, and then define the SVD-based leverage scores $p_1, \dots, p_n$ satisfying

$$p_j \ge \frac{\beta}{r}\sum_{i=1}^{r} v_{j,i}^2 \ \text{ for } j = 1, \dots, n, \quad \sum_{j=1}^{n} p_j = 1. \quad (6.1)$$

(6.1) turns into the equations $p_j = \frac{1}{r}\sum_{i=1}^{r} v_{j,i}^2$ for all $j$ in the case where $\beta = 1$, but for $\beta < 1$ we obtain leverage scores that we can reuse in a small neighborhood of an input matrix.
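One admissible way to produce scores satisfying (6.1) is to mix the exact SVD-based scores (the $\beta = 1$ case) with the uniform distribution; a sketch, with `Vr` denoting the $n\times r$ right factor of the $r$-top SVD (the mixing rule is our own illustrative choice, not prescribed by the paper):

```python
import numpy as np

def leverage_score_probabilities(Vr, beta=1.0):
    """Sampling probabilities p_j >= (beta/r) * sum_i Vr[j, i]**2, sum p_j = 1."""
    n, r = Vr.shape
    scores = np.sum(Vr**2, axis=1) / r       # exact leverage scores; they sum to 1
    return beta * scores + (1.0 - beta) / n  # mix with uniform for beta < 1
```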

Remark 6.1.

The leverage scores are defined by the top right singular space of $M$ but do not depend on the choice of its orthogonal basis; they stay invariant if the basis matrix $V_r$ is multiplied by an $r\times r$ orthogonal matrix. By virtue of Theorem 6.1 below, they change little in a small perturbation of that space.

We next recall a bound on the perturbation of the singular spaces of the $r$-truncation of a matrix $M$ in its transition to those of a matrix $M + E$. The bound is stated in terms of the minimal perturbation of unitary or orthogonal bases for the transition from the singular spaces of $M$ to those of $M + E$. Compare similar results by Davis and Kahan (1970) and Wedin (1972 and 1973) involving angles between singular spaces.

Theorem 6.1.

(The impact of a perturbation of a matrix on its top singular spaces. [GL13, Theorem 8.6.5].)³

Suppose that

$$\|E\|_F \le \frac{\sigma_r(M) - \sigma_{r+1}(M)}{5}.$$

Then there exist unitary or orthogonal matrix bases⁴ $U_r$, $V_r$, $\tilde U_r$, and $\tilde V_r$ for the left and right top singular spaces (associated with the $r$ largest singular values) of the matrices $M$ and $M + E$, respectively, such that

$$\max\{\|\tilde U_r - U_r\|_F,\ \|\tilde V_r - V_r\|_F\} \le \frac{4\,\|E\|_F}{\sigma_r(M) - \sigma_{r+1}(M)}.$$

For example, if $\|E\|_F = 0.1\,(\sigma_r(M) - \sigma_{r+1}(M))$, which implies that the assumption above holds, then the upper bound on the right-hand side is 0.4.

³[GL13, Theorem 8.6.5] cites [S73, Theorem 6.4], which claims a slightly more optimistic bound.

⁴Recall that such bases are not unique even where all singular values are distinct.

The above observations and estimates combined lead us to a heuristic recipe for iterative refinement of LRA by means of updating the leverage scores of an input matrix.

Namely suppose that an LRA $\tilde M$ is sufficiently close to the matrix $M$. Then by virtue of Theorem 6.1 the top right singular spaces of the matrix $M$ and of its LRA $\tilde M$ are close to one another, and so the sets of their leverage scores are close to one another whp [DMM08]. At this point we propose to fix $\beta < 1$, to compute the leverage scores of the LRA $\tilde M$ at sub-linear cost, and then to apply the algorithm of [DMM08] to the matrix $M$ by using these leverage scores. For a proper choice of $\beta$ the new LRA should improve the old one whp if the norm $\|M - \tilde M\|$ is small enough. Recursively we arrive at a heuristic recipe for randomized iterative refinement of a crude but sufficiently close LRA.

We leave as research challenges (i) the derivation of an upper bound on the norm $\|M - \tilde M\|$ that would imply its decrease, say, by twice with a probability at least 1/2 in such a refinement step and (ii) numerical tests that would support these estimates.

It is not clear how large the samples of columns and rows of $M$ should be in the tests. Large overhead constants are involved in the known upper estimates for the numbers of row and column samples in the subspace sampling algorithms of [DMM08] and all subsequent papers using leverage scores, but the tests in [DMM08] have succeeded with much fewer samples.

Remark 6.2.

The randomized algorithms of [DMM08] compute nearly optimal LRA at nearly optimal cost whp provided that they sample sufficiently many rows and columns of an input matrix according to the leverage scores of (6.1). [TYUC17, Section 1.7.3] discusses the deficiency of LRA based on sampling many rows and columns; our Algorithm 3.1 provides some partial remedy.

Remark 6.3.

One can refine a crude basis for the $r$-top right singular subspace of an $m\times n$ matrix by applying orthogonal iterations (cf. [GL13, Sections 8.2.3 and 8.3.7]), but each such iteration involves all $mn$ entries of the matrix and uses order of $mnr$ flops, so it does not run at sub-linear cost.

7 Numerical Experiments for Algorithm 4.1

In this section we present the test results for Algorithm 4.1 on inputs of four types made up of synthetic and real-world data with various spectra. Our figures display the relative error ratio

$$\frac{\|M - \tilde M\|}{\|M - M_r\|},$$

where $M$ denotes the input matrix, $\tilde M$ denotes its approximation output by Algorithm 4.1, $M_r$ denotes the $r$-top SVD of $M$, and $r$ is a rank value pre-determined for each input matrix. Unless the output of the algorithm was corrupted by rounding errors, the ratio was not supposed to fall noticeably below 1, and it did not in our experiments.

The algorithm was implemented in Python, and we ran all experiments on a 64-bit macOS Sierra 10.12.6 machine with a 1.6GHz CPU and 4GB of memory. We called scipy.linalg version 0.4.9 for numerical linear algebra routines such as QR factorization with pivoting, Moore–Penrose matrix inversion, and linear least squares regression.

Synthetic Input: We used random synthetic input matrices of two kinds, with rapidly and slowly decaying spectra. In both cases we generated these matrices as products $U \Sigma V^*$, where $U$ and $V$ were the matrices of the left and right singular vectors of a random Gaussian matrix and $\Sigma = \operatorname{diag}(\sigma_i)_i$ was a diagonal matrix with $\sigma_i = 1$ for $i = 1, \dots, r$, followed by rapidly decaying singular values in the first case and by slowly decaying singular values in the second case.
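The construction of such synthetic inputs can be sketched as follows; the specific geometric decay rate below is an illustrative choice of ours, not the one used in the paper's tests:

```python
import numpy as np

def synthetic_input(m, n, r, decay, seed=0):
    """Random m-by-n matrix with r leading unit singular values followed
    by geometrically decaying ones (decay is an illustrative parameter)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, n))
    U, _, Vh = np.linalg.svd(G, full_matrices=False)  # singular-vector factors
    s = np.ones(min(m, n))
    s[r:] = decay ** np.arange(1, min(m, n) - r + 1)  # tail of the spectrum
    return U @ np.diag(s) @ Vh
```

A value of `decay` close to 1 models the slowly decaying spectra and a small value models the rapidly decaying ones.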

Real-world Input: The input matrices of this category were dense matrices with real entries having low numerical rank. They represent discretizations of Integral Equations, provided among the built-in problems of the Regularization Tools package. We used two test matrices. One of them, called gravity, came from a one-dimensional gravity surveying model problem, and the other one, called shaw, came from a one-dimensional image restoration model problem. Their distributions of singular values are displayed in Figure 1; we padded these matrices with 0s in order to increase their size.

Figure 1: Spectra of real-world input matrices

Sub-algorithm: Our sub-algorithm modifies the sketching approach of [TYUC17]. Namely, in the $i$th iteration step we drew two multipliers $F_i$ and $H_i$, then approximated the residual $E_{i-1} = M - \tilde M_{i-1}$ by means of a sketch defined by these multipliers, and finally computed the $i$th approximation $\tilde M_i = \tilde M_{i-1} + \Delta_i$. In our tests we used the abridged Hadamard multipliers defined in [PLSZ16] and [PLSZ17] and having recursion depth 3. We also included the results of similar tests with Gaussian multipliers for comparison (cf. [TYUC17]).

For each input matrix we ran the iterative refinement Algorithm 4.1 100 times and recorded the mean relative error ratio at every iteration step in Figure 2.

Figure 2: Test results for Algorithm 4.1

We notice that the abridged SRHT multipliers performed similarly to Gaussian multipliers in our tests and were only slightly worse in a few places. This minor deterioration of the output accuracy was a reasonable price for using abridged (very sparse) Hadamard multipliers, with which we only accessed a small fraction of the input matrix at each iteration step.

It is worth noticing that in the tests with some random input matrices that have slowly decaying spectra the relative error ratio did not decrease to 1, which could be caused by the "heavy" tail of the spectrum. In our tests with some other inputs, which were not "well-mixed" in some sense, it was necessary to increase the recursion depth of the abridged Hadamard multipliers in order to bring the relative error ratio close to 1.


Appendix A Small families of hard inputs for LRA at sub-linear cost

Any sub-linear cost LRA algorithm fails on the following small families of LRA inputs.

Example A.1.

Define the following family of $m\times n$ matrices of rank 1 (we call them $\delta$-matrices): $\{\Delta_{i,j}\}_{i,j=1}^{m,n}$, where the $(i,j)$th entry of $\Delta_{i,j}$ is equal to 1 and all its other entries are equal to 0. Also include the $m\times n$ null matrix $O$ into this family. Now fix any LRA algorithm running at sub-linear cost; it does not access the $(i,j)$th entry of its input matrices for some pair of $i$ and $j$. Therefore it outputs the same approximation of the matrices $\Delta_{i,j}$ and $O$, with an undetected error at least 1/2 for one of these matrices. Apply the same argument to the set of small-norm perturbations of the matrices of the above family and to the sums of the latter matrices with any fixed matrix of low rank. Finally, the same argument shows that the a posteriori output errors of an LRA algorithm applied to the same input families cannot be estimated at sub-linear cost.
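The counting argument of Example A.1 can be replayed numerically; the helper below is our own illustration of the two indistinguishable inputs:

```python
import numpy as np

def delta_matrix(m, n, i, j):
    """Rank-1 delta-matrix: all entries 0 except a 1 in position (i, j)."""
    D = np.zeros((m, n))
    D[i, j] = 1.0
    return D

# An algorithm that never reads entry (i, j) returns one common output X for
# both the delta-matrix D and the null matrix Z, so it errs by at least 1/2
# on one of them; X below minimizes the larger of the two errors at (i, j).
D = delta_matrix(4, 5, 2, 3)
Z = np.zeros((4, 5))
X = 0.5 * D
worst = max(abs(D - X)[2, 3], abs(Z - X)[2, 3])
```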

Appendix B A posteriori error estimation at sub-linear cost for LRA of a matrix filled with i.i.d. values of a single variable

In our randomized a posteriori error estimation at sub-linear cost below, we assume that the error matrix of an LRA has enough entries, say, 100 or more, and that they are the observed i.i.d. values of a single random variable. This is realistic, for example, where the deviation of the matrix from its rank-$r$ approximation is due to the errors of measurement or rounding.

In this case the Central Limit Theorem implies that the distribution of the average of these values is close to Gaussian (see [EW07]). Fix a pair of integers $p$ and $q$ such that $pq$ is large enough (say, exceeds 100) but $p \ll m$ and $q \ll n$; then apply our tests just to a random $p\times q$ submatrix of the $m\times n$ error matrix. Under this policy we compute the entries of the error matrix at a dominated arithmetic cost but still verify correctness with high confidence, by applying the customary rules of hypothesis testing for the variance of a Gaussian variable.
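A sketch of this policy: draw a random submatrix of the error matrix (of p rows and q columns, in our notation) and return the observed mean and variance of its entries, to which the Gaussian hypothesis tests are then applied; the function name is our own:

```python
import numpy as np

def submatrix_mean_var(E, p, q, seed=0):
    """Observed mean and variance over a random p-by-q submatrix of E."""
    rng = np.random.default_rng(seed)
    I = rng.choice(E.shape[0], size=p, replace=False)  # p distinct row indices
    J = rng.choice(E.shape[1], size=q, replace=False)  # q distinct column indices
    S = E[np.ix_(I, J)]
    return S.mean(), S.var(ddof=1)
```

In practice the submatrix entries would be computed directly from the LRA factors rather than from a precomputed error matrix, which is what keeps the cost sub-linear.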

Namely suppose that we have observed $N$ values $x_1, \dots, x_N$ of a Gaussian random variable with a mean value $\mu$ and a variance $\sigma^2$ and that we have computed the observed average value $\mu_N = \frac{1}{N}\sum_{j=1}^{N} x_j$ and the observed variance $\sigma_N^2 = \frac{1}{N}\sum_{j=1}^{N} (x_j - \mu_N)^2$, respectively. Then, for a fixed reasonably large $N$, both probabilities $\operatorname{Prob}\{|\mu_N - \mu| \ge t\}$ and $\operatorname{Prob}\{|\sigma_N^2 - \sigma^2| \ge t\}$ converge to 0 exponentially fast as $t$ grows to infinity (see [C46]).

Acknowledgements: Our work has been supported by NSF Grants CCF–1116736, CCF–1563942 and CCF–1733834 and PSC CUNY Award 69813 00 48.


  • [B00] M. Bebendorf, Approximation of Boundary Element Matrices, Numer. Math., 86, 4, 565–589, 2000.
  • [B15] Å. Björck, Numerical Methods in Matrix Computations, Springer, New York, 2015.
  • [BG06] M. Bebendorf, R. Grzhibovskis, Accelerating Galerkin BEM for linear elasticity using adaptive cross approximation, Math. Methods Appl. Sci., 29, 1721–1747, 2006.
  • [BR03] M. Bebendorf, S. Rjasanow, Adaptive Low-Rank Approximation of Collocation Matrices, Computing, 70, 1, 1–24, 2003.
  • [BW17] C. Boutsidis, D. Woodruff, Optimal CUR Matrix Decompositions, SIAM Journal on Computing, 46, 2, 543–589, 2017, DOI:10.1137/140977898.
  • [C46] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, 575 pages, 1999 (first edition 1946).
  • [DHK06] J. Demmel, Y. Hida, W. Kahan, X. S. Li, S. Mukherjee, E. J. Riedy, Error bounds from extra-precise iterative refinement, ACM Trans. Math. Softw., 32, 2, 325–351, 2006.
  • [DMM08] P. Drineas, M.W. Mahoney, S. Muthukrishnan, Relative-error CUR Matrix Decompositions, SIAM Journal on Matrix Analysis and Applications, 30, 2, 844–881, 2008.
  • [EW07] A. C. Elliott, W. A. Woodward, Statistical Analysis Quick Reference Guidebook: With SPSS Examples, Sage, 2007.
  • [F79] R. Freivalds, Fast probabilistic algorithms, Mathematical Foundations of Computer Science, 57–69, Lecture Notes in Comp. Sci., 74, Springer, Berlin, Heidelberg, 1979.
  • [GL13] G. H. Golub, C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland, 2013 (fourth edition).
  • [H02] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 2002 (second edition).
  • [HMT11] N. Halko, P. G. Martinsson, J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review, 53, 2, 217–288, 2011.
  • [KS16] N. Kishore Kumar, J. Schneider, Literature Survey on Low Rank Approximation of Matrices, arXiv:1606.06511v1 [math.NA] 21 June 2016.
  • [LPSa] Q. Luan, V. Y. Pan, J. Svadlenka, Low Rank Approximation Directed by Leverage Scores and Computed at Sub-linear Cost, preprint, 2019.
  • [M11] M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, NOW Publishers, 3, 2, 2011.
  • [PLSZ16] V. Y. Pan, Q. Luan, J. Svadlenka, L. Zhao, Primitive and Cynical Low Rank Approximation, Preprocessing and Extensions, arXiv:1611.01391 (November 2016).
  • [PLSZ17] V. Y. Pan, Q. Luan, J. Svadlenka, L. Zhao, Superfast Accurate Low Rank Approximation, preprint, arXiv:1710.07946 (October 2017).
  • [PLSZa] V. Y. Pan, Q. Luan, J. Svadlenka, L. Zhao, CUR Low Rank Approximation at Sub-linear Cost, preprint, 2019.
  • [PLSZb] V. Y. Pan, Q. Luan, J. Svadlenka, L. Zhao, Low Rank Approximation at Sub-linear Cost by Means of Subspace Sampling, preprint, 2019.
  • [S73] G. W. Stewart, Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems, SIAM Review, 15, 4, 727–764, 1973.
  • [S98] G. W. Stewart, Matrix Algorithms, Vol. I: Basic Decompositions, SIAM, 1998.
  • [TYUC17] J. A. Tropp, A. Yurtsever, M. Udell, V. Cevher, Practical Sketching Algorithms for Low-rank Matrix Approximation, SIAM J. Matrix Anal., 38,  4, 1454–1485, 2017.