1 Motivation
In recent years, researchers have studied many applications where a large positivesemidefinite (psd) matrix is presented as a series of linear updates. A recurring theme is that we only have space to store a small summary of the psd matrix, and we must use this information to construct an accurate psd approximation with specified rank. Here are two important cases where this problem arises.
Streaming Covariance Estimation.
Suppose that we receive a streamof highdimensional vectors. The psd sample covariance matrix of these vectors has the linear dynamics
When the dimension and the number of vectors are both large, it is not possible to store the vectors or the sample covariance matrix. Instead, we wish to maintain a small summary that allows us to compute the rank psd approximation of the sample covariance matrix at a specified instant . This problem and its variants are often called streaming PCA Gilbert et al. (2012); Mitliagkas et al. (2013); Boutsidis et al. (2015); Ghasemi et al. (2016); Jain et al. (2016); Feldman et al. (2016).
Convex LowRank Matrix Optimization with Optimal Storage. A primary application of semidefinite programming (SDP) is to search for a rank psd matrix that satisfies additional constraints. Because of storage costs, SDPs are difficult to solve when the matrix variable is large. Recently, Yurtsever et al. Yurtsever et al. (2017) exhibited the first provable algorithm, called SketchyCGM, that produces a rank approximate solution to an SDP using optimal storage.
Implicitly, SketchyCGM forms a sequence of approximate psd solutions to the SDP via the iteration
The step size , and the vectors do not depend on the matrices . In fact, SketchyCGM only maintains a small summary of the evolving solution . When the iteration terminates, SketchyCGM computes a rank psd approximation of the final iterate using the method described by Tropp et al. (2017, Alg. 9).
1.1 Notation and Background
The scalar field or . Define and . The asterisk is the (conjugate) transpose, and the dagger denotes the Moore–Penrose pseudoinverse. The notation refers to the unique psd square root of a psd matrix . For , the Schatten norm returns the
norm of the singular values of a matrix. As usual,
refers to the th largest singular value.For a nonnegative integer , the phrase “rank” and its variants mean “rank at most .” For a matrix , the symbol denotes a (simultaneous) best rank approximation of the matrix with respect to any Schatten norm. We can take to be any
truncated singular value decomposition (SVD) of
(Higham, 1989, Sec. 6). Every best rank approximation of a psd matrix is psd.2 Sketching and FixedRank PSD Approximation
We begin with a streaming data model for a psd matrix that evolves via a sequence of general linear updates, and it describes a randomized linear sketch for tracking the psd matrix. To compute a fixedrank psd approximation, we develop an algorithm based on the Nyström method Williams and Seeger (2000), a technique from the literature on kernel methods. In contrast to previous approaches, our algorithm uses a distinct mechanism to truncate the rank of the approximation.
The Streaming Model. Fix a rank parameter in the range . Initially, the psd matrix equals a known psd matrix . Then evolves via a series of linear updates:
(2.1) 
In many applications, the innovation is lowrank and/or sparse. We assume that the evolving matrix always remains psd. At one given instant, we must produce an accurate rank approximation of the psd matrix induced by the stream of linear updates.
The Sketch. Fix a sketch size parameter in the range . Independent from , we draw and fix a random test matrix
(2.2) 
See Sec. 3 for a discussion of possible distributions. The sketch of the matrix takes the form
(2.3) 
The sketch (2.3) supports updates of the form (2.1):
(2.4) 
To find a good rank approximation, we must set the sketch size larger than . But storage costs and computation also increase with . One of our main contributions is to clarify the role of .
Under the model (2.1), it is more or less necessary to use a randomized linear sketch to track Li et al. (2014). For psd matrices, sketches of the form (2.2)–(2.3) appear explicitly in Gittens’s work Gittens (2011, 2013); Gittens and Mahoney (2016). Tropp et al. (2017) relies on a more complicated sketch developed in Woolfe et al. (2008); Clarkson and Woodruff (2009).
The Nyström Approximation. The Nyström method is a general technique for lowrank psd matrix approximation. Various instantiations appear in the papers Williams and Seeger (2000); Platt (2005); Fowlkes et al. (2004); Drineas and Mahoney (2005); Halko et al. (2011); Gittens (2011); Chiu and Demanet (2013); Gittens (2013); Gittens and Mahoney (2016); Li et al. (2017).
Here is the application to the present situation. Given the test matrix and the sketch , the Nyström method constructs a rank psd approximation of the psd matrix via the formula
(2.5) 
In most work on the Nyström method, the test matrix depends adaptively on , so these approaches are not valid in the streaming setting. Gittens’s framework Gittens (2011, 2013); Gittens and Mahoney (2016) covers the streaming case.
FixedRank Nyström Approximation: Prior Art. To construct a Nyström approximation with exact rank from a sketch of size , the standard approach is to truncate the center matrix to rank :
(2.6) 
The truncated Nyström approximation (2.6) appears in the many papers, including Platt (2005); Drineas and Mahoney (2005); Chiu and Demanet (2013); Gittens and Mahoney (2013). We have found (Sec. 5) that the truncation method (2.6) performs poorly in the present setting. This observation motivated us to search for more effective techniques.
FixedRank Nyström Approximation: Proposal. The purpose of this paper is to develop, analyze, and evaluate a new approach for fixedrank approximation of a psd matrix under the streaming model. We propose a more intuitive rank approximation:
(2.7) 
That is, we report a best rank approximation of the full Nyström approximation (2.5).
This “matrix nearness” approach to fixedrank approximation appears in the papers Halko et al. (2011); Gu (2015); Tropp et al. (2017). The combination with the Nyström method (2.5) seems totally natural. Even so, we were unable to find a reference after an exhaustive literature search and inquiries to experts on this subject.
Summary of Contributions. This paper contains a number of advances over the prior art:

We propose a distinct technique (2.7) for truncating the Nyström approximation to rank . This formulation differs from earlier work on fixedrank Nyström approximations.
Psd matrix approximation is a ubiquitous problem, so we expect these results to have a broad impact.
Related Work. Randomized algorithms for lowrank matrix approximation were proposed in the late 1990s and developed into a technology in the 2000s; see Halko et al. (2011); Mahoney (2011); Woodruff (2014) for more background. In the absence of constraints, such as streaming, we recommend the generalpurpose methods from Halko et al. (2011); Li et al. (2017).
Algorithms for lowrank matrix approximation in the important streaming data setting are discussed in Woolfe et al. (2008); Clarkson and Woodruff (2009); Halko et al. (2011); Gilbert et al. (2012); Woodruff (2014); Cohen et al. (2015); Boutsidis et al. (2016); Tropp et al. (2017). Few of these methods are designed for psd matrices.
Nyström methods for lowrank psd matrix approximation appear in Williams and Seeger (2000); Platt (2005); Fowlkes et al. (2004); Drineas and Mahoney (2005); Halko et al. (2011); Gittens (2011); Kumar et al. (2012); Yang et al. (2012); Gittens (2013); Gittens and Mahoney (2016); Tropp et al. (2017). These works mostly concern kernel matrices; they do not focus on the streaming model.
We are only aware of a few papers Gittens (2011, 2013); Gittens and Mahoney (2016); Tropp et al. (2017) on algorithms for psd matrix approximation that operate under the streaming model (2.1). These papers form the comparison group.
Finally, let us mention two very recent theoretical papers Clarkson and Woodruff (2017); Musco and Woodruff (2017) that present existential results on algorithms for fixedrank psd matrix approximation. The approach in Clarkson and Woodruff (2017) is only appropriate for sparse input matrices, while the work Musco and Woodruff (2017) is not valid in the streaming setting.
3 Implementation
Distributions for the Test Matrix. To ensure that the sketch is informative, we must draw the test matrix (2.2) at random from a suitable distribution. The choice of distribution determines the computational requirements for the sketch (2.3), the linear updates (2.4), and the matrix approximation (2.7). It also affects the quality of the approximation (2.7). Let us outline some of the most useful distributions. An exhaustive discussion is outside the scope of our work, but see Liberty (2009); Halko et al. (2011); Mahoney (2011); Gittens (2013); Gittens and Mahoney (2016); Woodruff (2014); Tropp et al. (2017).
Isotropic Models. Mathematically, the most natural model is to construct a test matrix whose range is a uniformly random dimensional subspace in . There are two approaches:

Gaussian. Draw each entry of the matrix
independently at random from the standard normal distribution on
. 
Orthonormal. Draw a Gaussian matrix , as above. Compute a thin orthogonal–triangular factorization to obtain the test matrix . Discard .
Gaussian and orthonormal test matrices both require storage of floatingpoint numbers in for the test matrix and another floatingpoint numbers for the sketch . In both cases, the cost of multiplying a vector in into is floatingpoint operations.
For isotropic models, we can analyze the approximation (2.7) in detail. In exact arithmetic, Gaussian and isotropic test matrices yield identical Nyström approximations (Proposition A.2
). In floatingpoint arithmetic, orthonormal matrices are more stable for large
, but we can generate Gaussian matrices with less arithmetic and communication. References for isotropic test matrices include Martinsson et al. (2011); Halko et al. (2011); Gu (2015).Subsampled Scrambled Fourier Transform (SSFT).
One shortcoming of the isotropic models is the cost of storing the test matrix and the cost of multiplying a vector into the test matrix. We can often reduce these costs using an SSFT test matrix. An SSFT takes the form(3.1) 
The are independent, signed permutation matrices,^{1}^{1}1A signed permutation has exactly one nonzero entry in each row and column; the nonzero has modulus one. chosen uniformly at random. The matrix is a discrete Fourier transform () or a discrete cosine transform (). The matrix is a restriction to coordinates, chosen uniformly at random.
An SSFT requires only storage, but the sketch still requires storage of numbers. We can multiply a vector in into using arithmetic operations via an FFT or FCT algorithm. Thus, for most choices of sketch size , the SSFT improves over the isotropic models.
In practice, the SSFT yields matrix approximations whose quality is identical to those we obtain with an isotropic test matrix (Sec. 5). Although the analysis for SSFTs is less complete, the empirical evidence confirms that the theory for isotropic models also offers excellent guidance for SSFTs. References for SSFTs and related test matrices include Woolfe et al. (2008); Ailon and Chazelle (2009); Liberty (2009); Halko et al. (2011); Tropp (2011); Boutsidis and Gittens (2013); Cohen et al. (2016).
Numerically Stable Implementation. It requires care to compute the fixedrank approximation (2.7). App. B shows that a poor implementation may produce an approximation with 100% error!
Let us outline a numerically stable and very accurate implementation of (2.7), based on an idea from Tygert (2014); Li et al. (2017). Fix a small parameter . Instead of approximating the psd matrix directly, we approximate the shifted matrix and then remove the shift. Here are the steps:

Construct the shifted sketch .

Form the matrix .

Compute a Cholesky decomposition .

Compute by backsubstitution.

Compute the (thin) singular value decomposition .

Form .
The pseudocode addresses some additional implementation details. Related, but distinct, methods were proposed by Williams & Seeger Williams and Seeger (2000) and analyzed in Gittens’s thesis Gittens (2013).
Pseudocode. We present detailed pseudocode for the sketch (2.2)–(2.4) and the implementation of the fixedrank psd approximation (2.7) described above. For simplicity, we only elaborate the case of a random orthonormal test matrix; we have also developed an SSFT implementation for empirical testing. The pseudocode uses both mathematical notation and Matlab 2017a functions.
Algorithms and Computational Costs. Algorithm 1 constructs a random orthonormal test matrix, and computes the sketch (2.3) of an input matrix. The test matrix and sketch require the storage of floatingpoint numbers. Owing to the orthogonalization step, the construction of the test matrix requires floatingpoint operations. For a general input matrix, the sketch requires floatingpoint operations; this cost can be removed by initializing the input matrix to zero.
4 Theoretical Results
Relative Error Bound. Our first result is an accurate bound for the expected Schatten 1norm error in the fixedrank psd approximation (2.7).
Theorem 4.1 (FixedRank Nyström: Relative Error).
Assume . Let be a psd matrix. Draw a test matrix from the Gaussian or orthonormal distribution, and form the sketch . Then the approximation given by (2.5) and (2.7) satisfies
(4.1)  
(4.2) 
The quantity and
. Similar results hold with high probability.
The proof of Theorem 4.1 appears in App. A. In contrast to previous analyses of Nyström methods, Theorem 4.1 yields explicit, sharp constants. As a consequence, the formulae (4.1)–(4.2) offer an a priori mechanism for selecting the sketch size to achieve a desired error bound. In particular, for each ,
Thus, we can attain an arbitrarily small relative error in the Schatten 1norm. In the streaming setting, the scaling is optimal for this result (Ghasemi et al., 2016, Thm. 4.2). Furthermore, it is impossible (Woodruff, 2014, Sec. 6.2) to obtain “pure” relative error bounds in the Schatten norm unless .
The Role of Spectral Decay. To circumvent these limitations, it is necessary to develop a different kind of error bound. Our second result shows that the fixedrank psd approximation (2.7) automatically exploits decay in the spectrum of the input matrix.
Theorem 4.2 (FixedRank Nyström: Spectral Decay).
Instate the notation and assumptions of Theorem 4.1. Then
(4.3)  
(4.4) 
The index ranges over the natural numbers.
The proof of Theorem 4.2 appears in App. A. Here is one way to understand this result. As the index increases, the quantity increases while the rank approximation error decreases. Theorem 4.2 states that the approximation (2.7) automatically achieves the best tradeoff between these two terms. When the spectrum of decays, the rank approximation error may be far smaller than the rank approximation error. In this case, Theorem 4.2 is tighter than Theorem 4.1, although the prediction is more qualitative.
Additional Results. The proofs can be extended to obtain highprobability bounds, as well as results for other Schatten norms or for other test matrices (App. A).
5 Numerical Performance
Experimental Setup. In many streaming applications, such as Yurtsever et al. (2017), it is essential that the sketch uses as little memory as possible and that the psd approximation achieves the best possible error. For the methods we consider, the arithmetic costs of linear updates and psd approximation are roughly comparable. Therefore, we only assess storage and accuracy.
For the numerical experiments, the field except when noted explicitly. Choose a psd input matrix and a target rank . Then fix a sketch size parameter with . For each trial, draw the test matrix from the orthonormal or the SSFT distribution, and form the sketch of the input matrix. Using Algorithm 3, compute the rank psd approximation defined in (2.7). We evaluate the performance using the relative error metric:
(5.1) 
We perform 20 independent trials and report the average error.
We compare our method (2.7) with the standard truncated Nyström approximation (2.6); the best reference for this type of approach is (Gittens and Mahoney, 2013, Sec. 2.2). The approximation (2.6) is constructed from the same sketch as (2.7), so the experimental procedure is identical.
We also consider the sketching method and psd approximation algorithm (Tropp et al., 2017, Alg. 9) based on earlier work from Woolfe et al. (2008); Clarkson and Woodruff (2009); Halko et al. (2011). We implemented this sketch with orthonormal matrices and also with SSFT matrices. The sketch has two different parameters , so we select the parameters that result in the minimum relative error. Otherwise, the experimental procedure is the same.
We apply the methods to representative input matrices; see Figure B.1 for the spectra.
Synthetic Examples. The synthetic examples are diagonal with dimension ; results for larger and nondiagonal matrices are similar. These matrices are parameterized by an effective rank parameter , which takes values in . We compute approximations with rank .

LowRank + PSD Noise. These matrices take the form
The matrix has the distribution; that is, where is standard normal. The parameter controls the signaltonoise ratio. We consider three examples: LowRankLowNoise (), LowRankMedNoise (), LowRankHiNoise ().

Polynomial Decay. These matrices take the form
The parameter controls the rate of polynomial decay. We consider three examples: PolyDecaySlow (), PolyDecayMed (), PolyDecayFast ().

Exponential Decay. These matrices take the form
The parameter controls the rate of exponential decay. We consider three examples: ExpDecaySlow (), ExpDecayMed (), ExpDecayFast ().
Application Examples. We also consider nondiagonal matrices inspired by the SDP algorithm Yurtsever et al. (2017).

PhaseRetrieval: This is a psd matrix with dimension . It has exact rank , but its effective rank . We form approximations with rank . The matrix is an approximate solution to a phase retrieval SDP; it was provided by the authors of Yurtsever et al. (2017).
Experimental Results. Figures 5.1–5.2 display the performance of the three fixedrank psd approximation methods for a subcollection of the input matrices. The vertical axis is the Schatten norm relative error (5.1). The variable on the horizontal axis is proportional to the storage required for the sketch only. For the Nyströmbased approximations (2.6)–(2.7), we have the correspondence . For the approximation (Tropp et al., 2017, Alg. 9), we set .
The experiments demonstrate that the proposed method (2.7) has a significant benefit over the alternatives for input matrices that admit a good lowrank approximation. It equals or improves on the competitors for almost all other examples and storage budgets. App. B contains additional numerical results; these experiments only reinforce the message of Figures 5.1–5.2.
Conclusions. This paper makes the case for using the proposed fixedrank psd approximation (2.7) in lieu of the alternatives (2.6) or (Tropp et al., 2017, Alg. 9). Theorem 4.1 shows that the proposed fixedrank psd approximation (2.7) can attain any prescribed relative error, and Theorem 4.2 shows that it can exploit spectral decay. Furthermore, our numerical work demonstrates that the proposed approximation improves (almost) uniformly over the competitors for a range of examples. These results are timely because of the recent arrival of compelling applications, such as Yurtsever et al. (2017), for sketching psd matrices.
Acknowledgments. The authors wish to thank Mark Tygert and Alex Gittens for helpful feedback on preliminary versions of this work. JAT gratefully acknowledges partial support from ONR Award N000141712146 and the Gordon & Betty Moore Foundation. VC and AY were supported in part by the European Commission under Grant ERC Future Proof, SNF 200021146750, and SNF CRSII2147633. MU was supported in part by DARPA Award FA87501720101.
Appendix A Details of the Theoretical Analysis
This appendix contains a new theoretical analysis of the simple Nyström approximation (2.5) and the proposed fixedrank Nyström approximation (2.7).
a.1 Best Approximation in Schatten Norms
Let us introduce compact notation for the optimal rank approximation error in the Schatten norm:
(A.1) 
Ordinary singular values correspond to the case .
a.2 Analysis of the Nyström Approximation
The first result gives a very accurate error bound for the basic Nyström approximation with respect to the Schatten 1norm. This estimate is the key ingredient in the proof of Theorem 4.2.
Theorem A.1 (Error in Nyström Approximation).
Assume . Let be a psd matrix. Draw the test matrix from the Gaussian or orthonormal distribution, and form the sketch . Then the rank Nyström approximation determined by (2.5) satisfies the error bound
(A.2) 
The index ranges over natural numbers. The quantity and . The optimal rank Schatten 1norm approximation error is defined in (A.1).
Let us situate Theorem A.1 with respect to the results in Gittens’s work Gittens (2013); Gittens and Mahoney (2016). Gittens develops error bounds for the Nyström approximation (2.5) that hold with high probability, rather than in expectation. He measures errors in the Schatten norm for . He also obtains results for several types of test matrices, including isotropic models and a relative of the SSFT. In contrast to Theorem A.1, Gittens’s bounds are more complicated, and the constants are much larger.
a.3 Proof of Theorem a.1
We begin with the proof of Theorem A.1. Gittens Gittens (2011, 2013); Gittens and Mahoney (2016) uses a related argument to obtain bounds on the probability that the Nyström approximation achieves a given error.
The first step is to write the Nyström approximation in terms of an orthogonal projector. This expression allows us to exploit the analysis from Halko et al. (2011); Tropp et al. (2017).
Proposition A.2 (Representation of Nyström Approximation).
Let be the orthogonal projector onto :
(A.3) 
Then the Nyström approximation (2.5) can be expressed as
(A.4) 
In particular, the Nyström approximation only depends on through .
Proof.
This argument follows from a direct calculation:
To reach the last line, we identified the orthogonal projector (A.3). ∎
We may assume that is a Gaussian matrix because the reconstruction only depends on . The range of a random orthonormal matrix has the same distribution as a Gaussian matrix up to a set of measure zero.
Let be the orthogonal projector (A.3). In view of the formula (A.4) for , we have
(A.5) 
We can now express the Schatten 1norm of the error in terms of the Schatten 2norm:
The first identity follows from (A.5) and the fact that the orthogonal projector is idempotent.
Fix a natural number . We can use established results from the literature to control the expectation of the error. In particular, we invoke a slight generalization (Tropp et al., 2017, Fact. 8.3) of a result (Halko et al., 2011, Thm. 10.5) of Halko et al. We arrive at the bound
Combine the last two displays and minimize over eligible to complete the argument.
Remark A.3 (SpectralNorm Error).
When , we can also obtain a spectralnorm error bound by combining this argument with another result (Halko et al., 2011, Thm. 10.6) of Halko et al.:
It takes a surprising amount of additional work to obtain an accurate bound for the first moment of the error (instead of the 1/2 moment). We have chosen not to include this argument.
Remark A.4 (HighProbability Bounds).
Remark A.5 (Other Test Matrices).
As noted by Gittens Gittens (2013); Gittens and Mahoney (2016), we can obtain results for other types of test matrices by replacing parts of the analysis that depend on Gaussian matrices. These changes result in bounds that are quantitatively and qualitatively worse. The numerical evidence suggests that many types of test matrices have the same empirical performance, so we omit this development.
a.4 Theorem 4.2: Schatten 1Norm Bound
Let us continue with the proof of the Schatten 1norm bound (4.3) from Theorem 4.2. We require a basic result on rank approximation adapted from (Tropp et al., 2017, Prop. 7.1).
Proposition A.6 (FixedRank Projection).
Let and be arbitrary matrices. For each natural number and number ,
Proof.
The argument follows from a short calculation based on the triangle inequality:
In the second line, we have used the fact that is a best rank approximation of . To complete the argument, we identify the last term (A.1) as the best rank approximation error in the Schatten norm. ∎
a.5 Theorem 4.1: Schatten 1Norm Bound
Next, we turn to the proof of the Schatten 1norm bound (4.1) from Theorem 4.1. This argument is based on the same approach as Theorem A.1, but we require several additional ingredients from Halko et al. (2011); Gittens (2013); Gu (2015); Tropp et al. (2017).
As before, we may assume that is Gaussian. With probability one, the nonzero eigenvalues of are all distinct, so the best rank approximation of is determined uniquely.
Let be the orthogonal projector (A.3). According to (A.4), the Nyström approximation takes the form
Let denote the orthogonal projector onto the range of . Using the (truncated) SVD of the matrix , we can verify that the best rank approximation of satisfies
As in the proof of Theorem A.1, the Schatten 1norm of the error satisfies
Since , we can rewrite this expression as
The last identity holds because . A direct application of Gu’s result (Gu, 2015, Thm. 3.5) yields
A direct application of the result (Tropp et al., 2017, Prop. 9.2) shows that
As before, we note that
Taking an expectation and sequencing these displays, we arrive at
This is the stated result (4.1).
a.6 Theorems 4.1 and 4.2: Schatten Norm Bounds
Last, we develop the bounds (4.2) and (4.4) on the Schatten norm of the fixedrank psd approximation (2.7) using a formal argument. We require the following result.
Proposition A.8 (Reversed Eckart–Young).
Let be matrices, and assume that . Then
Proof.
As a consequence of Weyl’s inequalities (Bhatia, 1997, Thm. III.2.1), we have the bound
(A.6) 
The last identity holds because . It follows that
The first expression is the representation of the Schatten 1norm in terms of singular values. The inequality is (A.6). Finally, we identify the best Schatten 1norm error from (A.1). ∎
Appendix B Supplemental Numerics
This appendix documents additional numerical work. These experiments provide a more complete picture of the performance of the psd approximation methods.

Figures B.2–B.10 document the results of numerical experiments for the remaining parameter regimes outlined in Sec. 5. In particular, we consider all Schatten norm relative error measures for and all effective rank parameters for the synthetic data. We omit the case because the plots are uninformative.

Figure B.11 gives evidence about the numerical challenges involved in implementing Nyström approximations, such as (2.7). Our implementation in Algorithm 3 is based on the Nyström approximation routine eigenn released by Tygert Tygert (2014) to accompany the paper Li et al. (2017). We compare with another implementation strategy described in the text of the same paper (Li et al., 2017, Eqn. (13)). It is surprising to discover very different levels of precision in two implementations designed by professional numerical analysts.
References
 Ailon and Chazelle [2009] N. Ailon and B. Chazelle. The fast JohnsonLindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., 39(1):302–322, 2009.
 Bhatia [1997] R. Bhatia. Matrix analysis. SpringerVerlag, New York, 1997.
 Boutsidis and Gittens [2013] C. Boutsidis and A. Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM J. Matrix Anal. Appl., 34(3):1301–1340, 2013.

Boutsidis et al. [2015]
C. Boutsidis, D. Garber, Z. Karnin, and E. Liberty.
Online principal components analysis.
In Proc. 26th Ann. ACMSIAM Symp. Discrete Algorithms (SODA), pages 887–901, 2015. 
Boutsidis et al. [2016]
C. Boutsidis, D. Woodruff, and P. Zhong.
Optimal principal component analysis in distributed and streaming
models.
In
Proc. 48th ACM Symp. Theory of Computing (STOC)
, 2016.  Chiu and Demanet [2013] J. Chiu and L. Demanet. Sublinear randomized algorithms for skeleton decompositions. SIAM J. Matrix Anal. Appl., 34(3):1361–1383, 2013.
 Clarkson and Woodruff [2017] K. Clarkson and D. Woodruff. Lowrank PSD approximation in inputsparsity time. In Proc. 28th Ann. ACMSIAM Symp. Discrete Algorithms (SODA), pages 2061–2072, Jan. 2017.
 Clarkson and Woodruff [2009] K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proc. 41st ACM Symp. Theory of Computing (STOC), 2009.

Cohen et al. [2015]
M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu.
Dimensionality reduction for kmeans clustering and low rank approximation.
In Proc. 47th ACM Symp. Theory of Computing (STOC), pages 163–172. ACM, New York, 2015.  Cohen et al. [2016] M. B. Cohen, J. Nelson, and D. P. Woodruff. Optimal Approximate Matrix Product in Terms of Stable Rank. In 43rd Int. Coll. Automata, Languages, and Programming (ICALP), volume 55, pages 11:1–11:14, 2016.
 Davis and Hu [2011] T. A. Davis and Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 3(1):1:1–1:25, 2011.
 Drineas and Mahoney [2005] P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernelbased learning. J. Mach. Learn. Res., 6:2153–2175, 2005.
 Feldman et al. [2016] D. Feldman, M. Volkov, and D. Rus. Dimensionality reduction of massive sparse datasets using coresets. In Adv. Neural Information Processing Systems 29 (NIPS), 2016.
 Fowlkes et al. [2004] C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell., 26(2):214–225, Jan. 2004.
 Ghasemi et al. [2016] M. Ghasemi, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent directions: Simple and deterministic matrix sketching. SIAM J. Comput., 45(5):1762–1792, 2016.
 Gilbert et al. [2012] A. C. Gilbert, J. Y. Park, and M. B. Wakin. Sketched SVD: Recovering spectral features from compressed measurements. Available at http://arXiv.org/abs/1211.0361, Nov. 2012.
 Gittens [2011] A. Gittens. The spectral norm error of the naïve Nyström extension. Available at http:arXiv.org/abs/1110.5305, Oct. 2011.
 Gittens [2013] A. Gittens. Topics in Randomized Numerical Linear Algebra. PhD thesis, California Institute of Technology, 2013.

Gittens and Mahoney [2013]
A. Gittens and M. W. Mahoney.
Revisiting the Nyström method for improved largescale machine learning.
Available at http://arXiv.org/abs/1303.1849, Mar. 2013.  Gittens and Mahoney [2016] A. Gittens and M. W. Mahoney. Revisiting the Nyström method for improved largescale machine learning. J. Mach. Learn. Res., 17:Paper No. 117, 65, 2016.
 Goemans and Williamson [1995] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach., 42(6):1115–1145, 1995.
 Gu [2015] M. Gu. Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput., 37(3):A1139–A1173, 2015.
 Halko et al. [2011] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
 Higham [1989] N. J. Higham. Matrix nearness problems and applications. In Applications of matrix theory (Bradford, 1988), pages 1–27. Oxford Univ. Press, New York, 1989.
 Jain et al. [2016] P. Jain, C. Jin, S. M. Kakade, P. Netrapalli, and A. Sidford. Streaming PCA: Matching matrix Bernstein and nearoptimal finite sample guarantees for Oja’s algorithm. In 29th Ann. Conf. Learning Theory (COLT), pages 1147–1164, 2016.
 Kumar et al. [2012] S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. J. Mach. Learn. Res., 13:981–1006, Apr. 2012.
 Li et al. [2017] H. Li, G. C. Linderman, A. Szlam, K. P. Stanton, Y. Kluger, and M. Tygert. Algorithm 971: An implementation of a randomized algorithm for principal component analysis. ACM Trans. Math. Softw., 43(3):28:1–28:14, Jan. 2017.
 Li et al. [2014] Y. Li, H. L. Nguyen, and D. P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Proc. 2014 ACM Symp. Theory of Computing (STOC), pages 174–183. ACM, 2014.
 Liberty [2009] E. Liberty. Accelerated dense random projections. PhD thesis, Yale Univ., New Haven, 2009.
 Mahoney [2011] M. W. Mahoney. Randomized algorithms for matrices and data. Found. Trends Mach. Learn., 3(2):123–224, 2011.
 Martinsson et al. [2011] P.G. Martinsson, V. Rokhlin, and M. Tygert. A randomized algorithm for the decomposition of matrices. Appl. Comput. Harmon. Anal., 30(1):47–68, 2011.
 Mitliagkas et al. [2013] I. Mitliagkas, C. Caramanis, and P. Jain. Memory limited, streaming PCA. In Adv. Neural Information Processing Systems 26 (NIPS), pages 2886–2894, 2013.
 Musco and Woodruff [2017] C. Musco and D. Woodruff. Sublinear time lowrank approximation of positive semidefinite matrices. Available at http://arXiv.org/abs/1704.03371, Apr. 2017.

Platt [2005]
J. C. Platt.
FastMap, MetricMap, and Landmark MDS are all Nyström
algorithms.
In
Proc. 10th Int. Workshop Artificial Intelligence and Statistics (AISTATS)
, pages 261–268, 2005.  Tropp [2011] J. A. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal., 3(12):115–126, 2011.
 Tropp et al. [2017] J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher. Randomized singleview algorithms for lowrank matrix approximation. ACM Report 201701, Caltech, Pasadena, Jan. 2017. Available at http://arXiv.org/abs/1609.00048, v1.
 Tygert [2014] M. Tygert. Beta versions of Matlab routines for principal component analysis. Available at http://tygert.com/software.html, 2014.
 Williams and Seeger [2000] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Adv. Neural Information Processing Systems 13 (NIPS), 2000.
 Woodruff [2014] D. P. Woodruff. Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci., 10(12):iv+157, 2014.
 Woolfe et al. [2008] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Appl. Comput. Harmon. Anal., 25(3):335–366, 2008.
 Yang et al. [2012] T. Yang, Y.F. Li, M. Mahdavi, R. Jin, and Z.H. Zhou. Nyström method vs random Fourier features: A theoretical and empirical comparison. In Adv. Neural Information Processing Systems 25 (NIPS), pages 476–484, 2012.
 Yurtsever et al. [2017] A. Yurtsever, M. Udell, J. A. Tropp, and V. Cevher. Sketchy decisions: Convex lowrank matrix optimization with optimal storage. In Proc. 20th Int. Conf. Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, May 2017.
Comments
There are no comments yet.