Subspace tracking methods are widely used in the statistics and signal processing communities. Given observations of a multidimensional signal, one is interested in estimating or tracking the subspace spanned by the eigenvectors corresponding to the largest eigenvalues of the signal covariance matrix. Over the past few decades, many variations of the original projection approximation subspace tracking (PAST) method have been developed and have found applications in data compression, filtering, speech enhancement, etc. (see  and the references therein). Despite the popularity of subspace tracking methods, only partial results are known about their convergence. The asymptotic convergence of the PAST algorithm was first established in [3, 4]
using a general theory of stability for ordinary differential equations. However, no finite-sample error bounds are available in the literature. Furthermore, in the case of a high-dimensional signal, the empirical covariance matrix estimator performs poorly if the number of observations is small. A common way to improve the estimation quality in this case is to impose some kind of sparsity assumption on the signal itself or on the eigensubspace of the underlying covariance matrix. In , a sparse modification of the orthogonal iteration scheme for a fixed number of observations was proposed. A thorough analysis in  shows that, under appropriate sparsity assumptions on the leading eigenvectors, the orthogonal iteration scheme combined with thresholding makes it possible to perform dimension reduction in the high-dimensional setting. Our main goal is to propose a novel modification of the constrained projection approximation subspace tracking (CPAST) method , called the sparse constrained projection approximation subspace tracking (SCPAST) method, which can be used for efficient subspace tracking in the case of a high-dimensional sparse signal and a small number of available observations. Another contribution of our paper is a non-asymptotic convergence analysis of the CPAST and SCPAST algorithms, which shows the advantage of the SCPAST algorithm in the case of a sparse covariance structure. Last but not least, we analyze the numerical performance of the SCPAST algorithm on simulated and real data. In particular, the problem of tracking the leading subspace of a music signal is considered.
The structure of the paper is as follows. In Section 2 we introduce our observational model and formulate the main assumptions. Section 3 first reviews the CPAST algorithm and then provides non-asymptotic error bounds for CPAST in the "stationary" case. In Section 4 we introduce our sparse constrained projection approximation subspace tracking method and prove non-asymptotic upper bounds for the estimation error. A numerical study of the proposed algorithm is presented in Section 5. Finally, the proofs are collected in Section 6.
2 Main setup
One important problem in signal processing is the adaptive estimation of a dominant subspace given incoming noisy observations. Specifically, one considers a model
where the observations contain the signal
corrupted by a noise vector with independent standard Gaussian components. The signal is usually modelled as
where is a deterministic matrix of rank with , and is a random vector in , independent of , such that and , . Under these assumptions, the process has a covariance matrix which may be decomposed in the following way
where stands for the identity matrix in . Note that the matrix has rank
and by the singular value decomposition (SVD)
where are the eigenvectors of corresponding to the eigenvalues . It follows from (2) that the first eigenvalues of are , whereas the remaining eigenvalues are equal to . Since , the subspace corresponding to the first eigenvectors of is identifiable. Subspace tracking methods aim to estimate this subspace based on the observations . The overall number of observations is assumed to be fixed and known.
Relying on the heuristic assumption of slow variation in time, subspace tracking methods use the following estimator of the covariance matrix (up to scaling)
where is the so-called forgetting factor. The estimator can adapt to changes in by discounting past observations. In the stationary regime, that is, if is a constant matrix, one would use . It is well known that in the case of independent Gaussian noise this estimator is consistent.
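The discounted estimator above can be sketched as follows; the function name and the batch interface are illustrative, not from the paper:

```python
import numpy as np

def discounted_covariance(X, beta=1.0):
    """Covariance estimator (up to scaling) with forgetting factor
    beta in (0, 1]: C_t = beta * C_{t-1} + x_t x_t^T.  beta = 1 keeps
    all past observations (stationary regime); beta < 1 discounts old
    observations and so adapts to changes in the covariance."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for x in X:
        # discount the past, then add the new rank-one term
        C = beta * C + np.outer(x, x)
    return C
```

With beta = 1 this reduces to the unscaled empirical covariance, which is the estimator used in the stationary case below.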
For the general model (1) and the non-stationary case, the constrained projection approximation subspace tracking (CPAST) method allows one to iteratively compute a matrix , , containing the first leading eigenvectors of the matrix (see (3)) based on sequentially arriving observations , . The procedure starts with some initial approximation and consists of the following two steps:
multiplication: compute the matrix
orthogonalization: compute an estimator of the matrix containing the leading eigenvectors via
In the "stationary" case () the method may be regarded as an "online" version of the orthogonal iteration scheme (see ) for computing the eigensubspace of a nonnegative definite matrix. With the use of the Sherman–Morrison–Woodbury formula for the inversion, at each time one has to perform operations to compute the updated matrix given , and .
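Ignoring the fast Sherman–Morrison–Woodbury update, one iteration of the two steps above can be sketched as follows; QR factorization stands in for the orthogonalization step, which is an assumption of this sketch (the exact constraint used in CPAST may differ):

```python
import numpy as np

def cpast_step(C, W):
    """One orthogonal-iteration-style update: multiply the current
    covariance estimate by the previous basis, then re-orthonormalize
    the columns."""
    Z = C @ W                 # multiplication step
    Q, _ = np.linalg.qr(Z)    # orthogonalization step
    return Q
```

Iterating this map with a fixed matrix C is exactly the orthogonal iteration scheme: the columns of W converge to an orthonormal basis of the leading eigensubspace of C whenever the relevant eigenvalue gap is positive.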
3.1 Convergence of CPAST
Throughout this section we consider the stationary case where , , , , , . In this situation one would like to keep all the available information to estimate , that is, to use the estimator (3) for with . For notational simplicity, from now on we suppress the dependence on and write for the empirical covariance matrix and for the CPAST estimator. Thus, in the stationary case the CPAST estimator takes the form
We assume that the random vectors and have independent components for . Under these assumptions the covariance matrix (2) becomes
where is the matrix with columns , and is the diagonal matrix with on the diagonal. Note that in the stationary case the observational model (1) can alternatively be written as the so-called spike model
where are i.i.d. standard Gaussian random variables independent of .
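For illustration, a single-spike sample of the kind just described might be generated as follows; the exact parameterization, in particular the placement of the factor sqrt(lam), is an assumption of this sketch:

```python
import numpy as np

def single_spike_sample(T, n, u, lam, sigma=1.0, seed=0):
    """Draw T observations x_t = sqrt(lam) * z_t * u + sigma * eps_t,
    with z_t and the entries of eps_t i.i.d. standard Gaussian, so the
    population covariance is lam * u u^T + sigma^2 * I."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)            # spike coefficients
    E = rng.standard_normal((T, n))       # isotropic Gaussian noise
    return np.sqrt(lam) * z[:, None] * u[None, :] + sigma * E
```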
For our non-asymptotic error analysis of CPAST, we assume that , and , are known. With known , we can always normalize the data, and therefore without loss of generality we can assume that .
A typical condition in the analysis of the quality of eigenvector estimation is the so-called spectral gap condition, which says that adjacent eigenvectors explain distinguishably different portions of the variance in the data; namely, there exists such that for all
where by definition. Since our goal is the estimation of the -dimensional subspace spanned by the first eigenvectors, and we are not interested in estimating each particular eigenvector, we only need a condition ensuring the separation of this -dimensional subspace, namely, that the gap between and is sufficiently large:
Define a distance between two subspaces and spanned by the orthonormal columns of and , respectively, via
where the nuclear norm of a matrix is defined as , and and are matrices in with orthonormal columns.
The next result shows that, with high probability, the subspace spanned by the CPAST estimator is close, in terms of , to the subspace spanned by when the number of observations is large enough. We assume that the initial estimator is constructed from the first observations by means of the singular value decomposition of .
Suppose that the spectral gap condition (7) holds and
where . Then after iterations we get with probability at least
where are absolute constants and depends on
The second term on the right-hand side of (9) corresponds to the error of separating the first eigenvectors from the rest. The first term is the average error of estimating all components of the leading eigenvectors. It originates from the interaction of the noise terms with the different coordinates, see .
4 Sparse CPAST
4.1 Sparsity assumptions on leading eigenvectors
We assume that in the stationary case (5) the first leading eigenvectors , of have most of their entries close to zero. Namely, we suppose that each fulfils the so-called weak- ball condition [9, 10], that is, for some
where is the -th largest coordinate of . The weak- ball condition is known to be more general than the ball condition (which is for , , ), as it combines different definitions of sparsity used in statistics, see .
Define a thresholding function with a thresholding parameter and via
For example, the so-called hard-thresholding function given by
and the so-called soft-thresholding function defined as
fulfill the conditions (10). When is a vector with components , , and is a matrix with columns , , we denote by a matrix with the elements , , .
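The two thresholding rules above can be written out as follows; both satisfy conditions (10) and, in the matrix notation just introduced, are applied entrywise with a per-column threshold:

```python
import numpy as np

def hard_threshold(x, tau):
    """Hard thresholding: keep entries with |x_i| > tau, zero the rest."""
    return np.where(np.abs(x) > tau, x, 0.0)

def soft_threshold(x, tau):
    """Soft thresholding: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```

Hard thresholding preserves the surviving entries exactly, while soft thresholding additionally shrinks them, which is the usual trade-off between bias and variance of the resulting sparse estimate.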
Our primary goal is to propose a subspace tracking method for estimating the -dimensional subspace of the process under the weak- ball assumption on the leading eigenvectors of the covariance matrix, and to analyze its convergence.
4.2 Initialization and main steps
Our sparse modification of CPAST relies on the orthogonal iteration scheme with an additional thresholding step (cf. [5, 10]). From now on, by a slight abuse of notation, we will denote by the iterative estimator obtained with the help of the modified CPAST given observations. To get the initial approximation , we use the following modification of a standard SPCA scheme, see [5, 10].
First compute the empirical covariance based on observations:
Define a set of indices corresponding to sufficiently large diagonal elements of
Let be a submatrix of corresponding to the row and column indices in .
As the estimator at step zero, we take the first eigenvectors of , padded with zeros in the remaining coordinates to vectors of length .
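The initialization steps above (diagonal selection, reduced eigendecomposition, zero-padding) can be sketched as follows; the threshold parameter alpha is a hypothetical stand-in for "large enough", and the function name is illustrative:

```python
import numpy as np

def sparse_init(X0, r, alpha):
    """Diagonal-thresholding initialization: form the empirical
    covariance of the initial block X0, keep coordinates whose
    diagonal entry exceeds alpha, take the first r eigenvectors of
    the selected submatrix, and pad them with zeros to full length."""
    n0, n = X0.shape
    C = X0.T @ X0 / n0                         # empirical covariance
    J = np.flatnonzero(np.diag(C) > alpha)     # large diagonal entries
    CJ = C[np.ix_(J, J)]                       # submatrix on J x J
    vals, vecs = np.linalg.eigh(CJ)            # ascending eigenvalues
    lead = vecs[:, np.argsort(vals)[::-1][:r]] # first r eigenvectors
    W0 = np.zeros((n, r))
    W0[J, :] = lead                            # pad with zeros
    return W0
```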
Now we describe a sparse modification of CPAST, which we call SCPAST. We start with obtained by the above procedure. Then for we perform the following steps:
thresholding: define a matrix
where is a thresholding function satisfying (10) and is the corresponding thresholding vector;
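A minimal sketch of one SCPAST iteration, combining a CPAST-style update with the thresholding step; hard thresholding is used for concreteness, and re-orthonormalizing after thresholding is an assumption of this sketch:

```python
import numpy as np

def scpast_step(C, W, taus):
    """One sparse subspace-tracking update: multiplication and QR
    orthogonalization as in CPAST, then columnwise hard thresholding
    with per-column thresholds taus, then re-orthonormalization of
    the surviving (sparse) columns."""
    Q, _ = np.linalg.qr(C @ W)                        # CPAST update
    S = np.where(np.abs(Q) > taus[None, :], Q, 0.0)   # thresholding step
    Q2, _ = np.linalg.qr(S)                           # restore orthonormality
    return Q2
```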
4.3 Convergence of SCPAST
First we define the thresholding parameter as follows. For and
the components , of the vector are given by
The motivation for thresholding the column vectors of comes from the following connection between the sparsity of the leading eigenvectors , and the vector with components ( is the variance of the -th coordinate of the signal part ): the weak- sparsity of the vector implies the weak- sparsity of , .
Suppose that and the eigenvalues are known. In the case of unknown and , one might first estimate the eigenvalues of defined in the previous section and then select the largest set of eigenvalues satisfying the spectral gap condition (7) with some parameter (see  for more details).
Denote by the set of indices of "large" eigenvector components ( stands for "signal"), that is, for a fixed
where and . In fact, the quantity is an estimate of the noise variance in the entries of the -th leading eigenvector . The number of "large" entries of the first leading eigenvectors to be estimated may thus be measured by the cardinality of , which we denote by . One can bound as
where . From Lemma 14 (see Appendix B) we see that where depends on and
Note that in the sparse case, the number of non-zero components is much smaller than . For example, if , then
The value is often referred to as the effective dimension of the vector . Thus is the number of effective coordinates of in the case of disjoint .
Since is an upper bound for the estimation error of the components of the first leading eigenvectors, the right-hand side of the above inequality gives, up to a logarithmic term, the overall number of components of the leading eigenvectors to be estimated. The next theorem gives non-asymptotic bounds for the distance between and
where depends on in (7), , , depends on . After iterations one has with probability at least
with some absolute constant
The second term in (15) is the same as in the non-sparse case, see Theorem 1. This term is always present, as an error of separating the first eigenvectors from the remaining ones, regardless of how sparse they are. The first term in (15) and (9) accounts for the interaction of the noise with the different coordinates of the signal. The average error of estimating one entry of the first leading eigenvectors based on observations can be bounded by , see . The number of components to be estimated in SCPAST for each vector is bounded by (see (13)), which is small compared to in the sparse case. Thus, the first term in (15) can be significantly smaller than the first term in (9), provided the first leading eigenvectors are sparse. Note also that the computational complexity of SCPAST at each step is with probability given by Theorem 2.
5 Numerical results
5.1 Single spike
To illustrate the advantage of using SCPAST in the sparse case, we generate observations from (6) for the case of a single spike, that is, and . Our aim is to estimate the leading eigenvector . We use three functions, depicted in subplots (a) of Fig. 1–3, with different sparsity levels in the wavelet domain.
The observations are generated for the noise variance and the following cases of the maximal eigenvalue . We used the Symmlet 8 basis from the Matlab package SPCALab to transform the initial data into the wavelet domain. We applied CPAST and SCPAST to recover the wavelet coefficients of the vector , then transformed the estimates back to the initial domain and computed the error as a function of the number of observations. The results for hard thresholding (11) with are shown in subplots (b)–(d) of Fig. 1–3. Note that the one-peak function has sparser wavelet coefficients than the three-peak functions, and the error of the recovery with SCPAST is significantly smaller for the one-peak function.
5.2 Real data example
Natural acoustic signals, such as musical ones, exhibit a highly varying temporal structure; therefore there is a need for adaptive unsupervised signal processing methods that reduce the complexity of the signal. In , a method was proposed which reduces the spectral complexity of music signals using adaptive segmentation of the signal in the spectral domain for principal component analysis, targeted at listeners with cochlear hearing loss. In the following we apply CPAST and SCPAST as an alternative method for the complexity reduction of music signals. To illustrate the use of SCPAST and CPAST, we set the memory parameter so as to be able to adapt to changes in the spectral domain of the signal. We focus on the recovery of the first leading eigenvector. As an example we consider a piece from Bach's Siciliano for Oboe and Piano. A wavelet-type CQT transform  is computed for the signal (see a spectrogram of the transform in Fig. 4). The warmer colors correspond to higher amplitudes of the harmonics present in the signal at a particular time frame. It is clear that the signal has some regions of "stationarity" (e.g., approximately in the time frame interval ). We regard the corresponding spectrogram as a matrix of observations of an -dimensional signal modeled by (16) and apply the SCPAST and CPAST methods to recover the leading eigenvector .
Fig. 5 contains the results of the recovery of the leading eigenvector with components. The results show that the SCPAST method yields a sparse representation of the leading eigenvectors and seems promising for the construction of structure-preserving compressed representations of signals.
6 Sketch of the proofs
Denote by the matrix with column vectors , which complete the orthonormal columns of the matrix to an orthonormal basis in . Denote by the matrix with columns . From (6) one gets the representation
where , are matrices with independent entries, is the orthonormal matrix with columns , and is the diagonal matrix with , on the diagonal. Denote the set of indices of the small components of the leading eigenvectors by (where stands for "noise").
From (16) the empirical covariance matrix can be decomposed as
It is well known  that the distance (8) between subspaces and , spanned by matrices with orthonormal columns and , respectively, is related to the -th principal angle between and as , where the principal angles between the subspaces and are recursively defined as
From the variational characterization of singular values and the above definition of the principal angles, the -th principal angle between the subspaces spanned by the columns of and admits the following non-recursive definition
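This non-recursive characterization is convenient computationally: the cosines of the principal angles between two subspaces are the singular values of the product of their orthonormal bases. A sketch:

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles between the subspaces spanned by the
    orthonormal columns of U and V: cos(theta_k) is the k-th largest
    singular value of U^T V, so the angles come out in ascending
    order."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # clip guards rounding
```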
In the next subsections we derive the error bounds for CPAST and SCPAST by tracking the change of the -th principal angle between the eigensubspace spanned by the columns of and its estimators based on observations, where .
6.1 Bound for CPAST
The aim of this section is to show that, with high probability, the subspace spanned by the CPAST estimator (6.1) is close to the subspace spanned by when the number of observations is large enough. We assume that the initial estimator is constructed from the first observations with the help of the SVD of . Let us first state the bound for the error which depends on the error at the previous iteration for fixed . Denote .
For CPAST (4) with probability
The following lemma gives the bound for the error depending on the error at the previous iteration for all .
With probability greater than
Given Lemma 2, it is possible to derive a bound for . First let us state a result which allows us to bound the error of the initial estimate .
Let be the matrix containing the first leading eigenvectors of the matrix . Then with probability
where with .
The following lemma gives the error of CPAST after observing the vectors , based on the recursive bound (21). Note that the proof of Lemma 4 also ensures that the denominator in (21) is bounded away from zero.
Suppose that , where
6.2 Bound for SCPAST
Define , the oracle version of and the corresponding expectation as follows
where and are the sub-matrices of the size with column and row indices from
. The identity matrix has size . Here we assumed, without loss of generality, that the indices in are always smaller than those in .
First we obtain the oracle sequence of solutions by iterating SCPAST with the matrices instead of . We define the initial estimate by applying steps (a)-(d) of Section 6.1 to the matrix , and then bound . Denote the result of thresholding the columns of the matrix with the thresholding parameters given by the vector as
Denote by the submatrix obtained by selecting the rows of with indices in . Denote by , the rows of (recall that its columns are , ). For the estimators of we omit the dependence of on , as the estimator itself depends on ; that is, is the matrix of the rows of with indices from .
The following bound for the oracle error of SCPAST method is analogous to Lemma 2.
For , with probability greater than , the following bound holds true
where and are constants and depends on , , , .
In the sparse case, a result similar to Lemma 3 holds, giving a bound on the error of the initial oracle estimator.
The error of the initial oracle estimate is bounded as follows with probability , where
where and are constants and depends on .
Thus after iterations, , with probability one has
where depends on , , , , and depends on .
The convergence of the oracle scheme does not immediately imply the convergence of the SCPAST estimators. The following two lemmas state that with high probability . Thus the bound in Lemma 7 holds for SCPAST, and Theorem 2 is justified.
For , with probability , the initial oracle estimate coincides with the initial SPCA estimate, that is, .
With probability , the oracle SCPAST and SCPAST solutions coincide: .
7 Conclusion
We have developed a new method, SCPAST, based on the constrained projection approximation subspace tracking method, for subspace tracking under sparsity assumptions on the underlying signal eigensubspace. The thresholding step was introduced in order to ensure the sparsity of the solution. We presented non-asymptotic bounds for the errors of subspace recovery with SCPAST and CPAST, as well as empirical studies of the methods. The results of the experiments show that the SCPAST method yields a sparse representation of the leading eigenvector of music signals and might be used for adaptive compression of musical signals in the spectral domain.
Appendix A. Proofs of Lemmas
Proof of Lemma 1
From the definition (19), . Using the latter, the maximum can be bounded by