# Sparse constrained projection approximation subspace tracking

In this paper we revisit the well-known constrained projection approximation subspace tracking algorithm (CPAST) and derive, for the first time, non-asymptotic error bounds. Furthermore, we introduce a novel sparse modification of CPAST which is able to exploit sparsity in the underlying covariance structure. We present a non-asymptotic analysis of the proposed algorithm and study its empirical performance on simulated and real data.


## 1 Introduction

Subspace tracking methods are widely used in the statistics and signal processing communities. Given observations of a multidimensional signal, one is interested in estimating or tracking the subspace spanned by the eigenvectors corresponding to the largest eigenvalues of the signal covariance matrix. Over the past few decades many variations of the original projection approximation subspace tracking (PAST) method [1] were developed, which found applications in data compression, filtering, speech enhancement, etc. (see [2] and references therein). Despite the popularity of subspace tracking methods, only partial results are known about their convergence. The asymptotic convergence of the PAST algorithm was first established in [3, 4] using a general theory of stability for ordinary differential equations. However, no finite sample error bounds are available in the literature. Furthermore, in the case of a high-dimensional signal the empirical covariance matrix estimator performs poorly if the number of observations is small. A common way to improve the estimation quality in this case is to impose some kind of sparsity assumption on the signal itself or on the eigensubspace of the underlying covariance matrix. In [5] a sparse modification of the orthogonal iteration scheme for a fixed number of observations was proposed. A thorough analysis in [5] shows that under appropriate sparsity assumptions on the leading eigenvectors, the orthogonal iteration scheme combined with thresholding makes it possible to perform dimension reduction in the high-dimensional setting. Our main goal is to propose a novel modification of the constrained projection approximation subspace tracking method (CPAST) [6], called sparse constrained projection approximation subspace tracking (SCPAST), which can be used for efficient subspace tracking in the case of a high-dimensional sparse signal and a small number of available observations. Another contribution of our paper is a non-asymptotic convergence analysis of the CPAST and SCPAST algorithms, showing the advantage of SCPAST in the case of a sparse covariance structure. Last but not least, we analyse the numerical performance of SCPAST on simulated and real data. In particular, the problem of tracking the leading subspace of a music signal is considered.

The structure of the paper is as follows. In Section 2 we introduce our observational model and formulate the main assumptions. Section 3 first reviews the CPAST algorithm and then provides non-asymptotic error bounds for CPAST in a "stationary" case. In Section 4 we introduce our sparse constrained projection approximation subspace tracking method and prove non-asymptotic upper bounds for the estimation error. A numerical study of the proposed algorithm is presented in Section 5. Finally, the proofs are collected in Section 6.

## 2 Main setup

One important problem in signal processing is the adaptive estimation of a dominant subspace given incoming noisy observations. Specifically, one considers the model

 x(t)=s(t)+σ(t)ξ(t),t=1,…,T, (1)

where the observations x(t) ∈ ℝⁿ contain the signal s(t) corrupted by a noise vector ξ(t) with independent standard Gaussian components. The signal is usually modelled as

 s(t)=A(t)η(t),

where A(t) is a deterministic n×d matrix of rank d with d < n, and η(t) is a random vector in ℝᵈ, independent of ξ(t), with zero mean and identity covariance. Under these assumptions, the process x(t) has a covariance matrix Σ(t) which may be decomposed in the following way

 Σ(t)=E[x(t)x⊤(t)]=A(t)A⊤(t)+σ2(t)In, (2)

where In stands for the unit matrix in ℝⁿˣⁿ. Note that the matrix A(t)A⊤(t) has rank d and admits the singular value decomposition (SVD)

 A(t)A⊤(t) =d∑i=1λi(t)vi(t)v⊤i(t),

where v1(t),…,vd(t) are the eigenvectors of A(t)A⊤(t) corresponding to the eigenvalues λ1(t)≥…≥λd(t)>0. It follows from (2) that the first d eigenvalues of Σ(t) are λi(t)+σ2(t), i=1,…,d, whereas the remaining eigenvalues are equal to σ2(t). Since λd(t)>0, the subspace corresponding to the first d eigenvectors of Σ(t) is identifiable. The subspace tracking methods aim to estimate this subspace based on the observations x(1),…,x(t). The overall number of observations T is assumed to be fixed and known.

Relying on a heuristic assumption of slowly (in time) varying Σ(t), the subspace tracking methods use the following estimator of the covariance matrix (up to scaling)

 ˆΣγ(t)=t∑i=0γt−ix(i)x⊤(i), (3)

where γ ∈ (0,1] is the so-called forgetting factor. The estimator (3) can adapt to changes in Σ(t) by discounting past observations. In the stationary regime, that is, if Σ(t) ≡ Σ is a constant matrix, one would use γ = 1. It is well known that in the case of independent Gaussian noise the (suitably normalized) estimator is consistent.
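In implementation terms, the estimator (3) admits a rank-one recursive update, which is what makes subspace tracking cheap. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def update_covariance(sigma_hat, x, gamma):
    """Recursive form of the discounted estimator (3):
    Sigma_hat(t) = gamma * Sigma_hat(t-1) + x(t) x(t)^T."""
    return gamma * sigma_hat + np.outer(x, x)
```

With γ = 1 this accumulates the unnormalized empirical covariance; with γ < 1 past observations are discounted geometrically.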

## 3 CPAST

For the general model (1) and the non-stationary case, the constrained projection approximation subspace tracking (CPAST) method iteratively computes a matrix ˆVγ(t) ∈ ℝⁿˣᵈ containing the first d leading eigenvectors of the matrix ˆΣγ(t) (see (3)) based on sequentially arriving observations x(1), x(2), …. The procedure starts with some initial approximation ˆVγ(0) and consists of the following two steps:

• multiplication: compute the matrix

 ˆΣγ,V(t)=ˆΣγ(t)ˆVγ(t−1);
• orthogonalization: compute an estimator of the matrix containing the d leading eigenvectors via

 ˆVγ(t)=ˆΣγ,V(t)[ˆΣ⊤γ,V(t)ˆΣγ,V(t)]−1/2.

In the "stationary" case (γ = 1) the method may be regarded as an "online" version of the orthogonal iteration scheme (see [7]) for computing the eigen-subspace of a non-negatively definite matrix. With the use of the Sherman-Morrison-Woodbury formula for the inversion, at each time t the updated matrix ˆVγ(t) can be computed efficiently given ˆVγ(t−1), ˆΣγ(t−1), and x(t).
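The two steps can be sketched directly; this plain implementation forms the d×d inverse square root explicitly rather than using the Sherman-Morrison-Woodbury update, so it shows the recursion rather than the fast arithmetic (helper names are ours):

```python
import numpy as np

def inv_sqrt_psd(M):
    """Inverse square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(w ** -0.5) @ Q.T

def cpast_step(sigma_hat, V_prev):
    """One CPAST iteration: multiplication, then orthogonalization."""
    S = sigma_hat @ V_prev                 # multiplication step
    return S @ inv_sqrt_psd(S.T @ S)       # orthogonalization step
```

Iterating `cpast_step` with a fixed covariance matrix is exactly the orthogonal iteration scheme and converges to the d leading eigenvectors.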

### 3.1 Convergence of CPAST

Throughout this section we consider the stationary case where A(t) ≡ A, σ(t) ≡ σ, and hence λi(t) ≡ λi, vi(t) ≡ vi and Σ(t) ≡ Σ. In this situation one would like to keep all the available information to estimate Σ, that is, to use the estimator (3) with γ = 1. For notational simplicity, from now on we skip the dependence on γ and use the notation ˆΣ(t) for the empirical covariance matrix and ˆV(t) for the CPAST estimator. Thus in the stationary case the CPAST estimator takes the form

 ˆV(t)=[ˆΣ(t)ˆV(t−1)][ˆV⊤(t−1)ˆΣ2(t)ˆV(t−1)]−1/2. (4)

We assume that the random vectors η(t) and ξ(t) have independent components for t = 1,…,T. Under these assumptions the covariance matrix (2) becomes

 Σ=d∑i=1λiviv⊤i+σ2In=VΛdV⊤+σ2In, (5)

where V is the n×d matrix with columns v1,…,vd and Λd is the d×d diagonal matrix with λ1,…,λd on the diagonal. Note that the observational model (1) in the stationary case can alternatively be written as the so-called spiked model

 x(t)=d∑i=1√λiui(t)vi+σξ(t), (6)

where ui(t), i = 1,…,d, t = 1,…,T, are i.i.d. standard Gaussian random variables independent of ξ(t).

For our non-asymptotic error analysis of CPAST, we assume that d, σ and λ1,…,λd are known. With σ known we can always normalize the data, and therefore without loss of generality we assume that σ = 1.

The typical condition in the analysis of the quality of eigenvector estimation is the so-called spectral gap condition, which says that adjacent eigenvalues explain distinguishably different portions of the variance in the data, namely, there exists τ > 0 such that for all j = 1,…,d

 τ(λj−λj+1)≥λ1,

where λd+1 = 0 by definition. Since our goal is the estimation of the d-dimensional subspace of the first eigenvectors, and we are not interested in the estimation of each particular eigenvector, we need only a condition on the separation of this d-dimensional subspace, namely that the gap between λd and λd+1 is sufficiently large:

 τλd≥λ1. (7)

Define a distance between two subspaces spanned by the orthonormal columns of W and Q, respectively, via

 l(W,Q)=∥WW⊤−QQ⊤∥2, (8)

where ∥·∥2 stands for the spectral norm of a matrix, and W and Q are matrices in ℝⁿˣᵈ with orthonormal columns.
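The distance (8) is simple to compute numerically; a small helper, assuming W and Q have orthonormal columns (the function name is ours):

```python
import numpy as np

def subspace_dist(W, Q):
    """Distance (8): spectral norm of the difference of the two
    orthogonal projectors W W^T and Q Q^T."""
    return np.linalg.norm(W @ W.T - Q @ Q.T, ord=2)
```

The distance is 0 for identical subspaces and 1 when some direction of one subspace is orthogonal to the other.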

The next result shows that with high probability the subspace spanned by the CPAST estimator ˆV(t) is close, in terms of the distance (8), to the subspace spanned by V when the number of observations is large enough. We assume that the initial estimator ˆV(t0) is constructed from the first t0 observations by means of the singular value decomposition of ˆΣ(t0).

###### Theorem 1.

Suppose that the spectral gap condition (7) holds and

 √t0≥4√2·Rmax·(λ1+1)/λd,

where Rmax is specified in Section 6. Then after t−t0 iterations we get, with probability at least 1−C0(n∨t)−3,

 l(V,ˆV(t))≤C1·(λd+1)/λ2d·(n−d)/t+C2·(λ1+1)/λ2d·log(n∨t)/t, (9)

where C1, C2 are absolute constants and C0 depends on τ.

###### Remark 1.

The second term on the right-hand side of (9) corresponds to the error of separating the first d eigenvectors from the rest. The first term is an average error of estimating all components of the leading eigenvectors. It originates from the interaction of the noise terms with the different coordinates, see [8].

## 4 Sparse CPAST

### 4.1 Sparsity assumptions on leading eigenvectors

We assume that in the stationary case (5) the first d leading eigenvectors v1,…,vd of Σ have most of their entries close to zero. Namely, we suppose that each vi fulfils the so-called weak-ℓr ball condition [9, 10], that is, for some si > 0

 |vi|(k)≤sik−1/r,k=1,…,n,

where |vi|(k) is the k-th largest (in absolute value) coordinate of vi. The weak-ℓr ball condition is known to be more general than the ℓr ball condition, as it combines different definitions of sparsity used in statistics, see [11].

Define a thresholding function g(x, β) with a thresholding parameter β > 0 via

 x−β≤g(x,β)≤x+β,g(x,β)1|x|≤β=0. (10)

For example, the so-called hard-thresholding function given by

 gH(x,β)=x1(|x|>β) (11)

and the so-called soft-thresholding function defined as

 gS(x,β)=(|x|−β)+⋅sign(x)

fulfill the conditions (10). When β is a vector with components βi, i = 1,…,d, and Υ is a matrix with columns υi, i = 1,…,d, we denote by g(Υ, β) the matrix with the elements g(υji, βi), j = 1,…,n, i = 1,…,d.
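Both thresholding rules are one-liners in numpy; the sketch below (function names are ours) satisfies the conditions (10):

```python
import numpy as np

def hard_threshold(x, beta):
    """Hard thresholding (11): keep entries with |x| > beta, zero the rest."""
    return np.where(np.abs(x) > beta, x, 0.0)

def soft_threshold(x, beta):
    """Soft thresholding: shrink magnitudes by beta, zeroing small entries."""
    return np.sign(x) * np.maximum(np.abs(x) - beta, 0.0)
```

Both satisfy (10): the output differs from x by at most beta and vanishes whenever |x| ≤ beta.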

Our primary goal is to propose a subspace tracking method for estimating a d-dimensional subspace of the process (1) under the weak-ℓr ball assumption on the leading eigenvectors of the covariance matrix, and to analyze its convergence.

### 4.2 Initialization and main steps

Our sparse modification of CPAST relies on the orthogonal iteration scheme with an additional thresholding step (cf. [5, 10]). From now on, by a slight abuse of notation, we will denote by ˆV(t) the iterative estimator obtained with the help of the modified CPAST given t observations. To get the initial approximation ˆV(t0) we use the following modification of a standard SPCA scheme, see [5, 10].

1. First compute the empirical covariance matrix ˆΣ(t0) based on the first t0 observations.

2. Define a set of indices corresponding to large enough diagonal elements of ˆΣ(t0):

 G={k:ˆΣkk(t0)>1+γ0√(log(n∨t0)/t0)}

for some γ0 > 0.

3. Let ˆΣG(t0) be the submatrix of ˆΣ(t0) corresponding to the row and column indices in G.

4. As an estimator at step zero, we take the first d eigenvectors of ˆΣG(t0), completed with zeros in the coordinates outside G to vectors of length n.
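The four initialization steps can be sketched as follows; the square-root form of the diagonal threshold is our reading of the display in step 2, and γ0 is treated as a tuning constant:

```python
import numpy as np

def scpast_init(X0, d, gamma0=1.0):
    """Diagonal-thresholding initialization (steps 1-4 above).
    X0 is the t0 x n matrix of the first t0 observations (sigma = 1)."""
    t0, n = X0.shape
    S = X0.T @ X0 / t0                             # 1: empirical covariance
    thr = 1.0 + gamma0 * np.sqrt(np.log(max(n, t0)) / t0)
    G = np.flatnonzero(np.diag(S) > thr)           # 2: large diagonal entries
    _, Q = np.linalg.eigh(S[np.ix_(G, G)])         # 3: reduced eigenproblem
    V0 = np.zeros((n, d))
    V0[G, :] = Q[:, -d:]                           # 4: embed, zeros outside G
    return V0
```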

Now we describe the sparse modification of CPAST, which we call SCPAST. We start with ˆV(t0) obtained by the above procedure. Then for t = t0+1,…,T we perform the following steps:

1. multiplication: compute the matrix

 ˆΥ(t)=ˆΣ(t)ˆV(t−1);

2. thresholding: define a matrix

 ˆΥβ(t)=g(ˆΥ(t),β(t)),

where g is a thresholding function satisfying (10) and β(t) is the corresponding thresholding vector;

3. orthogonalization:

 ˆV(t)=ˆΥβ(t)[ˆΥβ⊤(t)ˆΥβ(t)]−1/2.
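A single SCPAST iteration combines the three steps above; the sketch below uses soft thresholding as g and a per-column threshold vector β (the function name is ours):

```python
import numpy as np

def scpast_step(sigma_hat, V_prev, beta):
    """One SCPAST iteration: multiplication, columnwise thresholding
    with thresholds beta[i], then orthogonalization."""
    U = sigma_hat @ V_prev                               # multiplication
    U = np.sign(U) * np.maximum(np.abs(U) - beta, 0.0)   # soft thresholding
    w, Q = np.linalg.eigh(U.T @ U)
    return U @ (Q @ np.diag(w ** -0.5) @ Q.T)            # orthogonalization
```

If the threshold zeroes out an entire column the Gram matrix becomes singular; in practice β(t) shrinks as t grows, so this does not occur for reasonable data.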

### 4.3 Convergence of SCPAST

First we define the thresholding parameter β(t) as follows. For t0 ≤ t ≤ T and

 a≥3√2·log(n∨T)/log(n∨t0),

the components βi(t), i = 1,…,d, of the vector β(t) are given by

 βi(t)=a·√((λi+1)·log(n∨t)/t). (12)

The motivation for thresholding the column vectors of ˆΥ(t) comes from the connection between the sparsity of the leading eigenvectors vi, i = 1,…,d, and the sparsity of the vector of coordinate-wise variances of the signal part [12]: the weak-ℓr sparsity of the former implies the weak-ℓr sparsity of the latter.

Suppose that d and the eigenvalues λ1,…,λd are known. In the case of unknown d and λi one might first estimate the eigenvalues of ˆΣG(t0) defined in the previous section and then select the largest set of eigenvalues satisfying the spectral gap condition (7) with some parameter τ (see [5] for more details).

Denote by S(t) the set of indices of "large" eigenvector components (S stands for "signal"), that is, for a fixed b > 0

 S(t)={j:|vij|≥b·hi·√(log(n∨t)/t) for some i=1,…,d},

where b > 0 is a fixed constant and hi is a scaling factor determined by the eigenvalues. In fact, the quantity hi√(log(n∨t)/t) is an estimate of the noise level in the entries of the i-th leading eigenvector [8]. The number of "large" entries of the first d leading eigenvectors to estimate thus might be estimated by the cardinality of S(t), which we denote by card(S(t)). One can bound card(S(t)) as

 card(S(t))≤d∑i=1card(Si(t)),

where Si(t)={j:|vij|≥b·hi·√(log(n∨t)/t)}. From Lemma 14 (see Appendix B) we see that card(S(t)) is bounded, up to a constant depending on b and r, by

 M(t)=n∧⌈d∑j=1 srj·hrj·(log(n∨t)/t)−r/2⌉. (13)

Note that in the sparse case, the number M(t) of effective non-zero components is much smaller than n. For example, if sj ≤ s and hj ≤ hd for all j = 1,…,d, then

 M(t)≤n∧d·sr·hrd·(log(n∨t)/t)−r/2.

The value srj·hrj·(log(n∨t)/t)−r/2 is often referred to as the effective dimension of the vector vj. Thus M(t) is the total number of effective coordinates of v1,…,vd in the case of disjoint supports.

Since b·hi·√(log(n∨t)/t) is an upper bound for the estimation error of the components of the first d leading eigenvectors, the right-hand side of the above inequality gives, up to a logarithmic term, the overall number of components of the leading eigenvectors to estimate. The next theorem gives non-asymptotic bounds for the distance between V and ˆV(t).

###### Theorem 2.

Let

 √t0≥(C1hdM1/2(T)+C2)·(λ1+1)/λd·√log(n∨T), (14)

where C1 depends on τ in (7) and C2 depends on r. After t−t0 iterations one has, with probability at least 1−C0(n∨t)−3,

 l(V,ˆV(t))≤C1·h2d·M(t)·log(n∨t)/t+C2·(λ1+1)/λ2d·log(n∨t)/t (15)

with some absolute constants C1 and C2.

###### Remark 2.

The second term in (15) is the same as in the non-sparse case, see Theorem 1. This term is always present as the error of separating the first d eigenvectors from the remaining ones, regardless of how sparse they are. The first term in (15) and in (9) is responsible for the interaction of the noise with the different coordinates of the signal. The average error of estimating one entry of the first d leading eigenvectors based on t observations can be bounded as in [13]. The number of components to be estimated in SCPAST for each vector is bounded via M(t) (see (13)), which is small compared to n in the sparse case. Thus, the first term in (15) can be significantly smaller than the first term in (9), provided the first d leading eigenvectors are sparse. Note also that, with the probability given in Theorem 2, the computational complexity of SCPAST at each step depends on M(t) rather than on n.

## 5 Numerical results

### 5.1 Single spike

To illustrate the advantage of using SCPAST in the sparse case, we generate observations from (6) for the case of a single spike, that is, d = 1. Our aim is to estimate the leading eigenvector v1. We use three functions, depicted in subplots (a) of Fig. 1-3, with different sparsity levels in the wavelet domain.

The observations are generated for the noise variance σ2 = 1 and several values of the maximal eigenvalue λ1. We used the Symmlet 8 basis from the Matlab package SPCALab to transform the initial data into the wavelet domain. We applied CPAST and SCPAST for the recovery of the wavelet coefficients of the vector v1, transformed the estimates back to the initial domain and computed the error depending on the number of observations. The results for the hard thresholding (11) with the threshold (12) are shown in Fig. 1-3 in subplots (b)-(d). Note that the one-peak function has sparser wavelet coefficients than the three-peak function, and the error of the recovery with SCPAST is significantly smaller in the case of the one-peak function.
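The qualitative effect is easy to reproduce. The sketch below generates data from the single-spike model (6) with a sparse unit vector v1, estimates v1 by the leading eigenvector of the empirical covariance, and compares the plain estimate with a hard-thresholded one; the threshold level is a simple tuning choice of ours, not the exact β(t) of (12):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, lam = 128, 300, 10.0
v = np.zeros(n)
v[:8] = np.sqrt(1.0 / 8)                       # sparse leading eigenvector
X = np.sqrt(lam) * rng.normal(size=(T, 1)) * v + rng.normal(size=(T, n))

S = X.T @ X / T
v_hat = np.linalg.eigh(S)[1][:, -1]            # plain leading eigenvector
v_hat *= np.sign(v_hat @ v)                    # fix the sign ambiguity

beta = np.sqrt(np.log(max(n, T)) / T)          # simple threshold level
v_thr = np.where(np.abs(v_hat) > beta, v_hat, 0.0)
v_thr /= np.linalg.norm(v_thr)

err_plain = np.linalg.norm(v_hat - v)
err_thr = np.linalg.norm(v_thr - v)
```

With these parameters thresholding removes the many small noisy coordinates of v_hat and reduces the estimation error, mirroring the behaviour reported for the sparser one-peak signal.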

### 5.2 Real data example

Natural acoustic signals such as music exhibit a highly varying temporal structure; therefore there is a need for adaptive unsupervised signal processing methods which reduce the complexity of the signal. In [14] a method was proposed which reduces the spectral complexity of music signals using adaptive segmentation of the signal in the spectral domain for principal component analysis, for listeners with cochlear hearing loss. In the following we apply CPAST and SCPAST as an alternative method for the complexity reduction of music signals. To illustrate the use of SCPAST and CPAST we set the forgetting factor γ < 1 to be able to adapt to the changes in the spectral domain of the signal. We focus on the recovery of the first leading eigenvector. As an example we consider a piece from Bach's Siciliano for Oboe and Piano. A wavelet-type CQT transform [15] is computed for the signal (see a spectrogram of the transform in Fig. 4). The warmer colors correspond to the higher values of the amplitudes of the harmonics present in the signal at a particular time frame. It is clear that the signal has some regions of "stationarity" (e.g. in certain time-frame intervals). We regard the corresponding spectrogram as a matrix of observations of an n-dimensional signal modeled by (16) and apply SCPAST and CPAST to recover the leading eigenvector v1.

Fig. 5 contains the results of the recovery of the leading eigenvector. The results show that the SCPAST method yields a sparse representation of the leading eigenvectors and seems promising for the construction of structure-preserving compressed representations of signals.

## 6 Sketch of the proofs

Denote by ¯V the n×(n−d) matrix with column vectors vd+1,…,vn, which complete the orthonormal columns of the matrix V to an orthonormal basis in ℝⁿ. Denote by X(t) the matrix with the columns x(1),…,x(t). From (6) one gets the representation

 X(t)=VΛ1/2dU⊤(t)+σΞ(t),t=1,…,T, (16)

where U(t) and Ξ(t) are matrices with independent standard Gaussian entries, V is the orthonormal matrix with columns v1,…,vd, and Λd is the diagonal matrix with λ1,…,λd on the diagonal. Denote the set of indices of the small components of the leading eigenvectors by N(t) = {1,…,n}∖S(t) (where N stands for "noise").

From (16) the empirical covariance matrix can be decomposed as

 ˆΣ(t)=(1/t)VΛ1/2dU⊤(t)U(t)Λ1/2dV⊤+(1/t)Ξ(t)Ξ⊤(t)+(1/t)VΛ1/2dU⊤(t)Ξ⊤(t)+(1/t)Ξ(t)U(t)Λ1/2dV⊤. (17)

It is well known [7] that the distance (8) between the subspaces spanned by the matrices W and Q with orthonormal columns is related to the d-th principal angle ϕd(W,Q) between the subspaces via l(W,Q) = sin ϕd(W,Q), where the principal angles between the subspaces are recursively defined as [16]

 ϕi(W,Q)=arccos(⟨xi,yi⟩/(∥xi∥2∥yi∥2)), where {xi,yi}=argmax{⟨x,y⟩/(∥x∥2∥y∥2): x∈W, y∈Q, x⊥xj, y⊥yj, j<i}.

From the variational characterization of the singular values and the above definition of the principal angles, the d-th principal angle between the subspaces spanned by the columns of W and Q has the following non-recursive characterization:

 cosϕd(W,Q)=min∥x∥2=1,x∈ℝd ∥W⊤Qx∥/∥Qx∥, (18)
 tanϕd(W,Q)=max∥x∥2=1,x∈ℝd ∥¯W⊤Qx∥/∥W⊤Qx∥. (19)
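For matrices with orthonormal columns, ∥Qx∥ = ∥x∥, so (18) reduces to the smallest singular value of W⊤Q; moreover the distance (8) equals sin ϕd(W,Q). A quick numerical check of this relation (the function name is ours):

```python
import numpy as np

def cos_phi_d(W, Q):
    """cos of the d-th (largest) principal angle, via (18): for orthonormal
    W and Q this is the smallest singular value of W^T Q."""
    return np.linalg.svd(W.T @ Q, compute_uv=False).min()
```

The subspace distance of Section 3 can therefore equivalently be computed as √(1 − cos²ϕd).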

In the next sections we derive the error bounds for CPAST and SCPAST by tracking the change of the d-th principal angle between the eigensubspace spanned by the columns of V and its estimators based on t observations, where t0 ≤ t ≤ T.

### 6.1 Bound for CPAST

The aim of this section is to show that with high probability the subspace spanned by the CPAST estimator (4) is close to the subspace spanned by V when the number of observations is large enough. We assume that the initial estimator ˆV(t0) is constructed from the first t0 observations with the help of the SVD of ˆΣ(t0). Let us first state the bound for the error which depends on the error at the previous iteration for fixed t. Denote r(t)=tanϕd(ˆV(t),V).

###### Lemma 1.

For CPAST (4), with probability at least 1−C0(n∨t)−3 the following bound holds:

 r(t)≤(λd+1+1)tanϕd(ˆV(t−1),V)/(λd+1−(λ1+1)E(t)secϕd(ˆV(t−1),V))+(λ1+1)1/2E(t)secϕd(ˆV(t−1),V)/(λd+1−(λ1+1)E(t)secϕd(ˆV(t−1),V)), (20)

where

 E(t)=5√((n−d)/t)+5√6·√(log(n∨t)/t).

The following lemma gives a bound for the error depending on the error at the previous iteration, for all t0 < u ≤ t.

###### Lemma 2.

With probability greater than

 1−C0(n∨t)−3

for all t0 < u ≤ t

 r(u)≤(α0r(u−1)+α1R(t)/√u)/(√(1−r2(u−1))−α2R(t)/√u), (21)

where R(t) is given by

 R(t)=5√(n−d)+5√6·√log(n∨t),
 α0=1/(λd+1), α1=√(λ1+1)/(λd+1), α2=(λ1+1)/(λd+1). (22)

Given Lemma 2 it is possible to derive a bound for r(t). First let us state the result which bounds the error of the initial estimate ˆV(t0).

###### Lemma 3.

Let ˆV(t0) be the matrix containing the first d leading eigenvectors of the matrix ˆΣ(t0). Then with probability at least 1−C0(n∨t0)−3

 r2(t0)=l2(ˆV(t0),V)≤α2/t0,

where α is a constant depending on the eigenvalues.

The following lemma gives the error of CPAST after observing t = t0+K vectors x(1),…,x(t), based on the recursive bound (21). Note that the proof of Lemma 4 also ensures that the denominator in (21) is bounded away from zero.

###### Lemma 4.

Suppose that

 √t0≥8Rmax·α2/(1−α0)3/2.

Then for any K ≥ 1

 r(t0+K)≤(2α1α0/(1−α0))·R(t0+K)/√(t0+K). (23)

The statement of Theorem 1 follows from Lemma 4 applied to the inequality (21) with the constants (22), which holds with the probability given by Lemma 2, with the initial conditions given by Lemma 3.

### 6.2 Bound for SCPAST

Define the oracle version ˆΣ∘(t) of ˆΣ(t) and the corresponding expectation Σ∘(t) as follows:

 ˆΣ∘(t)=[ˆΣS(t)00IN(t)],Σ∘(t)=[ΣS(t)00IN(t)], (24)

where ˆΣS(t) and ΣS(t) are the submatrices of size card(S(t))×card(S(t)) with column and row indices from S(t). The identity matrix IN(t) has size card(N(t))×card(N(t)). Here we assume without loss of generality that the indices in S(t) are always smaller than those in N(t).

First we obtain the oracle sequence of solutions ˆV∘(t) by iterating SCPAST with the matrices ˆΣ∘(t) instead of ˆΣ(t). We define the initial estimate ˆV∘(t0) via the initialization steps 1-4 of Section 4.2 applied to the matrix ˆΣ∘(t0), and then bound l(V,ˆV∘(t)). Denote the result of thresholding the columns of the matrix ˆΥ∘(t) with the thresholding parameters given by the vector β(t) as

 ˆΥ∘,β(t)=g(ˆΥ∘(t),β(t)).

Denote by VS the submatrix obtained by selecting the rows of V with indices in S(t). Denote by vj, j = 1,…,n, the rows of V (recall that its columns are vi, i = 1,…,d). For the estimators of V we omit the dependence of S(t) on t, as the estimator itself depends on t; that is, ˆVS(t) is the matrix of the rows of ˆV(t) with indices from S(t).

The following bound for the oracle error of SCPAST method is analogous to Lemma 2.

###### Lemma 5.

For t0 < u ≤ t, with probability greater than 1−C0(n∨t)−3 the following bound holds true:

 r(u)≤(α0r(u−1)+α1R∘(t)/√u)/(√(1−r2(u−1))−α2R∘(t)/√u), (25)
where
 α0=(λd+1)−1, α1=α2=(λ1+1)/(λd+1). (26)

Here R∘(t) is defined analogously to R(t), with constants depending on b, r, τ and d.

In the sparse case a result similar to Lemma 3 holds true, giving a bound on the error of the initial oracle estimator.

###### Lemma 6.

The error of the initial oracle estimate is bounded, with probability at least 1−C0(n∨t0)−3, as r(t0) ≤ α/√t0, where

 α=(C1·(λ1/λd)·hdM1/2(t0)+C2·(λ1+1)/λd)·√log(n∨T),

where C1 and C2 are constants and C0 depends on τ.

###### Lemma 7.

Thus after t−t0 iterations, t0 < t ≤ T, with the probability given above one has

 l(V,ˆV∘(t))≤C1h2dM(t)log(n∨t)t+C2λ1+1λ2dlog(n∨t)t,

where C1 depends on b, r, τ and d, and C2 is an absolute constant.

The convergence of the oracle scheme does not immediately imply the convergence of the SCPAST estimators. The following two lemmas state that with high probability ˆV∘(t) = ˆV(t). Thus the bound in Lemma 7 holds for SCPAST and Theorem 2 is justified.

###### Lemma 8.

With probability at least 1−C0(n∨t0)−3 the initial oracle estimate coincides with the initial SPCA estimate, that is, ˆV∘(t0) = ˆV(t0).

###### Lemma 9.

With high probability, for all t0 < t ≤ T the oracle SCPAST and SCPAST solutions coincide: ˆV∘(t) = ˆV(t).

## Conclusions

We developed a new method, SCPAST, based on the constrained projection approximation subspace tracking method, for subspace tracking under sparsity assumptions on the underlying signal eigensubspace. The thresholding step was introduced in order to ensure the sparsity of the solution. We presented non-asymptotic bounds for the errors of subspace recovery with SCPAST and CPAST, as well as empirical studies of both methods. The experimental results show that the SCPAST method yields a sparse representation of the leading eigenvector of music signals and might be used for adaptive compression of musical signals in the spectral domain.

## Appendix A. Proofs of Lemmas

### Proof of Lemma 1

From the definition (19), r(t)=tanϕd(ˆV(t),V). Using (4), the latter maximum can be bounded by

 max∥x∥2=1 (∥¯V⊤ˆV(t−1)x∥+∥¯V⊤(ˆΣ(t)−Σ)ˆV(t−1)x∥)/((λd+1)∥V⊤ˆV(t−1)x∥−∥V⊤(ˆΣ(t)−Σ)ˆV(t−1)x∥).

Using (