# Sparse Principal Component Analysis via Rotation and Truncation

Sparse principal component analysis (sparse PCA) seeks a sparse basis that is more interpretable than the dense basis of PCA, while still covering the data subspace as well as possible. In contrast to most existing work, which deals with the problem by adding sparsity penalties to various PCA objectives, in this paper we propose a new method, SPCArt, whose motivation is to find a rotation matrix and a sparse basis such that the sparse basis approximates the basis of PCA after the rotation. The algorithm of SPCArt consists of three alternating steps: rotate the PCA basis, truncate small entries, and update the rotation matrix. Its performance bounds are also given. SPCArt is efficient, with each iteration scaling linearly with the data dimension. Its parameters are easy to choose, owing to their explicit physical interpretations. Besides, we give a unified view of several existing sparse PCA methods and discuss their connection with SPCArt. Some ideas in SPCArt are extended to GPower, a popular sparse PCA algorithm, to overcome its drawbacks. Experimental results demonstrate that SPCArt achieves state-of-the-art performance, as well as a good tradeoff among various criteria, including sparsity, explained variance, orthogonality, balance of sparsity among loadings, and computational speed.


## 1 Introduction

In many research areas, such as signal processing, machine learning, computer vision, document processing, computer networks, and genetics, the data encountered are usually of very high dimension. However, most such data have a much lower intrinsic dimension, so how to handle these data is a long-standing problem.

### 1.1 PCA

Principal component analysis (PCA) [1] is a classical tool for this purpose: it finds a set of orthonormal loading vectors such that projecting the data onto them (yielding the principal components) preserves as much variance as possible.

Commonly, the dimensions of the original data have physical interpretations. For example, in financial or biological applications, each dimension may correspond to a specific asset or gene [2]. However, the loadings obtained by PCA are usually dense, so each principal component, obtained as an inner product between a loading and a data point, is a mixture of all dimensions, which makes it difficult to interpret. If most of the entries in a loading are zero (sparse), the corresponding principal component becomes a linear combination of only a few dimensions. This facilitates understanding of the physical meaning of the loadings as well as of the principal components [1]. Further, the physical interpretation would be clearer still if different loadings had different non-zero entries, i.e., corresponded to different dimensions.

### 1.2 Sparse PCA

Sparse PCA aims at finding a sparse basis that makes the result more interpretable [3]. At the same time, the basis is required to represent the data distribution faithfully. Thus, there is a tradeoff between statistical fidelity and interpretability.

During the past decade, a variety of methods for sparse PCA have been proposed. Most of them consider the tradeoff between sparsity and explained variance. However, three points have not yet received enough attention: the orthogonality between loadings, the balance of sparsity among loadings, and the pitfall of deflation algorithms.

• Balance of sparsity. No loading should be highly dense, particularly the leading ones that account for most of the variance; otherwise the result is of little use. We emphasize this point because quite a few existing algorithms yield loadings whose leading members are highly dense (close to those of PCA) while the minor ones are highly sparse; sparsity is then achieved by the minor loadings while the variance is explained by the dense ones. This is unreasonable.

Finally, we mention that loadings obtained by deflation are nearly orthogonal, while block methods are usually not equipped with a mechanism to ensure orthogonality.

### 1.3 Our Method: SPCArt

In this paper, we propose a new approach called SPCArt (Sparse PCA via rotation and truncation). In contrast to most traditional work, which is based on adding sparsity penalties to the objective of PCA, the motivation of SPCArt is distinctive: it aims to find a rotation matrix and a sparse basis such that the sparse basis approximates the loadings of PCA after the rotation. The resulting algorithm consists of three alternating steps: rotate the PCA loadings, truncate their small entries, and update the rotation matrix.

SPCArt turns out to resolve or alleviate the three issues above. It has the following merits. (1) It is able to explain as much variance as the PCA loadings, since the sparse basis spans almost the same subspace as the PCA loadings. (2) The new basis is close to orthogonal, since it approximates the rotated PCA loadings. (3) The truncation tends to produce balanced sparsity, since the vectors of the rotated PCA loadings are of equal length. (4) Unlike the deflation group of methods, it is not greedy; it belongs to the block group.

The contributions of this paper are four-fold: (1) we propose an efficient algorithm, SPCArt, achieving good performance over a series of criteria, some of which have been overlooked by previous work; (2) we devise various truncation operations for different situations and provide performance analysis; (3) we give a unified view of a series of previous sparse PCA approaches, together with ours; (4) under the unified view, we find the relation between GPower, rSVD, and our method, and extend GPower [6] and rSVD [7] to a new implementation, called rSVD-GP, to overcome their drawbacks: the parameter-tuning problem and the imbalance of sparsity among loadings.

The rest of the paper is organized as follows: Section 2 introduces representative work on sparse PCA. Section 3 presents our method SPCArt and four types of truncation operations, and analyzes their performance. Section 4 gives a unified view for a series of previous work. Section 5 shows the relation between GPower, rSVD, and our method, and extends GPower and rSVD to a new implementation, called rSVD-GP. Experimental results are provided in Section 6. Finally, we conclude this paper in Section 7.

## 2 Related Work

Various sparse PCA methods have been proposed during the past decade. We give a brief review below.

2. Covariance matrix maximization. More recently, systematic approaches based on solving explicit objectives were proposed, starting from SCoTLASS [3], which optimizes the classical objective of PCA, i.e., maximizing the quadratic form of the covariance matrix, while additionally imposing a sparsity constraint on each loading.

3. Matrix approximation. SPCA [9] formulates the problem as a regression-type optimization, so as to facilitate the use of LASSO [12] or elastic-net [13] techniques. rSVD [7] and SPC [14] obtain sparse loadings by solving a sequence of rank-1 matrix approximations, with a sparsity penalty or constraint imposed.

4. Semidefinite convex relaxation. Most of the methods proposed so far are local ones, which suffer from getting trapped in local minima. DSPCA [15] transforms the problem into a semidefinite convex relaxation problem, so global optimality of the solution is guaranteed. This distinguishes it from most of the other, local methods. Unfortunately, its computational complexity is as high as $O(p^4\sqrt{\log p})$ ($p$ is the number of variables), which is expensive for most applications. Later, a variable elimination method [16] of complexity $O(p^3)$ was developed to make applications to large-scale problems feasible.

5. Greedy methods. In [17], greedy search and branch-and-bound methods are used to solve small instances of the problem exactly. Each step of the algorithm has complexity $O(p^3)$, leading to a total complexity of $O(p^4)$ for a full set of solutions (solutions of cardinality ranging from 1 to $p$). Later, this bound was improved in the classification setting [18]. Alternatively, a greedy algorithm, PathSPCA [2], was presented to further approximate the solution process of [17], resulting in a complexity of $O(p^3)$ for a full set of solutions. For a review of DSPCA, PathSPCA, and their applications, see [19].

6. Power methods. The GPower method [6] formulates the problem as the maximization of a convex objective function, and the solution is obtained by generalizing the power method [20] that is used to compute the PCA loadings. Recently, a new power method, TPower [11], and a somewhat different but related power method, ITSPCA [21], which aims at recovering a sparse principal subspace, were proposed.

7. Augmented Lagrangian optimization. ALSPCA [10] solves the problem via augmented Lagrangian optimization. A distinctive feature of ALSPCA is that it simultaneously considers the explained variance, orthogonality, and correlation among principal components.

Among these, only LR [5], SCoTLASS [3], and ALSPCA [10] consider the orthogonality of loadings. SCoTLASS, rSVD [7], SPC [14], the greedy methods [17, 2], one version of GPower [6], and TPower [11] belong to the deflation group. Only DSPCA's solution [15] is guaranteed to be globally optimal.

The computational complexities of some of the above algorithms are summarized in Table 1.

## 3 SPCArt: Sparse PCA via Rotation and Truncation

We first give a brief overview of SPCArt, then introduce its motivation, its objective and optimization, and the truncation types, and finally provide a performance analysis.

The idea of SPCArt is simple. Since any rotation of the PCA loadings $V \in \mathbb{R}^{p\times r}$ constitutes an orthonormal basis spanning the same subspace ($X = VR$, $R^TR = I$), we want to find a rotation matrix $R$ through which $V$ is transformed to a sparsest basis $X$. It is difficult to solve this problem directly, so instead we find a rotation matrix and a sparse basis such that the sparse basis approximates the PCA loadings after the rotation.

The major notations used are listed in Table 2.

### 3.1 Motivation

Our method is motivated by the solution of the Eckart-Young theorem [22]. This theorem considers the problem of approximating a matrix by the product of two low-rank ones.

###### Theorem 1.

(Eckart-Young Theorem) Assume the SVD of a matrix $A \in \mathbb{R}^{n\times p}$ is $A = U\Sigma V^T$, in which $U \in \mathbb{R}^{n\times n}$ and $V \in \mathbb{R}^{p\times p}$ are orthogonal, $\Sigma$ is diagonal with $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$, and $\mathrm{rank}(A) = m$. A rank-$r$ ($r \le m$) approximation of $A$ is obtained by solving the following problem:

$$\min_{Y,X}\ \|A - YX^T\|_F^2, \quad \text{s.t. } X^TX = I, \tag{1}$$

where $Y \in \mathbb{R}^{n\times r}$ and $X \in \mathbb{R}^{p\times r}$. A solution is

$$X^* = V_{1:r}, \quad Y^* = AX^*, \tag{2}$$

where $V_{1:r}$ denotes the first $r$ columns of $V$.

Alternatively, the solution can be expressed as

$$Y^* = U_{1:r}\Sigma_{1:r}, \quad X^* = \mathrm{Polar}(A^TY^*), \tag{3}$$

where $\mathrm{Polar}(\cdot)$ is the orthonormal component of the polar decomposition of a matrix [6]. From the more familiar SVD perspective, its equivalent definition is provided in Table 2.
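The equivalence of the two solutions above can be checked numerically. The following NumPy sketch (with a randomly generated data matrix, purely for illustration) computes both and compares the resulting rank-$r$ approximations:

```python
import numpy as np

def polar(B):
    """Orthonormal component Polar(B): for a thin SVD B = U S W^T, return U W^T."""
    U, _, Wt = np.linalg.svd(B, full_matrices=False)
    return U @ Wt

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))   # n x p data matrix (synthetic)
r = 5

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Solution (2): X* = first r right singular vectors, Y* = A X*
X1 = Vt[:r].T
Y1 = A @ X1

# Alternative solution (3): Y* = U_{1:r} Sigma_{1:r}, X* = Polar(A^T Y*)
Y2 = U[:, :r] * S[:r]
X2 = polar(A.T @ Y2)

# Both pairs yield the same best rank-r approximation of A
assert np.allclose(Y1 @ X1.T, Y2 @ X2.T)
```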

Note that if the row vectors of $A$ have been centered to have mean zero, $V_{1:r}$ are exactly the loadings obtained by PCA. Clearly, for any rotation matrix $R \in \mathbb{R}^{r\times r}$, $R^TR = I$, we have $(Y^*R)(X^*R)^T = Y^*X^{*T}$, so $(Y^*R, X^*R)$ is also a solution of (1). This implies that any rotation of the PCA loadings $V_{1:r}$ is a solution of the best rank-$r$ approximation of $A$. That is, any orthonormal basis of the corresponding eigen-subspace is capable of representing the original data distribution as well as the original basis. Thus, a natural idea for sparse PCA is to find a rotation matrix $R$ so that $V_{1:r}R$ becomes sparse, i.e.,

$$\min_{R\in\mathbb{R}^{r\times r}}\ \|V_{1:r}R\|_0, \quad \text{s.t. } R^TR = I, \tag{4}$$

where $\|\cdot\|_0$ denotes the sum of the $\ell_0$ (pseudo) norms of the columns of a matrix, i.e., it counts the non-zero entries of the matrix.

### 3.2 Objective and optimization

Unfortunately, the above problem is hard to solve, so we solve an approximate version instead: we find a rotation matrix $R$ and a sparse basis $X$ such that, after the rotation, the sparse basis approximates the original PCA loadings, $XR \approx V_{1:r}$. Without confusion, we use $V$ to denote $V_{1:r}$ hereafter. For simplicity, the $\ell_0$ version is postponed to the next section; we consider the $\ell_1$ version first:

$$\min_{X,R}\ \frac{1}{2}\|V - XR\|_F^2 + \lambda\sum_i \|X_i\|_1, \quad \text{s.t. } \forall i,\ \|X_i\|_2 = 1,\ R^TR = I. \tag{5}$$

$\|\cdot\|_1$ is the $\ell_1$ norm of a vector, i.e., the sum of the absolute values of its entries. It is well known that the $\ell_1$ norm is sparsity-inducing; it is a convex surrogate of the $\ell_0$ norm [23]. Under this objective, the solution $X$ may not be orthogonal and may deviate from the eigen-subspace spanned by $V$. However, if the approximation is accurate enough, i.e., $XR \approx V$, then $X$ is nearly orthogonal and explains a similar amount of variance as $V$. Note that the above objective is a matrix approximation problem like that of the Eckart-Young theorem; the key difference is the added sparsity penalty, but the solutions still share some common features.

There is no closed-form solution for $X$ and $R$ simultaneously. We can solve the problem by fixing one and optimizing the other alternately; both subproblems have closed-form solutions.

#### 3.2.1 Fix X, solve R

When $X$ is fixed, (5) reduces to a Procrustes problem [9]:

$$\min_R\ \|V - XR\|_F^2, \quad \text{s.t. } R^TR = I. \tag{6}$$

The solution is $R^* = \mathrm{Polar}(X^TV)$. It has the same form as the right equation of (3).
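A minimal NumPy sketch of this Procrustes step (the random stand-ins for $V$ and $X$ are assumptions for illustration only):

```python
import numpy as np

def polar(B):
    U, _, Wt = np.linalg.svd(B, full_matrices=False)
    return U @ Wt

rng = np.random.default_rng(1)
p, r = 30, 4
V = np.linalg.qr(rng.standard_normal((p, r)))[0]   # stand-in orthonormal loadings
X = rng.standard_normal((p, r))
X /= np.linalg.norm(X, axis=0)                     # stand-in unit-length sparse basis

R = polar(X.T @ V)   # closed-form solution of the Procrustes problem (6)

# R is orthogonal, and no other orthogonal matrix gives a smaller ||V - X R||_F
assert np.allclose(R.T @ R, np.eye(r), atol=1e-10)
for _ in range(5):
    Q = polar(rng.standard_normal((r, r)))         # random orthogonal competitor
    assert np.linalg.norm(V - X @ R) <= np.linalg.norm(V - X @ Q) + 1e-10
```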

#### 3.2.2 Fix R, solve X

When $R$ is fixed, (5) becomes

$$\min_X\ \frac{1}{2}\|VR^T - X\|_F^2 + \lambda\sum_i \|X_i\|_1, \quad \text{s.t. } \forall i,\ \|X_i\|_2 = 1. \tag{7}$$

There are $r$ independent subproblems, one for each column: $\min_{X_i} \frac{1}{2}\|Z_i - X_i\|_2^2 + \lambda\|X_i\|_1$, s.t. $\|X_i\|_2 = 1$, where $Z = VR^T$. It is equivalent to $\max_{X_i} X_i^TZ_i - \lambda\|X_i\|_1$, s.t. $\|X_i\|_2 = 1$. The solution is $X_i^* = S_\lambda(Z_i)/\|S_\lambda(Z_i)\|_2$ [6], where $S_\lambda$ is entry-wise soft thresholding, defined in Table 2. This is truncation type T-$\ell_1$: soft thresholding.

It has the following physical explanation. $Z = VR^T$ is the rotated PCA loadings, and it is orthonormal. $X$ is obtained by truncating the small entries of $Z$. On one hand, because each column of $Z$ has unit length, a single threshold suffices to make the sparsity distribute evenly among the columns of $X$; otherwise we would have to apply different thresholds to different columns, and these are hard to determine. On the other hand, because of the orthogonality of $Z$ and the smallness of the truncations, $X$ can still be nearly orthogonal. These are the most distinctive features of SPCArt; they enable easy analysis and parameter setting.
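This X step can be sketched in a few lines of NumPy (the helper names `soft` and `x_step` are ours; the choice $\lambda = 1/\sqrt{p}$ follows the recommendation given later in Section 3.4):

```python
import numpy as np

def soft(z, lam):
    """Entry-wise soft thresholding S_lambda."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def x_step(V, R, lam):
    """X step of (7): soft-threshold the rotated loadings Z = V R^T,
    then normalize each surviving column to unit length."""
    Z = V @ R.T
    X = soft(Z, lam)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # leave fully truncated columns at zero
    return X / norms

rng = np.random.default_rng(2)
V = np.linalg.qr(rng.standard_normal((30, 4)))[0]   # p = 30, r = 4
X = x_step(V, np.eye(4), lam=1 / np.sqrt(30))
n = np.linalg.norm(X, axis=0)
assert np.all(np.isclose(n, 1.0) | np.isclose(n, 0.0))
```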

The algorithm of SPCArt is presented in Algorithm 1, where the truncation in line 7 can be of any type, including T-$\ell_1$ and the others introduced in the next section.

The computational complexity of SPCArt is shown in Table 1. Apart from the computation of the PCA loadings, SPCArt scales linearly with the data dimension. When the number of loadings is not too large, it is efficient.

### 3.3 Truncation Types

In this section, given the rotated PCA loadings $Z = VR^T$, we introduce the truncation operations of four types: T-$\ell_1$ soft thresholding, T-$\ell_0$ hard thresholding, T-sp truncation by sparsity, and T-en truncation by energy. T-$\ell_1$ was introduced in the last section; it results from the $\ell_1$ penalty.

T-$\ell_0$: hard thresholding. Set the entries whose magnitudes are below the threshold $\lambda$ to zero: $X_i = H_\lambda(Z_i)$, where $H_\lambda$ is defined in Table 2. It results from the $\ell_0$ penalty:

$$\min_{X,R}\ \|V - XR\|_F^2 + \lambda^2\sum_i \|X_i\|_0, \quad \text{s.t. } R^TR = I. \tag{8}$$

The optimization is similar to the $\ell_1$ case. Fixing $X$, $R^* = \mathrm{Polar}(X^TV)$. Fixing $R$, the problem becomes $\min_X \|VR^T - X\|_F^2 + \lambda^2\sum_i\|X_i\|_0$. Let $Z = VR^T$; the problem decomposes into entry-wise subproblems, and the solution is apparent: if $|Z_{ji}| < \lambda$, then $X_{ji} = 0$; otherwise $X_{ji} = Z_{ji}$. Hence the solution can be expressed as $X_i^* = H_\lambda(Z_i)$.

There is no normalization of $X_i^*$, in contrast with the $\ell_1$ case: if the unit-length constraint were added, there would be no closed-form solution. However, in practice we still normalize $X_i^*$ to unit length for consistency, since empirically no significant difference is observed.

Note that both the $\ell_0$ and $\ell_1$ penalties result only in thresholding operations on $Z$ and nothing else (they only make line 7 of Algorithm 1 different). Hence, we may devise other heuristic truncation types irrespective of an explicit objective:

T-sp: truncation by sparsity. Truncate the $\lambda$ entries of smallest magnitude; Table 2 gives the precise definition. It can be shown that this heuristic type results from a sparsity constraint:

$$\min_{X,R}\ \|V - XR\|_F^2, \quad \text{s.t. } \forall i,\ \|X_i\|_0 \le p - \lambda,\ \|X_i\|_2 = 1,\ R^TR = I. \tag{9}$$

When $X$ is fixed, the solution of $R$ is the same as in the $\ell_1$ and $\ell_0$ cases above. When $R$ is fixed, the solution is obtained by truncating the $\lambda$ smallest-magnitude entries of each column of $Z = VR^T$ and normalizing the result to unit length. The proof is given in Appendix A.

T-en: truncation by energy. Truncate the smallest entries whose energy (sum of squares) takes up a $\lambda$ proportion of the total; the precise operation is described in Table 2. However, we are not aware of any objective associated with this type.
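The four truncation types can be sketched as follows (a NumPy illustration; the function names and the parameter conventions — a threshold $\lambda$ for T-$\ell_1$/T-$\ell_0$, a count `k` for T-sp, and an energy fraction $\lambda$ for T-en — are our reading of the definitions above, not the paper's Table 2 verbatim):

```python
import numpy as np

def t_l1(z, lam):
    """T-l1, soft thresholding: shrink entries toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def t_l0(z, lam):
    """T-l0, hard thresholding: zero the entries with magnitude below lam."""
    return np.where(np.abs(z) >= lam, z, 0.0)

def t_sp(z, k):
    """T-sp, truncation by sparsity: zero the k smallest-magnitude entries."""
    out = z.copy()
    out[np.argsort(np.abs(z))[:k]] = 0.0
    return out

def t_en(z, lam):
    """T-en, truncation by energy: zero the smallest entries whose total
    energy (sum of squares) is at most a lam fraction of ||z||^2."""
    order = np.argsort(np.abs(z))
    cum = np.cumsum(z[order] ** 2)
    out = z.copy()
    out[order[cum <= lam * np.sum(z ** 2)]] = 0.0
    return out

z = np.array([0.05, -0.3, 0.8, -0.1, 0.5])
out = t_sp(z, 2)   # zeros the two smallest-magnitude entries (0.05 and -0.1)
```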

Algorithm 1 describes the complete algorithm of SPCArt with any truncation type.
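Since Algorithm 1 itself is not reproduced here, the following is a minimal sketch of the whole alternating loop with T-$\ell_1$ truncation (the initialization at $R = I$ and the fixed iteration count are our choices for illustration, not prescribed by the text):

```python
import numpy as np

def spcart(V, lam=None, n_iter=50):
    """SPCArt sketch with T-l1 truncation. V: p x r orthonormal PCA loadings.
    lam defaults to 1/sqrt(p). Returns sparse basis X and rotation R, X R ~ V."""
    p, r = V.shape
    if lam is None:
        lam = 1.0 / np.sqrt(p)
    X = V.copy()                                    # start from R = I
    for _ in range(n_iter):
        U, _, Wt = np.linalg.svd(X.T @ V)           # R step: R = Polar(X^T V)
        R = U @ Wt
        Z = V @ R.T                                 # X step: truncate rotated loadings
        X = np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)
        norms = np.linalg.norm(X, axis=0)
        norms[norms == 0] = 1.0
        X = X / norms
    return X, R

rng = np.random.default_rng(3)
A = rng.standard_normal((200, 30))
A -= A.mean(axis=0)                                 # center the data
V = np.linalg.svd(A, full_matrices=False)[2][:4].T  # top-4 PCA loadings
X, R = spcart(V)
frac_zero = (X == 0).mean()   # a substantial fraction of entries become zero
```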

### 3.4 Performance Analysis

This section discusses the performance bounds of SPCArt under each truncation type. For the sparse basis $X$ obtained by truncating the rotated loadings $Z = VR^T$, we study the following questions:

(1) How much sparsity of $X$ is guaranteed?

(2) How much does $X$ deviate from $Z$?

(3) How good is the orthogonality of $X$?

(4) How much variance is explained by $X$?

The performance bounds derived are functions of $\lambda$. Thus, we can directly or indirectly control sparsity, orthogonality, and explained variance via $\lambda$. (Theorem 13 is specific to SPCArt; it concerns the important explained variance. The other results apply to more general situations: Propositions 6-11 apply to any orthonormal $Z$, and Theorem 12 applies to any matrix $X$.) To obtain further results specific to SPCArt, we would have to make assumptions about the data distribution. Nevertheless, the results here are absolute performance bounds of SPCArt and can guide us in setting $\lambda$ for a performance guarantee. We give some definitions first.

###### Definition 2.

For $x \in \mathbb{R}^p$, the sparsity of $x$ is the proportion of its zero entries: $s(x) = \#\{j : x_j = 0\}/p$.

###### Definition 3.

For $x, z \in \mathbb{R}^p$, $z \neq 0$, the deviation of $x$ from $z$ is $\sin(\theta(x,z))$, where $\theta(x,z)$ is the included angle between $x$ and $z$, $0 \le \theta \le \pi/2$. If $x = 0$, the deviation is defined to be 1.

###### Definition 4.

For $x, z \in \mathbb{R}^p$, $x, z \neq 0$, the nonorthogonality between $x$ and $z$ is $|\cos(\theta(x,z))|$, where $\theta(x,z)$ is the included angle between $x$ and $z$.

###### Definition 5.

Given a data matrix $A \in \mathbb{R}^{n\times p}$ containing $n$ samples of dimension $p$, and a basis $X \in \mathbb{R}^{p\times r}$, the explained variance of $X$ is $EV(X) = \mathrm{tr}(X^TA^TAX)$. Let $S$ be any orthonormal basis of the subspace spanned by $X$; then the cumulative percentage of explained variance is $CPEV(X) = \mathrm{tr}(S^TA^TAS)/\mathrm{tr}(A^TA)$ [7].
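The criteria of Definitions 2-5 are straightforward to compute. A NumPy sketch (assuming $EV(X) = \mathrm{tr}(X^TA^TAX)$ and CPEV computed from an orthonormal basis of $\mathrm{span}(X)$, as in Definition 5; the function names are ours):

```python
import numpy as np

def sparsity(X):
    """Definition 2: proportion of zero entries."""
    return np.mean(X == 0)

def max_nonorthogonality(X):
    """Definition 4: largest |cos(angle)| between two distinct columns."""
    Xn = X / np.linalg.norm(X, axis=0)
    G = np.abs(Xn.T @ Xn)
    np.fill_diagonal(G, 0.0)
    return G.max()

def ev(A, X):
    """Definition 5 (explained variance): tr(X^T A^T A X)."""
    return np.trace(X.T @ A.T @ A @ X)

def cpev(A, X):
    """Definition 5 (cumulative percentage of explained variance):
    variance captured by the subspace spanned by X, relative to the total."""
    S = np.linalg.qr(X)[0]              # orthonormal basis of span(X)
    return np.trace(S.T @ A.T @ A @ S) / np.trace(A.T @ A)
```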

Intuitively, a larger $\lambda$ leads to higher sparsity and larger deviation. When two truncated vectors deviate from their originally orthogonal counterparts, in the worst case their nonorthogonality degenerates as the 'sum' of their deviations. Conversely, if the deviations of the sparse basis from the rotated loadings are small, we expect the sparse basis still to represent the data well, with the explained variance (or cumulative percentage of explained variance) maintained at a level similar to that of PCA. So both the nonorthogonality and the explained variance depend on the deviations, and the deviation and sparsity are in turn controlled by $\lambda$. We now go into details. The proofs of some of the following results are included in Appendix B.

#### 3.4.1 Orthogonality

###### Proposition 6.

The relative upper bound of the nonorthogonality between $X_i$ and $X_j$, $i \neq j$, is

$$|\cos(\theta(X_i,X_j))| \le \begin{cases} \sin\!\big(\theta(X_i,Z_i)+\theta(X_j,Z_j)\big), & \theta(X_i,Z_i)+\theta(X_j,Z_j) \le \frac{\pi}{2},\\ 1, & \text{otherwise}. \end{cases} \tag{10}$$

The bounds can be obtained by considering the two conical surfaces generated around the axes $Z_i$ and $Z_j$ with rotational angles $\theta(X_i,Z_i)$ and $\theta(X_j,Z_j)$. The proposition implies that the nonorthogonality is determined by the sum of the deviated angles: when the deviations are small, the orthogonality is good. The deviation in turn depends on $\lambda$, which is analyzed below.

#### 3.4.2 Sparsity and Deviation

The following results concern a single vector of the basis. We denote a column $X_i$ by $x$ and the corresponding $Z_i$ by $z$ for simplicity, and derive bounds of sparsity and deviation for each such pair. They depend on a key value, $1/\sqrt{p}$, which is the entry magnitude of a uniform unit vector.

###### Proposition 7.

For T-$\ell_1$, the sparsity bounds are

$$\begin{cases} 0 \le s(x) \le 1-\frac{1}{p}, & \lambda < \frac{1}{\sqrt{p}},\\ 1-\frac{1}{p\lambda^{2}} \le s(x) \le 1, & \lambda \ge \frac{1}{\sqrt{p}}. \end{cases} \tag{11}$$

The deviation is determined by the truncated part $\bar z$ of $z$: $\bar z_j = z_j$ if $|z_j| < \lambda$, and $\bar z_j = 0$ otherwise. The absolute bounds are:

$$0 \le \sin(\theta(x,z)) \le \begin{cases} \sqrt{p-1}\,\lambda, & \lambda < \frac{1}{\sqrt{p}},\\ 1, & \lambda \ge \frac{1}{\sqrt{p}}. \end{cases} \tag{12}$$

All the above bounds are achievable.

Because there is no sparsity guarantee when $\lambda < 1/\sqrt{p}$, $\lambda$ is usually set to $1/\sqrt{p}$ in practice. Generally this works well.

###### Proposition 8.

For T-$\ell_0$, the bounds of $s(x)$ and the lower bound of $\sin(\theta(x,z))$ are the same as those of T-$\ell_1$. In addition, there are relative deviation bounds

$$\|\bar z\|_2 \le \sin(\theta(x,z)) < \sqrt{\|\bar z\|_2^2 + \lambda^2\|x\|_0}. \tag{13}$$

It is still an open question whether T-$\ell_0$ has the same upper bound of deviation as T-$\ell_1$. By the relative lower bounds, we have

###### Corollary 9.

The deviation due to soft thresholding is always larger than that due to hard thresholding, if the same $\lambda$ is applied.

This implies that results obtained by T-$\ell_1$ have potentially greater sparsity and less explained variance than those of T-$\ell_0$.

###### Proposition 10.

For T-sp, $\|x\|_0 = p - \lambda$, and

$$0 \le \sin(\theta(x,z)) \le \sqrt{\lambda/p}. \tag{14}$$

Except in the unusual case that $z$ has many zero entries, $s(x) = \lambda/p$. The main advantage of T-sp lies in its direct control of sparsity: if a specific sparsity is wanted, it can be applied.

###### Proposition 11.

For T-en,

$$\lfloor\lambda p\rfloor/p \le s(x) \le 1-1/p. \tag{15}$$

If $\lambda < 1/p$, there is no sparsity guarantee. When $p$ is moderately large, $s(x) \approx \lambda$.

Due to the discrete nature of the operation, the actually truncated energy can be less than $\lambda$. But in practice, especially when $p$ is moderately large, the effect is negligible, so the truncated energy is usually close to $\lambda$. The main advantage of T-en is its direct control of the deviation. Recall that the deviation directly influences the explained variance; thus, if a specific level of explained variance is desired, T-en is preferable. Besides, if $p$ is moderately large, T-en also gives good control of sparsity.

#### 3.4.3 Explained Variance

Finally, we derive bounds on the explained variance $EV(X)$. Two results are provided. The first is general and applicable to any basis, not only sparse ones. The second is tailored to SPCArt.

###### Theorem 12.

Let the rank-$r$ SVD of $A$ be $U\Sigma V^T$, $V \in \mathbb{R}^{p\times r}$. Given $X \in \mathbb{R}^{p\times r}$ with unit-length columns, assume the SVD of $V^TX$ is $WDQ^T$, and let $d_{\min}$ be the smallest diagonal element of $D$. Then

$$d_{\min}^2 \cdot EV(V) \le EV(X), \tag{16}$$

and $d_{\min} \le 1$.

The theorem can be interpreted as follows. If $X$ is a basis that approximates the rotated PCA loadings well, then $d_{\min}$ will be close to one, and so the variance explained by $X$ is close to that explained by PCA. Note that the variance explained by the PCA loadings is the largest that can be achieved by any orthonormal basis. Conversely, if $X$ deviates much from the rotated PCA loadings, then $d_{\min}$ tends to zero, and not much explained variance is guaranteed. In short, the less the sparse loadings deviate from the rotated PCA loadings, the more variance they explain.

When SPCArt converges, i.e., $X = T_\lambda(VR^T)$ and $R = \mathrm{Polar}(X^TV)$ hold simultaneously, we have another estimate. It is mainly valid for T-en.

###### Theorem 13.

Let $\theta(X_i,Z_i) = \theta$ for all $i$, i.e., the deviations are uniform, and let $C$ be $X^TX$ with its diagonal elements removed. Assume $\theta \le \pi/4$ and $\|C_i\|_1 \le 1$ for all $i$; then

$$\big(\cos^2(\theta)-\sqrt{r-1}\,\sin(2\theta)\big)\cdot EV(V) \le EV(X). \tag{17}$$

When $\theta$ is sufficiently small,

$$\big(\cos^2(\theta)-O(\theta)\big)\cdot EV(V) \le EV(X). \tag{18}$$

The assumptions $\theta(X_i,Z_i) = \theta$ and $\|C_i\|_1 \le 1$, $\forall i$, are roughly satisfied by T-en using a small $\lambda$. Uniform deviation across all $i$ can be achieved by T-en, as indicated by Proposition 11. The condition $\|C_i\|_1 \le 1$ means that the sum of projected lengths is less than 1 when $X_i$ is projected onto each of the other $X_j$'s. It must be satisfied if $X$ is exactly orthogonal, and it is likely to be satisfied if $X$ is nearly orthogonal (note that $X_i$ may not lie in the subspace spanned by the others), which can be achieved by setting $\lambda$ small, according to Proposition 6. In this case, about $\cos^2(\theta)\cdot EV(V)$ is guaranteed.

In practice, we prefer CPEV [7] to EV. CPEV measures the variance explained by the subspace rather than by the basis. Since it is also the projected length of $A$ onto the subspace spanned by $X$, the higher the CPEV, the better $X$ represents the data. If $X$ is not an orthogonal basis, EV may overestimate or underestimate the variance. However, if $X$ is nearly orthogonal, the difference is small, and EV is nearly proportional to CPEV.

## 4 A Unified View to Some Prior Work

A series of methods — PCA [1], SCoTLASS [3], SPCA [9], GPower [6], rSVD [7], TPower [11], SPC [14], and SPCArt — though proposed independently and formulated in various forms, can be derived from the common source of Theorem 1, the Eckart-Young theorem. Most of them can be seen as matrix approximation problems (1) with different sparsity penalties. Most have two matrix variables, and their solutions can usually be obtained by an alternating scheme: fix one and solve the other. As in SPCArt, the two subproblems are a sparsity-penalized/constrained regression problem and a Procrustes problem.

PCA [1]. Since $Y^* = AX$ when $X$ is fixed, substituting $Y = AX$ into (1) and optimizing over $X$, the problem is equivalent to

$$\max_X\ \mathrm{tr}(X^TA^TAX), \quad \text{s.t. } X^TX = I. \tag{19}$$

The solution is provided by the Ky Fan theorem [24]: $X^* = V_{1:r}R$ for an arbitrary rotation matrix $R \in \mathbb{R}^{r\times r}$. If $A$ has been centered to have mean zero, the special solution $X^* = V_{1:r}$ consists of exactly the loadings obtained by PCA.

SCoTLASS [3]. Constraining $X$ to be sparse in (19), we get SCoTLASS:

$$\max_X\ \mathrm{tr}(X^TA^TAX), \quad \text{s.t. } X^TX = I,\ \forall i,\ \|X_i\|_1 \le \lambda. \tag{20}$$

However, the problem is not easy to solve.

SPCA [9]. If we substitute $Y = AX$ into (1), separate the two occurrences of $X$ into two independent variables $Z$ and $X$ (so as to solve the problem by alternating), and then impose penalties on $Z$, we get SPCA:

$$\min_{Z,X}\ \|A - AZX^T\|_F^2 + \lambda\|Z\|_F^2 + \sum_i\lambda_{1i}\|Z_i\|_1, \quad \text{s.t. } X^TX = I, \tag{21}$$

where $Z$ is treated as the target sparse loadings and $\lambda$, $\lambda_{1i}$ are weights. When $X$ is fixed, the problem is equivalent to $r$ elastic-net problems, one for each $Z_i$. When $Z$ is fixed, it is a Procrustes problem, $\min_X \|A - AZX^T\|_F^2$, s.t. $X^TX = I$, whose solution is $X^* = \mathrm{Polar}(A^TAZ)$.

GPower [6]. Up to some artificial factors, the original GPower solves the following $\ell_0$ and $\ell_1$ versions of objectives:

$$\max_{Y,W}\ \sum_i (Y_i^TAW_i)^2 - \lambda_i\|W_i\|_0, \quad \text{s.t. } Y^TY = I,\ \forall i,\ \|W_i\|_2 = 1, \tag{22}$$

$$\max_{Y,W}\ \sum_i Y_i^TAW_i - \lambda_i\|W_i\|_1, \quad \text{s.t. } Y^TY = I,\ \forall i,\ \|W_i\|_2 = 1. \tag{23}$$

They can be seen as derived from the following more fundamental ones (details are included in Appendix C).

$$\min_{Y,X}\ \|A - YX^T\|_F^2 + \sum_i\lambda_i\|X_i\|_0, \quad \text{s.t. } Y^TY = I, \tag{24}$$

$$\min_{Y,X}\ \frac{1}{2}\|A - YX^T\|_F^2 + \sum_i\lambda_i\|X_i\|_1, \quad \text{s.t. } Y^TY = I. \tag{25}$$

These two objectives can be seen as derived from (1): a mirror version of Theorem 1 exists in which the orthonormality constraint is moved to $Y$, i.e., $Y^TY = I$, where $A$ is still seen as a data matrix containing $n$ samples of dimension $p$. The solution is $Y^* = U_{1:r}$ and $X^* = A^TY^*$. Adding sparsity penalties on $X$, we get (24) and (25).

Following the alternating optimization scheme: when $X$ is fixed, in both cases $Y^* = \mathrm{Polar}(AX)$. When $Y$ is fixed, the $\ell_0$ case becomes $\min_X \|A^TY - X\|_F^2 + \sum_i\lambda_i\|X_i\|_0$; letting $Z = A^TY$, the solution is $X_i^* = H_{\sqrt{\lambda_i}}(Z_i)$. The $\ell_1$ case gives $X_i^* = S_{\lambda_i}(Z_i)$. The $i$th loading is obtained by normalizing $X_i^*$ to unit length.

The iterative steps, combined together, produce essentially the same solution process as the original one in [6]. But the matrix-approximation formulation makes the relation of GPower to SPCArt and the other methods apparent. The three methods rSVD, TPower, and SPC below can be seen as special cases of GPower.

rSVD [7]. rSVD can be seen as a special case of GPower, i.e., the single-component case $r = 1$. Here $\mathrm{Polar}(\cdot)$ reduces to unit-length normalization. More loadings can be obtained via deflation [4, 7], e.g., updating $A \leftarrow A - yx^T$ and running the procedure again. Now, since $Ax = 0$ after the deflation, the subsequent loadings obtained are nearly orthogonal to $x$.
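A sketch of the single-component case with $\ell_1$ thresholding followed by deflation (the helper names, the threshold value, and the iteration count are our illustrative choices, not the original implementation):

```python
import numpy as np

def sparse_rank1(A, lam, n_iter=100, seed=0):
    """Single-component rSVD/GPower-style alternation (sketch):
    y = Ax / ||Ax||, then x = normalized soft-threshold of A^T y."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        y = A @ x
        y /= np.linalg.norm(y)
        z = A.T @ y
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
        nx = np.linalg.norm(x)
        if nx == 0:
            break
        x /= nx
    return x

def rsvd_deflation(A, lam, r):
    """Obtain r sparse loadings one by one, deflating A after each."""
    A = A.copy()
    loadings = []
    for _ in range(r):
        x = sparse_rank1(A, lam)
        y = A @ x
        A -= np.outer(y, x)        # deflation: now A x = 0 exactly
        loadings.append(x)
    return np.column_stack(loadings)
```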

If the penalties in rSVD are replaced with constraints, we obtain TPower and SPC.

TPower [11]. The $\ell_0$-constrained case is

$$\min_{y\in\mathbb{R}^n,\,x\in\mathbb{R}^p}\ \|A - yx^T\|_F^2, \quad \text{s.t. } \|x\|_0 \le \lambda,\ \|y\|_2 = 1. \tag{26}$$

There are closed-form solutions: fixing $x$, $y^* = Ax/\|Ax\|_2$; fixing $y$, $x^*$ is obtained by keeping the $\lambda$ largest-magnitude entries of $A^Ty$ and setting the smallest ones to zero. ([7] did implement this version for rSVD, but as a heuristic trick.) Combining the two steps into one iteration on $x$ over $A^TA$ indicates the equivalence to the original TPower.
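The combined iteration can be sketched as a truncated power method on the covariance matrix (the function name and the random initialization are ours):

```python
import numpy as np

def tpower(C, k, n_iter=200, seed=0):
    """TPower-style iteration (sketch): power method on the covariance C,
    keeping only the k largest-magnitude entries at each step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(C.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        z = C @ x
        z[np.argsort(np.abs(z))[:-k]] = 0.0    # zero all but the k largest entries
        x = z / np.linalg.norm(z)
    return x
```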

SPC [14]. The $\ell_1$-constrained case is $\min_{y,x,d} \|A - dyx^T\|_F^2$, s.t. $\|x\|_1 \le \lambda$, $\|x\|_2 = 1$, $\|y\|_2 = 1$, where $d$ serves as the length of $y$ in (26). If the other variables are fixed, $d^* = y^TAx$. If $d$ is fixed, the problem becomes: $\max_{y,x}\ y^TAx$, s.t. $\|x\|_1 \le \lambda$, $\|x\|_2 = 1$, $\|y\|_2 = 1$. A small modification leads to SPC:

$$\max_{y,x}\ y^TAx, \quad \text{s.t. } \|x\|_1 \le \lambda,\ \|y\|_2 \le 1,\ \|x\|_2 \le 1,$$

which is biconvex. Fixing $x$, $y^* = Ax/\|Ax\|_2$. However, there is no analytic solution for $x$ when $y$ is fixed; it is solved by linear search.

## 5 Relation of GPower to SPCArt and an Extension

### 5.1 Relation of GPower to SPCArt

Note that (24) and (25) have forms similar to (8) and (5), respectively. There are two important differences. First, SPCArt deals with the orthonormal PCA loadings rather than the original data. Second, SPCArt takes a rotation matrix, rather than merely an orthonormal matrix, as the variable. These differences are the key points for the success of SPCArt.

### 5.2 Extending rSVD and GPower to rSVD-GP

A major drawback of rSVD and GPower is that they cannot use a uniform threshold when applying thresholding, because the vectors being truncated, $Z_i = A^TY_i$, have varying lengths. The problem does not exist in SPCArt, since the inputs are of unit length. But we can extend the same idea to GPower and rSVD: normalize $Z_i$ before thresholding, which is equivalent to truncating $Z_i$ according to its length, i.e., using adaptive thresholds $\lambda\|Z_i\|_2$. The other truncation types, T-en and T-sp, can be introduced into GPower too. T-sp is insensitive to length, so there is no trouble in parameter setting, and its deflation version happens to be TPower.

The deflation version of the improved algorithm, rSVD-GP, is shown in Algorithm 2, and the block version, rSVD-GPB, in Algorithm 3. rSVD-GPB follows the optimization described in Section 4. For rSVD-GP, since $\mathrm{Polar}(\cdot)$ reduces to vector normalization and the extended truncation is insensitive to the length of its input, we can combine the two alternating steps and ignore the lengths during the iterations. Besides, it is more efficient to work with the covariance matrix when the sample size exceeds the dimension.

## 6 Experiments

The data sets used include: (1) a synthetic data set with some underlying sparse loadings [9]; (2) the classical Pitprops data [25]; (3) a natural image data set with moderate dimension and relatively large sample size, on which comprehensive evaluations are conducted; (4) a gene data set with high dimension and small sample size [26]; (5) a set of random data of increasing dimension, for the purpose of speed tests.

We compare our methods with five others: SPCA [9], PathSPCA [2], ALSPCA [10], GPower [6], and TPower [11]. For SPCA, we use the toolbox [27], which implements both penalized and constrained versions. We use GPowerB to denote the block version of GPower, analogously to rSVD-GPB, and SPCArt(T-$\ell_1$) to denote SPCArt using T-$\ell_1$; the other methods use similar abbreviations. Note that rSVD-GP(T-sp) is equivalent to TPower [11]. Except for our SPCArt and rSVD-GP(B), the codes of the other methods were downloaded from the authors' websites.