# Group-sparse SVD Models and Their Applications in Biological Data

Sparse Singular Value Decomposition (SVD) models have been proposed for biclustering high-dimensional gene expression data to identify block patterns with similar expression. However, these models do not account for prior group effects in variable selection. To this end, we first propose group-sparse SVD models with a group Lasso penalty (GL1-SVD) and a group L0-norm penalty (GL0-SVD) for non-overlapping group structures of variables. Since such models are limited to problems with non-overlapping structure, we also propose two group-sparse SVD models with an overlapping group Lasso penalty (OGL1-SVD) and an overlapping group L0-norm penalty (OGL0-SVD). We adopt an alternating iterative strategy to solve GL1-SVD based on a block coordinate descent method, and GL0-SVD based on a projection method. The key to solving OGL1-SVD is a proximal operator with the overlapping group Lasso penalty, which we solve with the alternating direction method of multipliers (ADMM). Similarly, we develop an approximate method to solve OGL0-SVD. Applications of these methods and comparisons with competing ones on simulated data demonstrate their effectiveness. Extensive applications to several real gene expression datasets with prior gene group knowledge identify biologically interpretable gene modules.

## 1 Introduction

Singular Value Decomposition (SVD) is one of the classical matrix decomposition models [1]. It is a useful tool for data analysis and low-dimensional data representation in many fields such as signal processing, matrix approximation, and bioinformatics [2, 3, 4]. However, non-sparse singular vectors involving all variables are difficult to interpret intuitively. In recent years, sparse models have been widely applied in computational biology to improve biological interpretation [5, 6, 7]. In addition, many researchers have applied diverse sparse penalties to the singular vectors in SVD and developed multiple sparse SVD models to improve interpretability and capture inherent structures and patterns in the input data [8, 9]. For example, sparse SVD provides a new way to explore bicluster patterns in gene expression data. Suppose $X \in \mathbb{R}^{p \times n}$ denotes a gene expression matrix with $p$ genes and $n$ samples. Biologically, a subset of patients and genes can be clustered together as a coherent bicluster, i.e., a block pattern with similar expression. Previous studies have reported that such biclusters in gene expression data can be identified by low-rank sparse SVD models [10, 11, 12].

However, these sparse models ignore prior information about gene variables and usually assume that each gene is selected into a bicluster with equal probability. In reality, one gene may belong to multiple pathways [13]. To the best of our knowledge, there is not yet a model for biclustering gene expression data that integrates gene pathway information. Group-sparse penalties [14, 15] can be used to induce structured sparsity for variable selection, and several studies have explored the (overlapping) group Lasso in regression tasks [16, 17]. However, little work has focused on developing structured sparse SVD for biclustering high-dimensional data (e.g., biclustering gene expression data by integrating prior gene group knowledge).

In this paper, motivated by the development of sparse coding and structured sparse penalties, we propose several group-sparse SVD models for pattern discovery in biological data. We first introduce the group-sparse SVD model with a group Lasso penalty (GL1-SVD) to integrate non-overlapping group structures of variables. Compared to the $L_1$-norm, the $L_0$-norm is a more natural sparsity-inducing penalty. Thus, we also propose an effective group-sparse SVD, called GL0-SVD, which replaces the group Lasso with a mixed norm combining the group structure and the $L_0$-norm penalty. However, the non-overlapping group structure limits the applicability of these models in diverse fields. We therefore consider a more general situation in which groups of variables may overlap (e.g., a gene may belong to multiple pathways (groups)), and propose two further group-sparse SVD models with an overlapping group Lasso penalty (OGL1-SVD) and an overlapping group $L_0$-norm penalty (OGL0-SVD).

To solve these models, we design an alternating iterative algorithm for GL1-SVD based on a block coordinate descent method, and for GL0-SVD based on a projection method. Furthermore, we develop a more general approach based on the alternating direction method of multipliers (ADMM) to solve OGL1-SVD. In addition, we extend GL0-SVD to OGL0-SVD, a regularized SVD with an overlapping group $L_0$-norm penalty. The key to solving OGL0-SVD is likewise a proximal operator with the overlapping group $L_0$-norm penalty; we propose a greedy method that yields an approximate solution. Finally, applications of these methods and comparisons with state-of-the-art ones on a set of simulated data demonstrate their effectiveness and computational efficiency. Extensive applications to high-dimensional gene expression data show that our methods identify biologically relevant gene modules and improve their biological interpretation.

**Related Work.** We briefly review the regularized low-rank SVD model as follows:

$$\begin{aligned} \underset{U,\,D,\,V}{\text{minimize}}\quad & \|X - UDV^T\|_F^2 \\ \text{subject to}\quad & \|U_i\|_2 \le 1,\ \Omega_1(U_i) \le c_{i1},\ \forall i, \\ & \|V_i\|_2 \le 1,\ \Omega_2(V_i) \le c_{i2},\ \forall i, \end{aligned} \tag{1}$$

where $X \in \mathbb{R}^{p \times n}$ with $p$ features and $n$ samples, $U \in \mathbb{R}^{p \times K}$ and $V \in \mathbb{R}^{n \times K}$ are column-orthogonal matrices, $D$ is a $K \times K$ diagonal matrix, and $U_i$ ($V_i$) corresponds to the $i$-th column of $U$ ($V$). To solve the above optimization problem, we introduce a general regularized rank-one SVD model:

$$\begin{aligned} \underset{u,\,v,\,d}{\text{minimize}}\quad & \|X - duv^T\|_F^2 \\ \text{subject to}\quad & \|u\|_2 \le 1,\ \Omega_1(u) \le c_1, \\ & \|v\|_2 \le 1,\ \Omega_2(v) \le c_2, \end{aligned} \tag{2}$$

where $d$ is a positive singular value, $u$ is a $p$-dimensional column vector, and $v$ is an $n$-dimensional column vector. $\Omega_1(\cdot)$ and $\Omega_2(\cdot)$ are two penalty functions, and $c_1$ and $c_2$ are two hyperparameters. From a Bayesian viewpoint, different prior distributions over $u$ and $v$ correspond to different regularization functions. For example, the $L_1$-norm is a very popular sparsity-inducing norm [18] and has been used to obtain sparse solutions in a large number of statistical models, including regression [18, 19], SVD [20], PCA [21], and LDA [22], [23].

Recently, several sparse SVD models have been proposed for coherent sub-matrix detection [20, 10, 11]. For example, Witten et al. [20] developed a penalized matrix decomposition (PMD) method, which regularizes the singular vectors with the Lasso and fused Lasso to induce sparsity. Lee et al. [10] proposed a rank-one sparse SVD model with adaptive Lasso penalties on the singular vectors for biclustering gene expression data. Generalized sparsity penalties (e.g., group Lasso [14] and sparse group Lasso [24]) have been widely used in regression models for feature selection by integrating group information of variables. However, using such generalized penalties, including the group Lasso and overlapping group Lasso [15, 25], in the SVD framework with effective algorithms remains a challenging issue. To this end, we develop several group-sparse SVD models with different group-sparse penalties, namely $GL_1$, $GL_0$, $OGL_1$, and $OGL_0$, to integrate diverse group structures of variables for pattern discovery in biological data (see TABLE 1).

## 2 Group-sparse SVD Models

In this section, we propose four group-sparse SVD models with respect to different structured penalties (TABLE 1). For given data (e.g., gene expression data), we can make proper adjustments to obtain one-sided group-sparse SVD models by applying an (overlapping) group-sparse penalty to only the right (or left) singular vector. For example, SVD($OGL_0$, $L_0$) denotes a group-sparse SVD model that uses the overlapping group $L_0$-penalty for the left singular vector and the $L_0$-penalty for the right singular vector.

Below we will introduce these models and their algorithms in detail.

### 2.1 GL1-SVD

Suppose the left singular vector $u$ and right singular vector $v$ can be divided into $L$ and $M$ non-overlapping groups, respectively: $u = ({u^{(1)}}^T, \ldots, {u^{(L)}}^T)^T$ and $v = ({v^{(1)}}^T, \ldots, {v^{(M)}}^T)^T$. Here, we consider the (adaptive) group Lasso ($GL_1$) penalty [26] for $u$ and $v$ as follows:

$$\Omega_{GL_1}(u) = \sum_{l=1}^{L} w_l \|u^{(l)}\|_2 \quad\text{and}\quad \Omega_{GL_1}(v) = \sum_{m=1}^{M} \tau_m \|v^{(m)}\|_2, \tag{3}$$

where $w_l$ and $\tau_m$ are adaptive weight parameters. If $w_l = \sqrt{p_l}$ and $\tau_m = \sqrt{n_m}$ for group sizes $p_l$ and $n_m$, the penalty reduces to the traditional group Lasso.

Based on the definition of the $GL_1$ penalty, we propose the first group-sparse SVD model with the group Lasso penalty (GL1-SVD), also written as SVD($GL_1$, $GL_1$):

$$\begin{aligned} \underset{u,\,v,\,d}{\text{minimize}}\quad & \|X - duv^T\|_F^2 \\ \text{subject to}\quad & \|u\|_2 \le 1,\ \Omega_{GL_1}(u) \le c_1, \\ & \|v\|_2 \le 1,\ \Omega_{GL_1}(v) \le c_2. \end{aligned} \tag{4}$$

Since $\|X - duv^T\|_F^2 = \|X\|_F^2 - 2d\,u^TXv + d^2$ for unit-norm $u$ and $v$, minimizing it is equivalent to maximizing $u^TXv$; once $u$ and $v$ are determined, the singular value is given by $d = u^TXv$. We obtain the Lagrangian form of the GL1-SVD model as follows:

$$L(u,v) = -u^TXv + \lambda_1\,\Omega_{GL_1}(u) + \lambda_2\,\Omega_{GL_1}(v) + \eta_1\|u\|_2^2 + \eta_2\|v\|_2^2, \tag{5}$$

where $\lambda_1$, $\lambda_2$, $\eta_1$, and $\eta_2$ are Lagrange multipliers. To solve problem (5), we apply an alternating iterative algorithm that optimizes $u$ for a fixed $v$ and vice versa.

#### 2.1.1 Learning u

Fix $v$ and let $z = Xv$; minimizing Eq. (5) is then equivalent to minimizing the following criterion:

$$L(u, \lambda, \eta) = -u^Tz + \lambda\sum_{l=1}^{L} w_l\|u^{(l)}\|_2 + \eta\sum_{l=1}^{L} {u^{(l)}}^T u^{(l)}, \tag{6}$$

where $\lambda = \lambda_1$ and $\eta = \eta_1$ for simplicity. Eq. (6) is clearly convex with respect to $u$, and we develop a block coordinate descent algorithm [27, 28, 29, 30] to minimize it, i.e., one group of $u$ is updated at a time. For a single group $u^{(l)}$, with $u^{(j)}$ fixed for all $j \neq l$, the subgradient equation (see [31]) of Eq. (6) with respect to $u^{(l)}$ is written as:

$$\nabla_{u^{(l)}} L = -z^{(l)} + \lambda w_l\, s^{(l)} + 2\eta\, u^{(l)} = 0, \tag{7}$$

where $s^{(l)}$ is the subgradient vector of $\|u^{(l)}\|_2$ and satisfies

$$s^{(l)} = \begin{cases} \dfrac{u^{(l)}}{\|u^{(l)}\|_2}, & \text{if } u^{(l)} \neq 0, \\[2mm] \in \{s^{(l)} : \|s^{(l)}\|_2 \le 1\}, & \text{otherwise}. \end{cases} \tag{8}$$

Based on Eq. (7), we have $z^{(l)} = \lambda w_l\, s^{(l)} + 2\eta\, u^{(l)}$.

If $u^{(l)} \neq 0$, then $z^{(l)} = \big(\lambda w_l/\|u^{(l)}\|_2 + 2\eta\big)\, u^{(l)}$. Since the scalar coefficient is positive, $z^{(l)}$ and $u^{(l)}$ point in the same direction, and $\|z^{(l)}\|_2 = \lambda w_l + 2\eta\|u^{(l)}\|_2 > \lambda w_l$; solving for $u^{(l)}$ gives the first case below.

If $u^{(l)} = 0$, then $\|z^{(l)}\|_2 = \lambda w_l \|s^{(l)}\|_2 \le \lambda w_l$. In short, we obtain the following update rule for $u^{(l)}$ ($l = 1, \ldots, L$):

$$u^{(l)} = \begin{cases} \dfrac{1}{2\eta}\left(1 - \dfrac{\lambda w_l}{\|z^{(l)}\|_2}\right) z^{(l)}, & \text{if } \|z^{(l)}\|_2 > \lambda w_l, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{9}$$

Since Eq. (6) is strictly convex and separable, the block coordinate descent algorithm is guaranteed to converge to its optimal solution [27]. Finally, we choose an $\eta$ that guarantees $\|u\|_2 = 1$ (the normalizing condition).
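As a concrete illustration, one sweep of the group-wise update in Eq. (9) can be sketched in Python with NumPy (a minimal sketch; the encoding of groups as index arrays and the specific parameter values are illustrative assumptions, not part of the original algorithm):

```python
import numpy as np

def update_u_groups(z, groups, weights, lam, eta):
    """One block coordinate descent sweep of Eq. (9).

    z       -- the vector X @ v for the fixed right singular vector v
    groups  -- non-overlapping partition of indexes (list of index arrays)
    weights -- adaptive group weights w_l
    lam/eta -- the Lagrange multipliers lambda and eta
    """
    u = np.zeros_like(z)
    for idx, w in zip(groups, weights):
        norm = np.linalg.norm(z[idx])
        if norm > lam * w:
            # surviving group: scaled group-wise soft-thresholding
            u[idx] = (1.0 - lam * w / norm) * z[idx] / (2.0 * eta)
        # otherwise the whole group stays zero
    return u
```

In practice the returned vector is rescaled to unit $L_2$-norm, which corresponds to choosing $\eta$ so that the normalizing condition $\|u\|_2 = 1$ holds.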

#### 2.1.2 Learning v

In the same manner, we fix $u$ in Eq. (5) and let $z = X^Tu$. Similarly, we obtain the coordinate update rule for $v^{(m)}$:

$$v^{(m)} = \begin{cases} \dfrac{1}{2\eta}\left(1 - \dfrac{\lambda \tau_m}{\|z^{(m)}\|_2}\right) z^{(m)}, & \text{if } \|z^{(m)}\|_2 > \lambda \tau_m, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{10}$$

Furthermore, to meet the normalizing condition, we choose an $\eta$ that guarantees $\|v\|_2 = 1$. Besides, if each group contains only one element, the group Lasso penalty reduces to the Lasso penalty. Accordingly, we get another update formula:

$$v_i = \begin{cases} \dfrac{1}{2\eta}\left(1 - \dfrac{\lambda}{|z_i|}\right) z_i, & \text{if } |z_i| > \lambda, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{11}$$

#### 2.1.3 GL1-SVD Algorithm

Based on Eqs. (9) and (10), we propose an alternating iterative algorithm (Algorithm 1) to solve the GL1-SVD model. Each iteration is dominated by the matrix-vector products $Xv$ and $X^Tu$, so the time complexity is $O(Tpn)$, where $T$ is the number of iterations. We can control the iteration by monitoring the change of $d = u^TXv$.

To make the penalty on each singular vector explicit, GL1-SVD can also be written as SVD($GL_1$, $GL_1$), denoting that the left and right singular vectors are both regularized by the $GL_1$ penalty. Similarly, we can slightly modify Algorithm 1 to solve the SVD($GL_1$, $L_1$) model, which applies the Lasso penalty to the right singular vector $v$.
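The overall alternation of Algorithm 1 can be sketched as follows (a simplified sketch: the initialization, unit group weights, a shared $\lambda$ for both singular vectors, and explicit renormalization in place of solving for $\eta$ are all assumptions made for illustration):

```python
import numpy as np

def group_threshold(z, groups, lam):
    """One sweep of Eqs. (9)/(10), up to the positive factor 1/(2*eta),
    which is absorbed by the subsequent normalization."""
    out = np.zeros_like(z)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * z[idx]
    return out

def gl1_svd_rank1(X, u_groups, v_groups, lam=0.1, n_iter=50, tol=1e-6):
    """Alternating iteration for rank-one GL1-SVD (sketch of Algorithm 1)."""
    p, n = X.shape
    u = np.zeros(p)
    v = np.ones(n) / np.sqrt(n)          # simple initial right singular vector
    d_old = 0.0
    for _ in range(n_iter):
        u = group_threshold(X @ v, u_groups, lam)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)       # normalizing condition ||u||_2 = 1
        v = group_threshold(X.T @ u, v_groups, lam)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
        d = u @ X @ v                    # monitor d = u^T X v for convergence
        if abs(d - d_old) < tol:
            break
        d_old = d
    return u, v, d
```

On a matrix with a planted block, the thresholding zeroes out the groups outside the block while the alternation recovers the dominant rank-one pattern.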

### 2.2 GL0-SVD

Unlike the $GL_1$ penalty, below we consider a group $L_0$-norm penalty ($GL_0$) of $u$ and $v$ as follows:

$$\Omega_{GL_0}(u) = \|\phi(u)\|_0 \quad\text{and}\quad \Omega_{GL_0}(v) = \|\phi(v)\|_0, \tag{12}$$

where $\phi(u) = (\|u^{(1)}\|_2, \ldots, \|u^{(L)}\|_2)^T$ and $\phi(v) = (\|v^{(1)}\|_2, \ldots, \|v^{(M)}\|_2)^T$.

Based on the above definition of the $GL_0$ penalty, we propose the second group-sparse SVD model, namely GL0-SVD or SVD($GL_0$, $GL_0$):

$$\begin{aligned} \underset{u,\,v,\,d}{\text{minimize}}\quad & \|X - duv^T\|_F^2 \\ \text{subject to}\quad & \|u\|_2 \le 1,\ \Omega_{GL_0}(u) \le k_u, \\ & \|v\|_2 \le 1,\ \Omega_{GL_0}(v) \le k_v. \end{aligned} \tag{13}$$

Here, we employ an alternating iterative strategy to solve problem (13). Fixing $v$ (or $u$), problem (13) reduces to a projection problem with the group $L_0$-norm penalty.

#### 2.2.1 Learning u

Since minimizing $\|X - duv^T\|_F^2$ is equivalent to maximizing $u^TXv$, fixing $v$ and letting $z_u = Xv$ reduces Eq. (13) to a group-sparse projection operator with respect to $u$:

$$\underset{\|u\|_2 \le 1}{\text{minimize}}\ -z_u^Tu, \quad \text{s.t. } \Omega_{GL_0}(u) \le k_u. \tag{14}$$

We present Theorem 1 to solve problem (14).

###### Theorem 1.

The optimal solution of Eq. (14) is $u^* = P_{GL_0}(z_u)/\|P_{GL_0}(z_u)\|_2$, where $P_{GL_0}(z_u)$ is a column vector satisfying

$$\big[P_{GL_0}(z_u)\big]^{(g)} = \begin{cases} z_u^{(g)}, & \text{if } g \in \text{supp}(\phi(z_u), k_u), \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$

where $z_u^{(g)}$ is the sub-vector of $z_u$ corresponding to the $g$-th group, and $\text{supp}(\phi(z_u), k_u)$ denotes the set of indexes of the $k_u$ largest elements of $\phi(z_u)$.

The objective function of (14) can be written as $-z_u^Tu = -\sum_g {z_u^{(g)}}^T u^{(g)}$. Theorem 1 shows that solving problem (14) is equivalent to forcing the elements of the groups of $z_u$ with the smallest group-norm values to zero, keeping only the $k_u$ largest groups. Theorem 1 is straightforward to prove, so we omit the proof.
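The projection of Theorem 1 admits a direct implementation (a minimal NumPy sketch; representing groups as index arrays is an assumption made for illustration):

```python
import numpy as np

def proj_gl0(z, groups, k):
    """Group-L0 projection of Eq. (15): keep the k groups of z with the
    largest L2-norms, zero out all others, then rescale to unit norm."""
    norms = np.array([np.linalg.norm(z[idx]) for idx in groups])
    keep = np.argsort(norms)[-k:]          # indexes of the k largest groups
    u = np.zeros_like(z)
    for g in keep:
        u[groups[g]] = z[groups[g]]
    nrm = np.linalg.norm(u)
    return u / nrm if nrm > 0 else u
```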

#### 2.2.2 Learning v

In the same manner, fix $u$ and let $z_v = X^Tu$; problem (13) can then be written as a similar subproblem with respect to $v$:

$$\underset{\|v\|_2 \le 1}{\text{minimize}}\ -z_v^Tv, \quad \text{s.t. } \Omega_{GL_0}(v) \le k_v. \tag{16}$$

Similarly, based on Theorem 1, we obtain the estimator of $v$ as $v^* = P_{GL_0}(z_v)/\|P_{GL_0}(z_v)\|_2$.

#### 2.2.3 GL0-SVD Algorithm

Finally, we propose an alternating iterative method (Algorithm 2) to solve optimization problem (13). As with Algorithm 1, each iteration is dominated by the matrix-vector products, so the time complexity of Algorithm 2 is $O(Tpn)$, where $T$ is the number of iterations.

Note that once every group contains exactly one element, the group $L_0$-norm penalty reduces to the $L_0$-norm penalty. Moreover, Algorithm 2 with a small modification can be used to solve SVD($GL_0$, $L_0$), which applies the $L_0$-norm as the penalty for the right singular vector $v$. In addition, analogous to the adaptive group Lasso [26], we may consider a weighted (adaptive) group $L_0$-penalty by rescaling each entry of $\phi(z_u)$ in Eq. (15) with a weight coefficient that balances different group sizes, defined as a function of the number of elements in the group.
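Putting the projection and the alternation together, the scheme of Algorithm 2 can be sketched as follows (a simplified sketch; the initialization and the convergence test on $d = u^TXv$ are assumptions made for illustration):

```python
import numpy as np

def gl0_svd_rank1(X, u_groups, v_groups, ku, kv, n_iter=50, tol=1e-6):
    """Alternating iteration for rank-one GL0-SVD (sketch of Algorithm 2).

    At each step the fixed-vector subproblem is solved exactly by the
    group-L0 projection of Theorem 1: keep the ku (resp. kv) groups with
    the largest L2-norms, zero the rest, and renormalize.
    """
    def proj(z, groups, k):
        norms = [np.linalg.norm(z[idx]) for idx in groups]
        keep = np.argsort(norms)[-k:]
        out = np.zeros_like(z)
        for g in keep:
            out[groups[g]] = z[groups[g]]
        n = np.linalg.norm(out)
        return out / n if n > 0 else out

    v = np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    u = np.zeros(X.shape[0])
    d_old = 0.0
    for _ in range(n_iter):
        u = proj(X @ v, u_groups, ku)
        v = proj(X.T @ u, v_groups, kv)
        d = u @ X @ v                    # monitor the singular value
        if abs(d - d_old) < tol:
            break
        d_old = d
    return u, v, d
```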

### 2.3 OGL1-SVD

In some situations, the non-overlapping group structure of the group Lasso limits its applicability in practice. For example, a gene can participate in multiple pathways. Several studies have explored the overlapping group Lasso in regression tasks [16, 17]. However, structured sparse SVD with an overlapping group structure remains an open problem.

Here we consider the overlapping situation, in which a variable may belong to more than one group. Suppose $u$ corresponds to the row variables of $X$ with overlapping groups $\mathcal{G}_u = \{G_1, \ldots, G_L\}$, and $v$ corresponds to the column variables of $X$ with overlapping groups $\mathcal{G}_v = \{G_1, \ldots, G_M\}$. In other words, $u$ and $v$ can be divided into $L$ and $M$ (possibly overlapping) groups, represented by $u_{G_1}, \ldots, u_{G_L}$ and $v_{G_1}, \ldots, v_{G_M}$, respectively. We define an overlapping group Lasso ($OGL_1$) penalty of $u$ as follows [15, 16, 32]:

$$\Omega_{OGL_1}(u) = \underset{J \subseteq \mathcal{G}_u,\ \text{supp}(\phi(u)) \subseteq J}{\text{minimize}}\ \sum_{l=1}^{L} w_l\,\|u_{G_l}\|_2, \tag{17}$$

where $\text{supp}(\cdot)$ denotes the index set of non-zero elements of a given vector. $\Omega_{OGL_1}(\cdot)$ is a penalty function for structured sparsity: it leads to sparse solutions whose supports are unions of predefined, possibly overlapping groups of variables. Based on the definition of $\Omega_{OGL_1}$, we propose the third group-sparse SVD model as follows:

$$\begin{aligned} \underset{u,\,v,\,d}{\text{minimize}}\quad & \|X - duv^T\|_F^2 \\ \text{subject to}\quad & \|u\|_2 \le 1,\ \Omega_{OGL_1}(u) \le c_u, \\ & \|v\|_2 \le 1,\ \Omega_{OGL_1}(v) \le c_v, \end{aligned} \tag{18}$$

where $c_u$ and $c_v$ are two hyperparameters. One strategy is to duplicate the overlapping variables: we introduce two latent vectors $u'$ and $v'$ by stacking the group sub-vectors, setting $u' = (u_{G_1}^T, \ldots, u_{G_L}^T)^T$, a column vector of size $\sum_l |G_l|$, and obtaining $v'$ from $v$ in the same way. Correspondingly, we can duplicate rows and columns of $X$ to obtain a new matrix $X'$ whose row and column variables are non-overlapping. Thus, solving problem (18) is approximately equivalent to solving an SVD($GL_1$, $GL_1$) problem for the non-overlapping $X'$, and we can obtain an approximate solution of (18) using Algorithm 1. However, if a variable belongs to many different groups, this duplication leads to a large computational burden. For example, consider a protein-protein interaction (PPI) network containing about 13,000 genes and 250,000 edges. If we regard each edge of the PPI network as a group, we would construct a high-dimensional matrix $X'$ containing 500,000 rows.

To address this issue, we develop a method based on the alternating direction method of multipliers (ADMM) [33, 34] to solve problem (18) directly. Similar to Eq. (5), we first rewrite problem (18) in its Lagrangian form:

$$L(u,v) = -u^TXv + \lambda_1\,\Omega_{OGL_1}(u) + \lambda_2\,\Omega_{OGL_1}(v) + \eta_1 u^Tu + \eta_2 v^Tv, \tag{19}$$

where $\lambda_1$, $\lambda_2$, $\eta_1$, and $\eta_2$ are Lagrange multipliers. Inspired by [35], we develop an alternating iterative algorithm to minimize it; that is, we optimize with respect to $u$ for a fixed $v$ and vice versa. Since $u$ and $v$ play symmetric roles in problem (19), we only need to consider the subproblem with respect to $u$:

$$\underset{u}{\text{minimize}}\ -u^Tz + \lambda\,\Omega_{OGL_1}(u) + \eta\|u\|_2^2, \tag{20}$$

where $z = Xv$. Since the overlapping group Lasso penalty is a convex function [36], we can apply ADMM [33, 34] to solve problem (20). To obtain the learning algorithm, we first introduce an auxiliary variable $y$ and rewrite the problem as follows:

$$\begin{aligned} \underset{u,\,y}{\text{minimize}}\quad & -u^Tz + \lambda\sum_{l=1}^{L} w_l\|y^{(l)}\|_2 + \eta\|u\|_2^2 \\ \text{subject to}\quad & y^{(l)} = u_{G_l},\ l = 1, \ldots, L. \end{aligned} \tag{21}$$

So the augmented Lagrangian of (21) can be written as follows:

$$\begin{aligned} L_\rho(u, y, \theta) =\ & -u^Tz + \eta\|u\|_2^2 + \sum_{l=1}^{L} {\theta^{(l)}}^T\big(y^{(l)} - u_{G_l}\big) \\ & + \lambda\sum_{l=1}^{L} w_l\|y^{(l)}\|_2 + \frac{\rho}{2}\sum_{l=1}^{L} \big\|y^{(l)} - u_{G_l}\big\|_2^2, \end{aligned} \tag{22}$$

where the Lagrange multipliers $\theta = ({\theta^{(1)}}^T, \ldots, {\theta^{(L)}}^T)^T$ and the auxiliary variable $y = ({y^{(1)}}^T, \ldots, {y^{(L)}}^T)^T$ are column vectors with non-overlapping groups. For convenience, we define column vectors $\tilde{\theta}^{(l)}$, $\tilde{y}^{(l)}$, and $\tilde{e}^{(l)}$ ($l = 1, \ldots, L$) of the same length as $u$: $\tilde{\theta}^{(l)}$ places the elements of $\theta^{(l)}$ at the positions indexed by $G_l$ and is zero elsewhere; $\tilde{y}^{(l)}$ does the same with the elements of $y^{(l)}$; and $\tilde{e}^{(l)}$ has ones at the positions indexed by $G_l$ and zeros elsewhere. Thus, we have ${\tilde{\theta}^{(l)\,T}} u = {\theta^{(l)}}^T u_{G_l}$ and ${\tilde{y}^{(l)\,T}} u = {y^{(l)}}^T u_{G_l}$, and the gradient equation with respect to $u$ in Eq. (22) is:

$$\nabla_u L_\rho = 2\eta u - z - \sum_{l=1}^{L}\tilde{\theta}^{(l)} + \rho\left(\sum_{l=1}^{L}\tilde{e}^{(l)}\right)\!\bullet u - \rho\sum_{l=1}^{L}\tilde{y}^{(l)} = 0, \tag{23}$$

where “$\bullet$” denotes element-by-element multiplication. Thus, we obtain the update rule for $u$ and ensure it is a unit vector:

$$u \leftarrow \frac{\hat{u}}{\|\hat{u}\|_2},\quad \text{where } \hat{u} = z + \sum_{l=1}^{L}\tilde{\theta}^{(l)} + \rho\sum_{l=1}^{L}\tilde{y}^{(l)}. \tag{24}$$

We also obtain the subgradient equation (see [31]) with respect to $y^{(l)}$ in Eq. (22):

$$\nabla_{y^{(l)}} L_\rho = \lambda w_l \cdot s^{(l)} + \theta^{(l)} + \rho\big(y^{(l)} - u_{G_l}\big) = 0, \tag{25}$$

where $s^{(l)}$ is the subgradient of $\|y^{(l)}\|_2$: if $y^{(l)} \neq 0$, then $s^{(l)} = y^{(l)}/\|y^{(l)}\|_2$; otherwise $s^{(l)}$ is a vector with $\|s^{(l)}\|_2 \le 1$. For convenience, let $t^{(l)} = \rho\, u_{G_l} - \theta^{(l)}$. We then apply a block coordinate descent step to update $y$. Since the $y^{(l)}$ ($l = 1, \ldots, L$) are independent of one another, they can be updated in parallel according to the following formula:

$$y^{(l)} \leftarrow \begin{cases} \dfrac{1}{\rho}\left(1 - \dfrac{\lambda w_l}{\|t^{(l)}\|_2}\right) t^{(l)}, & \text{if } \|t^{(l)}\|_2 > \lambda w_l, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{26}$$

Based on ADMM [34], we also obtain the update rule for the multipliers $\theta^{(l)}$:

$$\theta^{(l)} \leftarrow \theta^{(l)} + \rho\big(y^{(l)} - u_{G_l}\big),\quad l = 1, \ldots, L. \tag{27}$$

Combining Eqs. (24), (26), and (27), we obtain an ADMM-based method to solve problem (21) (Algorithm 3). Note that the output of Algorithm 3 is the set of selected group indexes $S = \{l : y^{(l)} \neq 0\}$. For example, if $y^{(1)}$ and $y^{(3)}$ are non-zero while all other auxiliary vectors are zero, then $S = \{1, 3\}$.
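The three updates of Eqs. (24), (26), and (27) can be combined into a compact sketch of Algorithm 3 (a minimal sketch: the fixed $\rho$, the iteration count, and the encoding of overlapping groups as index arrays are assumptions made for illustration):

```python
import numpy as np

def prox_ogl1_admm(z, groups, weights, lam, rho=1.0, n_iter=100):
    """ADMM sketch for the overlapping-group-Lasso subproblem (21).

    Follows Eqs. (24), (26), and (27): the u-update scatters the duals and
    auxiliary group copies back into a full-length vector and normalizes;
    the y-update is group-wise soft-thresholding of t = rho*u_G - theta;
    the theta-update is the usual dual ascent step.
    Returns the unit vector u and the set of selected group indexes.
    """
    y = [np.zeros(len(g)) for g in groups]      # auxiliary copies y^(l)
    theta = [np.zeros(len(g)) for g in groups]  # dual variables theta^(l)
    u = np.zeros_like(z)
    for _ in range(n_iter):
        # u-update (Eq. 24)
        u_hat = z.copy()
        for g, yl, th in zip(groups, y, theta):
            u_hat[g] += th + rho * yl
        u = u_hat / np.linalg.norm(u_hat)
        # y-update (Eq. 26), independent across groups
        for l, (g, w) in enumerate(zip(groups, weights)):
            t = rho * u[g] - theta[l]
            norm = np.linalg.norm(t)
            y[l] = (1 - lam * w / norm) * t / rho if norm > lam * w \
                else np.zeros(len(g))
        # theta-update (Eq. 27)
        for l, g in enumerate(groups):
            theta[l] += rho * (y[l] - u[g])
    support = {l for l in range(len(groups)) if np.linalg.norm(y[l]) > 0}
    return u, support
```

With non-overlapping groups this reduces to the group soft-thresholding selection of Section 2.1, which provides a simple sanity check.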

In summary, based on the ADMM algorithm (Algorithm 3), we adopt an alternating iterative strategy (Algorithm 4) to solve SVD($OGL_1$, $OGL_1$). In Algorithm 4, the support-restriction operation keeps $u_{G_l}$ for each selected group $l \in S$ and sets the remaining elements of $u$ to zero.

### 2.4 OGL0-SVD

Here we define an overlapping group $L_0$-norm penalty ($OGL_0$) of $u$ as follows:

$$\Omega_{OGL_0}(u) = \underset{J \subseteq \mathcal{G}_u,\ \text{supp}(\phi(u)) \subseteq J}{\text{minimize}}\ \sum_{l=1}^{L} \mathbb{1}\big(\|u_{G_l}\|_2 \neq 0\big), \tag{28}$$

where $\text{supp}(\cdot)$ denotes the index set of non-zero elements of a given vector.

Based on the definition of $\Omega_{OGL_0}$, we propose the fourth group-sparse SVD model, with the overlapping group $L_0$-norm penalty (OGL0-SVD), as follows:

$$\begin{aligned} \underset{u,\,v,\,d}{\text{minimize}}\quad & \|X - duv^T\|_F^2 \\ \text{subject to}\quad & \|u\|_2 \le 1,\ \Omega_{OGL_0}(u) \le k_u, \\ & \|v\|_2 \le 1,\ \Omega_{OGL_0}(v) \le k_v. \end{aligned} \tag{29}$$

Similarly, we solve the above problem by an alternating iterative method: fixing $v$ (or $u$) transforms the original optimization problem into a projection problem with the overlapping group $L_0$-norm penalty. Fixing $v$ in problem (29) and letting $z = Xv$, the problem becomes:

$$\underset{\|u\|_2 \le 1}{\text{minimize}}\ -z^Tu, \quad \text{s.t. } \Omega_{OGL_0}(u) \le k_u. \tag{30}$$

To solve the above problem, we introduce an auxiliary variable $y$ and restate the problem as:

$$\underset{\|u\|_2 \le 1,\ y}{\text{minimize}}\ -z^Tu, \quad \text{s.t. } \Omega_{GL_0}(y) \le k_u,\ y^{(l)} = u_{G_l}, \tag{31}$$

where $y = ({y^{(1)}}^T, \ldots, {y^{(L)}}^T)^T$ and $\phi(y) = (\|y^{(1)}\|_2, \ldots, \|y^{(L)}\|_2)^T$.

The above problem contains an overlapping group-sparsity-inducing penalty with the $L_0$-norm, so obtaining an exact solution of problem (31) is difficult. To this end, we use an approximate method that replaces the objective $-z^Tu$ with $-\sum_l z_{G_l}^T y^{(l)}$. Since $y^{(l)} = u_{G_l}$ in problem (31), we have $\sum_l z_{G_l}^T y^{(l)} = \sum_l z_{G_l}^T u_{G_l}$, so the two objectives differ only in how overlapping elements are counted. Thus, problem (31) approximately reduces to the problem below:

$$\underset{\|u\|_2 \le 1,\ y}{\text{minimize}}\ -\sum_{l} z_{G_l}^T y^{(l)}, \quad \text{s.t. } \Omega_{GL_0}(y) \le k_u,\ y^{(l)} = u_{G_l}. \tag{32}$$

Since $y$ has a non-overlapping group structure, we can easily obtain the optimal solution of the above problem with respect to $y$ and $u$. To sum up, we obtain an approximate solution of (30) as Theorem 2 suggests.

###### Theorem 2.

The approximate solution of (30) is