Singular Value Decomposition (SVD) is one of the classical matrix decomposition models. It is a useful tool for data analysis and low-dimensional data representation in many fields such as signal processing, matrix approximation and bioinformatics [2, 3, 4]. However, the non-sparse singular vectors, which involve all variables, are difficult to interpret intuitively. In recent years, sparse models have been widely applied in computational biology to improve biological interpretation [5, 6, 7]. Moreover, many researchers have applied diverse sparse penalties to the singular vectors of SVD and developed multiple sparse SVD models to improve interpretability and capture inherent structures and patterns in the input data [8, 9]. For example, sparse SVD provides a new way to explore bicluster patterns in gene expression data. Suppose $X \in \mathbb{R}^{p \times n}$ denotes a gene expression matrix with $p$ genes and $n$ samples. Biologically, a subset of patients and genes can be clustered together as a coherent bicluster or block pattern with similar expression. Previous studies have reported that such biclusters in gene expression data can be identified by low-rank sparse SVD models [10, 11, 12]. However, these sparse models ignore prior information about gene variables and usually assume that each gene is selected into a bicluster with equal probability. In reality, one gene may belong to multiple biological pathways. To the best of our knowledge, there is not yet a model for biclustering gene expression data that integrates gene pathway information. Group-sparse penalties [14, 15] can be used to induce structured sparsity of variables for variable selection. Several studies have explored the (overlapping) group Lasso in regression tasks [16, 17]. However, little work has focused on developing structured sparse SVD for biclustering high-dimensional data (e.g., biclustering gene expression data by integrating prior gene group knowledge).
In this paper, motivated by the development of sparse coding and structured sparse penalties, we propose several group-sparse SVD models for pattern discovery in biological data. We first introduce the group-sparse SVD model with a group Lasso ($GL_1$) penalty ($GL_1$-SVD) to integrate a non-overlapping group structure of variables. Compared to the $L_1$-norm, the $L_0$-norm is a more natural sparsity-inducing penalty. Thus, we also propose an effective group-sparse SVD obtained by replacing the $L_1$-norm with the $L_0$-norm, called $GL_0$-SVD, which uses a mixed norm combining the group Lasso and $L_0$-norm penalties. However, the non-overlapping group structure limits their applicability in diverse fields. We therefore consider a more general situation, where groups of variables are potentially overlapping (e.g., a gene may belong to multiple pathways (groups)). We also propose two group-sparse SVD models with an overlapping group Lasso penalty ($OGL_1$-SVD) and an overlapping group $L_0$-norm penalty ($OGL_0$-SVD).
To solve these models, we design alternating iterative algorithms: $GL_1$-SVD is solved by a block coordinate descent method, and $GL_0$-SVD by a projection method. Furthermore, we develop a more general approach based on the Alternating Direction Method of Multipliers (ADMM) to solve $OGL_1$-SVD. In addition, we extend $OGL_1$-SVD to $OGL_0$-SVD, a regularized SVD with an overlapping group $L_0$-norm penalty. The key to solving $OGL_0$-SVD is also a proximal operator with the overlapping group $L_0$-norm penalty; we propose a greedy method to solve it and obtain an approximate solution. Finally, applications of these methods and comparisons with state-of-the-art ones on a set of simulated data demonstrate their effectiveness and computational efficiency. Extensive applications to high-dimensional gene expression data show that our methods can identify more biologically relevant gene modules and improve their biological interpretation.
Related Work
We briefly review the regularized low-rank SVD model as follows:

$$\min_{U, D, V} \|X - UDV^T\|_F^2 + \sum_{k=1}^{K} \big(\lambda_u \Omega_u(u_k) + \lambda_v \Omega_v(v_k)\big),$$

where $X \in \mathbb{R}^{p \times n}$ is a data matrix with $p$ features and $n$ samples, $U = [u_1, \dots, u_K]$ and $V = [v_1, \dots, v_K]$ are column-orthogonal matrices, and $D = \mathrm{diag}(d_1, \dots, d_K)$ is a diagonal matrix. $u_k$ ($v_k$) corresponds to the $k$-th column of $U$ ($V$). To solve the above optimization problem, we introduce a general regularized rank-one SVD model:

$$\min_{d, u, v} \|X - d\,uv^T\|_F^2 + \lambda_u \Omega_u(u) + \lambda_v \Omega_v(v), \quad \text{s.t. } \|u\|_2 = \|v\|_2 = 1,$$

where $d$ is a positive singular value, $u$ is a $p$-dimensional column vector, and $v$ is an $n$-dimensional column vector. $\Omega_u(\cdot)$ and $\Omega_v(\cdot)$ are two penalty functions, and $\lambda_u$, $\lambda_v$ are two hyperparameters. In a Bayesian view, different prior distributions of $u$ and $v$ correspond to different regularization functions. For example, the $L_1$-norm is a very popular sparsity-inducing norm and has been used to obtain sparse solutions in a large number of statistical models, including regression models [18, 19], SVD, PCA, LDA, K-means, etc.
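To make the rank-one model above concrete, here is a minimal sketch (ours, not taken from the cited works) of an $L_1$-regularized rank-one SVD computed by alternating soft-thresholded power iterations; the function names and penalty values are illustrative assumptions:

```python
import numpy as np

def soft_threshold(x, lam):
    # Element-wise soft-thresholding: the proximal operator of the L1-norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def rank1_sparse_svd(X, lam_u=0.1, lam_v=0.1, n_iter=100):
    # Alternating updates for a rank-one L1-regularized SVD:
    # u <- normalize(soft_threshold(X v)), v <- normalize(soft_threshold(X^T u)).
    p, n = X.shape
    u = np.ones(p) / np.sqrt(p)
    v = np.ones(n) / np.sqrt(n)
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = soft_threshold(X.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    d = u @ X @ v   # singular value recovered as u^T X v
    return u, d, v
```

On a matrix with a planted bicluster, the thresholding drives the entries of $u$ and $v$ outside the block toward zero, which is exactly how sparse SVD exposes bicluster structure.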
Recently, several sparse SVD models have been proposed for coherent sub-matrix detection [20, 10, 11]. For example, Witten et al. developed a penalized matrix decomposition (PMD) method, which regularizes the singular vectors with the Lasso and fused Lasso to induce sparsity. Lee et al. proposed a rank-one sparse SVD model with adaptive Lasso penalties on the singular vectors for biclustering of gene expression data. Some generalized sparsity penalty functions (e.g., the group Lasso and sparse group Lasso) have been widely used in many regression models for feature selection by integrating group information of variables. However, it is a challenging issue to use these generalized penalty functions, such as the group Lasso and overlapping group Lasso [15, 25], in the SVD framework with effective algorithms. To this end, we develop several group-sparse SVD models with different group-sparse penalties, including $GL_1$, $GL_0$, $OGL_1$ and $OGL_0$, to integrate diverse group structures of variables for pattern discovery in biological data (see TABLE 1).
2 Group-sparse SVD Models
In this section, we propose four group-sparse SVD models with respect to different structured penalties (TABLE 1). For given data (e.g., gene expression data), we can make proper adjustments to obtain one-sided group-sparse SVD models by applying (overlapping) group-sparse penalties to only the right (or left) singular vector. For example, SVD($OGL_1$, $L_1$) is a group-sparse SVD model that uses the overlapping group $L_1$-penalty for $u$ and the $L_1$-penalty for $v$, respectively.
Below we will introduce these models and their algorithms in detail.
Suppose the left singular vector $u$ and right singular vector $v$ can be divided into $L$ and $M$ non-overlapping groups, respectively: $u = [u^{(1)}; \dots; u^{(L)}]$ and $v = [v^{(1)}; \dots; v^{(M)}]$. Here, we consider the (adaptive) group Lasso ($GL_1$) penalty for $u$ and $v$ as follows:

$$\Omega_{GL_1}(u) = \sum_{l=1}^{L} w_l \|u^{(l)}\|_2, \qquad \Omega_{GL_1}(v) = \sum_{m=1}^{M} h_m \|v^{(m)}\|_2,$$

where $w_l$ and $h_m$ are adaptive weight parameters. If $w_l = \sqrt{p_l}$ and $h_m = \sqrt{n_m}$ for group sizes $p_l$ and $n_m$, the penalty reduces to the traditional group Lasso.
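For concreteness, the (adaptive) group Lasso penalty value can be computed as in this short sketch (our illustration; the default weights follow the square-root-of-group-size convention described above):

```python
import numpy as np

def group_lasso_penalty(u, groups, weights=None):
    # Omega_GL1(u) = sum_l w_l * ||u_(l)||_2, with w_l = sqrt(|group l|) by default.
    if weights is None:
        weights = [np.sqrt(len(g)) for g in groups]
    return sum(w * np.linalg.norm(u[g]) for g, w in zip(groups, weights))
```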
Based on the definition of the $GL_1$ penalty, we propose the first group-sparse SVD model with group Lasso penalty ($GL_1$-SVD), also written as SVD($GL_1$, $GL_1$):

$$\min_{d, u, v} \|X - d\,uv^T\|_F^2 + \lambda_u \sum_{l=1}^{L} w_l \|u^{(l)}\|_2 + \lambda_v \sum_{m=1}^{M} h_m \|v^{(m)}\|_2, \quad \text{s.t. } \|u\|_2 = \|v\|_2 = 1.$$

Since $\|X - d\,uv^T\|_F^2 = \|X\|_F^2 - 2d\,u^T X v + d^2$ for unit vectors $u$ and $v$, minimizing it is equivalent to minimizing $-u^T X v$; once $u$ and $v$ are determined, the singular value is given by $d = u^T X v$. We obtain the Lagrangian form of the $GL_1$-SVD model as follows:

$$\mathcal{L}(u, v) = -u^T X v + \lambda_u \sum_{l=1}^{L} w_l \|u^{(l)}\|_2 + \lambda_v \sum_{m=1}^{M} h_m \|v^{(m)}\|_2 + \gamma_u (u^T u - 1) + \gamma_v (v^T v - 1),$$

where $\gamma_u$ and $\gamma_v$ are Lagrange multipliers. To solve problem (5), we apply an alternating iterative algorithm that optimizes $u$ for fixed $v$, and vice versa.
Fixing $v$, minimizing Eq. (5) is equivalent to minimizing the following criterion:
where $z = Xv$ and $\lambda = \lambda_u$, for simplicity. It is obvious that Eq. (6) is convex with respect to $u$, and we develop a block coordinate descent algorithm [27, 28, 29, 30] to minimize it, i.e., one group of $u$ is updated at a time. For a single group $u^{(l)}$, with $u^{(j)}$ fixed for all $j \neq l$, the subgradient equations of Eq. (6) with respect to $u^{(l)}$ are written as:
where $s^{(l)}$ is the subgradient vector of $\|u^{(l)}\|_2$, which satisfies $s^{(l)} = u^{(l)}/\|u^{(l)}\|_2$ if $u^{(l)} \neq 0$, and $\|s^{(l)}\|_2 \le 1$ otherwise.
Based on Eq. (7), we have $u^{(l)} = 0$ whenever $\|z^{(l)}\|_2 \le \lambda w_l$.
If $\|z^{(l)}\|_2 > \lambda w_l$, then $u^{(l)} \neq 0$, and solving the subgradient equation yields $u^{(l)} \propto \big(1 - \lambda w_l/\|z^{(l)}\|_2\big)\, z^{(l)}$. Thus each group is either zeroed out entirely or shrunk toward $z^{(l)}$ by a group-wise soft-thresholding factor.
In the same manner, we fix $u$ in Eq. (5) and let $z = X^T u$; similarly, we can obtain the coordinate update rule for $v$.
Furthermore, to meet the normalization condition, we choose $\gamma_u$ (and $\gamma_v$) to guarantee $\|u\|_2 = \|v\|_2 = 1$. Besides, if each group contains only one element, the group Lasso penalty reduces to the Lasso penalty. Accordingly, we get another update formula:
2.1.3 $GL_1$-SVD Algorithm
Based on Eqs. (9) and (10), we propose an alternating iterative algorithm (Algorithm 1) to solve the $GL_1$-SVD model. Its per-iteration cost is dominated by the matrix-vector products $Xv$ and $X^T u$, giving a time complexity of $O(T \cdot pn)$, where $T$ is the number of iterations. We can control the iteration by monitoring the change of $d = u^T X v$.
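The group-wise coordinate update at the heart of Algorithm 1 can be sketched as follows (our illustration; `s` plays the role of $Xv$, and the weights default to the square root of the group size):

```python
import numpy as np

def group_soft_threshold(s, groups, lam, weights=None):
    # Block-wise update for the group-Lasso-penalized singular vector:
    # for each group l, u_(l) = (1 - lam * w_l / ||s_(l)||_2)_+ * s_(l),
    # so a whole group is either zeroed out or shrunk toward s_(l).
    u = np.zeros_like(s)
    if weights is None:
        weights = [np.sqrt(len(g)) for g in groups]
    for g, w in zip(groups, weights):
        sg = s[g]
        norm = np.linalg.norm(sg)
        if norm > lam * w:
            u[g] = (1.0 - lam * w / norm) * sg
    return u
```

The resulting vector is rescaled to unit norm afterwards, matching the normalization condition above.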
To make the penalty on each singular vector explicit, $GL_1$-SVD can also be written in the form SVD($GL_1$, $GL_1$), denoting that the left singular vector is regularized by the $GL_1$ penalty and the right singular vector is regularized by the $GL_1$ penalty, respectively. Similarly, we can simply modify Algorithm 1 to solve the SVD($GL_1$, $L_1$) model, which applies the Lasso penalty to $v$.
Unlike the $GL_1$ penalty, below we consider a group $L_0$-norm penalty ($GL_0$) of $u$ and $v$ as follows:

$$\Omega_{GL_0}(u) = \sum_{l=1}^{L} I\big(\|u^{(l)}\|_2 \neq 0\big), \qquad \Omega_{GL_0}(v) = \sum_{m=1}^{M} I\big(\|v^{(m)}\|_2 \neq 0\big),$$

where $I(\cdot)$ is the indicator function, so that $\Omega_{GL_0}(u)$ counts the number of non-zero groups of $u$, and $\Omega_{GL_0}(v)$ counts those of $v$.
Based on the above definition of the $GL_0$ penalty, we propose the second group-sparse SVD model with $GL_0$ penalty, namely $GL_0$-SVD or SVD($GL_0$, $GL_0$):

$$\max_{u, v} u^T X v, \quad \text{s.t. } \|u\|_2 = \|v\|_2 = 1,\ \Omega_{GL_0}(u) \le k_u,\ \Omega_{GL_0}(v) \le k_v.$$

Since $d = u^T X v$ at the optimum, fixing $v$ and letting $z = Xv$, Eq. (13) reduces to a group-sparse projection operator with respect to $u$:

$$\max_{u} u^T z, \quad \text{s.t. } \|u\|_2 = 1,\ \Omega_{GL_0}(u) \le k_u.$$
We present Theorem 1 to solve problem (14).
The optimal solution of Eq. (14) is $u^* = \tilde z/\|\tilde z\|_2$, where $\tilde z$ is a column vector satisfying

$$\tilde z^{(l)} = \begin{cases} z^{(l)}, & l \in S_{k_u}, \\ 0, & \text{otherwise}, \end{cases}$$

where $z^{(l)}$ is the sub-vector of $z$ from the $l$-th group, and $S_{k_u}$ denotes the set of indexes of the $k_u$ largest elements of $\{\|z^{(1)}\|_2, \dots, \|z^{(L)}\|_2\}$.
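Theorem 1 amounts to keeping the $k_u$ groups of $z = Xv$ with the largest Euclidean norms and renormalizing, which can be sketched as follows (our illustration, assuming unit group weights):

```python
import numpy as np

def group_l0_project(z, groups, k):
    # Keep the k groups of z with the largest L2-norms, zero the rest,
    # then rescale to a unit vector (hard group selection as in Theorem 1).
    norms = np.array([np.linalg.norm(z[g]) for g in groups])
    keep = np.argsort(norms)[::-1][:k]
    u = np.zeros_like(z)
    for i in keep:
        u[groups[i]] = z[groups[i]]
    nrm = np.linalg.norm(u)
    return u / nrm if nrm > 0 else u
```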
2.2.3 $GL_0$-SVD Algorithm
Note that once the number of elements of every group equals 1 (i.e., $p_l = 1$ for all $l$), the group $L_0$-norm penalty reduces to the $L_0$-norm penalty. Moreover, Algorithm 2 with a small modification can be used to solve SVD($GL_0$, $L_0$), which applies the $L_0$-norm as the penalty for the right singular vector $v$. In addition, analogous to the adaptive group Lasso, we may consider a weighted (adaptive) group $L_0$-penalty, rewriting $\|z^{(l)}\|_2$ as $w_l \|z^{(l)}\|_2$ in Eq. (15), where $w_l$ is a weight coefficient that balances different group sizes; it is defined by $w_l = 1/\sqrt{p_l}$, where $p_l$ is the number of elements in group $l$.
In some situations, the non-overlapping group structure in group Lasso limits its applicability in practice. For example, a gene can participate in multiple pathways. Several studies have explored the overlapping group Lasso in regression tasks [16, 17]. However, structured sparse SVD with overlapping group structure remains to be solved.
Here we consider the overlapping group situation, where a variable may belong to more than one group. Suppose $u$ corresponds to the row variables of $X$ with $L$ overlapping groups $\{G_1, \dots, G_L\}$, and $v$ corresponds to the column variables of $X$ with $M$ overlapping groups $\{H_1, \dots, H_M\}$. In other words, $u$ and $v$ can be divided into $L$ and $M$ (possibly overlapping) groups, respectively. We define an overlapping group Lasso ($OGL_1$) penalty of $u$ as follows [15, 16, 32]:

$$\Omega_{OGL_1}(u) = \inf_{\substack{c_1, \dots, c_L:\ \sum_{l} c_l = u \\ \mathrm{supp}(c_l) \subseteq G_l}} \; \sum_{l=1}^{L} w_l \|c_l\|_2,$$

where $\mathrm{supp}(\cdot)$ denotes the index set of non-zero elements for a given vector.
$\Omega_{OGL_1}(\cdot)$ is a specific penalty function for structured sparsity: it leads to sparse solutions whose supports are unions of predefined, overlapping groups of variables. Based on the definition of $OGL_1$, we propose the third group-sparse SVD model ($OGL_1$-SVD) as follows:

$$\min_{u, v} -u^T X v + \lambda_u \Omega_{OGL_1}(u) + \lambda_v \Omega_{OGL_1}(v), \quad \text{s.t. } \|u\|_2 = \|v\|_2 = 1,$$
where $\lambda_u$ and $\lambda_v$ are two hyperparameters. We first introduce two latent vectors $\tilde u$ and $\tilde v$. Let $p' = \sum_{l=1}^{L} |G_l|$ and set $\tilde u = [u_{G_1}; \dots; u_{G_L}]$, a column vector of size $p'$ obtained by duplicating each variable once per group it belongs to. Similarly, we can get $\tilde v$ based on $v$. In addition, we can replicate the corresponding rows and columns of $X$ to obtain a new matrix $\tilde X$ of size $p' \times n'$, whose row and column variables are non-overlapping. Thus, solving problem (18) is approximately equivalent to solving SVD($GL_1$, $GL_1$) on the non-overlapping $\tilde X$, and we can obtain an approximate solution of (18) by using Algorithm 1. However, if a variable belongs to many different groups, this leads to a large computational burden. For example, a protein-protein interaction (PPI) network may contain about 13,000 genes and 250,000 edges. If we consider each edge of the PPI network as a group, then we would construct a high-dimensional matrix $\tilde X$ with 500,000 rows.
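The variable-duplication step can be sketched as follows (our illustration for the row side only; each variable is copied once per group it belongs to, so the expanded groups no longer overlap):

```python
import numpy as np

def expand_overlapping(X_rows, groups):
    # Duplicate each row of X once per group membership so that the expanded
    # row-variable groups no longer overlap; returns the expanded matrix and
    # the non-overlapping group index sets in the expanded space.
    idx = [i for g in groups for i in g]
    new_groups, start = [], 0
    for g in groups:
        new_groups.append(list(range(start, start + len(g))))
        start += len(g)
    return X_rows[idx, :], new_groups
```

A variable in $k$ groups is copied $k$ times, which is exactly why many highly overlapping groups (e.g., PPI edges) blow up the expanded matrix.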
To address this issue, we develop a method based on the alternating direction method of multipliers (ADMM) [33, 34] to solve problem (18) directly. Similar to Eq. (5), we first redefine problem (18) in its Lagrangian form:
where $\gamma_u$ and $\gamma_v$ are Lagrange multipliers. We develop an alternating iterative algorithm to minimize it; that is, we optimize the above problem with respect to $u$ by fixing $v$, and vice versa. Since $u$ and $v$ play symmetric roles in problem (19), we only need to consider the subproblem with respect to $u$ as follows:
where $z = Xv$. Since the overlapping group Lasso penalty is a convex function, we can apply ADMM [33, 34] to solve the above problem (20). To obtain the learning algorithm for (20), we first introduce an auxiliary vector $c$ and redefine the above problem as follows:
So the augmented Lagrangian of (21) can be written as follows:
where the Lagrange multiplier vector $y$ and the auxiliary vector $c$ are column vectors with non-overlapping groups (one block per group $G_l$). For convenience, we define column vectors $\bar z$, $\bar c$ and $\bar y$ with the same size and group structure as the expanded vector $\tilde u$, whose $l$-th blocks $\bar z^{(l)}$, $\bar c^{(l)}$ and $\bar y^{(l)}$ ($l = 1, \dots, L$) hold the elements of $z$, $c$ and $y$ associated with group $G_l$. Thus, we can obtain the gradient equations with respect to $u$ in Eq. (22) as follows:
where "$\circ$" performs element-by-element multiplication. Thus, we can obtain the update rule for $u$ and ensure it is a unit vector:
where $u = \hat u/\|\hat u\|_2$ if $\hat u \neq 0$; otherwise, $u$ is set to an arbitrary unit vector. For convenience, we develop a block coordinate descent method to learn the Lagrange multiplier and auxiliary blocks group by group. Since the blocks $c^{(l)}$ ($l = 1, \dots, L$) are independent of one another, they can be updated in parallel according to the following formula:
Based on ADMM, we also obtain the update rule for the multiplier $y$ as follows:
Combining Eqs. (24), (26) and (27), we thus obtain an ADMM-based method to solve problem (21) (Algorithm 3). Note that the output of Algorithm 3 is a set of selected group indexes, defined as $S = \{l : c^{(l)} \neq 0\}$. For example, if $L = 4$, $c^{(1)} \neq 0$, $c^{(3)} \neq 0$, and $c^{(2)} = c^{(4)} = 0$, then $S = \{1, 3\}$.
In summary, based on the ADMM algorithm (Algorithm 3), we adopt an alternating iterative strategy (Algorithm 4) to solve SVD($OGL_1$, $OGL_1$). In Algorithm 4, the operation $u = z|_S$ denotes that if group $l \in S$, then $u_{G_l} = z_{G_l}$, and the remaining elements of $u$ are zero.
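To illustrate the ADMM machinery, here is a minimal sketch (ours, not the paper's exact Algorithm 3) of the proximal operator for an overlapping group Lasso penalty $\sum_l \|u_{G_l}\|_2$, keeping one latent copy $c_l$ and one dual vector $y_l$ per group:

```python
import numpy as np

def prox_overlapping_group_lasso(z, groups, lam, rho=1.0, n_iter=200):
    # ADMM for u* = argmin_u 0.5*||u - z||^2 + lam * sum_l ||u_{G_l}||_2,
    # where the groups G_l may overlap.
    p = len(z)
    m = np.zeros(p)                          # how many groups contain each variable
    for g in groups:
        m[g] += 1
    c = [np.zeros(len(g)) for g in groups]   # latent group copies
    y = [np.zeros(len(g)) for g in groups]   # dual variables
    u = z.copy()
    for _ in range(n_iter):
        # u-update: closed form per coordinate
        acc = z.copy()
        for g, cl, yl in zip(groups, c, y):
            acc[g] += rho * cl - yl
        u = acc / (1.0 + rho * m)
        # c-update: group soft-thresholding of u_{G_l} + y_l / rho
        for l, g in enumerate(groups):
            t = u[g] + y[l] / rho
            nrm = np.linalg.norm(t)
            shrink = max(0.0, 1.0 - lam / (rho * nrm)) if nrm > 0 else 0.0
            c[l] = shrink * t
        # dual update
        for l, g in enumerate(groups):
            y[l] = y[l] + rho * (u[g] - c[l])
    return u
```

With a single non-overlapping group, the iterates converge to the ordinary group soft-thresholding solution, which gives a quick sanity check of the updates.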
Here we define an overlapping group $L_0$-norm penalty ($OGL_0$) of $u$ as follows:

$$\Omega_{OGL_0}(u) = \min\Big\{|S| : S \subseteq \{1, \dots, L\},\ \mathrm{supp}(u) \subseteq \bigcup_{l \in S} G_l\Big\},$$

i.e., the smallest number of groups whose union covers the support of $u$, where $\mathrm{supp}(\cdot)$ denotes the index set of non-zero elements for a given vector.
Based on the definition of $OGL_0$, we propose the fourth group-sparse SVD model with overlapping group $L_0$-norm penalty ($OGL_0$-SVD) as follows:

$$\max_{u, v} u^T X v, \quad \text{s.t. } \|u\|_2 = \|v\|_2 = 1,\ \Omega_{OGL_0}(u) \le k_u,\ \Omega_{OGL_0}(v) \le k_v.$$
Similarly, we solve the above problem using an alternating iterative method: fixing $v$ (or $u$), we transform the original optimization problem into a projection problem with overlapping group $L_0$-norm penalty.
Fixing $v$ in problem (29) and letting $z = Xv$, the problem can be written as a projection problem with overlapping group $L_0$-norm penalty:

$$\max_{u} u^T z, \quad \text{s.t. } \|u\|_2 = 1,\ \Omega_{OGL_0}(u) \le k_u.$$
To solve the above problem, we introduce the expanded vector $\tilde u$ and restate the problem in a new form, where $\tilde u$ duplicates each element of $u$ once for every group containing it, and $\tilde z$ is the corresponding expansion of $z$; the groups of $\tilde u$ are then non-overlapping.
The above problem involves an overlapping group-sparsity-inducing penalty with the $L_0$-norm, so it is difficult to obtain the exact solution of problem (31). To this end, we use an approximate method, which replaces $u^T z$ with $\tilde u^T \tilde z$. Since the groups of $\tilde u$ are non-overlapping in problem (31), problem (31) approximately reduces to the problem below,
Since $\tilde u$ has a non-overlapping group structure, we can easily obtain the optimal solution of the above problem with respect to $\tilde u$ and map it back to $u$. To sum up, we obtain an approximate solution of (30) as Theorem 2 suggests.
The approximate solution of (30) is $u^* = \hat z/\|\hat z\|_2$, where $\hat z$ keeps the elements of $z$ on the union of the $k_u$ selected groups (those with the largest norms $\|z_{G_l}\|_2$) and is zero elsewhere.
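The greedy approximation behind Theorem 2 can be sketched as follows (our illustration): select the $k_u$ groups of $z$ with the largest norms, keep $z$ on the union of their supports, and renormalize:

```python
import numpy as np

def ogl0_project(z, groups, k):
    # Greedy approximation to the overlapping-group-L0 projection:
    # pick the k groups with the largest L2-norms, take the union of their
    # supports, keep z there, and rescale to a unit vector.
    norms = [np.linalg.norm(z[g]) for g in groups]
    chosen = np.argsort(norms)[::-1][:k]
    support = sorted({i for j in chosen for i in groups[j]})
    u = np.zeros_like(z)
    u[support] = z[support]
    nrm = np.linalg.norm(u)
    u = u / nrm if nrm > 0 else u
    return u, [int(j) for j in chosen]
```

Because the groups overlap, this greedy selection is only an approximation: a variable shared between a selected and an unselected group is still kept, so the support is a union of the chosen groups, as required by the penalty's definition.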