Robust Integrative Biclustering for Multi-view Data

by   W. Zhang, et al.
University of Minnesota

In many biomedical research, multiple views of data (e.g., genomics, proteomics) are available, and a particular interest might be the detection of sample subgroups characterized by specific groups of variables. Biclustering methods are well-suited for this problem as they assume that specific groups of variables might be relevant only to specific groups of samples. Many biclustering methods exist for identifying row-column clusters in a view but few methods exist for data from multiple views. The few existing algorithms are heavily dependent on regularization parameters for getting row-column clusters, and they impose unnecessary burden on users thus limiting their use in practice. We extend an existing biclustering method based on sparse singular value decomposition for single-view data to data from multiple views. Our method, integrative sparse singular value decomposition (iSSVD), incorporates stability selection to control Type I error rates, estimates the probability of samples and variables to belong to a bicluster, finds stable biclusters, and results in interpretable row-column associations. Simulations and real data analyses show that iSSVD outperforms several other single- and multi-view biclustering methods and is able to detect meaningful biclusters. iSSVD is a user-friendly, computationally efficient algorithm that will be useful in many disease subtyping applications.



There are no comments yet.


page 3

page 33

page 34

page 35


Double-matched matrix decomposition for multi-view data

We consider the problem of extracting joint and individual signals from ...

Multi-view Clustering with the Cooperation of Visible and Hidden Views

Multi-view data are becoming common in real-world modeling tasks and man...

Latent Complete Row Space Recovery for Multi-view Subspace Clustering

Multi-view subspace clustering has been applied to applications such as ...

SMSSVD - SubMatrix Selection Singular Value Decomposition

High throughput biomedical measurements normally capture multiple overla...

Flexible Low-Rank Statistical Modeling with Side Information

We propose a general framework for reduced-rank modeling of matrix-value...

ScreeNOT: Exact MSE-Optimal Singular Value Thresholding in Correlated Noise

We derive a formula for optimal hard thresholding of the singular value ...

Conjoined Dirichlet Process

Biclustering is a class of techniques that simultaneously clusters the r...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Biclustering (or two-way clustering, co-clustering, two-mode clustering) is a popular statistical method for simultaneously identifying groups of samples (rows) and groups of variables (columns) characterizing different sample groups. These clusters of rows and columns are known as biclusters. Biclustering methods are especially appealing for complex disease subtyping as they seek to identify homogeneous subgroups of people characterized by highly specific groups of biological features. A main limitation of one-way clustering algorithms such as hierarchical or k-means clustering when applied to high-dimensional molecular data for disease subtyping is that cluster assignment of samples is based on the assumption that all molecular features are relevant to the sample groups or disease subtypes. But specific groups of genes, for instance, may be co-regulated within one disease subtype, and not another subtype. In such cases, biclustering methods are well-suited.

Generally speaking, a biclustering algorithm finds the associations between observations (rows) and attributes (columns) in a data matrix. More recently, because of the availability of three-dimensional data such as gene-sample-time data in biomedical research, a number of tri-clustering methods have been proposed to identify homogeneous three-dimensional (3D) subspaces in a given 3D data set. As noted by Henriques and Madeira Henriques and Madeira (2018), tri-clustering algorithms face a number of major challenges such as robustness and efficiency. In this paper we focus on biclustering, which aims to identify row-column associations in a two-dimensional (2D) data matrix.

A number of biclustering methods have been proposed over the past two decades and they can be broadly categorized into four groups: a) combinatorial methods such as CTWC (Getz et al., 2000), OPSM (Ben-Dor et al., 2002), BIMAX (Prelić et al., 2006), association analysis based RAP (Pandey et al., 2009), COALESCE (Huttenhower et al., 2009), QUBIC (Li et al., 2009), and QUBIC2 (Xie et al., 2020); b) probabilistic and generative methods such as SAMBA(Tanay et al., 2002), FABIA (Hochreiter et al., 2010), BicMix (Gao et al., 2016), COBRA (Chi et al., 2017), GBC (Li et al., 2018), and plaid models (Lazzeroni and Owen, 2002); c) matrix factorization approaches that include SSVD (Lee et al., 2010), S4VD (Sill et al., 2011), biclustering via non-negative matrix factorization (Tepper and Sapiro, 2016), and BEM (Liang et al., 2020)

; and d) deep learning with neural networks such as AutoDecoder

(Sun et al., 2013). Other biclustering methods have been proposed to tackle specific problems such as missing data (Li et al., 2020), heterogeneous and temporal medical data (Vandromme et al., 2020), and heterogeneous data from multiple views (Li et al., 2018).

So far, most existing biclustering methods aim to detect row-column clusters in a single data matrix (i.e., single-view or data from one view). The integration of multi-view data in biomedical research has garnered considerably interests nowadays, thanks to advancements in technology and data preprocessing (Zhao et al., 2017). Such methods exploit the strengths in individual data as well as the overall dependency structure among multiple views. Biclustering methods that leverage the rich information in diverse but related data (e.g., genomics, proteomics) have potential to identify multidimensional view-specific features characterizing sample subgroups common to all views. Figure 1 is a pictorial illustration of biclustering for two views. For instance, by integrating genetic and clinical data, we leverage the strengths in molecular data and the advantages of clinical data to define homogeneous groups of people with common molecular information and clinical factors likely contributing to disease outcome. Yet, there are only few biclustering methods that have been developed for data from multiple views (Bunte et al., 2016; Sun et al., 2015, 2014; Li et al., 2018). For data from two views, existing tri-clustering algorithms could be used to identify homogeneous subspaces. Bunte et al. (Bunte et al., 2016) developed a Bayesian approach for joint biclustering of data from multiple views that is based on group factor analysis. In Sun et al. (2014) (Sun et al., 2014), a multi-view biclustering method based on sparse singular decomposition (Lee et al., 2010) was proposed. The authors in Sun et al. (2015) (Sun et al., 2015) proposed a multi-view biclustering method that is based on low-rank matrix approximation and a proximal alternating linearized minimization algorithm was developed to solve their optimization problem. Recently, a Bayesian biclustering method for integrating data from multiple views that allow for each data to have a different distribution has been proposed(Li et al., 2018).

Figure 1: Pictorial illustration of integrative biclustering with two views. A bicluster is comprised of rows (or samples) and columns (or variables) from each view, with the samples in view 1 being the same as the samples in view 2. The samples in each bicluster can overlap or not. Similarly, the variables in each bicluster within a view can overlap or not. In this figure, the samples and variables in each bicluster do not overlap.

The sparse singular value decomposition (SSVD) biclustering method for single-view data (Lee et al., 2010)

obtains a sparse rank-one approximation of the data. To obtain the first bicluster, the authors minimize the Frobenius norm between the data and the rank one approximation of the data while regularizing both the left and right singular vectors using adaptive lasso penalties

(Zou, 2006). The degree of sparsity of the singular vectors depend strongly on the choice of the regularization parameters. Several techniques including Bayesian information criterion (BIC), Akaike Information criterion (AIC) and cross-validation have been proposed in the literature to select regularization parameters. The Bayesian information criterion was used to choose the optimal tuning parameters in Lee et al. (2010) (Lee et al., 2010). However, the authors in Sill et al. (2011) (Sill et al., 2011) observed that the BIC oftentimes resulted in low degree of sparsity and they proposed to use stability selection (Meinshausen and Bühlmann, 2010) to choose the regularization parameters and to determine stable biclusters. Stability selection is a subsampling procedure that was originally proposed to select stable variables for penalized regression models and since then it has been used successfully in other settings. This approach allows for choosing penalization parameters and further controlling Type I error rates.

In this article, we extend the SSVD method (Lee et al., 2010) and the SSVD method with stability selection (Sill et al., 2011) to identify biclusters in multi-view data. Our main contribution, compared to other multi-view biclustering methods (Sun et al., 2015, 2014; Li et al., 2018) is that we use stability selection to identify stable and robust subject and variable clusters. We note two other contributions to existing biclustering methods that we believe are important. Compared to the methods in Sill et al. (2011) (Sill et al., 2011), Sun et al. (2014) (Sun et al., 2014), and Sun et al. (2015) (Sun et al., 2015), we estimate subsequent biclusters using the whole data. To ensure that each sample belongs to only one subject cluster, the aforementioned methods use only unclustered samples when deriving subsequent sample and column clusters. This is concerning since smaller sample sizes are used to estimate subsequent biclusters. We track samples clustered and assign weights that ensure that clustered samples have zero coefficients in subsequent biclusters. Also, we develop efficient and user-friendly algorithms for the proposed method. We use extensive simulations to compare the performance of our proposed method in both biclusters detection and computational time. We apply the method to RNA sequencing and proteomics data from the Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study (COPDGene) (Regan et al., 2011) to identify subject and molecular clusters. The sample groups identified are compared across several demographic, clinical, and lung function variables.

The rest of the paper is organized as follows. In Section 2, we introduce the proposed method. In Section 3, we conduct simulation studies to assess the performance of our method in comparison with other methods in the literature. In Section 4, we apply our proposed method to a real data. We end with some brief discussion in Section 5.

2 Methods

Our primary goal is to define a biclustering method that leverages the wealth of information from diverse but related data to identify stable and robust sample clusters characterized by sample-specific variables. In this section, we first briefly summarize previous biclustering methods for one view that are based on singular value decomposition (SVD) which motivates our proposed work. Then, we extend previous work to multiple views. Finally, we incorporate the concept of stability selection to identify robust biclusters and control Type 1 error rates of falsely selecting samples and variables in a bicluster.

2.1 Biclustering for a single view

Let be a data matrix where represents samples and variables. We assume that the data can be approximated by sparse rank () left and right singular vectors and and corresponding singular value , i.e., . Then, the non-zero entries in represent a sample subgroup in the th left singular vector. Similarly, the non-zero entries in represent a variable subgroup in the th right singular vector. Together, these represent the th bicluster. Lee et al. (Lee et al., 2010) proposed to obtain the best sparse rank one approximation of the data by solving the following optimization problem:


Here, the first term measures the approximation error using Frobenius norm of the difference; are sparsity-inducing terms, and and control the level of sparsity. For fixed , the authors minimize (2.1) with respect to . Similarly, for fixed they minimize (2.1) with respect to . Adaptive lasso penalties (Zou, 2006) were used to induce sparsity on the vectors, i.e.,


where ’s and ’s are adaptive weights (obtained from the data itself). The adaptive lasso penalties become lasso penalties when , . To obtain and , Lee et al. (Lee et al., 2010) iteratively solved the optimization problem (2.1) until convergence. The estimated singular value is then . The reconstructed matrix, after convergence, given by is one sparse SVD layer. The non-zero entries in and form the first bicluster. Subsequent biclusters may be obtained by a sparse rank one approximation of the deflated data (), i.e., by repeatedly solving problem (2.1) using deflated data.

2.2 Biclustering for multiple views

We extend the biclustering problem to data from multiple views. Suppose that data are available from different views, and each view is arranged in an matrix , where the superscript correspond to the th view. For instance, for the same set of individuals, matrix consists of RNA sequencing data and consists of proteomics data, for views. We wish to cluster the variables, and the subjects so that each subject subgroup is associated with specific variable subgroup of relevant variables. Similar to the method proposed by Liu et al. (Liu et al., 2016), we posit an unobserved common factors and view-specific factors that approximate each view, i.e., . Here, is an matrix of common latent components that connects the views and induces dependencies across the views. Following problem (2.1), the best sparse rank one approximation for the views may be obtained by solving the optimization problem:


In order to obtain sparse approximations, we use lasso penalties (Tibshirani, 1996) on both and for . That is,


where and are regularization parameters. The penalties are sums of absolute values of the elements in the first singular vectors. For fixed , minimizing (2.2) is equivalent to minimizing equations with similar forms, with respect to :


The solution of this optimization problem may be obtained by soft-thresholding (Lee et al., 2010):


Then, is an estimate for the product of the first right singular vector and the first singular value. As in Lee et al. (2010)Lee et al. (2010), we obtain an estimated sparse rank one right singular vector for the th view as . The rank one sparse estimate for the left singular vector can be obtained in a similar way by concatenating the views. Let be the concatenated data. Also, let be a collection of the rank one right singular vectors for all views, i.e., . For fixed, we solve optimization problem (2.2) for :


where and is the concatenated data set. Using soft-thresholding again, the component-wise solution of (7) is


As before, the corresponding sparse rank one left singular vector is . Subsequent update of is given by . Then we deflate the th data as for . For subsequent biclusters, we repeatedly solve problems (5) and (7) using deflated data, i.e. we find sparse rank one approximations of the deflated data.

The regularization parameters, and control the degree of sparsity (i.e., the number of non-zero elements in and must be chosen). Lee et al. (Lee et al., 2010) proposed a Bayesian Information Criterion (Schwarz et al., 1978) to obtain the optimal regularization parameters. Sill et al. (Sill et al., 2011) proposed to use stability selection techniques for robust biclusters. Similar to that, we choose the regularization parameters using stability selection (Meinshausen and Bühlmann, 2010).

2.3 Multi-view biclustering with stability selection

Stability selection method, proposed by Meinshausen and Bühlmann(Meinshausen and Bühlmann, 2010), has been used for variable selection Tibshirani (2011) problems such as regularized regression and even for sparse singular value decomposition (Sill et al., 2011). Stability selection essentially combines resampling with variable selection so that the probability that a variable is selected is based on its relative frequency. Meinshausen and Bühlmann(Meinshausen and Bühlmann, 2010) provide a theoretical justification to show that by selecting variables based on the maximum of these probabilities, the Type 1 error rates of falsely selecting variables is controlled. For completeness sake, we briefly summarize the stability selection method and we describe how we use it in our application.

We consider estimating the left singular vector and inferring the non-zero coefficients or identifying samples that form a sample cluster. We subsample variables in each view times without replacement, while ensuring that each view contains the same set of samples. For each regularization parameter from a set of regularization parameters , we solve the optimization problem (7) for each subsampled data set. Each leads to a different set of non-zero coefficients. We estimate the selection probability for each sample as the number of times sample is selected from applications of equation (7) for a fixed . Denote the selection probability corresponding to for sample as . Then for an arbitrary threshold, , the stable path for , , (the set of stable samples) is the set of non-zero coefficients with selection probabilities at least . Essentially, samples with high selection probability are kept, and those with low selection probabilities are disregarded. Then, given the union of the selected samples from all , we can estimate the average number of selected samples (i.e., non-zero coefficients) for the regularization region , denoted as . From Theorem 1 in Meinshausen and Bühlmann (2010) (Meinshausen and Bühlmann, 2010), the expected number of falsely selected samples, with stability selection is bounded by:


Thus, by reducing the average number of selected samples (i.e., ) or by increasing the threshold , we reduce the expected number of falsely selected samples or the per-family error rate, or if we divide by , the per-comparison error rate (PCER) (Dudoit et al., 2003). It is noted by the authors that the threshold value range tend to yield similar results. For fixed, if we choose the average number of selected samples to be at most (and hence the regularization region ), we control the family-wise error rate for some . Following ideas in Sill et al. (2011) (Sill et al., 2011), we estimate with the smallest regularization parameter value in the regularization region that ensures that . Thus, the component-wise estimate for is given by:


We estimate the right singular vectors and infer the non-zero coefficients or variables that form a variable cluster in a similar way. Specifically, for each possible , we draw subsamples without replacement and we estimate the selection probabilities for each variable as the number of times variable is selected from applications of equation 5. Denote the selection probability corresponding to for variable as . Given a threshold and the desired Type 1 error value , we obtain the regularization region such that , where . Then the stable set for , , are the non-zero coefficients or variables with selection probabilities at least . Given the smallest regularization parameter value in the regularization region, the component-wise estimate for is given by:


At convergence, the components of become , where is an indicator function. Similarly, the components of become .

The algorithm iterates between and until there is convergence. Refer to Algorithm 1 for more details. For convergence, we estimate 1) the relative difference between the objectives ( at previous and current iterations, and 2) . The algorithm converges if either 1) or 2) is less than a pre-specified threshold (e.g., 0.0001).

We note that our method, referred to as integrative sparse singular value decomposition (iSSVD), is different from the one proposed in Liu et al. (2016) because we are concerned with the problem of simultaneously identifying row (sample) and column (variable) clusters. As such, we regularize both and , while Liu et al. (Liu et al., 2016) regularize and apply -means clustering on after convergence. Our approach allows us to define subgroups in rows and columns of our data simultaneously and makes it appealing to identify sample subgroups characterized by specific groups of variables. Further, compared with several existing methods (Liu et al., 2016; Shen et al., 2009; Sun et al., 2014, 2015), we use stability selection to identify stable and robust sample and variable clusters, while controlling for Type 1 error of falsely selecting samples and variables in a bicluster. In addition, to ensure that each sample belong to only one subject cluster, the authors in Sun et al. (2014, 2015) and Sill et al. (2011) proposed to use only unclustered samples when estimating subsequent biclusters. This is concerning to us since smaller sample sizes are used to estimate subsequent biclusters. Instead, we track samples that are clustered and we assign weights to ensure that those samples have zero coefficients in subsequent biclusters. We do the same for variable clusters if it is desired to have non-overlapping variable clusters. Of note, concatenating the views and applying the biclustering method with stability selection proposed by Sill et al. Sill et al. (2011) assumes that the regularization parameters are the same for each view, i.e., . This assumption may result in choosing tuning parameters that are either too small or too large for a particular view; this can lead to a solution that is trivial or not sparse, and can inflate Type 1 error (refer to Supplementary Material for more details).
Remark 1 Point-wise Control: Searching for a plausible range of tuning parameters is computationally demanding. For computational efficiency, we implemented the point-wise control methods (Meinshausen and Bühlmann, 2010; Sill et al., 2011). Specifically, we considered a path that shortens the time to find the optimal or . We adopted the point-wise error control approach implemented in s4vd (Sill et al., 2011) and expanded it to be feasible for multi-view data. For example, if solving for , we can look for a regularization path with a single tuning parameter and draw subsamples to calculate the average number of selected coefficient , then we can estimate the selection threshold by


We define a region for the threshold to be and we search for a that falls into this range. We start from the median value of the tuning parameter range, and this range is updated based on the reconstructed threshold. Next calculation uses the median value of the new tuning range and this continues until the reconstructed threshold satisfies the aforementioned range. Thus, instead of calculating the entire stability paths in each iteration, the algorithm finds appropriate parameters with fewer calculations. We incorporated this algorithm into iSSVD and compared the run time with s4vd.

Remark 2 Choosing the number of biclusters: To choose the number of biclusters (i.e., ), we implemented the following approaches: a.) For each view, we calculate the proportions of variation explained by its singular values and select the number of singular values associated with the variation proportion that is larger than a threshold (for example larger than 70 percent); then we set the number of biclusters to be the maximum number of singular values plus one. b.) The user can specify the number of biclusters to be detected beforehand. Then the algorithm will set the maximum number of biclusters to be the smaller number from either a.) or b.). Table S2.3 gives the proportion of the true number of biclusters detected using the first criteria.

Input: , (optional), Type 1 errors and , Output: , , ,   and are vectors of bicluster membership. Samples that are not clustered will form a zero cluster.
1 Initialize: Apply standard SVD to the concatenated data . Let be the first SVD triplet. (Note: ). repeat
2          Solve for , and hence begin
3                   (a) Draw subsamples and estimate . Define such that . (b) Estimate the set of non-zero sample coefficients, (c) Solve for using equation (10). Let , and (d) Set
4          end
6         for  do
7                   Solve for , and hence begin
8                            (a) Draw subsamples and estimate . Define such that . (b) Estimate the set of non-zero variable coefficients, (c) Solve for using equation (11). Let , (d) Set ;
9                   end
11          end for
until  convergenceSet ;  ; For subsequent biclusters repeat steps 3 to 22 using deflated data. Stop if either or
Algorithm 1 Integrative Sparse Singular Value Decomposition Algorithm with Stability Selection (iSSVD)

3 Simulations

We consider two main scenarios to assess the proposed method in detecting biclusters from multi-view data. In both Scenarios, we simulate two views and . In Scenario One, we allow for some samples to not belong to any sample cluster. In Scenario Two, each sample belongs to a bicluster. This Scenario is especially relevant in disease subtyping where it is important that each sample belongs to only one sample cluster. In each Scenario, we generate fifty Monte Carlo datasets for each view. The parameter settings used can be found in supplementary material Table S1.5.

3.1 Scenario One

In this Scenario, data matrices and are generated, respectively, as


Hence, the concatenated data is . We set the number of biclusters and the dimensions of data to be and . Each bicluster has 10 rows and 100 columns. As such, there are only 40 samples that belong to the sample clusters with the remaining 60 samples not belonging to any cluster. Similarly, there are 400 signal variables characterizing the sample clusters, and the remaining 600 variables are noise. The left singular matrix is the common left singular matrix for the two views. Since we design four integrative biclusters, we randomly select 10 rows in each column of the first four columns in matrix

and assign data values generated from a uniform distribution

, while ensuring there is no overlapping samples. The remaining rows in the first four columns are assigned zero values. For the entries of the remaining

columns, we use data generated from the normal distribution with mean 0 and variance 1. We obtain the right singular matrix for each of the two views,

, as follows. We randomly select 100 rows in each column of the first 4 columns in matrix and fill the corresponding elements from the uniform distribution , while also ensuring there are no overlapping rows (correspondingly no overlapping variables characterizing the sample clusters). As before, we fill out the entries of the remaining columns in with data generated from the standard normal distribution. For the singular values , we set the first four entries to 27, 20, 18 and 10 respectively, and the rest with small values . Thus, where ; these singular values were chosen to make the biclusters detectable. Therefore, each view is reconstructed as . We then add random noise generated from a normal distribution with mean 0 and variance to each view. We will assess the performance of the method for small to large variances. Since the two views often tend to have different scales in real data analysis, we consider different scalings for , as done in Liu et al. (2016).

3.1.1 Case 1

In this case, the two views have different scales. Specifically, the concatenated data is of the form where ; this allows us to investigate the performance of the proposed and existing methods in situations of unbalanced scales. This is commonly the case in multi-view data and it can be challenging for single-view methods.

3.1.2 Case 2

Case 2 is similar to Case 1 but we fix the scalar

and study the performance of a method under different levels of noise. We vary the standard deviation


3.1.3 Case 3

In Case 3 we expand the dimensions of data to and . Here, each bicluster has 50 rows and 200 columns. The remaining rows do not belong to any cluster, and the remaining , , variables are noise. We use this case to study the computational efficiency when .

3.2 Scenario Two

In the first Scenario, we allowed for sample overlaps. In this scenario, each sample belongs to a bicluster. As before, we have two views and they have dimensions , . There are four integrative biclusters for each view; each sample cluster has 50 samples, each variable cluster has 100 variables and the remaining variables are noise. The singular matrices are , and , and . The fifty entries for each column in are generated from the uniform distribution U(0.5,1), while ensuring that there are no overlapping entries. The remaining 150 entries in for each column is set to . For the right singular matrix, , , we generate 100 entries for each column from U(0.5,1). The remaining entries in each column are set to zero. For the singular values, we set it to 27, 20, 18 and 10; thus . The entries in and are all generated as random samples from , where .

3.3 Competing Methods

We compare the performance of our method with four biclustering methods for multi-view data and one biclustering method with stability selection developed for data from one view. For the multi-view-based methods, we consider the proximal co-clustering (Sun et al., 2015) [mvProx], the multi-view svd [mvSVD] (Sun et al., 2014), and the generalized biclustering (GBC) (Li et al., 2018) methods. We perform mvProx and mvSVD using the R-package mvcluster (Version 1.0). This R package includes the mvSVD method as well as two proximal co-clustering methods using -norm regularization [mvProxL1] and -norm regularization [mvProxL0] respectively. The algorithms from the R package mvcluster (mvSVD, mvProxL0 and mvProxL1) detect one integrative bicluster at a time. To detect subsequent biclusters for the mvcluster (mvSVD, mvProxL0 and mvProxL1) methods, we follow suggestions in Sun et al. (2015, 2014) and we manually subset the data after each run and we delete the samples (rows) that are detected previously. We stack the views when applying the biclustering with stability selection [s4vd] method, and we use the R-package s4vd

(Version 1.1.1). GBC is a Bayesian biclustering method for detecting biclusters from multiple views that allow each view to have a different probability distribution. Please refer to Section 1 in the supplementary material for description of these methods. The simulations have been carried out using the Minnesota Supercomputing Institute Mangi compute cluster. Simulations of mvSVD, mvProxL0, mvProxL1, and s4vd have been implemented with R 4.0.4, and simulations of iSSVD have been implemented with Python 3.7.

3.4 Evaluation Criteria

We evaluate the proposed and existing methods based on bicluster similarity measures, F-score, and variable and sample selection. These are widely used measures in the statistical and machine learning literature for assessing biclustering methods

(Sill et al., 2011; Sun et al., 2013). For similarity measures, (i.e., similarity between the algorithm-generated biclusters and true biclusters from the same data) we consider bicluster relevance and recovery, which are defined as follows.

Suppose is the set of estimated biclusters and is the set of true biclusters, each containing a set of columns and a set of rows . Let and where and , and

denotes the Cartesian product of the sets of rows and columns. Then the Jaccard index for two biclusters, each obtained from the Cartesian product is

Similarly as in Sill et al. (2011), the average relevance and recovery scores are defined as:

Essentially, the relevance score shows how well the detected biclusters represent the true ones while the recovery score evaluates to what degree the true biclusters are recovered by the algorithm.

For a combined effect of relevance and recovery measures, we consider the F-score, the harmonic mean of average relevance and recovery

(Sun et al., 2013):

For samples and variables selected, we consider false positives (FP) and false negatives (FN). The FP is defined as the ratio of number of falsely selected non-zero elements outside of true bicluster against number of elements in the true bicluster. Conversely, the FN is defined as the ratio of number of non-zeros computed by the algorithm in the true bicluster against the number of elements in the true bicluster.

3.5 Simulation Results

3.5.1 Scenario One

Unbalanced scales
In the first case of scenario one, we vary the scalar to be 1, 2, 5 and 10 to evaluate the methods in situations of unbalanced scales. The average recovery and relevance scores are shown in Figure 2 for fixed noise level . For scalar , i.e. both views on the same scale, iSSVD and s4vd had higher average recovery scores, whereas mvSVD and GBC had lower scores and mvProxL0 and mvProxL1 performed worse. The performance of iSSVD is slightly better and more stable than s4vd based on the scores of average recovery, since the median is higher and the variability is lower. From the relevance scores (Figure 2, right panel), s4vd has a higher performance. This is because iSSVD sometimes identified more than 4 biclusters, and even though it detected all true biclusters, it also assigned some noise to biclusters. As we increase the scales from 2 to 10, the two views become more and more unbalanced. We observe that the average recovery scores of iSSVD are still higher, suggesting that our method can perform well when data are unbalanced. However, for s4vd and mvSVD, their abilities to detect biclusters decrease dramatically; the performance of GBC increased for and but decreased for . Furthermore, mvProxL0 and mvProxL1 perform poorly in all the situations. The average relevance scores of these methods show a similar trend but mvProxL0 tend to be better than mvProxL1. The relevance scores of iSSVD are suboptimal compared to recovery scores. The relevance scores of s4vd are higher in less unbalanced scales () settings. However, when the two views are more unbalanced, the biclusters detected by iSSVD are more representative of the true biclusters. In the unbalanced scales, s4vd mostly is able to detect the biclusters from the more dominant view, which in this case is the second view. Supplementary Figures S4-S11 give results for Scenario 1, for Case 1 and 2, when data are standardized, centered, or scaled. Compared to Figures 2 and 3, Figures S4-S11 suggest that centering as well as standardizing data so that each variable has mean zero and variance one prior to implementing the biclustering algorithms result in poor bicluster detection performance. Further, scaling each variable to have variance 1 or each View to have Frobenius norm 1 and implementing iSSVD yield results that are comparable to the original data.

Figure 2: Simulation results for case 1. The boxplots show the distributions of average relevance and average recovery over 50 randomly simulated multi-view data. Scalar indicates the scalar multiplied with the second view .

Varying noise level
The results for case 2 are shown in Figure 3 and Table S1.4 in the Supplementary Material. For noise level 0.1, the average recovery score for iSSVD is almost 1 whereas that of other methods is considerably lower. This indicates that iSSVD is able to detect all four integrative biclusters in the data. As the noise level increases so that the data become more corrupted, the performance of all methods deteriorate. When is larger than 0.5, the median of average relevance of iSSVD is below 0.5. In noisy settings, the methods tend to detect more samples and variables outside the true biclusters as signals and propose them as biclusters, while detecting the true biclusters less frequently. When is larger than 0.8, the average relevance scores are almost close to 0, indicating that the abilities of these methods to detect biclusters that are representative of the true biclusters are largely impaired by noise. The average relevance scores show a similar trend, but these are lower than the recovery scores; the methods start to assign noise as other biclusters after detecting the true ones since we set the maximal number of biclusters to be detected as 5. In situations of noisy data, the algorithms tend to assign noise as biclusters and are less capable to detect true hidden structures.

Figure 3: Simulation results for case 2. The boxplots show the distributions of average relevance and recovery over 50 randomly dimulated multi-view data. Sigmas indicate the value of noise level . Here we fix the scalar to be . The values of these indices and F-scores, FP and FN rates are described in supplementary material Table S1.4.

Run times
Next we show the runtimes measured in seconds for the biclustering algorithms in Supplementary Table S1.1. The time is captured from running every data set in Case 1, i.e. with dimensions and , and averaged out to get the runtimes of each data set. The algorithms are carried out using the Minnesota Supercomputing Institute Mangi compute cluster. We use pointwise control for iSSVD and s4vd. The speed of iSSVD is comparable with mvProxL0, mvProL1 and mvSVD which are written in C++, but s4vd took approximately 10 times longer to finish running the same job. GBC took approximately 70 times longer to execute the same job. In case 3, we use ultra-high dimensional setting to evaluate the computational efficiency of the algorithms. The dimensions of the data are and . On average, iSSVD takes about 2 to 3 minutes for one Monte Carlo simulation while s4vd takes more than 10 minutes (Table S1.2). Furthermore, mvSVD, mvProxL0 and mvProxL1 all have difficulty to converge in this situation. GBC is computationally expensive and thus not practical to run for this case.  

3.5.2 Scenario Two

The simulation results of Scenario Two are shown in Figure 4. We set the maximum number of biclusters to be detected to 5, which is similar to Scenario 1. When the noise level , iSSVD has higher relevance and recovery scores; the medians are almost 1. When , almost all samples are put into their right clusters by iSSVD and only approximately 8 samples are unclustered when (Table S2.1). For s4vd, the recovery scores are high but the relevance scores are significantly lower. This is because s4vd can only detect one integrative bicluster, remaining about three fourths of the samples unclustered. The performance of mvSVD is also worse than iSSVD. It leaves about one fourth of the samples unclustered and its scores for relevance and recovery are only around 0.5. Other multi-view methods could not detect meaningful integrative biclusters in this scenario, similarly to scenario one. As the noise level increases, s4vd has an unstable performance, as seen by the large variation in the boxplots. Meanwhile, the medians of iSSVD remain the highest among the five methods compared. We can still observe downward trend in the scores as increases because the data become messier. We also note that the average relevance scores for s4vd increases, albeit lower than iSSVD, but it has at least half of the samples unclustered over the range of (Table S2.1). For iSSVD, about half of the samples are unclustered when . Note that mvSVD can assign many samples to biclusters (as observed from the number of unclustered samples in Table S2.1) but it still achieves lower average relevance and recovery scores, since the samples are not correctly assigned to the true sample clusters.

Simulation results for Scenarios One and Two suggest that the proposed method, iSSVD, is better at detecting true biclusters, in both balanced and unbalanced scales settings, when compared to existing multi-view biclustering methods. Further, the proposed method is better for unbalanced scale settings when compared to the biclustering method with stability selection that is applicable to data from one view.

Figure 4: Simulation results for Scenario Two. The boxplots show the distributions of average relevance and recovery over 50 randomly simulated multi-view data. Sigmas indicate the value of noise level . Each randomly simulated data is of size and . Each bicluster in one view has 50 rows and 100 columns. We ensure that every sample (row) belongs to one of the four biclusters and that there are no overlaps. The average number of unclustered samples, F-scores, FP and FN rates are described in supplementary material.

4 Real Data Analysis

Chronic obstructive pulmonary disease (COPD) is a chronic progressive disease that affects at least 16 million adults in the US and presents a substantial economic and social burden (Wheaton et al., 2015). Like many complex diseases, COPD is characterized by a high degree of heterogeneity with many patients differing in their prognosis and response to therapy. Although the most common cause of COPD is cigarette smoking, not all smokers go on to develop COPD. Identifying subgroups of people who may be at risk for developing COPD, and who are characterized by specific molecular features is important to understand the heterogeneity in COPD and to improve outcomes for COPD patients. For instance, in Chang et al. (2016), gene expression data and a network-based clustering method was used to identify molecular subtypes for COPD.

In this article, we carried out a study that integrated RNA sequencing and proteomics data from the COPDGene Study to identify COPD subgroups using the proposed biclustering method. The original data set has 1,317 proteins and 21,669 genes. There were 463 samples with proteomics, RNA sequencing and clinical/dempographic variables. Prior to applying the proposed and existing methods, we used univariate filtering to reduce the dimensionality of the data. In particular, we regressed each gene and protein with FEV (percent predicted force expiratory volume in one second), and we retained genes and proteins with p-value after adjustment for age, gender, and race. After filtering, we had 229 proteins and 4,415 genes for our analyses. The final datasets for the proteomics and RNA sequencing data were and , respectively.

We applied our algorithm and the other algorithms considered in the simulation section to the filtered data. s4vd was applied to the concatenated filtered RNA sequencing and proteomics data. For iSSVD and s4vd, we allowed the algorithms to detect the number of biclusters. We also did not allow for overlaps in rows and columns; i.e., each sample and variable is assigned to only one bicluster. This will allow us to identify distinct patient clusters that are characterized by distinct biomarkers. Four biclusters were detected by iSSVD and 10 samples were unclustered. s4vd did not detect any stable cluster. For mvProxL0, mvProxL1, and mvSVD, we manually subset samples that are clustered when obtaining subsequent biclusters. In the end, 5 biclusters were obtained for mvProxL0, with 13 samples that were not clustered. mvSVD and mvProxL1 detected 4 and 2 biclusters, respectively. One sample was not clustered for mvProxL1. GBC identified 4 biclusters but many samples identified by GBC belonged to multiple clusters. To avoid overlaps and to facilitate comparisons, we made random bicluster assignment for samples that overlapped. Please refer to the Supplementary Table S1.6 for our parameter settings. GBC identified 5 biclusters.

Clinical characteristics of sample clusters identified

For all methods, we assigned the unclustered samples to the detected biclusters as follows. For each method, we obtained the first principal component (PC) from principal component analysis (PCA) of data for each sample cluster detected. That is, for each sample cluster, we used only the variables characterizing that sample to obtain the principal components. The first PC for a sample cluster explains the maximum variation and can be used to summarize the information for that cluster. We correlated the first PC for each sample cluster with data for each sample that was not clustered. We then assigned that sample to the bicluster with the highest correlation. Thus, in the end, all samples belonged to one bicluster. The first iSSVD bicluster consisted of 109 samples, 213 genes, and 27 proteins. The second bicluster was made of 111 samples, 195 genes, and 25 proteins. The third bicluster had 127 samples, 165 genes, and 25 proteins. The fourth bicluster comprised of 116 samples, 169 genes, and 19 proteins.

Compared to iSSVD, all other competing methods with the exception of mvProxL0, did not find meaningful sample subgroups from the data. The sample clusters detected by mvProxL1, mvSVD and GBC were not differentiated on lung function (as measured by FEV/FVC, FEV [% predicted]), demographics/clinical variables, and symptoms (Tables S4.1 - and S4.4 in the supplementary material). The sample clusters detected by mvProxL0 showed differences in some variables, but were not different across key lung function variables such as FEV/FVC and FEV [% predicted]). Compared to mvProxL0, the sample clusters detected by our method showed differences in more variables.

The four sample clusters identified by iSSVD where well-differentiated on some key demographic, clinical, and outcome variables such as age, FEV/FVC ratio, and BODE index (Table 1). The FEV/FVC ratio is widely used to diagnose COPD. FEV (forced expiratory volume in one second) is the volume of breath exhaled in one second, and it is used to gauge severity of COPD. The BODE (body mass index, obstruction, dyspnea, and excercise) index is a multidimensional assessment of an individual’s risk of death (Celli et al., 2004); higher values suggest increased risk. From Table 1, individuals in Biclusters 1, 3, and 4 have lower lung function as depicted by FEV compared to individuals in Biluster 2. Bicluster 1 has a lower mean FEV/FVC ratio and lower mean FEV value, followed by Biclusters 3 and 4. Individuals in Bicluster 2 have better lung function as can be observed by the higher FEV/FVC ratio, FEV percent predicted and lower mean dyspnea and BODE index values. Using the terminology in Chang et al. (2016), we refer to Bicluster 1 as the “severely affected” group, Bicluster 3 as the “moderately affected” group, and “Bicluster 4” as the less affected group. Also, we refer to Bicluster 2 as the “preserved lung function” group since the mean FEV/FVC ratio and FEV values were on average within normal limits. Bicluster 1, the severely affected group, also had a higher BODE index and a higher dyspnea score. They were more likely to be old, males and they tended to be previous smokers. Bicluster 3, the moderately affected group, had the next highest dypsnea score, BODE index, and frequency of exacerbation, followed by Bicluster 4, the less affected group. The preserved lung function group, Bicluster 2, was predominantly females, who were less likely to have COPD as defined by FEV/FVC ratio, reported less symptoms (e.g., lower emphysema as measured by Thirona and Percent 15), had lower mean age, and tended to be current smokers.

Five-year follow-up data were available for individuals. We observe that the subgroups identified by iSSVD are again well-differentiated on some key outcomes, clinical variables and symptoms (Table S4.5). Biclusters 1, 3, and 4 again had lower FEV/FVC, lower FEV percent predicted, higher BODE index and higher dyspnea score. The group with preserved lung function (Bicluster 2) had lower dyspnea score, lower BODE index, better lung function, and were less likely to have visited the emergency room (ER) or to be hospitalized for lung problems.

Subtype-specific biologic pathway and disease enrichment analysis for variable clusters characterizing sample clusters identified by iSSVD

We used the Ingenuity Pathway Analysis (IPA) software to investigate the molecular and cellular functions, pathways, and diseases enriched in the proteins and genes identified for each bicluster. IPA searches the ingenuity pathway knowledge base, which is manually curated from scientific literature and over 30 databases, for gene interaction. We focused on the variable clusters identified by iSSVD since the sample subgroups identified by this method were well-differentiated on many clinical, demographic, and outcome variables. We observed strong enrichment of functional pathways (Supplementary Material Tables S4.6-S4.7). Some of the significantly enriched canonical pathways that mapped to the gene and protein lists for the severely affected group (Bicluster 1) included IL-8 signaling, mTOR pathway, and Intrinsic Prothrombin activation pathway. The mTOR signaling pathway is involved in many cellular processes such as cell growth, metabolism, and survival (Jewell and Guan, 2013). Research suggests that the activation of the mTOR signaling pathway can induce cell senescence in the lung, which in turn can result in COPD (Houssaini et al., 2018).

Some pathways enriched for the moderately affected group (Bicluster 3) included airway pathology in COPD and oxidative phosphorylation. Also, the inflammasome pathway, airway pathology in COPD, and IL-17 signaling pathways were over-represented in our gene and protein lists for the less affected group (Bicluster 4). Some pathways enriched for the preserved lung function group (Bicluster 2) included taurine biosynthesis and enhanced cardiac hypertrophy signaling pathway. While there were overlaps in some of the enriched pathways for the clusters, there were also unique pathways identified.

In addition to IPA canonical pathways, proteins and genes were also categorized to related diseases and functions. Again, there were some overlaps in the top 5 enriched diseases and functions for the clusters (Tables S4.11 - S4.12). The severely affected group (Bicluster 1) was characterized by diseases that included cardiovascular (such as atherosclerosis, ventricular dysfunction and peripheral vascular disease) and inflammatory (such as chronic inflammatory disorder) diseases. Some of the diseases characterizing the moderately affected group (Bicluster 3) included cardiovascular (such as infarction and ischemia of brain) and cancer. The less affected group (Bicluster 4) was characterized by neurological (including abnormal regeneration by peripheral nervous system and cerebrovascular dysfunction) and hereditary disorder. Also, the preserved lung function group (Bicluster 2) was characterized by inflammatory response (such as inflammation of lung) and cancer. We also used IPA for network analysis to connect key genes, proteins, and enriched categories of diseases and functions. Our results showed that the severely affected group was characterized by the cardiovascular disease, organismal injury and abnormalities, hematological system development and function (based on our protein list). The preserved lung function group was characterized by the dermatological diseases and conditions, organismal injury and abnormalities, organismal development network (from our protein list).

Bicluster (N) 1 (109) 2 (111) 3(127) 4 (116) P-value
Demographics and Clinical
Age (years) 71.73 (7.00) 63.28 (8.21) 66.80 (8.47) 67.85 (8.04) <.00001
Body Mass Index 29.46 (5.80) 29.17 (6.01) 28.34 (6.56) 29.77 (6.55) 0.2333
Pack Years 50.77 (26.53) 39.57 (19.85) 43.94 (24.85) 46.46 (26.69) 0.01433
Duration of smoking (years) 37.82 (12.01) 36.22 (12.11) 36.60 (11.58) 36.76 (12.36) 0.8997
    Males 65 (60%) 52 (47%) 58 (46%) 60 (52%) 0.1402
    Females 44 (40%) 59 (53%) 69 (54%) 56 (48%)
Smoking Status
    Former 94 (86%) 66 (59%) 98 (77%) 94 (81%) <0.0001
    Current 15 (14%) 45 (41%) 29 (23%) 22 (19%)
Dyspnea Score (MMRC) 1.34 (1.45) 0.77 (1.17) 1.18 (1.28) 1.02 (1.27) 0.0048
BODE Index 2.20 (2.44) 0.90 (1.51) 1.48 (2.07) 1.18 (1.64) 0.0005
FEV/FVC 0.60 (0.17) 0.72(0.11) 0.67(0.16) 0.66(0.15) <0.00001
FEV (% predicted) 70.43(27.88) 85.23 (21.04) 78.63 (25.75) 77.07 (24.51) 0.0026
COPD status
    No 35 (34%) 56 (57%) 57 (51%) 48 (47%) 0.0096
    Yes 67 (66%) 42 (43%) 54 (49%) 54 (53%)
Exacerbation Frequency 0.29 (0.86) 0.19 (0.56) 0.20 (0.58) 0.16 (0.49) 0.6571
Percent Emphysema (Thirona) 9.66 (11.19) 3.45 (5.25) 8.52 (12.76) 7.26 (10.52) <0.0001
Percent 15 -931.71 (25.00) -919.20 (19.48) -926.60 (28.42) -925.15 (25.57) 0.0021
Ever had Asthma
    No 104 (95.4%) 109 (98.2%) 125 (98.4%) 112 (96.6%) 0.4798
    Yes 5 (4.6%) 2 (1.8%) 2 (1.6%) 4 (3.4%)
Gastroesophageal Reflux
    No 69 (63%) 86 (77%) 79 (62%) 74 (64%) 0.0473
    Yes 40 (37%) 25 (23%) 48 (38%) 42 (36%)
Been to ER or hospitalised for lung problems
    No 98 (90%) 106 (95.5%) 116 (91.3%) 108 (93.1%) 0.4274
    Yes 11 (10%) 5 (4.5%) 11 (8.7%) 8 (6.9%)
Table 1:

Clinical characteristics of iSSVD biclusters of COPD data. Values are mean (SD) for continuous variables, and N (percentages) for binary/categorical variables. P-values are from comparing all four groups using Kruskal-Wallis test for continuous variables, and Chi-square test for categorical variables. MMRC- modified Medical Research Council dyspnea score (0-4, 4= most severe symptoms). GOLD stages 1-4 are combined to form COPD cases; GOLD stage 0 form COPD controls.

5 Discussion and Conclusion

In this manuscript, we extended existing biclustering method based on sparse singular value decomposition for data from one view (Lee et al., 2010) to data from multiple views. Our method followed ideas in Sill et al. (2011) and incorporated stability selection (Meinshausen and Bühlmann, 2010), a subsampling based variable selection that allows to control Type I error rates. The proposed algorithm estimates the probability of samples and variables to belong to a bicluster, finds stable biclusters, and results in interpretable row-column associations. The proposed algorithm, developed in Python 3, is computationally efficient and user-friendly and will be useful in many disease subtyping applications. Through simulation studies, we showed that the proposed method outperforms several other single- and multi-view biclustering methods in detecting artificial biclusters.

When our method was applied to RNA sequencing and proteomics data from the COPDGene study, we detected four biclusters that were well-differentiated by some demographics and clinical variables as well as key COPD outcomes. Three biclusters which we call severely, moderately, and less affected groups, seemed to have poor lung function and clinical outcomes, while one bicluster, which we call preserved lung function group seemed to have better (preserved) lung function and clinical outcomes. We also performed enrichment analysis of the genes and proteins characterizing the sample clusters. While certain biological processes were most enriched in specific biclusters, there was also notable overlap in processes across biclusters. Particularly enriched molecular and cellular functions included cellular movement, cellular growth and proliferation and cell death and survival.

The limitations of our work include the following. First, when applied to real data sets, a smaller preset Type-I error rate tends to yield small biclusters, which results in more unclustered samples. However, increasing the error rate might compromise the strength against noise. Second, we mention ways to choose the number of biclusters but these could be improved, especially in situations of noisy data. Third, we noticed that under regular settings, the run times of iSSVD and s4vd tend to be considerably longer than point-wise control settings. However, since we are not searching for the entire range of regularization parameters in point-wise control, it is likely that we overlook the optimal regularization parameters. Thus, future work can seek to improve the computational time to accommodate the computational demands for stability selection that searches the entire range of regularization parameters.

In conclusion, we have developed a biclustering method for multi-view data capable of detecting stable row and column clusters. The encouraging simulation and real data findings motivate further applications to identify disease subtypes and subtype-specific molecular features.

Funding and Acknowledgements

The project described was supported by grant 5KL2TR002492-03 from the National Institutes of Health and by Award Number 1R35GM142695-01 from the National Institute Of General Medical Sciences, Award Number U01 HL089897 and Award Number U01 HL089856 from the National Heart, Lung, and Blood Institute. COPDGene is also supported by the COPD Foundation through contributions made to an Industry Advisory Board that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer- Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion.

Disclaimer: The views expressed in this article are those of the authors and do not reflect the views of the United States Government, the Department of Veterans Affairs, the funders, the sponsors, or any of the authors’ affiliated academic institutions.

Declaration of Conflicting Interests: The Authors declare that there is no conflict of interest.


  • Henriques and Madeira (2018) Henriques R and Madeira SC (2018) Triclustering algorithms for three-dimensional data analysis: A comprehensive survey. ACM Comput. Surv. 51(5). DOI:10.1145/3195833. URL
  • Getz et al. (2000)

    Getz G, Levine E and Domany E (2000) Coupled two-way clustering analysis of gene microarray data.

    Proceedings of the National Academy of Sciences 97(22): 12079–12084.
  • Ben-Dor et al. (2002) Ben-Dor A, Chor B, Karp R and Yakhini Z (2002) Discovering local structure in gene expression data: the order-preserving submatrix problem. In: Proceedings of the sixth annual international conference on Computational biology. pp. 49–57.
  • Prelić et al. (2006) Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L and Zitzler E (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9): 1122–1129.
  • Pandey et al. (2009) Pandey G, Atluri G, Steinbach M, Myers CL and Kumar V (2009) An association analysis approach to biclustering. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 677–686.
  • Huttenhower et al. (2009) Huttenhower C, Mutungu KT, Indik N, Yang W, Schroeder M, Forman JJ, Troyanskaya OG and Coller HA (2009) Detailing regulatory networks through large scale data integration. Bioinformatics 25(24): 3267–3274.
  • Li et al. (2009) Li G, Ma Q, Tang H, Paterson AH and Xu Y (2009) Qubic: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic acids research 37(15): e101–e101.
  • Xie et al. (2020) Xie J, Ma A, Zhang Y, Liu B, Cao S, Wang C, Xu J, Zhang C and Ma Q (2020) Qubic2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale rna-seq data. Bioinformatics 36(4): 1143–1149.
  • Tanay et al. (2002) Tanay A, Sharan R and Shamir R (2002) Discovering statistically significant biclusters in gene expression data. Bioinformatics 18(suppl_1): S136–S144.
  • Hochreiter et al. (2010) Hochreiter S, Bodenhofer U, Heusel M, Mayr A, Mitterecker A, Kasim A, Khamiakova T, Van Sanden S, Lin D, Talloen W et al. (2010) Fabia: factor analysis for bicluster acquisition. Bioinformatics 26(12): 1520–1527.
  • Gao et al. (2016) Gao C, McDowell IC, Zhao S, Brown CD and Engelhardt BE (2016) Context specific and differential gene co-expression networks via bayesian biclustering. PLoS computational biology 12(7): e1004791.
  • Chi et al. (2017) Chi EC, Allen GI and Baraniuk RG (2017) Convex biclustering. Biometrics 73(1): 10–19.
  • Li et al. (2018) Li Z, Chang C, Kundu S and Long Q (2018) Bayesian generalized biclustering analysis via adaptive structured shrinkage. Biostatistics 21(3): 610–624. DOI:10.1093/biostatistics/kxy081. URL
  • Lazzeroni and Owen (2002) Lazzeroni L and Owen A (2002) Plaid models for gene expression data. Statistica sinica : 61–86.
  • Lee et al. (2010) Lee M, Shen H, Huang JZ and Marron J (2010) Biclustering via sparse singular value decomposition. Biometrics 66(4): 1087–1095.
  • Sill et al. (2011) Sill M, Kaiser S, Benner A and Kopp-Schneider A (2011) Robust biclustering by sparse singular value decomposition incorporating stability selection. Bioinformatics (Oxford, England) 27: 2089–97. DOI:10.1093/bioinformatics/btr322.
  • Tepper and Sapiro (2016) Tepper M and Sapiro G (2016) Compressed nonnegative matrix factorization is fast and accurate. IEEE Transactions on Signal Processing 64(9): 2269–2283. DOI:10.1109/TSP.2016.2516971.
  • Liang et al. (2020) Liang L, Zhu K and Lu S (2020) Bem: Mining coregulation patterns in transcriptomics via boolean matrix factorization. Bioinformatics 36(13): 4030–4037.
  • Sun et al. (2013) Sun H, Miao G and Yan X (2013) Noise-resistant bicluster recognition. In: 2013 IEEE 13th International Conference on Data Mining. IEEE, pp. 707–716.
  • Li et al. (2020) Li J, Reisner J, Pham H, Olafsson S and Vardeman S (2020) Biclustering with missing data. Information Sciences 510: 304–316.
  • Vandromme et al. (2020) Vandromme M, Jacques J, Taillard J, Jourdan L and Dhaenens C (2020) A biclustering method for heterogeneous and temporal medical data. IEEE Transactions on Knowledge and Data Engineering : 1–1DOI:10.1109/TKDE.2020.2983692.
  • Zhao et al. (2017) Zhao J, Xie X, Xu X and Sun S (2017) Multi-view learning overview: Recent progress and new challenges. Information Fusion 38: 43–54.
  • Sun et al. (2014) Sun J, Bi J and Kranzler HR (2014) Multi-view singular value decomposition for disease subtyping and genetic associations. BMC genetics 15(1): 73.
  • Sun et al. (2015) Sun J, Lu J, Xu T and Bi J (2015) Multi-view sparse co-clustering via proximal alternating linearized minimization. In: International Conference on Machine Learning. pp. 757–766.
  • Bunte et al. (2016) Bunte K, Leppäaho E, Saarinen I and Kaski S (2016) Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 32(16): 2457–2463.
  • Zou (2006) Zou H (2006) The adaptive lasso and its oracle properties. Journal of the American statistical association 101(476): 1418–1429.
  • Meinshausen and Bühlmann (2010) Meinshausen N and Bühlmann P (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(4): 417–473.
  • Regan et al. (2011) Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, Curran-Everett D, Silverman EK and Crapo JD (2011) Genetic epidemiology of copd (copdgene) study design. COPD: Journal of Chronic Obstructive Pulmonary Disease 7(1): 32–43.
  • Liu et al. (2016) Liu B, Shen X and Pan W (2016) Integrative and regularized principal component analysis of multiple sources of data. Statistics in medicine 35(13): 2235–2250.
  • Tibshirani (1996) Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1): 267–288.
  • Schwarz et al. (1978) Schwarz G et al. (1978) Estimating the dimension of a model. The annals of statistics 6(2): 461–464.
  • Tibshirani (2011) Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(3): 273–282.
  • Dudoit et al. (2003) Dudoit S, Shaffer JP and Boldrick JC (2003) Multiple hypothesis testing in microarray experiments. Statistical Science : 71–103.
  • Shen et al. (2009) Shen R, Olshen AB and Ladanyi M (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25(22): 2906–2912.
  • Wheaton et al. (2015) Wheaton AG, Cunningham TJ, Ford ES and Croft JB (2015) Employment and activity limitations among adults with chronic obstructive pulmonary disease—united states, 2013. MMWR. Morbidity and mortality weekly report 64(11): 289.
  • Chang et al. (2016) Chang Y, Glass K, Liu YY, Silverman EK, Crapo JD, Tal-Singer R, Bowler R, Dy J, Cho M and Castaldi P (2016) Copd subtypes identified by network-based clustering of blood gene expression. Genomics 107(2-3): 51–58.
  • Celli et al. (2004) Celli BR, Cote CG, Marin JM, Casanova C, Montes de Oca M, Mendez RA, Pinto Plata V and Cabral HJ (2004) The body-mass index, airflow obstruction, dyspnea, and exercise capacity index in chronic obstructive pulmonary disease. New England Journal of Medicine 350(10): 1005–1012.
  • Jewell and Guan (2013) Jewell JL and Guan KL (2013) Nutrient signaling to mtor and cell growth. Trends in biochemical sciences 38(5): 233–242.
  • Houssaini et al. (2018) Houssaini A, Breau M, Kanny Kebe SA, Marcos E, Lipskaia L, Rideau D, Parpaleix A, Huang J, Amsellem V, Vienney N et al. (2018) mtor pathway activation drives lung cell senescence and emphysema. JCI insight 3(3).

Data Availability

The data used were provided by the COPDGene Study group. COPDGene clinical and RNASeq data are available on dbGap. We provide a Python package, iSSVD, to facilitate the use of our method. Its source codes, along with a README file are available via this link:

Supplementary Data

In the online Supplementary Materials, we provide more results from simulations and real data analyses.