MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

09/13/2018
by Amichai Painsky, et al.

The availability of large microarray data sets has led to growing interest in biclustering methods over the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem for gene expression data sets, in which each row can be a member of only a single bicluster while columns can participate in multiple ones. This type of biclustering may be adequate, for example, for clustering groups of cancer patients, where each patient (row) is expected to carry only a single type of cancer while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters through a combination of existing biclustering algorithms and combinatorial auction techniques. We devise an approach for tuning the threshold of our algorithm based on comparison to a null model, in the spirit of the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while considering their unique nature. The Gap statistic approach succeeds in identifying appropriate thresholds in all our examples.
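The exclusive row constraint described above can be illustrated with a small sketch. It assumes that candidate biclusters have already been produced by some existing biclustering algorithm and that each carries a numeric value (its "bid"); the function name `select_exclusive_biclusters`, the tuple layout, and the toy values are hypothetical and are not taken from the paper. The exhaustive search below is only a stand-in for a combinatorial auction winner-determination step, not the authors' algorithm, and it is practical only for a handful of candidates.

```python
from typing import FrozenSet, List, Tuple

# A candidate bicluster: (row indices, column indices, value).
# The value plays the role of a bid; how it is computed is an assumption here.
Candidate = Tuple[FrozenSet[int], FrozenSet[int], float]


def select_exclusive_biclusters(
    candidates: List[Candidate],
) -> Tuple[float, List[int]]:
    """Pick candidates with pairwise disjoint row sets and maximal total value.

    Columns may be shared freely between the selected biclusters.
    Exhaustive depth-first search; exponential in the number of candidates.
    """
    best_value = 0.0
    best_choice: List[int] = []

    def search(start: int, used_rows: FrozenSet[int], value: float,
               chosen: List[int]) -> None:
        nonlocal best_value, best_choice
        if value > best_value:
            best_value, best_choice = value, list(chosen)
        for j in range(start, len(candidates)):
            rows, _cols, bid = candidates[j]
            # Exclusive row constraint: a row may belong to one bicluster only.
            if used_rows.isdisjoint(rows):
                chosen.append(j)
                search(j + 1, used_rows | rows, value + bid, chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best_value, best_choice


# Toy input: four candidates over five rows (patients) and three columns (genes).
toy_candidates: List[Candidate] = [
    (frozenset({0, 1, 2}), frozenset({0, 1}), 5.0),
    (frozenset({2, 3}), frozenset({1, 2}), 4.0),
    (frozenset({3, 4}), frozenset({0, 1}), 3.0),
    (frozenset({0, 1}), frozenset({2}), 2.0),
]

total_value, selected = select_exclusive_biclusters(toy_candidates)
print(total_value, selected)
```

In this toy input the first and third candidates share columns but no rows, so they can be selected together, whereas the first and second cannot because both contain row 2.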

research
09/09/2020

Biclustering with Alternating K-Means

Biclustering is the task of simultaneously clustering the rows and colum...
research
08/30/2019

Network Elastic Net for Identifying Smoking specific gene expression for lung cancer

Survival month for non-small lung cancer patients depend upon which stag...
research
02/08/2020

Conjoined Dirichlet Process

Biclustering is a class of techniques that simultaneously clusters the r...
research
02/07/2020

Bidimensional linked matrix factorization for pan-omics pan-cancer analysis

Several modern applications require the integration of multiple large da...
research
11/30/2021

SurvODE: Extrapolating Gene Expression Distribution for Early Cancer Identification

With the increasingly available large-scale cancer genomics datasets, ma...
research
11/10/2017

A Novel Bayesian Multiple Testing Approach to Deregulated miRNA Discovery Harnessing Positional Clustering

MicroRNAs (miRNAs) are endogenous, small non-coding RNAs that function a...
research
09/25/2017

Mining a Sub-Matrix of Maximal Sum

Biclustering techniques have been widely used to identify homogeneous su...