Modelling-based experiment retrieval: A case study with gene expression clustering

05/19/2015
by Paul Blomstedt, et al.

Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches over annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, which requires each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good at modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.

Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework in which each experiment is modelled separately and retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called a product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Finally, empirical results suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply when constructing a general-purpose gene expression experiment retrieval method.

Availability: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages.
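The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every experiment has already been reduced to a clustering of the same ordered set of genes (for example by k-means), and that the normalized information distance between two clusterings U and V is computed as 1 - I(U, V) / max(H(U), H(V)), a common definition for comparing partitions. The function names, the toy cluster labels, and the experiment names (`exp_a`, `exp_b`, `exp_c`) are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two clusterings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_information_distance(a, b):
    """Assumed definition: NID = 1 - I(a, b) / max(H(a), H(b))."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same genes")
    h_max = max(entropy(a), entropy(b))
    if h_max == 0.0:
        # Both clusterings put every gene in one cluster, so they coincide.
        return 0.0
    nid = 1.0 - mutual_information(a, b) / h_max
    # Clamp tiny floating-point excursions outside [0, 1].
    return min(1.0, max(0.0, nid))


def rank_experiments(query_labels, database):
    """Return (distance, name) pairs sorted from most to least similar."""
    return sorted(
        (normalized_information_distance(query_labels, labels), name)
        for name, labels in database.items()
    )


# Hypothetical toy data: cluster labels for six genes in each experiment.
query = [0, 0, 1, 1, 2, 2]
database = {
    "exp_a": [1, 1, 0, 0, 2, 2],  # same partition as the query, relabelled
    "exp_b": [0, 1, 0, 1, 0, 1],  # statistically independent of the query
    "exp_c": [0, 0, 0, 1, 1, 1],  # partially overlapping partition
}

for distance, name in rank_experiments(query, database):
    print(f"{name}: {distance:.3f}")
```

Because the distance depends only on how genes are grouped, not on the label values, relabelling the clusters of an experiment leaves its rank unchanged. Stored experiments only need to keep one label vector per experiment, which is what makes this kind of comparison cheap at query time.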
