Retrieval of Experiments with Sequential Dirichlet Process Mixtures in Model Space

10/08/2013
by Ritabrata Dutta, et al.

We address the problem of retrieving relevant experiments given a query experiment. The problem is motivated by public databases of datasets in molecular biology and other experimental sciences, and by scientists' need to relate their work to earlier studies at the level of the actual measurement data. Since experiments are inherently noisy and databases keep growing, we argue that a retrieval engine should have two characteristics. First, it should compare models learnt from the experiments rather than the raw measurements themselves; this allows experiment-specific prior knowledge to be incorporated to suppress noise and focus on what is important. Second, it should be updated sequentially as new experiments are published, without explicitly storing either the measurements or the models, which saves storage space and protects data privacy; this promotes lifelong learning. We formulate retrieval as a "supermodelling" problem: sequentially learning a model of the set of posterior distributions, each represented as a set of MCMC samples. For this purpose we propose a sequential Dirichlet process mixture (DPM) based on Particle Learning. The relevance measure for retrieval is derived from the supermodel through its mixture representation. We demonstrate the performance of the proposed retrieval method on simulated data and on molecular biology experiments.
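The supermodelling idea can be illustrated with a small sketch. It is hypothetical and is not the authors' method. It swaps Particle Learning for a simpler greedy one-pass assignment in the spirit of sequential updating and greedy search (SUGS). It assumes 1-D posterior samples, Gaussian components with a known variance `sigma2`, and a conjugate Normal(`mu0`, `tau2`) prior on each component mean. It replaces the paper's relevance measure with a Bhattacharyya coefficient between cluster-occupancy profiles. The class name `SequentialDPM`, its methods, and the per-experiment histogram it keeps are all illustrative assumptions. Each experiment's samples are absorbed into shared cluster sufficient statistics (count, sum) and then discarded.

```python
import math
import random


class SequentialDPM:
    """Hypothetical one-pass DPM "supermodel" over 1-D posterior samples.

    Assumptions (not from the paper): Gaussian components with known
    variance sigma2, a Normal(mu0, tau2) prior on each component mean,
    and greedy MAP assignment (SUGS-style) instead of Particle Learning.
    Only per-cluster sufficient statistics are kept, never the samples.
    """

    def __init__(self, alpha=1.0, sigma2=0.25, mu0=0.0, tau2=10.0):
        self.alpha, self.sigma2, self.mu0, self.tau2 = alpha, sigma2, mu0, tau2
        self.counts = []    # number of samples assigned to each cluster
        self.sums = []      # sum of the samples assigned to each cluster
        self.profiles = {}  # experiment id -> normalized cluster histogram

    def _log_pred(self, x, n_k, s_k):
        # Log posterior-predictive density of x for a cluster with stats (n_k, s_k)
        prec = 1.0 / self.tau2 + n_k / self.sigma2
        mean = (self.mu0 / self.tau2 + s_k / self.sigma2) / prec
        var = 1.0 / prec + self.sigma2
        return -0.5 * ((x - mean) ** 2 / var + math.log(2 * math.pi * var))

    def _log_weights(self, x):
        # Chinese-restaurant-process prior times predictive; last entry = new cluster
        lw = [math.log(n) + self._log_pred(x, n, s)
              for n, s in zip(self.counts, self.sums)]
        lw.append(math.log(self.alpha) + self._log_pred(x, 0, 0.0))
        return lw

    def add_experiment(self, exp_id, samples):
        """Absorb one experiment's samples into the supermodel, then drop them."""
        hist = {}
        for x in samples:
            lw = self._log_weights(x)
            k = max(range(len(lw)), key=lw.__getitem__)  # greedy MAP choice
            if k == len(self.counts):  # open a new cluster
                self.counts.append(0)
                self.sums.append(0.0)
            self.counts[k] += 1
            self.sums[k] += x
            hist[k] = hist.get(k, 0) + 1
        self.profiles[exp_id] = {k: c / len(samples) for k, c in hist.items()}

    def _soft_profile(self, samples):
        # Average responsibilities over existing clusters; the model is not updated
        K = len(self.counts)
        avg = [0.0] * K
        for x in samples:
            lw = self._log_weights(x)[:K]
            m = max(lw)
            w = [math.exp(v - m) for v in lw]  # log-sum-exp normalization
            z = sum(w)
            for k in range(K):
                avg[k] += w[k] / z / len(samples)
        return avg

    def retrieve(self, query_samples):
        """Rank stored experiments by Bhattacharyya coefficient to the query."""
        q = self._soft_profile(query_samples)
        scores = {eid: sum(math.sqrt(q[k] * p) for k, p in prof.items())
                  for eid, prof in self.profiles.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage: two simulated "experiments" whose posteriors sit at different locations
random.seed(0)
model = SequentialDPM()
model.add_experiment("A", [random.gauss(-2.0, 0.3) for _ in range(200)])
model.add_experiment("B", [random.gauss(3.0, 0.3) for _ in range(200)])
ranking = model.retrieve([random.gauss(2.9, 0.3) for _ in range(100)])
print(ranking[0][0])  # prints "B"
```

The greedy assignment keeps the update cost independent of how many experiments have been seen. This mirrors the storage and privacy motivation above, but it loses the multiple-hypothesis tracking that a particle-based method provides.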

research
02/19/2014

Retrieval of Experiments by Efficient Estimation of Marginal Likelihood

We study the task of retrieving relevant experiments given a query exper...
research
08/19/2022

Real and simulated CBM data interacting with an ESCAPE datalake

Integration of the ESCAPE and CBM software environment. The ESCAPE datal...
research
06/18/2022

IID Sampling from Posterior Dirichlet Process Mixtures

The influence of Dirichlet process mixture is ubiquitous in the Bayesian...
research
10/16/2012

Hilbert Space Embedding for Dirichlet Process Mixtures

This paper proposes a Hilbert space embedding for Dirichlet Process mixt...
research
06/04/2020

Characteristics of Dataset Retrieval Sessions: Experiences from a Real-life Digital Library

Secondary analysis or the reuse of existing survey data is a common prac...
research
05/19/2015

Modelling-based experiment retrieval: A case study with gene expression clustering

Motivation: Public and private repositories of experimental data are gro...
research
09/29/2014

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

We develop a sequential low-complexity inference procedure for Dirichlet...
