Non-parametric Bayesian modelling of digital gene expression data

01/17/2013
by Dimitrios V. Vavoulis, et al.

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number, or even absence, of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often relied on modified versions of tests originally devised for analysing microarrays; both these and methods developed de novo for RNA-seq data suffer from the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of the model parameters conditional on these counts. The algorithm compensates for the low number of biological replicates by clustering together genes whose tag counts are likely sampled from a common distribution and using this augmented sample to estimate the parameters of that distribution. The number of clusters is not fixed a priori but is inferred along with the remaining model parameters. We demonstrate that this approach models biological data with high fidelity by applying the algorithm to a public dataset obtained from cancerous and non-cancerous neural tissues.
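To make the clustering idea concrete, here is a minimal Python sketch of a blocked Gibbs sampler for a truncated Dirichlet-process mixture, assuming the usual stick-breaking construction truncated at `K` components. It is not the authors' model: for conjugacy it swaps in a Poisson likelihood with a Gamma prior on each cluster's rate, so it does not capture within-gene over-dispersion the way the paper's hierarchical model does. The function name `blocked_gibbs` and all parameter defaults (`K`, `alpha`, `a`, `b`, `n_iter`, `seed`) are hypothetical. Each sweep updates, in turn, the mixture weights, the cluster rates and the gene-to-cluster labels, so genes that share a cluster pool their replicates when that cluster's rate is estimated.

```python
import math
import random


def blocked_gibbs(counts, K=20, alpha=1.0, a=1.0, b=0.01, n_iter=200, seed=0):
    """Blocked Gibbs sampler for a truncated Dirichlet-process mixture of
    Poisson distributions (illustrative only, not the paper's model).

    counts: list of per-gene lists of replicate tag counts.
    K:      truncation level of the stick-breaking prior (hypothetical default).
    alpha:  DP concentration parameter.
    a, b:   shape and rate of the Gamma prior on each cluster's Poisson rate.
    Returns the cluster label of every gene after the final sweep.
    """
    rng = random.Random(seed)
    n_genes = len(counts)
    sums = [sum(y) for y in counts]    # total tag count per gene
    sizes = [len(y) for y in counts]   # number of replicates per gene
    z = [rng.randrange(K) for _ in range(n_genes)]  # random initial labels

    for _ in range(n_iter):
        # 1) Stick-breaking weights given the current labels.
        n_k = [0] * K
        for k in z:
            n_k[k] += 1
        weights, remaining = [], 1.0
        for k in range(K):
            if k == K - 1:
                v = 1.0  # truncation: the last stick takes the leftover mass
            else:
                v = rng.betavariate(1.0 + n_k[k], alpha + sum(n_k[k + 1:]))
            weights.append(remaining * v)
            remaining *= 1.0 - v

        # 2) Cluster rates from their conjugate Gamma posteriors; genes in the
        #    same cluster pool their counts here. Empty clusters use the prior.
        shape = [a] * K
        rate = [b] * K
        for i, k in enumerate(z):
            shape[k] += sums[i]
            rate[k] += sizes[i]
        # random.gammavariate takes (shape, scale), so scale = 1 / rate;
        # the floor guards against log(0) below.
        lam = [max(rng.gammavariate(shape[k], 1.0 / rate[k]), 1e-300)
               for k in range(K)]

        # 3) Labels given the weights and rates.
        for i in range(n_genes):
            logp = []
            for k in range(K):
                if weights[k] <= 0.0:
                    logp.append(float("-inf"))
                else:
                    # Poisson log-likelihood, dropping terms constant in k.
                    logp.append(math.log(weights[k])
                                + sums[i] * math.log(lam[k])
                                - sizes[i] * lam[k])
            m = max(logp)
            probs = [math.exp(lp - m) for lp in logp]
            z[i] = rng.choices(range(K), weights=probs)[0]
    return z


if __name__ == "__main__":
    # Toy data: three low-count genes and three high-count genes.
    data = [[3, 5, 4], [4, 2, 5], [6, 4, 3],
            [98, 105, 110], [120, 95, 101], [99, 110, 104]]
    print(blocked_gibbs(data))
```

Because labels are resampled on every sweep, genes with similar counts can still be spread over several clusters whose rates are close; what the sketch should reliably do is keep the low-count and high-count genes apart. The number of occupied clusters, not the truncation level `K`, plays the role of the inferred cluster count.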
