Negative binomial count splitting for single-cell RNA sequencing data

07/24/2023
by Anna Neufeld, et al.

The analysis of single-cell RNA sequencing (scRNA-seq) data often involves fitting a latent variable model to learn a low-dimensional representation for the cells. Validating such a model poses a major challenge. If we could sequence the same set of cells twice, we could use one dataset to fit a latent variable model and the other to validate it. In reality, we cannot sequence the same set of cells twice. Poisson count splitting was recently proposed as a way to work backwards from a single observed Poisson data matrix to obtain independent Poisson training and test matrices that could have arisen from two independent sequencing experiments conducted on the same set of cells. However, the Poisson count splitting approach requires that the original data are exactly Poisson distributed: in the presence of any overdispersion, the resulting training and test datasets are not independent. In this paper, we introduce negative binomial count splitting, which extends Poisson count splitting to the more flexible negative binomial setting. Given an n × p dataset from a negative binomial distribution, we use Dirichlet-multinomial sampling to create two or more independent n × p negative binomial datasets. We show that this procedure outperforms Poisson count splitting in simulation, and apply it to validate clusters of kidney cells from a human fetal cell atlas.
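The splitting step described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the parametrization Var(X) = μ + μ²/b with the overdispersion b known (in practice b would have to be estimated). Under that parametrization, drawing the folds as (X^(1), ..., X^(K)) | X ~ DirMult(X, bε_1, ..., bε_K) with ε_1 + ... + ε_K = 1 should give independent folds, where fold k is NB with mean ε_k μ and overdispersion ε_k b. The Dirichlet-multinomial draw is done here as a sequence of beta-binomial draws, which yields the same joint distribution. The function names `nb_count_split` and `poisson_count_split` are hypothetical. The Poisson version (binomial thinning) is included only for contrast.

```python
import numpy as np


def poisson_count_split(X, eps=0.5, rng=None):
    """Poisson count splitting (binomial thinning): train | X ~ Binomial(X, eps).

    Train and test are independent only if X is exactly Poisson.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=np.int64)
    train = rng.binomial(X, eps)
    return train, X - train


def nb_count_split(X, b, eps=(0.5, 0.5), rng=None):
    """Split NB counts into K folds by Dirichlet-multinomial sampling (sketch).

    Assumes X_ij ~ NB(mean mu_ij, overdispersion b) with Var = mu + mu^2 / b,
    and b known (a scalar, or broadcastable to X, e.g. one value per gene).
    Draws (X^(1), ..., X^(K)) | X ~ DirMult(X, b * eps) via sequential
    beta-binomial draws (stick-breaking), an equivalent construction.
    Returns an integer array of shape X.shape + (K,).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=np.int64)
    eps = np.asarray(eps, dtype=float)
    if np.any(eps <= 0) or not np.isclose(eps.sum(), 1.0):
        raise ValueError("eps must be positive and sum to 1")
    b = np.broadcast_to(np.asarray(b, dtype=float), X.shape)

    folds = []
    remaining = X.copy()  # counts not yet assigned to any fold
    eps_left = 1.0        # total proportion not yet assigned
    for e in eps[:-1]:
        # Share of the remaining counts that goes to this fold:
        # p ~ Beta(b * e, b * (eps_left - e)), then x_k ~ Binomial(remaining, p).
        p = rng.beta(b * e, b * (eps_left - e))
        x_k = rng.binomial(remaining, p)
        folds.append(x_k)
        remaining = remaining - x_k
        eps_left -= e
    folds.append(remaining)  # the last fold receives whatever is left
    return np.stack(folds, axis=-1)
```

As b grows, each Beta draw concentrates at its expected proportion, so the procedure approaches multinomial (Poisson-style) splitting. Conversely, applying `poisson_count_split` to simulated overdispersed counts illustrates the abstract's point: the resulting training and test halves come out positively correlated rather than independent.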


research
07/01/2022

Inference after latent variable estimation for single-cell RNA sequencing data

In the analysis of single-cell RNA sequencing data, researchers often ch...
research
12/29/2021

ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

Jupyter notebooks represent a unique format for programming - a combinat...
research
03/04/2023

Stochastic networks theory to model single-cell genomic count data

We propose a novel way of representing and analysing single-cell genomic...
research
03/25/2021

Biwhitening Reveals the Rank of a Count Matrix

Estimating the rank of a corrupted data matrix is an important task in d...
research
07/11/2019

Splitting methods for Fourier spectral discretizations of the strongly magnetized Vlasov-Poisson and the Vlasov-Maxwell system

Fourier spectral discretizations belong to the most straightforward meth...
research
01/18/2023

Data thinning for convolution-closed distributions

We propose data thinning, a new approach for splitting an observation in...
research
04/30/2021

Models Based on Exponential Interarrival Times for Single-Unusual-Event Count Data

At least one unusual event appears in some count datasets. It will lead ...
