Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data

08/03/2023
by Rong Ma, et al.

Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data. SMAI provides a statistical test to robustly determine the alignability between datasets to avoid misleading inference, and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
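
To make the phrase "structure-preserving integration" concrete, here is a minimal toy sketch of an alignment that cannot distort a dataset: a rotation plus a translation, fitted by least squares (2-D orthogonal Procrustes). This is not SMAI's algorithm, which the abstract does not spell out. The function names (`rigid_align_2d`, `pairwise_distances`), the restriction to two dimensions, and the assumption that the i-th point of each dataset is the same cell are all simplifications introduced here for illustration.

```python
import math
import random


def center(points):
    """Subtract the mean from 2-D points; return centered points and the mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(p[0] - mx, p[1] - my) for p in points], (mx, my)


def rigid_align_2d(X, Y):
    """Rotate and translate Y onto X (least squares, no stretching).

    Assumption for this sketch only: X[i] and Y[i] are matched points.
    """
    Xc, (mx, my) = center(X)
    Yc, _ = center(Y)
    # Closed-form optimal rotation angle for the 2-D Procrustes problem.
    a = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(Xc, Yc))
    b = sum(x[1] * y[0] - x[0] * y[1] for x, y in zip(Xc, Yc))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Apply the rotation to centered Y, then shift onto the centroid of X.
    aligned = [(c * y[0] - s * y[1] + mx, s * y[0] + c * y[1] + my) for y in Yc]
    return aligned, theta


def pairwise_distances(P):
    """All pairwise Euclidean distances, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(P) for q in P[i + 1:]]


# Toy data: Y is X rotated by phi and shifted (a purely "technical" difference).
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
phi = 0.7
Y = [(math.cos(phi) * x[0] - math.sin(phi) * x[1] + 5.0,
      math.sin(phi) * x[0] + math.cos(phi) * x[1] - 2.0) for x in X]

Y_aligned, theta = rigid_align_2d(X, Y)

# Largest change in any pairwise distance within Y caused by the alignment.
max_distortion = max(
    abs(d1 - d2)
    for d1, d2 in zip(pairwise_distances(Y), pairwise_distances(Y_aligned))
)
print("recovered rotation angle:", theta)
print("max change in within-dataset distances:", max_distortion)
```

Because the fitted map is rigid, the distances between cells within the aligned dataset are unchanged up to floating-point error. That is the property at stake when the abstract contrasts structure-preserving integration with methods that distort the data during alignment.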


