Multiscale methods for signal selection in single-cell data

06/15/2022
by   Renee S. Hoekzema, et al.

Analysis of single-cell transcriptomics often relies on clustering cells and then performing differential gene expression (DGE) to identify genes that vary between these clusters. These discrete analyses successfully determine cell types and markers; however, continuous variation within and between cell types may not be detected. We propose three topologically motivated mathematical methods for unsupervised feature selection that consider discrete and continuous transcriptional patterns on an equal footing across multiple scales simultaneously. Eigenscores (eig_i) rank signals or genes based on their correspondence to low-frequency intrinsic patterning in the data using the spectral decomposition of the graph Laplacian. The multiscale Laplacian score (MLS) is an unsupervised method for locating relevant scales in data and selecting the genes that are coherently expressed at these respective scales. The persistent Rayleigh quotient (PRQ) takes data equipped with a filtration (e.g. pseudo-time), allowing separation of genes with different roles in a bifurcation process. We demonstrate the utility of these techniques by applying them to published single-cell transcriptomics data sets. The methods validate previously identified genes and detect additional genes with coherent expression patterns. By studying the interaction between gene signals and the geometry of the underlying space, the three methods give multidimensional rankings of the genes and visualisation of relationships between them.
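To make the idea of ranking genes by their agreement with low-frequency structure more concrete, the following is a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact definitions of the eigenscores, MLS or PRQ. The sketch assumes a symmetrised k-nearest-neighbour graph on the cells, the unnormalised Laplacian L = D - W, and a connected graph. The function names (`knn_laplacian`, `rayleigh_quotient`, `low_frequency_coefficients`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalised Laplacian L = D - W of a symmetrised k-NN graph (assumed construction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between cells.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # keep an edge if either cell lists the other
    return np.diag(W.sum(axis=1)) - W


def rayleigh_quotient(L, f):
    """f^T L f / f^T f for the mean-centred signal f; small values mean f is smooth on the graph."""
    f = f - f.mean()
    denom = f @ f
    return (f @ L @ f) / denom if denom > 0 else np.nan


def low_frequency_coefficients(L, f, n_modes=3):
    """Absolute coefficients of the normalised signal on the first non-constant eigenvectors."""
    _, evecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    f = f - f.mean()
    norm = np.linalg.norm(f)
    if norm == 0:
        return np.zeros(n_modes)
    # Skip column 0 (the constant eigenvector when the graph is connected).
    return np.abs(evecs[:, 1:n_modes + 1].T @ (f / norm))


# Toy data: 200 "cells" along a one-dimensional trajectory with slight jitter.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([t, 0.001 * rng.standard_normal(200)])
L = knn_laplacian(X, k=5)

smooth_gene = t                          # varies gradually along the trajectory
noisy_gene = rng.standard_normal(200)    # no relation to the geometry

rq_smooth = rayleigh_quotient(L, smooth_gene)
rq_noise = rayleigh_quotient(L, noisy_gene)
proj_smooth = low_frequency_coefficients(L, smooth_gene)
proj_noise = low_frequency_coefficients(L, noisy_gene)
```

In this toy setting the gene that follows the trajectory has a much lower Rayleigh quotient and a larger coefficient on the first non-constant eigenvector than the noise gene, which is the kind of contrast a Laplacian-based ranking exploits. A dense distance matrix and a full eigendecomposition are used here only for brevity; real single-cell data sets would likely need sparse graphs and partial eigensolvers.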


research
12/13/2022

Multiscale topology classifies and quantifies cell types in subcellular spatial transcriptomics

Spatial transcriptomics has the potential to transform our understanding...
research
11/08/2018

Spectral Simplicial Theory for Feature Selection and Applications to Genomics

The scale and complexity of modern data sets and the limitations associa...
research
02/07/2020

A mathematical framework for raw counts of single-cell RNA-seq data analysis

Single-cell RNA-seq data are challenging because of the sparseness of th...
research
07/28/2022

MarkerMap: nonlinear marker selection for single-cell studies

Single-cell RNA-seq data allow the quantification of cell type differenc...
research
04/29/2022

Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics

The absence of a conventional association between the cell-cell cohabita...
research
07/20/2020

Joint Learning of Discrete and Continuous Variability with Coupled Autoencoding Agents

Jointly identifying discrete and continuous factors of variability can h...
research
06/05/2023

Graph Fourier MMD for Signals on Graphs

While numerous methods have been proposed for computing distances betwee...
