Structured factorization for single-cell gene expression data

05/19/2023
by Antonio Canale, et al.

Single-cell gene expression data typically come as large matrices in which the number of cells may be smaller than the number of genes of interest. Factorization models have emerged as powerful tools for condensing the available information through a sparse decomposition into lower-rank matrices. In this work, we adapt and implement a recent class of Bayesian generalized factor models for count data, with the specific aim of modelling the covariance between genes. The methodology also allows exogenous information to be included in the prior, so that the detection of covariance structures between genes is favoured. Here, we use biological pathways as external information to induce sparsity patterns in the loadings matrix. This approach eases the interpretation of the columns of the loadings matrix and of the corresponding latent factors, which can be regarded as unobserved cell covariates. We demonstrate the effectiveness of the model on single-cell RNA sequencing data from lung adenocarcinoma cell lines, where it offers promising insights into the role of pathways in characterizing gene relationships and extracts useful information about unobserved cell traits.
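To make the structure described above more concrete, here is a minimal simulation sketch in Python with NumPy. It assumes a Poisson log-normal factor model, in which the log-rate of gene j in cell i is mu_j + lambda_j' eta_i + eps_ij (eta_i being the latent factors, i.e. unobserved cell covariates), and a logistic spike-and-slab prior in which pathway membership raises the probability that a loading is non-zero. The link function, the prior, the function names (pathway_inclusion_probs, sample_loadings, simulate_counts, implied_count_covariance) and the parameters (base_logit, pathway_boost, slab_scale) are hypothetical illustrations chosen for this sketch; they are not the authors' model or code, and the abstract does not specify these details.

```python
import numpy as np


def pathway_inclusion_probs(membership, base_logit=-2.0, pathway_boost=3.0):
    """Hypothetical prior: probability that loading (j, h) is non-zero.

    membership[j, h] = 1 if gene j belongs to the pathway attached to factor h.
    Pathway members get a larger logit, so their loadings are more likely active.
    """
    logits = base_logit + pathway_boost * membership
    return 1.0 / (1.0 + np.exp(-logits))


def sample_loadings(membership, rng, slab_scale=0.7):
    """Spike-and-slab draw of the (genes x factors) loadings matrix (assumed prior)."""
    probs = pathway_inclusion_probs(membership)
    active = rng.random(membership.shape) < probs           # slab if True, spike (0) otherwise
    slab = rng.normal(0.0, slab_scale, size=membership.shape)
    return np.where(active, slab, 0.0)


def simulate_counts(n_cells, mu, loadings, sigma, rng):
    """Assumed Poisson log-normal factor model:
    y_ij ~ Poisson(exp(mu_j + lambda_j' eta_i + eps_ij)), eps_ij ~ N(0, sigma_j^2).
    mu and sigma are arrays of length n_genes.
    """
    n_genes, n_factors = loadings.shape
    eta = rng.normal(size=(n_cells, n_factors))              # latent cell covariates
    eps = rng.normal(0.0, sigma, size=(n_cells, n_genes))    # gene-specific noise
    log_rate = mu + eta @ loadings.T + eps
    return rng.poisson(np.exp(log_rate)), eta


def implied_count_covariance(mu, loadings, sigma):
    """Marginal mean and gene-gene covariance of the counts under the model above."""
    latent_cov = loadings @ loadings.T + np.diag(sigma ** 2)  # Lambda Lambda' + diag(sigma^2)
    mean = np.exp(mu + 0.5 * np.diag(latent_cov))              # log-normal mean
    # Off-diagonal: E[y_j] E[y_l] (exp(latent_cov_jl) - 1)
    cov = np.outer(mean, mean) * (np.exp(latent_cov) - 1.0)
    cov[np.diag_indices_from(cov)] += mean                      # extra Poisson variance on the diagonal
    return mean, cov
```

As a usage check, with hand-built loadings in which genes 0-2 load only on factor 0 and genes 3-5 only on factor 1, implied_count_covariance returns exactly zero covariance between the two gene groups: this is the kind of block structure a pathway-informed sparsity prior is meant to favour. The spike-and-slab prior is only a simple stand-in; the shrinkage prior used in the generalized factor models the abstract refers to may be constructed quite differently.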

research
02/21/2019

A Nonparametric Multi-view Model for Estimating Cell Type-Specific Gene Regulatory Networks

We present a Bayesian hierarchical multi-view mixture model termed Symph...
research
07/14/2023

Single-cell RNA-seq data imputation using Feature Propagation

While single-cell RNA sequencing provides an understanding of the transc...
research
06/08/2018

Hadamard Matrices, Quaternions, and the Pearson Chi-square Statistic

We present a symbolic decomposition of the Pearson chi-square statistic ...
research
07/08/2014

MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data

Background: Biological data often originate from samples containing mixt...
research
06/29/2022

Extracting Information from Stochastic Trajectories of Gene Expression

Gene expression is a stochastic process in which cells produce biomolecu...
research
01/26/2021

essHi-C: Essential component analysis of Hi-C matrices

Motivation: Hi-C matrices are cornerstones for qualitative and quantitat...
research
03/29/2020

The covariance shift (C-SHIFT) algorithm for normalizing biological data

Omics technologies are powerful tools for analyzing patterns in gene exp...
