A single-cell gene expression language model

10/25/2022
by William Connell, et al.

Gene regulation is a dynamic process that connects genotype and phenotype. Because mammalian gene circuitry is difficult to map physically, new computational methods are needed to learn regulatory rules. Natural language offers a useful analogy for how regulatory control is communicated: machine learning systems model natural language by explicitly learning context dependencies between words. We propose a similar system, applied to single-cell RNA expression profiles, that learns context dependencies between genes. Our model, Exceiver, is trained across a diverse set of cell types using a self-supervised task formulated for discrete count data that accounts for feature sparsity. We found that the similarity profiles of both latent sample representations and learned gene embeddings agree with biological annotations. We evaluated Exceiver on a new dataset and on a downstream prediction task and found that pretraining supports transfer learning. Our work provides a framework for modeling gene regulation at single-cell resolution and for transferring that knowledge to downstream tasks.
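The abstract does not spell out the pretraining objective, so the following is only a rough illustration of what "a self-supervised task formulated for discrete count data, accounting for feature sparsity" could look like in general. It is a minimal sketch under stated assumptions: for each cell it hides a few expressed (nonzero) genes and a few unexpressed (zero) genes, so that the abundant zeros do not dominate the targets, and it scores predicted rates on the hidden entries with a Poisson negative log-likelihood. The sampling scheme, the Poisson loss, and the helper names `sample_mask` and `masked_poisson_nll` are hypothetical choices for illustration, not Exceiver's published method.

```python
import numpy as np


def _pick(idx: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Draw up to k distinct entries from idx (returns an empty array if k == 0)."""
    k = min(k, idx.size)
    if k == 0:
        return idx[:0]
    return rng.choice(idx, size=k, replace=False)


def sample_mask(counts: np.ndarray, n_nonzero: int, n_zero: int,
                rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sparsity-aware masking (an assumption, not the paper's scheme).

    Returns a boolean (cells x genes) mask. For each cell it selects up to
    `n_nonzero` expressed genes and up to `n_zero` unexpressed genes, so the
    masked targets are balanced between nonzero and zero counts.
    """
    mask = np.zeros(counts.shape, dtype=bool)
    for i, row in enumerate(counts):
        mask[i, _pick(np.flatnonzero(row > 0), n_nonzero, rng)] = True
        mask[i, _pick(np.flatnonzero(row == 0), n_zero, rng)] = True
    return mask


def masked_poisson_nll(counts: np.ndarray, pred_rate: np.ndarray,
                       mask: np.ndarray, eps: float = 1e-8) -> float:
    """Mean Poisson negative log-likelihood over masked entries only.

    Uses rate - k * log(rate) per entry; the log(k!) term is dropped because
    it does not depend on the predictions. Choosing a Poisson likelihood is
    an illustrative assumption for count data.
    """
    k = counts[mask]
    lam = pred_rate[mask] + eps
    return float(np.mean(lam - k * np.log(lam)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(0.3, size=(4, 50))  # toy sparse count matrix (cells x genes)
    mask = sample_mask(counts, n_nonzero=3, n_zero=3, rng=rng)
    # Hidden entries are zeroed in the input; a real model would also need the
    # mask itself to tell hidden entries apart from genuine zeros.
    model_input = np.where(mask, 0, counts)
    # A trivial constant-rate prediction stands in for a model's output here.
    const_pred = np.full(counts.shape, counts.mean())
    print(masked_poisson_nll(counts, const_pred, mask))
```

Balancing nonzero and zero targets is one simple way to keep a model from minimizing the loss by predicting zeros everywhere; other count likelihoods (for example a negative binomial) are common alternatives for overdispersed single-cell data.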

