scRAE: Deterministic Regularized Autoencoders with Flexible Priors for Clustering Single-cell Gene Expression Data

07/16/2021
by Arnab Kumar Mondal, et al.

Clustering single-cell RNA sequencing (scRNA-seq) data poses statistical and computational challenges because the data are high-dimensional and sparse (so-called 'dropout' events). Recently, deep neural network models based on Regularized Auto-Encoders (RAEs) have achieved remarkable success in learning robust low-dimensional representations. The basic idea in RAEs is to learn a non-linear mapping from the high-dimensional data space to a low-dimensional latent space and vice versa, while simultaneously imposing a distributional prior on the latent space, which acts as a regularizer. This paper argues that, in their naive formulation, RAEs suffer from the well-known bias-variance trade-off. While a simple AE without latent regularization over-fits the data, a very strong prior leads to under-representation and thus poor clustering. To address these issues, we propose a modified RAE framework, called scRAE, for effective clustering of single-cell RNA sequencing data. scRAE consists of a deterministic AE with a flexibly learnable prior generator network, which is trained jointly with the AE. This allows scRAE to strike a better trade-off between bias and variance in the latent space. We demonstrate the efficacy of the proposed method through extensive experiments on several real-world single-cell gene expression datasets.
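To make the structure described in the abstract concrete, the following is a minimal sketch of an objective that combines a deterministic auto-encoder with a learnable prior generator. It is not the authors' implementation. The abstract does not say how the latent codes are matched to the generated prior, so this sketch assumes a kernel maximum mean discrepancy (MMD) as a stand-in regularizer. The single-layer networks, the function names (`init_linear`, `dense`, `rbf_mmd2`, `scrae_style_loss`), the layer sizes, and the toy Poisson count matrix are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, params, activation=None):
    """Apply one dense layer, optionally followed by tanh."""
    w, b = params
    y = x @ w + b
    return np.tanh(y) if activation == "tanh" else y

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples (RBF kernel)."""
    def kernel(u, v):
        sq_dist = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean()

def scrae_style_loss(x, encoder, decoder, prior_gen, noise, reg_weight=1.0):
    """Reconstruction error plus a latent-to-prior matching penalty."""
    z = dense(x, encoder, "tanh")              # deterministic code: no sampling step
    x_hat = dense(z, decoder)                  # reconstruction in data space
    z_prior = dense(noise, prior_gen, "tanh")  # samples from the learnable prior
    recon = np.mean((x - x_hat) ** 2)
    reg = rbf_mmd2(z, z_prior)                 # assumption: MMD as the matching term
    return recon + reg_weight * reg, recon, reg

# Toy stand-in for a sparse expression matrix: 64 cells x 200 genes (hypothetical).
x = np.log1p(rng.poisson(0.3, size=(64, 200)).astype(float))
noise = rng.normal(size=(64, 4))

encoder = init_linear(200, 8)    # data space -> 8-dimensional latent space
decoder = init_linear(8, 200)    # latent space -> data space
prior_gen = init_linear(4, 8)    # noise -> latent space (the flexible prior)

total, recon, reg = scrae_style_loss(x, encoder, decoder, prior_gen, noise, reg_weight=0.5)
print(f"total={total:.4f} recon={recon:.4f} reg={reg:.4f}")
```

The sketch only evaluates the objective once. In a jointly trained setup of the kind the abstract describes, the encoder, decoder, and prior generator parameters would all be updated against such an objective, with `reg_weight` (a hypothetical knob here) controlling how strongly the latent codes are pulled toward the generated prior.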


