Batch correction of high-dimensional data

11/15/2019
by Emanuele Aliverti, et al.

Biomedical research often produces high-dimensional data confounded by batch effects, which arise from systematic experimental variation, differing protocols, and subject identity. Without proper correction, low-dimensional representations of high-dimensional data may encode and reproduce the same systematic variation observed in the original data, compromising the interpretation of the results. In this article, we propose a novel procedure to remove batch effects from low-dimensional embeddings obtained with t-SNE. The proposed methods are based on linear algebra and constrained optimization, which leads to efficient algorithms and fast computation in many high-dimensional settings. Results on simulated single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings while retaining the key information on cell types. When applied to single-cell gene expression data from a study of mouse medulloblastoma, the proposed method removes batch effects related to mouse identifiers and experiment date, while preserving clusters of oligodendrocytes, astrocytes, endothelial cells, and microglia, which are expected to lie in the stroma within or adjacent to the tumors.
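The abstract does not spell out the algorithm. As a minimal sketch of the linear-algebra idea, assuming each batch variable is a categorical label encoded as one-hot indicator columns, one can replace the data matrix by the closest matrix (in Frobenius norm) whose columns are orthogonal to those indicators, and only then run t-SNE. This is a generic residualization step, not necessarily the authors' exact procedure. The function names `batch_design` and `project_out_batches` and the simulated `mouse` and `date` labels are hypothetical.

```python
import numpy as np

def batch_design(batches):
    """One-hot indicator matrix (n x k) for a vector of categorical batch labels."""
    labels, codes = np.unique(batches, return_inverse=True)
    return np.eye(len(labels))[codes]

def project_out_batches(X, *batch_vectors):
    """Return X minus its projection onto the span of all batch indicators.

    The result is the matrix closest to X (Frobenius norm) whose columns are
    orthogonal to every indicator column, i.e. (I - P_Z) X.
    """
    Z = np.hstack([batch_design(b) for b in batch_vectors])
    # Least-squares fit of every column of X on Z; lstsq copes with the
    # rank deficiency that arises when several one-hot blocks are stacked.
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ coef

# Hypothetical example: 200 cells, 50 features, two simulated batch variables.
rng = np.random.default_rng(0)
n, p = 200, 50
mouse = rng.integers(0, 4, size=n)   # hypothetical mouse identifier
date = rng.integers(0, 3, size=n)    # hypothetical experiment date
X = rng.normal(size=(n, p)) + 2.0 * mouse[:, None] + 1.0 * date[:, None]

X_corr = project_out_batches(X, mouse, date)
```

The corrected matrix `X_corr` could then be passed to any t-SNE implementation, for example `sklearn.manifold.TSNE(n_components=2).fit_transform(X_corr)`. Because the projection is linear and computed once, it is cheap compared with the t-SNE optimization itself. Note that it also removes within-batch means, so any biological signal that is perfectly aligned with a batch label would be removed as well.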
