RxRx1: A Dataset for Evaluating Experimental Batch Correction Methods

01/13/2023
by Maciej Sypetkowski, et al.

High-throughput screening techniques are commonly used to obtain large quantities of data in many fields of biology. It is well known that artifacts arising from variability in the technical execution of different experimental batches within such screens confound these observations and can lead to invalid biological conclusions. It is therefore necessary to account for these batch effects when analyzing outcomes. In this paper we describe RxRx1, a biological dataset designed specifically for the systematic study of batch effect correction methods. The dataset consists of 125,510 high-resolution fluorescence microscopy images of human cells under 1,138 genetic perturbations in 51 experimental batches across 4 cell types. Visual inspection of the images alone clearly demonstrates significant batch effects. We propose a classification task designed to evaluate the effectiveness of experimental batch correction methods on these images and examine the performance of a number of correction methods on this task. Our goal in releasing RxRx1 is to encourage the development of effective experimental batch correction methods that generalize well to unseen experimental batches. The dataset can be downloaded at https://rxrx.ai.
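One way to picture the evaluation setting described above, generalization to unseen experimental batches, is a split that holds out whole batches, paired with a simple per-batch standardization as a baseline correction. The sketch below is illustrative only and is not the method from the paper: the sample layout (dicts with `batch`, `features`, `label` keys) and the function names `split_by_batch` and `standardize_per_batch` are assumptions made for this example, and per-batch standardization is swapped in as a generic baseline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def split_by_batch(samples, held_out_batches):
    """Split samples so that held-out batches never appear in training.

    `samples` is a list of dicts with hypothetical keys
    'batch', 'features' (list of floats), and 'label'.
    """
    held_out = set(held_out_batches)
    train = [s for s in samples if s["batch"] not in held_out]
    test = [s for s in samples if s["batch"] in held_out]
    return train, test


def standardize_per_batch(samples):
    """Center and scale each feature using statistics of its own batch.

    A generic baseline correction (an assumption for illustration),
    not necessarily one of the methods examined in the paper.
    """
    by_batch = defaultdict(list)
    for s in samples:
        by_batch[s["batch"]].append(s)

    corrected = []
    for group in by_batch.values():
        n_features = len(group[0]["features"])
        means = [mean(s["features"][j] for s in group) for j in range(n_features)]
        # Fall back to 1.0 when a feature is constant within the batch.
        stds = [
            pstdev([s["features"][j] for s in group]) or 1.0
            for j in range(n_features)
        ]
        for s in group:
            z = [(x - m) / sd for x, m, sd in zip(s["features"], means, stds)]
            corrected.append({**s, "features": z})
    return corrected
```

A classifier trained on the corrected training batches and scored on the corrected held-out batches then gives a rough measure of how well a correction transfers to batches it has not seen.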


