Improving Accuracy in Cell-Perturbation Experiments by Leveraging Auxiliary Information

07/21/2023
by Jackson Loper, et al.

Modern cell-perturbation experiments expose cells to panels of hundreds of stimuli, such as cytokines or CRISPR guides that perform gene knockouts. These experiments are designed to investigate whether a particular gene is upregulated or downregulated by exposure to each treatment. However, because of high levels of experimental noise, typical estimators of whether a gene is up- or downregulated make many errors. In this paper, we make two contributions. Our first contribution is a new estimator of regulatory effect that uses Gaussian processes and factor analysis to leverage auxiliary information about similarities among treatments, such as the chemical similarity among the drugs used to perturb cells. The new estimator typically has lower variance, but higher bias, than unregularized estimators, which do not use auxiliary information. To assess whether this new estimator improves accuracy (i.e., achieves a favorable trade-off between bias and variance), we cannot simply compute its error on held-out data, because "ground truth" about the effects of treatments is unavailable. Our second contribution is a novel data-splitting method to evaluate error rates. This data-splitting method produces valid error bounds using "sign-valid" estimators, which by definition have the correct sign more often than not. Applying this data-splitting method in a series of case studies, we find that our new estimator, which leverages auxiliary information, can yield a three-fold reduction in type S error rate (the rate at which the sign of an effect is estimated incorrectly).
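To make the bias-variance trade-off concrete, here is a toy simulation. It is not the authors' estimator: it omits factor analysis entirely and assumes that each treatment has a single noisy effect estimate, that a hypothetical one-dimensional feature stands in for auxiliary similarity information, and that effects follow a zero-mean Gaussian-process prior with a fixed squared-exponential kernel. Because the simulated truth is known, the type S error rate can be computed directly. Real experiments lack that information.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel over treatment feature vectors (rows of X).
    # The kernel choice and fixed hyperparameters are assumptions for illustration.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_shrinkage(y, X, noise_var, lengthscale=1.0, variance=1.0):
    # Posterior mean of theta under theta ~ N(0, K) and y = theta + N(0, noise_var * I).
    # Shrinking toward 0 adds bias; pooling similar treatments reduces variance.
    K = rbf_kernel(X, lengthscale, variance)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(y)), y)
    return K @ alpha

def type_s_error_rate(estimate, truth):
    # Fraction of treatments whose estimated sign differs from the true sign.
    return np.mean(np.sign(estimate) != np.sign(truth))

rng = np.random.default_rng(0)
n_treatments = 200
# Hypothetical auxiliary feature (e.g. a 1-D summary of chemical similarity).
X = rng.uniform(0.0, 10.0, size=(n_treatments, 1))
truth = np.sin(X[:, 0])  # simulated "ground truth" effects, smooth in X
noise_var = 1.0
# Unregularized estimates: truth plus independent Gaussian noise.
y = truth + rng.normal(scale=np.sqrt(noise_var), size=n_treatments)

smoothed = gp_shrinkage(y, X, noise_var)
print("unregularized type S error:", type_s_error_rate(y, truth))
print("GP-shrunk type S error:    ", type_s_error_rate(smoothed, truth))
```

In practice the kernel hyperparameters would be fitted rather than fixed. The `truth` array exists only because the data are simulated, and its absence in real experiments is what motivates a data-splitting evaluation.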
