Optimal Covariate Weighting Increases Discoveries in High-throughput Biology

03/11/2022
by Mohamad Hasan, et al.

The large-scale multiple testing inherent to high-throughput biological data demands very high statistical stringency, so true effects are difficult to detect unless their effect sizes are large. One promising way to reduce the multiple testing burden is to use independent information to prioritize the features most likely to be true effects. Using that independent information effectively is challenging, however, and often does not yield substantial gains in power. Current state-of-the-art methods sort features into groups by the independent information and calculate one weight per group. When true effects are weak and rare (the typical situation in high-throughput biological studies), every group contains many null tests, so the group weights are diluted and performance suffers. We introduce Covariate Rank Weighting (CRW), a method for calculating approximately optimal weights conditioned on the ranking of tests by an external covariate. The approach uses the probabilistic relationship between covariate ranking and test effect size to calculate an individual weight for each test; these weights are more informative than group weights and are not diluted by null effects. We show how this relationship can be calculated theoretically for normally distributed covariates and estimated empirically in other cases. Simulations and applications to data show that the method outperforms existing methods by as much as 10-fold in the rare/low effect size scenario common to biological data, and performs at least comparably in all scenarios.
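
To make the idea of per-test weights concrete, the following is a minimal sketch of generic weighted Bonferroni testing, in which test i is rejected when p_i <= alpha * w_i / m and the weights average to one. It is not the CRW weight calculation described in the abstract: the function names, the exponential decay of weight with covariate rank, and the toy numbers are all assumptions chosen for illustration only.

```python
import numpy as np

def rank_based_weights(covariate, decay=1.0):
    """Hypothetical weights that shrink with covariate rank (NOT the CRW formula).

    Rank 0 is the test with the largest covariate value. The weights are
    rescaled to average exactly 1, which weighted Bonferroni requires.
    """
    covariate = np.asarray(covariate, dtype=float)
    m = covariate.size
    order = np.argsort(-covariate, kind="stable")  # indices from largest to smallest
    ranks = np.empty(m, dtype=int)
    ranks[order] = np.arange(m)                    # rank of each test
    raw = np.exp(-decay * ranks)                   # assumed decay shape
    return raw * m / raw.sum()

def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Reject test i when p_i <= alpha * w_i / m."""
    pvalues = np.asarray(pvalues, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if pvalues.shape != weights.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if not np.isclose(weights.mean(), 1.0):
        raise ValueError("weights must average to 1")
    return pvalues <= alpha * weights / pvalues.size

# Toy data (made up): four tests with an external covariate for each.
pvalues = np.array([0.02, 0.001, 0.5, 0.03])
covariate = np.array([3.0, 2.5, 0.1, 0.2])

weights = rank_based_weights(covariate, decay=1.0)
rejected_weighted = weighted_bonferroni(pvalues, weights)
rejected_unweighted = weighted_bonferroni(pvalues, np.ones_like(pvalues))
```

In this toy case the first test has the highest covariate rank, so its threshold is relaxed and it is rejected under weighting although it fails the unweighted Bonferroni threshold. The gain comes at the cost of stricter thresholds for low-ranked tests, which is why the choice of weights matters.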

research
12/19/2017

Optimal P-value Weighting with Independent Information

The large-scale multiple testing inherent to high throughput biological ...
research
11/07/2018

Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes

Penalization schemes like Lasso or ridge regression are routinely used t...
research
11/15/2020

Nonparametric goodness-of-fit testing for parametric covariate models in pharmacometric analyses

The characterization of covariate effects on model parameters is a cruci...
research
08/17/2018

Estimating and accounting for unobserved covariates in high dimensional correlated data

Many high dimensional and high-throughput biological datasets have compl...
research
11/06/2018

NExUS: Bayesian simultaneous network estimation across unequal sample sizes

Network-based analyses of high-throughput genomics data provide a holist...
research
03/20/2018

HINT: A Toolbox for Hierarchical Modeling of Neuroimaging Data

The modular behavior of the human brain is commonly investigated using i...
research
01/26/2021

A Coding Theory Perspective on Multiplexed Molecular Profiling of Biological Tissues

High-throughput and quantitative experimental technologies are experienc...
