NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

11/03/2017
by Fei Xia, et al.

As datasets grow richer, an important challenge is to leverage all of the features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypothesis testing, where for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g., its location, conservation, and epigenetics) which could inform how likely the variant is to have a true association. However, popular testing approaches, such as the Benjamini-Hochberg procedure (BH) and independent hypothesis weighting (IHW), either ignore these features or assume that the features are categorical or univariate. We propose a new algorithm, NeuralFDR, which automatically learns a discovery threshold as a function of all the hypothesis features. We parametrize the discovery threshold as a neural network, which enables flexible handling of multi-dimensional discrete and continuous features as well as efficient end-to-end optimization. We prove that NeuralFDR has strong false discovery rate (FDR) guarantees, and show that it makes substantially more discoveries in synthetic and real datasets. Moreover, we demonstrate that the learned discovery threshold is directly interpretable.
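
To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of decision rule it mentions: the standard BH step-up procedure, which looks only at the p-values, and a rule that rejects hypothesis i when its p-value falls below a threshold computed from its features. The function names `benjamini_hochberg` and `feature_threshold_rule`, and the `threshold_fn` argument, are hypothetical names chosen for illustration. The second function shows only the form of a feature-dependent rule; it is not NeuralFDR, it does not learn anything, and on its own it carries no FDR guarantee.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Standard BH step-up procedure; uses p-values only, no features.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def feature_threshold_rule(p_values, features, threshold_fn):
    """Illustrative feature-dependent rule: reject i if p_i <= t(x_i).

    threshold_fn is a hypothetical, user-supplied function mapping a
    feature value to a p-value cutoff. An arbitrary threshold_fn gives
    no FDR control; this only shows the shape of the decision rule.
    """
    return [p <= threshold_fn(x) for p, x in zip(p_values, features)]
```

With BH, two hypotheses with the same p-value always receive the same decision. With a feature-dependent threshold, a hypothesis whose features suggest a higher chance of a true association can be given a more lenient cutoff than one with the same p-value but less promising features.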
