Machine learning meets false discovery rate

08/13/2022
by Ariane Marandon, et al.

Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees, but they often lack flexibility. On the other hand, recent machine learning classification algorithms, such as those based on random forests (RF) or neural networks (NN), perform very well in practice but lack interpretability and theoretical guarantees. In this paper, we bring the two together by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends the scope of recent work in the multiple testing literature, notably that of Yang et al. (2021), to the high-dimensional setting. AdaDetect is shown both to strongly control the FDR and to have a power that mimics that of the oracle in a specific sense. The interest and validity of our approach are demonstrated with theoretical results, numerical experiments on several benchmark datasets, and an application to astrophysical data. In particular, while AdaDetect can be used in combination with any classifier, it is particularly efficient on real-world datasets with RF, and on images with NN.
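To make the idea of novelty detection with FDR control concrete, here is a minimal sketch of a generic recipe from the multiple testing literature: empirical p-values computed from held-out null scores, followed by the Benjamini-Hochberg procedure. It is not the AdaDetect procedure itself, whose details are in the full text. The function names `empirical_p_values` and `benjamini_hochberg`, the score convention (larger means more novel), and the level `alpha=0.1` are assumptions made for illustration; the scores could come from any classifier or scoring function.

```python
import numpy as np


def empirical_p_values(null_scores, test_scores):
    """Empirical p-values from held-out null scores (illustrative helper).

    Assumes larger scores indicate more novelty:
    p_j = (1 + #{null scores >= test score j}) / (n_null + 1).
    """
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # searchsorted(..., side="left") counts null scores strictly below each
    # test score, so the remainder is the count of null scores >= it.
    n_ge = null_sorted.size - np.searchsorted(null_sorted, test, side="left")
    return (1.0 + n_ge) / (null_sorted.size + 1.0)


def benjamini_hochberg(p_values, alpha=0.1):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing the threshold
        rejected[order[: k + 1]] = True
    return rejected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical scores: nulls ~ N(0, 1); the test set mixes nulls and
    # shifted "novelties".
    null_scores = rng.normal(size=1000)
    test_scores = np.concatenate([rng.normal(size=90), rng.normal(loc=4.0, size=10)])
    p = empirical_p_values(null_scores, test_scores)
    detections = benjamini_hochberg(p, alpha=0.1)
    print("number of detections:", int(detections.sum()))
```

In this sketch the scoring function is fixed in advance; learning the score from the data with an RF or NN, as the abstract describes, is the part that the paper addresses and that this example does not attempt to reproduce.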


research
02/21/2023

Stepdown SLOPE for Controlled Feature Selection

Sorted L-One Penalized Estimation (SLOPE) has shown the nice theoretical...
research
02/23/2023

Bounding the FDP in competition-based control of the FDR

Competition-based approach to controlling the false discovery rate (FDR)...
research
11/24/2020

Competition-based control of the false discovery proportion

Target-decoy competition (TDC) is commonly used in the computational mas...
research
03/14/2018

A Unified View of False Discovery Rate Control: Reconciliation of Bayesian and Frequentist Approaches

This paper explores the intrinsic connections between the Bayesian false...
research
06/13/2023

False discovery proportion envelopes with consistency

We provide new false discovery proportion (FDP) confidence envelopes in ...
research
07/12/2023

Empirical Bayes large-scale multiple testing for high-dimensional sparse binary sequences

This paper investigates the multiple testing problem for high-dimensiona...
research
08/10/2021

Why multiple hypothesis test corrections provide poor control of false positives in the real world

Most scientific disciplines use significance testing to draw conclusions...
