Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control

11/04/2022
by Taejoo Ahn, et al.

In high-dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the false discovery rate (FDR) while discovering as many relevant variables as possible. Model-X methods, such as Knockoffs and conditional randomization tests, achieve the first goal of finite-sample FDR control under the assumption of a known covariate distribution. However, it is not clear whether these methods can simultaneously achieve the second goal of maximizing the number of discoveries. In fact, designing procedures that discover more relevant variables with finite-sample FDR control is a largely open question, even in linear models, arguably the simplest setting. In this paper, we derive near-optimal testing procedures in high-dimensional Bayesian linear models with isotropic covariates. We propose a Model-X multiple testing procedure, PoEdCe, which provably controls the frequentist FDR in finite samples even under model misspecification, and conjecturally achieves near-optimal power when the data follow the Bayesian linear model with a known prior. PoEdCe has three important ingredients: Posterior Expectation, the distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values (eBH). The optimality conjecture for PoEdCe is based on a heuristic calculation of its asymptotic true positive proportion (TPP) and false discovery proportion (FDP), which is supported by methods from statistical physics as well as extensive numerical simulations. Furthermore, when the prior is unknown, we show that an empirical Bayes variant of PoEdCe still has finite-sample FDR control and achieves near-optimal power.
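
Of the three ingredients listed in the abstract, the eBH step is the easiest to illustrate in isolation. The block below is a minimal sketch of the generic Benjamini-Hochberg rule for e-values. It is not the authors' PoEdCe implementation, and it does not show how the e-values would be built from posterior expectations and the dCRT. The function name `ebh`, the target level `alpha`, and the example e-values are illustrative assumptions.

```python
import numpy as np


def ebh(e_values, alpha=0.1):
    """Generic e-BH rule (illustrative sketch, not the paper's code).

    Sort the e-values in decreasing order, find the largest k such that
    the k-th largest e-value is at least n / (alpha * k), and reject the
    hypotheses with the k largest e-values.
    Returns the sorted indices of the rejected hypotheses.
    """
    e = np.asarray(e_values, dtype=float)
    n = e.size
    order = np.argsort(-e)          # indices by decreasing e-value
    sorted_e = e[order]
    k = np.arange(1, n + 1)
    # Condition e_(k) >= n / (alpha * k), written as k * e_(k) >= n / alpha.
    passing = np.nonzero(k * sorted_e >= n / alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)  # no discoveries
    k_star = passing[-1] + 1            # largest k meeting the condition
    return np.sort(order[:k_star])


# Hypothetical e-values for five hypotheses, tested at level alpha = 0.2.
example_e = [50.0, 1.0, 30.0, 0.5, 8.0]
rejected = ebh(example_e, alpha=0.2)
print(rejected)
```

With these made-up inputs the threshold n / alpha is 25, so only the two largest e-values (indices 0 and 2) satisfy the condition and are rejected.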

research
12/06/2017

Dynamic adaptive procedures for false discovery rate estimation and control

In the multiple testing problem with independent tests, the classical li...
research
04/20/2018

Variable Selection via Adaptive False Negative Control in High-Dimensional Regression

In high-dimensional regression, variable selection methods have been dev...
research
03/27/2023

Discovering the Network Granger Causality in Large Vector Autoregressive Models

This paper proposes novel inferential procedures for the network Granger...
research
12/30/2022

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

We study the task of learning state representations from potentially hig...
research
03/01/2019

Metropolized Knockoff Sampling

Model-X knockoffs is a wrapper that transforms essentially any feature i...
research
05/02/2021

Directional FDR Control for Sub-Gaussian Sparse GLMs

High-dimensional sparse generalized linear models (GLMs) have emerged in...
