Asymptotically Optimal Knockoff Statistics via the Masked Likelihood Ratio

12/17/2022
by Asher Spector, et al.

This paper introduces a class of asymptotically most powerful knockoff statistics based on a simple principle: that we should prioritize variables in order of our ability to distinguish them from their knockoffs. Our contribution is threefold. First, we argue that feature statistics should estimate "oracle masked likelihood ratios," which are Neyman-Pearson statistics for discriminating between features and knockoffs using partially observed (masked) data. Second, we introduce the masked likelihood ratio (MLR) statistic, a knockoff statistic that estimates the oracle MLR. We show that MLR statistics are asymptotically average-case optimal, i.e., they maximize the expected number of discoveries made by knockoffs when averaging over a user-specified prior on unknown parameters. Our optimality result places no explicit restrictions on the problem dimensions or the unknown relationship between the response and covariates; instead, we assume a "local dependence" condition which depends only on simple quantities that can be calculated from the data. Third, in simulations and three real data applications, we show that MLR statistics outperform state-of-the-art feature statistics, including in settings where the prior is highly misspecified. We implement MLR statistics in the open-source Python package knockpy; our implementation is often (although not always) faster than computing a cross-validated lasso.
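For readers unfamiliar with how knockoff feature statistics turn into discoveries, the following is a minimal sketch of the standard knockoff+ selection rule from the broader knockoffs literature. It is not the MLR statistic itself and not the knockpy implementation. It assumes a vector W of feature statistics has already been computed, where a large positive W[j] means feature j is easy to distinguish from its knockoff; the function names `knockoff_threshold` and `select` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    Assumption: W holds one feature statistic per variable, with sign and
    magnitude following the usual knockoff convention (positive = evidence
    for the real feature, negative = evidence for its knockoff).
    offset=1 corresponds to the knockoff+ variant.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the distinct nonzero magnitudes, in increasing order.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        n_neg = np.sum(W <= -t)  # proxies for false discoveries
        n_pos = np.sum(W >= t)   # variables that would be selected
        fdp_hat = (offset + n_neg) / max(n_pos, 1)
        if fdp_hat <= q:
            return t
    # No threshold meets the target level: select nothing.
    return np.inf


def select(W, q=0.1):
    """Indices of variables whose statistic clears the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q))
```

For example, `select([5, 4, 3, 2.5, 2, -1, 1.5, -0.5, 3.5, 4.5, 6, 7], q=0.25)` keeps every variable whose statistic is at least 1. Under this rule, a statistic that gives large positive values to true signals lets the threshold drop sooner and so yields more discoveries, which is the sense in which the choice of feature statistic affects power.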


