1 Introduction
High dimensional multivariate datasets pervade many disciplines including biology, neuroscience, and medicine. In these disciplines, a core question is which variables are important to predict the phenomenon being observed. Finite data and noisy observations make finding informative variables impossible without some error rate. Scientists therefore seek to find informative variables subject to a specific tolerance on an error rate such as the false discovery rate (fdr).
Traditional methods to control the fdr rely on assumptions about how the covariates of interest may be related to the response. Model-X knockoffs (Candes et al., 2016) provide an alternative framework that controls the fdr by constructing synthetic variables, called knockoffs, that look like the original covariates but have no relationship with the response given the original covariates. Variables of this form can be used to test the conditional independence of each covariate in a collection with the response, given the rest of the covariates, by comparing the association the original covariate has with the response to the association the knockoff has with the response.
The focus on modeling the covariates shifts the testing burden to building good generative models of the covariates. Knockoffs must satisfy two properties: (i) they must be independent of the response given the real covariates, and (ii) the joint distribution of covariates and knockoffs must be unchanged when any subset of variables is swapped between the knockoffs and the real data. Satisfying the first property is trivial: generate the knockoffs without using the response. The second property requires building conditional generative models that can match the distribution of the covariates.
Related work.
Existing knockoff generation methods can be broadly classified as either model-specific or flexible. Model-specific methods such as hmm knockoffs (Sesia et al., 2017) or AutoEncoding Knockoffs (Liu and Zheng, 2018) make assumptions about the covariate distribution, which can be problematic if the data does not satisfy these assumptions. hmm knockoffs assume the joint distribution of the covariates factorizes into a Markov chain. AutoEncoding Knockoffs use a vae to model the covariates and sample knockoffs. A vae assumes the data lies near a low-dimensional manifold, whose dimension is controlled by a latent variable. Covariates that violate this low-dimensional assumption can be better modeled by increasing the dimension of the latent variable, but at the risk of retaining more information about the covariates, which can reduce the power to select important variables. Flexible methods for generating knockoffs such as KnockoffGAN (Jordon et al., 2019) or Deep Knockoffs (Romano et al., 2018) focus on likelihood-free generative models. KnockoffGAN uses gan-based generative models, which can be difficult to estimate (Mescheder et al., 2018) and sensitive to hyperparameters (Salimans et al., 2016; Gulrajani et al., 2017; Mescheder et al., 2017). Deep Knockoffs employ the mmd, whose effectiveness often depends on the choice of a kernel, which can involve selecting a bandwidth hyperparameter. Ramdas et al. (2015) show that in several cases, across many choices of bandwidth, the mmd approaches 0 as dimensionality increases while the kl divergence remains nonzero, suggesting the mmd may not reliably generate high-dimensional knockoffs. Deep Knockoffs also prevent the knockoff generator from memorizing the covariates by explicitly controlling the correlation between the knockoffs and covariates. This constraint is specific to second-order moments, and may ignore higher-order moments present in the data.
We propose ddlk, a likelihood-based method for generating knockoffs without the use of latent variables. ddlk is a two-stage algorithm. The first stage uses maximum likelihood to estimate the distribution of the covariates from observed data. The second stage estimates the knockoff distribution by minimizing the kl divergence between the joint distribution of the real covariates and knockoffs and the joint distribution of any swap of coordinates between covariates and knockoffs. ddlk expresses the likelihoods for swaps in terms of the original joint distribution with real covariates swapped with knockoffs. Through the Gumbel-Softmax trick (Jang et al., 2016; Maddison et al., 2016), we optimize the knockoff distribution under the worst swaps. By ensuring that the knockoffs are valid in the worst cases, ddlk learns valid knockoffs in all cases. To prevent ddlk from memorizing the covariates, we introduce a regularizer that encourages high conditional entropy of the knockoffs given the covariates. We study ddlk on synthetic, semi-synthetic, and real datasets. Across each study, ddlk controls the fdr while achieving higher power than competing gan-based, mmd-based, and autoencoder-based methods.
2 Knockoff filters
Model-X knockoffs (Candes et al., 2016) are a tool for building variable selection methods. Specifically, they facilitate control of the fdr, which is the expected proportion of selected variables that are not important. In this section, we review the requirements for building variable selection methods using Model-X knockoffs.
Consider a data generating distribution over covariates $x = (x_1, \dots, x_d)$ and a response $y$ that depends only on a subset $S^* \subseteq \{1, \dots, d\}$ of the variables. Let $\hat{S}$ be a set of indices identified by a variable selection algorithm. The goal of such algorithms is to return an $\hat{S}$ that maximizes the number of true discoveries $|\hat{S} \cap S^*|$ while maintaining the fdr at some nominal level $\alpha$:

$\mathrm{fdr} := \mathbb{E}\left[ \frac{|\hat{S} \setminus S^*|}{|\hat{S}| \vee 1} \right] \le \alpha.$
To control the fdr at the nominal level, Model-X knockoffs requires (a) knockoffs $\tilde{x}$, and (b) a knockoff statistic $w_j$ to assess the importance of each feature $j$.
Knockoffs $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_d)$ are random vectors that satisfy the following properties for any set of indices $S \subseteq \{1, \dots, d\}$:

(1) $(x, \tilde{x})_{\mathrm{swap}(S)} \stackrel{d}{=} (x, \tilde{x})$

(2) $\tilde{x} \perp y \mid x$
The swap property eq. 1 ensures that the joint distribution of $(x, \tilde{x})$ is invariant under any swap. A swap operation at position $j$ is defined as exchanging the $j$th entries of $x$ and $\tilde{x}$. For example, when $d = 3$ and $S = \{1, 3\}$, $(x, \tilde{x})_{\mathrm{swap}(S)} = (\tilde{x}_1, x_2, \tilde{x}_3, x_1, \tilde{x}_2, x_3)$. Equation 2 ensures that the response is independent of the knockoffs given the original features $x$.
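In code, a swap is just an index exchange between the pair of vectors. A minimal numpy sketch (0-indexed coordinates; the function name `swap` is illustrative):

```python
import numpy as np

def swap(S, x, x_tilde):
    """Exchange the coordinates in S between a covariate vector and its knockoff."""
    x_new, xt_new = x.copy(), x_tilde.copy()
    x_new[S], xt_new[S] = x_tilde[S], x[S]
    return x_new, xt_new

x = np.array([1.0, 2.0, 3.0])
xt = np.array([10.0, 20.0, 30.0])
# swap the first and third coordinates (S = {1, 3} in the paper's 1-indexing)
a, b = swap([0, 2], x, xt)
```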
A knockoff statistic $w_j$ must satisfy the flip-sign property: swapping the $j$th covariate with its knockoff flips the sign of $w_j$. As a consequence, if feature $j$ is important, $w_j$ must be positive. Otherwise, the sign of $w_j$ must be positive or negative with equal probability.
Given knockoffs $\tilde{x}$ and knockoff statistics $w = (w_1, \dots, w_d)$, exact control of the fdr at level $\alpha$ can be obtained by selecting variables $\hat{S} = \{j : w_j \ge \tau\}$. The threshold $\tau$ is given by:

(3) $\tau = \min \left\{ t > 0 \;:\; \frac{1 + \#\{j : w_j \le -t\}}{\#\{j : w_j \ge t\} \vee 1} \le \alpha \right\}$
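The threshold and selection rule can be sketched as follows, assuming knockoff statistics `w` have already been computed (the example values of `w` are illustrative):

```python
import numpy as np

def knockoff_threshold(w, alpha):
    """Smallest t such that (1 + #{w_j <= -t}) / max(#{w_j >= t}, 1) <= alpha."""
    for t in np.sort(np.abs(w[w != 0])):
        fdp_hat = (1 + np.sum(w <= -t)) / max(np.sum(w >= t), 1)
        if fdp_hat <= alpha:
            return t
    return np.inf  # no threshold achieves the target: select nothing

w = np.array([5.0, 4.0, 3.0, 2.5, -0.5, 0.4, -0.3, 2.0])
tau = knockoff_threshold(w, alpha=0.25)
selected = np.where(w >= tau)[0]  # indices of selected features
```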
While knockoffs are a powerful tool to ensure that the fdr is controlled at the nominal level, the choice of method to generate knockoffs is left to the practitioner.
Existing methods for generating knockoffs include model-specific approaches that make strong assumptions about the covariate distribution, and flexible likelihood-free methods. If the joint distribution of $x$ cannot be factorized into a Markov chain (Sesia et al., 2017), or if $x$ does not lie near a low-dimensional manifold (Liu and Zheng, 2018), model-specific generators will yield knockoffs that are not guaranteed to control the fdr. Likelihood-free generation methods that use gans (Jordon et al., 2019) or the mmd (Romano et al., 2018) make fewer assumptions about $x$, but can be difficult to estimate (Mescheder et al., 2018), sensitive to hyperparameters (Salimans et al., 2016; Gulrajani et al., 2017; Mescheder et al., 2017), or suffer from low power in high dimensions (Ramdas et al., 2015). In realistic datasets, where $x$ can come from an arbitrary distribution and dimensionality is high, it remains to be seen how to reliably generate knockoffs that satisfy eqs. 1 and 2.
3 ddlk
We motivate ddlk with the following observation. The swap property in eq. 1 is satisfied if the divergence between the original and swapped distributions is zero. Formally, let $S$ be a set of indices to swap, and let $(\check{x}, \check{\tilde{x}}) = (x, \tilde{x})_{\mathrm{swap}(S)}$. Then under any such $S$:

(4) $\mathrm{KL}\left( p(x, \tilde{x}) \;\|\; p(\check{x}, \check{\tilde{x}}) \right) = 0$
A natural algorithm for generating valid knockoffs might be to parameterize each distribution above and solve for the parameters by minimizing the lhs of eq. 4. However, modeling $p(\check{x}, \check{\tilde{x}})$ for every possible swap $S$ is computationally infeasible in high dimensions. Theorem 3.1 provides a useful solution to this problem.
Theorem 3.1.
Let $P$ be a probability measure defined on a measurable space $(\Omega, \mathcal{F})$. Let $\mathrm{swap}(S, \cdot)$ be a swap function using indices $S$. If $v$ is a sample from $P$, the probability law of $\mathrm{swap}(S, v)$ is $P(\mathrm{swap}(S, \cdot))$.
As an example, in the continuous case, where $p$ and $p_{\mathrm{swap}}$ are the densities of $(x, \tilde{x})$ and $(x, \tilde{x})_{\mathrm{swap}(S)}$ respectively, $p_{\mathrm{swap}}$ evaluated at a sample $v$ is simply $p$ evaluated at the swap of $v$: $p_{\mathrm{swap}}(v) = p(\mathrm{swap}(S, v))$. We show the direct proof of this example and theorem 3.1 in appendix A. A useful consequence of theorem 3.1 is that ddlk needs to model only $p(x, \tilde{x})$, instead of every possible swap distribution. To derive the ddlk algorithm, we first expand eq. 4:
(5) $\mathrm{KL}\left( p(x, \tilde{x}) \;\|\; p(\check{x}, \check{\tilde{x}}) \right) = \mathbb{E}_{p(x)\, p(\tilde{x} \mid x)} \left[ \log \frac{p(x)\, p(\tilde{x} \mid x)}{p(\check{x})\, p(\check{\tilde{x}} \mid \check{x})} \right]$

where $(\check{x}, \check{\tilde{x}}) = (x, \tilde{x})_{\mathrm{swap}(S)}$. ddlk models the rhs by parameterizing $p(x)$ and $p(\tilde{x} \mid x)$ with $q_\theta(x)$ and $q_\phi(\tilde{x} \mid x)$ respectively. The parameters $\theta$ and $\phi$ can be optimized separately in two stages.
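As a numerical illustration of theorem 3.1 under a toy assumption (a single covariate with a jointly Gaussian knockoff), the density of the law of the swapped pair, evaluated at a point, equals the original density evaluated at the swapped point:

```python
import numpy as np

def gauss2_pdf(v, mu, cov):
    """Density of a 2d Gaussian at point v."""
    d = v - mu
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

mu = np.array([1.0, 0.0])            # mean of the joint (x, x_tilde)
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
P = np.array([[0.0, 1.0],            # permutation matrix swapping x and x_tilde
              [1.0, 0.0]])

v = np.array([0.3, -0.7])
# law of swap(V): Gaussian with mean P mu and covariance P cov P^T
lhs = gauss2_pdf(v, P @ mu, P @ cov @ P.T)
# theorem 3.1: equals the original density evaluated at the swapped point
rhs = gauss2_pdf(P @ v, mu, cov)
assert np.isclose(lhs, rhs)
```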
Stage 1: Covariate distribution estimation.
We model the distribution of $x$ using $q_\theta(x)$. The parameters $\theta$ are learned by maximizing the log-likelihood $\sum_{i=1}^{N} \log q_\theta(x^{(i)})$ over a dataset of $N$ samples.
Stage 2: Knockoff generation.
For any fixed swap $S$, minimizing the kl divergence between the following distributions ensures the swap property eq. 1 required of knockoffs:

(6) $\mathrm{KL}\left( q_\theta(x)\, q_\phi(\tilde{x} \mid x) \;\|\; q_\theta(\check{x})\, q_\phi(\check{\tilde{x}} \mid \check{x}) \right)$
Fitting the knockoff generator involves minimizing this kl divergence for all possible swaps $S$. To make this problem tractable, we use several building blocks that help us (a) sample swaps with the highest values of this kl divergence and (b) prevent $q_\phi(\tilde{x} \mid x)$ from memorizing $x$ to trivially satisfy the swap property in eq. 1.
3.1 Fitting ddlk
Knockoffs must satisfy the swap property eq. 1 for all potential sets of swap indices $S$. While this seems to imply that the kl objective in eq. 6 must be minimized under an exponential number of swaps, swapping every coordinate suffices (Candes et al., 2018). More generally, it is sufficient to show the swap property for a collection of sets such that every coordinate can be represented as the symmetric difference of members of the collection. See section A.4 for more details.
Sampling swaps.
Swapping coordinates can be expensive in high dimensions, so existing methods resort to randomly sampling swaps during optimization (Romano et al., 2018; Jordon et al., 2019). Rather than sample each coordinate uniformly at random, we propose parameterizing the sampling process for swap indices so that swaps sampled from this process yield large values of the kl objective in eq. 6. We do so because of the following property of swaps, which we prove in appendix A.
Lemma 3.2.
Let $\nu^*$ be the worst-case swap distribution; that is, the distribution over swap indices that maximizes the expected kl objective:

(7) $\nu^* = \arg\max_{\nu} \; \mathbb{E}_{S \sim \nu} \left[ \mathrm{KL}\left( q_\theta(x)\, q_\phi(\tilde{x} \mid x) \;\|\; q_\theta(\check{x})\, q_\phi(\check{\tilde{x}} \mid \check{x}) \right) \right]$

If the expected kl divergence under $\nu^*$ is zero, then the kl objective in eq. 6 is zero for every swap $S$.

Randomly sampling swaps can be thought of as sampling from $d$ Bernoulli random variables $b_1, \dots, b_d$ with parameters $\pi_1, \dots, \pi_d$ respectively, where each $b_j$ indicates whether the $j$th coordinate is to be swapped. A set of indices can be generated by letting $S = \{j : b_j = 1\}$. To learn a sampling process that helps maximize eq. 7, we optimize the values of $\pi_1, \dots, \pi_d$. However, since score function gradients for the parameters of Bernoulli random variables can have high variance, ddlk uses a continuous relaxation instead. For each coordinate $j$, ddlk learns the parameters for a Gumbel-Softmax (Jang et al., 2016; Maddison et al., 2016) distribution over $b_j$.

Entropy regularization.
Minimizing the kl objective in eq. 6 over the worst-case swap distribution will generate knockoffs that satisfy the swap property eq. 1. However, a potential solution in the optimization of $q_\phi(\tilde{x} \mid x)$ is to memorize the covariates $x$, which reduces the power to select important variables.
To solve this problem, ddlk introduces a regularizer based on conditional entropy, to push $\tilde{x}$ away from being a copy of $x$. This regularizer takes the form $-\lambda\, \mathbb{E}_{q_\theta(x)}\!\left[ H\!\left( q_\phi(\tilde{x} \mid x) \right) \right]$ added to the minimization objective, where $\lambda \ge 0$ is a hyperparameter.
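To see why the regularizer discourages memorization, consider a toy Gaussian conditional $q_\phi(\tilde{x} \mid x) = \mathcal{N}(x, \sigma^2)$ (an illustrative assumption, not the model used in the paper): as $\sigma \to 0$, the knockoff becomes a copy of $x$ and the differential entropy diverges to $-\infty$, so the penalty grows without bound:

```python
import numpy as np

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

# A generator that memorizes x has a near-degenerate conditional with tiny
# sigma; its entropy tends to -inf, so the -lambda * H penalty blows up.
lam = 0.1
penalties = [-lam * gaussian_entropy(s) for s in (1.0, 0.1, 0.001)]
# penalties increase as sigma shrinks, discouraging copies of x
```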
Including the conditional entropy regularizer and the Gumbel-Softmax sampling of swap indices, the final optimization objective for ddlk is:
(8) $\min_{\phi} \max_{\eta} \; \mathbb{E}_{S \sim q_\eta} \left[ \mathrm{KL}\left( q_\theta(x)\, q_\phi(\tilde{x} \mid x) \;\|\; q_\theta(\check{x})\, q_\phi(\check{\tilde{x}} \mid \check{x}) \right) \right] - \lambda\, \mathbb{E}_{q_\theta(x)}\!\left[ H\!\left( q_\phi(\tilde{x} \mid x) \right) \right]$
where $q_\eta$ is the Gumbel-Softmax distribution over swap indices with parameters $\eta$. We show the full ddlk procedure in algorithm 1. ddlk fits $q_\theta$ by maximizing the likelihood of the data. It then fits $q_\phi$ by optimizing eq. 8 with noisy gradients. To do this, ddlk first samples knockoffs conditioned on the covariates and a set of swap coordinates, then computes Monte Carlo gradients of the ddlk objective in eq. 8 with respect to the parameters $\phi$ and $\eta$. In practice, ddlk can use stochastic gradient estimates like the score function or reparameterization gradients for this step. The $q_\theta$ and $q_\phi$ models can be implemented with flexible models like MADE (Germain et al., 2015) or mixture density networks (Bishop, 1994).
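The swap-sampling step can be sketched with a binary Gumbel-Softmax (relaxed Bernoulli) per coordinate. This is a simplified numpy sketch without autodiff, with illustrative names `logits` and `temp`; a real implementation would use a framework such as PyTorch so the soft sample can carry gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_swap_indicators(logits, temp=0.5):
    """Relaxed Bernoulli (binary Gumbel-Softmax) sample per coordinate,
    with a straight-through hard threshold for the forward pass."""
    u = rng.uniform(1e-8, 1 - 1e-8, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)        # Logistic(0, 1) sample
    soft = 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temp))
    hard = (soft > 0.5).astype(float)                # binary swap indicators
    return hard, soft

logits = np.zeros(5)           # one learnable logit per coordinate
hard, soft = sample_swap_indicators(logits)
S = np.flatnonzero(hard)       # coordinates to swap at this step
```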
4 Experiments
We study the performance of ddlk on several synthetic, semi-synthetic, and real-world datasets. We compare ddlk with several non-Gaussian knockoff generation methods: AutoEncoding Knockoffs (aek) (Liu and Zheng, 2018), KnockoffGAN (Jordon et al., 2019), and Deep Knockoffs (Romano et al., 2018). For all comparison methods, we downloaded the publicly available implementations (where available) and used the configurations and hyperparameters recommended by the authors; see appendix D for code and model hyperparameters.
Each experiment involves three stages. First, we fit a knockoff generator using a dataset of covariates $x$. Next, we fit a response model, and use its performance on a held-out set to create a knockoff statistic $w_j$ for each feature $j$. We detail the construction of these test statistics in appendix E. Finally, we apply a knockoff filter to the statistics to select features at a nominal fdr level $\alpha$, and measure the ability of each knockoff method to select relevant features while maintaining the fdr at $\alpha$. For the synthetic and semi-synthetic tasks, we repeat these three stages 30 times to obtain interval estimates of each performance metric.

In our experiments, we assume $x$ to be real-valued with continuous support and decompose the models $q_\theta(x)$ and $q_\phi(\tilde{x} \mid x)$ via the chain rule:

$q_\theta(x) = \prod_{j=1}^{d} q_\theta(x_j \mid x_{1:j-1}), \qquad q_\phi(\tilde{x} \mid x) = \prod_{j=1}^{d} q_\phi(\tilde{x}_j \mid x, \tilde{x}_{1:j-1}).$

For each conditional distribution, we fit a mixture density network (Bishop, 1994) where the number of mixture components is a hyperparameter. In principle, any model that produces an explicit likelihood can be used for each conditional. Fitting $q_\theta$ involves using samples from a dataset, but fitting $q_\phi$ involves sampling from $q_\phi$ itself. This is a potential issue: the gradient of the ddlk objective eq. 8 with respect to $\phi$ is difficult to compute, as it involves integrating over $\tilde{x}$, whose distribution depends on $\phi$. To solve this, we use an implicit reparameterization (Figurnov et al., 2018) of mixture densities. Further details of this formulation are presented in appendix B.
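The chain-rule factorization with per-conditional Gaussian mixtures can be sketched as follows; the stand-in `params` callable plays the role of a mixture density network, and its outputs are purely illustrative:

```python
import numpy as np

def gmm_logpdf(x, weights, means, sigmas):
    """Log density of a 1d Gaussian mixture at scalar x (log-sum-exp)."""
    comps = (np.log(weights)
             - 0.5 * np.log(2 * np.pi * sigmas**2)
             - 0.5 * ((x - means) / sigmas) ** 2)
    m = comps.max()
    return m + np.log(np.sum(np.exp(comps - m)))

def autoregressive_loglik(x, conditional_params):
    """log q(x) = sum_j log q(x_j | x_{<j}); each conditional is a mixture
    whose parameters are produced by a network reading x_{<j}."""
    total = 0.0
    for j, xj in enumerate(x):
        w, mu, sigma = conditional_params(j, x[:j])
        total += gmm_logpdf(xj, w, mu, sigma)
    return total

# illustrative stand-in for a mixture density network with 2 components
def params(j, x_prev):
    mu0 = x_prev.mean() if len(x_prev) else 0.0
    return (np.array([0.5, 0.5]),
            np.array([mu0 - 1.0, mu0 + 1.0]),
            np.array([1.0, 1.0]))

ll = autoregressive_loglik(np.array([0.2, -0.4, 1.1]), params)
```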
Across each benchmark involving ddlk, we vary only the entropy regularization parameter $\lambda$, based on the amount of dependence among covariates. The number of parameters, learning rate, and all other hyperparameters are kept constant. To sample swaps $S$, we use a straight-through Gumbel-Softmax estimator (Jang et al., 2016). This allows us to sample binary values for each swap coordinate while using gradients of a continuous approximation during optimization. For brevity, we present the exact hyperparameter details for ddlk in appendix C.
We run each experiment on a single CPU with 4GB of memory. ddlk takes roughly 40 minutes in total to fit both $q_\theta$ and $q_\phi$ on a 100-dimensional dataset.
Synthetic benchmarks.
Our tests on synthetic data seek to highlight differences in power and fdr between the knockoff generation methods. Each dataset in this section consists of 100 features, 20 of which are used to generate the response $y$. We split the data into a training set to fit each knockoff method, a validation set used to tune the hyperparameters of each method, and a test set for evaluating knockoff statistics.
[gaussian]: We first replicate the multivariate normal benchmark of Romano et al. (2018). We sample $x \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a $100$-dimensional covariance matrix with entries $\Sigma_{ij} = \rho^{|i-j|}$. This autoregressive Gaussian data exhibits strong correlations between adjacent features, and lower correlations between features that are further apart. We generate $y$ as a linear function of the 20 important features plus Gaussian noise, with the coefficients of the important features drawn at random. The values of $\rho$ and of the ddlk entropy regularization parameter $\lambda$ are fixed across runs. Our response model is a feed-forward neural network.
[mixture]: To compare each method on its ability to generate non-Gaussian knockoffs, we use a mixture of autoregressive Gaussians. This is a more challenging benchmark, as each covariate is multimodal and highly correlated with others. We sample $x \sim \sum_{k=1}^{3} \pi_k\, \mathcal{N}(\mu_k, \Sigma_k)$, where each $\Sigma_k$ is a $100$-dimensional covariance matrix whose $(i, j)$th entry is $\rho_k^{|i-j|}$. We generate $y$ as a linear function of the 20 important features plus Gaussian noise, with the coefficients of the important features drawn at random. The cluster centers $\mu_k$, mixture proportions $\pi_k$, correlation parameters $\rho_k$, and the ddlk entropy regularization parameter $\lambda$ are fixed across runs. Figure 2 visualizes two randomly selected dimensions of this data.
Results.
Figure 1 compares the average fdp (an empirical estimate of the fdr) and power (percentage of important features selected) of each knockoff generating method. In the case of the gaussian dataset, all methods control the fdr at or below the nominal level, while achieving 100% power to select important features. The main difference between the methods is in the calibration of null statistics. Recall that a knockoff filter assumes a null statistic to be positive or negative with equal probability; features with negative statistics below a threshold are used to control the number of false discoveries when features with positive statistics above the same threshold are selected. ddlk produces the most well-calibrated null statistics, as evidenced by the closeness of its fdp curve to the dotted diagonal line.

Figure 1 also demonstrates the effectiveness of ddlk in modeling non-Gaussian covariates. In the case of the mixture dataset, ddlk achieves significantly higher power than the baseline methods, while controlling the fdr at nominal levels. To understand why this may be the case, we plot the joint distribution of two randomly selected features in fig. 2. ddlk and AutoEncoding Knockoffs both capture all three modes in the data. However, AutoEncoding Knockoffs tend to produce knockoffs that are very similar to the original features, and yield lower power when selecting variables, as shown in fig. 1. Deep Knockoffs manage to capture the first two moments of the data, likely due to an explicit second-order term in the objective function, but tend to oversmooth and fail to properly estimate the knockoff distribution. KnockoffGAN suffers from mode collapse, and fails to capture even the first two moments of the data. This yields knockoffs that not only have low power, but also fail to control the fdr at nominal levels.
Robustness of ddlk to entropy regularization.
To provide guidance on how to set the entropy regularization parameter, we explore the effect of $\lambda$ on both fdr control and power. Intuitively, lower values of $\lambda$ will yield solutions of $q_\phi(\tilde{x} \mid x)$ that satisfy eq. 1 and control the fdr well, but may also memorize the covariates and yield low power. Higher values of $\lambda$ may help improve power, but at the cost of fdr control. In this experiment, we again use the gaussian dataset, but vary $\lambda$ and the correlation parameter $\rho$. Figure 3 highlights the performance of ddlk over various settings of $\lambda$ and $\rho$. We show a heatmap where each cell represents the rmse between the nominal fdr and mean fdp curves over 30 simulations. In each of these settings ddlk achieves a power of 1, so we visualize only the fdp. We observe that the fdp of ddlk is very close to its expected value for most settings of $\lambda$. This holds over the wide range of $\lambda$ explored, demonstrating that ddlk is not very sensitive to the choice of this hyperparameter. We also notice that data with weaker correlations see a smaller increase in fdp at larger values of $\lambda$. In general, checking the fdp on synthetic responses generated conditional on real covariates can aid in selecting $\lambda$.
Semisynthetic benchmark.
[gene]: To evaluate the fdr and power of each knockoff method using covariates found in a genomics context, we create a semi-synthetic dataset. We use RNA expression data of 963 cancer cell lines from the Genomics of Drug Sensitivity in Cancer study (Yang et al., 2012). Each cell line has expression levels for roughly 20K genes, of which we sample 100 such that every feature is highly correlated with at least one other feature. We create 30 independent replications of this experiment by repeating the following process. We first sample a gene uniformly at random and add it to the feature set $G$. At each subsequent step, we sample a gene $g$ uniformly at random from $G$, compute the set of the $k$ genes not in $G$ most highly correlated with $g$, uniformly sample a gene from this set, and add it to $G$. We repeat this process until $|G| = 100$, yielding 100 genes in total.
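This sampling procedure can be sketched as follows (a simplified version with synthetic data; `k`, the size of the candidate set, is a stand-in for the value used in the paper):

```python
import numpy as np

def sample_correlated_features(expr, n_features=100, k=5, seed=0):
    """Grow a feature set where each new gene is highly correlated
    with at least one already-chosen gene."""
    rng = np.random.default_rng(seed)
    corr = np.abs(np.corrcoef(expr.T))         # gene-by-gene correlation
    chosen = [int(rng.integers(expr.shape[1]))]
    while len(chosen) < n_features:
        g = rng.choice(chosen)                  # anchor gene already in the set
        order = np.argsort(-corr[g])            # most correlated first
        candidates = [i for i in order if i not in chosen][:k]
        chosen.append(int(rng.choice(candidates)))
    return chosen

expr = np.random.default_rng(1).normal(size=(200, 300))  # samples x genes
genes = sample_correlated_features(expr, n_features=10, k=5)
```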
We generate $y$ using a nonlinear response function adapted from a study on feature selection in neural network models of gene-drug interactions (Liang et al., 2018). The response consists of a nonlinear term, a second-order term, and a linear term. For brevity, appendix F contains the full simulation details. We set the ddlk entropy regularization parameter $\lambda$ as described in appendix C.
Results.
Figure 1 (right) highlights the empirical fdp and power of each knockoff generating method on gene. All methods control the fdr below the nominal level, but the average fdp of ddlk at fdr thresholds below 0.3 is closest to its expected value. This range of thresholds is especially important, as nominal fdr levels below 0.3 are the most commonly used in practice. In this range, ddlk achieves power on par with Deep Knockoffs at levels below 0.1, and higher power everywhere else. AutoEncoding Knockoffs and KnockoffGAN achieve noticeably lower power across all thresholds. Deep Knockoffs likely perform well here due to a lack of strong third- or higher-order moments of dependence between features. We attribute the success of ddlk and Deep Knockoffs to their ability to model highly correlated data.
COVID-19 adverse events.
[covid19]: The widespread impact of COVID-19 has led to the deployment of machine learning models to guide triage. Data for COVID-19 is messy because of both the large volume of patients and the changing practice of patient care. Establishing trust in models for COVID-19 involves vetting the training data to ensure it does not contain artifacts that models can exploit. Conditional independence tests help achieve this goal in two ways: (a) they highlight which features are most important to the response, and (b) they prune the feature set for a deployed model, reducing the risk of overfitting to processes in a particular hospital. We apply each knockoff method to a large dataset from one of the epicenters of COVID-19 to understand the features most predictive of adverse events.
We use electronic health record data on COVID-positive patients from a large metropolitan health network. Our covariates include demographics, vitals, and lab test results from every cbc taken for each patient. The response is a binary label indicating whether or not a patient had an adverse event (intubation, mortality, ICU transfer, hospice discharge, emergency department representation, O2 support in excess of nasal cannula at 6 L/min) within 96 hours of their cbc. There are 17K samples of 37 covariates in the training dataset, 5K in a validation set, and 6K in a held-out test set. We set the ddlk entropy regularization parameter $\lambda$ as described in appendix C. In this experiment, we use gradient boosted regression trees (Friedman, 2002; Ke et al., 2017) as our response model, and the expected log-likelihood as a knockoff statistic. We also standardize the data in the case of Deep Knockoffs, since mmd objectives that use the rbf kernel with a single bandwidth parameter work better when features are on the same scale.

Feature  DDLK  Deep Knockoffs  AEK  KnockoffGAN  Validated
Eosinophils count  ✓  ✗  ✗  ✗  ✓ 
Eosinophils percent  ✓  ✓  ✗  ✗  ✗ 
Blood urea nitrogen  ✓  ✗  ✗  ✗  ✓ 
Ferritin  ✓  ✗  ✗  ✗  ✗ 
O2 Saturation  ✓  ✗  ✗  ✗  ✓ 
Heart rate  ✓  ✗  ✗  ✗  ✓ 
Respiratory rate  ✓  ✓  ✗  ✗  ✓ 
O2 Rate  ✓  ✓  ✓  ✓  ✓ 
On room air  ✓  ✓  ✓  ✓  ✓ 
High O2 support  ✓  ✓  ✓  ✓  ✓ 
Age  ✗  ✗  ✓  ✓  ✗ 
Results.
As COVID-19 is a recently identified disease, there is no ground-truth set of important features for this dataset. We therefore use each knockoff method to help discover a set of features at a nominal fdr threshold, and validate each feature by manual review with doctors at a large metropolitan hospital. Table 1 shows the list of features returned by each knockoff method, and indicates whether or not a team of doctors thought the feature should have clinical relevance.
We note that ddlk achieves the highest power to select features, identifying 10 features, compared to 5 by Deep Knockoffs, and 4 each by AutoEncoding Knockoffs and KnockoffGAN. To understand why, we visualize the marginal distribution of each covariate, and the corresponding marginal distribution of samples from each knockoff method, in fig. 4.
We notice two main differences between ddlk and the baselines. First, ddlk fits asymmetric distributions better than the baselines. Second, despite the fact that the implementation of ddlk using mixture density networks is misspecified for discrete variables, ddlk models them better than the existing baselines. This implementation uses continuous models for each conditional, but is still able to approximate discrete distributions well. The components of each mixture appear centered around a discrete value, and have very low variance, as shown in fig. 5. This yields a close approximation to the true discrete marginal. We show the marginals of every feature for each knockoff method in appendix G.
5 Conclusion
ddlk is a generative model for sampling knockoffs that directly minimizes a divergence implied by the knockoff swap property. The optimization of ddlk involves first maximizing the explicit likelihood of the covariates, then minimizing the divergence between the joint distribution of covariates and knockoffs and that of any swap between them. To ensure ddlk satisfies the swap property under any swap indices, we use the Gumbel-Softmax trick to learn swaps that maximize the divergence. To generate knockoffs that satisfy the swap property while maintaining high power to select variables, ddlk includes a regularization term that encourages high conditional entropy of the knockoffs given the covariates. We find that ddlk outperforms various baselines on several synthetic and real benchmarks, including a task involving a large dataset from one of the epicenters of COVID-19.
Broader Impact
There are several benefits to using flexible methods for identifying conditional independence like ddlk. Practitioners that care about transparency have traditionally eschewed deep learning, but methods like ddlk can present a solution. By performing conditional independence tests with deep networks and by providing guarantees on the false discovery rate, scientists and practitioners can develop more trust in their models. This can lead to greater adoption of flexible models in basic sciences, resulting in new discoveries and better outcomes for the beneficiaries of a deployed model. Conditional independence can also help detect bias from data, by checking if an outcome like length of stay in a hospital was related to a sensitive variable like race or insurance type, even when conditioning on many other factors.
While we believe greater transparency can only be better for society, we note that interpretation methods for machine learning may not exactly provide transparency. These methods visualize only a narrow part of a model’s behavior, and may lead to gaps in understanding. Relying solely on these domainagnostic methods could yield unexpected behavior from the model. As users of machine learning, we must be wary of too quickly dismissing the knowledge of domain experts. No interpretation technique is suitable for all scenarios, and different notions of transparency may be desired in different domains.
References
 Bishop [1994] Christopher M Bishop. Mixture density networks. Technical report, Aston University, Birmingham, UK, 1994.
 Candes et al. [2016] E. Candes, Y. Fan, L. Janson, and J. Lv. Panning for Gold: ModelX Knockoffs for Highdimensional Controlled Variable Selection. ArXiv eprints, October 2016.
 Candes et al. [2018] Emmanuel Candes, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold:‘modelx’knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3):551–577, 2018.
 Falcon [2019] WA Falcon. PyTorch Lightning. GitHub repository, https://github.com/williamFalcon/pytorch-lightning, 2019.
 Figurnov et al. [2018] Mikhail Figurnov, Shakir Mohamed, and Andriy Mnih. Implicit reparameterization gradients. In Advances in Neural Information Processing Systems, pages 441–452, 2018.
 Friedman [2002] Jerome H Friedman. Stochastic gradient boosting. Computational statistics & data analysis, 38(4):367–378, 2002.
 Germain et al. [2015] Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: masked autoencoder for distribution estimation. CoRR, abs/1502.03509, 2015. URL http://arxiv.org/abs/1502.03509.
 Gulrajani et al. [2017] Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of wasserstein gans. CoRR, abs/1704.00028, 2017. URL http://arxiv.org/abs/1704.00028.
 Jang et al. [2016] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbelsoftmax. CoRR, abs/1611.01144, 2016. URL http://arxiv.org/abs/1611.01144.

 Jordon et al. [2019] James Jordon, Jinsung Yoon, and Mihaela van der Schaar. KnockoffGAN: Generating knockoffs for feature selection using generative adversarial networks. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ByeZ5jC5YQ.
 Ke et al. [2017] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146–3154, 2017.
 Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kingma et al. [2015] Durk P Kingma, Tim Salimans, and Max Welling. Variational dropout and the local reparameterization trick. In Advances in neural information processing systems, pages 2575–2583, 2015.
 Liang et al. [2018] Faming Liang, Qizhai Li, and Lei Zhou. Bayesian neural networks for selection of drug sensitive genes. Journal of the American Statistical Association, 113(523):955–972, 2018.
 Liu and Zheng [2018] Ying Liu and Cheng Zheng. Autoencoding knockoff generator for fdr controlled variable selection. arXiv preprint arXiv:1809.10765, 2018.
 Maddison et al. [2016] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. CoRR, abs/1611.00712, 2016. URL http://arxiv.org/abs/1611.00712.
 Mescheder et al. [2017] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. The numerics of gans. In Advances in Neural Information Processing Systems, pages 1825–1835, 2017.
 Mescheder et al. [2018] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for gans do actually converge? arXiv preprint arXiv:1801.04406, 2018.

 Ramdas et al. [2015] Aaditya Ramdas, Sashank Jakkam Reddi, Barnabás Póczos, Aarti Singh, and Larry Wasserman. On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
 Romano et al. [2018] Yaniv Romano, Matteo Sesia, and Emmanuel J. Candès. Deep knockoffs, 2018.
 Salimans et al. [2016] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234–2242, 2016.
 Sesia et al. [2017] Matteo Sesia, Chiara Sabatti, and Emmanuel J Candès. Gene hunting with knockoffs for hidden markov models. arXiv preprint arXiv:1706.04677, 2017.
 Tansey et al. [2018] Wesley Tansey, Victor Veitch, Haoran Zhang, Raul Rabadan, and David M Blei. The holdout randomization test: Principled and easy black box feature selection. arXiv preprint arXiv:1811.00645, 2018.
 Yang et al. [2012] Wanjuan Yang, Jorge Soares, Patricia Greninger, Elena J Edelman, Howard Lightfoot, Simon Forbes, Nidhi Bindal, Dave Beare, James A Smith, I Richard Thompson, et al. Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells. Nucleic acids research, 41(D1):D955–D961, 2012.
Appendix
Appendix A Proofs
A.1 Proof of Theorem 3.1
Let $q$ be a probability measure defined on a measurable space $(\mathcal{V}, \mathcal{S})$. Let $\sigma_B$ be a swap function using indices $B$. If $v$ is a sample from $q$, the probability law of $\sigma_B(v)$ is $q \circ \sigma_B$.
Proof.
The swap operation $\sigma_B$ on $v \in \mathcal{V}$ swaps coordinates in the following manner: for each $j \in B$, the $j$th and $(j+d)$th coordinates are swapped. Let $(\mathcal{V}, \mathcal{S})$ be a measurable space, where elements of $\mathcal{V}$ are $2d$-dimensional vectors, and $\mathcal{S}$ is a $\sigma$-algebra on $\mathcal{V}$. Let $(\mathcal{V}_B, \mathcal{S}_B)$ also be a measurable space, where each element of $\mathcal{V}_B$ is an element of $\mathcal{V}$ but with the $j$th coordinate swapped with the $(j+d)$th coordinate for each $j \in B$. Similarly, let $\mathcal{S}_B$ be constructed by applying the same swap transformations to each element of $\mathcal{S}$. $\mathcal{S}_B$ is a $\sigma$-algebra, as swaps are one-to-one transformations and $\mathcal{S}$ is a $\sigma$-algebra.
We first show that $\sigma_B$ is a measurable function with respect to $\mathcal{S}$ and $\mathcal{S}_B$. This is true by construction of the measurable space $(\mathcal{V}_B, \mathcal{S}_B)$: for every element $A \in \mathcal{S}_B$, $\sigma_B^{-1}(A) \in \mathcal{S}$.
We can now construct a mapping $q_B(A) = q(\sigma_B^{-1}(A))$ for all $A \in \mathcal{S}_B$. This is the pushforward measure of $q$ under the transformation $\sigma_B$, and is well defined because $\sigma_B$ is measurable.
Using the fact that a swap applied twice is the identity, we get $\sigma_B^{-1} = \sigma_B$. With this, we see that the probability measure on $(\mathcal{V}_B, \mathcal{S}_B)$ is $q \circ \sigma_B$. ∎
A.2 Alternative derivation of Theorem 3.1 for continuous random variables
In this section we derive Theorem 3.1 for continuous random variables in an alternative manner. Let $x$ be a set of covariates, and $\tilde{x}$ be a set of knockoffs. Let $v = [x, \tilde{x}]$, and $v_B = \sigma_B(v)$, where $B$ is a set of coordinates in which we swap $x$ and $\tilde{x}$. Recall that a swap operation on $v$ is an affine transformation $\sigma_B(v) = P_B v$, where $P_B$ is a permutation matrix. Using this property, we get:
$$q_{v_B}(u) = q_v\!\left(P_B^{-1} u\right) \left|\det P_B^{-1}\right| = q_v(P_B u) = q_v(\sigma_B(u)).$$
The first step is achieved by using a change of variables, noting that $P_B$ is invertible and $P_B^{-1} = P_B$. The determinant of the Jacobian here is just the determinant of $P_B$. $P_B$ is a permutation matrix, so its determinant is $\pm 1$ and $\left|\det P_B\right| = 1$. That is, the density of the swapped variables evaluated at $u$ is equal to the original density evaluated at $\sigma_B(u)$.
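These two facts are easy to verify numerically. The sketch below (an illustrative numpy example, not part of the paper's implementation) builds the swap permutation matrix $P_B$ and checks that applying it twice is the identity and that $|\det P_B| = 1$:

```python
import numpy as np

def swap_matrix(d, B):
    """Permutation matrix P_B on R^{2d} that swaps coordinate j with j+d for each j in B."""
    P = np.eye(2 * d)
    for j in B:
        P[[j, j + d]] = P[[j + d, j]]  # swap rows j and j+d of the identity
    return P

d, B = 3, {0, 2}
P = swap_matrix(d, B)
v = np.arange(2 * d, dtype=float)  # v = [x, x_tilde]

assert np.allclose(P @ (P @ v), v)             # a swap applied twice is the identity
assert abs(abs(np.linalg.det(P)) - 1) < 1e-12  # |det P_B| = 1
```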
A.3 Proof of Lemma 3.2
Recall Lemma 3.2:
Worst-case swaps: Let $\hat{q}_{\text{swap}}$ be the worst-case swap distribution; that is, the distribution over swap indices that maximizes
$$\mathbb{E}_{B \sim \hat{q}_{\text{swap}}}\, \mathrm{KL}\!\left( q(x, \tilde{x}) \,\big\|\, q\!\left(\sigma_B([x, \tilde{x}])\right) \right)$$
with respect to $\hat{q}_{\text{swap}}$. If this quantity is minimized with respect to the knockoff parameters $\phi$, knockoffs sampled from $q_\phi(\tilde{x} \mid x)$ will satisfy the swap property in eq. 1 for any swap in the power set of $\{1, \dots, d\}$.
Proof.
If
$$\mathbb{E}_{B \sim \hat{q}_{\text{swap}}}\, \mathrm{KL}\!\left( q(x, \tilde{x}) \,\big\|\, q\!\left(\sigma_B([x, \tilde{x}])\right) \right) \qquad (9)$$
is minimized with respect to $\phi$ but maximized with respect to $\hat{q}_{\text{swap}}$, then for any other swap distribution $q_{\text{swap}}$, eq. 9 will be no greater. Minimizing eq. 9, which is nonnegative, with respect to $\phi$ implies that for any swap $B$ sampled from $\hat{q}_{\text{swap}}$ and for any knockoff sampled from $q_\phi(\tilde{x} \mid x)$, the KL divergence in eq. 9 is zero, so the joint distribution is invariant under $\sigma_B$.
As eq. 9 is also maximized with respect to $\hat{q}_{\text{swap}}$, swaps drawn from all other distributions will only result in lower values of eq. 9. Therefore, the joint distribution will be invariant under any swap in the power set of $\{1, \dots, d\}$. ∎
A.4 Sufficient sets for the swap condition
Recall the swap property required of knockoffs, highlighted in eq. 1:
$$\sigma_B([x, \tilde{x}]) \overset{d}{=} [x, \tilde{x}],$$
where $B$ is a set of coordinates under which we swap the covariates and knockoffs. For valid knockoffs, this equality in distribution must hold for any such $B$. One approach to check whether knockoffs are valid is to verify this equality in distribution for all singleton sets [Romano et al., 2018, Jordon et al., 2019]. To check whether the swap property in eq. 1 holds under any $B$, it suffices to check that eq. 1 holds under each of $\{1\}, \dots, \{d\}$.
We can generalize this approach to check the validity of knockoffs under other collections of indices besides singleton sets using the following property. Let $B_1, B_2 \subseteq \{1, \dots, d\}$. Then,
$$\sigma_{B_2}(\sigma_{B_1}(v)) = \sigma_{B_1 \triangle B_2}(v),$$
where $B_1 \triangle B_2$ is the symmetric difference of $B_1$ and $B_2$. Swapping the indices in $B_1 \triangle B_2$ is equivalent to swapping the indices in $B_1$, then the indices in $B_2$. If $j \in B_1 \cap B_2$, swapping twice will negate the effect of the swap.
We can extend this property to $m$ sets and define sufficient conditions to check whether the swap property holds. Let $B_1, \dots, B_m$ be a sequence of sets where each $B_i \subseteq \{1, \dots, d\}$. Let
$$B_1 \triangle B_2 \triangle \cdots \triangle B_m = \{j\}.$$
Checking the swap property in eq. 1 under the sequence of swaps $B_1, \dots, B_m$ is equivalent to checking eq. 1 under the singleton set $\{j\}$. Therefore, the swap property must also hold under the singleton set $\{j\}$.
If a collection of sets of swap indices contains, for each $j \in \{1, \dots, d\}$, a subsequence whose sequential symmetric difference is the singleton $\{j\}$, then a set of knockoffs that satisfies the swap property under each set in the collection will also satisfy the swap property under each singleton set, which is sufficient to generate valid knockoffs.
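The symmetric-difference identity for composed swaps can be checked directly. The numpy sketch below (illustrative; the `swap` helper is ours, not from the paper's code) verifies that swapping under $B_1$ and then $B_2$ equals a single swap under $B_1 \triangle B_2$:

```python
import numpy as np

def swap(v, d, B):
    """Return a copy of v with coordinates j and j+d exchanged for each j in B."""
    u = v.copy()
    for j in B:
        u[j], u[j + d] = u[j + d], u[j]
    return u

d = 4
v = np.random.default_rng(0).normal(size=2 * d)
B1, B2 = {0, 1, 2}, {1, 3}

# sigma_{B2}(sigma_{B1}(v)) == sigma_{B1 symmetric-difference B2}(v)
assert np.allclose(swap(swap(v, d, B1), d, B2), swap(v, d, B1 ^ B2))
```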
Appendix B Implicit reparameterization of mixture density networks
In our experiments, we decompose both $q_\theta(x)$ and $q_\phi(\tilde{x} \mid x)$ via the chain rule:
$$q_\theta(x) = \prod_{j=1}^{d} q_{\theta_j}\!\left(x_j \mid x_{1:j-1}\right), \qquad q_\phi(\tilde{x} \mid x) = \prod_{j=1}^{d} q_{\phi_j}\!\left(\tilde{x}_j \mid \tilde{x}_{1:j-1}, x\right).$$
We model each conditional using mixture density networks [Bishop, 1994], which take the form
$$q_{\theta_j}\!\left(x_j \mid x_{1:j-1}\right) = \sum_{k=1}^{K} \pi_k\!\left(x_{1:j-1}\right) \mathcal{N}\!\left(x_j;\, \mu_k\!\left(x_{1:j-1}\right),\, \sigma_k^2\!\left(x_{1:j-1}\right)\right),$$
where the functions $\pi_k$, $\mu_k$, and $\sigma_k$ characterize a univariate Gaussian mixture. The parameters of these functions are $\theta_j$.
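As a minimal illustration of the mixture form above, the snippet below evaluates a univariate Gaussian mixture density for fixed, hypothetical network outputs $(\pi_k, \mu_k, \sigma_k)$; in an actual MDN these would be produced by a network applied to $x_{1:j-1}$:

```python
import numpy as np

def mdn_density(x_j, pi, mu, sigma):
    """Density of a univariate Gaussian mixture: sum_k pi_k * N(x_j; mu_k, sigma_k^2)."""
    pi, mu, sigma = map(np.asarray, (pi, mu, sigma))
    comp = np.exp(-0.5 * ((x_j - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(pi * comp))

# Hypothetical network outputs for a 2-component mixture
p = mdn_density(0.0, pi=[0.5, 0.5], mu=[-1.0, 1.0], sigma=[1.0, 1.0])
assert p > 0
```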
Fitting $q_\theta$.
Let $\theta = (\theta_1, \dots, \theta_d)$; the parameters of $q_\theta$ contain parameters for every conditional $q_{\theta_j}$. The optimization of $q_\theta$ is straightforward: maximizing the likelihood only requires taking the derivative of $\log q_\theta(x) = \sum_{j=1}^{d} \log q_{\theta_j}(x_j \mid x_{1:j-1})$.
Fitting $q_\phi$.
Let $\phi = (\phi_1, \dots, \phi_d)$; the parameters of $q_\phi$ contain parameters for every conditional $q_{\phi_j}$. Recall that the loss function for $q_\phi$ involves an expectation over knockoffs sampled from $q_\phi(\tilde{x} \mid x)$.
The optimization of $q_\phi$ therefore requires gradients of the form $\nabla_\phi \mathbb{E}_{q_\phi}[\,\cdot\,]$, which involve the derivative of an expectation with respect to $\phi$. We use implicit reparameterization [Figurnov et al., 2018]. The advantage of implicit reparameterization over explicit reparameterization [Kingma et al., 2015] is that an inverse standardization function – which transforms random noise into samples from a distribution parameterized by $\phi$ – is not needed. Using implicit reparameterization, gradients of some objective $f$ can be rewritten as
$$\nabla_\phi \mathbb{E}_{q_\phi(z)}\!\left[f(z)\right] = \mathbb{E}_{q_\phi(z)}\!\left[\nabla_z f(z)\, \nabla_\phi z\right], \qquad \nabla_\phi z = -\left(\frac{\partial F(z; \phi)}{\partial z}\right)^{-1} \nabla_\phi F(z; \phi),$$
where $F(z; \phi)$ is a standardization function.
We use this useful property to reparameterize Gaussian mixture models. Let $q_\phi(z)$ be a Gaussian mixture model:
$$q_\phi(z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(z; \mu_k, \sigma_k^2\right),$$
where $\phi = \{\pi_k, \mu_k, \sigma_k\}_{k=1}^{K}$. Let the standardization function be the CDF of $q_\phi$:
$$F(z; \phi) = \sum_{k=1}^{K} \pi_k\, \Phi\!\left(\frac{z - \mu_k}{\sigma_k}\right),$$
where $\Phi$ is the standard normal Gaussian CDF. We use this to compute the gradient of $z$ with respect to each parameter:
$$\nabla_\phi z = -\left(\frac{\partial F(z; \phi)}{\partial z}\right)^{-1} \nabla_\phi F(z; \phi) = -\frac{\nabla_\phi F(z; \phi)}{q_\phi(z)}.$$
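The gradient identity above can be checked numerically. The sketch below is illustrative: it inverts the mixture CDF by simple bisection, computes $\partial z / \partial \mu_0$ via the implicit reparameterization formula, and compares the result with a finite difference through the inverse CDF:

```python
import math

def F(z, pi, mu, sigma):
    """Mixture-of-Gaussians CDF, used as the standardization function."""
    Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return sum(p * Phi((z - m) / s) for p, m, s in zip(pi, mu, sigma))

def density(z, pi, mu, sigma):
    """Mixture density q(z) = dF/dz."""
    return sum(p * math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for p, m, s in zip(pi, mu, sigma))

def invert_cdf(u, pi, mu, sigma, lo=-20.0, hi=20.0):
    """Solve F(z) = u by bisection (F is monotonically increasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid, pi, mu, sigma) < u else (lo, mid)
    return 0.5 * (lo + hi)

pi, mu, sigma = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]
u = 0.6
z = invert_cdf(u, pi, mu, sigma)

# Implicit reparameterization: dz/dmu_0 = -(dF/dz)^{-1} * dF/dmu_0
eps = 1e-5
dF_dmu0 = (F(z, pi, [mu[0] + eps, mu[1]], sigma)
           - F(z, pi, [mu[0] - eps, mu[1]], sigma)) / (2 * eps)
dz_dmu0 = -dF_dmu0 / density(z, pi, mu, sigma)

# Check against an explicit finite difference through the inverse CDF
z_plus = invert_cdf(u, pi, [mu[0] + eps, mu[1]], sigma)
z_minus = invert_cdf(u, pi, [mu[0] - eps, mu[1]], sigma)
assert dz_dmu0 > 0
assert abs(dz_dmu0 - (z_plus - z_minus) / (2 * eps)) < 1e-4
```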
Putting it all together, we use the implicit reparameterization trick to implement each conditional distribution in $q_\theta$ and $q_\phi$.
Appendix C Implementation details and hyperparameter settings for ddlk experiments
We decompose $q_\theta(x)$ and $q_\phi(\tilde{x} \mid x)$ into the product of univariate conditional distributions using the product rule. We use mixture density networks [Bishop, 1994] to parameterize each conditional distribution. Each mixture density network is a 3-layer neural network with 50 hidden units in each layer and a residual skip connection from the input to the last layer. Each network outputs the parameters for a univariate Gaussian mixture with 5 components. We initialize the network such that the modes are evenly spaced within the support of the training data.
Using $q_{\text{swap}}$, we sample binary swap matrices of the same dimension as the data. As we require discrete samples from the Gumbel-Softmax distribution, we implement a straight-through estimator [Jang et al., 2016]. The straight-through estimator facilitates sampling discrete indices, but uses a continuous approximation during backpropagation.
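A minimal sketch of the straight-through idea in numpy (the actual implementation would rely on an autodiff framework such as PyTorch; the function name and temperature below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_st(logits, tau=0.5):
    """Straight-through Gumbel-Softmax sample.
    Forward pass: a discrete one-hot sample. Backward pass (in an autodiff
    framework) would use the gradient of the continuous relaxation `soft`,
    e.g. `hard + (soft - soft.detach())` in PyTorch."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    soft = np.exp(y - y.max())
    soft = soft / soft.sum()                    # continuous relaxation
    hard = np.eye(len(logits))[soft.argmax()]   # discrete one-hot sample
    return hard, soft

hard, soft = gumbel_softmax_st(np.array([1.0, 0.0, -1.0]))
assert hard.sum() == 1.0                 # exactly one index selected
assert abs(soft.sum() - 1.0) < 1e-9      # relaxation is a valid distribution
```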
The $q_\theta$ model is optimized using Adam [Kingma and Ba, 2014] for a maximum of 50 epochs. The $q_\phi$ model is optimized using Adam for a maximum of 250 epochs. We also implement early stopping on the validation loss, using the PyTorch Lightning framework [Falcon, 2019]. Our code can be found online by installing:
pip install -i https://test.pypi.org/simple/ ddlk==0.2
Compute resources.
We run each experiment on a single CPU core using 4GB of memory. Fitting $q_\theta$ for a 100-dimensional dataset with 2000 samples requires fitting 100 conditional models, and takes roughly 10 minutes. Fitting $q_\phi$ for the same data takes roughly 30 minutes.
Fitting ddlk to our covid-19 dataset takes roughly 15 minutes in total.
Appendix D Baseline model settings
For Deep Knockoffs and KnockoffGAN, we use code from each respective repository and use the recommended hyperparameter settings.
At the time of writing this paper, there was no publicly available implementation of AutoEncoding Knockoffs. We implemented AutoEncoding Knockoffs with a vae with a Gaussian posterior $q(z \mid x)$ and likelihood $p(x \mid z)$. Each of the networks parameterizing the means and variances of the posterior and likelihood is a 2-layer neural network with 400 units in the first hidden layer, 500 units in the second, and ReLU activations. The outputs of the variance networks are exponentiated to ensure variances are nonnegative. The outputs of the posterior networks are of dimension $k$, the latent dimension, and the outputs of the likelihood networks are of dimension $d$, the covariate dimension. For each dataset, we choose the dimension $k$ of the latent variable $z$ that maximizes the estimate of the ELBO on a validation dataset. In our experiments, we search for $k$ over a set of candidate values. For each dataset, we use the following $k$:
gaussian: 20
mixture: 140
gene: 30
covid-19: 60.
The neural networks are trained using Adam [Kingma and Ba, 2014] for a maximum of 150 epochs. To avoid very large gradients, we standardize the data using the mean and standard deviation of the training set. To generate knockoffs, we use the same approach prescribed by Liu and Zheng [2018]. We first sample the latent variable conditioned on the covariates using the posterior distribution: $z \sim q(z \mid x)$. This sample of $z$ is then used to sample a knockoff using the likelihood distribution: $\tilde{x} \sim p(x \mid z)$.
Since the covariates are standardized, we rescale the knockoffs by the training mean and standard deviation.
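This two-step sampling procedure can be sketched as follows. The placeholder functions below stand in for the trained VAE posterior and likelihood networks; their shapes and outputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_params(x):
    """Hypothetical trained encoder: returns mu_z(x), sigma_z(x) for k = 2."""
    return 0.5 * x[:2], np.ones(2)

def likelihood_params(z):
    """Hypothetical trained decoder: returns mu_x(z), sigma_x(z) for d = 4."""
    return np.concatenate([z, z]), 0.1 * np.ones(4)

x = rng.normal(size=4)
mu_z, sigma_z = posterior_params(x)
z = mu_z + sigma_z * rng.normal(size=mu_z.shape)        # z ~ q(z | x)
mu_x, sigma_x = likelihood_params(z)
x_knock = mu_x + sigma_x * rng.normal(size=mu_x.shape)  # x_tilde ~ p(x | z)
assert x_knock.shape == x.shape
```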
Appendix E Robust model-based statistics for fdr control
The goal of any knockoff method is to help compute test statistics for a conditional independence test. We employ a variant of the hrt [Tansey et al., 2018] to compute test statistics for each feature $j$. We split the dataset $\mathcal{D}$ into train and test sets $\mathcal{D}_{\text{train}}$ and $\mathcal{D}_{\text{test}}$ respectively, then sample knockoff datasets $\widetilde{\mathcal{D}}_{\text{train}}$ and $\widetilde{\mathcal{D}}_{\text{test}}$ conditioned on each. Next, a model is fit with $\mathcal{D}_{\text{train}}$.
To compute knockoff statistics with the fitted model, we use a measure of performance $t$ on the test set. For real-valued $y$, $t$ is based on the mean squared error, and for categorical $y$, $t$ is the expected log-probability of $y$. A knockoff statistic $w_j$ is recorded for each feature $j$ by comparing $t$ on $\mathcal{D}_{\text{test}}$ with $t$ on $\widetilde{\mathcal{D}}_{\text{test}, j}$, where $\widetilde{\mathcal{D}}_{\text{test}, j}$ is $\mathcal{D}_{\text{test}}$ but with the $j$th feature swapped with its knockoff.
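The statistic computation can be sketched as follows. This toy numpy example uses a linear least-squares model and negative mean squared error as the performance measure $t$; the data, model, and sign convention are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)  # only feature 0 matters
X_knock = rng.normal(size=(n, d))                  # crude stand-in for real knockoffs

X_tr, X_te, y_tr, y_te = X[:250], X[250:], y[:250], y[250:]
Xk_te = X_knock[250:]

beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # model fit on D_train

def t(Xmat):
    """Negative mean squared error on the test set (higher is better)."""
    return -np.mean((y_te - Xmat @ beta) ** 2)

w = []
for j in range(d):
    X_swap = X_te.copy()
    X_swap[:, j] = Xk_te[:, j]       # replace feature j with its knockoff
    w.append(t(X_te) - t(X_swap))    # knockoff statistic for feature j

assert w[0] > max(w[1:])  # the important feature gets the largest statistic
```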
In practice, we use flexible models like neural networks or boosted trees. While the model-based statistic above will satisfy the properties detailed in section 2 and control the fdr, its ability to do so is hindered by imperfect knockoffs. In such cases, we observe that knockoff statistics for null features are centered around some value greater than zero, violating a condition required for empirical fdr control. This happens because, if the covariates and knockoffs are not equal in distribution, models trained on the covariates will fit the covariates better than the knockoffs and inflate the value of the test statistic. This can lead to an increase in the false discovery rate, as conditionally independent features may be selected if their statistic is larger than the selection threshold. To combat this, we propose a mixture statistic that trades off power for fdr control.
The mixture statistic involves fitting a model for each feature $j$ using an equal mixture of data in $\mathcal{D}_{\text{train}}$ and $\widetilde{\mathcal{D}}_{\text{train}}$, then computing $w_j$ as above. Such a model achieves lower performance on $\mathcal{D}_{\text{test}}$, but higher performance on $\widetilde{\mathcal{D}}_{\text{test}}$, yielding values of $w_j$ with modes closer to zero, enabling finite-sample fdr control. However, this fdr control comes at the cost of power, as the method's ability to identify conditionally dependent features is reduced.
Appendix F Nonlinear response for gene experiments
We simulate the response for the gene experiment using a nonlinear response function designed for genomics settings [Liang et al., 2018]. The response consists of two first-order terms, a second-order term, and an additional nonlinearity:
where $m$ is the number of important features. In our experiments, the first $m$ features are important, while the remaining $d - m$ are unimportant.
Appendix G Generating knockoffs for COVID-19 data
In this section, we visualize the marginals of each feature in our covid-19 dataset, along with the marginals of knockoffs sampled from each method. Figures 8, 7, 6 and 5 plot the marginals of samples from ddlk, KnockoffGAN, Deep Knockoffs, and AutoEncoding Knockoffs, respectively. These provide insight into why ddlk is able to select more features at the same nominal fdr threshold of 0.2. We first notice that knockoff samples from ddlk match the marginals of the data very well; ddlk is the only method that models asymmetric distributions well.
Although ddlk uses mixture density networks, which are continuous models, discrete data is also modeled well. For example, the values of O2_support_above_NC – a binary feature indicating whether a patient required oxygen support greater than a nasal cannula – are also the modes of the mixture density learned by ddlk. Samples from AutoEncoding Knockoffs, KnockoffGAN, and Deep Knockoffs instead tend to spread mass across these values.