Missing Value Knockoffs

02/26/2022
by Deniz Koyuncu, et al.

One limitation of most statistical and machine-learning-based variable selection approaches is their inability to control false selections. A recently introduced framework, model-x knockoffs, provides this control for a wide range of models but lacks support for datasets with missing values. In this work, we discuss ways of preserving the theoretical guarantees of the model-x framework in the missing data setting. First, we prove that posterior sampled imputation allows existing knockoff samplers to be reused in the presence of missing values. Second, we show that sampling knockoffs only for the observed variables and then applying univariate imputation also preserves the false selection guarantees. Third, for the special case of latent variable models, we demonstrate how jointly imputing and sampling knockoffs can reduce the computational complexity. We have verified the theoretical findings with two different explanatory variable distributions and investigated how the missing data pattern, the amount of correlation, the number of observations, and the number of missing values affect the statistical power.
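
The following is a minimal sketch of the first idea (posterior sampled imputation followed by an unmodified knockoff sampler). It rests on several assumptions. The explanatory variables follow a known Gaussian N(mu, Sigma), with Sigma a correlation matrix. Values are missing completely at random. Each row's missing entries are filled with one draw from p(x_missing | x_observed). The equicorrelated Gaussian knockoff construction, the least-squares coefficient-difference statistic, and the knockoff+ threshold are standard choices swapped in for illustration. All function names are hypothetical. The abstract does not say whether the authors' imputation also conditions on the response, so this is not their exact procedure.

```python
import numpy as np


def impute_posterior_gaussian(X, mu, Sigma, rng):
    """Replace NaNs in each row with a draw from p(x_mis | x_obs) under an
    ASSUMED known N(mu, Sigma) model (posterior sampled imputation)."""
    X_imp = X.copy()
    for i in range(X.shape[0]):
        m = np.isnan(X[i])                    # missing coordinates of row i
        if not m.any():
            continue
        o = ~m
        S_mo = Sigma[np.ix_(m, o)]
        S_mm = Sigma[np.ix_(m, m)]
        if o.any():
            # Regression matrix K = Sigma_mo Sigma_oo^{-1}
            K = np.linalg.solve(Sigma[np.ix_(o, o)], S_mo.T).T
        else:
            K = np.zeros((m.sum(), 0))        # whole row missing: use the marginal
        cond_mean = mu[m] + K @ (X[i, o] - mu[o])
        cond_cov = S_mm - K @ S_mo.T
        cond_cov = (cond_cov + cond_cov.T) / 2  # remove round-off asymmetry
        X_imp[i, m] = rng.multivariate_normal(cond_mean, cond_cov)
    return X_imp


def gaussian_knockoffs(X, mu, Sigma, rng):
    """Equicorrelated Gaussian model-X knockoffs for complete rows of X."""
    p = Sigma.shape[0]
    s = 0.99 * min(2.0 * np.linalg.eigvalsh(Sigma).min(), 1.0)  # shrink so cov stays PD
    D = s * np.eye(p)
    Sinv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} D
    cond_mean = X - (X - mu) @ Sinv_D         # E[X~ | X] for row vectors
    cond_cov = 2.0 * D - D @ Sinv_D           # Cov[X~ | X]
    cond_cov = (cond_cov + cond_cov.T) / 2
    noise = rng.multivariate_normal(np.zeros(p), cond_cov, size=X.shape[0])
    return cond_mean + noise


def coef_difference(X, Xk, y):
    """W_j = |b_j| - |b~_j| from least squares of y on [X, X~] (needs n > 2p)."""
    p = X.shape[1]
    beta, *_ = np.linalg.lstsq(np.hstack([X, Xk]), y, rcond=None)
    return np.abs(beta[:p]) - np.abs(beta[p:])


def knockoff_plus_select(W, q):
    """Knockoff+ threshold: smallest t with (1 + #{W<=-t}) / max(1, #{W>=t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)            # no feasible threshold: select nothing


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, rho = 500, 20, 0.3
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)
    mu = np.zeros(p)
    X_full = rng.multivariate_normal(mu, Sigma, size=n)
    beta = np.zeros(p)
    beta[:5] = 1.0                            # first 5 variables are non-null
    y = X_full @ beta + rng.normal(size=n)
    X_obs = np.where(rng.random((n, p)) < 0.1, np.nan, X_full)  # 10% MCAR
    X_imp = impute_posterior_gaussian(X_obs, mu, Sigma, rng)
    Xk = gaussian_knockoffs(X_imp, mu, Sigma, rng)
    print(knockoff_plus_select(coef_difference(X_imp, Xk, y), q=0.2))
```

This uses posterior sampling rather than mean imputation. Under missingness completely at random, each completed row is then again distributed as N(mu, Sigma), so the Gaussian sampler can be applied unchanged. In practice mu and Sigma would have to be estimated rather than assumed known.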

research
10/24/2021

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with the statistical methods in the occurrence...
research
08/21/2018

An ensemble learning method for variable selection: application to high dimensional data and missing values

Standard approaches for variable selection in linear models are not tail...
research
02/25/2022

Flexible variable selection in the presence of missing data

In many applications, it is of interest to identify a parsimonious set o...
research
06/10/2022

Provable Guarantees for Sparsity Recovery with Deterministic Missing Data Patterns

We study the problem of consistently recovering the sparsity pattern of ...
research
09/01/2021

RIFLE: Robust Inference from Low Order Marginals

The ubiquity of missing values in real-world datasets poses a challenge ...
research
07/20/2021

Strategies for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Prior work has shown that combining bootstrap imputation with tree-based...