Missing Value Knockoffs

by Deniz Koyuncu, et al.
Rensselaer Polytechnic Institute

One limitation of most statistical and machine learning-based variable selection approaches is their inability to control false selections. A recently introduced framework, model-X knockoffs, provides such control for a wide range of models but lacks support for datasets with missing values. In this work, we discuss ways of preserving the theoretical guarantees of the model-X framework in the missing data setting. First, we prove that posterior sampled imputation allows reusing existing knockoff samplers in the presence of missing values. Second, we show that sampling knockoffs only for the observed variables and applying univariate imputation also preserves the false selection guarantees. Third, for the special case of latent variable models, we demonstrate how jointly imputing and sampling knockoffs can reduce the computational complexity. We verified the theoretical findings with two different explanatory variable distributions and investigated how the missing data pattern, the amount of correlation, the number of observations, and the number of missing values affect the statistical power.
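To make the first idea concrete, the following is a minimal sketch of posterior sampled imputation under an assumed setting: the covariates are taken to follow a multivariate Gaussian with known mean `mu` and covariance `sigma`. That setting is an illustration chosen here, not necessarily one of the distributions used in the paper. The function name `posterior_impute_gaussian` is hypothetical. Each missing entry (marked as NaN) is drawn from its conditional distribution given the observed entries of the same row, rather than being replaced by a point estimate.

```python
import numpy as np


def posterior_impute_gaussian(x, mu, sigma, rng):
    """Fill NaNs in one row by sampling from N(mu, sigma) conditioned on
    the observed entries. Assumes mu and sigma are known (an assumption
    of this sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    miss = np.isnan(x)
    out = x.copy()
    if not miss.any():
        return out  # nothing to impute
    obs = ~miss
    if not obs.any():
        # Fully missing row: sample from the marginal distribution.
        return rng.multivariate_normal(mu, sigma)

    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(miss, obs)]
    s_mm = sigma[np.ix_(miss, miss)]

    # w = s_mo @ inv(s_oo), computed with a linear solve for stability.
    w = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[miss] + w @ (x[obs] - mu[obs])
    cond_cov = s_mm - w @ s_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2  # remove numerical asymmetry

    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out


rng = np.random.default_rng(0)
mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
row = np.array([1.0, np.nan])
completed = posterior_impute_gaussian(row, mu, sigma, rng)
```

The completed rows can then be handed to an existing knockoff sampler designed for fully observed data. The key design choice is to sample from the conditional distribution instead of plugging in the conditional mean, so that the completed rows follow the assumed joint distribution.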




