Missing Value Knockoffs

02/26/2022
by   Deniz Koyuncu, et al.
Rensselaer Polytechnic Institute

One limitation of most statistical and machine learning-based variable selection approaches is their inability to control false selections. A recently introduced framework, model-X knockoffs, provides this control for a wide range of models but lacks support for datasets with missing values. In this work, we discuss ways of preserving the theoretical guarantees of the model-X framework in the missing data setting. First, we prove that posterior sampled imputation allows reusing existing knockoff samplers in the presence of missing values. Second, we show that sampling knockoffs only for the observed variables and applying univariate imputation also preserves the false selection guarantees. Third, for the special case of latent variable models, we demonstrate how jointly imputing and sampling knockoffs can reduce the computational complexity. We verified the theoretical findings with two different exploratory variable distributions and investigated how the missing data pattern, the amount of correlation, the number of observations, and the number of missing values affected the statistical power.
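To make the first result concrete, here is a minimal sketch of posterior sampled imputation. It is not the authors' implementation. It assumes the variables are jointly Gaussian with a known mean `mu` and covariance `Sigma`, and that missing entries are encoded as NaN. The function name `posterior_impute_gaussian` and the toy data are hypothetical. Each row's missing entries are drawn from their conditional distribution given that row's observed entries, rather than being replaced by a point estimate such as the conditional mean.

```python
import numpy as np

def posterior_impute_gaussian(X, mu, Sigma, rng):
    """Fill NaNs in X by sampling each row's missing entries from the
    Gaussian conditional distribution given the row's observed entries.
    Assumes rows are i.i.d. N(mu, Sigma) with mu and Sigma known."""
    X_filled = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue  # fully observed row, nothing to impute
        obs = ~miss
        if not obs.any():
            # Nothing observed: the conditional is just the marginal.
            X_filled[i] = rng.multivariate_normal(mu, Sigma)
            continue
        S_oo = Sigma[np.ix_(obs, obs)]
        S_mo = Sigma[np.ix_(miss, obs)]
        S_mm = Sigma[np.ix_(miss, miss)]
        # A = S_mo @ inv(S_oo), computed with a linear solve.
        A = np.linalg.solve(S_oo, S_mo.T).T
        cond_mean = mu[miss] + A @ (X[i, obs] - mu[obs])
        cond_cov = S_mm - A @ S_mo.T
        cond_cov = (cond_cov + cond_cov.T) / 2  # remove round-off asymmetry
        X_filled[i, miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return X_filled

# Toy data (assumed for illustration): AR(1) covariance, 20% of entries
# removed completely at random.
rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.5
mu = np.zeros(p)
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
X_full = rng.multivariate_normal(mu, Sigma, size=n)
X_missing = X_full.copy()
X_missing[rng.random((n, p)) < 0.2] = np.nan

X_completed = posterior_impute_gaussian(X_missing, mu, Sigma, rng)
```

Under these assumptions, `X_completed` contains no missing values and can be passed to an existing Gaussian knockoff sampler in place of the original data, which is the reuse the abstract describes.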

