An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond

02/06/2018
by   Faming Liang, et al.

Missing data are frequently encountered in high-dimensional problems, but they are usually difficult to handle with standard algorithms such as the expectation-maximization (EM) algorithm and its variants. Some problem-specific algorithms have been developed in the literature to tackle this difficulty, but a general algorithm is still lacking. This work fills that gap by proposing a general algorithm for high-dimensional missing data problems. The proposed algorithm iterates between an imputation step and a consistency step. At the imputation step, the missing data are imputed conditional on the observed data and the current parameter estimate. At the consistency step, a consistent estimate is found for the minimizer of a Kullback-Leibler divergence defined on the pseudo-complete data. For high-dimensional problems, the consistent estimate can be found under sparsity constraints. The consistency of the averaged estimate for the true parameter can be established under quite general conditions. The proposed algorithm is illustrated using high-dimensional Gaussian graphical models, high-dimensional variable selection, and a random coefficient model.
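The two-step iteration described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a low-dimensional multivariate Gaussian with entries missing completely at random, uses the plain maximum-likelihood estimate on the pseudo-complete data as the consistency step (in a high-dimensional setting a sparsity-constrained estimator would presumably take its place), and averages the estimates after a burn-in period. The function names `impute_step`, `consistency_step`, and `ic_estimate`, along with the parameters `n_iter`, `burn_in`, and `ridge`, are hypothetical and chosen only for this example.

```python
import numpy as np


def impute_step(X, mask, mu, sigma, rng):
    """Imputation step: draw each row's missing entries from the Gaussian
    conditional distribution given its observed entries and (mu, sigma)."""
    X_imp = X.copy()
    for i in range(X.shape[0]):
        m = mask[i]              # True where the entry is missing
        if not m.any():
            continue
        o = ~m
        if o.any():
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            cond_mean = mu[m] + S_mo @ np.linalg.solve(S_oo, X[i, o] - mu[o])
            cond_cov = sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
        else:
            # Nothing observed in this row: fall back to the marginal.
            cond_mean, cond_cov = mu, sigma
        cond_cov = (cond_cov + cond_cov.T) / 2   # enforce exact symmetry
        X_imp[i, m] = rng.multivariate_normal(cond_mean, cond_cov)
    return X_imp


def consistency_step(X_imp, ridge=1e-6):
    """Consistency step (toy version): Gaussian MLE on the pseudo-complete
    data, which minimizes the empirical Kullback-Leibler divergence."""
    p = X_imp.shape[1]
    mu = X_imp.mean(axis=0)
    sigma = np.cov(X_imp, rowvar=False, bias=True) + ridge * np.eye(p)
    return mu, sigma


def ic_estimate(X, mask, n_iter=40, burn_in=10, seed=0):
    """Alternate the two steps and average the post-burn-in estimates."""
    rng = np.random.default_rng(seed)
    # Initialize from the observed entries only (X holds NaN where missing).
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    mus, sigmas = [], []
    for t in range(n_iter):
        X_imp = impute_step(X, mask, mu, sigma, rng)
        mu, sigma = consistency_step(X_imp)
        if t >= burn_in:
            mus.append(mu)
            sigmas.append(sigma)
    return np.mean(mus, axis=0), np.mean(sigmas, axis=0)
```

To try it, pass an array `X` with `np.nan` in the missing positions together with a boolean `mask` of the same shape that is `True` where an entry is missing; the function returns the averaged mean vector and covariance matrix. Averaging over iterations mirrors the abstract's statement that consistency is established for the averaged estimate.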

research
08/21/2018

An ensemble learning method for variable selection: application to high dimensional data and missing values

Standard approaches for variable selection in linear models are not tail...
research
01/12/2019

Integrating multi-source block-wise missing data in model selection

For multi-source data, blocks of variable information from certain sourc...
research
01/28/2021

Inference of stochastic time series with missing data

Inferring dynamics from time series is an important objective in data an...
research
09/16/2011

High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

Although the standard formulations of prediction problems involve fully-...
research
11/06/2021

In Nonparametric and High-Dimensional Models, Bayesian Ignorability is an Informative Prior

In problems with large amounts of missing data one must model two distin...
research
10/02/2017

Detecting Epistatic Selection with Partially Observed Genotype Data Using Copula Graphical Models

Recombinant Inbred Lines derived from divergent parental lines can displ...
research
05/27/2022

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

State-of-the-art causal discovery methods usually assume that the observ...
