The Effect of Sample Size and Missingness on Inference with Missing Data

12/17/2021

When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This paper answers this question by offering a new asymptotic theory of inference with missing data that is more general than existing theories. Using more powerful tools from real analysis and probability theory than those used in previous research, it proves that as the sample size increases and the extent of missingness decreases, the average-loglikelihood function that is generated by partial data and ignores the missingness mechanism will almost surely converge uniformly to the one that would have been generated by complete data; if the data are Missing at Random, this convergence depends only on sample size. Thus, inferences from partial data, such as posterior modes, uncertainty estimates, confidence intervals, likelihood ratios, test statistics, and indeed all quantities or features derived from the partial-data loglikelihood function, will be consistently estimated and will approximate their complete-data analogues. This extends previous research, which has proved only the consistency and asymptotic normality of the posterior mode and has developed separate theories for Direct-Likelihood, Bayesian, and Frequentist inference. Practical implications of this result are discussed, and the theory is verified using a previous study of International Human Rights Law.
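The convergence claim in the abstract can be illustrated with a small simulation. The sketch below is a toy example and not the paper's own setup: it assumes a univariate N(mu, 1) model, values deleted completely at random (a special case of Missing at Random), and a finite grid of mu values standing in for the parameter space. The function names `avg_loglik` and `sup_gap`, the true mean of 1.0, and the grid are all hypothetical choices made for illustration. It measures the largest gap over the grid between the average loglikelihood computed from the observed values only and the one computed from the complete sample.

```python
import numpy as np


def avg_loglik(x, mu_grid):
    """Average N(mu, 1) loglikelihood of sample x, evaluated at each mu in mu_grid."""
    x = np.asarray(x, dtype=float)
    # mean((x - mu)^2) expands to mean(x^2) - 2*mu*mean(x) + mu^2,
    # so only two sample moments are needed for the whole grid.
    m1, m2 = x.mean(), (x ** 2).mean()
    mean_sq = m2 - 2.0 * mu_grid * m1 + mu_grid ** 2
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * mean_sq


def sup_gap(n, miss_rate, rng, mu_grid):
    """Largest grid-wise gap between partial-data and complete-data average loglikelihoods."""
    x = rng.normal(loc=1.0, scale=1.0, size=n)  # complete data (assumed true mean 1.0)
    observed = rng.random(n) >= miss_rate       # each value kept with prob. 1 - miss_rate
    partial = avg_loglik(x[observed], mu_grid)  # ignores the missingness mechanism
    complete = avg_loglik(x, mu_grid)
    return float(np.max(np.abs(partial - complete)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(-1.0, 3.0, 41)  # finite stand-in for a compact parameter space
    for n in (100, 1_000, 10_000, 100_000):
        gaps = [sup_gap(n, 0.3, rng, grid) for _ in range(20)]
        print(f"n={n:>7}  mean sup-gap over 20 runs: {np.mean(gaps):.4f}")
```

Under these assumptions the gap should shrink as n grows with the missingness rate held fixed, and it is exactly zero when nothing is missing. A finite grid only approximates uniform convergence, and the sketch assumes at least one value is observed in every run.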

Related Research

Estimating Viral Genetic Linkage Rates in the Presence of Missing Data

Inference for partial correlation when data are missing not at random

Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

Second Term Improvement to Generalised Linear Mixed Model Asymptotics

Estimating Undirected Graphs Under Weak Assumptions

Challenges of the inconsistency regime: Novel debiasing methods for missing data models

Posterior Probabilities: Nonmonotonicity, Asymptotic Rates, Log-Concavity, and Turán's Inequality