Listwise Deletion in High Dimensions
We consider the properties of listwise deletion when both the number of rows n and the number of variables grow large. We show that when (i) all data have some idiosyncratic missingness and (ii) the number of variables grows superlogarithmically in n, then, for large n, listwise deletion will drop all rows with probability 1. We present numerical illustrations to demonstrate the finite-n implications. These results suggest that, in practice, using listwise deletion may mean using only a few of the variables available to the researcher, even when n is very large.
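The intuition behind the result can be illustrated with a minimal sketch. It assumes, purely for illustration, that every cell is missing independently with the same probability p; this is a simplification and may be stronger than the paper's actual conditions. Under that assumption a row with d variables is complete with probability (1 - p)^d, so the expected number of rows surviving listwise deletion is n(1 - p)^d, which tends to zero whenever d grows faster than log n. The function names, the choice p = 0.05, and the growth rate d = (log n)^2 are hypothetical and not taken from the paper.

```python
import math
import random


def expected_complete_rows(n, d, p):
    """Expected number of rows kept by listwise deletion.

    Assumes (illustration only) that each of the n * d cells is missing
    independently with probability p, so a row is complete with
    probability (1 - p) ** d.
    """
    return n * (1.0 - p) ** d


def simulate_complete_rows(n, d, p, seed=0):
    """Count complete rows in one simulated n-by-d missingness pattern."""
    rng = random.Random(seed)
    complete = 0
    for _ in range(n):
        # A row survives only if none of its d cells is missing.
        if all(rng.random() >= p for _ in range(d)):
            complete += 1
    return complete


# Hypothetical growth rate: d = (log n)^2 is superlogarithmic in n.
for n in (10**3, 10**4, 10**5, 10**6):
    d = int(math.log(n) ** 2)
    print(n, d, expected_complete_rows(n, d, p=0.05))
```

In this sketch, the expected number of complete rows shrinks as n grows, even though the per-cell missingness rate stays fixed. `simulate_complete_rows` can be used to check the formula against a random draw for moderate n and d.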