On Inferences from Completed Data
Matrix completion has become an extremely important technique as data scientists are routinely faced with large, incomplete datasets on which they wish to perform statistical inferences. We investigate how error introduced via matrix completion affects statistical inference. Furthermore, we prove recovery error bounds which depend upon the matrix recovery error for several common statistical inferences. We consider matrix recovery via nuclear norm minimization and a variant, ℓ_1-regularized nuclear norm minimization for data with a structured sampling pattern. Finally, we run a series of numerical experiments on synthetic data and real patient surveys from MyLymeData, which illustrate the relationship between inference recovery error and matrix recovery error. These results indicate that exact matrix recovery is often not necessary to achieve small inference recovery error.
READ FULL TEXT