Statistical Inference with Local Optima
We study the statistical properties of an estimator derived by applying a gradient ascent method with multiple initializations to a multi-modal likelihood function. We derive the population quantity that is the target of this estimator and study the properties of confidence intervals (CIs) constructed from asymptotic normality and the bootstrap approach. In particular, we analyze the coverage deficiency due to finite number of random initializations. We also investigate the CIs by inverting the likelihood ratio test, the score test, and the Wald test, and we show that the resulting CIs may be very different. We provide a summary of the uncertainties that we need to consider while making inference about the population. Note that we do not provide a solution to the problem of multiple local maxima; instead, our goal is to investigate the effect from local maxima on the behavior of our estimator. In addition, we analyze the performance of the EM algorithm under random initializations and derive the coverage of a CI with a finite number of initializations. Finally, we extend our analysis to a nonparametric mode hunting problem.
READ FULL TEXT