Challenges with EM in application to weakly identifiable mixture models
We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on n i.i.d. samples are known to converge to the true parameters more slowly than the classical n^{-1/2} rate. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We first demonstrate via simulation studies a broad range of over-specified mixture models for which the EM algorithm converges very slowly, both in one dimension and in higher dimensions. We provide a complete analytical characterization of this behavior for fitting data generated from a multivariate standard normal distribution using a two-component Gaussian mixture with varying location and scale parameters. Our results reveal distinct regimes in the convergence behavior of EM as a function of the dimension d. In the multivariate setting (d ≥ 2), when the covariance matrix is constrained to be a multiple of the identity matrix, the EM algorithm converges in order (n/d)^{1/2} steps and returns estimates that are at a Euclidean distance of order (n/d)^{-1/4} and (nd)^{-1/2} from the true location and scale parameters, respectively. In contrast, in the univariate setting (d = 1), the EM algorithm converges in order n^{3/4} steps and returns estimates that are at a Euclidean distance of order n^{-1/8} and n^{-1/4} from the true location and scale parameters, respectively. Establishing the slow rates in the univariate setting requires a novel two-stage localization argument, with each stage involving an epoch-based argument applied to a different surrogate EM operator at the population level. We also provide multivariate (d ≥ 2) examples, involving more general covariance matrices, that exhibit the same slow rates as the univariate case.
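The setting above admits a compact EM implementation that can be used to reproduce the slow convergence numerically. Below is a minimal sketch, assuming the fitted model is the symmetric equal-weight mixture 0.5 N(theta, sigma^2 I_d) + 0.5 N(-theta, sigma^2 I_d) with a shared scalar scale (the paper's exact parameterization may differ); the function name em_symmetric_mixture and the experiment settings are illustrative only.

import numpy as np
from scipy.special import expit

def em_symmetric_mixture(X, theta0, sigma2_0=1.0, n_iters=2000):
    # EM for fitting 0.5*N(theta, sigma2*I_d) + 0.5*N(-theta, sigma2*I_d)
    # to the rows of X; the fit is over-specified when X ~ N(0, I_d).
    n, d = X.shape
    theta, sigma2 = theta0.astype(float).copy(), float(sigma2_0)
    for _ in range(n_iters):
        # E-step: posterior weight of the +theta component for each sample
        w = expit(2.0 * (X @ theta) / sigma2)
        # M-step: closed-form joint update of the location and common scale
        theta = ((2.0 * w - 1.0)[:, None] * X).mean(axis=0)
        sigma2 = ((X ** 2).sum(axis=1).mean() - theta @ theta) / d
    return theta, sigma2

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 2))      # true model: theta* = 0, sigma* = 1
theta_hat, s2_hat = em_symmetric_mixture(X, theta0=np.ones(2))
print(np.linalg.norm(theta_hat), s2_hat)  # theta_hat drifts slowly toward 0

Tracking the norm of theta across iterations in this sketch illustrates the phenomenon described above: in the d ≥ 2 isotropic-covariance regime, the location iterates contract over a number of steps that grows with n, rather than at the geometric rate familiar from well-specified mixtures.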