Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
Mixtures of Linear Regressions (MLR) is an important mixture model with many applications. In this model, each observation is generated from one of several unknown linear regression components, and the identity of the generating component is also unknown. Previous works either make strong assumptions on the data distribution or have high complexity. This paper proposes a fixed-parameter tractable algorithm for the problem under general conditions, which achieves global convergence with sample complexity that scales nearly linearly in the dimension. In particular, unlike previous works that require the data to be drawn from the standard Gaussian, the algorithm allows the data to come from Gaussians with different covariances. When the condition number of the covariances and the number of components are fixed, the algorithm has nearly optimal sample complexity N = Õ(d) as well as nearly optimal computational complexity Õ(Nd), where d is the dimension of the data space. To the best of our knowledge, this approach provides the first such recovery guarantee for this general setting.
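To make the data model concrete, the following is a minimal sketch of how samples from an MLR model could be drawn. It illustrates only the generative model described above, not the paper's recovery algorithm. The function name `sample_mlr`, the Dirichlet mixing weights, the additive Gaussian noise, and the particular construction of one covariance per component are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_mlr(n, d, k, noise_std=0.1, seed=0):
    """Draw n samples from a hypothetical k-component MLR model in dimension d."""
    rng = np.random.default_rng(seed)
    # Assumption: mixing weights drawn from a flat Dirichlet distribution.
    weights = rng.dirichlet(np.ones(k))
    # Unknown regression vectors, one per component.
    betas = rng.standard_normal((k, d))
    # Assumption: each component has its own positive definite covariance.
    covs = []
    for _ in range(k):
        a = rng.standard_normal((d, d))
        covs.append(a @ a.T / d + np.eye(d))
    # Latent component labels; these are hidden from the learner.
    z = rng.choice(k, size=n, p=weights)
    x = np.empty((n, d))
    for j in range(k):
        idx = np.flatnonzero(z == j)
        # Gaussian covariates with covariance covs[j], via its Cholesky factor.
        chol = np.linalg.cholesky(covs[j])
        x[idx] = rng.standard_normal((idx.size, d)) @ chol.T
    # Response: inner product with the regression vector of the sample's component.
    y = np.sum(x * betas[z], axis=1) + noise_std * rng.standard_normal(n)
    return x, y, z, betas, covs
```

A learner observes only `x` and `y`; the labels `z` and the vectors `betas` are what must be recovered. The condition number mentioned in the abstract corresponds here to `np.linalg.cond(covs[j])` for each component.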