Estimating Distributions with Low-dimensional Structures Using Mixtures of Generative Models
There has been growing interest in statistical inference from data satisfying the so-called manifold hypothesis, which assumes that data points in a high-dimensional ambient space lie in close vicinity of a submanifold of much lower dimension. In machine learning, generative modeling approaches based on encoder-decoder pairs have been successful in learning complicated high-dimensional distributions, such as those over images and text, by explicitly imposing a low-dimensional manifold structure. In this work, we introduce a new approach for estimating distributions on unknown submanifolds via mixtures of generative models. We show that conventional generative modeling approaches using a single encoder-decoder pair are generally unable to capture data distributions under the manifold hypothesis unless the underlying manifold admits a global parametrization; this issue can be resolved by using a collection of encoder-decoder pairs, each learning a different local patch of the data-supporting manifold. A rigorous theoretical analysis demonstrates that the proposed estimator attains the minimax-optimal rate of convergence for the implicit estimation of data distributions with manifold structures. Our experiments show that, by utilizing parameter sharing, the proposed method significantly improves the performance of conventional autoencoder-based generative modeling approaches with minimal additional computational effort.
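The abstract does not specify the architecture, so the following is only a minimal PyTorch-style sketch of the general idea: several encoder-decoder pairs, each intended to cover one local patch (chart) of the manifold, sharing a common trunk to illustrate the parameter-sharing point. All names (`MixtureAutoencoder`, `n_charts`, the hard best-reconstruction assignment) are hypothetical illustrations, not the paper's method.

```python
# Hypothetical sketch: a mixture of K encoder-decoder pairs with a shared
# trunk. Each pair is meant to model one local chart of the data manifold;
# the architecture and assignment rule are illustrative assumptions.
import torch
import torch.nn as nn


class MixtureAutoencoder(nn.Module):
    def __init__(self, ambient_dim: int, latent_dim: int,
                 n_charts: int, hidden: int = 128):
        super().__init__()
        # Shared trunk: reused by every chart, so extra charts add few parameters.
        self.trunk = nn.Sequential(nn.Linear(ambient_dim, hidden), nn.ReLU())
        # One lightweight encoder/decoder head per local chart.
        self.encoders = nn.ModuleList(
            nn.Linear(hidden, latent_dim) for _ in range(n_charts))
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, ambient_dim))
            for _ in range(n_charts))

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        # Reconstruct x through every chart and keep the best one per sample:
        # a simple hard-assignment stand-in for a learned mixture.
        recons = torch.stack([dec(enc(h)) for enc, dec in
                              zip(self.encoders, self.decoders)], dim=1)  # (B, K, D)
        errs = ((recons - x.unsqueeze(1)) ** 2).mean(dim=-1)              # (B, K)
        assign = errs.argmin(dim=1)                                       # (B,)
        best = recons[torch.arange(x.size(0)), assign]                    # (B, D)
        return best, assign


# Usage: train with an ordinary reconstruction loss; gradients flow only
# through the chart selected for each sample.
model = MixtureAutoencoder(ambient_dim=64, latent_dim=2, n_charts=4)
x = torch.randn(8, 64)
recon, chart = model(x)
loss = ((recon - x) ** 2).mean()
```

A single encoder-decoder pair corresponds to `n_charts=1` here, which can only represent manifolds with a global parametrization; with several charts, each head need only fit one local patch.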