Convergence Rates for Mixture-of-Experts

10/10/2011
by Eduardo F. Mendes, et al.

In mixture-of-experts (ME) models, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts, or to combine many simple experts? In this paper, we try to provide some insight into these problems through a theoretical study of an ME structure in which m experts are mixed, with each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the estimated density converges to the true density in Kullback-Leibler divergence as the sample size n increases. The convergence rate is found to depend on both m and k, and certain choices of m and k are found to produce optimal convergence rates. These results therefore shed light on the two problems above: how to choose m, and how m and k should be traded off against each other to achieve good convergence rates.
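To make the structure described in the abstract concrete, the following is a minimal sketch of an ME conditional density with m experts, each built on a polynomial of order k in a scalar input. The abstract does not specify the gating function or the expert distribution, so the softmax gate that is linear in x, the Gaussian experts with a shared standard deviation `sigma`, and the names `softmax`, `me_density`, `gate_w`, and `expert_coef` are all illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def me_density(y, x, gate_w, expert_coef, sigma=1.0):
    """Conditional density p(y | x) of a hypothetical mixture of m experts.

    gate_w      : (m, 2) array, gate logits are linear in (1, x)   [assumption]
    expert_coef : (m, k+1) array, polynomial coefficients per expert,
                  lowest order first
    sigma       : shared Gaussian noise scale for all experts      [assumption]
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Gating weights: one probability vector over the m experts per input.
    gate_design = np.stack([np.ones_like(x), x], axis=1)      # (n, 2)
    weights = softmax(gate_design @ gate_w.T)                 # (n, m)

    # Expert means: polynomial of order k evaluated at each input.
    k = expert_coef.shape[1] - 1
    poly_design = np.vander(x, k + 1, increasing=True)        # (n, k+1)
    means = poly_design @ expert_coef.T                       # (n, m)

    # Gaussian density of y under each expert, then mix with the gate weights.
    resid = (y[:, None] - means) / sigma
    comp = np.exp(-0.5 * resid**2) / (sigma * np.sqrt(2.0 * np.pi))
    return (weights * comp).sum(axis=1)


# Example: m = 3 experts, each a quadratic (k = 2) in x.
rng = np.random.default_rng(0)
gate_w = rng.normal(size=(3, 2))
expert_coef = rng.normal(size=(3, 3))
x = np.array([-1.0, 0.0, 0.5])
y = np.array([0.2, -0.3, 1.0])
density_values = me_density(y, x, gate_w, expert_coef)
```

Under these assumptions the model has m(k+1) expert coefficients plus 2(m-1) free gate parameters, so for a fixed parameter budget, raising k forces m down and vice versa. This is the trade-off between m and k that the abstract refers to.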

07/09/2019

Convergence Rates for Gaussian Mixtures of Experts

We provide a theoretical treatment of over-specified Gaussian mixtures o...
07/17/2020

Optimal Bayesian estimation of Gaussian mixtures with growing number of components

We study posterior concentration properties of Bayesian procedures for e...
10/30/2017

Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions

In this paper we study the frequentist convergence rate for the Latent D...
06/10/2015

Convergence rates for pretraining and dropout: Guiding learning parameters using network structure

Unsupervised pretraining and dropout have been well studied, especially ...
06/29/2020

Spectral Gap of Replica Exchange Langevin Diffusion on Mixture Distributions

Langevin diffusion (LD) is one of the main workhorses for sampling probl...
08/04/2019

Fast Nonoverlapping Block Jacobi Method for the Dual Rudin--Osher--Fatemi Model

We consider nonoverlapping domain decomposition methods for the Rudin--O...