Convergence Rates for Mixture-of-Experts

10/10/2011
by Eduardo F. Mendes, et al.

In the mixture-of-experts (ME) model, where a number of submodels (experts) are combined, two longstanding problems arise: (i) how many experts should be chosen, given the size of the training data? (ii) given a total number of parameters, is it better to use a few very complex experts or to combine many simple ones? In this paper, we provide some insight into these problems through a theoretical study of an ME structure in which m experts are mixed, with each expert being a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the Kullback-Leibler divergence between the estimated density and the true density decreases as the sample size n increases. The convergence rate is found to depend on both m and k, and certain choices of m and k yield optimal rates. These results therefore shed light on the two problems above: on how to choose m, and on how m and k should be traded off, to achieve good convergence rates.
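For concreteness, here is a minimal sketch of the model class described above, assuming a standard softmax-gated ME in which each expert is a Gaussian polynomial regression of order k; this particular parameterization is illustrative and may differ in detail from the paper's exact setup.

```latex
% Illustrative form of a softmax-gated ME with m Gaussian polynomial-regression
% experts of order k (one common parameterization; details may differ from the paper).
\[
  f_{m,k}(y \mid x)
    = \sum_{j=1}^{m} g_j(x;\alpha)\,
      \mathcal{N}\!\Big(y \;\Big|\; \textstyle\sum_{\ell=0}^{k} \beta_{j\ell}\, x^{\ell},\; \sigma_j^2\Big),
  \qquad
  g_j(x;\alpha)
    = \frac{\exp(\alpha_{j0} + \alpha_{j1} x)}
           {\sum_{r=1}^{m} \exp(\alpha_{r0} + \alpha_{r1} x)}.
\]
% The quantity whose decay rate in n is studied is the Kullback--Leibler divergence
% D_KL(f^* || \hat f_n) between the true conditional density f^* and the MLE \hat f_n,
% viewed as a function of the number of experts m and the polynomial order k.
```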
