Dropout Regularization in Hierarchical Mixture of Experts

12/25/2018
by Ozan Irsoy, et al.

Dropout is a very effective method for preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. The hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree in which leaves correspond to experts and internal nodes correspond to gating models that softly choose among their children; as such, the model defines a soft hierarchical partitioning of the input space. In this work, we propose a variant of dropout for hierarchical mixtures of experts that is faithful to the tree hierarchy defined by the model, as opposed to the flat, unitwise independent application of dropout used with multi-layer perceptrons. We show that on synthetic regression data and on the MNIST and CIFAR-10 datasets, our proposed dropout mechanism prevents overfitting in trees with many levels, improving generalization and yielding smoother fits.
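To make the model concrete, the following is a minimal sketch (not the authors' implementation) of a soft decision tree with linear experts at the leaves and sigmoid gates at the internal nodes, plus an illustrative, hierarchy-aware dropout that drops whole subtrees rather than independent units. All names and the specific dropout scheme here are assumptions for illustration; the paper's exact mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Node:
    """A node of a soft decision tree: internal nodes hold a gating
    model choosing softly between two children; leaves hold experts
    (here, simple linear regressors)."""
    def __init__(self, depth, in_dim):
        self.is_leaf = depth == 0
        if self.is_leaf:
            self.w = rng.normal(size=in_dim)      # leaf expert weights
        else:
            self.g = rng.normal(size=in_dim)      # gating weights
            self.left = Node(depth - 1, in_dim)
            self.right = Node(depth - 1, in_dim)

    def predict(self, x, drop_p=0.0, training=False):
        if self.is_leaf:
            return x @ self.w
        gate = sigmoid(x @ self.g)  # soft probability of taking the left child
        if training and drop_p > 0.0:
            # Hypothetical hierarchy-aware dropout: with probability drop_p,
            # drop one entire subtree and route all mass to the survivor,
            # respecting the tree structure instead of dropping units
            # independently as in a flat MLP.
            u = rng.random()
            if u < drop_p / 2:
                return self.left.predict(x, drop_p, training)
            if u < drop_p:
                return self.right.predict(x, drop_p, training)
        return (gate * self.left.predict(x, drop_p, training)
                + (1.0 - gate) * self.right.predict(x, drop_p, training))

x = rng.normal(size=5)
tree = Node(depth=3, in_dim=5)
y_det = tree.predict(x)                            # deterministic soft-tree output
y_drop = tree.predict(x, drop_p=0.5, training=True)  # stochastic training pass
```

At test time (`training=False`) the full soft mixture is used, mirroring how standard dropout is disabled at inference.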


