Learning Factored Representations in a Deep Mixture of Experts

12/16/2013
by David Eigen et al.

Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models show promise for building larger networks that remain cheap to compute at test time and more parallelizable at training time. In this work, we extend the Mixture of Experts to a stacked model, the Deep Mixture of Experts, with multiple sets of gating networks and experts. This exponentially increases the number of effective experts by associating each input with a combination of experts at each layer, yet maintains a modest model size. On a randomly translated version of the MNIST dataset, we find that the Deep Mixture of Experts automatically learns to develop location-dependent ("where") experts at the first layer and class-specific ("what") experts at the second layer. In addition, when the model is applied to a dataset of speech monophones, we see that the different expert combinations are all in use, demonstrating effective use of the full set of expert combinations.
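To make the stacked architecture described in the abstract concrete, the sketch below shows a two-layer Deep Mixture of Experts in PyTorch. It is a minimal illustration under stated assumptions, not the authors' code: the module names (DMoELayer, DeepMoE), expert counts, layer sizes, and the use of simple linear experts are all placeholders chosen for clarity.

```python
# Minimal sketch (not the authors' implementation) of a two-layer Deep Mixture of Experts.
# Each layer mixes its experts' outputs using weights from a softmax gating network,
# so stacking two layers yields num_experts_1 x num_experts_2 effective expert
# combinations while the parameter count grows only additively per layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DMoELayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_experts):
        super().__init__()
        # One small linear expert per slot; the gate maps each input to a
        # distribution over the experts.
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        gate_weights = F.softmax(self.gate(x), dim=-1)                   # (batch, E)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, E, out_dim)
        # Mixture: gate-weighted sum of the expert outputs.
        return torch.einsum("be,beo->bo", gate_weights, expert_outs)


class DeepMoE(nn.Module):
    """Two stacked mixture layers followed by a linear classifier."""

    def __init__(self, in_dim=784, hidden=128, num_classes=10, experts=(4, 4)):
        super().__init__()
        self.layer1 = DMoELayer(in_dim, hidden, experts[0])
        self.layer2 = DMoELayer(hidden, hidden, experts[1])
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(h1))
        return self.classifier(h2)


if __name__ == "__main__":
    model = DeepMoE()
    logits = model(torch.randn(32, 784))   # e.g. a batch of flattened MNIST digits
    print(logits.shape)                    # torch.Size([32, 10])
```

With the illustrative sizes above, the two stacked gates select among 4 x 4 = 16 effective expert paths while only 8 experts' worth of parameters are stored, which is the "exponentially more effective experts at modest model size" trade-off the abstract describes.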

Related research

08/04/2022 - Towards Understanding Mixture of Experts in Deep Learning
  The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlle...

04/20/2017 - Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
  Training convolutional networks (CNN's) that fit on a single GPU with mi...

08/21/2020 - Biased Mixtures Of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations
  We propose a novel mixture-of-experts class to optimize computer vision ...

08/28/2023 - Fast Feedforward Networks
  We break the linear link between the layer size and its inference cost b...

04/22/2022 - Balancing Expert Utilization in Mixture-of-Experts Layers Embedded in CNNs
  This work addresses the problem of unbalanced expert utilization in spar...

08/11/2021 - DEMix Layers: Disentangling Domains for Modular Language Modeling
  We introduce a new domain expert mixture (DEMix) layer that enables cond...

09/17/2019 - Historical and Modern Features for Buddha Statue Classification
  While Buddhism has spread along the Silk Roads, many pieces of art have ...
