A Review of Sparse Expert Models in Deep Learning

09/04/2022
by William Fedus et al.

Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. In this way, the degree of sparsity decouples the parameter count from the compute per example, allowing for extremely large but efficient models. The resulting models have demonstrated significant improvements across diverse domains such as natural language processing, computer vision, and speech recognition. We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work.
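
The routing idea described above can be illustrated with a short sketch. The following is a minimal, illustrative top-k Mixture-of-Experts layer written in plain NumPy; it is not the implementation from any of the surveyed papers, and the layer sizes, expert count, and gating details (a softmax router with probability-weighted expert outputs) are assumptions chosen only for clarity.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoELayer:
    """Minimal top-k routed Mixture-of-Experts layer (illustrative sketch only)."""

    def __init__(self, d_model, d_hidden, num_experts, k=1, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Router: one logit per expert for each token.
        self.w_router = rng.normal(scale=0.02, size=(d_model, num_experts))
        # Each expert is a small two-layer MLP.
        self.w1 = rng.normal(scale=0.02, size=(num_experts, d_model, d_hidden))
        self.w2 = rng.normal(scale=0.02, size=(num_experts, d_hidden, d_model))

    def __call__(self, x):
        # x: (num_tokens, d_model)
        logits = x @ self.w_router                        # (num_tokens, num_experts)
        probs = softmax(logits)
        topk = np.argsort(-probs, axis=-1)[:, :self.k]    # chosen experts per token
        out = np.zeros_like(x)
        for token, experts in enumerate(topk):
            for e in experts:
                h = np.maximum(x[token] @ self.w1[e], 0.0)        # expert MLP, ReLU
                out[token] += probs[token, e] * (h @ self.w2[e])  # gate-weighted output
        return out

# Each token is processed by only k of num_experts expert MLPs, so adding experts
# grows the parameter count while per-token compute stays roughly fixed.
layer = SparseMoELayer(d_model=16, d_hidden=32, num_experts=8, k=1)
y = layer(np.random.default_rng(1).normal(size=(4, 16)))
print(y.shape)  # (4, 16)

Because each token visits only k of the available experts, this toy layer exhibits the decoupling the abstract describes: total parameters scale with the number of experts, while the compute per example is set by k and the expert size.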

Related research

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts (09/08/2023)
Sparse Mixture-of-Experts models (MoEs) have recently gained popularity ...

Designing Effective Sparse Expert Models (02/17/2022)
Scale has opened new frontiers in natural language processing – but at a...

Scaling Vision with Sparse Mixture of Experts (06/10/2021)
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...

What Really is Deep Learning Doing? (11/06/2017)
Deep learning has achieved a great success in many areas, from computer ...

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (01/11/2021)
In deep learning, models typically reuse the same parameters for all inp...

The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design (11/13/2019)
The past decade has seen a remarkable series of advances in machine lear...

COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search (06/05/2023)
The sparse Mixture-of-Experts (Sparse-MoE) framework efficiently scales ...
