Balancing Expert Utilization in Mixture-of-Experts Layers Embedded in CNNs

04/22/2022
by Svetlana Pavlitskaya, et al.

This work addresses the problem of unbalanced expert utilization in sparsely-gated Mixture-of-Experts (MoE) layers embedded directly into convolutional neural networks (CNNs). To enable a stable training process, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, whereas hard constraints mostly maintain generalized experts and increase overall model performance across many applications. Our findings demonstrate that even with a single dataset and end-to-end training, experts can implicitly focus on individual sub-domains of the input space, without relying on suitably predefined sub-datasets. For example, experts trained for CIFAR-100 image classification specialize in recognizing distinct domains such as sea animals or flowers without prior data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can likewise specialize in detecting objects of distinct sizes.
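To make the soft-constraint idea concrete, the following is a minimal PyTorch sketch of a sparsely-gated convolutional MoE layer that returns an auxiliary load-balancing loss alongside its output. The class name ConvMoE, the top-k gating, and the CV²-style penalty on per-batch gate mass are illustrative assumptions in the spirit of sparsely-gated MoE training, not the authors' reference implementation.

```python
# Minimal sketch: sparsely-gated MoE layer for CNN feature maps with a
# soft load-balancing constraint (auxiliary loss). Names and hyperparameters
# are hypothetical; this is not the paper's reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvMoE(nn.Module):
    """Top-k gated mixture of convolutional experts with an auxiliary
    load-balancing loss (soft constraint)."""

    def __init__(self, in_ch, out_ch, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
             for _ in range(num_experts)]
        )
        # Gate: global average pooling followed by a linear projection.
        self.gate = nn.Linear(in_ch, num_experts)

    def forward(self, x):
        b = x.size(0)
        logits = self.gate(x.mean(dim=(2, 3)))           # (B, E)
        topk_val, topk_idx = logits.topk(self.k, dim=1)
        # Sparse gate weights: softmax over the selected experts only.
        gates = torch.zeros_like(logits).scatter(
            1, topk_idx, F.softmax(topk_val, dim=1))

        # Soft constraint: penalize unbalanced expert importance via the
        # squared coefficient of variation of the per-batch gate mass.
        importance = gates.sum(dim=0)                     # (E,)
        aux_loss = importance.var(unbiased=False) / (importance.mean() ** 2 + 1e-8)

        # Weighted sum of expert outputs (dense compute for clarity;
        # a real implementation would dispatch only to the top-k experts).
        out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, C, H, W)
        out = (gates.view(b, -1, 1, 1, 1) * out).sum(dim=1)
        return out, aux_loss
```

During training, the auxiliary term would be added to the task loss with a small weight, e.g. loss = task_loss + 0.01 * aux_loss. A hard-constraint variant, as described above, would instead allow the weights of under-utilized experts to be driven to zero rather than rebalancing them through an extra loss term.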


