Diversified Dynamic Routing for Vision Tasks

09/26/2022
by   Botos Csaba, et al.
7

Deep learning models for vision tasks are trained on large datasets under the assumption that there exists a universal representation that can be used to make predictions for all samples. Whereas high complexity models are proven to be capable of learning such representations, a mixture of experts trained on specific subsets of the data can infer the labels more efficiently. However using mixture of experts poses two new problems, namely (i) assigning the correct expert at inference time when a new unseen sample is presented. (ii) Finding the optimal partitioning of the training data, such that the experts rely the least on common features. In Dynamic Routing (DR) a novel architecture is proposed where each layer is composed of a set of experts, however without addressing the two challenges we demonstrate that the model reverts to using the same subset of experts. In our method, Diversified Dynamic Routing (DivDR) the model is explicitly trained to solve the challenge of finding relevant partitioning of the data and assigning the correct experts in an unsupervised approach. We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO showing improved performance over several baselines.

READ FULL TEXT
research
04/20/2022

On the Representation Collapse of Sparse Mixture of Experts

Sparse mixture of experts provides larger model capacity while requiring...
research
12/15/2022

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

Optimization in multi-task learning (MTL) is more challenging than singl...
research
08/29/2023

Serving MoE Models on Resource-constrained Edge Devices via Dynamic Expert Swapping

Mixture of experts (MoE) is a popular technique in deep learning that im...
research
03/18/2019

Hierarchical Routing Mixture of Experts

In regression tasks the distribution of the data is often too complex to...
research
09/28/2020

Scalable Transfer Learning with Expert Models

Transfer of pre-trained representations can improve sample efficiency an...
research
05/19/2019

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

Modern NLP systems require high-quality annotated data. In specialized d...

Please sign up or login with your details

Forgot password? Click here to reset