Sparse Fusion Mixture-of-Experts are Domain Generalizable Learners

06/08/2022
by Bo Li, et al.

Domain generalization (DG) aims at learning generalizable models under distribution shifts to avoid redundantly overfitting massive training data. Previous works with complex loss designs and gradient constraints have not yet led to empirical success on large-scale benchmarks. In this work, we reveal the mixture-of-experts (MoE) model's generalizability on DG by leveraging its ability to distributively handle multiple aspects of the predictive features across domains. To this end, we propose Sparse Fusion Mixture-of-Experts (SF-MoE), which incorporates sparsity and fusion mechanisms into the MoE framework to keep the model both sparse and predictive. SF-MoE has two dedicated modules: 1) a sparse block and 2) a fusion block, which disentangle and aggregate the diverse learned signals of an object, respectively. Extensive experiments demonstrate that SF-MoE is a domain-generalizable learner on large-scale benchmarks. It outperforms state-of-the-art counterparts by more than 2% on average on large-scale DG datasets (e.g., DomainNet), with the same or even lower computational costs. We further reveal the internal mechanism of SF-MoE from a distributed representation perspective (e.g., visual attributes). We hope this framework can facilitate future research on pushing generalizable object recognition to the real world. Code and models are released at https://github.com/Luodian/SF-MoE-DG.
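The abstract does not spell out the internals of the two modules, so below is a minimal PyTorch sketch of the overall idea, not the authors' implementation (see the linked repository for that). It assumes top-2 token routing for the sparse block and a single self-attention layer standing in for the fusion block; the expert count, hidden sizes, and the attention-based aggregation are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    """Sparse block (sketch): route each token to its top-k experts."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        logits = self.gate(x)                               # (B, T, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights.softmax(dim=-1)                   # renormalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class FusionBlock(nn.Module):
    """Fusion block (sketch): aggregate the experts' disentangled signals
    across tokens; a single self-attention layer is an illustrative choice."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused, _ = self.attn(x, x, x)   # mix signals across all tokens
        return self.norm(x + fused)


# Usage: stack the two blocks on ViT-style patch tokens.
layer = nn.Sequential(SparseMoE(dim=256), FusionBlock(dim=256))
tokens = torch.randn(2, 196, 256)       # (batch, patches, dim)
print(layer(tokens).shape)              # torch.Size([2, 196, 256])
```

The sparsity comes from activating only top-k experts per token, which keeps compute roughly constant as experts are added; the fusion step then recombines the experts' per-token outputs into a single predictive representation.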


Related research

SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System (05/20/2022)
With the increasing diversity of ML infrastructures nowadays, distribute...

Generalizable Person Re-identification with Relevance-aware Mixture of Experts (05/19/2021)
Domain generalizable (DG) person re-identification (ReID) is a challengi...

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data (04/07/2018)
Joint understanding of video and language is an active research area wit...

Focal Sparse Convolutional Networks for 3D Object Detection (04/26/2022)
Non-uniformed 3D sparse data, e.g., point clouds or voxels in different ...

Fishr: Invariant Gradient Variances for Out-of-distribution Generalization (09/07/2021)
Learning robust models that generalize well under changes in the data di...

Direct Molecular Conformation Generation (02/03/2022)
Molecular conformation generation aims to generate three-dimensional coo...

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (11/29/2022)
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) t...
