Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

09/11/2023
by Ted Zadouri, et al.

The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance at a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose an extremely parameter-efficient MoE by uniquely combining the MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning while updating only the lightweight experts, less than 1% of the model's parameters. Furthermore, it generalizes to unseen tasks as it does not depend on any prior task knowledge. Our research underscores the versatility of the Mixture of Experts architecture, showcasing its ability to deliver robust performance even when subjected to rigorous parameter constraints. The code used in all the experiments is publicly available here: https://github.com/for-ai/parameter-efficient-moe.
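To make the idea concrete, the sketch below illustrates the general pattern of a mixture of lightweight experts: a frozen pretrained linear layer augmented with a few LoRA-style low-rank experts whose outputs are blended by a soft router, so only the experts and the router are trained. This is a minimal illustration under assumed names (MoLoRALinear, num_experts, rank), not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLoRALinear(nn.Module):
    """Frozen linear layer plus a soft mixture of LoRA-style experts (illustrative sketch)."""

    def __init__(self, in_features, out_features, num_experts=4, rank=4):
        super().__init__()
        # Frozen pretrained projection: never updated during fine-tuning.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)

        # Lightweight experts: each expert is a rank-r pair (A: d_in x r, B: r x d_out).
        self.lora_A = nn.Parameter(torch.randn(num_experts, in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, out_features))

        # Soft router: maps token representations to mixture weights over experts.
        self.router = nn.Linear(in_features, num_experts)

    def forward(self, x):
        # x: (batch, seq, in_features)
        gate = F.softmax(self.router(x), dim=-1)  # (batch, seq, num_experts)
        # Apply every expert's low-rank update to every token.
        expert_out = torch.einsum("bsd,edr,ero->bseo", x, self.lora_A, self.lora_B)
        # Soft-merge expert outputs with the router weights, then add to the frozen path.
        mixed = torch.einsum("bse,bseo->bso", gate, expert_out)
        return self.base(x) + mixed
```

In this setup, only the router and the low-rank expert matrices receive gradients, so the trainable fraction stays far below the size of the frozen backbone, which is the regime the abstract describes.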


Related research

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (10/31/2022)
Standard fine-tuning of large pre-trained language models (PLMs) for dow...

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning (03/05/2022)
Conventional NAS-based pruning algorithms aim to find the sub-network wi...

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning (07/25/2023)
As the size of transformer-based models continues to grow, fine-tuning t...

FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer (12/06/2022)
Recent work has explored the potential to adapt a pre-trained vision tra...

SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization (03/13/2022)
Sequence-to-sequence neural networks have recently achieved great succes...

Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy (07/31/2023)
Current state-of-the-art results in computer vision depend in part on fi...

Table-based Fact Verification with Self-adaptive Mixture of Experts (04/19/2022)
The table-based fact verification task has recently gained widespread at...
