SpeechMoE2: Mixture-of-Experts Model with Improved Routing

11/23/2021
by Zhao You, et al.

Mixture-of-experts (MoE) based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition. The design of the router architecture is key to achieving both large model capacity and high computational efficiency. Our previous work, SpeechMoE, uses only local grapheme embeddings to help the routers make routing decisions. To further improve speech recognition performance across varying domains and accents, we propose a new router architecture that integrates additional global domain and accent embeddings into the router input to improve adaptability. Experimental results show that the proposed SpeechMoE2 achieves a lower character error rate (CER) than SpeechMoE with a comparable number of parameters on both the multi-domain and the multi-accent tasks. In particular, the proposed method provides improvements of up to 1.6 on the multi-domain task and up to 1.9 on the multi-accent task. In addition, increasing the number of experts yields consistent performance improvements while keeping the computational cost constant.
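The abstract does not give the router equations, so the following is only a minimal NumPy sketch of the idea under stated assumptions, not the SpeechMoE2 implementation. It assumes a single linear router over the concatenation of the frame's hidden state, a local (per-frame) grapheme embedding, and global (per-utterance) domain and accent embeddings broadcast to every frame; top-1 expert selection per frame; and experts that are plain linear maps. The function `moe_layer` and all of its parameter names are hypothetical. It also illustrates why adding experts need not raise computation: with top-1 routing each frame passes through exactly one expert.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def moe_layer(x, grapheme_emb, domain_emb, accent_emb, router_w, experts):
    """Hypothetical top-1 MoE layer whose router sees local and global embeddings.

    x:            (T, d) frame-level hidden states
    grapheme_emb: (T, g) local, per-frame grapheme embedding
    domain_emb:   (p,)   global, per-utterance domain embedding (assumed)
    accent_emb:   (q,)   global, per-utterance accent embedding (assumed)
    router_w:     (d + g + p + q, n_experts) weights of an assumed linear router
    experts:      list of n_experts weight matrices, each (d, d)
    """
    T, d = x.shape
    # Broadcast the utterance-level embeddings to every frame, then concatenate.
    global_emb = np.tile(np.concatenate([domain_emb, accent_emb]), (T, 1))
    router_in = np.concatenate([x, grapheme_emb, global_emb], axis=1)
    probs = softmax(router_in @ router_w)   # (T, n_experts) routing probabilities
    choice = probs.argmax(axis=1)           # top-1 expert index per frame

    out = np.empty_like(x)
    expert_macs = 0                         # multiply-accumulates spent in experts
    for k, w_k in enumerate(experts):
        mask = choice == k
        if mask.any():
            # Scale each expert output by its gate probability.
            out[mask] = probs[mask, k][:, None] * (x[mask] @ w_k)
            expert_macs += int(mask.sum()) * d * d
    return out, choice, expert_macs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, g, p, q = 6, 8, 4, 3, 3
    x = rng.standard_normal((T, d))
    graph = rng.standard_normal((T, g))
    dom = rng.standard_normal(p)
    acc = rng.standard_normal(q)
    for n in (2, 4, 8):
        w = rng.standard_normal((d + g + p + q, n))
        experts = [rng.standard_normal((d, d)) for _ in range(n)]
        out, choice, macs = moe_layer(x, graph, dom, acc, w, experts)
        print(n, out.shape, macs)  # macs == T * d * d == 384 for every n
```

Because every frame is routed to exactly one expert, the expert cost is `T * d * d` regardless of how many experts exist; only the router's output width grows with the number of experts. Whether SpeechMoE2 uses exactly this gating and concatenation scheme should be checked against the full paper.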

research
05/07/2021

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

Recently, Mixture of Experts (MoE) based Transformer has shown promising...
research
07/12/2023

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Multilingual speech recognition for both monolingual and code-switching ...
research
10/14/2021

Towards More Effective and Economic Sparsely-Activated Model

The sparsely-activated models have achieved great success in natural lan...
research
02/27/2023

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Multi-lingual speech recognition aims to distinguish linguistic expressi...
research
09/17/2022

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

While transformers and their variant conformers show promising performan...
research
04/07/2022

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

Recently, Conformer based CTC/AED model has become a mainstream architec...
research
10/19/2022

On the Adversarial Robustness of Mixture of Experts

Adversarial robustness is a key desirable property of neural networks. I...