Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

09/17/2022
by Ye Bai, et al.

While transformers and their conformer variants show promising performance in speech recognition, their large parameter counts lead to high memory costs during training and inference. Some works use cross-layer weight sharing to reduce the number of model parameters, but the resulting loss of capacity harms performance. To address this issue, this paper proposes a parameter-efficient conformer that shares sparsely-gated experts. Specifically, we use a sparsely-gated mixture-of-experts (MoE) layer to extend the capacity of a conformer block without increasing computation. Then, the parameters of grouped conformer blocks are shared so that the total number of parameters is reduced. Next, to give the shared blocks the flexibility to adapt representations at different levels, we design the MoE routers and normalization layers individually for each block. Moreover, we use knowledge distillation to further improve performance. Experimental results show that the proposed model achieves performance competitive with the full-parameter model while using only 1/3 of the encoder parameters.
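To make the sharing scheme concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a bank of expert feed-forward networks whose weights are shared across a group of blocks, while each block keeps its own sparse (top-1) router and layer normalization. All names (SharedExperts, MoEBlock, num_experts, etc.), the top-1 routing choice, and the dense expert evaluation are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of sharing sparsely-gated experts across blocks.
# Expert FFN weights are stored once and reused by every block in a group;
# each block keeps its own router and layer norm. Names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedExperts(nn.Module):
    """A bank of expert feed-forward networks, shared across blocks."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )


class MoEBlock(nn.Module):
    """One simplified block: private router and layer norm, shared experts."""
    def __init__(self, shared: SharedExperts, d_model: int):
        super().__init__()
        self.shared = shared                                    # shared expert parameters
        self.router = nn.Linear(d_model, len(shared.experts))   # per-block router
        self.norm = nn.LayerNorm(d_model)                       # per-block normalization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        h = self.norm(x)
        gate = F.softmax(self.router(h), dim=-1)   # routing probabilities, (B, T, E)
        top1 = gate.argmax(dim=-1)                 # sparse top-1 expert index per token
        out = torch.zeros_like(h)
        # For clarity every expert is evaluated on all tokens; a real MoE
        # dispatches each token only to its selected expert to save compute.
        for e, expert in enumerate(self.shared.experts):
            mask = (top1 == e).unsqueeze(-1).to(h.dtype)
            out = out + mask * gate[..., e:e + 1] * expert(h)
        return x + out                             # residual connection


# Usage: four blocks reuse one expert bank, so expert weights are stored once.
shared = SharedExperts(d_model=256, d_ff=1024, num_experts=4)
blocks = nn.ModuleList(MoEBlock(shared, d_model=256) for _ in range(4))
x = torch.randn(2, 50, 256)
for blk in blocks:
    x = blk(x)

In this sketch the per-block router and LayerNorm are the only parameters that differ between blocks, which is what lets the shared blocks adapt to representations at different depths while the expensive expert weights are stored only once.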

Related research

05/25/2023 · Mixture-of-Expert Conformer for Streaming Multilingual ASR
End-to-end models with large capacity have significantly improved multil...

06/15/2023 · Understanding Parameter Sharing in Transformers
Parameter sharing has proven to be a parameter-efficient approach. Previ...

11/09/2020 · Efficient End-to-End Speech Recognition Using Performers in Conformers
On-device end-to-end speech recognition poses a high requirement on mode...

11/23/2021 · SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Mixture-of-experts based acoustic models with dynamic routing mechanisms...

03/15/2023 · Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Continued improvements in machine learning techniques offer exciting new...

11/11/2022 · Breaking trade-offs in speech separation with sparsely-gated mixture of experts
Several trade-offs need to be balanced when employing monaural speech se...

05/16/2022 · Chemical transformer compression for accelerating both training and inference of molecular modeling
Transformer models have been developed in molecular science with excelle...
