Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition

12/10/2021
by Kenichi Kumatani, et al.

The sparsely-gated Mixture of Experts (MoE) can magnify network capacity with little additional computational complexity. In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. More specifically, we apply the sparsely-gated MoE technique to two types of networks: Sequence-to-Sequence Transformer (S2S-T) and Transformer Transducer (T-T). We demonstrate through a set of ASR experiments on multiple language data that the MoE networks can reduce the relative word error rates by 16.5% and 4.7% with the S2S-T and T-T, respectively. We further investigate the effect of the MoE on the T-T architecture in various conditions: streaming mode, non-streaming mode, the use of language ID, and the label decoder with the MoE.
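The mechanism the abstract relies on, growing capacity through experts while keeping per-frame compute roughly constant, can be sketched as a top-k gated feed-forward layer in the style of Shazeer et al. (2017). The PyTorch sketch below is an illustration under assumed dimensions and a dense dispatch loop, not the paper's implementation; the paper's "simple routing algorithm" may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparselyGatedMoE(nn.Module):
    """A sparsely-gated Mixture-of-Experts feed-forward layer.

    Each frame is routed to its top-k experts, so total capacity grows
    with the number of experts while per-frame compute stays roughly
    constant. Sizes and the dense dispatch loop are illustrative only.
    """

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)  # routing network
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, time, d_model) acoustic frame representations
        logits = self.gate(x)                        # (B, T, E)
        top_vals, top_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)        # renormalise over chosen experts
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real MoE layers dispatch
        # only the frames routed to each expert to keep compute sparse.
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                    # (B, T, k) bool
            if mask.any():
                w = (weights * mask).sum(dim=-1, keepdim=True)  # (B, T, 1)
                out = out + w * expert(x)
        return out

# Minimal smoke test: a batch of 4 utterances, 100 frames each.
layer = SparselyGatedMoE()
y = layer(torch.randn(4, 100, 512))
print(y.shape)  # torch.Size([4, 100, 512])
```

In a Transformer-based ASR encoder, a layer like this typically replaces the position-wise feed-forward block in a subset of the layers, which is how MoE variants of both S2S-T and T-T architectures are usually built.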


Related research

10/23/2019 - Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
In this work, we introduce a simple yet efficient post-processing model ...

03/01/2023 - Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
We propose gated language experts to improve multilingual transformer tr...

11/11/2022 - Breaking trade-offs in speech separation with sparsely-gated mixture of experts
Several trade-offs need to be balanced when employing monaural speech se...

02/27/2023 - MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
Multi-lingual speech recognition aims to distinguish linguistic expressi...

07/20/2023 - Globally Normalising the Transducer for Streaming Speech Recognition
The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates a...

06/17/2021 - Multi-mode Transformer Transducer with Stochastic Future Context
Automatic speech recognition (ASR) models make fewer errors when more su...

10/28/2019 - Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding
In this paper, we investigate the benefit that off-the-shelf word embedd...
