Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

06/12/2018
by Shiyu Zhou, et al.

Sequence-to-sequence attention-based models integrate the acoustic, pronunciation, and language models into a single neural network, which makes them well suited to multilingual automatic speech recognition (ASR). In this paper, we address multilingual end-to-end speech recognition on low-resource languages with a single Transformer, a sequence-to-sequence attention-based model. Sub-words are employed as the multilingual modeling unit without any pronunciation lexicon. First, we show that a single multilingual ASR Transformer performs well on low-resource languages despite some language confusion. We then incorporate language-specific information into the model by inserting the language symbol at the beginning or at the end of the original sub-word sequence, under the condition that the language is known during training. Experiments on the CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and obtains a relative 10.5% average word error rate (WER) reduction compared to SHL-MLSTM with residual learning. We go on to show that, assuming the language is known during both training and testing, a relative 12.4% average WER reduction compared to SHL-MLSTM with residual learning can be observed by using the language symbol as the sentence-start token.
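The language-symbol insertion described above can be sketched as a simple target-sequence preprocessing step. This is a minimal illustration, not the authors' code; the tag format, helper name, and example sub-word sequence are assumptions for demonstration only.

```python
# Sketch of inserting a language symbol into the target sub-word sequence,
# in the spirit of the multilingual ASR Transformer described in the abstract.
# Tag format (e.g. "<EN>") and function name are illustrative assumptions.

def add_language_symbol(subwords, lang, position="end"):
    """Insert a language tag into a sub-word target sequence.

    position="start": the language symbol acts as the sentence-start token,
    which requires the language to be known at test time as well.
    position="end": the model first recognizes the speech and only then
    predicts the language, so no language information is needed at test time.
    """
    tag = f"<{lang.upper()}>"
    if position == "start":
        return [tag] + subwords
    return subwords + [tag]

# Hypothetical English utterance segmented into sub-words:
subwords = ["▁he", "llo", "▁wor", "ld"]
print(add_language_symbol(subwords, "en", position="end"))
print(add_language_symbol(subwords, "en", position="start"))
```

Placing the tag at the end lets the same decoding procedure run whether or not the language is known, which matches the paper's finding that the end-position variant is the more practical of the two.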


Related research

- 11/06/2017, "Multilingual Speech Recognition With A Single End-To-End Model": Training a conventional automatic speech recognition (ASR) system to sup...
- 05/16/2018, "A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese": The choice of modeling units is critical to automatic speech recognition...
- 08/22/2021, "A Dual-Decoder Conformer for Multilingual Speech Recognition": Transformer-based models have recently become very popular for sequence-...
- 12/05/2017, "Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model": Sequence-to-sequence models provide a simple and elegant solution for bu...
- 06/25/2020, "Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion": Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech...
- 11/07/2018, "Analysis of Multilingual Sequence-to-Sequence speech recognition systems": This paper investigates the applications of various multilingual approac...
- 07/03/2023, "Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages": Connectionist Temporal Classification (CTC) models are popular for their...
