Multilingual Speech Recognition With A Single End-To-End Model

11/06/2017
by   Shubham Toshniwal, et al.
0

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21 analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7 eliminate confusion between different languages.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/12/2018

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

Sequence-to-sequence attention-based models integrate an acoustic, pronu...
research
06/14/2021

Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts

Sequence-to-sequence (seq2seq) models are competitive with hybrid models...
research
12/05/2017

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

Sequence-to-sequence models provide a simple and elegant solution for bu...
research
04/25/2018

End-to-End Multimodal Speech Recognition

Transcription or sub-titling of open-domain videos is still a challengin...
research
05/28/2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Modern public ASR tools usually provide rich support for training variou...
research
07/05/2018

Neural Language Codes for Multilingual Acoustic Models

Multilingual Speech Recognition is one of the most costly AI problems, b...
research
04/10/2020

Scalable Multilingual Frontend for TTS

This paper describes progress towards making a Neural Text-to-Speech (TT...

Please sign up or login with your details

Forgot password? Click here to reset