Towards Language-Universal End-to-End Speech Recognition

11/06/2017
by   Suyoun Kim, et al.
0

Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network's internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems built with a multi-task learning approach. We also show that this model can be used to initialize a monolingual speech recognizer, and can be used to create a bilingual model for use in code-switching scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/22/2018

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio ...
research
06/27/2023

Confidence-based Ensembles of End-to-End Speech Recognition Models

The number of end-to-end speech recognition models grows every year. The...
research
01/19/2019

Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets

We propose an end-to-end affect recognition approach using a Convolution...
research
04/17/2020

AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 ...
research
07/31/2023

Multilingual context-based pronunciation learning for Text-to-Speech

Phonetic information and linguistic knowledge are an essential component...
research
05/07/2021

Efficient Weight factorization for Multilingual Speech Recognition

End-to-end multilingual speech recognition involves using a single model...
research
06/03/2016

Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task Learning

Various treebanks have been released for dependency parsing. Despite tha...

Please sign up or login with your details

Forgot password? Click here to reset