Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

06/25/2020
by Alex Sokolov, et al.

Grapheme-to-phoneme (G2P) models are a key component of Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, where they generate pronunciations for out-of-vocabulary words that are missing from the pronunciation lexicons (mappings such as "e c h o" to "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence n-gram models [1,2]. As an alternative, we present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages. This allows the model to utilize a combination of universal symbol inventories of Latin-like alphabets and cross-linguistically shared feature representations. Such a model is especially useful for low-resource languages and for code switching or foreign words, where pronunciations in one language need to be adapted to other locales or accents. We further experiment with a word language distribution vector as an additional training target, in order to improve system performance by helping the model decouple pronunciations across a variety of languages in the parameter space. We show a 7.2% average improvement in phoneme error rate on low-resource languages and no degradation on high-resource ones compared to monolingual baselines.
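The abstract reports results in phoneme error rate but does not spell out how it is computed. A minimal sketch, assuming the common definition (total Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes), might look like the following. The function names and the example hypothesis "E k O" are hypothetical and are not taken from the paper; only the lexicon entry "E k oU" comes from the text above.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> float:
    """Assumed definition: total edits / total reference phonemes."""
    total_phonemes = sum(len(r) for r in refs)
    if total_phonemes == 0:
        raise ValueError("reference set contains no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phonemes


# Lexicon entry from the text: "e c h o" -> "E k oU".
reference = ["E k oU".split()]
# Hypothetical model output with one substituted phoneme.
hypothesis = ["E k O".split()]
per = phoneme_error_rate(reference, hypothesis)
```

Here one of the three reference phonemes is substituted, so `per` is one third. Whether the paper aggregates errors over the whole test set, as assumed here, or averages per word is not stated in the abstract.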


