Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

11/22/2018
by Bo Li, et al.

We present two end-to-end models, Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for multilingual speech recognition and synthesis. Prior work has predominantly used characters, sub-words or words as the unit of choice to model text. These units are difficult to scale to languages with large vocabularies, particularly in the case of multilingual processing. In this work, we model text as a sequence of Unicode bytes, specifically the UTF-8 variable-length byte sequence for each character. Bytes allow us to avoid large softmaxes in languages with large vocabularies, and to share representations in multilingual models. We show that bytes are superior to grapheme characters over a wide variety of languages in monolingual end-to-end speech recognition. Additionally, our multilingual byte model outperforms each respective single-language baseline by 4.4% relative on average. On Japanese-English code-switching speech, our multilingual byte model outperforms our monolingual baseline by 38.6% relative. Finally, we present an end-to-end multilingual speech synthesis model using byte representations which matches the performance of our monolingual baselines.
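To make the text representation concrete, the following is a minimal sketch of mapping text to UTF-8 byte IDs and back, using only Python's built-in string encoding. The function names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical and chosen for illustration; this shows the byte-level view of text described above, not the A2B or B2A models themselves.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as a sequence of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Decode a sequence of UTF-8 byte values back into text."""
    return bytes(ids).decode("utf-8")


if __name__ == "__main__":
    # Characters from different scripts occupy 1 to 4 bytes each,
    # but every byte is drawn from the same set of 256 possible values.
    for ch in ["a", "\u00e9", "\u65e5", "\U0001F600"]:
        ids = text_to_byte_ids(ch)
        print(repr(ch), len(ids), ids)

    # A mixed Japanese-English string round-trips through the same byte alphabet.
    mixed = "hello \u65e5\u672c\u8a9e"
    assert byte_ids_to_text(text_to_byte_ids(mixed)) == mixed
```

Because every symbol is one of at most 256 byte values regardless of language, an output layer over bytes stays small even for scripts with thousands of characters, at the cost of longer sequences for characters that need several bytes.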


