That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

by Piotr Żelasko, et al.

Only a handful of the world's languages have the abundant resources that enable practical applications of speech processing technologies. One way to overcome this problem is to use resources from other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how the recognition of individual phones improves in a multilingual setting. To that end, we select a phonetically diverse set of languages and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR model is trained to recognize International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, treats Javanese as a tone language. Notably, as little as 10 hours of target-language training data tremendously reduces ASR error rates. Our analysis reveals that even phones unique to a single language can benefit greatly from adding training data from other languages, an encouraging result for the low-resource speech community.
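Because the model outputs IPA token sequences, its error rate is naturally measured over phones rather than words. The abstract does not name the exact metric or scoring tool, so the following is only a minimal sketch, assuming a standard phone error rate (PER): the Levenshtein distance between reference and hypothesis phone sequences, summed over utterances and divided by the total number of reference phones. The function names and the tone-marked example tokens are hypothetical and chosen for illustration; this is not the authors' evaluation code.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    # prev[j] = distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Hypothetical corpus-level PER: total edits / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcripts contain no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# Hypothetical example: a spurious tone letter (˥) on a vowel counts as a
# substitution, and the dropped final vowel counts as a deletion.
ref = ["k", "a", "t", "a"]
hyp = ["k", "a˥", "t"]
print(phone_error_rate([ref], [hyp]))  # 2 edits / 4 reference phones
```

Treating a tone-marked vowel as a distinct token, as in this sketch, means that a crosslingual model which wrongly predicts tones is penalized with a substitution for every such vowel.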

