Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

01/26/2022
by Piotr Żelasko, et al.

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past work has explored multilingual training, transfer learning, and zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way, without any knowledge about the language. In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not, in order to understand the limitations of automatic phone inventory creation and the areas where it can be improved; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages, along with several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
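The abstract does not spell out how the inventory-building methods work. The sketch below illustrates the general idea of unsupervised inventory discovery with one simple baseline. It assumes a crosslingual phone recognizer has already decoded untranscribed audio of the unseen language into sequences of IPA phone tokens. The baseline keeps every phone whose relative frequency clears a threshold, then scores the resulting inventory against a reference inventory using precision, recall, and F1. The function names `induce_inventory` and `inventory_scores`, the threshold `min_rel_freq`, and the frequency heuristic itself are hypothetical choices made for this illustration. They are not the authors' method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def induce_inventory(decoded: Iterable[List[str]], min_rel_freq: float = 0.01) -> Set[str]:
    """Hypothetical baseline: keep every phone whose share of all decoded tokens
    is at least `min_rel_freq`.

    `decoded` is assumed to hold the phone sequences that a crosslingual
    recognizer produced for untranscribed utterances of the unseen language.
    """
    counts = Counter(phone for utterance in decoded for phone in utterance)
    total = sum(counts.values())
    if total == 0:
        return set()  # no decoded tokens, so nothing to infer
    return {phone for phone, n in counts.items() if n / total >= min_rel_freq}


def inventory_scores(predicted: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Precision, recall and F1 of a predicted phone inventory against a reference one."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

As a toy example, take the three decoded utterances `["m", "a", "p", "a"]`, `["p", "a", "t", "a"]` and `["t", "a", "m", "a", "ʔ"]` with `min_rel_freq=0.1`. The function keeps `{a, m, p, t}` and drops the single `ʔ` (1 of 13 tokens). Against a reference of `{a, i, m, p, t}`, this gives a precision of 1.0 and a recall of 0.8. A frequency threshold trades recall of rare phones for robustness to spurious recognizer outputs. Rare or language-unique sounds are therefore exactly the ones such a heuristic tends to miss.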
