AlloVera: A Multilingual Allophone Database

04/17/2020
by David R. Mortensen, et al.

We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from phonological context. While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription. AlloVera allows the training of speech recognition models that output phonetic transcriptions in the International Phonetic Alphabet (IPA), regardless of the input language. We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task. We explore the implications of this technology (and related technologies) for the documentation of endangered and minority languages. We further explore other applications for which AlloVera will be suitable as it grows, including phonological typology.
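To make the allophone-to-phoneme idea concrete, the following is a minimal sketch in Python. It does not reproduce AlloVera's actual file format or contents: the table, the language codes, and the function names are all hypothetical, and the handful of entries are textbook examples (English aspirated and flapped stops, Spanish [ð] as a realization of /d/) chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy table, NOT taken from AlloVera:
# (language code, allophone) -> phoneme, all in IPA.
ALLOPHONE_TO_PHONEME = {
    ("eng", "pʰ"): "p",
    ("eng", "p"): "p",
    ("eng", "tʰ"): "t",
    ("eng", "t"): "t",
    ("eng", "ɾ"): "t",
    ("spa", "d"): "d",
    ("spa", "ð"): "d",
    ("spa", "e"): "e",
    ("spa", "o"): "o",
}


def phones_to_phonemes(lang, phones):
    """Map a language-independent phone sequence to one language's phonemes."""
    return [ALLOPHONE_TO_PHONEME[(lang, phone)] for phone in phones]


def allophones_by_phoneme(lang):
    """Invert the table for one language: phoneme -> sorted list of allophones."""
    grouped = defaultdict(list)
    for (entry_lang, phone), phoneme in ALLOPHONE_TO_PHONEME.items():
        if entry_lang == lang:
            grouped[phoneme].append(phone)
    return {phoneme: sorted(phones) for phoneme, phones in grouped.items()}


def universal_phone_inventory():
    """Collect the shared phone set across all languages in the table."""
    return sorted({phone for (_, phone) in ALLOPHONE_TO_PHONEME})


if __name__ == "__main__":
    # Spanish "dedo" transcribed phonetically, then mapped to phonemes.
    print(phones_to_phonemes("spa", ["d", "e", "ð", "o"]))
    print(allophones_by_phoneme("eng"))
    print(universal_phone_inventory())
```

The design point the sketch tries to capture is that the phone inventory is shared across languages while the phoneme labels are per language, so a single phone recognizer can in principle be trained on phonemically transcribed data from several languages through such mappings.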


