Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

01/15/2022
by Philipp Klumpp, et al.

Current state-of-the-art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets if the final decision function is to generalize well. An ideal dataset is not necessarily large in size, but large with respect to the number of unique speakers, the recording hardware used, and the variety of recording conditions. This enables a machine learning model to explore as much of the domain-specific input space as possible during parameter estimation. This work introduces Common Phone, a gender-balanced, multilingual corpus recorded from more than 11,000 contributors via Mozilla's Common Voice project. It comprises around 116 hours of speech enriched with automatically generated phonetic segmentation. A Wav2Vec 2.0 acoustic model was trained on Common Phone to perform phonetic symbol recognition and to validate the quality of the generated phonetic annotation. The architecture achieved a phone error rate (PER) of 18.1%, computed with all 101 unique phonetic symbols, with slight differences between the individual languages. We conclude that Common Phone provides sufficient variability and reliable phonetic annotation to help bridge the gap between research and application of acoustic models.
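The abstract reports a PER but does not spell out how it was scored. As a rough illustration only, the sketch below assumes the conventional definition: the Levenshtein (edit) distance between predicted and reference phone sequences, summed over all utterances and divided by the total number of reference phones. The function names `edit_distance` and `phone_error_rate` and the toy sequences are hypothetical and are not taken from the Common Phone paper or its tooling.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences.

    Counts the minimum number of substitutions, deletions and
    insertions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # phones and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER in percent (assumed definition, see lead-in)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


if __name__ == "__main__":
    # Toy example: one substitution in the first utterance,
    # one deletion in the second, over 7 reference phones.
    refs = [["h", "ɛ", "l", "oʊ"], ["k", "æ", "t"]]
    hyps = [["h", "ɛ", "l", "o"], ["k", "t"]]
    print(f"PER: {phone_error_rate(refs, hyps):.1f}%")
```

Pooling errors over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score; whether the authors pooled in this way is an assumption here.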

research
07/11/2021

Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

The use of phonological features (PFs) potentially allows language-speci...
research
02/26/2020

Universal Phone Recognition with a Multilingual Allophone System

Multilingual models can improve language processing, particularly for lo...
research
06/19/2023

Comparison of L2 Korean pronunciation error patterns from five L1 backgrounds by using automatic phonetic transcription

This paper presents a large-scale analysis of L2 Korean pronunciation er...
research
08/30/2018

SonarSnoop: Active Acoustic Side-Channel Attacks

We report the first active acoustic side-channel attack. Speakers are us...
research
07/05/2018

Neural Language Codes for Multilingual Acoustic Models

Multilingual Speech Recognition is one of the most costly AI problems, b...
research
04/29/2020

Robust Phonetic Segmentation Using Spectral Transition Measure for Non-Standard Recording Environments

Phone level localization of mis-articulation is a key requirement for an...
