Masked Autoencoders Are Articulatory Learners

10/27/2022
by Ahmed Adel Attia, et al.

Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray Microbeam (XRMB) dataset is one of several datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators, which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked and have so far been unusable. In this work, we present a deep-learning-based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and retrieve 3.28 out of 3.4 hours of previously unusable recordings.
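The masking idea behind this kind of reconstruction can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout (frames, articulators, x/y coordinates), the choice to hide whole articulators for an entire segment, and the helper names `mask_articulators` and `masked_mse` are all assumptions made for illustration, and the data is synthetic rather than taken from XRMB.

```python
import numpy as np


def mask_articulators(traj, n_masked, rng):
    """Zero out n_masked randomly chosen articulators for the whole segment.

    traj: array of shape (frames, articulators, coords); this layout is assumed.
    Returns the masked copy and a boolean vector of shape (articulators,),
    True where the articulator was hidden.
    """
    n_art = traj.shape[1]
    hidden = np.zeros(n_art, dtype=bool)
    hidden[rng.choice(n_art, size=n_masked, replace=False)] = True
    masked = traj.copy()
    masked[:, hidden, :] = 0.0
    return masked, hidden


def masked_mse(pred, target, hidden):
    """Mean squared error computed only over the hidden articulators."""
    diff = pred[:, hidden, :] - target[:, hidden, :]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
# Synthetic stand-in: 100 frames, 8 articulators, 2 coordinates each.
traj = rng.normal(size=(100, 8, 2))
masked, hidden = mask_articulators(traj, n_masked=3, rng=rng)

# A reconstruction model (hypothetical here) would map `masked` back to `traj`;
# scoring only the hidden articulators measures how well the gaps are filled.
baseline_error = masked_mse(masked, traj, hidden)
```

During training, a model would see `masked` as input and be penalized with `masked_mse` against the original trajectories, so that at inference time genuinely mistracked articulators can be treated as the hidden ones.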


