Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

10/21/2022
by Cheol Jun Cho, et al.

Recent self-supervised learning (SSL) models have been shown to learn rich representations of speech that can readily be used by diverse downstream tasks. To understand this utility, various analyses of speech SSL models have examined what information is encoded in the learned representations and how. Although previous analyses cover acoustic, phonetic, and semantic perspectives extensively, the physical grounding in speech production has not yet received full attention. To bridge this gap, we conduct a comprehensive analysis linking speech representations to articulatory trajectories measured by electromagnetic articulography (EMA). Our analysis is based on a linear probing approach in which we define the articulatory score as the average correlation between the outputs of a linear mapping from the representations and the measured EMA traces. We analyze a set of SSL models selected from the leaderboard of the SUPERB benchmark and perform further layer-wise analyses on the two most successful models, Wav2Vec 2.0 and HuBERT. Surprisingly, representations from recent speech SSL models are highly correlated with EMA traces (best: r = 0.81), and only 5 minutes of data are sufficient to train a linear model with high performance (r = 0.77). Our findings suggest that SSL models learn to align closely with continuous articulations, and they provide a novel insight into speech SSL.
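
The linear probing idea described above can be illustrated with a minimal sketch. It assumes frame-aligned arrays of SSL features and EMA traces, and it uses a small ridge term for numerical stability; the abstract does not specify the regression details, so the function names (`fit_linear_probe`, `articulatory_score`), the ridge value, and the array shapes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def fit_linear_probe(features, ema, ridge=1e-3):
    """Fit a linear map (weights plus bias) from features to EMA channels.

    features: (n_frames, n_dims) array, assumed frame-aligned with `ema`.
    ema:      (n_frames, n_channels) array of articulator positions.
    ridge:    small L2 penalty; an assumption made here for stability.
    """
    mean_x = features.mean(axis=0)
    mean_y = ema.mean(axis=0)
    xc = features - mean_x
    yc = ema - mean_y
    # Closed-form ridge regression on centered data.
    gram = xc.T @ xc + ridge * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    bias = mean_y - mean_x @ weights
    return weights, bias


def articulatory_score(features, ema, weights, bias):
    """Average Pearson correlation between predicted and measured EMA channels."""
    pred = features @ weights + bias
    pc = pred - pred.mean(axis=0)
    tc = ema - ema.mean(axis=0)
    num = (pc * tc).sum(axis=0)
    den = np.sqrt((pc ** 2).sum(axis=0) * (tc ** 2).sum(axis=0))
    # One correlation per EMA channel, averaged into a single score.
    return float(np.mean(num / den))


if __name__ == "__main__":
    # Synthetic stand-in data; real use would load SSL features and EMA traces.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2000, 16))
    true_w = rng.standard_normal((16, 4))
    y = x @ true_w + 0.1 * rng.standard_normal((2000, 4))
    w, b = fit_linear_probe(x[:1500], y[:1500])
    print(articulatory_score(x[1500:], y[1500:], w, b))
```

In this sketch the probe is fit on one portion of the data and scored on a held-out portion, so the score reflects how much articulatory information is linearly readable from the features rather than how well the probe memorizes its training frames.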

research
05/18/2020

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation

For self-supervised speech processing, it is crucial to use pretrained m...
research
10/05/2022

Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Self-supervised speech models have grown fast during the past few years ...
research
06/09/2023

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Textless self-supervised speech models have grown in capabilities in rec...
research
05/24/2023

Reverse Engineering Self-Supervised Learning

Self-supervised learning (SSL) is a powerful tool in machine learning, b...
research
10/27/2022

Opening the Black Box of wav2vec Feature Encoder

Self-supervised models, namely, wav2vec and its variants, have shown pro...
research
10/13/2022

On the Utility of Self-supervised Models for Prosody-related Tasks

Self-Supervised Learning (SSL) from speech data has produced models that...
research
10/09/2021

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Many speech applications require understanding aspects beyond the words ...
