Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

06/09/2023
by Kinan Martin, et al.

Textless self-supervised speech models have grown in capabilities in recent years, but the nature of the linguistic information they encode has not yet been thoroughly examined. We evaluate the extent to which these models' learned representations align with basic representational distinctions made by humans, focusing on a set of phonetic (low-level) and phonemic (more abstract) contrasts instantiated in word-initial stops. We find that robust representations of both phonetic and phonemic distinctions emerge in early layers of these models' architectures, and are preserved in the principal components of deeper layer representations. Our analyses suggest two sources for this success: some can only be explained by the optimization of the models on speech data, while some can be attributed to these models' high-dimensional architectures. Our findings show that speech-trained HuBERT derives a low-noise and low-dimensional subspace corresponding to abstract phonological distinctions.
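The probing analysis described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual setup: synthetic 768-dimensional frame-level vectors stand in for representations extracted from one HuBERT layer, a linear probe tests whether a binary aspirated/unaspirated contrast is decodable, and a second probe checks whether the contrast survives in the leading principal components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for frame-level layer representations:
# 768-dim vectors (HuBERT-base hidden size) with a small class-dependent
# shift along one random direction, mimicking an encoded aspiration contrast.
n, d = 1000, 768
labels = rng.integers(0, 2, size=n)          # 0 = unaspirated, 1 = aspirated
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
feats = rng.standard_normal((n, d)) + 2.0 * labels[:, None] * direction

X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, random_state=0)

# Linear probe on the full high-dimensional representation.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_full = probe.score(X_te, y_te)

# Same probe restricted to the leading principal components, testing
# whether the distinction is preserved in a low-dimensional subspace.
pca = PCA(n_components=16).fit(X_tr)
probe_pca = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
acc_pca = probe_pca.score(pca.transform(X_te), y_te)

print(f"probe accuracy (full 768-d): {acc_full:.2f}")
print(f"probe accuracy (16 PCs):     {acc_pca:.2f}")
```

In a real experiment the feature matrix would come from the model's hidden states at each layer (e.g. via the `output_hidden_states` option of a pretrained HuBERT checkpoint), with one probe fit per layer to trace where the contrast emerges.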


