Do self-supervised speech models develop human-like perception biases?

05/31/2022
by Juliette Millet, et al.

Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners' native languages. Does the same thing happen in self-supervised models? We examine the representational spaces of three kinds of state-of-the-art self-supervised models: wav2vec 2.0, HuBERT and contrastive predictive coding (CPC), and compare them with the perceptual spaces of French-speaking and English-speaking human listeners, both globally and taking account of the behavioural differences between the two language groups. We find that the CPC model shows a small native-language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language-specific. A comparison against the predictions of supervised phone recognisers suggests that all three self-supervised models capture relatively fine-grained perceptual phenomena, while supervised models are better at capturing coarser, phone-level effects of listeners' native language on perception.

research
05/31/2022

Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

Our native language influences the way we perceive speech sounds, affect...
research
07/30/2023

Mispronunciation detection using self-supervised speech representations

In recent years, self-supervised learning (SSL) models have produced pro...
research
02/23/2023

ProsAudit, a prosodic benchmark for self-supervised speech models

We present ProsAudit, a benchmark in English to assess structural prosod...
research
08/06/2020

Evaluating computational models of infant phonetic learning across languages

In the first year of life, infants' speech perception becomes attuned to...
research
10/12/2020

Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

In this paper, we present a data set and methods to compare speech proce...
research
02/24/2022

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

Recent work on unsupervised speech segmentation has used self-supervised...
research
05/19/2023

North Sámi Dialect Identification with Self-supervised Speech Models

The North Sámi (NS) language encapsulates four primary dialectal variant...