Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models

10/28/2022
by Ramon Sanabria, et al.

Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWEs), which are fixed-dimensional vectors representing variable-length spoken word segments. In this work, we study several pre-trained models and pooling methods for constructing AWEs from self-supervised representations. Owing to the contextualized nature of self-supervised representations, we hypothesize that simple pooling methods, such as averaging, might already be useful for constructing AWEs. When evaluating on a standard word discrimination task, we find that HuBERT representations with mean-pooling rival the state of the art on English AWEs. More surprisingly, despite being trained only on English, HuBERT representations evaluated on Xitsonga, Mandarin, and French consistently outperform those of the multilingual model XLSR-53 (as well as those of Wav2Vec 2.0 trained on English).
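
As a rough illustration of the mean-pooling idea mentioned in the abstract, the sketch below averages frame-level vectors over time to get one fixed-dimensional vector per word segment, then compares two segments by cosine distance. It is not the authors' implementation: the function names `mean_pool` and `cosine_distance`, the toy 2-dimensional frames, and the choice of cosine distance as the comparison are all assumptions made for illustration. In practice the frames would come from a pre-trained model such as HuBERT rather than being written by hand.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors over time into one fixed-size vector.

    `frames` holds one feature vector per time step of a spoken word
    segment (hypothetical input; a real system would take these from a
    pre-trained speech model).
    """
    if not frames:
        raise ValueError("segment has no frames")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimensionality")
    n_frames = len(frames)
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Two toy segments of different lengths (2 and 3 frames).
segment_a = [[1.0, 0.0], [3.0, 0.0]]
segment_b = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]

# Both segments map to vectors of the same dimensionality,
# regardless of how many frames each one contains.
awe_a = mean_pool(segment_a)
awe_b = mean_pool(segment_b)
distance = cosine_distance(awe_a, awe_b)
```

The point of the sketch is only that segments of different durations end up as vectors of the same size, so they can be compared directly; a word discrimination evaluation would apply such a comparison across many segment pairs.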

research
06/03/2023

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

Acoustic word embeddings are typically created by training a pooling fun...
research
01/03/2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

In speech recognition, it is essential to model the phonetic content of ...
research
12/14/2020

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings

Many speech processing tasks involve measuring the acoustic similarity b...
research
01/10/2020

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding

Word embeddings have undoubtedly revolutionized NLP. However, pre-traine...
research
11/09/2022

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

In this paper, we extend previous self-supervised approaches for languag...
research
06/09/2023

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Textless self-supervised speech models have grown in capabilities in rec...
research
05/19/2023

North Sámi Dialect Identification with Self-supervised Speech Models

The North Sámi (NS) language encapsulates four primary dialectal variant...
