Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

06/01/2023
by Salah Zaiem, et al.

Self-supervised learning (SSL) has recently made it possible to leverage large datasets of unlabeled speech signals and reach impressive performance on speech tasks using only small amounts of annotated data. The large number of proposed approaches has created the need for extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely on a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. Concerningly, our study reveals that benchmarking with limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.

research
08/28/2023

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Self-supervised learning (SSL) leverages large datasets of unlabeled spe...
research
11/08/2021

Characterizing the adversarial vulnerability of speech self-supervised learning

A leaderboard named Speech processing Universal PERformance Benchmark (S...
research
11/29/2022

Model Extraction Attack against Self-supervised Speech Models

Self-supervised learning (SSL) speech models generate meaningful represe...
research
09/07/2023

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

The choice of the objective function is crucial in emerging high-quality...
research
10/13/2022

On the Utility of Self-supervised Models for Prosody-related Tasks

Self-Supervised Learning (SSL) from speech data has produced models that...
research
06/02/2023

Masked Autoencoder for Unsupervised Video Summarization

Summarizing a video requires a diverse understanding of the video, rangi...
research
10/09/2021

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Many speech applications require understanding aspects beyond the words ...
