Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

10/09/2021
by Joel Shor, et al.

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. We benchmark on a diverse set of speech tasks and demonstrate that simple linear classifiers trained on top of our time-averaged representation outperform nearly all previous results, in some cases by large margins. Our analyses of context-window size demonstrate that, surprisingly, 2-second context windows achieve 98% of the performance obtained with the full long-term context. Furthermore, while the best per-task representations are extracted internally in the network, stable performance across several layers allows a single universal representation to reach near-optimal performance on all tasks.
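The evaluation recipe mentioned in the abstract, a simple linear classifier trained on a time-averaged representation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the frame-level embeddings are synthetic random vectors standing in for Conformer outputs, the function names (`time_average`, `train_linear_classifier`, `predict`) are hypothetical, and the softmax-regression training loop is a generic choice, not necessarily the classifier setup used by the authors.

```python
import numpy as np


def time_average(frames: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, dim) frame-level embeddings into one (dim,) vector."""
    return frames.mean(axis=0)


def train_linear_classifier(X, y, num_classes, lr=0.5, steps=200):
    """Fit a softmax-regression (linear) classifier with plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets, shape (n, num_classes)
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Return the highest-scoring class index for each row of X."""
    return (X @ W + b).argmax(axis=1)


# Synthetic stand-in data (assumption): two classes whose frame embeddings
# differ in mean, with a variable number of frames per utterance.
rng = np.random.default_rng(0)


def fake_utterance(label: int) -> np.ndarray:
    num_frames = rng.integers(20, 60)
    mean = 1.0 if label == 1 else -1.0
    return rng.normal(loc=mean, scale=1.0, size=(num_frames, 8))


labels = np.array([0, 1] * 20)
X = np.stack([time_average(fake_utterance(label)) for label in labels])
W, b = train_linear_classifier(X, labels, num_classes=2)
accuracy = (predict(X, W, b) == labels).mean()
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Averaging over time gives every utterance a fixed-size vector regardless of its duration, which is what lets a single linear layer serve as the downstream model in this kind of benchmark.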

research
05/23/2023

On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications

Large self-supervised pre-trained speech models have achieved remarkable...
research
09/15/2023

Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection

Existing deepfake speech detection systems lack generalizability to unse...
research
10/15/2022

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

Self-supervised learning of speech representations from large amounts of...
research
10/16/2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

We present the SUPERB challenge at SLT 2022, which aims at learning self...
research
03/01/2022

Towards a Common Speech Analysis Engine

Recent innovations in self-supervised representation learning have led t...
research
10/21/2022

Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

Recent self-supervised learning (SSL) models have proven to learn rich r...
research
06/01/2023

Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Self-supervised learning (SSL) has recently allowed leveraging large dat...
