Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

10/05/2022
by Yuanchao Li, et al.

Self-supervised speech models have developed rapidly over the past few years and have proven effective in various downstream tasks. Some recent work has started to examine the characteristics of these models, yet many questions have not been fully addressed. In this work, we conduct a study on emotional corpora to explore a popular self-supervised model, wav2vec 2.0. Through a set of quantitative analyses, we mainly demonstrate that: 1) wav2vec 2.0 appears to discard paralinguistic information that is less useful for word recognition; 2) for emotion recognition, representations from the middle layer alone perform as well as those derived from layer averaging, while the final layer yields the worst performance in some cases; 3) current self-supervised models may not be the optimal solution for downstream tasks that rely on non-lexical features. Our work provides novel findings that will aid future research in this area and a theoretical basis for the use of existing models.
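
To make the distinction in finding 2) concrete, the following is a minimal sketch of how an utterance-level representation could be taken from a single layer versus averaged across layers. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the use of mean pooling over time, and the toy hidden states are all hypothetical, and a real wav2vec 2.0 model would supply per-layer frame features with far more layers and dimensions.

```python
from statistics import fmean

# Assumed layout (hypothetical): hidden_states[layer][frame][dim],
# i.e. one list of frame vectors per transformer layer.

def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    return [fmean(dim_values) for dim_values in zip(*vectors)]

def single_layer_embedding(hidden_states, layer):
    """Utterance embedding from one layer: mean over that layer's frames."""
    return mean_pool(hidden_states[layer])

def layer_averaged_embedding(hidden_states):
    """Utterance embedding from all layers: pool each layer over time,
    then average the pooled vectors across layers."""
    pooled_per_layer = [mean_pool(frames) for frames in hidden_states]
    return mean_pool(pooled_per_layer)

# Toy input: 3 layers, 2 frames, 2 feature dimensions (made-up values).
hidden_states = [
    [[0.0, 2.0], [2.0, 4.0]],  # first layer
    [[1.0, 1.0], [3.0, 3.0]],  # middle layer
    [[4.0, 0.0], [6.0, 2.0]],  # final layer
]

middle = single_layer_embedding(hidden_states, len(hidden_states) // 2)
final = single_layer_embedding(hidden_states, len(hidden_states) - 1)
averaged = layer_averaged_embedding(hidden_states)
```

Either vector could then be fed to the same downstream emotion classifier, so that the only difference between runs is which layers contributed to the representation.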

