Investigating Pre-trained Audio Encoders in the Low-Resource Condition

05/28/2023
by Hao Yang, et al.

Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings have yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments with three representative state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide quantitative and qualitative analyses of task performance, convergence speed, and representational properties of the encoders. We observe a connection between the pre-training protocols of these encoders and the way in which they capture information in their internal layers. In particular, we observe that the Whisper encoder exhibits the greatest low-resource capabilities on content-driven tasks in terms of performance and convergence speed.


research
10/24/2022

Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Pre-trained speech Transformers have facilitated great success across va...
research
12/12/2022

Jointly Learning Visual and Auditory Speech Representations from Raw Data

We present RAVEn, a self-supervised multi-modal approach to jointly lear...
research
11/02/2022

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

We present a new Self-Supervised Learning (SSL) approach to pre-train en...
research
06/30/2023

Towards Improving the Performance of Pre-Trained Speech Models for Low-Resource Languages Through Lateral Inhibition

With the rise of bidirectional encoder representations from Transformer ...
research
04/10/2022

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

Multimodal pre-training for audio-and-text has recently been proved to b...
research
05/02/2023

Contrastive Speech Mixup for Low-resource Keyword Spotting

Most of the existing neural-based models for keyword spotting (KWS) in s...
