Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

06/11/2022
by A Arunkumar, et al.

Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these models optimizes a different loss, which gives rise to the possibility that their features are complementary. This paper proposes an ensemble of such SSL representations and models, which exploits the complementary nature of the features extracted by the various pretrained models. We hypothesize that this results in a richer feature representation and show results for the ASR downstream task. To this end, we use three SSL models that have shown excellent results on ASR tasks, namely HuBERT, wav2vec 2.0, and WavLM. We explore both an ensemble of models fine-tuned for the ASR task and an ensemble of features using the embeddings obtained from the pre-trained models for a downstream ASR task. We obtain improved performance over individual models and pre-trained features on the LibriSpeech (100h) and WSJ datasets for the downstream tasks.
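
As a rough illustration of the feature-ensemble idea described in the abstract, the sketch below concatenates frame-level embeddings from the three pretrained encoders via the HuggingFace transformers library. The checkpoint names, the shared feature extractor, and the simple concatenation strategy are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): concatenate
# frame-level features from three pretrained SSL encoders.
import torch
from transformers import AutoModel, AutoFeatureExtractor

# Assumed base-size checkpoints for the three models named in the abstract.
checkpoints = [
    "facebook/hubert-base-ls960",   # HuBERT
    "facebook/wav2vec2-base",       # wav2vec 2.0
    "microsoft/wavlm-base",         # WavLM
]

# Simplification: reuse one feature extractor for all three encoders.
extractor = AutoFeatureExtractor.from_pretrained(checkpoints[0])
models = [AutoModel.from_pretrained(c).eval() for c in checkpoints]

def ensemble_features(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Return concatenated hidden states of shape (1, frames, sum of hidden sizes)."""
    inputs = extractor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt")
    feats = []
    with torch.no_grad():
        for model in models:
            feats.append(model(**inputs).last_hidden_state)  # (1, frames, hidden)
    # The three base models share a ~20 ms frame rate, so frame counts agree
    # up to an off-by-one; trim to the shortest before concatenating.
    min_frames = min(f.shape[1] for f in feats)
    return torch.cat([f[:, :min_frames] for f in feats], dim=-1)

# Example: one second of (silent) audio at 16 kHz.
features = ensemble_features(torch.zeros(16000))
print(features.shape)  # roughly (1, 49, 2304) for three 768-dim base encoders
```

The concatenated features could then feed a downstream ASR head in place of a single model's embeddings; fusion by weighted averaging or learned combination is an equally plausible reading of the feature ensemble described above.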

