Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

04/05/2022
by Dan Berrebbi, et al.

Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends heavily on the relatedness between the SSL training domain(s) and the target data domain. In contrast, spectral feature (SF) extractors such as log Mel-filterbanks are hand-crafted, non-learnable components and could be more robust to domain shifts. The present work examines the assumption that combining non-learnable SF extractors with SSL models is an effective approach to low-resource speech tasks. We propose a learnable and interpretable framework to combine SF and SSL representations. The proposed framework significantly outperforms both baseline and SSL models on Automatic Speech Recognition (ASR) and Speech Translation (ST) tasks on three low-resource datasets. We additionally design a mixture-of-experts-based combination model. This last model reveals that the relative contribution of SSL models over conventional SF extractors is very small when there is a domain mismatch between the SSL training set and the target language data.
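The abstract does not spell out the combination architecture, so the following is only a rough illustration of the general idea behind a mixture-of-experts-style combination: a small gate looks at each frame of spectral and SSL features and produces two weights that sum to one, and the fused frame is their weighted sum. Averaging the SSL weight over an utterance gives a crude, interpretable measure of how much the SSL stream contributes. It assumes both streams are already time-aligned and projected to the same dimension, and the function names (`gated_fusion`, `fuse_utterance`, `mean_ssl_share`) and the single linear gate are hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Affine map y = W x + b, with `weight` stored as (out_dim, in_dim) rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gated_fusion(sf_frame, ssl_frame, gate_weight, gate_bias) -> Tuple[Vector, Vector]:
    """Fuse one frame of SF and SSL features with a two-expert softmax gate.

    Hypothetical setup: both frames were already projected to the same
    dimension D; gate_weight has shape (2, 2*D) and gate_bias has length 2.
    Returns (fused_frame, [p_sf, p_ssl]).
    """
    if len(sf_frame) != len(ssl_frame):
        raise ValueError("SF and SSL frames must share one dimension")
    gate_input = list(sf_frame) + list(ssl_frame)  # gate sees both streams
    p_sf, p_ssl = softmax(linear(gate_input, gate_weight, gate_bias))
    fused = [p_sf * a + p_ssl * b for a, b in zip(sf_frame, ssl_frame)]
    return fused, [p_sf, p_ssl]


def fuse_utterance(sf_frames, ssl_frames, gate_weight, gate_bias):
    """Apply gated_fusion frame by frame; both streams must be time-aligned."""
    if len(sf_frames) != len(ssl_frames):
        raise ValueError("SF and SSL streams must have the same number of frames")
    fused, gates = [], []
    for sf, ssl in zip(sf_frames, ssl_frames):
        frame, gate = gated_fusion(sf, ssl, gate_weight, gate_bias)
        fused.append(frame)
        gates.append(gate)
    return fused, gates


def mean_ssl_share(gates) -> float:
    """Average gate weight on the SSL expert: a crude per-utterance contribution score."""
    return sum(g[1] for g in gates) / len(gates)


if __name__ == "__main__":
    sf = [[1.0, 2.0], [3.0, 4.0]]    # toy spectral frames, already projected to D=2
    ssl = [[0.0, 0.0], [1.0, 1.0]]   # toy SSL frames, D=2
    w = [[0.0] * 4, [0.0] * 4]       # untrained gate that ignores its input
    fused, gates = fuse_utterance(sf, ssl, w, [0.0, 0.0])
    print(fused)                     # equal weights -> [[0.5, 1.0], [2.0, 2.5]]
    print(mean_ssl_share(gates))     # 0.5
```

In a trained model the gate parameters would be learned jointly with the downstream ASR or ST objective; a `mean_ssl_share` close to zero would then indicate that the model leans almost entirely on the spectral features, which is the kind of signal the abstract describes for domain-mismatched SSL models.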

08/10/2023

A Novel Self-training Approach for Low-resource Speech Recognition

In this paper, we propose a self-training approach for automatic speech ...
03/31/2022

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

Self-supervised learning (SSL) to learn high-level speech representation...
05/04/2022

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

This paper describes the ON-TRAC Consortium translation systems develope...
06/22/2020

Self-Supervised Representations Improve End-to-End Speech Translation

End-to-end speech-to-text translation can provide a simpler and smaller ...
06/01/2023

Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

Self-Supervised Learning (SSL) has allowed leveraging large amounts of u...
09/21/2023

Sparsely Shared LoRA on Whisper for Child Speech Recognition

Whisper is a powerful automatic speech recognition (ASR) model. Neverthe...
10/27/2022

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Recent years have witnessed great strides in self-supervised learning (S...
