Sparsely Shared LoRA on Whisper for Child Speech Recognition

09/21/2023
by   Wei Liu, et al.

Whisper is a powerful automatic speech recognition (ASR) model, yet its zero-shot performance on low-resource speech leaves room for improvement. Child speech, a representative type of low-resource speech, is used as the adaptation target in this work. Recently, parameter-efficient fine-tuning (PEFT) in NLP has been shown to match or even surpass full fine-tuning while tuning only a small set of trainable parameters. However, the effectiveness of current PEFT methods on Whisper has not been well examined. In this paper, only parameter-composition PEFT approaches such as LoRA and BitFit are investigated, as they incur no extra inference cost. Several popular PEFT methods are compared; in particular, a comparison of LoRA and AdaLoRA shows that the learnable rank coefficient is a beneficial design. Inspired by the sparse rank distribution allocated by AdaLoRA, a novel PEFT approach, Sparsely Shared LoRA (S2-LoRA), is proposed: the two low-rank decomposed matrices are shared globally across all adapted weight matrices, and each weight matrix maintains only its own rank coefficients, which are constrained to be sparse. Experiments on low-resource Chinese child speech show that, with far fewer trainable parameters, S2-LoRA achieves in-domain adaptation performance comparable to AdaLoRA and generalizes better on out-of-domain data. Moreover, the rank distribution automatically learned by S2-LoRA exhibits patterns similar to AdaLoRA's allocation.
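The core idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, initialization, and the diagonal-coefficient form (an AdaLoRA-style update B·diag(λ)·A, with A and B shared globally and only λ kept per weight matrix) are assumptions based on the abstract's description; in LoRA-style training one of the shared matrices would typically be zero-initialized, but here both are random so the update is visible.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper):
# d = hidden size, r = shared rank budget, n_layers = adapted weight matrices.
d, r, n_layers = 512, 8, 4
rng = np.random.default_rng(0)

# The two low-rank matrices are shared globally across all adapted layers.
A = rng.standard_normal((r, d)) * 0.02  # shared down-projection
B = rng.standard_normal((d, r)) * 0.02  # shared up-projection

# Each weight matrix keeps only its own r rank coefficients,
# constrained/encouraged to be sparse during training.
lam = np.zeros((n_layers, r))
lam[0, :2] = [0.9, 0.4]  # e.g. layer 0 activates 2 of the 8 shared directions
lam[1, 5] = 1.1          # layer 1 activates a single direction

def delta_w(layer: int) -> np.ndarray:
    """Low-rank weight update for one layer: B @ diag(lam[layer]) @ A."""
    # Broadcasting lam[layer] over B's columns is equivalent to B @ diag(lam).
    return (B * lam[layer]) @ A

# Per-layer trainable cost drops from 2*d*r (a private A and B per layer,
# as in plain LoRA) to just r coefficients.
shared_params = A.size + B.size   # 2 * d * r = 8192, paid once
per_layer_params = r              # 8 coefficients per adapted matrix
```

Because the sparse coefficients zero out unused directions, the effective rank of each update is the number of nonzero entries in that layer's λ, which is how the method can mimic AdaLoRA's learned rank allocation with far fewer trainable parameters.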

