Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

09/18/2023
by Dianwen Ng, et al.

Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative that relies on minimal soft prompt guidance, improving portability while maintaining competitive performance. However, how and why this works is not well understood. In this study, we aim to deepen the understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners that improve ASR performance, but also show that they are vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which improves robustness against background noise. Additionally, we propose an effective modification to noise prompts and show that they are capable of zero-shot adaptation to out-of-distribution noise environments.
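As a rough illustration of what soft prompt tuning means in general, the sketch below prepends a small set of trainable vectors to a sequence of frame features while the backbone model stays frozen. It is a generic, dependency-free sketch and not the method of this paper: the function names, the initialization scale, and the toy dimensions are all assumptions made for illustration, and no training loop is shown.

```python
import random


def make_soft_prompts(num_prompts, dim, seed=0):
    """Create num_prompts trainable vectors of size dim.

    The small Gaussian initialization is an assumption for illustration.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, frames):
    """Concatenate the prompt vectors in front of the frame features (time axis)."""
    if prompts and frames and len(prompts[0]) != len(frames[0]):
        raise ValueError("prompt and frame dimensions must match")
    return prompts + frames


def strip_prompt_outputs(outputs, num_prompts):
    """Drop the positions that correspond to prompts before decoding."""
    return outputs[num_prompts:]


def count_params(vectors):
    """Count scalar parameters in a list of vectors."""
    return sum(len(v) for v in vectors)


# Toy example: 50 frames of 8-dimensional features, 4 soft prompts.
frames = [[0.0] * 8 for _ in range(50)]
prompts = make_soft_prompts(num_prompts=4, dim=8)

extended = prepend_prompts(prompts, frames)
print(len(extended))          # 54 positions: 4 prompts + 50 frames
print(count_params(prompts))  # 32 trainable scalars; the backbone stays frozen
```

In a real setup the prompt vectors would be the only parameters updated by the optimizer, which is where the parameter efficiency comes from.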

research
03/29/2023

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Audiovisual automatic speech recognition (AV-ASR) aims to improve the ro...
research
06/15/2022

Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

In this paper, we present our progress in pretraining Czech monolingual ...
research
11/10/2021

Scaling ASR Improves Zero and Few Shot Learning

With 4.5 million hours of English speech from 10 different sources acros...
research
03/31/2022

How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Recent work on self-supervised pre-training focus on leveraging large-sc...
research
07/10/2021

Layer-wise Analysis of a Self-supervised Speech Representation Model

Recently proposed self-supervised learning approaches have been successf...
research
05/31/2023

Zero-Shot Automatic Pronunciation Assessment

Automatic Pronunciation Assessment (APA) is vital for computer-assisted ...
research
08/09/2019

The role of cue enhancement and frequency fine-tuning in hearing impaired phone recognition

A speech-based hearing test is designed to identify the susceptible erro...
