INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

05/25/2023
by Eunseop Yoon, et al.

Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained via self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias: they tend to better represent accents that are prominent in the pre-training speech corpus (i.e., the native (L1) English accent) than less represented accents, resulting in degraded performance for non-native (L2) English accents. Although several approaches have been proposed to mitigate this issue, all of them require updating the pre-trained model weights. In this paper, we propose Information-Theoretic Adversarial Prompt Tuning (INTapt), which introduces prompts concatenated to the original input that re-modulate the attention of the pre-trained model so that the prompted input resembles native (L1) English speech, without updating the backbone weights. INTapt is trained with two objectives simultaneously: (1) adversarial training to reduce accent feature dependence between the original input and the prompt-concatenated input, and (2) minimizing the CTC loss to improve ASR performance on the prompt-concatenated input. Experimental results show that INTapt improves the ASR performance of L2 English speech and increases the feature similarity between L2 and L1 accents.
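For illustration, the following is a minimal PyTorch-style sketch of the two-part objective described above. The module names, tensor shapes, the discriminator, and the confusion-style adversarial term are assumptions made for this sketch, not the authors' released implementation; in particular, a simple prediction-confusion loss stands in for the paper's information-theoretic adversarial objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptedASR(nn.Module):
    """Frozen backbone ASR model with a learnable prompt prepended to its input features."""

    def __init__(self, backbone: nn.Module, prompt_len: int = 16, feat_dim: int = 768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # backbone weights stay frozen
            p.requires_grad = False
        # learnable prompt embeddings concatenated to the original input
        self.prompt = nn.Parameter(torch.randn(prompt_len, feat_dim) * 0.02)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) speech features
        prompt = self.prompt.unsqueeze(0).expand(feats.size(0), -1, -1)
        return self.backbone(torch.cat([prompt, feats], dim=1))


def intapt_loss(model, accent_discriminator, feats, targets, target_lens, lambda_adv=0.1):
    """CTC loss on the prompt-concatenated input plus an adversarial term that
    discourages the prompted representation from carrying accent information."""
    logits = model(feats)                         # assume backbone outputs vocab logits (B, T', V)
    log_probs = F.log_softmax(logits, dim=-1)
    input_lens = torch.full((feats.size(0),), log_probs.size(1), dtype=torch.long)
    ctc = F.ctc_loss(log_probs.transpose(0, 1), targets, input_lens, target_lens,
                     blank=0, zero_infinity=True)

    # confusion loss: push the accent discriminator's prediction toward uniform,
    # so the prompted features become uninformative about the speaker's accent
    accent_logits = accent_discriminator(logits.mean(dim=1))
    uniform = torch.full_like(accent_logits, 1.0 / accent_logits.size(-1))
    adv = F.kl_div(F.log_softmax(accent_logits, dim=-1), uniform, reduction="batchmean")
    return ctc + lambda_adv * adv
```

In this sketch only the prompt parameters (and, separately, the discriminator) receive gradients; the pre-trained backbone is never updated, mirroring the paper's constraint of leaving the backbone weights untouched.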


research
02/10/2022

Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding

ASR systems designed for native English (L1) usually underperform on non...
research
06/05/2023

Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition

The limited availability of non-native speech datasets presents a major ...
research
01/19/2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

In this work, we propose a new parameter-efficient learning framework ba...
research
12/12/2021

Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus

This paper describes foundational efforts with SautiDB-Naija, a novel co...
research
12/22/2022

Pushing the performances of ASR models on English and Spanish accents

Speech to text models tend to be trained and evaluated against a single ...
research
12/14/2020

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Accents mismatching is a critical problem for end-to-end ASR. This paper...
research
08/24/2023

A Small and Fast BERT for Chinese Medical Punctuation Restoration

In clinical dictation, utterances after automatic speech recognition (AS...
