SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

06/03/2023
by Changhun Kim, et al.

Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in real-world scenarios, leading to erroneous predictions. To tackle this issue, a test-time adaptation (TTA) method was recently proposed that adapts a pre-trained ASR model to unlabeled test instances without source data. Despite a decent performance gain, this work relies solely on naive greedy decoding and performs adaptation across timesteps at the frame level, which may not be optimal given the sequential nature of the model output. Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To handle the sequential output, SGEM first uses beam search to explore candidate output logits and selects the most plausible one. It then uses generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.
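
To make the entropy objective concrete, here is a minimal sketch of a generalized entropy averaged over the steps of one candidate output sequence. It assumes that "generalized entropy" means the Rényi entropy of order alpha, which is one common generalization of Shannon entropy; the abstract does not specify the exact form. The function names (`softmax`, `renyi_entropy`, `mean_sequence_entropy`) and the toy logits are hypothetical and not taken from the paper. Beam search, negative sampling, and the gradient update are not shown.

```python
import math


def softmax(logits):
    """Convert one step's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha (alpha > 0).

    Assumption: this stands in for the 'generalized entropy' of the
    abstract. As alpha approaches 1 it reduces to Shannon entropy.
    """
    if abs(alpha - 1.0) < 1e-8:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def mean_sequence_entropy(logit_sequence, alpha):
    """Average the per-step entropy over a sequence of logit vectors."""
    entropies = [renyi_entropy(softmax(step), alpha) for step in logit_sequence]
    return sum(entropies) / len(entropies)


# Toy sequence over a 4-symbol vocabulary (hypothetical values):
# the first step is maximally uncertain, the second is confident.
toy_logits = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
loss = mean_sequence_entropy(toy_logits, alpha=0.5)
```

In an adaptation loop, a quantity like `loss` would be minimized with respect to the model parameters so that the selected candidate's per-step distributions become more confident.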


research
03/27/2022

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

Although deep learning-based end-to-end Automatic Speech Recognition (AS...
research
08/18/2022

Evaluating Continual Test-Time Adaptation for Contextual and Semantic Domain Shifts

In this paper, our goal is to adapt a pre-trained Convolutional Neural N...
research
09/06/2020

Libri-Adapt: A New Speech Dataset for Unsupervised Domain Adaptation

This paper introduces a new dataset, Libri-Adapt, to support unsupervise...
research
11/08/2020

Stochastic Attention Head Removal: A Simple and Effective Method for Improving Automatic Speech Recognition with Transformers

Recently, Transformers have shown competitive automatic speech recogniti...
research
06/01/2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models

Deep Learning (DL) models have been popular nowadays to execute differen...
research
09/13/2023

Can Whisper perform speech-based in-context learning

This paper investigates the in-context learning abilities of the Whisper...
research
12/22/2022

Alignment Entropy Regularization

Existing training criteria in automatic speech recognition (ASR) permit t...
