Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

03/27/2022
by Guan-Ting Lin, et al.

Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance degradation on test samples drawn from data distributions that differ from those seen during training. Test-time Adaptation (TTA), previously explored in computer vision, aims to adapt a model trained on source domains so that it yields better predictions for test samples, which are often out-of-domain, without accessing the source data. Here, we propose the Single-Utterance Test-time Adaptation (SUTA) framework for ASR, which is, to the best of our knowledge, the first TTA study in the speech domain. Single-utterance TTA is a more realistic setting: it does not assume that test data are sampled from an identical distribution, and it does not delay on-demand inference by requiring a batch of adaptation data to be collected beforehand. SUTA combines unsupervised objectives with an efficient adaptation strategy. Empirical results demonstrate that SUTA effectively improves the performance of the source ASR model on multiple out-of-domain target corpora as well as on in-domain test samples.
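To make the single-utterance setting concrete, here is a minimal, dependency-free sketch of the general idea, not the authors' implementation. The abstract does not name SUTA's objectives, so this sketch assumes entropy minimization, a common unsupervised TTA loss. It also assumes a toy stand-in for an ASR model: a hypothetical linear classifier that produces one output distribution per frame, of which only the bias vector is adapted. Each utterance gets a fresh copy of the source parameters, takes a few gradient steps on its own mean frame entropy, is decoded, and then the adapted copy is discarded. No source data and no batch of other test utterances are used. The names `adapt_and_decode`, `steps`, and `lr` are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p: Vector) -> float:
    # H(p) = -sum_i p_i log p_i (terms with p_i == 0 contribute 0)
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def logits(weights: Matrix, bias: Vector, frame: Vector) -> Vector:
    # Toy "acoustic model": one linear layer mapping a frame to class scores.
    return [sum(w * x for w, x in zip(row, frame)) + b
            for row, b in zip(weights, bias)]


def mean_entropy(weights: Matrix, bias: Vector, frames: List[Vector]) -> float:
    return sum(entropy(softmax(logits(weights, bias, f))) for f in frames) / len(frames)


def adapt_and_decode(weights: Matrix, source_bias: Vector, frames: List[Vector],
                     steps: int = 10, lr: float = 0.1) -> Tuple[List[int], Vector]:
    """Hypothetical single-utterance TTA: adapt on this utterance only, then decode."""
    bias = list(source_bias)  # copy, so the source parameters are never modified
    n_classes = len(bias)
    for _ in range(steps):
        grad = [0.0] * n_classes
        for f in frames:
            p = softmax(logits(weights, bias, f))
            h = entropy(p)
            for j, pj in enumerate(p):
                if pj > 0.0:
                    # dH/dz_j = -p_j (log p_j + H); dz_j/db_j = 1
                    grad[j] += -pj * (math.log(pj) + h)
        for j in range(n_classes):
            bias[j] -= lr * grad[j] / len(frames)  # gradient descent on mean entropy
    preds = []
    for f in frames:
        z = logits(weights, bias, f)
        preds.append(max(range(n_classes), key=lambda j: z[j]))  # greedy per-frame label
    return preds, bias


# Usage: one "utterance" of three 2-dimensional frames, three output classes.
W = [[1.0, -0.5], [0.2, 0.8], [-0.3, 0.1]]
b0 = [0.0, 0.0, 0.0]
utterance = [[0.5, 0.1], [0.2, 0.9], [-0.4, 0.3]]

before = mean_entropy(W, b0, utterance)
preds, adapted = adapt_and_decode(W, b0, utterance)
after = mean_entropy(W, adapted, utterance)
```

Because the adapted parameters are thrown away after each utterance, consecutive test utterances need not come from the same distribution, and decoding can start as soon as a single utterance arrives.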
