Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching

04/15/2021
by Wenxin Hou, et al.

End-to-end automatic speech recognition (ASR) can achieve promising performance with large-scale training data. However, domain mismatch between training and testing data is known to degrade recognition accuracy. In this work, we focus on unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method that performs fine-grained adaptation between each character in two domains. First, to obtain labels for the features belonging to each character, we achieve frame-level label assignment using Connectionist Temporal Classification (CTC) pseudo labels. Then, we match the character-level distributions using Maximum Mean Discrepancy. We train our algorithm using the self-training technique. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39 reduction on both cross-device and cross-environment ASR. We also comprehensively analyze different strategies for frame-level label assignment and Transformer adaptations.
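
The two steps named in the abstract (frame-level label assignment from CTC output, then per-character distribution matching with Maximum Mean Discrepancy) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the Gaussian kernel with a fixed bandwidth `sigma`, the argmax assignment strategy, the biased MMD estimator, and the choice to skip the CTC blank label are all assumptions made for illustration.

```python
import numpy as np


def ctc_frame_labels(log_probs):
    """Assign each frame the label with the highest CTC posterior.

    log_probs: array of shape (frames, vocab). Argmax assignment is only
    one possible strategy and is an assumption of this sketch.
    """
    return log_probs.argmax(axis=-1)


def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel between rows of x and rows of y."""
    d2 = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def character_level_mmd(src_feats, src_labels, tgt_feats, tgt_labels,
                        blank=0, sigma=1.0):
    """Average MMD^2 over characters that occur in both domains.

    Skipping the blank label is an assumption of this sketch.
    """
    shared = np.intersect1d(src_labels, tgt_labels)
    shared = shared[shared != blank]
    if shared.size == 0:
        return 0.0
    per_char = [
        mmd2(src_feats[src_labels == c], tgt_feats[tgt_labels == c], sigma)
        for c in shared
    ]
    return float(np.mean(per_char))
```

In an unsupervised setting the target-domain labels would come from `ctc_frame_labels` applied to the model's own predictions, and the resulting quantity would be added to the ASR training loss as an adaptation term. In practice this would be written in a differentiable framework rather than NumPy.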


research
02/22/2023

MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

End-to-end automatic speech recognition (ASR) usually suffers from perfo...
research
11/26/2020

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

The performance of automatic speech recognition (ASR) systems typically ...
research
06/20/2022

Boosting Cross-Domain Speech Recognition with Self-Supervision

The cross-domain performance of automatic speech recognition (ASR) could...
research
06/09/2023

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

End-to-end (E2E) systems have shown comparable performance to hybrid sys...
research
05/25/2022

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech

The performance of child speech recognition is generally less satisfacto...
research
03/07/2018

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

The performance of automatic speech recognition (ASR) systems can be sig...
research
05/04/2022

Unsupervised Domain Adaptation Learning for Hierarchical Infant Pose Recognition with Synthetic Data

The Alberta Infant Motor Scale (AIMS) is a well-known assessment scheme ...