Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

06/07/2021
by   Emiru Tsunoo, et al.

Although end-to-end automatic speech recognition (E2E ASR) has achieved strong performance on tasks with abundant paired data, making E2E ASR robust to noisy and low-resource conditions remains challenging. In this study, we investigate data augmentation methods for E2E ASR in distant-talk scenarios. E2E ASR models are trained on the CHiME challenge datasets, which are well suited to studying robustness against noisy and spontaneous speech. We propose three augmentation methods and their combinations: 1) data augmentation using text-to-speech (TTS) data, 2) cycle-consistent generative adversarial network (Cycle-GAN) augmentation, trained to map between two audio characteristics, those of clean speech and those of noisy recordings, so that the training data match the testing condition, and 3) pseudo-label augmentation, with labels provided by a pretrained ASR module, for smoothing label distributions. Experimental results on the CHiME-6/CHiME-4 datasets show that each augmentation method individually improves accuracy on top of conventional SpecAugment, and that combining the approaches yields further improvements. Combining all three augmentations on the CHiME-6 task achieved a 4.3% word error rate (WER) reduction, which was more significant than that obtained with SpecAugment.
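Since the results above are reported as word error rate, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the scoring pipeline actually used for the CHiME tasks is not described here and may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only: tokenization is plain whitespace splitting,
    with no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.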


