MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition

02/25/2021
by   Linghui Meng, et al.
14

In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model by taking a weighted combination of two different speech features (e.g., mel-spectrograms or MFCC) as the input, and recognizing both text sequences, where the two recognition losses use the same combination weight. We apply MixSpeech on two popular end-to-end speech recognition models including LAS (Listen, Attend and Spell) and Transformer, and conduct experiments on several low-resource datasets including TIMIT, WSJ, and HKUST. Experimental results show that MixSpeech achieves better accuracy than the baseline models without data augmentation, and outperforms a strong data augmentation method SpecAugment on these recognition tasks. Specifically, MixSpeech outperforms SpecAugment with a relative PER improvement of 10.6% on TIMIT dataset, and achieves a strong WER of 4.7% on WSJ dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/01/2022

Text-To-Speech Data Augmentation for Low Resource Speech Recognition

Nowadays, the main problem of deep learning techniques used in the devel...
research
04/08/2022

Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

End-to-end models have achieved significant improvement on automatic spe...
research
06/07/2021

Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

Although end-to-end automatic speech recognition (E2E ASR) has achieved ...
research
10/09/2021

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Psychoacoustic studies have shown that locally-time reversed (LTR) speec...
research
10/16/2022

A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR

SpecAugment is a very effective data augmentation method for both HMM an...
research
03/12/2021

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

With the rapid development of speech assistants, adapting server-intende...
research
04/27/2022

Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Multimodal speech recognition aims to improve the performance of automat...

Please sign up or login with your details

Forgot password? Click here to reset