On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

04/03/2021
by   Tsz Kin Lam, et al.

We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and their speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary extracted from the training corpus, which injects speaker variation into the training examples. The replacement tokens are either predicted by a language model, so that the augmented pairs stay semantically close to the original data, or sampled at random. Both strategies yield training pairs that improve robustness in ASR training. Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment, and achieves about 9-23% and 4-15% relative improvements in WER over SpecAugment alone on the LibriSpeech 100h and LibriSpeech 960h test datasets, respectively.
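The aligned replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_audio_dictionary`, `ada_augment`), the `replace_prob` parameter, the `propose` hook standing in for a language model, and the representation of features as plain Python lists are all assumptions made for illustration. It assumes each utterance comes with token-level alignments given as sorted, non-overlapping `(start, end)` frame spans.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[float]        # one feature vector, e.g. a log-mel frame (assumed)
Segment = List[Frame]      # consecutive frames aligned to a single token
Span = Tuple[int, int]     # (start, end) frame indices, end exclusive


def build_audio_dictionary(
    corpus: List[Tuple[List[Frame], List[str], List[Span]]]
) -> Dict[str, List[Segment]]:
    """Collect, for every token, all feature segments aligned to it."""
    dictionary: Dict[str, List[Segment]] = {}
    for frames, tokens, spans in corpus:
        for token, (start, end) in zip(tokens, spans):
            dictionary.setdefault(token, []).append(frames[start:end])
    return dictionary


def ada_augment(
    frames: List[Frame],
    tokens: List[str],
    spans: List[Span],
    dictionary: Dict[str, List[Segment]],
    replace_prob: float,
    rng: random.Random,
    propose: Optional[Callable[[List[str], int], str]] = None,
) -> Tuple[List[Frame], List[str]]:
    """Replace tokens and their aligned frames together.

    `propose(tokens, i)` is a hypothetical hook for a language-model
    prediction at position i; without it, a token is sampled at random
    from the dictionary vocabulary.
    """
    vocab = sorted(dictionary)
    new_frames: List[Frame] = []
    new_tokens: List[str] = []
    cursor = 0
    for i, (token, (start, end)) in enumerate(zip(tokens, spans)):
        # Keep frames that are not aligned to any token (e.g. silence).
        new_frames.extend(frames[cursor:start])
        candidate = token
        if rng.random() < replace_prob:
            candidate = propose(tokens, i) if propose else rng.choice(vocab)
        if candidate != token and candidate in dictionary:
            # Swap the token and splice in a sampled segment for it.
            new_tokens.append(candidate)
            new_frames.extend(rng.choice(dictionary[candidate]))
        else:
            # Keep the original token and its original frames.
            new_tokens.append(token)
            new_frames.extend(frames[start:end])
        cursor = end
    new_frames.extend(frames[cursor:])
    return new_frames, new_tokens
```

Because the sampled segment may come from a different utterance, and therefore a different speaker, the augmented pair differs from the original in both its text and its acoustics, while the token and its audio stay aligned. Calling `ada_augment` anew for each training batch is what would make the augmentation on-the-fly.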
