SpecAugment on Large Scale Datasets

12/11/2019
by   Daniel S. Park, et al.

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). We achieve improvement across all test domains by mixing raw training data augmented with SpecAugment and noise-perturbed training data when training the acoustic model. We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large scale tasks. By using adaptive masking, we are able to further improve the performance of the Listen, Attend and Spell model on LibriSpeech to 2.2% WER.
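The adaptive masking idea in the abstract can be sketched in a few lines: instead of fixing the number and maximum width of time masks, both are scaled by the utterance length. The sketch below is illustrative, not the paper's implementation; the function name, the `p_multiplicity`/`p_size` ratio parameters, and their default values are assumptions for the example.

```python
import numpy as np

def adaptive_time_mask(spec, p_multiplicity=0.04, p_size=0.04, rng=None):
    """Sketch of adaptive SpecAugment time masking.

    spec: (time, freq) log-mel spectrogram.
    p_multiplicity: fraction of frames controlling how many masks are drawn,
        so longer utterances receive more masks.
    p_size: fraction of frames capping the width of each mask,
        so mask size also scales with utterance length.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    tau = spec.shape[0]                       # utterance length in frames
    num_masks = int(p_multiplicity * tau)     # multiplicity grows with length
    max_width = max(1, int(p_size * tau))     # mask size grows with length
    for _ in range(num_masks):
        width = int(rng.integers(0, max_width + 1))
        start = int(rng.integers(0, max(1, tau - width + 1)))
        out[start:start + width, :] = 0.0     # zero out the masked frames
    return out
```

For a 1,000-frame utterance with these example ratios, this draws 40 masks of up to 40 frames each, whereas a 100-frame utterance gets 4 masks of up to 4 frames, keeping the masked fraction roughly comparable across utterance lengths.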


Related research

Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition (04/08/2022)
End-to-end models have achieved significant improvement on automatic spe...

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition (07/10/2019)
In automatic speech recognition (ASR), wideband (WB) and narrowband (NB)...

Improving short-video speech recognition using random utterance concatenation (10/28/2022)
One of the limitations in end-to-end automatic speech recognition framew...

Corpus Phonetics Tutorial (11/13/2018)
Corpus phonetics has become an increasingly popular method of research i...

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation (10/27/2022)
Data augmentation is a technique to generate new training data based on ...

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems (12/07/2020)
Inspired by SpecAugment – a data augmentation method for end-to-end ASR ...

O-1: Self-training with Oracle and 1-best Hypothesis (08/14/2023)
We introduce O-1, a new self-training objective to reduce training bias ...
