Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

12/07/2020
by   Xinwei Li, et al.

Inspired by SpecAugment, a data augmentation method for end-to-end ASR systems, we propose a frame-level SpecAugment method (f-SpecAugment) to improve the performance of deep convolutional neural networks (CNNs) for hybrid HMM-based ASR systems. Like utterance-level SpecAugment, f-SpecAugment performs three transformations: time warping, frequency masking, and time masking. Instead of applying the transformations at the utterance level, f-SpecAugment applies them to each convolution window independently during training. We demonstrate that f-SpecAugment is more effective than utterance-level SpecAugment for deep CNN-based hybrid models. We evaluate the proposed f-SpecAugment on 50-layer Self-Normalizing Deep CNN (SNDCNN) acoustic models trained with up to 25,000 hours of training data. We observe that f-SpecAugment reduces WER by 0.5-4.5% across four languages. Because the benefits of augmentation techniques tend to diminish as the amount of training data increases, the large-scale training reported here is important for understanding the effectiveness of f-SpecAugment. Our experiments demonstrate that f-SpecAugment remains effective even with 25,000 hours of training data. We also demonstrate that, for deep CNNs, the benefit of f-SpecAugment is approximately equivalent to doubling the amount of training data.
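
The difference between utterance-level and frame-level masking can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the context size, the maximum mask widths, the zero mask value, and the edge padding are all assumptions chosen for illustration, and time warping is omitted for brevity. The sketch cuts an utterance into one context window per frame and draws a fresh frequency mask and time mask for each window.

```python
import numpy as np


def mask_window(window, rng, max_freq_width=8, max_time_width=4):
    """Apply one frequency mask and one time mask to a (time, freq) window.

    The mask widths and the zero fill value are illustrative assumptions.
    """
    out = window.copy()
    n_time, n_freq = out.shape

    # Frequency masking: zero a random band of consecutive frequency bins.
    f = rng.integers(0, min(max_freq_width, n_freq) + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    out[:, f0:f0 + f] = 0.0

    # Time masking: zero a random run of consecutive frames.
    t = rng.integers(0, min(max_time_width, n_time) + 1)
    t0 = rng.integers(0, n_time - t + 1)
    out[t0:t0 + t, :] = 0.0
    return out


def frame_level_masks(features, context, rng, **mask_kwargs):
    """Build one window per frame and mask every window independently.

    features: array of shape (T, F), e.g. log-mel filterbank frames.
    context:  frames of left/right context (hypothetical parameter).
    Returns an array of shape (T, 2 * context + 1, F).
    """
    # Repeat the edge frames so that every frame has a full window.
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    windows = [padded[i:i + width] for i in range(features.shape[0])]
    return np.stack([mask_window(w, rng, **mask_kwargs) for w in windows])
```

Because each window draws its own masks, a frame that is hidden in one window can still be visible in a neighbouring window, whereas an utterance-level mask hides the same region from every window that overlaps it.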


