The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

by Wei Zhou, et al.

We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best speaker-adaptive-training (SAT) model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size or training time. Subsequent sMBR training is applied to fine-tune the final acoustic model, and both LSTM and Transformer language models are trained and evaluated. Our best system achieves a 5.6% WER on the test set, a 27% relative improvement over the previous state of the art.
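The SpecAugment maskings investigated in the paper zero out random blocks along the time and frequency axes of the input features. A minimal sketch of such time/frequency masking is shown below; the function name, mask counts, and maximum widths are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def spec_augment(spectrogram, num_time_masks=2, num_freq_masks=2,
                 max_time_width=20, max_freq_width=8, rng=None):
    """Apply SpecAugment-style masking to a (time, freq) feature matrix.

    Each time mask zeroes a random span of frames; each frequency mask
    zeroes a random span of feature channels. Widths are drawn uniformly
    from [0, max_width]. Returns a masked copy; the input is untouched.
    """
    rng = rng or np.random.default_rng()
    aug = spectrogram.copy()
    num_frames, num_channels = aug.shape
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, num_frames - width)))
        aug[start:start + width, :] = 0.0
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, num_channels - width)))
        aug[:, start:start + width] = 0.0
    return aug
```

Applied on the fly during training, this adds no parameters to the model, which is why the gains come without increasing model size or training time.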


