ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

05/21/2020
by   Jing Pan, et al.
ASAPP INC

In this paper, we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures: a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines, where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% [...] improve the performance via N-best rescoring using a 24-layer self-attentive SRU language model, achieving WERs of 1.75% [...] test-other.
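The idea of parallel streams with different dilation rates can be illustrated with a toy example. The sketch below is not the authors' architecture: it assumes scalar features per frame, a single convolution per stream, and that stream outputs are concatenated per frame. The function names `dilated_conv1d` and `multistream` are hypothetical and chosen only for illustration.

```python
def dilated_conv1d(frames, kernel, dilation):
    """1-D convolution over time with zero padding and a given dilation.

    Assumption: each frame is a single float; real acoustic models use
    feature vectors and learned multi-channel kernels.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Taps are spaced `dilation` frames apart around frame t.
            idx = t + (k - half) * dilation
            if 0 <= idx < len(frames):
                acc += w * frames[idx]
        out.append(acc)
    return out


def multistream(frames, kernel, dilations):
    """Run one stream per dilation rate and concatenate outputs per frame.

    Assumption: the same kernel is shared across streams to keep the toy
    example small; in practice each stream would have its own weights.
    """
    streams = [dilated_conv1d(frames, kernel, d) for d in dilations]
    return [list(values) for values in zip(*streams)]


# A single impulse at frame 3 shows how far each stream "sees".
frames = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
features = multistream(frames, kernel=[1.0, 1.0, 1.0], dilations=[1, 2, 3])

for t, feat in enumerate(features):
    print(t, feat)
```

With the same kernel size, a stream with a larger dilation rate responds to the impulse from further away in time, so each stream covers a different temporal context. That difference in context is the diversity between streams that the abstract refers to.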


09/12/2016

The Microsoft 2016 Conversational Speech Recognition System

We describe Microsoft's conversational speech recognition system, in whi...
05/21/2020

Multistream CNN for Robust Acoustic Modeling

This paper presents multistream CNN, a novel neural network architecture...
11/01/2019

Predicting word error rate for reverberant speech

Reverberation negatively impacts the performance of automatic speech rec...
04/02/2020

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

We present a complete training pipeline to build a state-of-the-art hybr...
02/08/2019

Speaker diarisation using 2D self-attentive combination of embeddings

Speaker diarisation systems often cluster audio segments using speaker e...
07/02/2023

CNN-BiLSTM model for English Handwriting Recognition: Comprehensive Evaluation on the IAM Dataset

We present a CNN-BiLSTM system for the problem of offline English handwr...
10/17/2016

Achieving Human Parity in Conversational Speech Recognition

Conversational speech recognition has served as a flagship speech recogn...