Exploring Neural Transducers for End-to-End Speech Recognition

07/24/2017
by   Eric Battenberg, et al.
0

In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperform the best reported CTC models with a language model, on the popular Hub5'00 benchmark. On our internal diverse dataset, these trends continue - RNNTransducer models rescored with a language model after beam search outperform our best CTC models. These results simplify the speech recognition pipeline so that decoding can now be expressed purely as neural network operations. We also study how the choice of encoder architecture affects the performance of the three models - when all encoder layers are forward only, and when encoders downsample the input representation aggressively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/08/2017

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

We present a state-of-the-art end-to-end Automatic Speech Recognition (A...
research
02/17/2021

Do End-to-End Speech Recognition Models Care About Context?

The two most common paradigms for end-to-end speech recognition are conn...
research
03/22/2023

Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network

In recent years, End-to-End speech recognition technology based on deep ...
research
01/28/2022

Neural-FST Class Language Model for End-to-End Speech Recognition

We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech...
research
07/20/2023

Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

This paper presents a speech recognition system developed by the Transsi...
research
04/22/2022

Efficient Training of Neural Transducer for Speech Recognition

As one of the most popular sequence-to-sequence modeling approaches for ...
research
05/07/2020

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end...

Please sign up or login with your details

Forgot password? Click here to reset