Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

06/08/2017
by Takaaki Hori, et al.

We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is trained jointly with the attention-based decoder. During beam search, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms traditional hybrid ASR systems.
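
The score combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the combination rule, so everything here is an assumption for illustration: the log-linear interpolation, the weight names `ctc_weight` and `lm_weight` with their default values, and the helper names `joint_score` and `rescore_hypotheses`. A real decoder would also compute the scores incrementally per output character rather than per finished hypothesis.

```python
def joint_score(log_p_ctc, log_p_att, log_p_lm, ctc_weight=0.3, lm_weight=0.1):
    """Combine three log-probabilities into one hypothesis score.

    Assumed form (not stated in the abstract): a log-linear interpolation
    of the CTC and attention scores, plus a weighted language-model term.
    """
    return (
        ctc_weight * log_p_ctc
        + (1.0 - ctc_weight) * log_p_att
        + lm_weight * log_p_lm
    )


def rescore_hypotheses(hyps, ctc_weight=0.3, lm_weight=0.1, beam_size=2):
    """Rank candidate hypotheses by joint score and keep the best beam_size.

    Each hypothesis is a dict with hypothetical keys:
    "text", "ctc", "att", "lm" (the last three are log-probabilities).
    Returns a list of (score, text) pairs, best first.
    """
    scored = [
        (joint_score(h["ctc"], h["att"], h["lm"], ctc_weight, lm_weight), h["text"])
        for h in hyps
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:beam_size]


if __name__ == "__main__":
    # Toy log-probabilities, made up purely for demonstration.
    candidates = [
        {"text": "A", "ctc": -1.0, "att": -1.0, "lm": -5.0},
        {"text": "B", "ctc": -2.0, "att": -1.5, "lm": -1.0},
        {"text": "C", "ctc": -4.0, "att": -4.0, "lm": -0.5},
    ]
    for score, text in rescore_hypotheses(candidates):
        print(f"{text}: {score:.2f}")
```

Setting `lm_weight=0.0` removes the language model from the ranking, and `ctc_weight=0.0` reduces the score to the attention decoder alone, which makes it easy to see how much each component contributes on a given set of candidates.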

research
10/14/2022

LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

This paper describes LeVoice automatic speech recognition systems to tra...
research
07/21/2020

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Recent advances in Automatic Speech Recognition (ASR) demonstrated how e...
research
11/07/2018

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments

Casual conversations involving multiple speakers and noises from surroun...
research
07/02/2021

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

Recently, attention-based encoder-decoder (AED) models have shown high p...
research
09/22/2017

Attention-based Wav2Text with Feature Transfer Learning

Conventional automatic speech recognition (ASR) typically performs multi...
research
10/12/2017

Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR

This thesis introduces the sequence to sequence model with Luong's atten...
research
07/24/2017

Exploring Neural Transducers for End-to-End Speech Recognition

In this work, we perform an empirical comparison among the CTC, RNN-Tran...
