Residual Convolutional CTC Networks for Automatic Speech Recognition

02/24/2017
by Yisen Wang, et al.

Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and have significantly improved recognition accuracy. In particular, Convolutional Neural Networks (CNNs) have recently been revisited for ASR. However, most CNNs in existing work have fewer than 10 layers, which may not be deep enough to capture all the information in human speech signals. In this paper, we propose a novel deep and wide CNN architecture, denoted RCNN-CTC, which uses residual connections and the Connectionist Temporal Classification (CTC) loss function. RCNN-CTC is an end-to-end system that can exploit the temporal and spectral structures of speech signals simultaneously. Furthermore, we introduce a CTC-based system combination, which differs from the conventional frame-wise, senone-based combination. The subsystems adopted in the combination are of different types and are therefore mutually complementary. Experimental results show that our proposed single system, RCNN-CTC, achieves the lowest word error rate (WER) on the WSJ and Tencent Chat data sets compared with several widely used neural network systems in ASR. In addition, the proposed system combination offers a further error reduction on these two data sets, yielding relative WER reductions of 14.91% on WSJ dev93 and 6.52% on Tencent Chat.
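
The abstract names CTC as the training criterion of RCNN-CTC. The following is a minimal, framework-free sketch of that criterion, not the authors' implementation. The residual CNN that would produce the per-frame log-probabilities is omitted, symbol 0 is assumed to be the blank, and the function names `ctc_neg_log_likelihood` and `greedy_decode` are hypothetical. The CTC loss is the negative log of the total probability of all frame-level alignments that collapse (merge repeats, then drop blanks) to the target label sequence, computed here with the standard forward recursion in log space.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def ctc_neg_log_likelihood(log_probs: Sequence[Sequence[float]],
                           labels: Sequence[int], blank: int = 0) -> float:
    """Hypothetical helper: -log p(labels | frames) via the CTC forward pass.

    log_probs[t][k] is the log-probability of symbol k at frame t
    (e.g. log-softmax outputs of an acoustic model such as a CNN).
    """
    # Extended target: blank, l1, blank, l2, ..., lN, blank
    ext: List[int] = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    # alpha[s]: log-prob of all partial alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]                      # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            new[s] = acc + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    total = alpha[S - 1]
    if S > 1:
        total = log_add(total, alpha[S - 2])
    return -total


def greedy_decode(log_probs: Sequence[Sequence[float]],
                  blank: int = 0) -> List[int]:
    """Hypothetical helper: best-path decoding (argmax per frame, collapse)."""
    best = [max(range(len(f)), key=f.__getitem__) for f in log_probs]
    out: List[int] = []
    prev = None
    for k in best:
        if k != blank and k != prev:  # merge repeats, then drop blanks
            out.append(k)
        prev = k
    return out
```

As a usage check, with three symbols of uniform probability over two frames and target `[1]`, the alignments (1,1), (0,1) and (1,0) each have probability 1/9, so `ctc_neg_log_likelihood` returns log 3. Working in log space with `log_add` avoids the underflow that multiplying many small per-frame probabilities would cause on realistic utterance lengths.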
