A Study of All-Convolutional Encoders for Connectionist Temporal Classification

10/28/2017
by   Kalpesh Krishna, et al.
0

Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition. In particular, we explore a range of one-dimensional convolutional layers, which are particularly efficient. We compare the performance of our CNN-based models against typical RNNbased models in terms of training time, decoding time, model size and word error rate (WER) on the Switchboard Eval2000 corpus. We find that our CNN-based models are close in performance to LSTMs, while not matching them, and are much faster to train and decode.

READ FULL TEXT
research
01/10/2017

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are effective models for reducing s...
research
05/16/2020

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently Transformer and Convolution neural network (CNN) based models h...
research
02/24/2017

Residual Convolutional CTC Networks for Automatic Speech Recognition

Deep learning approaches have been widely used in Automatic Speech Recog...
research
01/16/2023

BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

Recent developments using End-to-End Deep Learning models have been show...
research
05/01/2023

File Fragment Classification using Light-Weight Convolutional Neural Networks

In digital forensics, file fragment classification is an important step ...
research
07/23/2020

Sequential Routing Framework: Fully Capsule Network-based Speech Recognition

Capsule networks (CapsNets) have recently gotten attention as alternativ...
research
04/06/2016

Advances in Very Deep Convolutional Neural Networks for LVCSR

Very deep CNNs with small 3x3 kernels have recently been shown to achiev...

Please sign up or login with your details

Forgot password? Click here to reset