Listen, Attend and Spell

08/05/2015
by William Chan, et al.

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
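The results above are reported as word error rate. As a minimal sketch of how that metric is commonly computed, the snippet below takes the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and a hypothesis and divides it by the number of reference words. The function names and the example sentences are illustrative assumptions; the abstract does not describe the authors' scoring pipeline or any text normalization, so this should not be read as their exact procedure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word is missing
                curr[j - 1] + 1,     # insertion: an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: one of five reference words is dropped.
wer = word_error_rate("call my wife at home", "call my wife home")
print(f"WER = {wer:.1%}")
```

Because insertions count as errors while the denominator is the reference length only, WER can exceed 100% for very poor hypotheses.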

01/02/2018

Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer

We investigate training end-to-end speech recognition models with the re...
05/23/2018

Implicit Language Model in LSTM for OCR

Neural networks have become the technique of choice for OCR, but many as...
01/06/2020

Character-Aware Attention-Based End-to-End Speech Recognition

Predicting words and subword units (WSUs) as the output has shown to be ...
09/02/2015

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

We propose an end-to-end, domain-independent neural encoder-aligner-deco...
07/27/2018

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Attention-based recurrent neural encoder-decoder models present an elega...
07/27/2022

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

End-to-end speech synthesis models directly convert the input characters...
02/18/2022

BLPnet: A new DNN model and Bengali OCR engine for Automatic License Plate Recognition

The development of the Automatic License Plate Recognition (ALPR) system...

Code Repositories

LAS-SpeechRecognition

Listen, Attend and Spell (LAS) framework for speech recognition (see https://arxiv.org/pdf/1508.01211.pdf) with DNN feature extractor

