Listen, Attend and Spell

08/05/2015
by William Chan, et al.

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
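
To make the listener/speller split concrete, below is a minimal PyTorch sketch of one pyramidal encoder layer and a single attention-based decoding step. This is an illustrative assumption, not the paper's implementation: the class names (PyramidalBiLSTMLayer, SpellerStep), layer sizes, and the dot-product attention form are hypothetical. Note how the speller conditions each prediction on the previously emitted character and an attention context, which is how LAS avoids the per-character independence assumption made by CTC.

```python
# Minimal illustrative sketch of the listener/speller split, assuming PyTorch.
# Names and dimensions are hypothetical; this is not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalBiLSTMLayer(nn.Module):
    """One 'listener' layer: concatenate adjacent frames (halving the time
    resolution), then run a bidirectional LSTM over the shorter sequence."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.LSTM(input_dim * 2, hidden_dim,
                           bidirectional=True, batch_first=True)

    def forward(self, x):                          # x: (batch, time, input_dim)
        b, t, d = x.shape
        if t % 2 == 1:                             # pad so frames pair up evenly
            x = F.pad(x, (0, 0, 0, 1))
            t += 1
        x = x.reshape(b, t // 2, d * 2)            # merge pairs of frames
        out, _ = self.rnn(x)
        return out                                 # (batch, time // 2, 2 * hidden_dim)

class SpellerStep(nn.Module):
    """One 'speller' step: attend over listener features with the decoder state,
    then predict the next character from the previous character and the context."""
    def __init__(self, enc_dim, hidden_dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.cell = nn.LSTMCell(hidden_dim + enc_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, enc_dim)
        self.out = nn.Linear(hidden_dim + enc_dim, vocab_size)

    def forward(self, enc, prev_char, state, context):
        # enc: (batch, T, enc_dim)  prev_char: (batch,)  context: (batch, enc_dim)
        h, c = self.cell(torch.cat([self.embed(prev_char), context], dim=-1), state)
        scores = torch.bmm(enc, self.query(h).unsqueeze(-1)).squeeze(-1)   # (batch, T)
        attn = torch.softmax(scores, dim=-1)                               # attention weights
        context = torch.bmm(attn.unsqueeze(1), enc).squeeze(1)             # new context vector
        logits = self.out(torch.cat([h, context], dim=-1))                 # character scores
        return logits, (h, c), context
```

In the paper, the listener stacks several such pyramidal layers so that the sequence the attention mechanism sees is many times shorter than the input frame rate, which keeps attention over long utterances tractable.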

Related research

01/02/2018 - Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer
We investigate training end-to-end speech recognition models with the re...

05/23/2018 - Implicit Language Model in LSTM for OCR
Neural networks have become the technique of choice for OCR, but many as...

04/05/2023 - Efficient OCR for Building a Diverse Digital History
Thousands of users consult digital archives daily, but the information t...

01/06/2020 - Character-Aware Attention-Based End-to-End Speech Recognition
Predicting words and subword units (WSUs) as the output has shown to be ...

09/02/2015 - What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment
We propose an end-to-end, domain-independent neural encoder-aligner-deco...

07/27/2022 - SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
End-to-end speech synthesis models directly convert the input characters...

10/29/2015 - Attention with Intention for a Neural Network Conversation Model
In a conversation or a dialogue process, attention and intention play in...
