Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

01/22/2019
by Julian Salazar, et al.

Self-attention has demonstrated great success in sequence-to-sequence tasks in natural language processing, with preliminary work applying it to end-to-end encoder-decoder approaches in speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free strategy for monotonic sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for speech recognition. On the Wall Street Journal and LibriSpeech datasets, SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, attaining character error rates of 4.7% in 1 day and 2.8% in 1 week, respectively, with the same architecture and one GPU. We motivate the architecture for speech, evaluate position and downsampling approaches, and explore how the choice of label alphabet affects attention heads and performance.
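To make the architecture class concrete, here is a minimal PyTorch sketch of a fully self-attentional encoder trained with CTC loss. This is an illustration under stated assumptions, not the paper's exact configuration: the layer sizes, the 80-dimensional filterbank input, and the 32-symbol label alphabet are all placeholders, and the positional-encoding and downsampling choices the paper evaluates are omitted for brevity.

import torch
import torch.nn as nn

class SANCTC(nn.Module):
    """Sketch of a self-attentional CTC acoustic model.
    Hyperparameters are illustrative, not the paper's configuration.
    Positional encodings (which the paper studies) are omitted here."""
    def __init__(self, n_feats=80, n_labels=32, d_model=256, n_heads=4, n_layers=6):
        super().__init__()
        self.proj_in = nn.Linear(n_feats, d_model)            # frame features -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=1024)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # stack of self-attention blocks
        self.proj_out = nn.Linear(d_model, n_labels + 1)        # +1 output for the CTC blank

    def forward(self, x):                  # x: (time, batch, n_feats)
        h = self.encoder(self.proj_in(x))  # self-attention over all frames, no recurrence
        return self.proj_out(h).log_softmax(dim=-1)  # per-frame log-probs for CTC

# Usage: train directly with the CTC objective, no decoder or alignment needed.
model = SANCTC()
ctc_loss = nn.CTCLoss(blank=32)            # blank index matches n_labels above
x = torch.randn(100, 2, 80)                # 100 frames, batch of 2
targets = torch.randint(0, 32, (2, 20))    # label sequences (blank excluded)
log_probs = model(x)
input_lengths = torch.full((2,), 100, dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()

Because every layer is self-attentional, each output frame attends to the entire utterance in a single step, which is what allows the fast training the abstract reports relative to recurrent CTC models.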


Related research

05/21/2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
End-to-end speech recognition has become popular in recent years, since ...

08/03/2017
Sensor Transformation Attention Networks
Recent work on encoder-decoder models for sequence-to-sequence mapping h...

10/25/2019
Towards Online End-to-end Transformer Automatic Speech Recognition
The Transformer self-attention network has recently shown promising perf...

11/08/2020
On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers
Self-attention models such as Transformers, which can capture temporal r...

06/14/2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
We introduce EM-Network, a novel self-distillation approach that effecti...

06/13/2018
Double Path Networks for Sequence to Sequence Learning
Encoder-decoder based Sequence to Sequence learning (S2S) has made remar...

02/12/2020
Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
We discuss the problem of echographic transcription in autoregressive se...
