Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

10/24/2020
by Ethan A. Chi, et al.

Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance. Infilling and iterative refinement models make up some of this gap by editing the outputs of a non-autoregressive model, but are constrained in the edits that they can make. We propose iterative realignment, where refinements occur over latent alignments rather than output sequence space. We demonstrate this in speech recognition with Align-Refine, an end-to-end Transformer-based model which refines connectionist temporal classification (CTC) alignments to allow length-changing insertions and deletions. Align-Refine outperforms Imputer and Mask-CTC, matching an autoregressive baseline on WSJ at 1/14th the real-time factor and attaining a LibriSpeech test-other WER of 9.0%. The model is strong even in one iteration with a shallower decoder.
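To see why editing a CTC alignment can change the length of the output, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks). It is not the Align-Refine model: the blank symbol `_`, the function name `ctc_collapse`, and the toy alignments are assumptions made only for illustration.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this sketch


def ctc_collapse(alignment):
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    return [tok for tok, _ in groupby(alignment) if tok != BLANK]


# Hypothetical first-pass alignment over 8 frames; it collapses to "helo".
first_pass = ["h", "_", "e", "l", "l", "_", "_", "o"]

# Relabel a single frame (index 6: blank -> "l"). The alignment still has
# 8 frames, but the blank at index 5 now separates two runs of "l".
refined = list(first_pass)
refined[6] = "l"

print("".join(ctc_collapse(first_pass)))
print("".join(ctc_collapse(refined)))

# The reverse direction: relabeling a frame as blank deletes a token.
with_extra = ["c", "a", "_", "a", "t", "_"]
fixed = list(with_extra)
fixed[3] = BLANK
print("".join(ctc_collapse(with_extra)), "->", "".join(ctc_collapse(fixed)))
```

In this toy case the alignment keeps a fixed number of frames, yet the collapsed output gains or loses a token. That is the sense in which refinements over alignments permit insertions and deletions that fixed-length edits in output space cannot express.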


