Monotonic Chunkwise Attention

12/14/2017
by Chung-Cheng Chiu et al.

Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to real-time sequence transduction. To address these issues, we propose Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed. We show that models utilizing MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time. When applied to online speech recognition, we obtain state-of-the-art results and match the performance of a model using an offline soft attention mechanism. In document summarization experiments where we do not expect monotonic alignments, we show significantly improved performance compared to a baseline monotonic attention-based model.
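The decoding procedure the abstract sketches can be made concrete. As a rough illustration, the following minimal NumPy sketch performs one test-time MoChA step: a hard monotonic scan picks a stopping position, then soft attention is computed over a fixed-width chunk ending there. The function name, the precomputed energy arrays, and the default chunk width are illustrative assumptions rather than the paper's implementation; in the actual model the energies are learned functions of the decoder state and the encoder outputs.

    import numpy as np

    def mocha_decode_step(monotonic_energy, chunk_energy, t_prev, w=4):
        """One test-time MoChA decoding step (hard, online).

        monotonic_energy: energies for the current output step, given as a
            1-D array over input positions.
        chunk_energy: chunk energies over the same input positions.
        t_prev: input position attended at the previous output step.
        w: chunk width (a small constant, so each step costs O(w)).
        """
        T = len(monotonic_energy)
        # Scan forward from the previous position; stop at the first j where
        # the selection probability sigmoid(energy[j]) crosses 0.5.
        for j in range(t_prev, T):
            if 1.0 / (1.0 + np.exp(-monotonic_energy[j])) >= 0.5:
                # Soft attention (softmax) over the length-w chunk ending at j.
                lo = max(0, j - w + 1)
                chunk = chunk_energy[lo:j + 1]
                weights = np.exp(chunk - chunk.max())
                weights /= weights.sum()
                return j, lo, weights
        # Attention never stopped: this step yields no context vector.
        return None, None, None

The returned weights form the context vector as a weighted sum of the encoder states in the chunk, and the stopping position seeds the scan at the next output step; because the scan never moves backward and each chunk has fixed width, decoding is online and runs in linear time. During training, this hard procedure is replaced by its expected value so that standard backpropagation applies.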

Related research

04/03/2017 · Online and Linear-Time Attention by Enforcing Monotonic Alignments
    Recurrent neural network models with an attention mechanism have proven ...

03/30/2021 · A study of latent monotonic attention variants
    End-to-end models reach state-of-the-art performance for speech recognit...

06/03/2019 · Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
    Neural TTS has demonstrated strong capabilities to generate human-like s...

05/01/2020 · Multi-head Monotonic Chunkwise Attention For Online Speech Recognition
    The attention mechanism of the Listen, Attend and Spell (LAS) model requ...

08/29/2018 · Hard Non-Monotonic Attention for Character-Level Transduction
    Character-level string-to-string transduction is an important component ...

04/08/2021 · On Biasing Transformer Attention Towards Monotonicity
    Many sequence-to-sequence tasks in natural language processing are rough...

04/28/2022 · Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
    Recent deep learning Text-to-Speech (TTS) systems have achieved impressi...
