
Monotonic Chunkwise Attention

by Chung-Cheng Chiu, et al.

Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to real-time sequence transduction. To address these issues, we propose Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed. We show that models utilizing MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time. When applied to online speech recognition, we obtain state-of-the-art results and match the performance of a model using an offline soft attention mechanism. In document summarization experiments where we do not expect monotonic alignments, we show significantly improved performance compared to a baseline monotonic attention-based model.
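The mechanism the abstract describes can be sketched in a few lines: at test time the model scans monotonically forward to pick a boundary in the encoder states, then computes ordinary soft attention over the fixed-size chunk ending at that boundary, giving online, linear-time decoding. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the dot-product energy function, the chunk size `w`, and the 0.5 selection threshold are simplifying assumptions.

```python
import numpy as np

def hard_monotonic_boundary(enc, query, start):
    # Test-time scan (assumed sigmoid selection): return the first index
    # at or after `start` whose selection probability reaches 0.5.
    for t in range(start, len(enc)):
        p = 1.0 / (1.0 + np.exp(-(enc[t] @ query)))  # sigmoid of energy
        if p >= 0.5:
            return t
    return len(enc) - 1  # fall back to the final encoder state

def mocha_chunk_context(enc, query, t, w):
    # Soft attention restricted to the length-w chunk ending at boundary t.
    start = max(0, t - w + 1)
    chunk = enc[start : t + 1]              # (<=w, d) encoder states
    energies = chunk @ query                # assumed dot-product energies
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()                # softmax over the chunk only
    return weights @ chunk                  # chunkwise context vector
```

Because attention is normalized over at most `w` states per output step rather than the whole input, each decoding step is O(w) instead of O(T), which is what makes the linear-time, streaming use case possible.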



