On Biasing Transformer Attention Towards Monotonicity

04/08/2021
by Annette Rios et al.

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
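The abstract does not spell out the loss, so the following is only a minimal sketch of one common way to penalize non-monotonic attention, not necessarily the paper's exact formulation: compute the expected source position under each target step's attention distribution and penalize any backward movement of that position. The function name `monotonicity_loss`, the tensor layout, and the weighting scalar `lam` are all assumptions for illustration.

```python
import torch

def monotonicity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Sketch of a monotonicity penalty on soft attention weights.

    attn: (batch, tgt_len, src_len) attention weights; each row over
    src_len is assumed to sum to 1 (e.g., the output of a softmax).
    """
    # Source positions 0, 1, ..., src_len - 1.
    positions = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
    # Expected source position attended to at each target step: (batch, tgt_len).
    expected = (attn * positions).sum(dim=-1)
    # Difference between consecutive expected positions; negative = moving backward.
    steps = expected[:, 1:] - expected[:, :-1]
    # Penalize only backward movement, so monotone alignments incur zero loss.
    return torch.relu(-steps).mean()

# Hypothetical usage: add the penalty to the task loss with weight `lam`.
# For the per-head variant described in the abstract, one would apply the
# penalty only to the attention weights of the selected subset of heads.
# total_loss = task_loss + lam * monotonicity_loss(attn_weights)
```

Because the penalty is computed on the attention weights themselves, it works with any standard (RNN or transformer) attention mechanism, which matches the compatibility claim in the abstract; restricting it to a subset of heads simply means summing the penalty over those heads only.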


Related research

Exact Hard Monotonic Attention for Character-Level Transduction (05/15/2019)
Many common character-level, string-to-string transduction tasks, e.g., ...

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS (06/03/2019)
Neural TTS has demonstrated strong capabilities to generate human-like s...

Online and Linear-Time Attention by Enforcing Monotonic Alignments (04/03/2017)
Recurrent neural network models with an attention mechanism have proven ...

Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss (04/28/2022)
Recent deep learning Text-to-Speech (TTS) systems have achieved impressi...

Monotonic Chunkwise Attention (12/14/2017)
Sequence-to-sequence models with soft attention have been successfully a...

Hard Non-Monotonic Attention for Character-Level Transduction (08/29/2018)
Character-level string-to-string transduction is an important component ...

Attention Strategies for Multi-Source Sequence-to-Sequence Learning (04/21/2017)
Modeling attention in neural multi-source sequence-to-sequence learning ...
