Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

05/23/2017
by Andros Tjandra, et al.

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attention mechanism that allows the model to learn alignments between the source and the target sequence. Most attention mechanisms used today are based on a global attention property, which requires computing a weighted summarization over the entire sequence of encoder states. However, this is computationally expensive and often produces misalignments on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of several tasks, such as automatic speech recognition (ASR) and grapheme-to-phoneme conversion (G2P). In this paper, we propose a novel attention mechanism with local and monotonic properties, and we explore various ways to control those properties. Experimental results on ASR, G2P, and machine translation between two languages with similar sentence structures demonstrate that the proposed encoder-decoder model with local monotonic attention achieves significant performance improvements and reduces computational complexity compared with a model using the standard global attention architecture.
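
To make the contrast with global attention concrete, the following is a minimal NumPy sketch of the general idea behind local monotonic attention, not the paper's exact parameterization: the shift predictor, the window size, and the Gaussian width `sigma` are illustrative assumptions. The center of attention can only move forward, and only a small window of encoder states around it is scored.

```python
import numpy as np

def local_monotonic_attention(enc_states, prev_pos, query,
                              window_size=5, sigma=1.5):
    """One decoding step of (sketched) local monotonic attention.

    enc_states: (T, d) encoder states
    prev_pos:   scalar, attention center from the previous step
    query:      (d,) current decoder state
    Returns the context vector and the new, monotonically advanced center.
    """
    T, d = enc_states.shape

    # Predict a strictly positive shift so the center can only move
    # forward (a toy predictor here; the paper explores several ways
    # to constrain this quantity).
    shift = np.exp(np.tanh(query @ enc_states.mean(axis=0) / np.sqrt(d)))
    pos = min(prev_pos + shift, T - 1)

    # Restrict scoring to a local window around the predicted center.
    lo = max(int(pos) - window_size, 0)
    hi = min(int(pos) + window_size + 1, T)
    local = enc_states[lo:hi]                      # (W, d)

    # Content scores inside the window, modulated by a Gaussian prior
    # centered at `pos` so the weights stay concentrated and local.
    scores = local @ query / np.sqrt(d)
    prior = np.exp(-((np.arange(lo, hi) - pos) ** 2) / (2 * sigma ** 2))
    weights = np.exp(scores - scores.max()) * prior
    weights /= weights.sum()

    context = weights @ local                      # (d,) context vector
    return context, pos
```

Because each decoding step touches only the 2·window_size + 1 states around the current center rather than all T encoder states, the per-token cost is O(w) instead of O(T), which is the computational advantage the abstract refers to.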
