Sparse Sequence-to-Sequence Models

05/14/2019
by Ben Peters, et al.

Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of α-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any α > 1. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models.
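
For intuition about the transformations the abstract describes, below is a minimal NumPy sketch (not the authors' released implementation) of sparsemax, the α = 2 member of the α-entmax family, plus a bisection-based approximation for general α > 1. The function names and the 50-iteration budget are illustrative assumptions.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (the alpha = 2 case of alpha-entmax): the Euclidean
    projection of the score vector z onto the probability simplex.
    Unlike softmax, the result typically contains exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cssv = np.cumsum(z_sorted)                # cumulative sums of sorted scores
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cssv         # which sorted entries stay nonzero
    k_star = k[support][-1]                   # size of the support set
    tau = (cssv[support][-1] - 1.0) / k_star  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Approximate alpha-entmax for any alpha > 1 by bisecting on the
    threshold tau. alpha -> 1 recovers softmax in the limit; alpha = 2
    is sparsemax."""
    z = np.asarray(z, dtype=float) * (alpha - 1.0)
    lo, hi = z.max() - 1.0, z.max()           # tau bracket: sum >= 1 at lo, <= 1 at hi
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.maximum(z - tau, 0.0) ** (1.0 / (alpha - 1.0))
        if p.sum() < 1.0:
            hi = tau                          # probabilities too small: move tau down
        else:
            lo = tau
    p = np.maximum(z - 0.5 * (lo + hi), 0.0) ** (1.0 / (alpha - 1.0))
    return p / p.sum()                        # renormalize away bisection residual
```

For example, sparsemax(np.array([1.0, 0.5, -1.0])) returns [0.75, 0.25, 0.0], assigning exactly zero probability to the least plausible option, whereas softmax would keep all three entries strictly positive. This is the mechanism behind the sparse alignments and short candidate lists the abstract mentions.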


Related research

12/10/2018 · Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
The Softmax function is used in the final layer of nearly all existing s...

03/18/2021 · Smoothing and Shrinking the Sparse Seq2Seq Search Space
Current sequence-to-sequence models are trained to minimize cross-entrop...

06/29/2017 · Talking Drums: Generating drum grooves with neural networks
Presented is a method of generating a full drum kit part for a provided ...

06/22/2015 · A Deep Memory-based Architecture for Sequence-to-Sequence Learning
We propose DEEPMEMORY, a novel deep architecture for sequence-to-sequenc...

04/01/2022 · Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
In many natural language processing (NLP) tasks the same input (e.g. sou...

06/09/2016 · Sequence-to-Sequence Learning as Beam-Search Optimization
Sequence-to-Sequence (seq2seq) modeling has rapidly become an important ...

11/08/2016 · The Neural Noisy Channel
We formulate sequence to sequence transduction as a noisy channel decodi...
