Approximate Distribution Matching for Sequence-to-Sequence Learning

08/24/2018
by Wenhu Chen et al.

Sequence-to-sequence models were introduced to tackle many real-life problems such as machine translation, summarization, and image captioning. The standard optimization algorithms are mainly based on example-to-example matching, such as maximum likelihood estimation, which is known to suffer from the data sparsity problem. Here we present an alternative view that frames sequence-to-sequence learning as a distribution matching problem, where each source or target example is viewed as representing a local latent distribution in the source or target domain. We then interpret sequence-to-sequence learning as learning a transductive model that transforms the source local latent distributions to match their corresponding target distributions. In our framework, we approximate both the source and target latent distributions with recurrent neural networks (augmenters). During training, the parallel augmenters learn to better approximate the local latent distributions, while the sequence prediction model learns to minimize the KL divergence between the transformed source distributions and the approximated target distributions. This algorithm alleviates the data sparsity issue in sequence learning by locally augmenting unseen data pairs, which increases the model's robustness. Experiments on machine translation and image captioning consistently demonstrate the superiority of the proposed algorithm over competing algorithms.
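
The abstract is terse about the training objective, so the following is a minimal sketch of the position-wise KL matching it describes, assuming a PyTorch setup. The Augmenter class, the distribution_matching_loss function, and all hyperparameters are hypothetical illustrations, not the authors' released code: the augmenter plays the role of the RNN approximating a local latent distribution, and the loss treats its output as a soft target for the sequence model's predictive distribution.

```python
import torch
import torch.nn.functional as F


class Augmenter(torch.nn.Module):
    """RNN 'augmenter' approximating a local latent distribution around an example."""

    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden_size)
        self.rnn = torch.nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) int64 ids -> per-step logits over the vocabulary.
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)


def distribution_matching_loss(model_logits, augmenter_logits, eps=1e-9):
    """KL(q_augmenter || p_model) per position, summed over time, averaged over the batch."""
    log_p = F.log_softmax(model_logits, dim=-1)
    # The augmenter's distribution acts as a (detached) soft target.
    q = F.softmax(augmenter_logits, dim=-1).detach()
    kl = (q * (q.clamp_min(eps).log() - log_p)).sum(dim=-1)
    return kl.sum(dim=-1).mean()


if __name__ == "__main__":
    # Toy usage with random stand-ins for the real model and data.
    vocab, batch, length = 100, 4, 10
    target_tokens = torch.randint(vocab, (batch, length))
    augmenter = Augmenter(vocab)
    q_logits = augmenter(target_tokens)           # approximated target distribution
    p_logits = torch.randn(batch, length, vocab)  # stand-in for seq2seq model outputs
    print(distribution_matching_loss(p_logits, q_logits).item())
```

With hard one-hot targets this loss reduces to ordinary token-level cross-entropy, which is one way to see how the augmenter's softened local distributions generalize maximum likelihood training.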

Related research

10/09/2018
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
This work investigates an alternative model for neural machine translati...

10/25/2017
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
The Fisher information metric is an important foundation of information ...

01/18/2019
Improving Sequence-to-Sequence Learning via Optimal Transport
Sequence-to-sequence models are commonly trained via maximum likelihood ...

09/02/2021
Sequence-to-Sequence Learning with Latent Neural Grammars
Sequence-to-sequence learning with neural networks has become the de fac...

06/28/2017
Generative Bridging Network in Neural Sequence Prediction
Maximum Likelihood Estimation (MLE) suffers from data sparsity problem i...

04/26/2019
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
Models such as Sequence-to-Sequence and Image-to-Sequence are widely use...

11/15/2022
Hierarchical Phrase-based Sequence-to-Sequence Learning
We describe a neural transducer that maintains the flexibility of standa...
