DeepAI AI Chat
Log In Sign Up

Approximate Distribution Matching for Sequence-to-Sequence Learning

by   Wenhu Chen, et al.
The Regents of the University of California

Sequence-to-Sequence models were introduced to tackle many real-life problems like machine translation, summarization, image captioning, etc. The standard optimization algorithms are mainly based on example-to-example matching like maximum likelihood estimation, which is known to suffer from data sparsity problem. Here we present an alternate view to explain sequence-to-sequence learning as a distribution matching problem, where each source or target example is viewed to represent a local latent distribution in the source or target domain. Then, we interpret sequence-to-sequence learning as learning a transductive model to transform the source local latent distributions to match their corresponding target distributions. In our framework, we approximate both the source and target latent distributions with recurrent neural networks (augmenter). During training, the parallel augmenters learn to better approximate the local latent distributions, while the sequence prediction model learns to minimize the KL-divergence of the transformed source distributions and the approximated target distributions. This algorithm can alleviate the data sparsity issues in sequence learning by locally augmenting more unseen data pairs and increasing the model's robustness. Experiments conducted on machine translation and image captioning consistently demonstrate the superiority of our proposed algorithm over the other competing algorithms.


page 1

page 2

page 3

page 4


Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation

This work investigates an alternative model for neural machine translati...

GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks

The Fisher information metric is an important foundation of information ...

Improving Sequence-to-Sequence Learning via Optimal Transport

Sequence-to-sequence models are commonly trained via maximum likelihood ...

Sequence-to-Sequence Learning with Latent Neural Grammars

Sequence-to-sequence learning with neural networks has become the de fac...

Generative Bridging Network in Neural Sequence Prediction

Maximum Likelihood Estimation (MLE) suffers from data sparsity problem i...

Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications

Models such as Sequence-to-Sequence and Image-to-Sequence are widely use...

Hierarchical Phrase-based Sequence-to-Sequence Learning

We describe a neural transducer that maintains the flexibility of standa...