Improving Sequence-to-Sequence Learning via Optimal Transport

01/18/2019
by   Liqun Chen, et al.
0

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning.

READ FULL TEXT
research
10/12/2020

Improving Text Generation with Student-Forcing Optimal Transport

Neural language models are often trained with maximum likelihood estimat...
research
05/14/2018

Token-level and sequence-level loss smoothing for RNN language models

Despite the effectiveness of recurrent neural network language models, t...
research
08/24/2018

Approximate Distribution Matching for Sequence-to-Sequence Learning

Sequence-to-Sequence models were introduced to tackle many real-life pro...
research
06/28/2017

Generative Bridging Network in Neural Sequence Prediction

Maximum Likelihood Estimation (MLE) suffers from data sparsity problem i...
research
06/06/2019

Bridging the Gap between Training and Inference for Neural Machine Translation

Neural Machine Translation (NMT) generates target words sequentially in ...
research
10/24/2022

Structural generalization is hard for sequence-to-sequence models

Sequence-to-sequence (seq2seq) models have been successful across many N...
research
03/22/2021

Selective information exchange in collaborative clustering using regularized Optimal Transport

Collaborative learning has recently achieved very significant results. I...

Please sign up or login with your details

Forgot password? Click here to reset