Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

06/05/2017
by Jiasen Lu, et al.

We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE-trained generative neural dialog models (G) is that they tend to produce 'safe' and generic responses ("I don't know", "I can't tell"). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds -- the practical usefulness of G and the strong performance of D -- via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution -- specifically, an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms the state-of-the-art on the VisDial dataset by a significant margin (2.67%). The source code is available at https://github.com/jiasenlu/visDial.pytorch.
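The mechanism that lets D's gradients reach G is the straight-through Gumbel-Softmax sampler mentioned in the abstract. The PyTorch sketch below is a minimal illustration, not the authors' released code; the function name `st_gumbel_softmax`, the tensor shapes, and the toy loss are assumptions made for the example. It shows how a hard one-hot token can be used on the forward pass while the relaxed softmax carries gradients on the backward pass, which is what allows a score from D on a sampled answer to back-propagate into G.

```python
# Minimal sketch (assumed names and shapes, not the authors' code) of a
# straight-through Gumbel-Softmax sampler: hard one-hot sample on the forward
# pass, soft (differentiable) sample on the backward pass.
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    """Straight-through Gumbel-Softmax over a vocabulary dimension."""
    # Sample Gumbel(0, 1) noise and form the relaxed (soft) sample.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbel) / tau, dim=-1)
    # Discretize to a one-hot vector for the forward pass.
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    # Straight-through trick: forward uses y_hard, backward uses y_soft's gradient.
    return (y_hard - y_soft).detach() + y_soft

# Hypothetical usage at one decoding step: `logits` would come from G's RNN;
# the one-hot token can be embedded and fed to the next step and to D, whose
# score on the full sampled answer serves as the perceptual loss.
logits = torch.randn(2, 1000, requires_grad=True)  # (batch, vocab) -- toy values
token_one_hot = st_gumbel_softmax(logits, tau=0.5)
loss = token_one_hot.sum()                         # stand-in for -D(sampled answer)
loss.backward()                                    # gradients reach `logits`, i.e. G
```

Because the discretization is bypassed only in the backward pass, the sequence fed to D consists of actual one-hot tokens, while G still receives (biased but usable) gradient signal end-to-end.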


Related research

- 11/26/2016 · Visual Dialog
- 04/30/2019 · Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
- 08/20/2017 · An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
- 01/27/2019 · Promoting Diversity for End-to-End Conversation Response Generation
- 04/14/2020 · DialGraph: Sparse Graph Learning Networks for Visual Dialog
- 11/25/2019 · End-to-End Trainable Non-Collaborative Dialog System
- 04/15/2021 · Ensemble of MRR and NDCG models for Visual Dialog
