A Revised Generative Evaluation of Visual Dialogue

04/20/2020
by   Daniela Massiceti, et al.

Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge. The current evaluation scheme of the VisDial dataset computes the ranks of ground-truth answers in predefined candidate sets, which Massiceti et al. (2018) show can be susceptible to the exploitation of dataset biases. This scheme also does little to account for the different ways of expressing the same answer, an aspect of language that has been well studied in NLP. We propose a revised evaluation scheme for the VisDial dataset that leverages metrics from the NLP literature to measure consensus between answers generated by the model and a set of relevant answers. We construct these relevant answer sets using a simple and effective semi-supervised method based on correlation, which allows us to automatically extend and scale sparse relevance annotations from humans to the entire dataset. We release these sets and code for the revised evaluation scheme as DenseVisDial, and intend them to be an improvement to the dataset in the face of its existing constraints and design choices.
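
To make the contrast between the two evaluation styles concrete, here is a minimal sketch in Python. It is not the DenseVisDial code: the function names, the unigram-overlap F1 used as a stand-in for the NLP metrics, and the choice of taking the maximum over the relevant answer set are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two answers (hypothetical stand-in metric)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def consensus_score(generated: str, relevant_answers: Sequence[str]) -> float:
    """Score a generated answer against a set of relevant answers.

    Assumption: the best match over the set is used; averaging is another option.
    """
    return max((token_f1(generated, ref) for ref in relevant_answers), default=0.0)


def ground_truth_rank(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth candidate under model scores
    (the rank-based style of evaluation, for comparison)."""
    gt_score = scores[gt_index]
    return 1 + sum(s > gt_score for s in scores)
```

Under this sketch, a generated answer such as "yes it is" receives full credit when the relevant set contains that phrasing, whereas the rank-based function only looks at where a single ground-truth candidate falls among the model's scores.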


research
12/16/2018

Visual Dialogue without Vision or Dialogue

We characterise some of the quirks and shortcomings in the exploration o...
research
02/11/2018

FlipDial: A Generative Model for Two-Way Visual Dialogue

We present FlipDial, a generative model for visual dialogue that simulta...
research
11/21/2017

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

The Visual Dialogue task requires an agent to engage in a conversation a...
research
08/29/2016

Machine Comprehension Using Match-LSTM and Answer Pointer

Machine comprehension of text is an important problem in natural languag...
research
06/09/2020

ConfNet2Seq: Full Length Answer Generation from Spoken Questions

Conversational and task-oriented dialogue systems aim to interact with t...
research
05/20/2022

Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

Solving crossword puzzles requires diverse reasoning capabilities, acces...
