
Learning to Disambiguate by Asking Discriminative Questions

08/09/2017
by   Yining Li, et al.
Carnegie Mellon University
The Chinese University of Hong Kong

The ability to ask questions is a powerful tool for gathering information in order to learn about the world and resolve ambiguities. In this paper, we explore the novel problem of generating discriminative questions to help disambiguate visual instances. Our work complements and extends the rich body of research on image captioning and question answering. We introduce the first large-scale dataset, with over 10,000 carefully annotated images-question tuples, to facilitate benchmarking. Each tuple consists of a pair of images together with, on average, 4.6 discriminative questions (positive samples) and 5.9 non-discriminative questions (negative samples). In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner, using only existing visual question answering datasets rather than discriminative images-question tuples. Promising results are shown against representative baselines in quantitative evaluations and user studies.
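
To make the tuple structure described in the abstract concrete, the following is a minimal sketch of how one annotated sample might be represented and how per-tuple question averages could be computed. The class name, field names, file names, and toy questions are all assumptions made for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ImagesQuestionTuple:
    """One annotated sample (hypothetical schema, not the dataset's real format)."""
    image_pair: Tuple[str, str]  # assumed: identifiers of the two images
    positive_questions: List[str] = field(default_factory=list)  # discriminative
    negative_questions: List[str] = field(default_factory=list)  # non-discriminative


def average_question_counts(samples: List[ImagesQuestionTuple]) -> Tuple[float, float]:
    """Return (mean positive questions, mean negative questions) per tuple."""
    if not samples:
        return 0.0, 0.0
    n = len(samples)
    mean_pos = sum(len(s.positive_questions) for s in samples) / n
    mean_neg = sum(len(s.negative_questions) for s in samples) / n
    return mean_pos, mean_neg


# Toy, made-up samples for illustration only; not drawn from the real dataset.
toy_samples = [
    ImagesQuestionTuple(
        image_pair=("img_a.jpg", "img_b.jpg"),
        positive_questions=["What color is the bus?"],
        negative_questions=["Is there a vehicle?", "Is it daytime?"],
    ),
    ImagesQuestionTuple(
        image_pair=("img_c.jpg", "img_d.jpg"),
        positive_questions=[
            "What sport is being played?",
            "How many people are there?",
            "What is the man holding?",
        ],
        negative_questions=["Is this outdoors?", "Are there people?"],
    ),
]

mean_pos, mean_neg = average_question_counts(toy_samples)
print(f"avg positive: {mean_pos:.1f}, avg negative: {mean_neg:.1f}")
```

Applied to the full dataset, the same computation would correspond to the reported averages of 4.6 positive and 5.9 negative questions per tuple.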
