Learning to Disambiguate by Asking Discriminative Questions

by Yining Li, et al.
Carnegie Mellon University
The Chinese University of Hong Kong

The ability to ask questions is a powerful tool for gathering information, learning about the world, and resolving ambiguities. In this paper, we explore a novel problem: generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement and extension to the rich body of research on image captioning and question answering. We introduce the first large-scale dataset with over 10,000 carefully annotated images-question tuples to facilitate benchmarking. Each tuple consists of a pair of images together with, on average, 4.6 discriminative questions (positive samples) and 5.9 non-discriminative questions (negative samples). In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner, using only existing visual question answering datasets rather than discriminative images-question tuples. Quantitative evaluations and user studies show promising results against representative baselines.
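
The tuple structure described in the abstract can be pictured with a minimal Python sketch. The class name, field names, and helper function below are hypothetical and chosen only for illustration; they are not the dataset's actual schema or file format, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ImagePairTuple:
    """Hypothetical record: one image pair plus its labeled questions."""
    image_a: str  # identifier of the first image (assumed field name)
    image_b: str  # identifier of the second image (assumed field name)
    # Questions whose answers differ between the two images.
    positive_questions: List[str] = field(default_factory=list)
    # Questions that do not tell the two images apart.
    negative_questions: List[str] = field(default_factory=list)


def average_question_counts(tuples: Sequence[ImagePairTuple]) -> Tuple[float, float]:
    """Return (mean positives per tuple, mean negatives per tuple)."""
    if not tuples:
        raise ValueError("need at least one tuple")
    n = len(tuples)
    mean_pos = sum(len(t.positive_questions) for t in tuples) / n
    mean_neg = sum(len(t.negative_questions) for t in tuples) / n
    return mean_pos, mean_neg
```

Applied to the full dataset, a helper of this kind would be expected to report per-tuple averages like the 4.6 and 5.9 quoted in the abstract.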

