GuessWhat?! Visual object discovery through multi-modal dialogue

11/23/2016
by   Harm de Vries, et al.
0

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.

READ FULL TEXT

page 11

page 12

page 14

page 15

page 16

page 17

page 19

page 23

research
07/11/2019

MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Building computer systems that can converse about their visual environme...
research
11/12/2019

Visual Dialogue State Tracking for Question Generation

GuessWhat?! is a visual dialogue task between a guesser and an oracle. T...
research
09/17/2021

Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation

Visual dialogue is a challenging task since it needs to answer a series ...
research
11/17/2019

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

Different from Visual Question Answering task that requires to answer on...
research
12/16/2022

Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games

Persuasion modeling is a key building block for conversational agents. E...
research
06/21/2023

Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling

Enhancing AI systems with efficient communication skills that align with...
research
10/24/2022

Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?

Decoding strategies play a crucial role in natural language generation s...

Please sign up or login with your details

Forgot password? Click here to reset