
Jointly Learning to See, Ask, and GuessWhat

by Aashish Venkatesh et al.

We are interested in understanding how the ability to ground language in vision interacts with other abilities at play in dialogue, such as asking a series of questions to obtain the information necessary to perform a certain task. With this aim, we develop a Questioner agent in the context of the GuessWhat?! game. Our model exploits a neural network architecture to build a continuous representation of the dialogue state that integrates information from the visual and linguistic modalities and conditions future action. To play the GuessWhat?! game, the Questioner agent has to be able both to ask questions and to guess a target object in the visual environment. In our architecture, these two capabilities are considered jointly as a supervised multi-task learning problem, to which cooperative learning can be further applied. We show that the introduction of our new architecture combined with these learning regimes yields an accuracy increase of 19.5 percentage points over a baseline model that treats the submodules independently. With this increase, we reach an accuracy comparable to state-of-the-art models that use reinforcement learning, with the advantage that our architecture is entirely differentiable and thus easier to train. This suggests that combining our approach with reinforcement learning could lead to further improvements in the future. Finally, we present a range of analyses that examine the quality of the dialogues and shed light on the internal dynamics of the model.





1 Introduction

Over the last few decades, substantial progress has been made in developing dialogue systems that address the various abilities that need to be put to work while having a conversation, such as understanding and generating natural language, planning actions, and tracking the information exchanged by the dialogue participants. The latter is particularly critical since, for communication to be effective, participants need to represent the state of the dialogue and the common ground established through the conversation [Stalnaker1978, Lewis1979, Clark1996].

In this work, we develop a dialogue agent that builds a representation of the context and the dialogue state by integrating information from the visual and linguistic modalities. We are interested in understanding how the ability to combine language and vision interacts with the ability to ask a series of questions to obtain the information necessary to perform a certain task. Given the high complexity of these abilities, we focus on a simplified task-oriented scenario that makes the challenges manageable while maintaining key properties of dialogue interaction, in particular the GuessWhat?! game [de Vries et al.2017]. The game involves two participants—a questioner and an answerer—who collaborate to identify a target object in a visual scene. We model the agent in the questioner’s role, who is faced with the tasks of asking questions and guessing the target.

Figure 1: Our questioner model with a single visually grounded dialogue state encoder.

The system we present exploits a neural network architecture to build a continuous representation of the dialogue state that can be learned end-to-end directly from data. Figure 1 shows a diagram illustrating the overall architecture of the proposed system, with our visually-grounded dialogue state encoder at its core (more details are provided in Section 4). In our system, the dialogue state representations are learned by jointly optimising several key abilities: vision and language understanding as well as action and language generation. This contrasts with previous work on the GuessWhat?! task, which has addressed these aspects independently via autonomous modules [de Vries et al.2017, Strub et al.2017, Zhu, Zhang, and Metaxas2017, Lee, Heo, and Zhang2017b, Shekhar et al.2018]. Our study shows that:


  • The introduction of a single visually-grounded dialogue state encoder jointly trained with the other modules not only addresses foundational issues about how to integrate visual grounding with the components of a dialogue system, but also yields a 10 percentage-point improvement on task success with our best performing settings.

  • The new architecture allows us to leverage a cooperative learning regime whereby the questioning and guessing modules adapt to each other using self-generated dialogues. This method yields an additional accuracy increase of 9.5 percentage points with our best settings.

  • Further modules can be readily integrated into our architecture: in particular, adding a decision making component improves the quality of the dialogue by avoiding unnecessary questions.

2 Related Work

Our work is connected to, and brings together, research on task-oriented dialogue modelling and visual dialogue agents.

2.1 Task-oriented dialogue systems

The conventional architecture of task-oriented dialogue systems includes several components: most typically, a natural language understanding and a natural language generation module, plus a dialogue manager (consisting of a dialogue state tracker and policy) that connects the two of them. These components are usually trained in a data-driven manner using statistical methods. For instance, a prominent line of work treats the task of tracking the dialogue state as a partially-observable Markov decision process 

[Williams et al.2013, Young et al.2013, Kim et al.2014] that operates on a symbolic dialogue state consisting of predefined variables. The use of symbolic representations to characterise the state of the dialogue has some advantages (e.g., ease of interfacing with knowledge bases), but also some key disadvantages: the variables to be tracked have to be defined in advance and the system needs to be trained on data annotated with explicit state configurations.

Given these limitations, recent work has started to investigate how different components of the traditional dialogue system pipeline may be replaced with neural network models that learn their own representations. Following approaches to non-goal-oriented chatbots [Vinyals and Le2015, Sordoni et al.2015, Serban et al.2016, Li et al.2016a, Li et al.2016b] that model dialogue as a sequence-to-sequence problem [Sutskever, Vinyals, and Le2014], Bordes et al. [2017] propose to fully replace the components of a task-oriented dialogue system with a memory network. In contrast, Williams et al. [2017] put forward a hybrid system that learns state representations through a Recurrent Neural Network (RNN) and integrates them with symbolic information, yielding a dialogue system suitable for practical applications. Similarly, Zhao and Eskenazi [2016] propose a neural model that replaces the NLU and dialogue manager components of a traditional system: the dialogue state representation is created by an RNN that processes raw utterances and functions as a dialogue state tracker, while the dialogue manager is implemented as a Multilayer Perceptron that selects a system action.

Our work pushes further the idea of learning the dialogue state representation from raw data by means of an RNN that is jointly optimised with other components of the dialogue system — in our case, we focus on the interplay between having to ground language on vision, on the one hand, and having to generate natural language questions and guessing actions, on the other.

2.2 Visual dialogue agents

In recent years, researchers in computer vision have proposed tasks that combine visual processing with dialogue interaction. Das et al. [2017] and de Vries et al. [2017] have created large datasets (VisDial and GuessWhat?!, respectively) where two participants ask and answer questions about an image. While impressive progress has been made in combining vision and language, current models tend to focus on specific abilities. For example, the models proposed by Das et al. [2017] concern a cooperative image guessing game where one of the agents does not see the image (and thus performs no multimodal understanding) and the other agent does see the image, but only responds to questions without having to perform additional actions. The models so far proposed for the GuessWhat?! task do take other functions into consideration besides generating and understanding visually-grounded language: asking a sequence of related questions to achieve a task and guessing a target object. But these abilities are modelled independently [de Vries et al.2017, Strub et al.2017, Zhu, Zhang, and Metaxas2017, Lee, Heo, and Zhang2017b]. In contrast, we propose a model of a multimodal dialogue agent in the context of the GuessWhat?! task where all components are integrated into a joint architecture that has at its core a visually-grounded dialogue state encoder (see Figure 1 in the Introduction).

3 Task and Data

As a testbed for our dialogue agent architecture, we focus on the GuessWhat?! game introduced by de Vries et al. [2017]. The game is a simplified instance of a referential communication task where two players collaborate to identify a referent — a setting used extensively to study human-human collaborative dialogue [Clark and Wilkes-Gibbs1986, Yule1997, Zarrieß et al.2016].

The GuessWhat?! dataset was collected via Amazon Mechanical Turk by de Vries et al. [2017]. The task involves two human participants who see a real-world image taken from the MS-COCO dataset [Lin et al.2014]. One of the participants (the Oracle) is assigned an object in the image, and the other participant (the Questioner) has to guess it by asking Yes/No questions to the Oracle. There are no time constraints to play the game. Once the Questioner is ready to make a guess, the list of candidate objects is provided, and the game is considered successful if the Questioner picks the target object.

The GuessWhat?! dataset consists of around 155k dialogues about approximately 66k different images. Dialogues contain an average of 5.2 question-answer pairs. 84.6% of games are completed successfully by the human participants; 8.4% are unsuccessful, and 7% are incomplete (the Questioner participant did not make a guess).

4 Models

We focus on developing an agent who plays the role of the Questioner in the GuessWhat?! game. As a baseline model, we consider our own implementation of the best performing systems put forward by de Vries et al. [2017], who develop models of the Questioner and Oracle roles.

4.1 Baseline model

The Questioner model by de Vries et al. [2017] consists of two independent modules: a Question Generator (QGen) and a Guesser. For the sake of simplicity, QGen asks a fixed number of questions before the Guesser predicts the target object.

QGen is implemented as an RNN with a Long Short-Term Memory (LSTM) transition function, on top of which a probabilistic sequence model is built with a softmax classifier. At each time step in the dialogue, the model receives as input the raw image and the dialogue history and generates the next question one word at a time. The image is encoded by extracting its VGG-16 features [Simonyan and Zisserman2014]. In our new architecture (described below in Section 4.2), we use ResNet152 features instead of VGG, because they tend to yield better performance in image classification and are more efficient to compute. However, using ResNet152 features for the baseline QGen module leads to a decrease in performance (concretely, 37.3% accuracy vs. the 41.8% achieved with VGG-16 for the baseline model). We thus keep the original configuration by de Vries et al. [2017] with VGG features, which provides a stronger baseline.

The Guesser module does not have access to the raw image. Instead, it exploits the annotations in the MS-COCO dataset [Lin et al.2014] to represent candidate objects by their object category and their spatial coordinates. As reported by de Vries et al. [2017], this yields better performance than using raw image features in this case. The objects’ categories and coordinates are passed through a Multi-Layer Perceptron (MLP) to obtain an embedding for each object. The Guesser also takes as input the dialogue history, processed by its own dedicated LSTM. A dot product between the hidden state of the LSTM and each of the object embeddings returns a score for each candidate object.
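The scoring step just described — a dot product per candidate followed by a normalisation — can be sketched in a few lines. This is a toy illustration, not the authors' code; the dimensions and values are invented:

```python
import math

def softmax(scores):
    """Normalise raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def guesser_probs(dialogue_encoding, object_embeddings):
    """Dot product between the dialogue encoding and each candidate
    object embedding, followed by a softmax over the candidates."""
    scores = [sum(d * o for d, o in zip(dialogue_encoding, obj))
              for obj in object_embeddings]
    return softmax(scores)

# Toy 3-d dialogue encoding and two candidate objects.
state = [0.5, -0.2, 0.8]
objects = [[1.0, 0.0, 1.0],   # candidate 0
           [0.0, 1.0, 0.0]]   # candidate 1
probs = guesser_probs(state, objects)
guess = max(range(len(probs)), key=probs.__getitem__)
```

The candidate whose embedding is most aligned with the dialogue encoding receives the highest probability and becomes the guess.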

The model playing the role of the Oracle is informed about the target object. Like the Guesser, the Oracle does not have access to the raw image features. It receives as input embeddings of the target object’s category, its spatial coordinates, and the current question asked by the Questioner, encoded by a dedicated LSTM. These three embeddings are concatenated into a single vector and fed to an MLP that outputs an answer (Yes or No).

4.2 Visually-grounded dialogue state encoder

Figure 2: Question Generation and Guesser modules.

In line with the baseline model, our Questioner agent includes two sub-modules, a QGen and a Guesser. Our agent architecture, however, introduces major changes to the setup of de Vries et al. [2017]: rather than operating independently, the language generation and guessing modules are connected through a common grounded dialogue state encoder (GDSE), which combines linguistic and visual information as a prior for the two modules. Given this feature, we will refer to our Questioner agent as GDSE.

As illustrated in Figure 1 in the Introduction, the Encoder receives as input representations of the visual and linguistic context. The visual representation v consists of the second-to-last layer of ResNet152 trained on ImageNet [He et al.2016]; we do not update the visual feature parameters during training. The linguistic representation is obtained by an LSTM (LSTM_e) which processes each new question-answer pair in the dialogue. After each question-answer pair QA_t, the last hidden state h_t of LSTM_e is concatenated with the image features v, passed through a linear layer, and then through a tanh activation, producing the final layer of the Encoder:

s_t = tanh(W · [h_t ; v])

where [· ; ·] represents concatenation, h_t is the last hidden state of LSTM_e, and v the image features. We refer to this final layer s_t as the dialogue state, which is given as input to both QGen and Guesser.
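The state computation just described — a tanh over a linear map of the concatenated language and vision summaries — can be sketched numerically. Everything below is a toy illustration with invented dimensions and weights:

```python
import math

def dialogue_state(h_t, v, W, b):
    """Toy version of the encoder's final layer: concatenate the
    LSTM summary of the dialogue (h_t) with the image features (v),
    apply a linear layer (W, b), then a tanh activation."""
    x = h_t + v  # list concatenation plays the role of [h_t ; v]
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

# Toy sizes: 2-d language summary, 2-d visual features, 2-d state.
h = [0.1, -0.3]
v = [0.7, 0.2]
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]
b = [0.0, 0.0]
s = dialogue_state(h, v, W, b)
```

In the real model these vectors are high-dimensional and W is learned jointly with the rest of the network; the sketch only shows the shape of the computation.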

As illustrated in Figure 2, our QGen and Guesser modules are like the corresponding modules of de Vries et al. [2017], except for the crucial fact that they receive as input the same grounded dialogue state representation. QGen employs an LSTM (LSTM_q) to generate the token sequence of each question conditioned on the dialogue state s_t, which is used to initialise the hidden state of LSTM_q. As input at every time step k, QGen receives a dense embedding w_{k-1} of the previously generated token together with the image features v:

h^q_k = LSTM_q([w_{k-1} ; v], h^q_{k-1})

We optimise QGen by minimising the Negative Log-Likelihood (NLL) of the human questions, using the Adam optimiser [Kingma and Ba2015]:

L_QGen = − Σ_k log p(w_k | w_{<k}, s_t, v)

Thus, in our architecture the LSTM of QGen in combination with the LSTM of the Encoder forms a sequence-to-sequence model [Sutskever, Vinyals, and Le2014], conditioned on the visual and linguistic context, in contrast to the baseline model, where question generation is performed by a single LSTM on its own.

The Guesser consists of an MLP which is evaluated for each candidate object in the image. It takes the dense embedding of an object’s category and its spatial information to build a representation r_j of each candidate object. A score is computed for each object by taking the dot product between the dialogue state s_t and the object representation:

score_j = s_t · r_j

Finally, a softmax over the scores yields a probability distribution over the candidate objects:

p(o_j) = softmax(score_j)

We pick the object o* with the highest probability, and the game is successful if o* is the target object.

As with QGen, we optimise the Guesser by minimising the NLL of the target object, again using Adam:

L_Guesser = − log p(o_target)
The resulting architecture is fully differentiable. In addition, the GDSE agent faces a multi-task optimisation problem: while QGen optimises L_QGen and the Guesser optimises L_Guesser, the parameters of the Encoder (LSTM_e and W) are optimised via both losses. Hence, both tasks faced by the Questioner agent contribute to the optimisation of the dialogue state s_t, and thus to a more effective encoding of the input context.
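The multi-task objective can be made concrete with toy numbers. In the real model the two NLL terms come from the question-word and object predictions, and only the shared Encoder parameters receive gradients from both; the distributions below are invented:

```python
import math

def nll(probs, gold_index):
    """Negative log-likelihood of the gold symbol/object."""
    return -math.log(probs[gold_index])

# Toy output distributions for one training step.
qgen_probs = [0.1, 0.6, 0.3]      # next-word distribution, gold = 1
guesser_probs = [0.2, 0.7, 0.1]   # object distribution,   gold = 1
loss_qgen = nll(qgen_probs, 1)        # drives QGen + Encoder
loss_guesser = nll(guesser_probs, 1)  # drives Guesser + Encoder
# The shared Encoder is optimised against the sum of both losses.
encoder_objective = loss_qgen + loss_guesser
```

Because both losses backpropagate through the same state s_t, improving either task reshapes the representation used by the other.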

5 Learning Approach and Task Success

Like de Vries et al. [2017], we train the Oracle with supervised learning (SL) using human dialogues: the model receives human questions and has to learn to provide the human Oracle’s answer. Our re-implementation of the Oracle obtains an accuracy of 78.47%, matching the result reported by de Vries et al. [2017].

As for the Questioner agent, our new architecture makes possible a different learning approach from the one used by de Vries et al. [2017]. We first train our GDSE agent with SL in a multi-task setting and then exploit its differentiable architecture to update the parameters of the Guesser, QGen, and Encoder components through cooperative learning (CL). We use the same train (70%), validation (15%), and test (15%) splits as de Vries et al. [2017], where the test set contains new images not seen during training.

5.1 Supervised learning

In the model by de Vries et al. [2017], the QGen and Guesser modules are trained autonomously with SL on human data: QGen is trained to replicate human questions and, independently, the Guesser is trained to predict the target object. Our new architecture with a common dialogue state Encoder allows us to formulate these two tasks as a multi-task problem, in which the two losses defined in Section 4.2 (the QGen and Guesser NLL objectives) are optimised in parallel.

These two tasks are not equally difficult: while the Guesser has to learn a probability distribution over the set of candidate objects in the image, QGen needs to fit the distribution of natural language words. QGen thus has a harder task to optimise and requires more parameters and training iterations. We address this issue by making the learning schedule task-dependent. We call this setup modulo-n training, where n indicates after how many epochs of QGen training the Guesser is updated together with QGen. We experimented with n from 5 to 15 and found that updating the Guesser every 7 QGen epochs worked best. With this optimal configuration, we train QGen for 91 epochs and the Guesser for 13 epochs. We use a batch size of 1024, Adam as optimiser, and a learning rate of 0.0001.
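The modulo-n schedule can be made concrete with a few lines (a sketch under the settings above): with n = 7 over 91 QGen epochs, the Guesser is updated in exactly 13 of them, matching the reported epoch counts.

```python
def modulo_n_schedule(num_epochs, n):
    """Per-epoch module updates for modulo-n training: QGen is
    updated every epoch, the Guesser only every n-th epoch."""
    schedule = []
    for epoch in range(1, num_epochs + 1):
        modules = ["qgen"]
        if epoch % n == 0:
            modules.append("guesser")
        schedule.append(modules)
    return schedule

# n = 7 over 91 epochs: the Guesser is trained 13 times.
sched = modulo_n_schedule(91, 7)
guesser_epochs = sum("guesser" in m for m in sched)
```

The same helper covers the cooperative phase described below, which simply uses a different n.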

5.2 Cooperative learning

Once the model has been trained with SL, new training data can be generated by letting the agent play new games. Given an image from the training set used in the SL phase, we generate a new training instance by randomly sampling a target object from all objects in the image. We then let our Questioner agent and the Oracle play the game with that object as target, and further train the common Encoder using the generated dialogues by backpropagating the error with gradient descent through the Guesser. After training the Guesser and the Encoder with generated dialogues, QGen needs to ‘readapt’ to the newly arranged Encoder parameters. To achieve this, we re-train QGen on human data with SL, but using the new Encoder states. Also here, the error is backpropagated with gradient descent through the common Encoder.
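The cooperative loop described above can be sketched schematically. This is not the authors' code: the three callables stand in for the real steps (playing a game against the Oracle, updating Guesser + Encoder on generated dialogues, and re-training QGen on human data against the new Encoder states):

```python
import random

def cooperative_round(play_game, train_guesser, retrain_qgen, images):
    """One round of the cooperative regime (a sketch):
      1. sample a target object per training image and let the
         Questioner and Oracle play, yielding new dialogues;
      2. update Guesser + Encoder on the generated dialogues;
      3. re-train QGen on human data so it readapts to the Encoder."""
    generated = []
    for image in images:
        target = random.choice(image["objects"])
        generated.append(play_game(image, target))
    train_guesser(generated)   # error backpropagated through Encoder
    retrain_qgen()             # QGen readapts to new Encoder params
    return generated

# Toy run with recording stubs.
calls = []
images = [{"objects": ["cat", "dog"]}, {"objects": ["car"]}]
dialogues = cooperative_round(
    play_game=lambda img, tgt: (img, tgt),
    train_guesser=lambda d: calls.append("guesser"),
    retrain_qgen=lambda: calls.append("qgen"),
    images=images,
)
```

The ordering matters: the Guesser/Encoder update on self-generated games precedes the QGen re-training that adapts the generator to the shifted Encoder parameters.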

Thus, with this learning scheme, the different components of the Questioner agent learn to better perform the overall Questioner’s task in a cooperative manner. Das et al. [2017] have explored the use of cooperative learning to train two visual dialogue agents that receive joint rewards when they play a game successfully. To our knowledge, ours is the first approach where cooperative learning is applied to the internal components of a grounded conversational agent.

As in the SL phase described in the previous section, we use modulo-n training, setting n to the optimal value of 4 in this case. The GDSE previously trained with SL is further trained with this cooperative regime for 6 and 2 epochs for the Guesser and the QGen, respectively. We use a batch size of 256, Adam as optimiser, and a learning rate of 0.0001.

5.3 Results

Model Accuracy
Baseline 41.2
GDSE-SL 51.2
GDSE-CL 60.7
Strub et al. 2017 58.4
Zhang et al. 2017 60.7
Table 1: Test set accuracy scores on task success for each model with its best performing settings.

We report accuracy results on task success for our agent trained with supervised learning (GDSE-SL) and with cooperative learning (GDSE-CL). We set the number of questions to be asked by QGen to 10, after which the Guesser makes a guess. This number was selected as a result of a parameter search in the range from 5 to 12, using the validation set. The best performing baseline model asks 5 questions.

As shown in Table 1, the introduction of the new architecture trained with SL yields an increase in accuracy of 10 percentage points (from 41.2 to 51.2) with respect to the Baseline system. (While de Vries et al. [2017] originally report an accuracy of 46.8%, this result was later revised to 40.8%, as clarified on the first author’s GitHub page; our own implementation of the baseline system achieves an accuracy of 41.2%.) The introduction of the cooperative learning approach (GDSE-CL) brings a further improvement of 9.5 percentage points over GDSE-SL (from 51.2 to 60.7).

Other models have been evaluated on the GuessWhat?! dataset and also outperform the baseline. In particular, Strub et al. [2017] and Zhang et al. [2017] obtain 58.4% and 60.7% accuracy, respectively (see Table 1). (Lee, Heo, and Zhang [2017b] report an accuracy of 78.2%, but their Questioner retrieves questions from the human training dialogues instead of generating them; this result is therefore not comparable to any of the models in Table 1.) These models use reinforcement learning. Our best results obtained with GDSE-CL are on a par with the state-of-the-art scores obtained by these models, with the added value that our architecture uses an entirely differentiable learning procedure. This is an advantage of our model over reinforcement learning approaches in terms of ease of training and speed of convergence. (Strub et al. [2017] and Zhang et al. [2017] also evaluate their models on a second test set, New Object, besides the test set of de Vries et al. [2017], which they refer to as New Image. We have evaluated our model on the New Object setting too and obtained comparable results; see the Supplementary Material for details.)

This suggests that combining our approach—which addresses foundational architectural aspects—with reinforcement learning could lead to further improvements in the future.

6 Analysis

In this section, we present a range of analyses that aim to shed light on the performance of our model and its inner workings. All analyses are carried out on the test set data.

6.1 Quantitative analysis of linguistic output

We analyse the language produced by our Questioner agent with respect to three factors: (1) the size of the vocabulary, (2) the number of unique questions, and (3) the number of repetitions. We compute these factors on the test set dialogues for our model with supervised learning (GDSE-SL) and with cooperative learning (GDSE-CL), for the baseline model (BL), and for the human data (H).

As we can see in Table 2, with respect to factors (1) and (2) the linguistic output of our model is richer than that of the baseline model and closer to the language used by humans: our agent learns a much larger vocabulary than the baseline model (1569 words with SL and 1272 with CL vs. 423 for the baseline), makes use of many more unique questions overall (36577 with SL and 35317 with CL vs. 3545 for the baseline), and repeats the very same question within the same dialogue less often than the baseline (98.08% of the games played by the baseline contain at least one verbatim question repetition, whereas for SL and CL this happens in only 66.24% and 59.97% of the games, respectively). (Note that the upper bound for the models’ vocabulary is the size of the training vocabulary: 4901 words, i.e., all words with at least 3 occurrences in the training data. This upper bound is thus lower than the size of the human vocabulary in the test set (5255 words) reported in Table 2.)
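The three statistics can be computed directly from a set of test dialogues; a sketch over a toy corpus (the helper and the data are invented for illustration):

```python
def dialogue_stats(dialogues):
    """Vocabulary size, number of unique questions, and the share of
    games containing a verbatim repeated question (toy version of
    the statistics in Table 2)."""
    vocab, unique_qs = set(), set()
    repeated_games = 0
    for game in dialogues:
        seen = set()
        repeated = False
        for q in game:
            if q in seen:
                repeated = True
            seen.add(q)
            unique_qs.add(q)
            vocab.update(q.lower().split())
        repeated_games += repeated
    return len(vocab), len(unique_qs), 100 * repeated_games / len(dialogues)

games = [["is it a person ?", "is it red ?", "is it red ?"],
         ["is it a vehicle ?", "on the left ?"]]
vocab_size, n_unique, pct_repeat = dialogue_stats(games)
```

Here the first game repeats a question verbatim and the second does not, so half of the games count as containing a repetition.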

6.2 Dialogue policy: types of questions

To further understand the variety of questions asked by the agents, we classify questions into different types (see examples in the table on the left in Figure 3). We distinguish between questions that aim at identifying the category of the target object (entity questions) and questions about properties of the queried objects (attribute questions). Within attribute questions, we make a distinction between colour, shape, size, texture, location, and action questions. Within entity questions, we distinguish questions whose focus is an object category from those whose focus is a super-category. The classification is done by manually extracting keywords for each question type from the human dialogues, and then applying an automatic heuristic that assigns a class to a question based on the presence of the relevant keywords. (A question may be tagged with several attribute classes if keywords of different types are present; e.g., “Is it the white one on the left?” would be classified as both colour and location.) This procedure allows us to classify 91.41% of the questions asked by humans. The coverage is higher for the questions asked by the models: 98.57% (BL), 93.64% (GDSE-SL), 98.55% (GDSE-CL). (The Supplementary Material provides details on the question classification procedure: the keyword lists by class, the procedure used to obtain these lists, and the pseudo-code of the classification heuristic.)
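A sketch of such a keyword heuristic follows. The keyword lists here are illustrative stand-ins, not the actual lists extracted from the human dialogues:

```python
# Illustrative keyword lists; the real lists are derived from the
# human dialogues (see the Supplementary Material).
KEYWORDS = {
    "color":          {"red", "blue", "white", "black", "green"},
    "location":       {"left", "right", "front", "behind", "on"},
    "action":         {"standing", "sitting", "wearing", "holding"},
    "super-category": {"person", "vehicle", "animal", "furniture"},
}

def classify(question):
    """Assign every type whose keywords occur in the question; a
    question may receive several attribute tags at once."""
    tokens = set(question.lower().replace("?", " ").split())
    tags = {t for t, kws in KEYWORDS.items() if tokens & kws}
    return tags or {"not-classified"}

tags = classify("Is it the white one on the left?")
```

As in the paper's example, a question mentioning both a colour and a location keyword receives both tags, while a question with no known keyword remains unclassified.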

                            BL      SL      CL      H
Vocabulary size             423     1569    1272    5255
Unique questions            3545    36577   35317   54467
% games w. repeated Q's     98.08   66.24   59.97
super-cat to obj/att        66.44   71.58   96.44   89.56
object to attribute         86.70   77.82   97.95   88.70
Table 2: Statistics of the linguistic output of the baseline (BL), GDSE-SL (SL), GDSE-CL (CL), and humans (H). Last two rows: proportion of consecutive question pairs showing a question-type shift (vs. no shift) after question Q_t received a Yes answer.

Question type      Example                 BL     SL     CL     H
entity                                     40.05  44.12  28.57  38.11
- super-category   Is it a vehicle?        11.55  10.39   9.34  14.51
- object           Is it a skateboard?     28.50  33.73  19.23  23.61
attribute                                  58.52  49.53  69.98  53.29
- color            Is he wearing blue?      5.12  14.25  17.67  15.50
- shape            Is it square?            0.00   0.02   0.03   0.30
- size             The bigger one?          0.03   0.41   0.71   1.38
- texture          Is it wood?              0.00   0.14   0.08   0.89
- location         The one from the left?  54.72  39.65  66.62  40.00
- action           Are they standing?       1.87   8.28   3.14   7.59
Not classified                              1.43   6.35   1.45   8.6
chi-square vs. H                           20.86*  7.25  21.98*  0.00
Figure 3: Left: Percentage of questions per question type over all test set games played by humans (H) and by the models asking 10 questions. Bottom row: chi-square statistics indicating the level of divergence from the human distribution of fine-grained question types (* marks a statistically significant divergence). Right: Structure of the dialogues in terms of question types for our GDSE-CL model. Cells show the percentage of a type over all questions in the test set; each time step on the x-axis corresponds to a question-answer pair.

The statistics are shown in Figure 3, left. We observe that the distribution of fine-grained question classes produced by our SL model is statistically indistinguishable from the human distribution (chi-square = 7.25, not significant), while the CL model’s distribution exhibits the highest divergence (chi-square = 21.98, significant). For example, we see that the CL model asks more location questions and fewer action questions than humans. This suggests that the agent trained with the cooperative learning regime achieves its high performance by effectively learning its own dialogue policies.

This is also apparent when we analyse in more detail the structure of the dialogues in terms of the sequences of question types asked. As expected, both humans and models almost always start with an entity question (around 97% of dialogues for all models and 78.48% for humans), in particular a super-category question (around 70% for all models and 52.32% for humans). In some cases, humans start by asking directly about an attribute that may easily distinguish an object from the others, while this is very uncommon for the models. The heat map on the right in Figure 3 illustrates the structure of the dialogues of our CL model. (The heat maps for humans and the other models, as well as statistics on the question types at the start of the dialogues, are reported in the Supplementary Material.)

Figure 4: Example of a successful game played by our CL model. The model guesses the target correctly, although some of the questions and answers are difficult to interpret with respect to the image.

To further analyse whether the models have learned a common-sense dialogue policy, we check how the answer to a given question type affects the type of the follow-up question. In principle, we expect question types that are answered positively to be followed by questions of a more specific type. This is indeed what we observe in the human dialogues, as shown in the bottom rows of Table 2: for example, when a super-category question is answered positively, humans follow up with an object or attribute question 89.56% of the time. This trend is mirrored by the supervised models (BL and SL), albeit to a lesser extent: for instance, the BL model transitions to a more specific question type only 66.44% of the time after a positively answered super-category question. The CL model, in contrast, follows this common-sense pattern to a larger extent than humans: almost always (96.44% and 97.95% in Table 2), the agent moves on to ask a more specific question after receiving a Yes answer to a more general question type. Thus, our CL model seems to learn strategies that are somewhat simplistic, but end up being more effective than those learned only via SL. Given the intrinsic limitations of the agents compared to humans, trying to emulate human data through strict supervision may be detrimental.

6.3 Dialogue state representations

A key innovation of our agent model is the encoding and training of the dialogue state: thanks to the end-to-end training, the state representations not only encode the visually-grounded dialogue context, but are also progressively updated during training to optimise the objectives of the question generation and guessing components.

Since our visually-grounded dialogue state is not symbolic, its inner workings are not transparent. To gain insight in this respect, here we analyse how the representations learned by the system evolve over the course of a dialogue. In particular, we are interested in understanding how the dialogue state and the probability mass assigned to the target object change after each question-answer pair as a function of the type of information exchanged in that turn. For each question-answer pair QA_t, we compute:

  • [leftmargin=13pt]

  • the increase in the probability mass assigned to the target object from to , and

  • the cosine distance between the dialogue state at turn t−1 and the dialogue state at turn t, once ⟨q_t, a_t⟩ has been processed. (The dialogue state at the beginning of a game is taken to be the hidden state of the encoder after processing the <start> token. The initial probability assigned to the target object depends on the number of candidate objects in the image.)

In addition, for each pair ⟨q_t, a_t⟩, we extract:


  • the type of q_t: super-cat, object, or attribute

  • the type of a_t: Yes or No

  • the index t: the position of the pair in the dialogue
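The two turn-level quantities above can be computed directly from the sequence of encoder states and Guesser outputs. A minimal sketch (the helper name and input layout are our assumptions, not the paper's code):

```python
import numpy as np

def turn_features(states, target_probs):
    """For each question-answer pair t, compute (i) the cosine distance
    between the dialogue state at t-1 and at t and (ii) the increase in
    probability mass assigned to the target object. Hypothetical helper:
    `states` is a (T+1, d) array of encoder states (index 0 = the state
    after <start>); `target_probs` has the Guesser's probability for the
    target object after each turn."""
    cos_dist, prob_delta = [], []
    for t in range(1, len(states)):
        a, b = states[t - 1], states[t]
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        cos_dist.append(1.0 - cos)
        prob_delta.append(target_probs[t] - target_probs[t - 1])
    return cos_dist, prob_delta
```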

Using these three features as independent predictors, we fit two linear regression models: reg_state_change, with the dialogue state cosine distance as the dependent variable, and reg_target_prob, with the increase in target object probability as the dependent variable. We fit reg_state_change on all the dialogues in the test set, separately for our SL agent and our CL agent. For reg_target_prob, we restrict the analysis to successful dialogues, where the target object is correctly identified.

The model fitting results show the same trends for both the SL and CL agents. Questions at the beginning of the dialogue bring about more change to the dialogue state and more increase in probability assigned to the target object (significant negative regression coefficient for index across the board). There is an interaction effect between question and answer types: positively answered entity questions bring about the most change to the state, even when controlling for temporal index. object questions that receive a Yes answer are the ones that lead to the highest target probability increase.
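Such a regression, with dummy-coded question and answer types, the turn index, and question-type × answer-type interaction terms, can be fitted by ordinary least squares. A toy sketch on synthetic data (not the paper's data; coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Dummy-coded predictors (baseline question type: attribute)
is_object = rng.integers(0, 2, n)                 # question type: object
is_supercat = rng.integers(0, 2, n) * (1 - is_object)  # question type: super-cat
is_yes = rng.integers(0, 2, n)                    # answer type: Yes
index = rng.integers(1, 11, n)                    # position in the dialogue
# Synthetic dependent variable, e.g. dialogue-state cosine distance:
# large effect of super-cat questions, decay with turn index, and a
# Yes x object interaction, plus small noise.
y = 0.3 * is_supercat - 0.04 * index + 0.1 * is_yes * is_object \
    + rng.normal(0, 0.01, n)

# Design matrix: intercept, main effects, and interactions
X = np.column_stack([np.ones(n), is_object, is_supercat, is_yes, index,
                     is_yes * is_object, is_yes * is_supercat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

On data generated this way, the fitted coefficient for index is negative and the super-cat coefficient is large and positive, mirroring the pattern reported above.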

Note that the agents have not been explicitly trained on any of the distinctions expressed by the features in the regression models (question type, answer type, and index). Thus, any significant relationships we find between these variables and the model representations shed light on what has autonomously been learned by the models.

A robust observation arising from this analysis is that later turns in the dialogue tend to bring in little information. This is perhaps not surprising: the constraint of always asking 10 questions forces the models to keep asking even when they have collected enough information to guess the object. In the next section, we remove this artificial restriction and extend our model with a decision-making component that decides when to stop asking and make a guess. (In the Supplementary Material we provide details on the regression models, together with plots showing how the cosine distance between dialogue state representations and the probability of the target object evolve over the course of a dialogue.)

7 Adding a Decision-Making Module

[Shekhar et al.2018] modify the Questioner model of [de Vries et al.2017] by adding a decision-making component (DM) that decides whether to ask a follow-up question or to stop the conversation to make a guess. Two versions of this model are proposed: the decider receives as input either the hidden state computed by the QGen or the one computed by the Guesser, obtaining 40.02% and 41.2% accuracy, respectively, when allowing up to 10 questions. With the common Encoder in our model, which merges the language and vision inputs and interacts with both the QGen and the Guesser, we can train the DM jointly with the other modules. When allowing a maximum of 10 questions, adding a decider results in lower task success accuracy (48.11% with GDSE-SL and 55.24% with GDSE-CL), but improves the quality of the dialogues by significantly reducing the number of repetitions: only 34.82% and 23.91% of the games played by the SL and CL models, respectively, contain at least one repeated question. See the Supplementary Material for further details on this extension.
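The decision-making extension can be sketched as a small binary classifier over the shared encoder state; this is our reading of the design, with sizes and names as assumptions rather than the authors' code:

```python
import torch
import torch.nn as nn

class DecisionMaker(nn.Module):
    """Hedged sketch of a DM module: a small MLP over the shared
    visually-grounded dialogue state that decides whether to ask a
    follow-up question or to stop and let the Guesser guess.
    The 512-d state size matches the encoder output described in the
    Supplementary Material; the hidden size is an assumption."""
    def __init__(self, state_dim=512, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2))  # logits over {ask, guess}

    def forward(self, dialogue_state):
        return self.mlp(dialogue_state)

dm = DecisionMaker()
state = torch.zeros(1, 512)               # placeholder dialogue state
logits = dm(state)
decision = "guess" if logits.argmax(dim=-1).item() == 1 else "ask"
```

Because the DM shares the encoder with the QGen and the Guesser, its cross-entropy loss can simply be added to the multi-task objective and backpropagated through the common state.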

8 Conclusion

We have developed a Questioner model for goal-oriented dialogue that jointly optimises the different tasks the agent faces. Our agent has to learn to ground language into vision, ask questions, and guess a target object in an image to play the GuessWhat?! game. By using a visually-grounded dialogue state encoder common to both the question generation and the guessing components, we have been able to apply a two-phase learning approach: a supervised regime followed by cooperative learning whereby the Questioner’s sub-modules jointly learn to play the overall agent’s role. By addressing a foundational weakness of previous models for the GuessWhat?! task, we have increased task accuracy by 20%. Analysis of the dialogues shows that our best model adapts its strategy to its own skills, while still being reasonably close to human behaviour. Analysis of the dialogue state representations suggests that the model distinguishes between different categories as expressed by different question types and links them to visual information. Using attention mechanisms [Zhuang et al.2018] to investigate this link further is a promising line for future research.


  • [Bordes, Boureau, and Weston2017] Bordes, A.; Boureau, Y.-L.; and Weston, J. 2017. Learning end-to-end goal oriented dialog. In Proceedings of ICLR.
  • [Clark and Wilkes-Gibbs1986] Clark, H. H., and Wilkes-Gibbs, D. 1986. Referring as a collaborative process. Cognition 22(1):1–39.
  • [Clark1996] Clark, H. H. 1996. Using Language. Cambridge University Press.
  • [Das et al.2017a] Das, A.; Kottur, S.; Gupta, K.; Singh, A.; Yadav, D.; Moura, J. M.; Parikh, D.; and Batra, D. 2017a. Visual Dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [Das et al.2017b] Das, A.; Kottur, S.; Moura, J. M.; Lee, S.; and Batra, D. 2017b. Learning cooperative visual dialog agents with deep reinforcement learning. In International Conference on Computer Vision (ICCV).
  • [de Vries et al.2017] de Vries, H.; Strub, F.; Chandar, S.; Pietquin, O.; Larochelle, H.; and Courville, A. C. 2017. Guesswhat?! Visual object discovery through multi-modal dialogue. In Conference on Computer Vision and Pattern Recognition (CVPR).
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
  • [Kim et al.2014] Kim, D.; Breslin, C.; Tsiakoulis, P.; Gašić, M.; Henderson, M.; and Young, S. 2014. Inverse reinforcement learning for micro-turn management. In Fifteenth Annual Conference of the International Speech Communication Association.
  • [Kingma and Ba2015] Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.
  • [Lee, Heo, and Zhang2017a] Lee, S.-W.; Heo, Y.-J.; and Zhang, B.-T. 2017a. Answerer in questioner’s mind for goal-oriented visual dialogue. In NIPS Workshop on Visually-Grounded Interaction and Language (ViGIL). arXiv:1802.03881. Last version Feb. 2018.
  • [Lee, Heo, and Zhang2017b] Lee, S.-W.; Heo, Y.; and Zhang, B.-T. 2017b. Answerer in questioner’s mind for goal-oriented visual dialogue. In NIPS Workshop on Visually-Grounded Interaction and Language (ViGIL).
  • [Lewis1979] Lewis, D. 1979. Scorekeeping in a language game. Journal of Philosophical Logic 8(1):339–359.
  • [Li et al.2016a] Li, J.; Galley, M.; Brockett, C.; Gao, J.; and Dolan, B. 2016a. A diversity-promoting objective function for neural conversation models. In Proceedings of NAACL-2016, 110–119.
  • [Li et al.2016b] Li, J.; Monroe, W.; Ritter, A.; Galley, M.; Gao, J.; and Jurafsky, D. 2016b. Deep reinforcement learning for dialogue generation. In Proceedings of EMNLP.
  • [Lin et al.2014] Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common objects in context. In Proceedings of ECCV (European Conference on Computer Vision).
  • [Serban et al.2016] Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A.; and Pineau, J. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.
  • [Shekhar et al.2018] Shekhar, R.; Baumgärtner, T.; Venkatesh, A.; Bruni, E.; Bernardi, R.; and Fernandez, R. 2018. Ask no more: Deciding when to guess in referential visual dialogue. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) (to appear). arXiv:1805.06960.
  • [Simonyan and Zisserman2014] Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [Sordoni et al.2015] Sordoni, A.; Galley, M.; Auli, M.; Brockett, C.; Ji, Y.; Mitchell, M.; Nie, J.-Y.; Gao, J.; and Dolan, B. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of NAACL-HLT, 196–205.
  • [Stalnaker1978] Stalnaker, R. 1978. Assertion. In Cole, P., ed., Pragmatics, volume 9 of Syntax and Semantics. New York Academic Press.
  • [Strub et al.2017] Strub, F.; de Vries, H.; Mary, J.; Piot, B.; Courville, A.; and Pietquin, O. 2017. End-to-end optimization of goal-driven and visually grounded dialogue systems. In Joint Conference on Artificial Intelligence.
  • [Sutskever, Vinyals, and Le2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 3104–3112.
  • [Vinyals and Le2015] Vinyals, O., and Le, Q. V. 2015. A neural conversational model. In ICML Deep Learning Workshop.
  • [Williams, Asadi, and Zweig2017] Williams, J.; Asadi, K.; and Zweig, G. 2017. Hybrid code networks: Practical and efficient end-to-end dialog control with supervised and reinforcement learning. In Proceedings of ACL-2017. Association for Computational Linguistics.
  • [Williams et al.2013] Williams, J.; Raux, A.; Ramachandran, D.; and Black, A. 2013. The dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, 404–413.
  • [Young et al.2013] Young, S.; Gašić, M.; Thomson, B.; and Williams, J. D. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE 101(5).
  • [Yule1997] Yule, G. 1997. Referential communication tasks. Routledge.
  • [Zarrieß et al.2016] Zarrieß, S.; Hough, J.; Kennington, C.; Manuvinakurike, R.; DeVault, D.; Fernández, R.; and Schlangen, D. 2016. Pentoref: A corpus of spoken references in task-oriented dialogues. In 10th edition of the Language Resources and Evaluation Conference.
  • [Zhang et al.2017] Zhang, J.; Wu, Q.; Shen, C.; Zhang, J.; and Lu, J. 2017. Asking the difficult questions: Goal-oriented visual question generation via intermediate rewards. arXiv:1711.07614.
  • [Zhao and Eskenazi2016] Zhao, T., and Eskenazi, M. 2016. Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning. In Proceedings of SIGDIAL-2016.
  • [Zhu, Zhang, and Metaxas2017] Zhu, Y.; Zhang, S.; and Metaxas, D. 2017. Interactive reinforcement learning for object grounding via self-talking. In NIPS Workshop on Visually-Grounded Interaction and Language (ViGIL).
  • [Zhuang et al.2018] Zhuang, B.; Wu, Q.; Shen, C.; Reid, I.; and van den Hengel, A. 2018. Parallel attention: A unified framework for visual object discovery through dialogs and queries. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Details on the GuessWhat?! dataset

The GuessWhat?! dataset contains 77,973 images with 609,543 objects. It consists of around 155K dialogues with around 821K question/answer pairs, composed from a vocabulary of 4,900 words (each occurring at least 5 times), on 66,537 unique images and 134,073 objects. Answers are Yes (52.2%), No (45.6%) and NA (not applicable, 2.2%); dialogues contain on average 5.2 questions, and there are on average 2.3 dialogues per image. Dialogues are successful (84.6%), unsuccessful (8.4%) or not completed (7.0%).

Models and Experimental Setting


Figure 5 illustrates the architecture of the Oracle model by [de Vries et al.2017], which we re-implemented for our study.

Details on our GDSE model

As explained in Section 4.2, our architecture consists of a visually-grounded dialogue state Encoder, a Question Generator (QGen) module and a Guesser module. Both the QGen and the Guesser are trained using ground-truth data. The Encoder is updated during the training of both the QGen and the Guesser through backpropagation.

In the Encoder, the dialogue history is represented by word embeddings of dimension 512, which are processed by an LSTM with a hidden layer of 1024 dimensions. The LSTM output is then concatenated with visual features pre-computed from the penultimate layer of ResNet-152, which have a dimension of 2048. To pre-compute the visual features, all images are resized to 224×224 and passed through ResNet-152. The combined features (1024 + 2048 = 3072) are then passed through a linear layer that scales them down to 512 dimensions, followed by a tanh non-linearity; the resulting representation is the final encoder output, which is the input to the QGen and the Guesser.
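The Encoder just described can be sketched in PyTorch as follows; this is a minimal reading of the text above (class and argument names are our assumptions), with ResNet-152 features assumed pre-computed:

```python
import torch
import torch.nn as nn

class GroundedDialogueEncoder(nn.Module):
    """Sketch of the visually-grounded dialogue state Encoder: 512-d
    word embeddings -> LSTM(1024) -> concat with 2048-d ResNet-152
    features -> linear down to 512 -> tanh."""
    def __init__(self, vocab_size=4900, emb_dim=512, lstm_dim=1024,
                 visual_dim=2048, state_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, lstm_dim, batch_first=True)
        self.project = nn.Linear(lstm_dim + visual_dim, state_dim)

    def forward(self, token_ids, visual_feats):
        # token_ids: (batch, seq_len); visual_feats: (batch, 2048)
        _, (h, _) = self.lstm(self.embed(token_ids))
        combined = torch.cat([h[-1], visual_feats], dim=-1)  # (batch, 3072)
        return torch.tanh(self.project(combined))            # (batch, 512)

enc = GroundedDialogueEncoder()
state = enc(torch.zeros(2, 7, dtype=torch.long), torch.zeros(2, 2048))
```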

Figure 5: Oracle model.

The QGen acts as the decoder in a seq2seq model: an LSTM with a hidden layer of 512 dimensions. As in standard seq2seq models, the Encoder representation is used as the initial hidden state of the decoder. At each step, the QGen receives as input the word embedding (size 512) concatenated with image features scaled down from 2048 to 512 dimensions. For each candidate object, the Guesser receives the representation of the object category, viz., a dense category embedding obtained from its one-hot class vector using a learned look-up table, together with its spatial representation, viz., an 8-dimensional vector.
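A hedged sketch of the Guesser is given below. The object representation (category embedding plus 8-d spatial vector) follows the text above; the MLP sizes and the dot-product scoring against the dialogue state are our assumptions, as the text does not fully specify them:

```python
import torch
import torch.nn as nn

class Guesser(nn.Module):
    """Sketch: each candidate object is a learned category embedding
    concatenated with an 8-d spatial vector, mapped by an MLP into the
    dialogue-state space; objects are scored by dot product with the
    encoder's dialogue state and normalised with a softmax."""
    def __init__(self, n_categories=90, cat_dim=256, spatial_dim=8,
                 state_dim=512):
        super().__init__()
        self.cat_embed = nn.Embedding(n_categories, cat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(cat_dim + spatial_dim, state_dim), nn.ReLU(),
            nn.Linear(state_dim, state_dim))

    def forward(self, dialogue_state, categories, spatial):
        # dialogue_state: (batch, 512); categories: (batch, n_obj);
        # spatial: (batch, n_obj, 8)
        objs = torch.cat([self.cat_embed(categories), spatial], dim=-1)
        obj_repr = self.mlp(objs)                       # (batch, n_obj, 512)
        scores = (obj_repr @ dialogue_state.unsqueeze(-1)).squeeze(-1)
        return scores.softmax(dim=-1)                   # prob. per object

g = Guesser()
probs = g(torch.zeros(1, 512),
          torch.zeros(1, 5, dtype=torch.long),
          torch.zeros(1, 5, 8))
```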

The QGen and the Guesser are optimised during training by minimising the negative log-likelihood of the generated question words and of the correct object selection, respectively. Both are trained using the Adam optimizer with a learning rate of 0.0001. All hyperparameters are tuned on the validation set. Training is stopped when there is no improvement in validation game accuracy for 10 consecutive epochs, and the best epoch is kept.
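The stopping criterion above amounts to patience-based early stopping; a minimal sketch with hypothetical helper names (`train_epoch`, `validate` are placeholders, not the paper's code):

```python
def train_with_early_stopping(train_epoch, validate, patience=10):
    """Run training epochs, tracking validation game accuracy; stop
    once `patience` consecutive epochs bring no improvement and return
    the best epoch and its accuracy."""
    best_acc, best_epoch, epoch = -1.0, -1, 0
    while epoch - best_epoch <= patience:
        train_epoch()
        acc = validate()
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        epoch += 1
    return best_epoch, best_acc
```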

Additional Results

Here, we report additional results for the New Object setting and show the utility of the decision-making component (DM). As shown in Table 3, the introduction of the new architecture trained with SL (GDSE-SL) yields an increase in accuracy of 9.5% (from 43.5 to 53.0) with respect to the Baseline model. The introduction of the cooperative learning approach (GDSE-CL) brings a further improvement over GDSE-SL: 10.3% (from 53.0 to 63.3). Overall, with respect to the Baseline we obtain an improvement of 19.8% (from 43.5 to 63.3). Our GDSE-CL model's performance (63.3%) is better than or comparable to that of reinforcement-learning-based models: [Strub et al.2017] (60.3%) and [Zhang et al.2017] (63.9%), respectively.

In Table 4, we report detailed results for all our models, including those extended with a decision-making component (DM), as explained in Section 7 of the paper. Here, we analyse the utility of the DM module. We report performance with a maximum of 5 and 10 questions, because the Baseline model performs best with a maximum of 5 questions (5Q), while our GDSE performs best with a maximum of 10 questions (10Q). From Table 4, we can observe that the GDSE model with the DM component outperforms the Baseline in all cases while asking comparatively fewer questions. In the SL setting, it outperforms the Baseline by 4.3% for 5Q (41.2 vs 45.5) with on average only 3.93 questions, and by 8.3% for 10Q (39.8 vs 48.1) with on average only 5.32 questions. Similarly, in the CL setting, it outperforms the Baseline by 11.0% for 5Q (41.2 vs 52.2) with on average only 4.03 questions, and by 15.4% for 10Q (39.8 vs 55.2) with on average only 5.77 questions. With respect to the GDSE models without DM, there is a drop in accuracy. However, looking at the games (see Figures 8 and 9), the quality of the dialogues is better with DM than without: the GDSE-DM dialogues are more natural, as the model stops the dialogue when enough information has been acquired.

Model Accuracy
Baseline 43.5
GDSE-SL 53.0
GDSE-CL 63.3
Strub et al. 2017 60.3
Zhang et al. 2017 63.9
Table 3: Test set accuracy scores on task success in the New Objects setting for each model with its best performing settings.
Model 5Q 10Q
Baseline 41.2 39.8
GDSE-SL 47.8 51.2
GDSE-CL 54.8 60.7
GDSE-DM-SL 45.5 (3.93) 48.1 (5.32)
GDSE-DM-CL 52.2 (4.03) 55.2 (5.77)
Table 4: Accuracy scores on the test set (New Image). For models with a decision-making component (DM), the average number of questions is given in brackets.


We provide further details and visualizations related to the analyses carried out in Section 6 of the main paper.

Input: Question Q and annotated words (from Table 1).
Output: Question classification T

Let W denote the set of words in question Q.
Let K_color, K_shape, K_size, K_texture, K_location, K_action, K_object and K_super denote the keyword lists for ‘Color’, ‘Shape’, ‘Size’, ‘Texture’, ‘Location’, ‘Action’, ‘Object’ and ‘Super-category’, respectively.
Let T be an empty list.
for each attribute class c in (Color, Shape, Size, Texture, Location, Action) do
    for each word w in W do
        if w ∈ K_c then
            append c to T
            break
        end if
    end for
end for
if T is empty then
    for each word w in W do
        if w ∈ K_object then
            assign object to T
            break
        end if
    end for
end if
if T is empty then
    for each word w in W do
        if w ∈ K_super then
            assign super-category to T
            break
        end if
    end for
end if
if T is empty then
    assign not-classified to T
end if
Algorithm 1: Question Classification.

Question Classification

Question classification makes use of keywords. These keywords have been annotated using information in the MS-COCO dataset plus manual annotation. First, we created the possible question categories by inspecting the human dialogues. As explained in the paper, the resulting categories are Entity, subdivided into Super-category and Object, and Attribute, sub-divided into Color, Location, Shape, Size, Texture and Action. We exploited the super-category and object annotations from MS-COCO. To further enrich these annotations, we manually annotated the words in the human dialogues that occur at least 40 times in the training and testing sets. In Table 7, we report the complete list of keywords highlighting those obtained from COCO. Algorithm 1 provides the pseudo-code of the question classification heuristics we used. Table 5 provides some examples of the resulting classification.
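Algorithm 1 can be made runnable directly; the sketch below uses small illustrative subsets of the keyword lists in Table 7 (the action keywords are our assumption, as that list is not reproduced here), not the full annotation:

```python
# Keyword subsets (illustrative; see Table 7 for the full lists).
ATTRIBUTE_KEYWORDS = {
    "color": {"white", "red", "black", "blue", "color"},
    "shape": {"round", "square", "shape"},
    "size": {"small", "little", "big", "large"},
    "texture": {"wooden", "metal", "striped"},
    "location": {"left", "right", "middle", "front"},
    "action": {"wearing", "holding", "sitting"},  # assumed subset
}
OBJECT_KEYWORDS = {"basket", "plate", "shirt", "bench", "chair"}
SUPER_KEYWORDS = {"human", "animal", "vehicle", "food", "furniture"}

def classify_question(question):
    """Classify a question following Algorithm 1: collect all matching
    attribute types; if none, fall back to object, then super-category,
    then not-classified."""
    words = question.lower().rstrip("?").split()
    types = [cls for cls, keywords in ATTRIBUTE_KEYWORDS.items()
             if any(w in keywords for w in words)]
    if not types and any(w in OBJECT_KEYWORDS for w in words):
        types = ["object"]
    if not types and any(w in SUPER_KEYWORDS for w in words):
        types = ["super-category"]
    return types or ["not-classified"]
```

On the examples of Table 5, this reproduces the reported classes, e.g. "is it a basket?" → object and "is it a human?" → super-category.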

Question Question type
is it a basket? object
is it a human? super-category
is it the person in the middle? location
is the person wearing a white shirt? color
is it the round table? shape
is it the little plate? size
is he wearing a striped shirt? texture

Table 5: Examples of questions from the human dialogues with keywords in italics and the types assigned through our classification procedure.
entity 97.26 97.26 96.64 78.48
attribute 1.72 1.39 2.54 13.95
object 22.03 28.52 28.22 26.16
super-cat 75.24 68.74 68.43 52.32
Table 6: Percentages of question type of the first question in the dialogues.
Super-category ‘person’, ‘vehicle’, ‘outdoor’, ‘animal’, ‘accessory’, ‘sports’, ‘kitchen’, ‘food’, ‘furniture’, ‘electronic’, ‘appliance’, ‘indoor’, ‘utensil’, ‘human’, ‘cloth’, ‘cloths’, ‘clothing’, ‘people’, ‘persons’
object ‘bicycle’, ‘car’, ‘motorcycle’, ‘airplane’, ‘bus’, ‘train’, ‘truck’, ‘boat’, ‘traffic light’, ‘fire hydrant’, ‘stop sign’, ‘parking meter’, ‘bench’, ‘bird’, ‘cat’, ‘dog’, ‘horse’, ‘sheep’, ‘cow’, ‘elephant’, ‘bear’, ‘zebra’, ‘giraffe’, ‘backpack’, ‘umbrella’, ‘handbag’, ‘tie’, ‘suitcase’, ‘frisbee’, ‘skis’, ‘snowboard’, ‘sports ball’, ‘kite’, ‘baseball bat’, ‘baseball glove’, ‘skateboard’, ‘surfboard’, ‘tennis racket’, ‘bottle’, ‘wine glass’, ‘cup’, ‘fork’, ‘knife’, ‘spoon’, ‘bowl’, ‘banana’, ‘apple’, ‘sandwich’, ‘orange’, ‘broccoli’, ‘carrot’, ‘hot dog’, ‘pizza’, ‘donut’, ‘cake’, ‘chair’, ‘couch’, ‘potted plant’, ‘bed’, ‘dining table’, ‘toilet’, ‘tv’, ‘laptop’, ‘mouse’, ‘remote’, ‘keyboard’, ‘cell phone’, ‘microwave’, ‘oven’, ‘toaster’, ‘sink’, ‘refrigerator’, ‘book’, ‘clock’, ‘vase’, ‘scissors’, ‘teddy bear’, ‘hair drier’, ‘toothbrush’, ‘meter’, ‘bear’, ‘cell’, ‘phone’, ‘wine’, ‘glass’, ‘racket’, ‘baseball’, ‘glove’, ‘hydrant’, ‘drier’, ‘kite’, ‘sofa’, ‘fork’, ‘adult’, ‘arms’, ‘baby’, ‘bag’, ‘ball’, ‘bananas’, ‘basket’, ‘bat’, ‘batter’, ‘bike’, ‘birds’, ‘board’, ‘body’, ‘books’, ‘bottles’, ‘box’, ‘boy’, ‘bread’, ‘brush’, ‘building’, ‘bunch’, ‘cabinet’, ‘camera’, ‘candle’, ‘cap’, ‘carrots’, ‘cars’, ‘cart’, ‘case’, ‘catcher’, ‘cell phone’, ‘chairs’, ‘child’, ‘chocolate’, ‘coat’, ‘coffee’, ‘computer’, ‘controller’, ‘counter’, ‘cows’, ‘cupboard’, ‘cups’, ‘curtain’, ‘cycle’, ‘desk’, ‘device’, ‘dining table’, ‘dish’, ‘doll’, ‘door’, ‘dress’, ‘driver’, ‘equipment’, ‘eyes’, ‘fan’, ‘feet’, ‘female’, ‘fence’, ‘fire’, ‘flag’, ‘flower’, ‘flowers’, ‘foot’, ‘frame’, ‘fridge’, ‘fruit’, ‘girl’, ‘girls’, ‘glasses’, ‘guy’, ‘guys’, ‘hair drier’, ‘handle’, ‘hands’, ‘hat’, ‘helmet’, ‘house’, ‘jacket’, ‘jar’, ‘jeans’, ‘kid’, ‘kids’, ‘lady’, ‘lamp’, ‘leg’, ‘legs’, ‘luggage’, ‘machine’, ‘male’, ‘man’, ‘meat’, ‘men’, ‘mirror’, ‘mobile’, ‘monitor’, ‘mouth’, ‘mug’, ‘napkin’, ‘pan’, ‘pants’, ‘paper’, ‘pen’, ‘picture’, ‘pillow’, ‘plant’, ‘plate’, ‘player’, ‘players’, ‘pole’, ‘pot’, ‘purse’, ‘rack’, ‘racket’, ‘road’, ‘roof’, ‘screen’, ‘shelf’, ‘shelves’, ‘shirt’, ‘shoe’, ‘shoes’, ‘short’, ‘shorts’, ‘shoulder’, ‘signal’, ‘sign’, ‘silverware’, ‘skate’, ‘ski’, ‘sky’, ‘snow’, ‘soap’, ‘speaker’, ‘stairs’, ‘statue’, ‘stick’, ‘stool’, ‘stove’, ‘street’, ‘suit’, ‘sunglasses’, ‘suv’, ‘teddy’, ‘tennis’, ‘tent’, ‘tomato’, ‘towel’, ‘tower’, ‘toy’, ‘traffic’, ‘tray’, ‘tree’, ‘trees’, ‘t-shirt’, ‘tshirt’, ‘vegetable’, ‘vest’, ‘wall’, ‘watch’, ‘wheel’, ‘wheels’, ‘window’, ‘windows’, ‘woman’, ‘women’

Color ‘white’, ‘red’, ‘black’, ‘blue’, ‘green’, ‘yellow’, ‘orange’, ‘brown’, ‘pink’, ‘grey’, ‘gray’, ‘dark’, ‘purple’, ‘color’, ‘colored’, ‘colour’, ‘blond’, ‘beige’, ‘bright’
Size ‘small’, ‘little’, ‘long’, ‘large’, ‘largest’, ‘big’, ‘tall’, ‘smaller’, ‘bigger’, ‘biggest’, ‘tallest’
Texture ‘metal’, ‘silver’, ‘wood’, ‘wooden’, ‘plastic’, ‘striped’, ‘liquid’
Shape ‘circle’, ‘rectangle’, ‘round’, ‘shape’, ‘square’, ‘triangle’
Location ‘1st’, ‘2nd’, ‘third’, ‘3’, ‘3rd’, ‘four’, ‘4th’, ‘fourth’, ‘5’, ‘5th’, ‘five’, ‘first’, ‘second’, ‘last’, ‘above’, ‘across’, ‘after’, ‘around’, ‘at’, ‘away’, ‘back’, ‘background’, ‘before’, ‘behind’, ‘below’, ‘beside’, ‘between’, ‘bottom’, ‘center’, ‘close’, ‘closer’, ‘closest’, ‘corner’, ‘directly’, ‘down’, ‘edge’, ‘end’, ‘entire’, ‘facing’, ‘far’, ‘farthest’, ‘floor’, ‘foreground’, ‘from’, ‘front’, ‘furthest’, ‘ground’, ‘hidden’, ‘in’, ‘inside’, ‘left’, ‘leftmost’, ‘middle’, ‘near’, ‘nearest’, ‘next’, ‘next to’, ‘off’, ‘on’, ‘out’, ‘outside’, ‘over’, ‘part’, ‘right’, ‘rightmost’, ‘row’, ‘side’, ‘smaller’, ‘top’, ‘towards’, ‘under’, ‘up’, ‘upper’, ‘with’
Table 7: Lists of keywords used to classify questions with the corresponding class according to Algorithm 1. Words in italics come from COCO object category/super-category.

Dialogue Structure and Policy

Table 6 shows how dialogues start, i.e., the percentages of the type of questions used right at the beginning of a game.

The heat maps in Figure 6 show the structure of the dialogue policies followed by humans and by the different models over the course of the dialogues. The maps have been built by computing the percentages of questions per type and position in the dialogue over all games in the test set.

(a) Baseline (b) Human (c) GDSE-SL (d) GDSE-CL
Figure 6: Structure of the dialogues by the Baseline model (Fig. 6(a)) compared with Human (Fig. 6(b)), our GDSE-SL (Fig. 6(c)) and GDSE-CL (Fig. 6(d)). Each cell in the heat map shows the percentage over all questions in the test set. Each time step on the x-axis corresponds to a question-answer pair.

Question Repetition

In Table 8, we look at question repetitions, detected by exact string matching. We see that in the games played by the Baseline (BL), more than 98% contain some form of repetition. For GDSE, games played in the supervised learning setting have more repetition than in the cooperative learning one (66.24% vs. 59.97%). Further, the DM module reduces repetition by more than 31 percentage points.

We further examined where repetitions occur, i.e., at the start or towards the end of the dialogue. For this, we computed whether there is any repetition after a given question in the dialogue. For the Baseline, around 75% of the dialogues have repetitions towards the end of the dialogue. This also explains why 5 is the optimal number of questions for the Baseline. Even for the GDSE-based models, around 50% of the repetitions happen towards the end of the dialogue. Interestingly, the majority of the repetitions are consecutive question repetitions. These consecutive questions do not add any new information, which speaks in favour of the DM, where repetition is comparatively very low (see Table 8).
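The counts in Table 8 can be reproduced with simple string matching over the generated questions; a sketch with a hypothetical helper (the input layout is our assumption):

```python
def repetition_stats(dialogues):
    """Compute the percentage of games with at least one repeated
    question and with at least one consecutive repetition, using exact
    string matching as in the analysis above. `dialogues` is a list of
    question lists, one per game."""
    n = len(dialogues)
    any_rep = sum(1 for qs in dialogues if len(set(qs)) < len(qs))
    consec_rep = sum(1 for qs in dialogues
                     if any(a == b for a, b in zip(qs, qs[1:])))
    return {"% games with a repetition": 100 * any_rep / n,
            "% games with a consecutive repetition": 100 * consec_rep / n}
```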

Model | % games having at least 1 repetition | % games having at least 1 consecutive repetition | % games having at least 1 repetition after a given question | % games having at least 1 consecutive repetition after a given question
Baseline 98.08 72.26 74.75 59.75
GDSE-SL 66.24 42.83 30.25 24.34
GDSE-CL 59.97 44.44 32.62 26.86
GDSE-DM-SL 34.82 24.97 14.44 12.51
GDSE-DM-CL 23.91 16.83 10.04 7.77
Table 8: Percentages of repeated questions in all games for different models.

Dialogue State Representations

In Table 9, we provide the regression coefficients of the linear regression models described in Section 6.3.

reg_state_change reg_target_prob
type of object 0.0252836 0.0571792 0.0122921 0.0224780
type of super-category 0.2970974 0.3024038 0.0045200 0.0395073
type of Yes 0.0431628 0.0552425 0.0061232 0.0113060
index -0.0444708 -0.0417552 -0.0052097 -7.035e-03
type of Yes : type of object 0.0902804 3.002e-01 0.0303864 1.005e-01
type of Yes : type of super-category 0.1193137 2.803e-01 -0.0099801 7.391e-02
Table 9: Estimated regression coefficients for the linear regression models. A positive/negative coefficient indicates a positive/negative correlation between a predictor and the dependent variable. The contribution of all predictors is statistically significant in all models.

In Figure 7, we show the cosine distance and the increase in the probability of the target object at every new question-answer pair. We can observe that after the initial questions, the target object probability is already high. Subsequently, the change in probability depends on the answer to the next question: a Yes answer brings a larger positive change than a No. If questions are repeated, as in the GDSE-SL dialogue, there is very little change in the probability of the target object. As for state change, initially there is a large distance between the states of consecutive questions. When questions are repeated (GDSE-SL), the state remains almost the same, whereas with new questions the state always changes, and changes are comparatively steep when the answer goes from No to Yes.

(a) Image and corresponding dialogue produced by GDSE-SL and GDSE-CL, respectively.


Figure 7: Cosine distance (Blue) and change in target object probability (Green) between consecutive questions.

Quality of the Dialogues

Though the accuracy of GDSE decreases when the decision-making component is added (see Table 4), the quality of the games improves. As Figures 8 and 9 show, the dialogues become more natural and the model stops asking questions when enough information has been acquired.

Figure 8: Example of a game in which the fixed number of questions is a disadvantage and GDSE-DM-SL properly decides when to stop. Note also the Attribute questions asked by all the models after identifying the target Object category.
Figure 9: Example of a dialogue where GDSE-DM-SL stopped asking questions after gathering enough information to find the target object, while the other models had to ask the maximum number of questions. In particular, GDSE-SL and GDSE-CL should have stopped after fewer questions.