The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

09/29/2017
by Yanchao Yu, et al.

We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously.
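To make the idea of an n-gram user simulation concrete, the following is a minimal sketch and not the authors' framework. It assumes each dialogue has already been reduced to a sequence of action labels; the labels (`ask_color`, `inform`, `ack`), the function names, and the toy data are all hypothetical. The model counts which action follows each context of n-1 preceding actions and samples the next action in proportion to those counts.

```python
import random
from collections import Counter, defaultdict


def train_ngram(dialogues, n=2):
    """Count which action follows each context of n-1 preceding actions."""
    counts = defaultdict(Counter)
    for actions in dialogues:
        # Pad so the first real action has a full context and the
        # end of the dialogue is itself a predictable event.
        padded = ["<s>"] * (n - 1) + list(actions) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts


def next_action(counts, history, n=2, rng=random):
    """Sample the next action given the last n-1 actions of the history."""
    padded = ["<s>"] * (n - 1) + list(history)
    context = tuple(padded[len(padded) - (n - 1):])
    options = counts.get(context)
    if not options:
        # Unseen context: this sketch simply ends the dialogue.
        return "</s>"
    actions, weights = zip(*options.items())
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical toy data: three short dialogues as action-label sequences.
toy_dialogues = [
    ["ask_color", "inform", "ack"],
    ["ask_color", "inform", "ack"],
    ["ask_shape", "inform", "ack"],
]
toy_counts = train_ngram(toy_dialogues, n=2)
simulated = next_action(toy_counts, ["ask_color"], n=2)
```

A practical simulator would likely need a back-off strategy for unseen contexts and a richer representation of incremental, word-by-word input than whole-turn labels; both are left out here for brevity.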


research
09/29/2017

Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

We present an optimised multi-modal dialogue agent for interactive learn...
research
09/29/2017

Training an adaptive dialogue policy for interactive learning of visually grounded word meanings

We present a multi-modal dialogue system for interactive learning of per...
research
09/22/2017

Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena

Natural, spontaneous dialogue proceeds incrementally on a word-by-word b...
research
02/07/2018

Enhance word representation for out-of-vocabulary on Ubuntu dialogue corpus

Ubuntu dialogue corpus is the largest public available dialogue corpus t...
research
11/02/2018

Engaging Image Chat: Modeling Personality in Grounded Dialogue

To achieve the long-term goal of machines being able to engage humans in...
research
05/04/2020

What is Learned in Visually Grounded Neural Syntax Acquisition

Visual features are a promising signal for learning bootstrap textual mo...
research
09/22/2017

Bootstrapping incremental dialogue systems from minimal data: the generalisation power of dialogue grammars

We investigate an end-to-end method for automatically inducing task-base...
