A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions

by Takuma Udagawa, et al.

Recent models achieve promising results in visually grounded dialogues. However, existing datasets often contain undesirable biases and lack sophisticated linguistic analyses, which makes it difficult to understand how well current models recognize the precise linguistic structures of these dialogues. To address this problem, we make two design choices. First, we focus on OneCommon Corpus <cit.>, a simple yet challenging common grounding dataset that contains minimal bias by design. Second, we analyze the linguistic structures of its dialogues based on spatial expressions and provide comprehensive and reliable annotation for 600 dialogues. We show that our annotation captures important linguistic structures, including predicate-argument structure, modification, and ellipsis. In our experiments, we assess the models' understanding of these structures through reference resolution. We demonstrate that our annotation can reveal both the strengths and weaknesses of baseline models at essential levels of detail. Overall, we propose a novel framework and resource for investigating fine-grained language understanding in visually grounded dialogues.

