'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

07/28/2023

by Javier Chiyah-Garcia et al.

Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair them through meta-communicative Clarificational Exchanges (CEs): a Clarification Request (CR) followed by a response. Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models. We use the SIMMC 2.0 dataset to evaluate the ability of different state-of-the-art model architectures to process CEs, with a metric that probes the contextual updates they induce in the model. We find that language-based models can encode simple multi-modal semantic information and process some CEs, excelling at those tied to the dialogue history, whereas multi-modal models can exploit additional learning objectives to obtain disentangled object representations, which become crucial for handling complex referential ambiguities across modalities.


Related research:

- DialogCC: Large-Scale Multi-Modal Dialogue Dataset (12/08/2022)
- PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts (05/24/2023)
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images (07/19/2021)
- VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue (09/14/2023)
- Training an adaptive dialogue policy for interactive learning of visually grounded word meanings (09/29/2017)
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI (05/23/2022)
- Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation (07/05/2022)
