VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

09/14/2023
by   Yunshui Li, et al.

Visually-grounded dialogue systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework makes it difficult to assess progress in the field. To this end, we propose VDialogUE, a Visually-grounded Dialogue benchmark for Unified Evaluation. It defines five core multi-modal dialogue tasks and covers six datasets. Furthermore, to provide a comprehensive assessment of a model's performance across all tasks, we develop a novel evaluation metric called VDscore, which is based on the Analytic Hierarchy Process (AHP) method. Additionally, we present a straightforward yet efficient baseline model, named VISIT (VISually-grounded dIalog Transformer), to promote the advancement of general multi-modal dialogue systems. It progressively builds its multi-modal foundation and dialogue capability via a two-stage pre-training strategy. We believe that the VDialogUE benchmark, together with the evaluation scripts and our baseline models, will accelerate the development of visually-grounded dialogue systems and lead to more sophisticated and effective pre-trained models.
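The abstract does not spell out how VDscore combines the per-task metrics, but the Analytic Hierarchy Process it cites derives criterion weights from a reciprocal pairwise-comparison matrix. The sketch below illustrates the standard geometric-mean approximation of AHP priority weights and a weighted aggregation in that style; the `pairwise` judgments, the three-task setup, and the function names are hypothetical, not taken from the paper.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights via the geometric-mean (row) method.

    pairwise[i][j] is the judged importance of criterion i relative to
    criterion j on a reciprocal scale (so pairwise[j][i] == 1 / pairwise[i][j]).
    """
    n = len(pairwise)
    # Geometric mean of each row, then normalize to sum to 1.
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]

def weighted_score(task_scores, weights):
    """Aggregate per-task metrics into a single score (a VDscore-style combination)."""
    return sum(w * s for w, s in zip(weights, task_scores))

# Hypothetical 3-task example: task A judged twice as important as B
# and four times as important as C.
pairwise = [
    [1.0,  2.0, 4.0],
    [0.5,  1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)   # -> [4/7, 2/7, 1/7], i.e. ~[0.571, 0.286, 0.143]
overall = weighted_score([0.8, 0.6, 0.4], weights)
```

For a consistent reciprocal matrix such as this one, the geometric-mean method coincides with the principal-eigenvector weights AHP prescribes; for mildly inconsistent judgments it remains a common, cheap approximation.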

Related research

- PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts (05/24/2023)
  Perceiving multi-modal information and fulfilling dialogues with humans ...
- Towards Generalist Foundation Model for Radiology (08/04/2023)
  In this study, we aim to initiate the development of Radiology Foundatio...
- 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges (07/28/2023)
  Referential ambiguities arise in dialogue when a referring expression do...
- IMAD: IMage-Augmented multi-modal Dialogue (05/17/2023)
  Currently, dialogue systems have achieved high performance in processing...
- Training an adaptive dialogue policy for interactive learning of visually grounded word meanings (09/29/2017)
  We present a multi-modal dialogue system for interactive learning of per...
- Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings (09/29/2017)
  We present an optimised multi-modal dialogue agent for interactive learn...
- Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts (08/27/2019)
  This work aims at modeling how the meaning of gradable adjectives of siz...
