Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4

The advent and rapid development of neural networks have revolutionized research on dialogue systems and, in turn, raised various challenges regarding their automatic evaluation. Automatic evaluation of open-domain dialogue systems remains an open challenge and has been the focus of many researchers. Despite consistent efforts to improve the correlation of automatic metrics with human evaluation, there have been very few attempts to assess their robustness across multiple domains and dimensions. Moreover, existing metrics focus mainly on English. These challenges call for automatic evaluation metrics that are reliable across domains, dimensions, and languages. This track of the 11th Dialogue System Technology Challenge (DSTC11) is part of the ongoing effort to promote robust and multilingual automatic evaluation metrics. This article describes the datasets and baselines provided to participants and discusses the submissions and results for the two proposed subtasks.

