Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

08/31/2023
by John Mendonça, et al.

Despite significant research effort in the development of automatic dialogue evaluation metrics, little attention has been given to evaluating dialogues in languages other than English. At the same time, ensuring that metrics are invariant to semantically similar responses is also an overlooked topic. To achieve the desired properties of robustness and multilinguality for dialogue evaluation metrics, we propose a novel framework that combines the strengths of current evaluation models with the newly established paradigm of prompting Large Language Models (LLMs). Empirical results show that our framework achieves state-of-the-art results in terms of mean Spearman correlation scores across several benchmarks and ranks first on both the Robust and Multilingual tasks of DSTC11 Track 4, "Automatic Evaluation Metrics for Open-Domain Dialogue Systems", demonstrating the evaluation capabilities of prompted LLMs.
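The headline result is a mean Spearman correlation between metric scores and human judgements, averaged over benchmarks. The sketch below shows how such a figure can be computed. It assumes that each benchmark supplies paired metric and human scores per response and that the mean is a plain unweighted average of per-benchmark coefficients. The function names and toy data are hypothetical and are not taken from the paper or the DSTC11 evaluation scripts.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


def mean_spearman(benchmarks):
    """benchmarks: hypothetical dict of name -> (metric_scores, human_scores)."""
    return mean(spearman(m, h) for m, h in benchmarks.values())


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    toy = {
        "bench_a": ([0.9, 0.4, 0.7, 0.1], [5, 2, 4, 1]),  # rho = 1.0
        "bench_b": ([0.2, 0.8, 0.5], [1, 2, 3]),          # rho = 0.5
    }
    print(mean_spearman(toy))  # prints 0.75
```

In practice, `scipy.stats.spearmanr` computes the same tie-corrected coefficient. The hand-rolled version above only makes the rank-then-correlate step explicit.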


Related research
06/22/2023

Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4

The advent and fast development of neural networks have revolutionized t...
06/19/2022

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

Automatic open-domain dialogue evaluation is a crucial component of dial...
12/14/2021

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

Chatbots are designed to carry out human-like conversations across diffe...
08/31/2023

Towards Multilingual Automatic Dialogue Evaluation

The main limiting factor in the development of robust multilingual dialo...
11/03/2021

Automatic Evaluation and Moderation of Open-domain Dialogue Systems

The development of Open-Domain Dialogue Systems (ODS) is a trending topic...
04/06/2020

PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems

Open-domain generative dialogue systems have attracted considerable atte...
04/10/2020

Designing Precise and Robust Dialogue Response Evaluators

Automatic dialogue response evaluator has been proposed as an alternativ...
