An Evaluation Protocol for Generative Conversational Systems

10/24/2020
by Seolhwa Lee, et al.

Many novel generative models for open-domain conversational systems have been proposed; however, these systems have not been evaluated against one another systematically. Systematic comparison requires consistency in experimental design, evaluation sets, conversational systems and their outputs, and statistical analysis. We lay out a protocol for evaluating conversational models using head-to-head pairwise comparison. We analyze ten recent models that claim state-of-the-art performance, scoring each pairing as a win, loss, or tie on five evaluation datasets. Under both the Bradley-Terry model and the TrueSkill ranking method, DialoGPT and Blender rank as the superior systems. These findings demonstrate the feasibility of our protocol for evaluating conversational agents and evaluation sets. Finally, we make all code and evaluations publicly available so that researchers can compare their models with other state-of-the-art dialog models.
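To illustrate how win-loss-tie counts can be turned into a ranking, here is a minimal sketch of a Bradley-Terry fit using the standard minorization-maximization (MM) update. It is not the authors' released code. The system names and all counts are made up for illustration, and counting a tie as half a win for each side is an assumption of this sketch, not something the abstract specifies.

```python
def fold_ties(wins, ties):
    """Return a win matrix where each tie counts as half a win for both sides.

    Assumption of this sketch: the abstract does not say how ties are handled.
    """
    n = len(wins)
    return [[wins[i][j] + 0.5 * ties[i][j] for j in range(n)] for i in range(n)]


def bradley_terry(wins, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths with the MM update.

    wins[i][j] is the (possibly fractional) number of times system i beat
    system j. Returns strengths normalized to sum to 1. Every system needs
    at least one win and one loss for the estimate to be well defined.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons between i and j, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical systems and counts, for illustration only.
systems = ["sys_a", "sys_b", "sys_c"]
raw_wins = [
    [0, 6, 7],  # sys_a beat sys_b 6 times and sys_c 7 times
    [2, 0, 5],
    [1, 3, 0],
]
raw_ties = [
    [0, 2, 2],  # symmetric: sys_a and sys_b tied twice
    [2, 0, 2],
    [2, 2, 0],
]

strengths = bradley_terry(fold_ties(raw_wins, raw_ties))
ranking = sorted(zip(systems, strengths), key=lambda pair: pair[1], reverse=True)
for name, strength in ranking:
    print(f"{name}: {strength:.3f}")
```

Under the Bradley-Terry model, the estimated probability that system i beats system j is `p[i] / (p[i] + p[j])`, so the fitted strengths give both a ranking and a pairwise win probability. A TrueSkill ranking would need a separate implementation or library and is not shown here.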


research
07/20/2023

Learning and Evaluating Human Preferences for Conversational Head Generation

A reliable and comprehensive evaluation metric that aligns with manual p...
research
05/28/2023

ConvGenVisMo: Evaluation of Conversational Generative Vision Models

Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu ...
research
05/22/2023

Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models

The recent success of large language models (LLMs) has shown great poten...
research
04/17/2022

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Knowledge-grounded conversational models are known to suffer from produc...
research
04/13/2019

A Repository of Conversational Datasets

Progress in Machine Learning is often driven by the availability of larg...
research
11/28/2022

ROC Analysis for Paired Comparison Data

Paired comparison models are used for analyzing data that involves pairw...
research
05/18/2023

An Android Robot Head as Embodied Conversational Agent

This paper describes how current Machine Learning (ML) techniques combi...
