Unsupervised Evaluation of Interactive Dialog with DialoGPT

06/23/2020
by Shikib Mehri, et al.

It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research. Standard language generation metrics have been shown to be ineffective for dialog. This paper introduces the FED metric (fine-grained evaluation of dialog), an automatic evaluation metric that uses DialoGPT without any fine-tuning or supervision. It also introduces the FED dataset, which is constructed by annotating a set of human-system and human-human conversations with eighteen fine-grained dialog qualities. The FED metric (1) does not rely on a ground-truth response, (2) does not require training data, and (3) measures fine-grained dialog qualities at both the turn level and the whole-dialog level. FED attains moderate to strong correlation with human judgment at both levels.
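At a high level, FED probes DialoGPT with follow-up utterances and compares how likely the model is to produce positive versus negative reactions to the dialog so far. The snippet below is a minimal sketch of that likelihood-comparison idea, assuming the Hugging Face transformers library and the microsoft/DialoGPT-large checkpoint; the follow-up utterances and the `interestingness_score` helper are illustrative placeholders, not the paper's exact prompt set.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
model.eval()


def follow_up_log_likelihood(context: str, follow_up: str) -> float:
    """Mean log-likelihood DialoGPT assigns to `follow_up` given the dialog `context`."""
    context_ids = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")
    follow_ids = tokenizer.encode(follow_up + tokenizer.eos_token, return_tensors="pt")
    input_ids = torch.cat([context_ids, follow_ids], dim=-1)

    # Mask the context positions with -100 so the loss only covers the follow-up tokens.
    labels = input_ids.clone()
    labels[:, : context_ids.shape[-1]] = -100

    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss  # mean negative log-likelihood
    return -loss.item()


def interestingness_score(context: str) -> float:
    # Hypothetical positive/negative follow-ups for one dialog quality;
    # the paper defines its own follow-up utterances per quality.
    positive = "Wow! That's really interesting!"
    negative = "That's really boring."
    return follow_up_log_likelihood(context, positive) - follow_up_log_likelihood(context, negative)


# Turns are separated by DialoGPT's end-of-text token.
dialog = tokenizer.eos_token.join(["Hi! How was your trip?", "It was amazing, we hiked a volcano!"])
print(interestingness_score(dialog))
```

Because the score is a difference of likelihoods for contrasting follow-ups, no ground-truth response or training data is needed, which is what allows the metric to remain fully unsupervised.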


