Towards Standard Criteria for human evaluation of Chatbots: A Survey

05/24/2021
by   Hongru Liang, et al.

Human evaluation is becoming a necessity to test the performance of Chatbots. However, off-the-shelf settings suffer from severe reliability and replication issues, partly because of the extremely high diversity of criteria. It is high time to come up with standard criteria and exact definitions. To this end, we conduct a thorough investigation of 105 papers involving human evaluation of Chatbots. Deriving from this, we propose five standard criteria along with precise definitions.

Related research

08/24/2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Research on Automatic Story Generation (ASG) relies heavily on human and...

02/19/2020
Interpreting Interpretations: Organizing Attribution Methods by Criteria
Attribution methods that explain the behaviour of machine learning mode...

06/30/2022
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Offensive Content Warning: This paper contains offensive language only f...

02/27/2023
Epicurus at SemEval-2023 Task 4: Improving Prediction of Human Values behind Arguments by Leveraging Their Definitions
We describe our experiments for SemEval-2023 Task 4 on the identificatio...

08/23/2019
Comparing Process Calculi Using Encodings
Encodings or the proof of their absence are the main way to compare proc...

12/02/2020
A Methodology for Deriving Evaluation Criteria for Software Solutions
Finding a suited software solution for a company poses a resource-intens...

12/21/2017
Methodological Framework for Determining the Land Eligibility of Renewable Energy Sources
The quantity and distribution of land which is eligible for renewable en...
