Human or Machine: Automating Human Likeliness Evaluation of NLG Texts

06/05/2020
by Erion Çano, et al.

Automatic evaluation of various quality criteria of texts produced by data-driven intelligent methods is very common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate the human likeliness evaluation of output text samples coming from natural language generation methods used to solve several tasks. We propose a human likeliness score that reflects the percentage of a method's output samples that look as if they were written by a human. Instead of having human participants label or rate those samples, we completely automate the process with a discrimination procedure based on large pretrained language models and their probability distributions. As a follow-up, we plan to perform an empirical analysis of human-written and machine-generated texts to find the optimal setup of this evaluation approach. A validation procedure involving human participants will also check how well the automatic evaluation correlates with human judgments.
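
The abstract describes the discrimination procedure only at a high level, so the snippet below is a minimal sketch of how such a human likeliness score could be computed from a pretrained language model's probability distribution. The choice of GPT-2 as the scoring model, the perplexity criterion, and the threshold value are illustrative assumptions, not the authors' actual setup.

```python
"""Sketch of an automated human-likeness score, assuming a
perplexity-based discriminator built on a pretrained GPT-2 model.
The scoring model and the threshold are illustrative assumptions."""
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_NAME = "gpt2"  # assumed scoring LM; any pretrained causal LM would do
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under the pretrained language model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels set, the model returns the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()


def human_likeliness_score(samples: list[str], threshold: float = 40.0) -> float:
    """Fraction of samples judged human-like: here, samples whose perplexity
    exceeds a (hypothetical) threshold, following the heuristic that
    machine-generated text tends to score lower perplexity under the LM."""
    human_like = sum(perplexity(s) > threshold for s in samples)
    return human_like / len(samples)


if __name__ == "__main__":
    generated = [
        "The quick brown fox jumps over the lazy dog.",
        "Results improvement model because good data training is.",
    ]
    print(f"Human likeliness score: {human_likeliness_score(generated):.2%}")
```

In practice the threshold and the scoring model would be tuned on held-out human-written and machine-generated samples, which corresponds to the empirical analysis the abstract proposes as follow-up work.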

Related research

Automating Text Naturalness Evaluation of NLG Systems (06/23/2020)
Automatic methods and metrics that assess various quality criteria of au...

Can Large Language Models Be an Alternative to Human Evaluations? (05/03/2023)
Human evaluation is indispensable and inevitable for assessing the quali...

Contrasting Linguistic Patterns in Human and LLM-Generated Text (08/17/2023)
We conduct a quantitative analysis contrasting human-written English new...

Perception Score, A Learned Metric for Open-ended Text Generation Evaluation (08/07/2020)
Automatic evaluation for open-ended natural language generation tasks re...

Creative Artificial Intelligence – Algorithms vs. humans in an incentivized writing competition (05/20/2020)
The release of openly available, robust text generation algorithms has s...

Cluster-based Evaluation of Automatically Generated Text (05/31/2022)
While probabilistic language generators have improved dramatically over ...

Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme (07/10/2018)
In this paper, we propose a joint architecture that captures language, r...
