Human or Machine: Automating Human Likeliness Evaluation of NLG Texts

by Erion Çano, et al.

Automatically evaluating the quality of text produced by data-driven intelligent methods is common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate the human likeliness evaluation of output text samples produced by natural language generation methods across several tasks. We propose a human likeliness score: the percentage of a method's output samples that look as if they were written by a human. Instead of having human participants label or rate those samples, we fully automate the process with a discrimination procedure based on large pretrained language models and their probability distributions. As a follow-up, we plan to perform an empirical analysis of human-written and machine-generated texts to find the optimal setup of this evaluation approach. A validation procedure involving human participants will also check how well the automatic evaluation correlates with human judgments.
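The score described above can be illustrated with a minimal sketch. It assumes only what the abstract states: the score is the percentage of samples that a discriminator judges to look human-written. The names `human_likeliness_score` and `toy_discriminator` are hypothetical, and the lexical-diversity rule is only a stand-in for the language-model-based discrimination procedure, whose details are not given here.

```python
from typing import Callable, Sequence


def human_likeliness_score(
    samples: Sequence[str],
    looks_human: Callable[[str], bool],
) -> float:
    """Return the percentage (0 to 100) of samples labeled as human-like.

    `looks_human` is any discriminator returning True when a sample
    looks as if it were written by a human.
    """
    if not samples:
        raise ValueError("need at least one sample")
    human_like = sum(1 for text in samples if looks_human(text))
    return 100.0 * human_like / len(samples)


def toy_discriminator(text: str) -> bool:
    """Hypothetical stand-in, NOT the paper's procedure.

    Labels a text as human-like when more than half of its words are
    distinct. A real setup would instead rely on a pretrained language
    model and its probability distributions.
    """
    words = text.lower().split()
    if not words:
        return False
    return len(set(words)) / len(words) > 0.5


if __name__ == "__main__":
    outputs = ["the cat sat on the mat today", "go go go go go go"]
    print(human_likeliness_score(outputs, toy_discriminator))
```

Keeping the discriminator as a plain callable means the scoring step stays the same whichever model or threshold the planned empirical analysis ends up selecting.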



