Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation

04/12/2021
by Jakob Nyberg, et al.

Human ratings are one of the most prevalent methods for evaluating the performance of natural language processing algorithms. Similarly, the quality of sentences generated by a natural language generation model is commonly measured using human raters. In this paper, we argue for exploring the use of subjective evaluations within the training of language generation models in a multi-task learning setting. As a case study, we use a crowd-authored dialogue corpus to fine-tune six different language generation models. Two of these models incorporate multi-task learning and use subjective ratings of lines as part of an explicit learning goal. A human evaluation of the generated dialogue lines reveals that utterances generated by the multi-tasking models were subjectively rated as the most typical, the most effective at moving the conversation forward, and the least offensive. Based on these promising first results, we discuss future research directions for incorporating subjective human evaluations into language model training and thereby keeping the human user in the loop during development.
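
The abstract does not say how the rating objective is combined with the language modeling objective. As a minimal sketch only, assuming a weighted sum of a token-level cross-entropy loss and a squared-error loss on a predicted crowd rating (the function names, the `rating_weight` parameter, and the choice of squared error are all hypothetical, not taken from the paper), a multi-task loss of this kind could look like:

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under a predicted distribution."""
    return -math.log(probs[target_index])


def multitask_loss(token_probs, target_tokens, predicted_rating, crowd_rating,
                   rating_weight=0.5):
    """Weighted sum of a language modeling loss and a rating regression loss.

    Assumption: the two objectives are combined linearly; `rating_weight`
    is an illustrative hyperparameter, not a value from the paper.
    """
    # Language modeling term: mean cross-entropy over the tokens of one line.
    lm_loss = sum(
        cross_entropy(probs, target)
        for probs, target in zip(token_probs, target_tokens)
    ) / len(target_tokens)
    # Auxiliary term: squared error between predicted and crowd-sourced rating.
    rating_loss = (predicted_rating - crowd_rating) ** 2
    return lm_loss + rating_weight * rating_loss


# Toy example: a two-token line over a two-word vocabulary.
token_probs = [[0.5, 0.5], [0.25, 0.75]]
target_tokens = [0, 1]
loss = multitask_loss(token_probs, target_tokens,
                      predicted_rating=3.0, crowd_rating=4.0)
```

With `rating_weight=0` the sketch reduces to ordinary language model fine-tuning, which makes the auxiliary objective easy to switch off for comparison.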

research
04/07/2022

A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods

Multi-task learning (MTL) has become increasingly popular in natural lan...
research
07/10/2018

Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme

In this paper, we propose a joint architecture that captures language, r...
research
03/15/2021

A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

As transparency becomes key for robotics and AI, it will be necessary to...
research
05/16/2018

A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation

Natural language generation lies at the core of generative dialogue syst...
research
08/18/2023

Exploring Sampling Techniques for Generating Melodies with a Transformer Language Model

Research in natural language processing has demonstrated that the qualit...
research
03/22/2023

Can we trust the evaluation on ChatGPT?

ChatGPT, the first large language model (LLM) with mass adoption, has de...
research
06/17/2020

Modeling subjective assessments of guilt in newspaper crime narratives

Crime reporting is a prevalent form of journalism with the power to shap...