Using Natural Language Explanations to Rescale Human Judgments

05/24/2023
by Manya Wadhwa et al.

The rise of large language models (LLMs) has brought a critical need for high-quality human-labeled data, particularly for processes like human feedback and evaluation. A common practice is to label data via consensus annotation over the judgments of multiple crowdworkers. However, different annotators may have different interpretations of labeling schemes unless given extensive training, and for subjective NLP tasks, even trained expert annotators can diverge heavily. We show that these nuances can be captured by high-quality natural language explanations, and propose a method that uses LLMs to rescale ordinal annotations in the presence of disagreement. Specifically, we feed Likert ratings and their corresponding natural language explanations into an LLM and prompt it to produce a numeric score that reflects the annotator's underlying assessment of the example. The presence of explanations allows the LLM to homogenize ratings across annotators despite differences in scale usage. We explore our technique in the context of a document-grounded question answering task on which large language models achieve near-human performance. Among questions where annotators identify incompleteness in the answers, our rescaling improves correlation between nearly all annotator pairs, improving pairwise correlation on these examples by an average of 0.2 Kendall's tau.
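The rescaling step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the prompt wording, and the 0-100 target range are assumptions, and `query_llm` stands in for whatever LLM completion API is available.

```python
# Sketch: rescale a coarse Likert rating into a fine-grained numeric
# score by giving an LLM both the rating and the annotator's
# free-text explanation. The prompt text below is illustrative.

def build_rescale_prompt(question, answer, likert_rating, explanation):
    """Combine an annotator's ordinal rating and explanation into a
    prompt that asks the LLM for a fine-grained 0-100 score."""
    return (
        "An annotator judged an answer to a question and explained the rating.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Likert rating (1-5): {likert_rating}\n"
        f"Explanation: {explanation}\n"
        "Output a single number from 0 to 100 reflecting the annotator's "
        "underlying assessment of the answer."
    )

def rescale(question, answer, rating, explanation, query_llm):
    """Ask the LLM for a numeric score and parse it as a float."""
    prompt = build_rescale_prompt(question, answer, rating, explanation)
    return float(query_llm(prompt))

# Usage with a stub in place of a real LLM call:
stub_llm = lambda _prompt: "35"
score = rescale(
    "Who founded the company?",
    "It was founded by A.",
    2,
    "The answer omits the co-founder.",
    stub_llm,
)
```

Because each annotator's explanation travels with the rating, two annotators who gave different Likert scores for the same underlying judgment can be mapped onto comparable numeric scores.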

Related research

03/15/2021 · A Study of Automatic Metrics for the Evaluation of Natural Language Explanations
As transparency becomes key for robotics and AI, it will be necessary to...

04/01/2023 · Large language models can rate news outlet credibility
Although large language models (LLMs) have shown exceptional performance...

11/04/2019 · Learning to Annotate: Modularizing Data Augmentation for Text Classifiers with Natural Language Explanations
Deep neural networks usually require massive labeled data, which restric...

09/11/2022 · Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech
Recent studies have exploited advanced generative language models to gen...

03/29/2023 · AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Many natural language processing (NLP) tasks rely on labeled data to tra...

05/23/2018 · Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
For tasks like code synthesis from natural language, code retrieval, and...
