Scarecrow: A Framework for Scrutinizing Machine Text

07/02/2021
by Yao Dou, et al.

Modern neural text generation systems can produce remarkably fluent and grammatical text. While earlier language models suffered from repetition and syntactic errors, the errors made by contemporary models are often semantic, narrative, or discourse failures. To facilitate research into these complex error types, we introduce a new structured, crowdsourced error annotation schema called Scarecrow. The error categories used in Scarecrow – such as redundancy, commonsense errors, and incoherence – were identified by combining expert analysis with several pilot rounds of ontology-free crowd annotation, arriving at a schema that covers the error phenomena found in real machine-generated text. We use Scarecrow to collect 13k annotations of 1.3k human- and machine-generated paragraphs of English-language news text, amounting to over 41k spans, each labeled with its error category, severity, a natural language explanation, and antecedent span (where relevant). We collect annotations for text generated by state-of-the-art systems with varying known performance levels, from GPT-2 Small through the largest GPT-3. We isolate several factors for detailed analysis, including parameter count, training data, and decoding technique. Our results show both expected and surprising differences across these settings. These findings demonstrate the value of Scarecrow annotations in assessing current and future text generation systems. We release our complete annotation toolkit and dataset at https://yao-dou.github.io/scarecrow/.


