Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

12/20/2022
by Xun Wang, et al.

We introduce PoliteRewrite, a dataset for polite language rewrite, which is a novel sentence rewrite task. Previous text style transfer tasks can mostly be addressed with slight token- or phrase-level edits. Polite language rewrite, by contrast, requires deep understanding and extensive sentence-level edits of an offensive and impolite sentence so that it delivers the same message euphemistically and politely. This makes the task more challenging, not only for NLP models but also for human annotators, who must put real effort into each rewrite. To reduce the human effort and make annotation more efficient, we propose a novel annotation paradigm in which human annotators and GPT-3.5 collaborate to annotate PoliteRewrite. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and humans, which can be used as a gold standard for training, validation and testing, and 100K high-quality polite sentence rewrites produced by GPT-3.5 without human review. We hope this work (the dataset, 10K+100K, will be released soon) contributes to research on more challenging sentence rewrite tasks and provokes more thought on resource annotation paradigms that draw on large-scale pretrained models.
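
The collaborative paradigm described above can be pictured as a two-tier loop: a model drafts every rewrite, and a human reviews only a subset, which becomes the gold split, while the unreviewed drafts form the larger second split. The Python sketch below is only an illustration under stated assumptions. The names `RewriteRecord`, `annotate`, `draft_rewrite`, `human_review` and `review_budget` are hypothetical, the drafting function is a stand-in rather than a real GPT-3.5 call, and the abstract does not specify how the authors actually route sentences to reviewers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class RewriteRecord:
    source: str           # original impolite sentence
    rewrite: str          # polite rewrite (model draft or human-corrected)
    human_reviewed: bool  # True only for the reviewed (gold) split


def annotate(
    sentences: Iterable[str],
    draft_rewrite: Callable[[str], str],      # hypothetical stand-in for the model
    human_review: Callable[[str, str], str],  # hypothetical reviewer callback
    review_budget: int,                       # assumption: review the first N items
) -> Tuple[List[RewriteRecord], List[RewriteRecord]]:
    """Split model drafts into a human-reviewed set and an unreviewed set."""
    reviewed: List[RewriteRecord] = []
    unreviewed: List[RewriteRecord] = []
    for index, sentence in enumerate(sentences):
        draft = draft_rewrite(sentence)
        if index < review_budget:
            # The reviewer may accept the draft as-is or return a corrected version.
            final = human_review(sentence, draft)
            reviewed.append(RewriteRecord(sentence, final, True))
        else:
            unreviewed.append(RewriteRecord(sentence, draft, False))
    return reviewed, unreviewed


def placeholder_draft(sentence: str) -> str:
    # Placeholder only: a real pipeline would query a language model here.
    return "Could you please reconsider: " + sentence.lower()


def accept_draft(sentence: str, draft: str) -> str:
    # Placeholder reviewer that accepts every draft unchanged.
    return draft
```

Calling `annotate(sentences, placeholder_draft, accept_draft, review_budget=1)` on a short list returns one reviewed record and leaves the rest unreviewed, mirroring the 10K reviewed and 100K unreviewed split in miniature.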


