(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas

08/31/2019
by Dongyeop Kang, et al.

Stylistic variation in text needs to be studied along several aspects, including the writer's personal traits, interpersonal relations, rhetoric, and more. Despite recent attempts at computational modeling of this variation, the lack of parallel corpora of stylistic language makes it difficult to systematically control stylistic change and to evaluate such models. We release PASTEL, the parallel and annotated stylistic language dataset, which contains 41K parallel sentences (8.3K parallel stories) annotated across different personas. Each persona combines several styles: gender, age, country, political view, education, ethnicity, and time of writing. The dataset is collected from human annotators with solid control of input denotation: the collection not only preserves the original meaning across texts but also promotes stylistic diversity among annotators. We test the dataset on two applications of stylistic language, where PASTEL helps design appropriate experiments and evaluation. First, in predicting a target style (e.g., male or female for gender) given a text, the multiple styles annotated in PASTEL allow other external style variables to be controlled (or fixed), which yields a more accurate experimental design. Second, a simple supervised model trained on our parallel text outperforms unsupervised models that use nonparallel text in style transfer. Our dataset is publicly available.


research
10/22/2020

Multi-dimensional Style Transfer for Partially Annotated Data using Language Models as Discriminators

Style transfer has been widely explored in natural language generation w...
research
09/25/2019

Semi-supervised Text Style Transfer: Cross Projection in Latent Space

Text style transfer task requires the model to transfer a sentence of on...
research
01/31/2019

Unsupervised Text Style Transfer via Iterative Matching and Translation

Text style transfer seeks to learn how to automatically rewrite sentence...
research
11/09/2019

xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation

Every natural text is written in some style. The style is formed by a co...
research
09/07/2021

Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

An individual's variation in writing style is often a function of both s...
research
01/24/2023

Audience-Centric Natural Language Generation via Style Infusion

Adopting contextually appropriate, audience-tailored linguistic styles i...
research
10/18/2018

Large-scale Hierarchical Alignment for Author Style Transfer

We propose a simple method for extracting pseudo-parallel monolingual se...
