xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation

11/09/2019
by Dongyeop Kang, et al.

Every natural text is written in some style. A style is formed by a complex combination of stylistic factors, including formality markers, emotions, and metaphors. Some factors implicitly reflect the author's personality, while others are explicitly controlled by the author's choices in order to achieve some personal or social goal. One cannot form a complete understanding of a text and its author without considering these factors. The factors combine and co-vary in complex ways to form styles. Studying the nature of these co-varying combinations, sometimes called cross-style language understanding, sheds light on stylistic language in general. This paper provides a benchmark corpus (xSLUE) with an online platform (http://xslue.com) for cross-style language understanding and evaluation. The benchmark contains text in 15 different styles and 23 classification tasks. For each task, we provide the fine-tuned classifier for further analysis. Our analysis shows that some styles are highly dependent on each other (e.g., impoliteness and offense), and that some domains (e.g., tweets, political debates) are stylistically more diverse than others (e.g., academic manuscripts). We discuss the technical challenges of cross-style understanding and potential directions for future research: cross-style modeling, which shares the internal representation for low-resource or low-performance styles, and other applications such as cross-style generation.


