Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

09/07/2021
by Jian Zhu, et al.

An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation, such as gender-based variation, has been extensively studied, far less is known about how to characterize individual styles because of their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts. Through an analogy-based probing task, we show that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we describe idiolects by measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.

research
04/11/2022

Same Author or Just Same Topic? Towards Content-Independent Style Representations

Linguistic style is an integral component of language. Recent advances i...
research
11/09/2019

xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation

Every natural text is written in some style. The style is formed by a co...
research
04/28/2022

Investigating writing style as a contributor to gender gaps in science and technology

While universalism is a foundational principle of science, a growing str...
research
08/31/2019

(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas

Stylistic variation in text needs to be studied with different aspects i...
research
12/06/2021

Letter-level Online Writer Identification

Writer identification (writer-id), an important field in biometrics, aim...
research
08/23/2019

Neural Poetry: Learning to Generate Poems using Syllables

Motivated by the recent progresses on machine learning-based models that...
research
01/11/2016

The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks

We present a corpus-based analysis of the effects of age, gender and reg...