Can Authorship Representation Learning Capture Stylistic Features?

08/22/2023
by Andrew Wang, et al.

Automatically disentangling an author's style from the content of their writing is a longstanding and possibly insurmountable problem in computational linguistics. At the same time, the availability of large text corpora furnished with author labels has recently enabled learning authorship representations in a purely data-driven manner for authorship attribution, a task that ostensibly depends to a greater extent on encoding writing style than encoding content. However, success on this surrogate task does not ensure that such representations capture writing style since authorship could also be correlated with other latent variables, such as topic. In an effort to better understand the nature of the information these representations convey, and specifically to validate the hypothesis that they chiefly encode writing style, we systematically probe these representations through a series of targeted experiments. The results of these experiments suggest that representations learned for the surrogate authorship prediction task are indeed sensitive to writing style. As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time. Additionally, our findings may open the door to downstream applications that require stylistic representations, such as style transfer.

research
04/11/2022

Same Author or Just Same Topic? Towards Content-Independent Style Representations

Linguistic style is an integral component of language. Recent advances i...
research
05/22/2023

Learning Interpretable Style Embeddings via Prompting LLMs

Style representation learning builds content-independent representations...
research
02/24/2019

Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?

Textual deception constitutes a major problem for online security. Many ...
research
05/29/2020

The Importance of Suppressing Domain Style in Authorship Analysis

The prerequisite of many approaches to authorship analysis is a represen...
research
04/17/2021

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Authorship attribution is the problem of identifying the most plausible ...
research
09/12/2019

Style-aware Neural Model with Application in Authorship Attribution

Writing style is a combination of consistent decisions associated with a...
research
11/07/2022

Contrastive Learning enhanced Author-Style Headline Generation

Headline generation is a task of generating an appropriate headline for ...
