The Topic Confusion Task: A Novel Scenario for Authorship Attribution

04/17/2021
by   Malik H. Altakrori, et al.

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship style, by the topic shift, or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic configuration between the training and testing sets. This setup allows us to probe errors in the attribution process. We report accuracy together with two error measures: one that arises when the model is confused by the switch because its features capture the topics, and one that arises when the features fail to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n-gram features.
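To make the setup concrete, the following is a minimal Python sketch of one way to read the switch of the author-topic configuration. It is not the authors' implementation. The helper names (`topic_confusion_split`, `error_breakdown`), the toy documents, and the rule used to separate the two kinds of error are all assumptions made for illustration: an error is counted as topic confusion when the predicted author is one who wrote on the test document's topic during training, and as an other error otherwise.

```python
from collections import Counter

def topic_confusion_split(docs, train_topic_of):
    """Split (author, topic, text) triples so that each author is seen on
    one topic in training and only on other topics in testing.
    `train_topic_of` maps author -> training topic (hypothetical helper)."""
    train_set, test_set = [], []
    for author, topic, text in docs:
        if topic == train_topic_of[author]:
            train_set.append((author, topic, text))
        else:
            test_set.append((author, topic, text))
    return train_set, test_set

def error_breakdown(test_set, predictions, train_topic_of):
    """Count correct predictions and two kinds of error.
    Assumed rule (not taken from the abstract): an error is 'topic_confusion'
    if the predicted author wrote on this topic in training, else 'other_error'."""
    authors_by_train_topic = {}
    for author, topic in train_topic_of.items():
        authors_by_train_topic.setdefault(topic, set()).add(author)
    counts = Counter()
    for (author, topic, _), predicted in zip(test_set, predictions):
        if predicted == author:
            counts["correct"] += 1
        elif predicted in authors_by_train_topic.get(topic, set()):
            counts["topic_confusion"] += 1
        else:
            counts["other_error"] += 1
    return counts

# Toy corpus: three authors, two topics (all values are made up).
docs = [
    ("A", "politics", "text a1"), ("A", "sports", "text a2"),
    ("B", "politics", "text b1"), ("B", "sports", "text b2"),
    ("C", "politics", "text c1"), ("C", "sports", "text c2"),
]
train_topic_of = {"A": "politics", "B": "sports", "C": "politics"}

train_set, test_set = topic_confusion_split(docs, train_topic_of)
# Hypothetical classifier output, one prediction per test document.
predictions = ["B", "B", "A"]
counts = error_breakdown(test_set, predictions, train_topic_of)
print(dict(counts))
```

In this toy run the first test document (author A, topic sports) is attributed to B, the only author seen on sports during training, so it is counted as topic confusion; the third (author C, topic sports) is attributed to A, who never wrote on sports in training, so it falls into the other category.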


research
02/24/2016

Domain Specific Author Attribution Based on Feedforward Neural Network Language Models

Authorship attribution refers to the task of automatically determining t...
research
02/17/2016

Authorship Attribution Using a Neural Network Language Model

In practice, training language models for individual authors is often ex...
research
08/22/2023

Can Authorship Representation Learning Capture Stylistic Features?

Automatically disentangling an author's style from the content of their ...
research
06/21/2022

TraSE: Towards Tackling Authorial Style from a Cognitive Science Perspective

Stylistic analysis of text is a key task in research areas ranging from ...
research
09/23/2022

Whodunit? Learning to Contrast for Authorship Attribution

Authorship attribution is the task of identifying the author of a given ...
research
06/10/2021

DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution

Cross-language authorship attribution problems rely on either translatio...
research
06/30/2021

O2D2: Out-Of-Distribution Detector to Capture Undecidable Trials in Authorship Verification

The PAN 2021 authorship verification (AV) challenge is part of a three-y...
