An Improved Topic Masking Technique for Authorship Analysis

05/02/2020
by   Oren Halvani, et al.

Authorship verification (AV) is an important sub-area of digital text forensics and has been researched for more than two decades. The fundamental question addressed by AV is whether two documents were written by the same person. A serious problem that has received little attention in the literature so far is whether AV methods actually focus on writing style during classification or are unintentionally distorted by the topic of the documents. To counteract this problem, we propose an effective technique called POSNoise, which aims to mask topic-related content in documents. In this way, AV methods are forced to focus on those text units that are more related to the author's writing style. Based on a comprehensive evaluation with eight existing AV methods applied to eight corpora, we demonstrate that POSNoise is able to outperform a well-known topic masking approach in 51 out of 64 cases, with an improvement of up to 12.5%. Furthermore, for corpora preprocessed with POSNoise, the AV methods examined often achieve higher accuracies (improvement of up to 20.6%) than on the original corpora.
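To make the idea of topic masking concrete, the following is a minimal Python sketch. It is not the POSNoise algorithm, whose details are not given in this abstract. It assumes a deliberately crude strategy: keep a small, hand-picked set of function words and punctuation, and replace every other word with a placeholder. The names `FUNCTION_WORDS` and `mask_topic`, and the `#` placeholder, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny list of topic-independent function words.
# A real system would need a far larger, carefully curated resource.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "but", "or", "to",
    "was", "is", "by", "with", "that", "it", "he", "she", "they", "not",
}

# Split text into word tokens and single punctuation characters.
TOKEN = re.compile(r"\w+|[^\w\s]")
WORD = re.compile(r"\w+")


def mask_topic(text, placeholder="#"):
    """Replace every word that is not a known function word with a placeholder.

    Function words and punctuation are kept, on the assumption that they
    carry more stylistic than topical information.
    """
    masked = []
    for token in TOKEN.findall(text):
        is_word = WORD.fullmatch(token) is not None
        if is_word and token.lower() not in FUNCTION_WORDS:
            masked.append(placeholder)  # likely topic-bearing word
        else:
            masked.append(token)  # function word or punctuation
    return " ".join(masked)
```

Under this sketch, two sentences on different topics but with the same function-word skeleton, such as "The striker scored in the final, and the crowd cheered." and "The chef cooked in the kitchen, and the guests applauded.", produce the same masked string, so a downstream AV method can no longer separate them by topic.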

