The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

07/12/2017
by Georgi Karadjov, et al.

Users posting online expect to remain anonymous unless they have logged in, and this anonymity is often what allows them to discuss various topics freely. Preserving the anonymity of a text's writer can also be important in other contexts, e.g., witness protection or anonymity programs. However, each person has an individual writing style, which can be analyzed using stylometry; as a result, the true identity of the author of a piece of text can be revealed even when the author has tried to hide it. Thus, automatic tools that help a person obfuscate his or her identity when writing text could be useful. In particular, we propose an approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, which makes these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values of some popular stylometric metrics that can indicate authorship; then, we apply various transformations to the text so that these metrics are adjusted towards the average level while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very efficient and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
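The three steps described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes a single metric (average sentence length in words), a hypothetical target average, two toy transformations (splitting at ", and " and joining with a semicolon), and a toy synonym table for the noise step. All names and values below are illustrative assumptions.

```python
import random
import re

# Assumed "average" value; a real system would estimate it from a reference corpus.
TARGET_AVG_SENTENCE_LEN = 8.0
# Toy synonym table used only for the noise step (hypothetical).
SYNONYMS = {"big": "large", "show": "demonstrate", "use": "employ"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def avg_sentence_length(sentences):
    """Step 1 metric: mean number of whitespace-separated words per sentence."""
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def split_longest(sentences):
    """Split the longest sentence that contains ', and '; None if none can be split."""
    order = sorted(range(len(sentences)), key=lambda i: -len(sentences[i].split()))
    for i in order:
        head, sep, tail = sentences[i].partition(", and ")
        if sep and tail:
            parts = [head + ".", tail[0].upper() + tail[1:]]
            return sentences[:i] + parts + sentences[i + 1:]
    return None


def merge_shortest(sentences):
    """Join the adjacent pair with the fewest words using '; '; None if impossible."""
    if len(sentences) < 2:
        return None
    i = min(range(len(sentences) - 1),
            key=lambda k: len(sentences[k].split()) + len(sentences[k + 1].split()))
    second = sentences[i + 1]
    first_word = second.split()[0]
    # Limitation of this sketch: proper nouns are lowercased too; only "I" is kept.
    if first_word != "I" and not first_word.startswith("I'"):
        second = second[0].lower() + second[1:]
    joined = sentences[i].rstrip(".!?") + "; " + second
    return sentences[:i] + [joined] + sentences[i + 2:]


def add_noise(text, rng, rate):
    """Step 3: replace known words with a synonym with probability `rate`."""
    def swap(match):
        word = match.group(0)
        if word in SYNONYMS and rng.random() < rate:
            return SYNONYMS[word]
        return word
    return re.sub(r"[A-Za-z]+", swap, text)


def obfuscate(text, target=TARGET_AVG_SENTENCE_LEN, noise_rate=0.3, seed=0):
    sentences = split_sentences(text)
    for _ in range(100):  # hard bound so the loop always terminates
        current = avg_sentence_length(sentences)  # step 1: measure
        if current > target:
            candidate = split_longest(sentences)  # step 2: push down
        elif current < target:
            candidate = merge_shortest(sentences)  # step 2: push up
        else:
            break
        # Accept a transformation only if it moves the metric closer to the target.
        if candidate is None or \
                abs(avg_sentence_length(candidate) - target) >= abs(current - target):
            break
        sentences = candidate
    return add_noise(" ".join(sentences), random.Random(seed), noise_rate)
```

Calling `obfuscate(text)` returns the transformed text. Accepting a transformation only when it reduces the distance to the target keeps the loop from oscillating between splitting and merging. A real system would track many metrics at once and use transformations that are checked for meaning preservation.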


research
02/24/2019

Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?

Textual deception constitutes a major problem for online security. Many ...
research
05/16/2022

Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks

Discourse cohesion facilitates text comprehension and helps the reader f...
research
12/18/2022

Low-Resource Authorship Style Transfer with In-Context Learning

Authorship style transfer involves altering the style of text to match t...
research
06/03/2016

Learning Stylometric Representations for Authorship Analysis

Authorship analysis (AA) is the study of unveiling the hidden properties...
research
05/16/2018

Towards Robust and Privacy-preserving Text Representations

Written text often provides sufficient clues to identify the author, the...
research
11/07/2022

Contrastive Learning enhanced Author-Style Headline Generation

Headline generation is a task of generating an appropriate headline for ...
research
05/31/2019

Effective writing style imitation via combinatorial paraphrasing

Stylometry can be used to profile authors based on their written text. T...
