Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

10/19/2022
by Adaku Uchendu, et al.

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, in particular a text t in question, an AA solution aims to accurately attribute t to its true author among many candidate authors, while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concerns applied only to human authors. In recent years, however, explosive advances in Neural Text Generation (NTG) techniques in NLP, which can synthesize human-quality open-ended texts (so-called "neural texts"), mean that one must now consider authorship by humans, machines, or a combination of the two. Because neural texts carry serious implications and potential threats when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and to develop novel AA/AO solutions for dealing with neural texts. In this survey, therefore, we comprehensively review the recent literature on the attribution and obfuscation of neural text authorship from a Data Mining perspective, and share our view on its limitations and promising research directions.


