UPTON: Unattributable Authorship Text via Data Poisoning

11/17/2022
by Ziyao Wang, et al.

In online media such as the opinion columns of Bloomberg, The Guardian, and Western Journal, aspiring writers post their work for various reasons, often proudly under their own names. Such a writer may later want to publish in other venues anonymously or under a pseudonym (e.g., as an activist or whistle-blower). However, if an attacker has already built an accurate authorship attribution (AA) model from the writings on these platforms, an anonymous text can be attributed to its known author. In this work, we therefore ask: can one make the texts T released in open spaces such as opinion-sharing platforms unattributable, so that AA models trained on T cannot attribute authorship well? Toward this question, we present a novel solution, UPTON, which exploits textual data poisoning to disturb the training process of AA models. UPTON perturbs the training samples to destroy the authorship features they carry, aiming to make the released textual data unlearnable for deep neural networks. This differs from previous obfuscation work, which uses adversarial attacks to modify test samples and mislead an AA model, and from backdoor work, which plants trigger words in both training and test samples and changes the model output only when the triggers occur. Using four authorship datasets (IMDb10, IMDb64, Enron, and WJO), we present empirical validation showing that: (1) UPTON downgrades the test accuracy of AA models to about 30%; (2) UPTON preserves most of the original semantics, with BERTScore between the clean and UPTON-poisoned texts above 0.95, very close to the 1.00 that would indicate no semantic change; and (3) UPTON is robust against spelling-correction systems.
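UPTON's actual perturbation and target-selection procedure is described in the paper, not here. As a rough, hypothetical sketch of the data-poisoning idea (every name below, such as poison_text and SUBSTITUTIONS, is illustrative and not from the paper), one can imagine perturbing style-bearing words in the training texts before they are released:

import re

# Toy substitution table standing in for a learned, authorship-aware
# perturbation; a real attack would target words carrying stylometric signal.
SUBSTITUTIONS = {
    "however": "nevertheless",
    "therefore": "hence",
    "writings": "texts",
    "accurate": "precise",
}

def poison_text(text: str) -> str:
    """Swap style-bearing words so that an AA model trained on the
    poisoned corpus learns weakened authorship features."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = SUBSTITUTIONS.get(word.lower())
        if repl is None:
            return word
        # Preserve the capitalization of the original token.
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", swap, text)

print(poison_text("However, accurate writings reveal their author."))
# -> "Nevertheless, precise texts reveal their author."

Semantic preservation of such perturbations can be measured the way the abstract describes, with BERTScore; one hedged way to do this is with the bert-score Python package (pip install bert-score):

from bert_score import score

clean    = ["However, accurate writings reveal their author."]
poisoned = ["Nevertheless, precise texts reveal their author."]
P, R, F1 = score(poisoned, clean, lang="en")  # downloads a model on first use
print(F1.mean().item())  # the paper reports scores above 0.95 for UPTON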

