
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense

We propose a novel algorithm, ANTHRO, that inductively extracts over 600K human-written text perturbations in the wild and leverages them for realistic adversarial attacks. Unlike existing character-based attacks, which often deductively hypothesize a set of manipulation strategies, our work is grounded in actual observations from real-world texts. We find that adversarial texts generated by ANTHRO achieve the best trade-off between (1) attack success rate, (2) semantic preservation of the original text, and (3) stealthiness, i.e., being indistinguishable from human writings and hence harder to flag as suspicious. Specifically, our attacks accomplish around 83% and 91% attack success rates on BERT and RoBERTa, respectively. Moreover, ANTHRO outperforms the TextBugger baseline by 50% in semantic preservation and 40% in stealthiness when evaluated by both lay and professional human workers. Via adversarial training, ANTHRO can further enhance a BERT classifier's ability to understand different variations of human-written toxic texts, compared to the Perspective API.



1 Introduction

Machine learning (ML) models trained to optimize only prediction performance are often vulnerable to adversarial attacks papernot2016limitations; wang2019towards. In the text domain especially, a character-based adversarial attacker aims to fool a target ML model by generating an adversarial text x' from an original text x by manipulating the characters of different words in x, such that some properties of x are preserved li2018textbugger; VIPER; gao2018black. We characterize strong and practical adversarial attacks by three criteria: (1) attack performance, as measured by the ability to flip a target model’s predictions, (2) semantic preservation, as measured by the ability to preserve the meaning of an original text, and (3) stealthiness, as measured by how unlikely the attack is to be detected as machine manipulation and removed by defense systems or human examiners (Figure 1). While the first two criteria are natural derivations from the adversarial literature papernot2016limitations, stealthiness is also important for a practical attack under a mass-manipulation scenario.

In fact, adversarial text generation remains a challenging task under practical settings.

Previously proposed character-based attacks follow a deductive approach, where the researchers hypothesize a set of text manipulation strategies that exploit some vulnerabilities of textual ML models (Figure 1). Although these deductively derived techniques can demonstrate superior attack performance, there is no guarantee that they also perform well in terms of semantic preservation and stealthiness. We first analyze why enforcing these properties is challenging, especially for character-based attacks.

Figure 1: Anthro (Bottom) extracts and uses human-written perturbations for adversarial attacks instead of proposing a specific set of manipulation rules (Top).

To preserve semantic meaning, an attacker can minimize the distance between the representation vectors of the two sentences learned from a large pre-trained model, e.g., the Universal Sentence Encoder cer2018universal. However, this is only applicable in word- or sentence-based attacks, not in character-based attacks. This is because character-manipulated tokens, e.g., morons→mor0ns, are more prone to become out-of-distribution relative to a typical training corpus, where correct use of English is often assumed. In fact, existing character-based attacks such as TextBugger li2018textbugger, VIPER VIPER and DeepWordBug gao2018black generally assume that the meaning of the original sentence is preserved, without further evaluation.

In addition, a robust ML pipeline is often equipped to detect and remove potential adversarial perturbations, either via automatic software neuspell; pruthi2019combating, trapdoors le2021sweet or humans-in-the-loop malcom. Such detection is feasible especially when the perturbed texts are curated using a set of fixed rules that can be easily re-purposed for defense. Attacks such as VIPER and DeepWordBug, which map each Latin-based character to either non-English accents (e.g., ė, ā, d̃) or homoglyphs (characters of similar shape), fall into this category and can be easily detected with simple normalization techniques (Sec. 4.1). TextBugger circumvents this weakness by utilizing a set of more general character-editing strategies, e.g., replacing and swapping nearby characters to synthesize human-written typos and misspellings. Although texts perturbed by such strategies become less likely to be detected, many of them may distort the meaning of the original text (e.g., “garbage”→“gabrage”, “dumb”→“dub”) and can be easily flagged as machine-generated by human examiners. Therefore, we argue that generating perturbations that both preserve the original meaning and are indistinguishable from human-written texts is a critically important yet challenging task.

To overcome these challenges, we introduce Anthro, a novel algorithm that inductively finds and extracts text perturbations in the wild. As shown in Figure 1, our method relies on human-written sentences on the Web in their raw form. We then use them to develop a character-based adversarial attack that is not only effective and realistic but also helpful in training ML models that are more robust against a wide variety of human-written perturbations. Distinguished from previous research, our work considers both spelling and phonetic features (how a word sounds) to characterize text perturbations. Furthermore, we conduct user studies to quantitatively evaluate the semantic preservation and stealthiness of adversarial texts. Our contributions are as follows.


  • Anthro extracts over 600K case-sensitive character-based “real" perturbations from human-written texts.

  • Anthro facilitates black-box adversarial attacks with an average of 82.7% and 90.7% attack success rates on BERT and RoBERTa, and drops the Perspective API’s precision to only 12%.

  • Anthro outperforms the TextBugger baseline by over 50% in semantic preservation and 40% in stealthiness in human subject studies.

  • Anthro combined with adversarial training also enables a BERT classifier to achieve a 3%–14% improvement in precision over the Perspective API in understanding human-written perturbations.

2 Perturbations in the Wild

2.1 Machine vs. Human Perturbations

Perturbations that neither look natural nor resemble human-written texts are more likely to be detected by defense systems (and thus do not constitute a practical attack from the adversary’s perspective). However, some existing character-based perturbation strategies, including TextBugger, VIPER and DeepWordBug, follow a deductive approach, and their generated texts often do not resemble human-written texts. Qualitatively, we find that humans express much more diverse and creative tagg2011wot perturbations (Figure B.1, Appendix) than those generated by such deductive approaches. For example, humans frequently (1) capitalize and change parts of a word to emphasize distorted meanings (e.g., “democrats”→“democRATs”, “republicans”→“republiCUNTs”), (2) hyphenate a word (e.g., “depression”→“de-pres-sion”), (3) use emoticons to emphasize meaning (e.g., “shit”→“sht”), (4) repeat particular characters (e.g., “dirty”→“diiirty”, “porn”→“pooorn”), or (5) insert phonetically similar characters (e.g., “nigger”→“nighger”). Human-written perturbations do not manifest any fixed rules and often require some understanding of context. Further, one can generate a new meaningful perturbation simply by repeating a character, e.g., “porn”→“pooorn”. Thus, it is challenging, if not impossible, to systematically generate all such perturbations. Moreover, it is very difficult for spell-checkers, which usually rely on a fixed set of common spelling mistakes and an edit-distance threshold, to detect and correct all human-written perturbations.

We later show that human examiners rely on personal exposure to Reddit or YouTube comments to decide whether a word choice looks natural (Sec. 4.2). Quantitatively, we discover that not all perturbations generated by deductive methods are observed on the Web (Table 1). To analyze this, we first use each attack to generate all possible perturbations of either (1) a list of over 3K unique offensive words or (2) a set of the top 5 offensive words (“c*nt”, “b*tch”, “m*therf***er”, “bast*rd”, “d*ck”). Then, we calculate how many of the perturbed words are present in a dataset of over 34M online news comments or are used by at least 50 unique commenters on Reddit, respectively. Even though TextBugger is well known for simulating human-written typos as adversarial texts, merely 51.6% and 7.1% of its perturbations are observed on Reddit and in online news comments, implying that TextBugger’s generated adversarial texts are “unnatural” and “easily detectable” by human-in-the-loop defense systems.

Attacker Reddit Comts. News Comts.
(#texts, #tokens) (>5B, N/A) (34M, 11M)
TextBugger 51.6% (126/244) 7.10% (11K/152K)
VIPER 3.2% (1/31) 0.13% (25/19K)
DeepWordBug 0% (0/31) 0.27% (51/19K)
ANTHRO 82.4% (266/323) 55.7% (16K/29K)
Table 1: Percentage of offensive perturbed words generated by different attacks that can be observed in real human-written comments on Reddit and online news.

2.2 The SMS Property: Similar Sound, Similar Meaning, Different Spelling

The existence of a non-arbitrary relationship between sounds and meanings has been established by a long line of research kohler1967gestalt; jared1991does; gough1972one. In fact, blasi2016sound analyzed over 6K languages and discovered a high correlation between a word’s sound and its meaning, both across and within cultures. aryani2020affective found that how a word sounds links to an individual’s emotions. This motivates our hypothesis that words that are spelled differently yet share the same meaning, such as text perturbations, will also have similar sounds.

Figure B.1 (Appendix) displays several perturbations found in real-life texts. Even though these perturbations are spelled differently from the original words, they all preserve similar meanings when perceived by humans. Such semantic preservation is feasible because humans perceive these variations as phonetically similar to the respective original words van1987rows. For example, both “republican” and “republikan” sound similar when read by humans. Therefore, given the surrounding context of a perturbed sentence, e.g., “President Trump is a republikan”, and the phonetic similarity of “republican” and “republikan”, end-users are more likely to interpret the perturbed sentence as “President Trump is a republican”. We call these characteristics of text perturbations the SMS property: “similar Sound, similar Meaning, different Spelling”. Noticeably, the SMS characterization subsumes a subset of the “visually similar” perturbations studied in previous adversarial attacks such as TextBugger (e.g., “hello” sounds similar to “he11o”), VIPER and DeepWordBug. However, two words that look very similar sometimes carry different meanings, e.g., “garbage”→“gabrage”. Moreover, our characterization is also distinguished from homophones (e.g., “to” and “two”), which are words with similar sounds yet different meanings.

3 A Realistic Adversarial Attack

Given the above analysis, we now derive our proposed Anthro adversarial attack. We first describe how to systematically encode the sound, i.e., the phonetic features, of any given word and use it to search for human-written perturbations that satisfy the SMS property. Then, we introduce an iterative algorithm that utilizes the extracted perturbations to attack textual ML models.

3.1 Mining Perturbations in the Wild

Sound Encoding with Soundex++. To capture the sound of a word, we adopt and extend the case-insensitive Soundex algorithm. Soundex helps index a word based on how it sounds rather than how it is spelled stephenson1980methodology. Given a word, Soundex first keeps the 1st character. Then, it removes all vowels and matches the remaining characters one by one to digits following a set of predefined rules, e.g., “B”, “F”→1 and “D”, “T”→3 stephenson1980methodology. For example, “Smith” and “Smyth” are both encoded as S530.
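A minimal sketch of this classic Soundex encoding (our own illustrative implementation, not the paper's code) might look like:

```python
# Illustrative American Soundex encoder (not the paper's exact implementation).
SOUNDEX_DIGITS = {c: d for cs, d in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
] for c in cs}

def soundex(word: str) -> str:
    """Encode a word as its 4-character Soundex code, e.g. 'Smith' -> 'S530'."""
    word = word.lower()
    digits = []
    prev = SOUNDEX_DIGITS.get(word[0], "")  # first letter's digit is skipped but blocks repeats
    for ch in word[1:]:
        code = SOUNDEX_DIGITS.get(ch, "")
        if code and code != prev:            # collapse runs of the same digit
            digits.append(code)
        if ch not in "hw":                   # 'h' and 'w' do not break a run
            prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]
```

As expected, `soundex("Smith")` and `soundex("Smyth")` both yield `S530`.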

As the Soundex system was designed mainly for encoding surnames, it does not necessarily work for texts in the wild. For example, it cannot recognize visually similar perturbations such as “l”→“1”, “a”→“@” and “O”→“0”. Moreover, it always fixes the 1st character as part of the final encoding. This rule is too rigid and can result in entirely different words being encoded the same (Table 2). To solve these issues, we propose a new Soundex++ algorithm. Soundex++ is equipped to both recognize visually similar characters and encode the sound of a word at different hierarchical levels (Table 2). Particularly, at level n=1, Soundex++ works similarly to Soundex by fixing the first character. At level n, Soundex++ instead fixes the first n characters and encodes the rest.

Word Soundex Soundex++ (Ours)
porn P650 P650 (n=1), PO650 (n=2)
p0rn P065 (✗) (same as above)
lesbian L215 L245 (n=1), LE245 (n=2)
lesbbi@n L21@ (✗) (same as above)
losbian L215 (✗) L245 (n=1), LO245 (n=2)
(✗): Incorrect encoding
Table 2: Soundex++ can capture visually similar characters and is more accurate in differentiating between desired (blue) and undesired (red) perturbations.
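To illustrate the two extensions, here is a toy Soundex++ sketch; the homoglyph map and padding rules below are our own simplifications of what the paper describes, not ANTHRO's actual tables:

```python
# Toy Soundex++: homoglyph-aware, with a variable-length prefix (phonetic level n).
# The homoglyph map is a small assumed subset, not ANTHRO's full table.
HOMOGLYPHS = {"0": "o", "1": "l", "@": "a", "$": "s"}
DIGITS = {c: d for cs, d in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
] for c in cs}

def soundex_pp(word: str, n: int = 1) -> str:
    """Encode `word` at phonetic level n: fix the first n characters, encode the rest."""
    w = "".join(HOMOGLYPHS.get(c, c) for c in word.lower())  # normalize look-alikes first
    prefix = w[:n].upper()
    digits, prev = [], DIGITS.get(w[n - 1], "")
    for ch in w[n:]:
        code = DIGITS.get(ch, "")
        if code and code != prev:   # collapse runs of the same digit
            digits.append(code)
        prev = code
    return (prefix + "".join(digits) + "000")[:n + 3]
```

With this sketch, `soundex_pp("p0rn", 1)` equals `soundex_pp("porn", 1)` (both `P650`), matching the intuition behind Table 2.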
Key TH000 DE5263 AR000 DI630 NO300
Value the democrats are dirty not
(Set) demokRATs arre dirrrty
Anthro(democrats, n, d) → {democrats, demokRATs}
Anthro(dirty, n, d) → {dirty, dirrrty}
Table 3: Example of a hash table curated from the sentences “the demokRATs are dirrrty" and “the democrats arre not dirty", and its utilization.
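The curation and lookup illustrated in Table 3 can be sketched as follows. The encoder here is a simplified stand-in for Soundex++ (so the hash keys will not exactly match the table's), and the retrieval mirrors the mining/lookup scheme formalized in the next subsection:

```python
from collections import defaultdict

DIGITS = {c: d for cs, d in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
] for c in cs}

def encode(word: str, n: int) -> str:
    """Simplified phonetic code: first n letters + collapsed digit skeleton of the rest."""
    w = word.lower()
    digits, prev = [], ""
    for ch in w[n:]:
        code = DIGITS.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return w[:n].upper() + "".join(digits)

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        new = [i]
        for j, cb in enumerate(b, 1):
            new.append(min(row[j] + 1, new[-1] + 1, row[j - 1] + (ca != cb)))
        row = new
    return row[-1]

def build_tables(corpus, n):
    """Map each phonetic code to the set of case-sensitive tokens sharing it."""
    H = defaultdict(set)
    for sentence in corpus:
        for token in sentence.split():
            H[encode(token, n)].add(token)
    return H

def anthro(word, H, n, d):
    """Retrieve tokens with the same code as `word` within edit distance d."""
    return {t for t in H[encode(word, n)] if levenshtein(t, word) <= d}
```

Built over the two example sentences, `anthro("dirty", H, 2, 2)` returns `{"dirty", "dirrrty"}`, as in Table 3.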

Levenshtein Distance and Phonetic Level as a Semantic Preservation Proxy. Since Soundex++ is not designed to capture a word’s semantic meaning, we utilize both the phonetic-level parameter n and the Levenshtein distance d levenshtein1966binary as a heuristic approximation of the semantic preservation between two words. Intuitively, the higher the phonetic level n at which two words share the same Soundex++ code, and the smaller the Levenshtein distance needed to transform one word into the other, the more likely humans are to associate them with the same meaning. In other words, n and d are hyper-parameters that help control the trade-off between precision and recall when retrieving perturbations of a given word such that they satisfy the SMS property (Figure 2). We will later carry out a human study to evaluate how well our extracted perturbations preserve semantic meaning in practice.

Figure 2: Trade-off between precision and recall of extracted perturbations for the word “president" w.r.t. different n and d values. Higher n and lower d associate with better preservation of the original meaning.

Mining from the Wild. To mine human-written perturbations at scale, we first collect a large corpus D of over 18M sentences written by netizens from 9 different datasets (Table A.1 in Appendix). We select these datasets because they include offensive texts such as hate speech, sensitive search queries, etc., and hence are very likely to contain text perturbations. Next, for each phonetic level n, we curate a hash table H_n that maps each unique Soundex++ code to the set of unique case-sensitive tokens sharing that encoding:

H_n[c] = {t ∈ D : S(t, n) = c},  n = 1, …, N,   (1)

where S(t, n) returns the Soundex++ code of token t at phonetic level n and N is the largest phonetic level we want to encode. With H_n, n and d, we can now search for the set of perturbations of a specific target word w as follows:

Anthro(w, n, d) = {t ∈ H_n[S(w, n)] : L(t, w) ≤ d},   (2)

where L(t, w) returns the Levenshtein distance between t and w. Noticeably, we only extract H_n once from D via Eq. (1); we can then use Eq. (2) to retrieve all perturbations of a given word during deployment. We name this method of mining and retrieving human-written text perturbations in the wild Anthro, a.k.a. human-like perturbations:

1:  Input: H_n, n, d
2:  Input: target classifier F, original sentence x
3:  Output: perturbed sentence x'
4:  Initialize: x' ← x
5:  for each word w in x do:
6:   compute the importance score of w according to F
7:  for each w in x, in decreasing order of importance, do:
8:   C ← Anthro(w, n, d) // Eq. (2)
9:   replace w in x' with the best t ∈ C
10:   if F(x') ≠ F(x) then return x'
11:  return None
Algorithm 1 Anthro Attack Algorithm

Anthro Attack. To utilize Anthro for an adversarial attack on model F, we propose the Anthro attack algorithm (Alg. 1). We use the same iterative mechanism (Ln. 7–10) that is common among other black-box attacks. This process replaces the most vulnerable word in sentence x, as evaluated with the support of a word-importance scoring function (Ln. 5–6), with the perturbation that best drops the prediction probability on the correct label. Unlike the other methods, Anthro inclusively draws from perturbations extracted from human-written texts captured in H_n (Ln. 8). We adopt the word-importance scoring function from TextBugger.

4 Evaluation

We evaluate Anthro on three criteria: (1) attack performance, (2) semantic preservation, and (3) human-likeness, i.e., how likely an adversarial message is to be spotted as machine-generated by human examiners.

4.1 Attack Performance

Setup. We use BERT (case-insensitive) jin2019bert and RoBERTa (case-sensitive) liu2019roberta as target classifiers to attack. We evaluate on three public tasks, namely detecting toxic comments (TC dataset, Kaggle 2018), hate speech (HS dataset hateoffensive), and online cyberbullying texts (CB dataset cyberbullyingdata). We split each dataset into train, validation and test sets with an 8:1:1 ratio. Then, we use the train set to fine-tune BERT and RoBERTa for a maximum of 3 epochs and select the best checkpoint using the validation set. BERT and RoBERTa achieve around 0.85–0.97 in F1 score on the test sets (Table A.2 in Appendix). We evaluate with targeted attacks (changing the positive→negative label) since this is more practical. We randomly sample 200 examples from each test set and use them as the initial sentences to attack. We repeat the process 3 times with unique random seeds and report the averaged results. We use the attack success rate (Atk%) metric, i.e., the number of examples whose labels are flipped by an attacker over the total number of texts that are correctly predicted pre-attack. We use the 3rd-party open-source OpenAttack zeng2020openattack framework to run all evaluations.

Baselines. We compare Anthro with three baselines, namely TextBugger li2018textbugger, VIPER VIPER and DeepWordBug gao2018black. These attackers utilize different character-based manipulations to craft their adversarial texts, as described in Sec. 1. From the analysis in Sec. 3.1 and Figure 2, we set n and d so that Anthro achieves a balanced trade-off between precision and recall on the SMS property. We examine all attackers under several combinations of normalization layers: (1) Accent normalization (A) and (2) Homoglyph normalization (H), which convert non-English accents and homoglyphs to their corresponding ASCII characters, and (3) Perturbation normalization (P), which normalizes potential character-based perturbations using the SOTA misspelling-correction model NeuSpell neuspell. These normalizers are selected as countermeasures against the perturbation strategies employed by VIPER (non-English accents), DeepWordBug (homoglyphs), and TextBugger and Anthro (misspellings and typos), respectively.
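The accent and homoglyph normalizers can be approximated with Unicode decomposition plus a lookup table. A minimal sketch (the homoglyph map is an assumed small subset of what a real normalizer would use):

```python
import unicodedata

# Assumed subset of a homoglyph table; real normalizers use much larger maps.
HOMOGLYPHS = {"0": "o", "1": "l", "@": "a", "$": "s"}

def normalize_accents(text: str) -> str:
    """(A): decompose characters and drop combining marks, e.g. 'ė' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def normalize_homoglyphs(text: str) -> str:
    """(H): map look-alike characters back to ASCII, e.g. 'he11o' -> 'hello'."""
    return "".join(HOMOGLYPHS.get(c, c) for c in text)
```

For example, `normalize_accents("ėā")` yields `"ea"`, and `normalize_homoglyphs("he11o")` yields `"hello"`, which is why attacks relying purely on accents or homoglyphs are easy to undo.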

Attacker Normalizer BERT (case-insensitive) RoBERTa (case-sensitive)
  TC HS CB TC HS CB
TextBugger - 0.76±0.02 0.94±0.01 0.78±0.03 0.77±0.06 0.87±0.01 0.72±0.01
DeepWordBug - 0.56±0.04 0.68±0.01 0.50±0.02 0.52±0.01 0.42±0.04 0.38±0.04
VIPER - 0.08±0.03 0.01±0.01 0.13±0.02 1.00±0.00 1.00±0.00 0.99±0.01
Anthro - 0.72±0.02 0.82±0.01 0.71±0.02 0.84±0.00 0.93±0.01 0.78±0.01
TextBugger A - - - 0.72±0.02 0.92±0.00 0.74±0.02
DeepWordBug A - - - 0.43±0.02 0.59±0.03 0.43±0.01
VIPER A - - - 0.09±0.01 0.05±0.01 0.17±0.02
Anthro A - - - 0.77±0.02 0.94±0.02 0.84±0.02
TextBugger A+H 0.78±0.03 0.85±0.00 0.79±0.00 0.74±0.02 0.93±0.01 0.77±0.03
DeepWordBug A+H 0.04±0.00 0.06±0.02 0.01±0.01 0.03±0.01 0.01±0.01 0.06±0.02
VIPER A+H 0.07±0.00 0.01±0.01 0.10±0.00 0.13±0.02 0.07±0.01 0.17±0.01
Anthro A+H 0.76±0.02 0.77±0.03 0.73±0.05 0.82±0.02 0.97±0.00 0.82±0.02
TextBugger A+H+P 0.73±0.02 0.64±0.06 0.70±0.04 0.68±0.06 0.57±0.03 0.66±0.04
DeepWordBug A+H+P 0.02±0.01 0.04±0.02 0.01±0.01 0.02±0.01 0.01±0.01 0.02±0.01
VIPER A+H+P 0.12±0.01 0.04±0.01 0.17±0.03 0.11±0.02 0.05±0.01 0.18±0.01
Anthro A+H+P 0.65±0.04 0.64±0.01 0.60±0.05 0.80±0.02 0.91±0.03 0.82±0.02
(-): BERT already applies accent normalization (A) by default; (Red): poor performance (Atk% < 0.15)
Table 4: Averaged attack success rate (Atk%) of different attack methods

Results. Overall, both Anthro and TextBugger perform the best. Being case-sensitive, Anthro performs significantly better on RoBERTa and is competitive on BERT compared to TextBugger (Table 4). This happens because RoBERTa is case-sensitive (unlike the bert-base-uncased model we used) and Anthro is the only case-sensitive attack among all baselines. For example, “democrats”→“democRATs” is considered a perturbation by RoBERTa but not by case-insensitive models. This gives Anthro an advantage in practice, because many popular commercial API services (e.g., the popular Perspective API and the sentiment-analysis and text-categorization APIs from Google) are case-sensitive, i.e., they treat “democrats” and “democRATs” differently (see Table 4).

VIPER achieves a near perfect score on RoBERTa, yet it is ineffective on BERT because RoBERTa uses the accent Ġ as a part of its byte-level BPE encoding liu2019roberta while BERT by default removes all such accents. Since VIPER exclusively utilizes accents, its attacks can be easily corrected by the accents normalizer (Table 4). Similarly, DeepWordBug perturbs texts with homoglyph characters, most of which can also be normalized using a 3rd party homoglyph detector (Table 4).

In contrast, even under all normalizers (A+H+P), TextBugger and Anthro still achieve 66.3% and 73.7% average Atk% across all evaluations. Although NeuSpell neuspell drops TextBugger’s Atk% by 14.7% across all runs, it reduces Anthro’s Atk% by a mere 7.5% on average. This is because TextBugger and NeuSpell (like other dictionary-based typo correctors) rely on fixed deductive rules, e.g., characters swapped or replaced by neighboring letters, for attack and defense respectively. Anthro instead utilizes human-written perturbations, which vary greatly and hence are less likely to be systematically detected. We further discuss the limitations of misspelling correctors such as NeuSpell in Sec. 7.

Attacker Normalizer BERT (case-insensitive) RoBERTa (case-sensitive)
  Toxic Comments HateSpeech Cyberbullying Toxic Comments HateSpeech Cyberbullying
TextBugger - 0.76±0.02 0.94±0.01 0.78±0.03 0.77±0.06 0.87±0.01 0.72±0.01
Anthro+TextBugger - 0.82±0.01 0.97±0.01 0.88±0.04 0.91±0.02 0.97±0.01 0.89±0.02
TextBugger A+H+P 0.73±0.02 0.64±0.06 0.70±0.04 0.68±0.06 0.57±0.03 0.66±0.04
Anthro+TextBugger A+H+P 0.85±0.04 0.79±0.02 0.84±0.03 0.88±0.04 0.93±0.01 0.91±0.01
Table 5: Averaged attack success rate (Atk%) of Anthro+TextBugger and TextBugger

4.2 Human Evaluation

Since Anthro and TextBugger are the top two effective attacks, this section focuses on evaluating their semantic preservation and human-likeness. Given an original sentence x and an adversarial text x' generated by either of the attacks, we design a human study to directly compare Anthro with TextBugger. Specifically, the two alternative hypotheses for our validation are (1) H1: x' generated by Anthro preserves the original meaning of x better than x' generated by TextBugger, and (2) H2: x' generated by Anthro is more likely to be perceived as human-written (and not machine-generated) than x' generated by TextBugger.

Figure 3: Semantic preservation and human-likeness

Human Study Design. We use the two attackers to generate adversarial texts targeting the BERT model on 200 examples sampled from the TC dataset’s test set. We then gather the examples that are successfully attacked by both Anthro and TextBugger. Next, we present a pair of texts, one generated by Anthro and one by TextBugger, together with the original sentence, to human subjects. We then ask them to select (1) which text better preserves the meaning of the original sentence (Figure B.2 in Appendix) and (2) which text is more likely to be written by a human (Figure B.3 in Appendix). To reduce noise and bias, we also provide a “Cannot decide" option for when the quality of both texts is equally good or bad, and we present the two questions as two separate tasks. Since the definition of semantic preservation can be subjective, we recruit human subjects both as (1) Amazon Mechanical Turk (MTurk) workers and (2) professional data annotators at a company with extensive experience annotating texts in domains such as toxic and hate speech. Our human subject study with MTurk workers was IRB-approved. We refer readers to Sec. B.3 (Appendix) for more details on the MTurk setup and study designs.

Reason Favorable Unfavorable
 For Anthro For TextBugger
Genuine Typos stuupid, but, Faoggt sutpid, burt, Foggat
Intelligible faiilure faioure
Sound Preserv. shytty, crp shtty, crsp
Meaning Preserv. ga-y, ashole, dummb bay, alshose, dub
High Search Results sodmized, kiills Smdooized, klils
Personal Exposure ign0rant, gaarbage ignorajt, garage
Word Selection morons→mor0ns edited→ewited
Table 6: Top reasons for favoring Anthro’s perturbations as more likely to be written by a human.

Figure 4: Trade-off among evaluation metrics

Quantitative Results. Rejecting the null hypotheses of both H1 and H2 is statistically significant (p-value < 0.05) (Table A.3). Overall, adversarial texts generated with perturbations mined in the wild are much better at preserving the original semantics, and at resembling human-written texts, than those generated by TextBugger (Figure 3, Left).

Qualitative Analysis. Table 6 summarizes the top reasons why annotators favor Anthro over TextBugger in terms of human-likeness. Anthro’s perturbations are perceived as similar to genuine typos and as more intelligible. They also better preserve both meaning and sound. Moreover, some annotators also rely on personal exposure to Reddit or YouTube comments, or on the frequency of word use via Reddit's search function, to decide whether a word choice is human-written.

5 Anthro+TextBugger Attack

We examine whether perturbations inductively extracted from the wild can improve the deductive TextBugger attack. Hence, we introduce Anthro+TextBugger, which considers the perturbation candidates from both Anthro and TextBugger in Ln. 8 of Alg. 1. Alg. 1 still selects the perturbation that best flips the target model’s prediction.

Attack Performance. Even though Anthro comes second to TextBugger when attacking the BERT model, Table 5 shows that when combined with TextBugger, i.e., Anthro+TextBugger, it consistently achieves superior performance, with an average of 82.7% and 90.7% Atk% on BERT and RoBERTa even under all normalizers (A+H+P).

Semantic Preservation and Human-Likeness. Anthro+TextBugger improves TextBugger’s Atk%, semantic preservation and human-likeness scores by over 8%, 32% and 42% (from the 0.5 threshold) on average (Table 5; Figure 3, Right), respectively. The presence of only a few human-like perturbations generated by Anthro is sufficient to signal that the whole sentence is written by a human, while a single unreasonable perturbation generated by TextBugger can adversely affect its meaning. This explains the performance drop in semantic preservation, but not in human-likeness, when indirectly comparing Anthro+TextBugger with Anthro. Overall, Anthro+TextBugger also has the best trade-off between Atk% and human evaluation, positioning at the top-right corners of Figure 4, with a noticeably superior Atk%.

Model Anthro Anthro+TextBugger
 TC HS CB TC HS CB
BERT 0.72 0.82 0.71 0.82 0.97 0.88
BERT+A+H+P 0.65 0.65 0.60 0.85 0.79 0.84
Adv.Train 0.41 0.30 0.35 0.72 0.72 0.67
SoundCNN 0.14 0.02 0.15 0.86 0.84 0.92
Table 7: Averaged Atk% of Anthro and Anthro+TextBugger against different defense models.

6 Defend Anthro, Attack with Anthro+TextBugger

We suggest two countermeasures against the Anthro attack: (i) Sound-Invariant Model (SoundCNN): when the defender does not have access to the hash tables H_n learned by the attacker, the defender trains a generic model that encodes not the spelling but the phonetic features of a text for prediction. Here we train a CNN model kim-2014-convolutional on top of an embedding layer over the discrete Soundex++ encodings of each token in a sentence. (ii) Adversarial Training (Adv.Train): to overcome the lack of access to H_n, the defender extracts his or her own perturbations in the wild from a separate corpus D′ ≠ D and uses them to augment the training examples, i.e., via self-attack with a 1:1 ratio, to fine-tune a more robust BERT model. We use a corpus of 34M general comments from online news as D′. We compare the two defenses against BERT and against BERT combined with the 3 normalization layers A+H+P. BERT is selected because it is better than RoBERTa at defending against Anthro (Table 4).
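The input featurization for such a sound-invariant model could look like the following sketch, which maps each token to an integer id of its phonetic code rather than of its spelling (the encoder is a simplified stand-in for Soundex++, and the vocabulary class is our own illustrative construction):

```python
# Sketch of SoundCNN's input featurization: tokens -> phonetic codes -> ids.
DIGITS = {c: d for cs, d in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
] for c in cs}

def encode(word: str, n: int = 1) -> str:
    """Simplified phonetic code: first n letters + collapsed digit skeleton."""
    w = word.lower()
    digits, prev = [], ""
    for ch in w[n:]:
        code = DIGITS.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return w[:n].upper() + "".join(digits)

class SoundVocab:
    """Maps phonetic codes to integer ids, to feed a downstream embedding layer + CNN."""
    def __init__(self):
        self.code2id = {"<pad>": 0, "<unk>": 1}

    def featurize(self, sentence: str, train: bool = True):
        ids = []
        for token in sentence.split():
            code = encode(token)
            if train and code not in self.code2id:
                self.code2id[code] = len(self.code2id)
            ids.append(self.code2id.get(code, 1))
        return ids
```

Because the encoding is invariant across many spelling perturbations, “dirty” and “dirrrty” map to the same id, so the downstream CNN sees identical inputs for the clean and perturbed sentences.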

Results. Table 7 shows that both SoundCNN and Adv.Train are robust against the Anthro attack, while Adv.Train performs best when defending against Anthro+TextBugger. Since SoundCNN is strictly based on phonetic features, it is vulnerable to Anthro+TextBugger whenever TextBugger’s perturbations are selected. Table 7 also underscores that Anthro+TextBugger is a strong and practical attack; defending against it is thus an important future direction.

7 Discussion and Analysis

Evaluation with Perspective API. We evaluate whether Anthro and Anthro+TextBugger can successfully attack the popular Perspective API, which has been adopted by various publishers (e.g., NYTimes) and platforms (e.g., Disqus, Reddit) to detect toxicity. We evaluate on 200 toxic texts randomly sampled from the TC dataset. Figure 5 (Left) shows that the API provides superior performance compared to a self fine-tuned BERT classifier, yet its precision deteriorates from 0.95 to only 0.9 and 0.82 when 25% and 50% of a sentence are randomly perturbed using human-written perturbations. In contrast, the Adv.Train model (Sec. 6) achieves fairly consistent precision in the same setting. This shows that Anthro is not only a powerful and realistic attack, but can also help develop more robust text classifiers in practice. The API is also vulnerable to both direct (Alg. 1) and transfer Anthro attacks through an intermediate BERT classifier, with its precision dropping to only 0.12 when evaluated against Anthro+TextBugger.

Figure 5: (Left) Precision on human-written perturbed texts synthesized by Anthro and (Right) Robustness evaluation of Perspective API under different attacks
Task Sentiment Analysis Categorization
Anthro 0.80 0.93
Anthro+TextBugger 0.86 1.00
Table 8: Averaged Atk% of Anthro and Anthro+TextBugger in fooling Google Cloud’s sentiment analysis API and text categorization API.

Generalization beyond Offensive Texts. Although Anthro extracts perturbations from abusive datasets, the majority of their texts are non-abusive. Thus, Anthro also learns perturbations of non-abusive English words, e.g., hilarious→Hi-Larious, shot→sh•t. We also make no assumption on the task domains that Anthro can attack. Evidently, Anthro and Anthro+TextBugger achieve 80% and 86% Atk% on sentiment analysis and 93% and 100% Atk% on text categorization when fooling the respective Google Cloud APIs (Table 8).

Computational Complexity. The one-time extraction of D via Eq. (1) costs O(N*L), where N is the # of tokens and L the length of the longest token in the corpus C (hash-map operations cost O(1)). Given a word w and D, Anthro retrieves a list of perturbation candidates via Eq. (2) in O(l*K), where l is the length of w and K is the size of the largest set of tokens sharing the same Soundex++ encoding in D. Since l is constant, the upper bound then becomes O(K).
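The two operations can be sketched with a hash map keyed by phonetic encoding. The sketch below uses a simplified classic Soundex as a stand-in for the paper's Soundex++ (an assumption for illustration only):

```python
from collections import defaultdict

def soundex(word):
    # Simplified classic Soundex, standing in for Soundex++.
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
             "l": "4", "mn": "5", "r": "6"}
    digits = []
    for ch in word.lower()[1:]:
        for letters, d in codes.items():
            if ch in letters:
                if not digits or digits[-1] != d:
                    digits.append(d)
                break
        else:
            digits.append("")  # vowels, digits, symbols carry no code
    digits = [d for d in digits if d]
    return (word[0].upper() + "".join(digits) + "000")[:4]

def build_index(tokens):
    # One-time extraction pass: O(N*L) overall, O(1) per hash-map insert.
    index = defaultdict(set)
    for tok in tokens:
        index[soundex(tok)].add(tok)
    return index

def candidates(index, word):
    # Retrieval: one O(l) encoding plus an O(1) bucket lookup,
    # then a scan over a bucket of size at most K.
    return index.get(soundex(word), set()) - {word}

idx = build_index(["shot", "sh0t", "shoot", "hello"])
assert candidates(idx, "shot") == {"sh0t", "shoot"}
```

Tokens that sound alike ("shot", "sh0t") land in the same bucket, which is exactly why the bucket size K dominates the retrieval cost.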

Limitation of Misspelling Correctors. Similar to other spell-checkers such as pyspellchecker and symspell, the SOTA NeuSpell depends on a fixed dictionary of common misspellings, or on synthetic misspellings generated by random permutations of characters neuspell. These checkers often assume perturbations lie within an edit-distance threshold of the original words. This makes them easy to evade, since one can generate new perturbations simply by repeating a specific character, e.g., "porn"->"pooorn". Also, due to the iterative attack mechanism (Alg. 1), where each token in a sentence is replaced by many candidates until the correct label's prediction probability drops, Anthro only needs a single good perturbation that is not detected by NeuSpell for a successful replacement. Thus, by characterizing perturbations not only by their spellings but also by their sounds, Anthro is able to mine perturbations that circumvent NeuSpell.
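The edit-distance assumption can be checked directly: a few repeated characters push a perturbation outside a typical search radius, even though it still reads and sounds like the original word. A minimal sketch:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# A corrector that only searches within edit distance <= 2 never sees a
# perturbation made by repeating one character a few extra times.
assert levenshtein("kitten", "sitting") == 3
assert levenshtein("porn", "pooooorn") == 4   # outside a radius of 2
```

Sound-based grouping sidesteps this, since extra repeated characters rarely change a word's phonetic encoding.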

Limitation of Anthro. The perturbation candidate retrieval operation (Eq. (2)) has a higher computational complexity than that of other methods, i.e., O(K) v.s. O(l), where l is the length of an input token (please refer to Sec. 8 in the Appendix for the detailed computational complexity analysis). This can prolong the running time, especially when attacking long documents. However, we can overcome this by pre-computing and storing all the perturbations (given the extracted dictionary D) of the most frequently used offensive and non-offensive English words. We can then expect the operation to have an average complexity close to O(1). The current Soundex++ algorithm is designed for English texts and might not be applicable to other languages. Thus, we plan to extend Anthro to a multilingual setting.
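The caching idea can be sketched as a plain dictionary keyed by the most frequent words; the entries below are illustrative, not mined data:

```python
# Hypothetical precomputed cache of perturbations for frequent words.
PRECOMPUTED = {
    "shot": {"sh0t", "sh*t"},
    "amazon": {"amaz0n"},
}

def cached_candidates(word, slow_lookup):
    # Dict hit: ~O(1) on average; otherwise fall back to the O(K)
    # Soundex-bucket scan (`slow_lookup` stands in for Eq. (2)).
    hit = PRECOMPUTED.get(word)
    return hit if hit is not None else slow_lookup(word)

assert cached_candidates("shot", lambda w: set()) == {"sh0t", "sh*t"}
assert cached_candidates("rare", lambda w: {w.upper()}) == {"RARE"}
```

Because token frequencies are heavily skewed, caching only the head of the distribution already covers most lookups in practice.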

8 Conclusion

We propose Anthro, a character-based attack algorithm that extracts human-written perturbations in the wild and then utilizes them for adversarial text generation. Our approach yields the best trade-off among attack performance, semantic preservation, and stealthiness in both empirical experiments and human studies. A BERT classifier trained on examples augmented by Anthro can also better understand human-written texts.

Broad Impact

To the best of our knowledge, Anthro is the first work that extracts noisy human-written texts, or text perturbations, online. We reiterate what reviewer pvcD observed: Anthro moves "away from deductively-derived attacks to data-driven inspired attacks". This novel direction benefits not only the adversarial NLP community but also other NLP tasks that require understanding realistic, noisy user-generated texts online. Specifically, Sec. 6 and Figure 5 show that our work enables the training of a BERT model that understands noisy human-written texts better than the popular Perspective API. By extending this to other NLP tasks such as Q&A and NLI, we hope to enable current NLP software to perform well in real-life settings, especially on social platforms where user-generated texts are not always in perfect English. Our work also opens a new direction in studying the use of language online and how netizens utilize different forms of perturbations to avoid censorship in this new age of AI.

Ethical Consideration

As with previous works in the adversarial NLP literature, there is a risk that our proposed approach may be utilized by malicious actors to attack textual ML systems. To mitigate this, we will not publicly release the full perturbation dictionary that we have extracted and reported in the paper. Instead, we will provide access to our private API on a case-by-case basis with proper security measures. Moreover, we also suggest and discuss two potential approaches that can defend against our proposed attacks (Sec. 6). We believe that the benefits of our work outweigh its potential risks. All public secondary datasets used in this paper were either open-sourced or released by the original authors.


This research was supported in part by NSF awards #1820609, #1915801, and #2114824.


Appendix A Supplementary Materials

a.1 Additional Results and Figures

Below is a list of the supplementary materials:

  • Table A.1: list of datasets we used to curate the corpus from which human-written perturbations are extracted (Sec. 3.1). All the datasets are publicly available, except for the two private datasets Sensitive Query and Hateful Comments.

  • Table A.2: list of datasets we used to evaluate the attack performance of all attackers (Sec. 4.1) and the prediction performance of BERT and RoBERTa on the respective test sets. All datasets are publicly available.

  • Table A.3: Statistical analysis of the human study results (Sec. 4.2).

  • Figure B.1: Word-clouds of extracted human-written perturbations by Anthro for some popular English words.

  • Figure B.2, B.3: Interfaces of the human study described in Sec. 4.2.

a.2 Infrastructure and Software

Dataset | #Texts | #Tokens
List of Bad Words | 1.9K | 1.9K
Rumours (Twitter) kochkina2018all | 99K | 159K
Twitter | 150K | 328K
Personal Atks (Wiki.) wulczyn_thain_dixon_2017 | 116K | 454K
Toxic Comments (Wiki.) (Kaggle, 2019) | 2M | 1.6M
Unknown | 313K | 857K
Hateful Comments (Reddit) (Kaggle, 2021) | 1.7M | 1M
Sensitive Query (Search Engine, Private) | 1.2M | 314K
Online News | 12.7M | 7M
Total texts used to extract Anthro | 18.3M | -
Table A.1: Real-life datasets that are used to extract adversarial texts in the wild, with the number of total examples (#Texts) and unique tokens (#Tokens, case-insensitive).
Dataset | #Total | BERT | RoBERTa
CB cyberbullyingdata | 449K | 0.84 | 0.84
TC (Kaggle, 2018) | 160K | 0.85 | 0.85
HS hateoffensive | 25K | 0.91 | 0.97
Table A.2: Evaluation datasets Cyberbullying (CB), Toxic Comments (TC) and Hate Speech (HS), and the prediction performance (F1 score) of BERT and RoBERTa on their test sets.
Alternative Hypothesis | Mean | t-stats | p-value | df
AMT Workers as Subjects:
Anthro > TB | 0.82 | 5.66 | 4.1e-7** | 48
Anthro+TB > TB | 0.64 | 1.95 | 2.9e-2* | 46
Anthro > TB | 0.71 | 3.14 | 1.5e-3** | 47
Anthro+TB > TB | 0.70 | 3.00 | 2.2e-3** | 46
Professional Annotators as Subjects:
Anthro > TB | 0.75 | 3.79 | 2.4e-4** | 44
Anthro+TB > TB | 0.68 | 2.49 | 8.6e-3** | 41
Anthro > TB | 0.70 | 3.06 | 1.82e-3** | 50
Anthro+TB > TB | 0.73 | 3.53 | 4.6e-4** | 48
Statistically significant: ** (p-value < 0.01), * (p-value < 0.05)
Table A.3: It is statistically significant (p-value < 0.01) that adversarial texts generated by Anthro are better than those generated by TextBugger (TB) both at preserving the semantics of the original sentences and at being perceived as human-written texts.

Appendix B Implementation Details

b.1 Attackers

We evaluate all the attack baselines using the open-source OpenAttack framework zeng2020openattack. We keep all the default parameters for all the attack methods.

b.2 Defenders

For (1) Accents normalization, we adopt the accents-removal code from the Hugging Face repository. For (2) Homoglyph normalization, we adopt a third-party Python homoglyph library. For (3) Perturbation normalization, we use the state-of-the-art misspelling-based perturbation correction model NeuSpell neuspell. For the Perspective API, we directly use the publicly available API provided by Jigsaw and Google.

b.3 Details of Human Study and Experiment Controls

To ensure high-quality responses, we recruit MTurk workers who are 18 years or older and reside in North America. Workers are recruited using the following qualifications provided by AMT: (1) recognized as "master" workers by the AMT system, (2) have completed at least 5K HITs, and (3) have a historical HIT approval rate of at least 98%. These qualifications are more conservative than those used in previous human studies in the literature. We pay each worker on average around $10 an hour or higher (the federal minimum wage was $7.25 in 2021, when we carried out our study). To limit abusive behaviors, we also impose a minimum attention span of 30 seconds for each question.

Figure B.1: Word-clouds of perturbations in the wild extracted by Anthro for the words "amazon", "republicans", "democrats" and "president".

Figure B.2: User-study design for the semantic preservation comparison between Anthro and TextBugger.

Figure B.3: User-study design for the human-likeness comparison between Anthro and TextBugger.