Data-Driven Mitigation of Adversarial Text Perturbation

02/19/2022
by Rasika Bhalerao, et al.

Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text perturbations. In this paper, we classify intentional and unintentional adversarial text perturbation into ten types and propose a deobfuscation pipeline to make NLP models robust to such perturbations. We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words. We show that CW2V embeddings are generally more robust to text perturbations than embeddings based on character n-grams. Our robust classification pipeline combines deobfuscation and classification, using the proposed defense methods and word embeddings to classify whether Facebook posts are requesting engagement such as likes. With our pipeline, engagement bait classification drops from 0.70 to 0.67 AUC under adversarial text perturbation, whereas with character n-gram-based word embedding methods downstream classification drops from 0.76 to 0.64 AUC.
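As a rough illustration of why character-level perturbations are hard for representations built on character n-grams, the sketch below perturbs a word with look-alike characters and compares raw character n-gram count vectors by cosine similarity. It is not the paper's CW2V method or a trained embedding model: the substitution table `LEET`, the helper names, and the use of raw trigram counts as a stand-in for learned embeddings are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical look-alike substitution table, for illustration only
# (not the paper's taxonomy of ten perturbation types).
LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def perturb(word):
    """Replace every character that has an entry in LEET with its look-alike."""
    return "".join(LEET.get(ch, ch) for ch in word)


def char_ngrams(word, n=3):
    """Count the character n-grams of the word, padded with boundary markers."""
    padded = f"<{word}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    original = "likes"
    obfuscated = perturb(original)   # look-alike substitution
    misspelled = "likess"            # a simple unintentional misspelling
    base = char_ngrams(original)
    print(obfuscated, cosine(base, char_ngrams(obfuscated)))
    print(misspelled, cosine(base, char_ngrams(misspelled)))
```

In this toy setting, the single-letter misspelling still shares most trigrams with the original word, while the look-alike substitution shares none. A learned n-gram embedding model may behave differently from raw counts, so this should be read only as intuition for the problem the abstract describes.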

research
06/10/2016

PSDVec: a Toolbox for Incremental and Scalable Word Embedding

PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the ma...
research
05/08/2018

Interpretable Adversarial Perturbation in Input Embedding Space for Text

Following great success in the image processing field, the idea of adver...
research
05/08/2021

Certified Robustness to Text Adversarial Attacks by Randomized [MASK]

Recently, few certified defense methods have been developed to provably ...
research
05/30/2019

Interpretable Adversarial Training for Text

Generating high-quality and interpretable adversarial examples in the te...
research
07/25/2020

Effect of Text Processing Steps on Twitter Sentiment Classification using Word Embedding

Processing of raw text is the crucial first step in text classification ...
research
08/14/2019

On the Robustness of Projection Neural Networks For Efficient Text Representation: An Empirical Study

Recently, there has been strong interest in developing natural language ...
research
08/27/2021

Deep learning models are not robust against noise in clinical text

Artificial Intelligence (AI) systems are attracting increasing interest ...
