Bad Characters: Imperceptible NLP Attacks

06/18/2021
by Nicholas Boucher, et al.

Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems, from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection – representing one invisible character, homoglyph, reordering, or deletion – an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, in addition to open-source models published by Facebook and IBM. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that, given such systems are now being deployed rapidly at scale, they require the urgent attention of architects and operators.
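The four perturbation classes named in the abstract all operate on the Unicode encoding rather than on visible text. The following Python sketch is not the authors' code; the example sentence and the specific code points are illustrative assumptions. It builds one instance of each class and shows a conservative sanitization step of the kind the paper calls for.

```python
# Minimal sketch (not the authors' implementation): one example of each
# perturbation class from the abstract, applied to an illustrative string.
import unicodedata

original = "Send money to Alice"

# 1. Invisible character: a zero-width space (U+200B) injected inside a word.
invisible = "Send mon\u200bey to Alice"

# 2. Homoglyph: the Latin capital 'A' replaced with the Cyrillic capital 'A' (U+0410).
homoglyph = original.replace("A", "\u0410")

# 3. Reordering: a right-to-left override (U+202E) makes the logically swapped
#    "ec" render as "ce"; U+202C pops the override.
reordered = "Send money to Ali\u202eec\u202c"

# 4. Deletion: a stray character followed by a backspace (U+0008), which some
#    renderers hide but which survives in the text actually fed to a model.
deletion = "Send money to Alicex\u0008"

for label, text in [("invisible", invisible), ("homoglyph", homoglyph),
                    ("reordered", reordered), ("deletion", deletion)]:
    print(f"{label:10s} equal_to_original={text == original} "
          f"codepoints={len(text)} vs {len(original)}")

# One form of the input sanitization the authors recommend: Unicode-normalize
# and drop control (Cc) and format (Cf) characters before text reaches a model.
# This strips the invisible, reordering, and deletion control characters above
# (the stray character in the deletion example then becomes visible);
# homoglyphs would additionally require a confusables mapping.
def sanitize(text: str) -> str:
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized
                   if unicodedata.category(ch) not in {"Cc", "Cf"})
```

Running the sketch shows that every perturbed string compares unequal to the original even though it renders identically or near-identically, which is the gap between human and machine perception that the attacks exploit.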

Related research

06/12/2023 · When Vision Fails: Text Attacks Against ViT and OCR
While text-based machine learning models that operate on visual inputs o...

08/16/2020 · TextDecepter: Hard Label Black Box Attack on Text Classifiers
Machine learning has been proven to be susceptible to carefully crafted ...

05/24/2016 · Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inp...

02/15/2018 · Fooling OCR Systems with Adversarial Text Images
We demonstrate that state-of-the-art optical character recognition (OCR)...

05/24/2023 · How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
Natural Language Processing (NLP) models based on Machine Learning (ML) ...

02/10/2021 · Dompteur: Taming Audio Adversarial Examples
Adversarial examples seem to be inevitable. These specifically crafted i...

05/27/2023 · Backdooring Neural Code Search
Reusing off-the-shelf code snippets from online repositories is a common...
