Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present the TextFooler, a general attack framework, to generate natural adversarial texts. By successfully applying it to two fundamental natural language tasks, text classification and textual entailment, against various target models, convolutional and recurrent neural networks as well as the most powerful pre-trained BERT, we demonstrate the advantages of this framework in three ways: (i) effective---it outperforms state-of-the-art attacks in terms of success rate and perturbation rate; (ii) utility-preserving---it preserves semantic content and grammaticality, and remains correctly classified by humans; and (iii) efficient---it generates adversarial text with computational complexity linear in the text length.
READ FULL TEXT