
Towards Evaluating the Robustness of Chinese BERT Classifiers

by Boxin Wang, et al.

Recent advances in large-scale language representation models such as BERT have improved state-of-the-art performance on many NLP tasks. Meanwhile, character-level Chinese NLP models, including BERT for Chinese, have also demonstrated that they can outperform existing models. In this paper, however, we show that such BERT-based models are vulnerable to character-level adversarial attacks. We propose a novel Chinese character-level attack method against BERT-based classifiers. Essentially, we generate a "small" perturbation at the character level in the embedding space and use it to guide the character substitution procedure. Extensive experiments show that the classification accuracy on a Chinese news dataset drops from 91.8% to 0% by manipulating fewer than 2 characters on average based on the proposed attack. Human evaluations also confirm that our generated Chinese adversarial examples barely affect human performance on these NLP tasks.
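The idea of perturbing a character in the embedding space and then mapping the perturbed vector back to a real character can be illustrated with a toy sketch. This is not the paper's algorithm: the two-dimensional embedding table `TABLE`, the linear scorer with weights `W`, the step size `eps`, and all function names are hypothetical stand-ins for a BERT classifier and its character embeddings, chosen only so the example runs on its own.

```python
import math

# Hypothetical toy embedding table: character -> 2-d vector (not real BERT embeddings).
TABLE = {
    "好": [1.0, 0.0],
    "妤": [0.2, 0.1],
    "很": [0.0, 1.0],
    "坏": [-1.0, 0.0],
}
# Hypothetical linear "classifier": a higher score means more confident in the true class.
W = [1.0, 0.0]


def score(text, table, w):
    """Dot product of w with the mean character embedding of text."""
    mean = [sum(table[c][i] for c in text) / len(text) for i in range(len(w))]
    return sum(wi * mi for wi, mi in zip(w, mean))


def nearest_char(vec, table, exclude):
    """Return the character (other than exclude) whose embedding is closest to vec."""
    best, best_dist = None, math.inf
    for ch, emb in table.items():
        if ch == exclude:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(vec, emb))
        if dist < best_dist:
            best, best_dist = ch, dist
    return best


def attack_one_char(text, table, w, eps=1.0):
    """Try one substitution per position and keep the one that lowers the score most.

    For this linear scorer the gradient with respect to every character
    embedding points along w, so stepping along -w lowers the score.
    """
    norm = math.sqrt(sum(x * x for x in w))
    step = [-eps * x / norm for x in w]
    best_text, best_score = text, score(text, table, w)
    for i, ch in enumerate(text):
        # Perturb the embedding, then project back onto a real character.
        target = [e + s for e, s in zip(table[ch], step)]
        sub = nearest_char(target, table, exclude=ch)
        candidate = text[:i] + sub + text[i + 1:]
        cand_score = score(candidate, table, w)
        if cand_score < best_score:
            best_text, best_score = candidate, cand_score
    return best_text


adversarial = attack_one_char("很好", TABLE, W)
```

In this toy setup exactly one character of the input is replaced and the score falls. A real attack would take gradients from the trained classifier and would likely restrict substitutions so that the text stays readable to humans; those parts are omitted here.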




Glyph-aware Embedding of Chinese Characters

Given the advantage and recent success of English character-level and su...

Expanding Scope: Adapting English Adversarial Attacks to Chinese

Recent studies have revealed that NLP predictive models are vulnerable t...

FireBERT: Hardening BERT-based classifiers against adversarial attack

We present FireBERT, a set of three proof-of-concept NLP classifiers har...

EFSG: Evolutionary Fooling Sentences Generator

Large pre-trained language representation models (LMs) have recently col...

SemAttack: Natural Textual Attacks via Different Semantic Spaces

Recent studies show that pre-trained language models (LMs) are vulnerabl...

Enhancing Model Robustness By Incorporating Adversarial Knowledge Into Semantic Representation

Despite that deep neural networks (DNNs) have achieved enormous success ...