Using BERT Encoding to Tackle the Mad-lib Attack in SMS Spam Detection

07/13/2021
by Sergio Rojas-Galeano, et al.

One of the stratagems used to deceive spam filters is to substitute vocables with synonyms or similar words that render the message unrecognisable to detection algorithms. In this paper we investigate whether recent language models sensitive to the semantics and context of words, such as Google's BERT, may be useful to overcome this adversarial attack (called "Mad-lib", after the word-substitution game). Using a dataset of 5,572 SMS spam messages, we first established a baseline of detection performance using widely known document representation models (BoW and TFIDF) and the novel BERT model, coupled with a variety of classification algorithms (Decision Tree, kNN, SVM, Logistic Regression, Naive Bayes, Multilayer Perceptron). Then, we built a thesaurus of the vocabulary contained in these messages, and set up a Mad-lib attack experiment in which we modified each message of a held-out subset of data (not used in the baseline experiment) with different rates of substitution of original words with synonyms from the thesaurus. Lastly, we evaluated the detection performance of the three representation models (BoW, TFIDF and BERT) coupled with the best classifier from the baseline experiment (SVM). We found that the classic models achieved a 94% Balanced Accuracy (BA) on the original dataset, whereas the BERT model obtained 96%. On the other hand, the Mad-lib attack experiment showed that BERT encodings manage to maintain a similar BA performance of 96% at an average substitution rate of 1.82 words per message, and 95% at a higher substitution rate. In contrast, the BA performance of the BoW and TFIDF encoders dropped to chance. These results hint at the potential advantage of BERT models in combating this type of ingenious attack, offsetting to some extent the adversarial misuse of semantic relationships in language.
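The Mad-lib attack described above, replacing a given number of words in each message with synonyms drawn from a thesaurus, can be sketched as follows. This is a minimal illustration, not the paper's code: the mini-thesaurus entries and the function name `madlib_attack` are hypothetical, whereas the paper builds its thesaurus from the vocabulary of the SMS corpus itself.

```python
import random

# Hypothetical mini-thesaurus for illustration; the paper derives one
# from the vocabulary of the 5,572-message SMS spam dataset.
THESAURUS = {
    "free": ["complimentary", "gratis"],
    "win": ["gain", "earn"],
    "prize": ["reward", "award"],
    "cash": ["money", "funds"],
}

def madlib_attack(message, rate, rng=None):
    """Substitute up to `rate` thesaurus-covered words with random synonyms.

    `rate` plays the role of the per-message substitution rate varied in
    the attack experiment (e.g. ~1.82 words per message on average).
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    words = message.split()
    # Indices of words that have an entry in the thesaurus.
    eligible = [i for i, w in enumerate(words) if w.lower() in THESAURUS]
    rng.shuffle(eligible)
    for i in eligible[:rate]:
        words[i] = rng.choice(THESAURUS[words[i].lower()])
    return " ".join(words)

print(madlib_attack("win a free prize now", rate=2))
```

A surface representation such as BoW or TFIDF treats each substituted synonym as an entirely new feature, which is why those encoders drop to chance under the attack, while a contextual encoder like BERT maps synonyms to nearby points in embedding space.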


