CATBERT: Context-Aware Tiny BERT for Detecting Social Engineering Emails

10/07/2020
by Younghoo Lee, et al.

Targeted phishing emails are on the rise and facilitate the theft of billions of dollars from organizations each year. While malicious signals from attached files or malicious URLs in emails can be detected by conventional malware signatures or machine learning technologies, it is challenging to identify hand-crafted social engineering emails that contain no malicious code and share no word choices with known attacks. To tackle this problem, we fine-tune a pre-trained BERT model, replacing half of its Transformer blocks with simple adapters to efficiently learn sophisticated representations of the syntax and semantics of natural language. Our Context-Aware network also learns context representations between an email's content and context features from its headers. Our CatBERT (Context-Aware Tiny BERT) achieves an 87% detection rate, compared with DistilBERT, LSTM, and logistic regression baselines, which achieve 83% [...] rates of 1% [...] approaches and is resilient to adversarial attacks that deliberately replace keywords with typos or synonyms.
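To make the adversarial setting in the abstract concrete, the following is a minimal sketch of a keyword-replacement perturbation of the kind described (typos or synonyms). It is not the authors' attack procedure: the keyword set, the synonym table, and the function names `make_typo` and `perturb` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical keyword list and synonym table (assumptions, not from the paper).
KEYWORDS = {"urgent", "payment", "invoice", "password"}
SYNONYMS = {"urgent": "pressing", "payment": "remittance"}


def make_typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters, keeping the first and last letter."""
    if len(word) < 4:
        return word  # too short to perturb without touching the ends
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    # Note: swapping two identical letters leaves the word unchanged.
    return "".join(chars)


def perturb(text: str, mode: str = "typo", seed: int = 0) -> str:
    """Replace each keyword in `text` with a typo or a synonym."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        core = token.strip(".,!?:;")  # ignore surrounding punctuation
        key = core.lower()
        if key in KEYWORDS:
            if mode == "synonym":
                new = SYNONYMS.get(key, core)  # unchanged if no synonym is listed
            else:
                new = make_typo(core, rng)
            token = token.replace(core, new, 1)
        out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(perturb("Urgent: please confirm the payment today.", mode="typo"))
    print(perturb("Urgent: please confirm the payment today.", mode="synonym"))
```

A detector that relies on exact keyword matches would miss the perturbed text, which is why resilience to such edits is evaluated separately from the plain detection rate.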

research
07/05/2017

Context Aware Document Embedding

Recently, doc2vec has achieved excellent results in different tasks. In ...
research
10/26/2020

Fine-grained Information Status Classification Using Discourse Context-Aware BERT

Previous work on bridging anaphora recognition (Hou et al., 2013a) casts...
research
08/13/2019

Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention

Previous work on bridging anaphora recognition (Hou et al., 2013a) casts...
research
01/23/2019

Context-Sensitive Malicious Spelling Error Correction

Misspelled words of the malicious kind work by changing specific keyword...
research
12/06/2021

Context-Aware Transfer Attacks for Object Detection

Blackbox transfer attacks for image classifiers have been extensively st...
research
05/05/2023

GPT for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

As the field of automated machine learning (AutoML) advances, it becomes...
research
12/15/2021

Tracing Text Provenance via Context-Aware Lexical Substitution

Text content created by humans or language models is often stolen or mis...
