Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings

11/21/2018
by Joewie J. Koh et al.

Domain generation algorithms (DGAs) are frequently employed by malware to generate domains used for connecting to command-and-control (C2) servers. Recent work in DGA detection has leveraged deep learning architectures such as convolutional neural networks (CNNs) and character-level long short-term memory networks (LSTMs) to classify domains. However, these classifiers perform poorly on wordlist-based DGA families, which generate domains by pseudorandomly concatenating dictionary words. We propose a novel approach that combines context-sensitive word embeddings with a simple fully-connected classifier to classify domains based on word-level information. The word embeddings were pre-trained on a large unrelated corpus and kept frozen during training on domain data. The resulting small number of trainable parameters enabled extremely short training times, while the transfer of language knowledge stored in the representations allowed for high-performing models with small training datasets. We show that this architecture reliably outperformed existing techniques on wordlist-based DGA families with just 30 DGA training examples and achieved state-of-the-art performance with around 100 DGA training examples, all while requiring an order of magnitude less training time than current techniques. Of special note is the technique's performance on the matsnu DGA: the classifier attained an 89.5% […] positive rate (FPR) after training on only 30 examples of the DGA domains, and a 91.2% […] some of these DGAs have wordlists of several hundred words, our results demonstrate that this technique does not rely on the classifier learning the DGA wordlists. Instead, the classifier is able to learn the semantic signatures of the wordlist-based DGA families.
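
The core idea (frozen word representations feeding a small trainable classifier) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. Three substitutions are assumed: a hand-written 2-D lookup table stands in for the pre-trained context-sensitive embeddings (it is neither pre-trained nor context-sensitive); greedy longest-match segmentation is one possible way to split a domain label into words (the abstract does not say how this is done); and logistic regression, a single fully-connected layer, stands in for the classifier. Only `w` and `b` are updated during training, which mirrors the point that freezing the embeddings leaves very few trainable parameters.

```python
import math
import random

# HYPOTHETICAL frozen "embeddings": a hand-written 2-D table standing in for
# pre-trained context-sensitive vectors. Never updated during training.
FROZEN_EMBEDDINGS = {
    "mail": (0.9, 0.1), "news": (0.8, 0.2), "shop": (0.85, 0.15),
    "table": (0.1, 0.9), "river": (0.2, 0.8), "silent": (0.15, 0.85),
}
UNKNOWN = (0.0, 0.0)  # vector used for out-of-vocabulary tokens


def wordlist_dga(words, n_words, seed):
    """Toy wordlist-based DGA: pseudorandomly concatenate dictionary words."""
    rng = random.Random(seed)
    return "".join(rng.choice(words) for _ in range(n_words)) + ".com"


def segment(label, vocab):
    """ASSUMED segmentation: greedy longest match against the vocabulary."""
    tokens, i = [], 0
    max_len = max(len(w) for w in vocab)
    while i < len(label):
        for j in range(min(len(label), i + max_len), i, -1):
            if label[i:j] in vocab:
                tokens.append(label[i:j])
                i = j
                break
        else:  # no word matched: emit a single-character token
            tokens.append(label[i])
            i += 1
    return tokens


def embed_domain(domain):
    """Mean-pool the frozen word vectors of the leftmost domain label."""
    label = domain.split(".")[0]
    vecs = [FROZEN_EMBEDDINGS.get(t, UNKNOWN) for t in segment(label, FROZEN_EMBEDDINGS)]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(2)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(domains, labels, epochs=200, lr=0.5):
    """Logistic regression on frozen features; only w and b are trainable."""
    feats = [embed_domain(d) for d in domains]  # computed once: inputs are frozen
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [w[k] - lr * g * x[k] for k in range(2)]
            b -= lr * g
    return w, b


def predict(w, b, domain):
    """Probability that a domain was produced by the (toy) DGA."""
    x = embed_domain(domain)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


# Toy data: DGA domains from one word cluster, benign domains from another.
DGA_WORDS = ["table", "river", "silent"]
dga = [wordlist_dga(DGA_WORDS, 2, seed) for seed in range(10)]
benign = ["mailnews.com", "shop.com", "news.com", "mailshop.com"]
w, b = train_classifier(dga + benign, [1] * len(dga) + [0] * len(benign))
print(f"tablesilent.com -> {predict(w, b, 'tablesilent.com'):.3f}")
print(f"shopmail.com    -> {predict(w, b, 'shopmail.com'):.3f}")
```

Because the embeddings are fixed, the features are computed once and training touches just three numbers. The classifier generalizes to an unseen combination such as `tablesilent.com` because it keys on where the words sit in the embedding space, not on memorizing the exact training domains. This is a loose, toy-scale analogue of the abstract's claim that the classifier learns semantic signatures rather than the wordlists themselves.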
