Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA Detection Classifiers

07/01/2020
by Arthur Drichel, et al.

Numerous machine learning classifiers have been proposed for the binary classification of domain names as either benign or malicious, and even for multiclass classification to identify the domain generation algorithm (DGA) that generated a specific domain name. Both classification tasks must deal with class imbalance: the number of training samples available varies strongly from one DGA to another. It is currently unclear whether including DGAs for which only a few samples are known in the training sets is beneficial or harmful to the overall performance of the classifiers. In this paper, we perform a comprehensive analysis of various contextless DGA classifiers, which reveals the high value of even a few training samples per class for both classification tasks. We demonstrate that, by including the underrepresented classes in training, the classifiers are able to detect with high probability various DGAs that were previously hardly recognizable. At the same time, we show that the classifiers' detection capabilities for well-represented classes do not decrease.


