Jasmine: A New Active Learning Approach to Combat Cybercrime

08/13/2021
by Jan Klein, et al.

Over the past decade, the advent of cybercrime has accelerated research on cybersecurity. However, the deployment of intrusion detection methods falls short. One reason is the lack of realistic evaluation datasets, which makes it a challenge to develop techniques and compare them. This lack stems from the large amount of effort it takes a cyber analyst to classify network connections. It has raised the need for methods (i) that can learn from small sets of labeled data, (ii) that can make predictions on large sets of unlabeled data, and (iii) that request the label of only specially selected unlabeled data instances. Hence, Active Learning (AL) methods are of interest. These approaches use a query function to choose specific unlabeled instances that are expected to improve overall classification performance. The resulting query observations are labeled by a human expert and added to the labeled set. In this paper, we propose a new hybrid AL method called Jasmine. First, it determines how suitable each observation is for querying, i.e., how likely it is to enhance classification, based on two properties: the uncertainty score and the anomaly score. Second, Jasmine introduces dynamic updating, which allows the model to adjust the balance between querying uncertain, anomalous and randomly selected observations. As a result, Jasmine is able to learn the best query strategy during the labeling process. This is in contrast to the other AL methods in cybersecurity, which all have static, predetermined query functions. We show that dynamic updating, and therefore Jasmine, consistently obtains good results that are more robust than those obtained by querying only uncertainties, only anomalies or a fixed combination of the two.
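The abstract does not give Jasmine's scoring formulas or its update rule, so the following is only a minimal sketch of the general idea: a query batch split between uncertain, anomalous and random observations, with the split adjusted after each labeling round. All function names, the binary uncertainty formula, the `gains` feedback signal and the `step` parameter are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def uncertainty_scores(proba):
    """Assumed binary uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    proba = np.asarray(proba, dtype=float)
    return 1.0 - 2.0 * np.abs(proba - 0.5)


def select_queries(uncertainty, anomaly, fractions, budget, rng):
    """Pick `budget` distinct indices: top uncertain, top anomalous, then random.

    `fractions` = (uncertain, anomalous, random) shares of the batch.
    """
    n = len(uncertainty)
    n_unc = int(round(fractions[0] * budget))
    n_ano = min(int(round(fractions[1] * budget)), budget - n_unc)
    taken = np.zeros(n, dtype=bool)
    chosen = []
    for scores, k in ((uncertainty, n_unc), (anomaly, n_ano)):
        # Highest score first; skip observations already picked.
        order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
        picks = [int(i) for i in order if not taken[i]][:k]
        for i in picks:
            taken[i] = True
        chosen.extend(picks)
    # Fill the rest of the batch with uniformly random, not-yet-picked items.
    remaining = np.flatnonzero(~taken)
    n_rand = min(budget - len(chosen), len(remaining))
    chosen.extend(int(i) for i in rng.choice(remaining, size=n_rand, replace=False))
    return chosen


def update_fractions(fractions, gains, step=0.2):
    """Hypothetical update: move the mix toward strategies with higher gain.

    `gains` is an assumed per-strategy usefulness signal, e.g. how many of
    each strategy's queried labels the current model had predicted wrongly.
    """
    fractions = np.asarray(fractions, dtype=float)
    gains = np.asarray(gains, dtype=float)
    target = gains / gains.sum() if gains.sum() > 0 else fractions
    new = (1.0 - step) * fractions + step * target
    return new / new.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proba = rng.random(20)          # stand-in for classifier probabilities
    anomaly = rng.random(20)        # stand-in for anomaly-detector scores
    mix = np.array([1 / 3, 1 / 3, 1 / 3])
    batch = select_queries(uncertainty_scores(proba), anomaly, mix, 6, rng)
    mix = update_fractions(mix, gains=[3, 1, 0])
    print(batch, mix)
```

A static query function corresponds to never calling `update_fractions`; the dynamic variant lets the mix drift toward whichever strategy has recently produced the most informative labels, under whatever definition of "informative" is chosen.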

research
10/13/2022

TiDAL: Learning Training Dynamics for Active Learning

Active learning (AL) aims to select the most useful data samples from an...
research
07/22/2020

DEAL: Deep Evidential Active Learning for Image Classification

Convolutional Neural Networks (CNNs) have proven to be state-of-the-art ...
research
12/16/2021

ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification

Despite the great success of pre-trained language models (LMs) in many n...
research
04/08/2021

Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape

Deep learning (DL) relies on massive amounts of labeled data, and improv...
research
03/07/2019

Active Scene Learning

Sketch recognition allows natural and efficient interaction in pen-based...
research
02/24/2021

Active Learning to Classify Macromolecular Structures in situ for Less Supervision in Cryo-Electron Tomography

Motivation: Cryo-Electron Tomography (cryo-ET) is a 3D bioimaging tool t...
research
06/10/2019

Human-Machine Collaboration for Fast Land Cover Mapping

We propose incorporating human labelers in a model fine-tuning system th...
