Towards a better labeling process for network security datasets

05/02/2023
by Sebastian Garcia, et al.

Most network security datasets lack comprehensive label assignment criteria, which hinders evaluating the datasets, training models, interpreting the results obtained, comparing against other methods, and evaluating in real-life scenarios. There is neither a labeling ontology nor tools to help assign labels, so most of the analyzed datasets assign labels through file or directory names. This paper addresses the need for a better labeling process by (i) reviewing the needs of the datasets' stakeholders, from creators to model users, (ii) presenting a new ontology of label assignment, (iii) presenting a new tool, based on the ontology, for assigning structured labels to Zeek network flows, and (iv) studying the differences between generating labels and consuming labels in real-life scenarios. We conclude that a process for structured label assignment is paramount for advancing research in network security and that the new ontology-based label assignment rules should be published as an artifact of every dataset.

research
05/24/2018

An experimental comparison of label selection methods for hierarchical document clusters

The focus of this paper is on the evaluation of sixteen labeling methods...
research
04/24/2019

Unsupervised Assignment Flow: Label Learning on Feature Manifolds by Spatially Regularized Geometric Assignment

This paper introduces the unsupervised assignment flow that couples the ...
research
08/17/2016

Scene Labeling Through Knowledge-Based Rules Employing Constrained Integer Linear Programing

Scene labeling task is to segment the image into meaningful regions and ...
research
12/10/2016

FOCA: A Methodology for Ontology Evaluation

Modeling an ontology is a hard and time-consuming task. Although methodo...
research
10/12/2021

Datasets are not Enough: Challenges in Labeling Network Traffic

In contrast to previous surveys, the present work is not focused on revi...
research
01/27/2020

An Ontology-Aware Framework for Audio Event Classification

Recent advancements in audio event classification often ignore the struc...
research
09/24/2022

TransPOS: Transformers for Consolidating Different POS Tagset Datasets

In hope of expanding training data, researchers often want to merge two ...
