Towards Reproducible Network Traffic Analysis

03/23/2022
by   Jordan Holland, et al.

Analysis techniques are critical for gaining insight into network traffic given both the higher proportion of encrypted traffic and increasing data rates. Unfortunately, the domain of network traffic analysis suffers from a lack of standardization, leading to incomparable results and barriers to reproducibility. Unlike other disciplines, no standard dataset format exists, forcing researchers and practitioners to create bespoke analysis pipelines for each individual task. Without standardization, researchers cannot compare "apples-to-apples", which prevents us from knowing with certainty whether a new technique represents a methodological advancement or simply benefits from a different interpretation of a given dataset. In this work, we examine irreproducibility that arises from the lack of standardization in network traffic analysis. First, we study the literature, highlighting evidence of irreproducible research based on different interpretations of popular public datasets. Second, we investigate the underlying issues that have led to the status quo and prevent reproducible research. Third, we outline the standardization requirements that any solution aiming to fix reproducibility issues must address. We then introduce pcapML, an open source system that increases the reproducibility of network traffic analysis research by enabling metadata to be encoded directly into raw traffic captures in a generic manner. Finally, we use the standardization pcapML provides to create the pcapML benchmarks, an open source leaderboard website and repository built to track the progress of network traffic analysis methods.

research
05/31/2022

Computational Reproducibility Within Prognostics and Health Management

Scientific research frequently involves the use of computational tools a...
research
05/26/2023

Cluster Analysis of Open Research Data and a Case for Replication Metadata

Research data are often released upon journal publication to enable resu...
research
12/15/2017

Network Intell: Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network Traffic

In criminal investigations, telecommunication wiretaps have become a com...
research
12/27/2018

We all do better when we work together?

This paper evaluates the impact of a RD signal on traffic crossing the m...
research
07/01/2019

An Open Source AutoML Benchmark

In recent years, an active field of research has developed around automa...
research
06/10/2020

Evaluating Graph Vulnerability and Robustness using TIGER

The study of network robustness is a critical tool in the characterizati...
research
10/28/2022

Measuring the Confidence of Traffic Forecasting Models: Techniques, Experimental Comparison and Guidelines towards Their Actionability

The estimation of the amount of uncertainty featured by predictive machi...
