A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques

09/06/2023
by Max Landauer, et al.

Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes in sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper, we therefore analyze six publicly available log data sets with a focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
