On the Effectiveness of Log Representation for Log-based Anomaly Detection

08/17/2023
by Xingfang Wu, et al.

Logs are an essential source of information for understanding the running status of a software system. As modern software architectures and maintenance practices evolve, more research effort has been devoted to automated log analysis, and machine learning (ML) in particular has been widely used for log analysis tasks. In ML-based log analysis, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of downstream models is unclear, which limits the ability of researchers and practitioners to choose the optimal log representation technique for their automated log analysis workflows. This work therefore investigates and compares the log representation techniques commonly adopted in previous log analysis research. In particular, we select six log representation techniques and evaluate them with seven ML models on four public log datasets (HDFS, BGL, Spirit, and Thunderbird) in the context of log-based anomaly detection. We also examine the impact of the log parsing process and of different feature aggregation approaches when they are combined with log representation techniques. From the experiments, we derive heuristic guidelines for researchers and developers to follow when designing an automated log analysis workflow. We believe this comprehensive comparison can help researchers and practitioners better understand the characteristics of different log representation techniques and guide them in selecting the most suitable ones for their ML-based log analysis workflows.
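To make the idea of turning textual logs into numerical feature vectors concrete, the following is a minimal sketch rather than the authors' method. It uses a simple token-count representation, which is not necessarily one of the six techniques evaluated in the paper, and aggregates per-line vectors into one sequence-level vector by summation, one possible aggregation approach. The sample log lines and the helper names (`tokenize`, `build_vocab`, `vectorize`, `aggregate`) are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a log line and mask digit runs so variable values share one token."""
    masked = re.sub(r"\d+", "<num>", line.lower())
    return re.findall(r"[a-z<>_]+", masked)


def build_vocab(lines):
    """Map every distinct token in the corpus to a fixed vector index."""
    tokens = sorted({tok for line in lines for tok in tokenize(line)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(line, vocab):
    """Represent one log line as a vector of token counts."""
    counts = Counter(tokenize(line))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # tokens unseen when building the vocabulary are ignored
            vector[vocab[tok]] = n
    return vector


def aggregate(vectors):
    """Sum line-level vectors element-wise into one sequence-level vector."""
    return [sum(column) for column in zip(*vectors)]


# Hypothetical log lines, made up for this example (not taken from any dataset).
logs = [
    "Receiving block blk_1 src: 10.0.0.1",
    "Receiving block blk_2 src: 10.0.0.2",
    "Deleting block blk_1",
]

vocab = build_vocab(logs)
vectors = [vectorize(line, vocab) for line in logs]
session_vector = aggregate(vectors)
```

The resulting `session_vector` could then be passed to any downstream classifier. Masking digits is a crude stand-in for log parsing, whose effect the paper examines separately.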

research
05/25/2023

Impact of Log Parsing on Log-based Anomaly Detection

Software systems log massive amounts of data, recording important runtim...
research
07/31/2023

An Empirical Study on Log-based Anomaly Detection Using Machine Learning

The growth of systems complexity increases the need of automated techniq...
research
07/13/2021

Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection

Logs have been an imperative resource to ensure the reliability and cont...
research
08/18/2023

AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection

The rapid progress of modern computing systems has led to a growing inte...
research
12/06/2021

UniLog: Deploy One Model and Specialize it for All Log Analysis Tasks

research
08/15/2023

LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis

Automated log analysis is crucial in modern software-intensive systems f...
research
04/22/2023

Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction

Due to the sheer size of software logs, developers rely on automated tec...
