Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

08/14/2020
by Shilin He, et al.

Logs are widely used in software system development and maintenance because of the rich runtime information they contain. In recent years, growth in software size and complexity has led to a rapid increase in log volume. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have been successfully deployed in industry, owing to the lack of public log datasets and of the necessary benchmarks built on them. To fill this significant gap between academia and industry, and to facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates research and practice in this field. At the time of writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.
