Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression

09/24/2019
by   Jinyang Liu, et al.

System logs record detailed runtime information of software systems and are the main data source for many software engineering tasks. As modern software systems grow in scale and complexity, logs have become a fast-growing type of big data in industry. In practice, such logs often need to be stored for a long time (e.g., a year) in order to analyze recurrent problems or track security issues. However, archiving logs consumes a large amount of storage space and computing resources, which in turn incurs high operational cost. Data compression is therefore essential to reduce the cost of log storage. Traditional compression tools (e.g., gzip) work well for general text but are not tailored for system logs. In this paper, we propose a novel and effective log compression method, namely logzip. Logzip extracts hidden structures from raw logs via fast iterative clustering and then generates coherent intermediate representations that allow for more effective compression. We evaluate logzip on five large log datasets of different system types, with a total size of 63.6 GB. The results show that logzip saves about half of the storage space on average compared with traditional compression tools. Meanwhile, the design of logzip is highly parallel and incurs only negligible overhead. In addition, we share our industrial experience of applying logzip to Huawei's real products.
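
To make the idea of a structured intermediate representation more concrete, the following is a minimal sketch and not logzip's actual algorithm. It replaces the iterative clustering described above with a much cruder heuristic (any token containing a digit is treated as a parameter), stores each resulting template once, and hands the result to gzip. The names `split_line`, `encode`, and `decode`, as well as the JSON layout, are assumptions made for illustration only.

```python
import gzip
import json
import re

WILDCARD = "<*>"  # placeholder for a variable token (illustrative choice)


def split_line(line):
    """Split one log line into a template and its parameter values.

    Assumption: a token is a parameter if it contains a digit. This is a
    stand-in heuristic, not the iterative clustering used by logzip.
    """
    template, params = [], []
    for token in line.split(" "):
        # A literal "<*>" token is also stored as a parameter so that
        # decoding stays lossless.
        if token == WILDCARD or re.search(r"\d", token):
            template.append(WILDCARD)
            params.append(token)
        else:
            template.append(token)
    return " ".join(template), params


def encode(lines):
    """Build the intermediate representation and compress it with gzip."""
    templates = {}  # template text -> event id, in first-seen order
    event_ids, params = [], []
    for line in lines:
        template, values = split_line(line)
        event_ids.append(templates.setdefault(template, len(templates)))
        params.append(values)
    payload = {
        "templates": list(templates),  # each template is stored only once
        "event_ids": event_ids,        # one small integer per log line
        "params": params,              # variable tokens per log line
    }
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def decode(blob):
    """Invert encode(): rebuild the original log lines."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    lines = []
    for event_id, values in zip(payload["event_ids"], payload["params"]):
        remaining = iter(values)
        tokens = [
            next(remaining) if token == WILDCARD else token
            for token in payload["templates"][event_id].split(" ")
        ]
        lines.append(" ".join(tokens))
    return lines
```

Calling `decode(encode(lines))` on a list of newline-free log lines should return the original list. Whether the encoded form is smaller than gzipping the raw text depends on the data; on very small inputs the JSON overhead can easily outweigh any gain, so this sketch says nothing about the compression ratios reported in the abstract.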

research
09/15/2020

A Survey on Automated Log Analysis for Reliability Engineering

Logs are semi-structured text generated by logging statements in softwar...
research
08/18/2021

What Distributed Systems Say: A Study of Seven Spark Application Logs

Execution logs are a crucial medium as they record runtime information o...
research
03/11/2021

Linnaeus: A highly reusable and adaptable ML based log classification pipeline

Logs are a common way to record detailed run-time information in softwar...
research
09/18/2023

LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data

Log data is a crucial resource for recording system events and states du...
research
08/10/2023

Accountability of Things: Large-Scale Tamper-Evident Logging for Smart Devices

Our modern world relies on a growing number of interconnected and intera...
research
09/02/2018

Query Log Compression for Workload Analytics

Analyzing database access logs is a key part of performance tuning, intr...
research
12/01/2018

A Big Data Architecture for Log Data Storage and Analysis

We propose an architecture for analysing database connection logs across...
