Query Log Compression for Workload Analytics

09/02/2018
by Ting Xie, et al.

Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, production databases commonly handle millions of queries or more each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off conciseness against information content. For example, simple workload sampling may miss rare but high-impact queries. In this paper, we present LogR, a lossy log compression scheme suitable for use by many automated log analytics tools, as well as for human inspection. We formalize and analyze the space/fidelity trade-off in the context of a broader family of "pattern" and "pattern mixture" log encodings to which LogR belongs. We show through a series of experiments that LogR compressed encodings can be created efficiently, come with provable information-theoretic bounds on their accuracy, and outperform state-of-the-art log summarization strategies.
