Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files

04/10/2023
by Igor Cherepanov, et al.

Many services today continuously produce large volumes of log files in different and varying formats. These logs are important because they record application activity, and analyzing that activity is necessary to improve the system and to maintain its security and stability. Log files are commonly stored in compressed form to reduce their size. A compression algorithm identifies frequent patterns in a log file in order to remove redundant information. This work presents an approach that detects frequent patterns in textual data and registers them during the file compression process itself, with low resource consumption. The log file can then be visualized, and the extracted patterns explored using metrics based on properties such as the frequency, length, and root prefixes of the acquired patterns. This allows an analyst to gain relevant insights more efficiently and reduces the need for labor-intensive manual inspection of the log data. Because it extends a dictionary-based compression algorithm, the approach recognizes patterns in log files of any format and eliminates the need for manual preparation or preprocessing of the log files.
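
As a rough illustration of how pattern statistics might be registered while a dictionary-based compressor runs, the sketch below uses textbook LZW. The abstract does not say which dictionary-based algorithm the authors extend, so the choice of LZW, the function names `lzw_with_patterns` and `top_patterns`, and the ranking by frequency and then length are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter


def lzw_with_patterns(text):
    """LZW-compress `text` and count how often each dictionary phrase is emitted.

    Hypothetical helper: a plain LZW encoder with one extra Counter update
    per emitted code, so pattern statistics come out of the same pass.
    """
    # Seed the dictionary with the single characters that occur in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []          # compressed output as a list of dictionary indices
    counts = Counter()  # phrase -> number of times it was emitted
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the current match while it is a known phrase.
            phrase = candidate
        else:
            # Emit the longest known phrase and register it as a pattern.
            codes.append(dictionary[phrase])
            counts[phrase] += 1
            # Learn the new, longer phrase for later matches.
            dictionary[candidate] = len(dictionary)
            phrase = ch
    if phrase:
        # Flush the final pending phrase.
        codes.append(dictionary[phrase])
        counts[phrase] += 1
    return codes, counts


def top_patterns(counts, min_length=2, limit=5):
    """Rank patterns by frequency, then by length (an assumed metric)."""
    multi = [(p, n) for p, n in counts.items() if len(p) >= min_length]
    return sorted(multi, key=lambda item: (-item[1], -len(item[0]), item[0]))[:limit]
```

Calling `lzw_with_patterns` on the contents of a log file returns both the code sequence and the phrase counts from a single pass, and `top_patterns` then lists the multi-character phrases that were emitted most often. A prefix-based metric could be derived from the same dictionary, since in LZW every learned phrase extends a previously learned one by a single character.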

