A Big Data Architecture for Log Data Storage and Analysis

12/01/2018
by Swapneel Mehta, et al.

We propose an architecture for analysing database connection logs across different database instances within an intranet comprising over 10,000 users and associated devices. Our system uses Flume agents to send log events to a Hadoop Distributed File System for long-term storage, and to Elasticsearch and Kibana for short-term visualisation, effectively creating a data lake from which log data can be extracted. We adopt an ensemble of machine learning approaches to filter and process indicators within the data, and aim to predict anomalies or outliers using feature vectors built from this log data.
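The abstract gives no implementation details, but the routing it describes can be pictured concretely. Below is a minimal sketch of a Flume agent configuration that fans connection-log events out to both an HDFS sink (long-term storage) and an Elasticsearch sink (the index Kibana visualises). Every host, path, and name here is a placeholder assumption, not the authors' actual setup.

    # Hypothetical Flume agent: one source fanning out to two channel/sink pairs
    agent1.sources = connlog
    agent1.channels = hdfs_ch es_ch
    agent1.sinks = hdfs_sink es_sink

    # Tail-style source reading database connection logs (path is a placeholder)
    agent1.sources.connlog.type = exec
    agent1.sources.connlog.command = tail -F /var/log/db/connections.log
    agent1.sources.connlog.channels = hdfs_ch es_ch

    # In-memory channels buffering events for each sink
    agent1.channels.hdfs_ch.type = memory
    agent1.channels.hdfs_ch.capacity = 10000
    agent1.channels.es_ch.type = memory
    agent1.channels.es_ch.capacity = 10000

    # HDFS sink: long-term storage in the data lake, partitioned by date
    agent1.sinks.hdfs_sink.type = hdfs
    agent1.sinks.hdfs_sink.channel = hdfs_ch
    agent1.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/logs/db/%Y/%m/%d
    agent1.sinks.hdfs_sink.hdfs.fileType = DataStream
    agent1.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true

    # Elasticsearch sink: short-term index that Kibana visualises
    agent1.sinks.es_sink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
    agent1.sinks.es_sink.channel = es_ch
    agent1.sinks.es_sink.hostNames = es-node:9300
    agent1.sinks.es_sink.indexName = db_conn_logs

The paper likewise does not name the models in its ensemble. As a stand-in, the Python sketch below scores hypothetical log-derived feature vectors with scikit-learn's IsolationForest, one common choice for outlier detection; the feature layout and values are invented for illustration.

    # Illustrative only: the paper does not specify its models; IsolationForest
    # is used here as a generic outlier detector over log-derived features.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical feature vectors built from connection-log fields, e.g.
    # [connections_per_hour, failed_logins, distinct_hosts, bytes_transferred]
    X_train = np.array([
        [12, 0, 3, 1.2e6],
        [15, 1, 4, 0.9e6],
        [10, 0, 2, 1.1e6],
        [14, 0, 3, 1.0e6],
    ])

    model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
    model.fit(X_train)

    # Score new vectors: -1 flags a likely outlier, 1 a normal observation
    X_new = np.array([
        [11, 0, 3, 1.0e6],     # typical session
        [400, 35, 60, 9.8e7],  # burst of failed logins across many hosts
    ])
    print(model.predict(X_new))

In a full ensemble, several such detectors would be trained in parallel and their outputs combined, for example by majority vote or by averaging anomaly scores.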

