Defining a Metric Space of Host Logs and Operational Use Cases

11/01/2018
by Miki E. Verma, et al.

Host logs, in particular Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, using such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparation. In particular, basic mathematical and algorithmic developments that provide a generalized, meaningful similarity metric on system logs are needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use cases on real data: (1) comparing known infected host log streams to new ones for attack detection and forensics, (2) collapsing similar streams of logs into semantically meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, and (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and makes it possible to apply a wide variety of data science analytics to these logs without data-specific preparation for each.
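
To make the idea of a metric product space concrete, the following is a minimal sketch and not the paper's actual definition. It assumes a simplified log record with three hypothetical fields (`event_id`, `timestamp`, `tokens`) and combines one metric per field into a weighted sum. The field choice, the per-field metrics, and the weights are all illustrative assumptions. The construction relies on the standard fact that a sum of metrics on the components, with positive weights, is itself a metric on the product.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical, simplified log record (not the paper's schema)."""
    event_id: int             # categorical field
    timestamp: float          # seconds since some reference point
    tokens: FrozenSet[str]    # bag of words from the message, as a set


def discrete(a: int, b: int) -> float:
    """Discrete metric: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance between two sets; defined as 0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def product_distance(x: LogEvent, y: LogEvent,
                     w_id: float = 1.0,
                     w_time: float = 1.0,
                     w_tokens: float = 1.0) -> float:
    """Weighted sum of per-field metrics.

    With strictly positive weights this is a metric on the product of the
    three field spaces, because each summand is a metric on its own field.
    The weights are illustrative assumptions.
    """
    return (w_id * discrete(x.event_id, y.event_id)
            + w_time * abs(x.timestamp - y.timestamp)
            + w_tokens * jaccard_distance(x.tokens, y.tokens))


# Made-up example records for illustration only.
a = LogEvent(4624, 0.0, frozenset({"alice", "logon"}))
b = LogEvent(4624, 2.0, frozenset({"bob", "logon"}))
c = LogEvent(4634, 5.0, frozenset({"bob", "logoff"}))
```

Because the result satisfies the metric axioms, any distance-based method (nearest-neighbor search, hierarchical clustering, and so on) can take `product_distance` as its dissimilarity function without further preparation, which is the kind of reuse the abstract describes.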


research
04/25/2022

Topological Data Analysis for Anomaly Detection in Host-Based Logs

Topological Data Analysis (TDA) gives practitioners the ability to analyse...
research
11/22/2020

Time series classification for predictive maintenance on event logs

Time series classification (TSC) gained a lot of attention in the past d...
research
12/05/2022

AMORETTO: A Method for Deriving IoT-enriched Event Logs

Process analytics aims to gain insights into the behaviour and performan...
research
10/15/2019

Automated Ransomware Behavior Analysis: Pattern Extraction and Early Detection

Security operation centers (SOCs) typically use a variety of tools to co...
research
07/17/2018

User Manual for the Apple CoreCapture Framework

CoreCapture is Apple's primary logging and tracing framework for IEEE 80...
research
10/11/2022

Client Error Clustering Approaches in Content Delivery Networks (CDN)

Content delivery networks (CDNs) are the backbone of the Internet and ar...
research
09/02/2018

Query Log Compression for Workload Analytics

Analyzing database access logs is a key part of performance tuning, intr...
