Log2NS: Enhancing Deep Learning Based Analysis of Logs With Formal to Prevent Survivorship Bias

05/29/2021
by Charanraj Thimmisetty, et al.

Analyzing large observational data sets generated by a reactive system is a common challenge in debugging system failures and determining their root cause. A major problem is that such observational data suffer from survivorship bias. Examples include traffic logs from networks and simulation logs from circuit design. In such applications, users want to detect non-spurious correlations in the observational data and obtain actionable insights from them. In this paper, we introduce Log to Neuro-Symbolic (Log2NS), a framework that combines probabilistic analysis from machine learning (ML) techniques on observational data with certainties derived from symbolic reasoning on an underlying formal model. We apply the proposed framework to network traffic debugging in the following steps. To detect patterns in network logs, we first generate global embedding vector representations of entities such as IP addresses, ports, and applications. Next, we represent large numbers of log flow entries as clusters, which makes it easier for the user to visualize and detect interesting scenarios for further analysis. To generalize these patterns, Log2NS provides the ability to query static logs and correlation engines for positive instances, and to use formal reasoning for negative and unseen instances. By combining the strengths of deep learning and symbolic methods, Log2NS provides a powerful reasoning and debugging tool for log-based data. Empirical evaluations on a real internal data set demonstrate the capabilities of Log2NS.
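
To make the split between positive and unseen instances concrete, here is a minimal Python sketch. It is not the Log2NS implementation. It assumes a hypothetical first-match firewall rule list as the "formal model" and a hypothetical log that records only flows that got through, which is where survivorship bias comes from. Plain rule evaluation stands in for the symbolic reasoning described in the abstract, and all names (`Rule`, `RULES`, `LOG`, `model_verdict`, `query`) are illustrative assumptions.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    src: str       # source CIDR block
    dst_port: int  # destination port
    action: str    # "allow" or "deny"


# Hypothetical formal model: ordered rules, first match wins, default deny.
RULES = [
    Rule("10.0.0.0/24", 22, "deny"),
    Rule("10.0.0.0/16", 22, "allow"),
    Rule("0.0.0.0/0", 443, "allow"),
]

# Hypothetical observed log. Only flows that were allowed leave an entry,
# so denied flows never appear here (survivorship bias).
LOG = {
    ("10.0.1.5", 22),
    ("192.168.1.9", 443),
}


def model_verdict(src: str, port: int) -> str:
    """Evaluate the rule list for a flow, whether or not it was ever logged."""
    for rule in RULES:
        if port == rule.dst_port and ip_address(src) in ip_network(rule.src):
            return rule.action
    return "deny"


def query(src: str, port: int) -> str:
    """Answer from the log for observed flows, from the model otherwise."""
    if (src, port) in LOG:
        return "observed"
    return f"unseen, model says {model_verdict(src, port)}"


if __name__ == "__main__":
    for flow in [("10.0.1.5", 22), ("10.0.0.7", 22), ("8.8.8.8", 443)]:
        print(flow, "->", query(*flow))
```

A log-only analysis could say nothing about `("10.0.0.7", 22)` because that flow never appears in the data. The model-side check still returns a definite answer for it, which is the kind of complement to log-based analysis the abstract describes.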
