Accelerating System Log Processing by Semi-supervised Learning: A Technical Report

10/29/2018
by Guofu Li, et al.

There is an increasing need for automated tools that analyze the system logs of large-scale online systems in a timely manner. However, the conventional approach of monitoring and classifying log output with a keyword list does not scale well to complex systems whose code is contributed by a large group of developers, who encode error messages in diverse ways and often attach misleading pre-set labels. In this paper, we propose that the design of a large-scale online log analysis system should follow the "Least Prior Knowledge Principle", under which an unsupervised or semi-supervised solution is used and only minimal prior knowledge of the log is encoded directly. We report our experience in designing a two-stage, machine-learning-based method in which system logs are regarded as the output of a quasi-natural language: they are first pre-filtered by a perplexity score threshold and then undergo a fine-grained classification procedure. Tests on empirical data show that our method has a clear advantage in both processing speed and classification accuracy.
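
The pre-filtering stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the language model, the tokenization, or which side of the threshold is kept. The sketch assumes a bigram model with add-one smoothing, whitespace tokenization, and that lines whose perplexity exceeds the threshold are the ones passed on to the second stage. All names (`train_bigram`, `perplexity`, `prefilter`) and the sample log lines are hypothetical.

```python
import math
from collections import Counter


def train_bigram(lines):
    """Count bigrams and their left-context tokens over whitespace-tokenized lines."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def perplexity(line, model):
    """Per-token perplexity of one line under an add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + line.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams at a small non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def prefilter(lines, model, threshold):
    """Assumption: keep only lines the model finds surprising (high perplexity)."""
    return [line for line in lines if perplexity(line, model) > threshold]


# Hypothetical routine log lines used to fit the model.
routine = [
    "connection established to host",
    "connection closed by host",
    "connection established to host",
]
model = train_bigram(routine)

# Hypothetical incoming lines; the threshold of 5.0 is an arbitrary choice.
incoming = routine[:2] + ["fatal kernel panic detected"]
suspicious = prefilter(incoming, model, threshold=5.0)
```

Under these assumptions, only the lines that survive `prefilter` would be handed to the fine-grained classifier, which is what lets the cheap first stage reduce the load on the more expensive second stage.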

research
10/30/2021

HIERMATCH: Leveraging Label Hierarchies for Improving Semi-Supervised Learning

Semi-supervised learning approaches have emerged as an active area of re...
research
05/02/2019

Billion-scale semi-supervised learning for image classification

This paper presents a study of semi-supervised learning with large convo...
research
08/17/2023

Are They All Good? Studying Practitioners' Expectations on the Readability of Log Messages

Developers write logging statements to generate logs that provide run-ti...
research
02/14/2022

UniParser: A Unified Log Parser for Heterogeneous Log Data

Logs provide first-hand information for engineers to diagnose failures i...
research
05/16/2014

Classification using log Gaussian Cox processes

McCullagh and Yang (2006) suggest a family of classification algorithms ...
research
08/03/2016

Fuzzy-based Propagation of Prior Knowledge to Improve Large-Scale Image Analysis Pipelines

Many automatically analyzable scientific questions are well-posed and of...
