
Accelerating System Log Processing by Semi-supervised Learning: A Technical Report

by Guofu Li, et al.

There is an increasing need for automated tools that can analyze the system logs of large-scale online systems in a timely manner. However, the conventional way of monitoring and classifying log output, based on keyword lists, does not scale well to complex systems whose code is contributed by a large group of developers, who encode error messages in diverse ways and often attach misleading pre-set labels. In this paper, we propose that the design of a large-scale online log analysis system should follow the "Least Prior Knowledge Principle", under which unsupervised or semi-supervised solutions that directly encode only minimal prior knowledge of the logs are preferred. Accordingly, we report our experience in designing a two-stage machine-learning-based method, in which the system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then passed through a fine-grained classification procedure. Tests on empirical data show that our method has a clear advantage in processing speed and classification accuracy.
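The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram model with add-one smoothing, the whitespace tokenizer, the names `BigramModel`, `prefilter`, and `two_stage`, and the threshold value are all assumptions made for illustration. The abstract also does not say which side of the threshold is kept; the sketch assumes that lines with high perplexity under a model trained on routine logs are the ones passed on to the second stage.

```python
import math
from collections import Counter


def tokenize(line):
    # Assumption: whitespace tokens with sentence boundary markers.
    return ["<s>"] + line.lower().split() + ["</s>"]


class BigramModel:
    """Bigram language model with add-one smoothing (illustrative stand-in)."""

    def __init__(self, lines):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = set()
        for line in lines:
            tokens = tokenize(line)
            vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        # One extra slot so unseen tokens keep non-zero probability.
        self.vocab_size = len(vocab) + 1

    def perplexity(self, line):
        tokens = tokenize(line)
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigram_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.vocab_size
            )
            log_prob += math.log(p)
        # Per-bigram perplexity: exp of the negative mean log-probability.
        return math.exp(-log_prob / (len(tokens) - 1))


def prefilter(model, lines, threshold):
    # Stage 1: keep only lines the model finds surprising (assumed direction).
    return [line for line in lines if model.perplexity(line) > threshold]


def two_stage(model, lines, threshold, classifier):
    # Stage 2: run a (hypothetical) fine-grained classifier on the survivors.
    return [(line, classifier(line)) for line in prefilter(model, lines, threshold)]


routine = [
    "connection established to db",
    "connection established to cache",
    "request served in 12 ms",
]
model = BigramModel(routine)
incoming = routine + ["fatal kernel panic stack overflow"]

# Placeholder classifier; a real second stage would be a trained model.
labelled = two_stage(model, incoming, threshold=10.0, classifier=lambda s: "needs-review")
```

The point of the pre-filter in this sketch is cost: the perplexity score needs only count lookups per token, so the more expensive classifier runs on a small fraction of the stream.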


