Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

02/23/2021
by Harold Ott, et al.

Anomalies or failures in large computer systems, such as the cloud, affect large numbers of users who communicate, compute, and store information. Timely and accurate anomaly detection is therefore necessary for the reliability, security, and safe operation of these increasingly important systems, and for mitigating losses when they fail. The recent evolution of the software industry has raised several problems that need to be tackled, including (1) software evolution caused by software upgrades, and (2) the cold-start problem, where data from the system of interest is not available. In this paper, we propose a framework for anomaly detection in log data, a major source of system information for troubleshooting. To that end, we use pre-trained general-purpose language models to preserve the semantics of log messages and map them into log vector embeddings. The key idea is that these log representations are robust to, and largely invariant under, changes in the logs, and therefore lead to better generalization of the anomaly detection models. We perform several experiments on a cloud dataset, evaluating different language models, such as BERT, GPT-2, and Transformer-XL, for obtaining numerical log representations. Robustness is evaluated by gradually altering log messages to simulate changes in semantics. Our results show that the proposed approach achieves high performance and robustness, which opens up possibilities for future research in this direction.
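The evaluation idea described above (embed log messages, score them against known-normal behaviour, then gradually alter messages and watch how the scores move) can be sketched in a few lines. This is a minimal, hypothetical harness, not the authors' code. `embed` is a placeholder bag-of-words vector standing in for a pre-trained encoder such as BERT, GPT-2, or Transformer-XL. The nearest-neighbour cosine score is a swapped-in detector rather than the paper's model. `alter_message`, which replaces a fraction of tokens with unseen words, is an assumed perturbation scheme. The log lines are made up for illustration.

```python
import math
import random
from collections import Counter


def embed(message: str) -> Counter:
    """Hypothetical placeholder for a pre-trained language-model encoder.

    Returns a sparse bag-of-words vector over lower-cased whitespace tokens.
    A real pipeline would return a BERT, GPT-2, or Transformer-XL embedding.
    """
    return Counter(message.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def anomaly_score(message: str, normal_vectors: list[Counter]) -> float:
    """1 - similarity to the closest known-normal log (assumed nearest-neighbour detector)."""
    vec = embed(message)
    return 1.0 - max(cosine(vec, ref) for ref in normal_vectors)


def alter_message(message: str, fraction: float, seed: int = 0) -> str:
    """Simulate log evolution by replacing a fraction of tokens with unseen words."""
    tokens = message.split()
    n_changed = min(len(tokens), round(fraction * len(tokens)))
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    for i in rng.sample(range(len(tokens)), n_changed):
        tokens[i] = f"changed{i}"  # hypothetical replacement token
    return " ".join(tokens)


# Made-up "normal" logs and their reference embeddings.
normal_logs = [
    "Instance started successfully on host compute-01",
    "Volume attached to instance on host compute-02",
    "Network interface configured for instance",
]
normal_vectors = [embed(m) for m in normal_logs]

# Gradually alter a known-normal message and report its anomaly score.
for fraction in (0.0, 0.2, 0.4, 0.6):
    altered = alter_message(normal_logs[0], fraction)
    print(f"{fraction:.1f}  {altered!r}  score={anomaly_score(altered, normal_vectors):.3f}")

# A genuinely different message for comparison.
print(f"unrelated  score={anomaly_score('Kernel panic: unable to mount root fs', normal_vectors):.3f}")
```

Because the bag-of-words placeholder has no notion of meaning, its scores grow purely with the number of replaced tokens. The hypothesis in the abstract is that plugging a pre-trained language model into `embed` would keep semantically equivalent messages close to their normal counterparts even after such edits.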

Related research

Log-based Anomaly Detection Without Log Parsing (08/04/2021)
Software systems often record important runtime information in system lo...

Log-based Anomaly Detection with Deep Learning: How Far Are We? (02/09/2022)
Software-intensive systems produce logs for troubleshooting purposes. Re...

Scalable and Adaptive Log-based Anomaly Detection with Expert in the Loop (06/08/2023)
System logs play a critical role in maintaining the reliability of softw...

Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs (08/21/2020)
The detection of anomalies is essential mining task for the security and...

LAnoBERT : System Log Anomaly Detection based on BERT Masked Language Model (11/18/2021)
The system log generated in a computer system refers to large-scale data...

Anomaly Detection in High Performance Computers: A Vicinity Perspective (06/11/2019)
In response to the demand for higher computational power, the number of ...

Log Message Anomaly Detection and Classification Using Auto-B/LSTM and Auto-GRU (11/20/2019)
Log messages are now widely used in software systems. They are important...
