TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

by Dominik Scheinert, et al.

Deploying, operating, and maintaining large IT systems is becoming increasingly complex and puts human experts under extreme stress when problems occur. Machine learning (ML) and artificial intelligence (AI) are therefore applied to IT system operation and maintenance, an approach summarized by the term AIOps. One specific direction aims to recognize recurring anomaly types in order to enable automated remediation. However, properties specific to IT systems, especially their frequent changes (e.g. software updates, reconfiguration, or hardware modernization), make the recognition of recurring anomaly types challenging. Current methods mainly assume that the provided data has a static dimensionality. We propose a method that is invariant to changes in the dimensionality of the given data. Resource metric data such as CPU utilization, allocated memory, and others are modelled as multivariate time series. Temporal and spatial features are extracted, and anomalies subsequently classified, by TELESTO, our novel graph convolutional neural network (GCNN) architecture. The experimental evaluation is conducted in a real-world cloud testbed deployment that hosts two applications. Classification results for injected anomalies on a Cassandra database node show that TELESTO outperforms the alternative GCNNs and achieves an overall classification accuracy of 85.1%. Classification results for the other nodes show accuracy values between 85% ...
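To illustrate why a graph-based model can be invariant to the number of monitored metrics, the following is a minimal sketch and not the TELESTO architecture itself. It assumes, purely for illustration, that each metric's time window is one node of a fully connected graph, applies a single generic graph convolution, and mean-pools over nodes so that the classifier input has a fixed size. The function names, layer sizes, and random weights are all hypothetical.

```python
import numpy as np

def normalized_adjacency(num_nodes):
    # Assumption: fully connected graph with self-loops, using the
    # symmetric normalization D^-1/2 A D^-1/2.
    a = np.ones((num_nodes, num_nodes))
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def classify_window(window, w_feat, w_out):
    """Classify one window of shape (num_metrics, window_len).

    The weight shapes depend only on window_len, never on num_metrics.
    """
    a_hat = normalized_adjacency(window.shape[0])
    # One graph convolution followed by ReLU: (num_metrics, hidden)
    h = np.maximum(a_hat @ window @ w_feat, 0.0)
    # Mean pooling over nodes yields a fixed-size vector: (hidden,)
    pooled = h.mean(axis=0)
    logits = pooled @ w_out
    return int(np.argmax(logits)), logits

# Hypothetical sizes and untrained random weights, for demonstration only.
rng = np.random.default_rng(0)
window_len, hidden, num_classes = 20, 8, 4
w_feat = rng.normal(size=(window_len, hidden))
w_out = rng.normal(size=(hidden, num_classes))

# The same weights handle 5 metrics and 12 metrics without any change.
for num_metrics in (5, 12):
    window = rng.normal(size=(num_metrics, window_len))
    label, logits = classify_window(window, w_feat, w_out)
    print(num_metrics, logits.shape, label)
```

Because the learned weights act on per-node features and the pooling step aggregates over however many nodes exist, adding or removing a metric changes only the graph size, not the model parameters.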





Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Operation and maintenance of large distributed cloud applications can qu...

CloudDet: Interactive Visual Analysis of Anomalous Performances in Cloud Computing Systems

Detecting and analyzing potential anomalous performances in cloud comput...

MTV: Visual Analytics for Detecting, Investigating, and Annotating Anomalies in Multivariate Time Series

Detecting anomalies in time-varying multivariate data is crucial in vari...

The MIT Supercloud Dataset

Artificial intelligence (AI) and Machine learning (ML) workloads are an ...

Machine Learning Framework for Performance Anomaly in OpenMP Multi-Threaded Systems

Some OpenMP multi-threaded applications increasingly suffer from perform...

Towards Robust and Transferable IIoT Sensor based Anomaly Classification using Artificial Intelligence

The increasing deployment of low-cost industrial IoT (IIoT) sensor platf...

Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction

The rapid growth and distribution of IT systems increases their complexi...