TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

02/25/2021
by Dominik Scheinert, et al.

Deployment, operation and maintenance of large IT systems are becoming increasingly complex and put human experts under extreme stress when problems occur. Therefore, machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance, an approach summarized under the term AIOps. One specific direction aims at recognizing recurring anomaly types to enable remediation automation. However, due to properties specific to IT systems, especially their frequent changes (e.g. software updates, reconfiguration or hardware modernization), recognizing recurring anomaly types is challenging. Current methods mainly assume that the provided data has a static dimensionality. We propose a method that is invariant to changes in the dimensionality of the given data. Resource metric data such as CPU utilization, allocated memory and others are modelled as multivariate time series. The extraction of temporal and spatial features, together with the subsequent anomaly classification, is realized by TELESTO, our novel graph convolutional neural network (GCNN) architecture. The experimental evaluation is conducted in a real-world cloud testbed deployment that hosts two applications. Classification results of anomalies injected on a Cassandra database node show that TELESTO outperforms the alternative GCNNs and achieves an overall classification accuracy of 85.1%. Classification results for the other nodes show accuracy values between 85% and …
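
The dimensionality invariance described above can be illustrated with a minimal sketch, assuming a simplified setup that is not TELESTO itself: each resource metric is a graph node whose features are a raw window of its time series, all metrics are connected to each other (a hypothetical fully connected graph), and a single mean-aggregation graph convolution with shared weights is followed by mean pooling and a softmax classifier. Because the weights act per node and pooling averages over nodes, the same parameters accept three or five metrics and always return one probability per anomaly class. The function names, window length, hidden size and class count are hypothetical, and the temporal feature extraction of the actual architecture is omitted.

```python
import math
import random


def graph_conv(features, adjacency, weight):
    """One mean-aggregation graph convolution: H' = ReLU(D^-1 (A + I) H W).

    features:  list of n node feature vectors (one per metric), each of length f
    adjacency: n x n list of 0/1 entries (self-loops are added here)
    weight:    f x h matrix shared by all nodes, so it does not depend on n
    """
    n = len(features)
    f = len(features[0])
    h = len(weight[0])
    out = []
    for i in range(n):
        # Neighbours of node i, including itself (self-loop).
        neigh = [j for j in range(n) if adjacency[i][j] or i == j]
        # Average the neighbour features (row-normalized adjacency).
        agg = [sum(features[j][k] for j in neigh) / len(neigh) for k in range(f)]
        # Project with the shared weight matrix and apply ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(f)))
                    for c in range(h)])
    return out


def classify(features, adjacency, weight, classifier):
    """Graph conv -> mean pooling over nodes -> linear layer -> softmax."""
    hidden = graph_conv(features, adjacency, weight)
    n = len(hidden)
    h = len(hidden[0])
    # Mean pooling yields a fixed-size vector regardless of the number of nodes.
    pooled = [sum(node[c] for node in hidden) / n for c in range(h)]
    logits = [sum(pooled[c] * classifier[c][k] for c in range(h))
              for k in range(len(classifier[0]))]
    # Numerically stable softmax over anomaly classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(n):
    """Hypothetical graph: every metric is linked to every other metric."""
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    window, hidden_dim, num_classes = 4, 8, 3  # hypothetical sizes
    weight = [[random.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(window)]
    classifier = [[random.gauss(0, 0.5) for _ in range(num_classes)]
                  for _ in range(hidden_dim)]

    # Three metrics (e.g. CPU, memory, disk I/O), each a window of 4 samples.
    three = [[random.random() for _ in range(window)] for _ in range(3)]
    # After a reconfiguration, the node exposes five metrics instead.
    five = [[random.random() for _ in range(window)] for _ in range(5)]

    for metrics in (three, five):
        probs = classify(metrics, fully_connected(len(metrics)), weight, classifier)
        print(len(metrics), "metrics ->", [round(p, 3) for p in probs])
```

Running the script prints a probability vector of the same length for both metric sets. With a fully connected graph and mean aggregation every node receives the same aggregate, so a real model would rely on a more informative graph structure; the sketch only shows why shared per-node weights plus pooling decouple the parameter count from the number of metrics.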
