Tracking System Behaviour from Resource Usage Data

05/30/2017
by Niyazi Sorkunlu, et al.

Resource usage data, collected using tools such as TACC Stats, capture the resource utilization of the nodes within a high performance computing system. We present methods to analyze these data to understand system performance and identify performance anomalies. The core idea is to model the data as a three-way tensor whose modes correspond to compute nodes, usage metrics, and time. Using the reconstruction error between the original tensor and the tensor reconstructed from a low-rank tensor decomposition as a scalar performance metric enables us to monitor the performance of the system in an online fashion. This error statistic is then used for anomaly detection, which relies on the assumption that the normal (routine) behavior of the system can be captured by a low-rank approximation of the original tensor. We evaluate the performance of the algorithm using information gathered from system logs and show that the performance anomalies identified by the proposed method correlate with critical errors reported in the system logs. Results are shown for data collected in 2013 from the Lonestar4 system at the Texas Advanced Computing Center (TACC).
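To make the pipeline concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The abstract does not name the decomposition, window length, rank, or threshold rule, so this sketch assumes: each time window is a tensor of shape (nodes, metrics, time steps); a rank-R CP (CANDECOMP/PARAFAC) decomposition is fitted by alternating least squares; the scalar metric is the relative Frobenius error ||X - X_hat||_F / ||X||_F; and a window is flagged when its error exceeds the mean plus k standard deviations of the earlier windows. The function names (`cp_als`, `relative_error`, `flag_anomalies`, `demo`), the rank, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): CP decomposition by alternating
# least squares (ALS), relative reconstruction error, and a mean + k*std rule.

def unfold(x, mode):
    # Mode-n matricization: rows indexed by `mode`, columns by the other modes (C order).
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def khatri_rao(a, b):
    # Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R).
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def cp_als(x, rank, n_iter=200, seed=0):
    # Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] for a 3-way tensor.
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            p, q = (factors[m] for m in range(3) if m != mode)
            gram = (p.T @ p) * (q.T @ q)  # equals khatri_rao(p, q).T @ khatri_rao(p, q)
            factors[mode] = unfold(x, mode) @ khatri_rao(p, q) @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def relative_error(x, factors):
    # Scalar performance metric (assumed form): ||X - X_hat||_F / ||X||_F.
    return float(np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x))

def flag_anomalies(errors, warmup=5, k=3.0):
    # Flag a window whose error exceeds mean + k*std of all earlier windows.
    flags = []
    for t, e in enumerate(errors):
        history = np.asarray(errors[:t])
        flags.append(t >= warmup and bool(e > history.mean() + k * history.std()))
    return flags

def demo(seed=42, n_windows=10, anomalous_window=8):
    # Synthetic windows (nodes x metrics x time steps): rank-2 "routine" behaviour
    # plus small noise; in one window, node 3 reports erratic values.
    rng = np.random.default_rng(seed)
    n_nodes, n_metrics, n_steps = 16, 6, 12
    errors = []
    for t in range(n_windows):
        a, b, c = (rng.standard_normal((d, 2)) for d in (n_nodes, n_metrics, n_steps))
        x = reconstruct([a, b, c]) + 0.01 * rng.standard_normal((n_nodes, n_metrics, n_steps))
        if t == anomalous_window:
            x[3] += rng.standard_normal((n_metrics, n_steps))
        errors.append(relative_error(x, cp_als(x, rank=2)))
    return errors, flag_anomalies(errors)

errors, flags = demo()
print([round(e, 3) for e in errors])
print(flags)
```

In this toy setup the routine windows are reconstructed almost exactly by the rank-2 model, while the window with the misbehaving node cannot be, so its error stands out. Each window is fitted independently here for simplicity; a real online monitor might instead update the factors incrementally and would need a rank and threshold tuned to the actual TACC Stats metrics.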
