Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems

05/31/2018
by Chao Liu, et al.

Performance monitoring, anomaly detection, and root-cause analysis in complex cyber-physical systems (CPSs) are often intractable due to widely diverse operational modes, disparate data types, and complex fault propagation mechanisms. This paper presents a new data-driven framework for root-cause analysis, based on a spatiotemporal graphical modeling approach built on the concept of symbolic dynamics for discovering and representing causal interactions among sub-systems of complex CPSs. We formulate root-cause analysis as a minimization problem via the proposed inference-based metric and present two approximate approaches for solving it: sequential state switching (S^3, based on the free-energy concept of a restricted Boltzmann machine, RBM) and artificial anomaly association (A^3, a classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches. A real dataset based on the Tennessee Eastman process (TEP) is also used for comparison with other approaches. The results show that: (1) the S^3 and A^3 approaches achieve high accuracy in root-cause analysis under both pattern-based and node-based fault scenarios, and successfully handle multiple nominal operating modes; (2) the proposed tool-chain is scalable while maintaining high accuracy; and (3) the proposed framework is robust and adaptive under different fault conditions and performs better than the state-of-the-art methods.
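
The free-energy idea behind S^3 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary RBM whose visible units stand for the states of patterns in the graphical model, and it uses the standard RBM free energy F(v) = -b·v - Σ_j log(1 + exp(c_j + Σ_i v_i W_ij)). The function names, the parameters W, b and c, and the flip-one-unit-at-a-time ranking rule are all hypothetical choices made for illustration.

```python
import math


def free_energy(v, W, b, c):
    """Free energy of a binary RBM for visible vector v.

    W[i][j] couples visible unit i to hidden unit j; b and c are the
    visible and hidden biases. All parameters here are illustrative.
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = 0.0
    for j, c_j in enumerate(c):
        x = c_j + sum(v[i] * W[i][j] for i in range(len(v)))
        # Numerically stable softplus: log(1 + exp(x)).
        hidden_term += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return -visible_term - hidden_term


def rank_by_state_switching(v, W, b, c):
    """Flip each visible unit in turn and rank units by free-energy drop.

    Assumption: a unit whose flip lowers the free energy the most is the
    strongest root-cause candidate under a model trained on nominal data.
    Returns (drop, unit_index) pairs, largest drop first.
    """
    base = free_energy(v, W, b, c)
    drops = []
    for i in range(len(v)):
        flipped = list(v)
        flipped[i] = 1 - flipped[i]  # switch the state of unit i only
        drops.append((base - free_energy(flipped, W, b, c), i))
    return sorted(drops, reverse=True)
```

In this sketch, an observation that an RBM trained on nominal data finds unlikely has high free energy, so the units whose switching brings the energy down the most are flagged first. Whether units are switched one at a time or cumulatively, and how candidates are thresholded, are design choices the abstract does not specify.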

research
01/04/2020

Root Cause Detection Among Anomalous Time Series Using Temporal State Alignment

The recent increase in the scale and complexity of software systems has ...
research
04/22/2022

NLP Based Anomaly Detection for Categorical Time Series

Identifying anomalies in large multi-dimensional time series is a crucia...
research
04/08/2019

Plant-wide fault and disturbance screening using combined transfer entropy and eigenvector centrality analysis

Finding the source of a disturbance or fault in complex systems such as ...
research
05/17/2022

A Framework for Checkpointing and Recovery of Hierarchical Cyber-Physical Systems

This paper tackles the problem of making complex resource-constrained cy...
research
04/06/2023

Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems

This paper presents the Real-time Adaptive and Interpretable Detection (...
research
03/21/2022

Alarm-Based Root Cause Analysis in Industrial Processes Using Deep Learning

Alarm management systems have become indispensable in modern industry. A...
research
02/03/2023

Deep Reinforcement Learning for Online Error Detection in Cyber-Physical Systems

Reliability is one of the major design criteria in Cyber-Physical System...