
Fault Detection Engine in Intelligent Predictive Analytics Platform for DCIM

With advances in large-scale data generation and data handling, machine learning and probabilistic modelling create an immense opportunity to deploy predictive analytics platforms in security-critical industries such as data centers, electricity grids, utilities, and airports, where minimizing downtime is a primary objective. This paper proposes a novel, complete architecture of an intelligent predictive analytics platform, Fault Engine, for large networks of devices connected by electrical or information flow. The three modules proposed here integrate with the available data-handling technology stack and connect with middleware to produce online intelligent predictions in critical failure scenarios. The Markov Failure module predicts the severity of a failure along with the survival probability of a device at any given instant. The Root Cause Analysis model identifies devices that are probable root causes, using Bayesian probability assignment and topological sort. Finally, a community detection algorithm produces clusters of devices correlated in terms of failure probability, which further narrows the search space for the root cause. The whole engine has been tested on networks of different sizes in simulated failure environments and shows potential to scale in real-time implementations.
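
The abstract does not give the Markov Failure module's state space or transition probabilities, so the following is only a minimal sketch of how a survival probability could be read off a discrete-time Markov chain. The three states (healthy, degraded, failed), the transition matrix, and the function names are all assumptions made for illustration, not the paper's model.

```python
from typing import List

# Hypothetical 3-state device model (an assumption, not taken from the paper):
# 0 = healthy, 1 = degraded, 2 = failed (absorbing).
# The probabilities below are invented purely for illustration.
TRANSITIONS: List[List[float]] = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
FAILED = 2


def step(dist: List[float], matrix: List[List[float]]) -> List[float]:
    """Advance a state distribution by one time step (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def survival_probability(
    start_state: int,
    steps: int,
    matrix: List[List[float]] = TRANSITIONS,
    failed: int = FAILED,
) -> float:
    """Probability that the device has not reached the failed state after `steps` steps."""
    dist = [0.0] * len(matrix)
    dist[start_state] = 1.0
    for _ in range(steps):
        dist = step(dist, matrix)
    return 1.0 - dist[failed]


if __name__ == "__main__":
    for t in (0, 1, 5, 10):
        print(t, round(survival_probability(0, t), 4))
```

Because the failed state is absorbing in this sketch, the survival probability can only stay level or fall as the horizon grows, which matches the intuitive reading of "survival probability of a device at any given instant".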

