DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

08/14/2017
by   You-Luen Lee, et al.

When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we study what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45
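
The F3-score mentioned in the abstract is the F-beta measure with beta = 3, which weights recall more heavily than precision. A minimal sketch of that arithmetic follows, assuming the standard F-beta definition; the function name `fbeta_from_counts` and the confusion counts are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """Compute F-beta from confusion counts (hypothetical helper).

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R),
    where P is precision and R is recall. With beta > 1,
    recall counts for more than precision.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (assumptions, not results from the paper).
# Predictor A: catches most failures but raises many false alarms.
high_recall = fbeta_from_counts(tp=90, fp=60, fn=10)
# Predictor B: few false alarms but misses many failures.
high_precision = fbeta_from_counts(tp=60, fp=10, fn=40)

print(f"F3 (high recall):    {high_recall:.3f}")
print(f"F3 (high precision): {high_precision:.3f}")
```

Under these made-up counts, the high-recall predictor scores noticeably higher on F3 than the high-precision one, which is the behavior one would want from a metric when a missed failure is presumed to cost more than a false alarm.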


research
11/21/2022

First CE Matters: On the Importance of Long Term Properties on Memory Failure Prediction

Dynamic random access memory failures are a threat to the reliability of...
research
06/04/2021

Manifold-Aware Deep Clustering: Maximizing Angles between Embedding Vectors Based on Regular Simplex

This paper presents a new deep clustering (DC) method called manifold-aw...
research
06/27/2021

Machine Learning Detection Algorithm for Large Barkhausen Jumps in Cluttered Environment

Modern magnetic sensor arrays conventionally utilize state of the art lo...
research
12/28/2021

QUIC Throughput and Fairness over Dual Connectivity (extended)

Dual Connectivity (DC) is an important lower-layer feature accelerating ...
research
06/16/2020

Unified SVM algorithm based LS-DC Loss

Over the past two decades, Support Vector Machine (SVM) has been a popul...
research
10/21/2022

Feature Engineering and Classification Models for Partial Discharge in Power Transformers

To ensure reliability, power transformers are monitored for partial disc...
research
11/13/2019

Coarse-Refinement Dilemma: On Generalization Bounds for Data Clustering

The Data Clustering (DC) problem is of central importance for the area o...
