Large-scale End-of-Life Prediction of Hard Disks in Distributed Datacenters

03/15/2023
by Rohan Mohapatra, et al.

Every day, data centers process huge volumes of data, enabled by the proliferation of inexpensive hard disks. Data stored on these disks serves a range of critical functional needs, from finance and healthcare to aerospace. As such, premature disk failure and the consequent loss of data can be catastrophic. To mitigate the risk of failures, cloud storage providers perform condition-based monitoring and replace hard disks before they fail. By estimating the remaining useful life of a hard disk drive, one can predict the time-to-failure of a particular device and replace it at the right time, ensuring maximum utilization while reducing operational costs. In this work, large-scale predictive analyses are performed on severely skewed health statistics data by incorporating customized feature engineering and a suite of sequence learners. Past work suggests that LSTMs are well suited to predicting remaining useful life. To this end, we present an encoder-decoder LSTM model in which the context gained from health statistics sequences aids in predicting an output sequence of the number of days remaining before a disk potentially fails. The models developed in this work are trained and tested on all 10 years of S.M.A.R.T. health data in circulation from Backblaze and on a wide variety of disk instances. This work closes the knowledge gap on what full-scale training achieves on thousands of devices, and it advances the state of the art by providing tangible metrics for evaluation and generalization for practitioners looking to extend their workflow to all years of health data in circulation across disk manufacturers. The encoder-decoder LSTM posted an RMSE of 0.83 during training and 0.86 during testing over the full 10 years of data, while generalizing competitively to other drives from the Seagate family.
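To make the prediction target and the reported metric concrete, the following is a minimal sketch, not the authors' code. It assumes the remaining-useful-life target for each daily observation is the number of days until the recorded failure date, and that RMSE is computed directly on those day counts; the abstract does not state the units or any normalization behind the 0.83 and 0.86 figures. The function names `rul_labels` and `rmse` and the sample values are hypothetical.

```python
import math
from datetime import date


def rul_labels(observation_days, failure_day):
    """Return the days remaining until failure for each observation date.

    Assumption: one observation per day and a known failure date.
    """
    return [(failure_day - day).days for day in observation_days]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical drive observed on three consecutive days, failing on the fourth.
days = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)]
targets = rul_labels(days, failure_day=date(2022, 1, 4))

# Made-up model outputs, used only to exercise the metric.
predictions = [2.5, 2.0, 1.5]
error = rmse(predictions, targets)
print(targets, round(error, 3))
```

In a sequence-to-sequence setup like the one described, a model would emit a vector of such day counts per input window, and the same metric would be averaged over all output positions.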


