Remaining Useful Life Estimation of Hard Disk Drives using Bidirectional LSTM Networks

09/11/2021
by Austin Coursey, et al.

Physical and cloud storage services depend on reliable, high-volume storage systems. Recent observations point to hard disk reliability as one of the most pressing reliability issues in data centers containing massive volumes of storage devices such as HDDs. Early detection of impending failure at the disk level reduces system downtime and operational loss, making proactive health monitoring a priority for AIOps in such settings. In this work, we introduce methods for extracting meaningful attributes associated with operational failure and for pre-processing the highly imbalanced health statistics data for subsequent prediction tasks using data-driven approaches. We use a Bidirectional LSTM with a multi-day look-back period to learn the temporal progression of health indicators, and benchmark it against vanilla LSTM and Random Forest models on several key metrics that establish the usefulness and superiority of our model under some tightly defined operational constraints. For example, using a 15-day look-back period, our approach can predict the occurrence of disk failure with an accuracy of 96.4% ahead of the failure itself. This helps alert operations and maintenance staff well in advance about potential mitigation needs. In addition, our model reports a mean absolute error of 0.12 for predicting failure up to 60 days in advance, placing it among the state of the art in recent literature.
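The look-back idea in the abstract can be illustrated with a minimal sketch that slices one drive's daily health records into fixed-length windows, each paired with a remaining-useful-life (RUL) target, and scores predictions with mean absolute error. The 15-day window and 60-day horizon come from the abstract; everything else is an assumption made for illustration, including the function names `make_windows` and `mean_absolute_error`, the convention that the last row is the failure day, and the choice to express targets in days. This is not the authors' pre-processing pipeline.

```python
from typing import List, Sequence, Tuple

LOOK_BACK = 15  # days of history per sample (look-back period quoted in the abstract)
MAX_RUL = 60    # prediction horizon in days (quoted in the abstract)


def make_windows(
    daily_features: Sequence[Sequence[float]],
    look_back: int = LOOK_BACK,
    max_rul: int = MAX_RUL,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Slice one failed drive's daily records into (window, RUL) pairs.

    Assumption: daily_features[t] holds the health attributes for day t,
    and the last row is the day of failure (RUL = 0).
    """
    n_days = len(daily_features)
    windows: List[List[List[float]]] = []
    targets: List[int] = []
    for end in range(look_back - 1, n_days):
        rul = n_days - 1 - end  # days left until the failure day
        if rul > max_rul:
            continue  # outside the prediction horizon, skip
        start = end - look_back + 1
        windows.append([list(row) for row in daily_features[start : end + 1]])
        targets.append(rul)
    return windows, targets


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted targets."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical drive with 100 daily records and a single attribute per day.
    history = [[float(day)] for day in range(100)]
    X, y = make_windows(history)
    print(len(X), "windows of", len(X[0]), "days each")
```

Each window has shape (look-back days, attributes), which is the layout a sequence model such as a bidirectional LSTM typically consumes. Whether the reported mean absolute error of 0.12 refers to targets in days or to a rescaled target is not stated here, so the sketch leaves the targets unscaled.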


