TFBEST: Dual-Aspect Transformer with Learnable Positional Encoding for Failure Prediction

09/06/2023
by   Rohan Mohapatra, et al.

Hard Disk Drive (HDD) failures in datacenters are costly: the consequences range from catastrophic data loss to damaged goodwill, and stakeholders want to avoid them like the plague. An important tool for proactively guarding against HDD failure is timely estimation of the Remaining Useful Life (RUL). To this end, the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) built into HDDs provides critical logs for long-term maintenance of the security and dependability of these essential data storage devices. Past data-driven predictive models have relied heavily on these S.M.A.R.T. logs and on CNN/RNN-based architectures. However, they have struggled to provide a confidence interval around the predicted RUL values and to process very long sequences of logs. In addition, some of these approaches, such as those based on LSTMs, are inherently slow to train and carry tedious feature engineering overheads. To overcome these challenges, we propose a novel transformer architecture, the Temporal-fusion Bi-encoder Self-attention Transformer (TFBEST), for predicting failures in hard drives. It is an encoder-decoder based deep learning technique that enhances the context gained from sequences of health statistics and predicts a sequence of the number of days remaining before a disk potentially fails. We also provide a novel confidence margin statistic that can help manufacturers replace a hard drive within a time frame. Experiments on Seagate HDD data show that our method significantly outperforms state-of-the-art RUL prediction methods when tested on the exhaustive 10-year data from Backblaze (2013-present). Although validated on HDD failure prediction, the TFBEST architecture is well suited to other prognostics applications and may be adapted for allied regression problems.
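To make the prediction target concrete, the following is a minimal sketch of how daily logs for a failed drive could be turned into RUL sequences for an encoder-decoder model. It is an illustration only and not taken from the paper: the function names `rul_labels` and `make_windows`, the `window` and `horizon` parameters, and the convention that the drive fails on its last logged day are all assumptions.

```python
from typing import List, Sequence, Tuple


def rul_labels(num_days: int) -> List[int]:
    """RUL in days for each daily log, assuming failure on the last logged day."""
    return [num_days - 1 - day for day in range(num_days)]


def make_windows(
    features: Sequence[Sequence[float]],
    window: int,
    horizon: int,
) -> List[Tuple[List[Sequence[float]], List[int]]]:
    """Pair each `window`-day input with the RUL of the next `horizon` days.

    `features` holds one feature vector per day (hypothetical S.M.A.R.T.
    attributes), ordered from the first logged day to the failure day.
    """
    rul = rul_labels(len(features))
    samples = []
    # Slide one day at a time; stop when the target would run past the failure day.
    for start in range(len(features) - window - horizon + 1):
        inputs = list(features[start : start + window])
        targets = rul[start + window : start + window + horizon]
        samples.append((inputs, targets))
    return samples


if __name__ == "__main__":
    # Six days of a single made-up feature for one drive that fails on day 5.
    daily_logs = [[float(day)] for day in range(6)]
    for inputs, targets in make_windows(daily_logs, window=3, horizon=2):
        print(inputs, "->", targets)
```

Here each sample maps a short history of health statistics to a sequence of remaining-days values, which matches the sequence-to-sequence framing in the abstract; the actual windowing, feature selection, and normalization used by TFBEST may differ.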


research
03/15/2023

Large-scale End-of-Life Prediction of Hard Disks in Distributed Datacenters

On a daily basis, data centers process huge volumes of data backed by th...
research
06/30/2021

Dual Aspect Self-Attention based on Transformer for Remaining Useful Life Prediction

Remaining useful life prediction (RUL) is one of the key technologies of...
research
04/01/2021

ProcessTransformer: Predictive Business Process Monitoring with Transformer Network

Predictive business process monitoring focuses on predicting future char...
research
02/12/2021

Interpretable Predictive Maintenance for Hard Drives

Existing machine learning approaches for data-driven predictive maintena...
research
09/11/2021

Remaining Useful Life Estimation of Hard Disk Drives using Bidirectional LSTM Networks

Physical and cloud storage services are well-served by functioning and r...
research
11/29/2022

Encoder-Decoder Model for Suffix Prediction in Predictive Monitoring

Predictive monitoring is a subfield of process mining that aims to predi...
research
03/25/2020

NVMe and PCIe SSD Monitoring in Hyperscale Data Centers

With low latency, high throughput and enterprise-grade reliability, SSDs...
