The Life and Death of SSDs and HDDs: Similarities, Differences, and Prediction Models

12/22/2020
by Riccardo Pinciroli, et al.

Data center downtime typically centers around IT equipment failure, and storage devices are the most frequently failing components in data centers. We present a comparative study of hard disk drives (HDDs) and solid state drives (SSDs), which constitute the typical storage in data centers. Using six years of field data on 100,000 HDDs of different models from the same manufacturer, taken from the BackBlaze dataset, and six years of field data on 30,000 SSDs of three models from a Google data center, we characterize the workload conditions that lead to failures and show that their root causes differ from common expectations but remain difficult to discern. For HDDs, we observe that young and old drives do not differ much in their failures. Instead, failures may be distinguished by discriminating drives based on the time spent on head positioning. For SSDs, we observe high levels of infant mortality and characterize the differences between infant and non-infant failures. We develop several machine learning failure prediction models that are shown to be surprisingly accurate, achieving high recall and low false positive rates. These models are used beyond simple prediction: they help us untangle the complex interaction of workload characteristics that lead to failures and identify failure root causes from monitored symptoms.
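
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how recall and false positive rate are conventionally computed for a binary failure predictor. It is not the authors' code: the function name `recall_and_fpr` and the label encoding (1 for a drive that fails, 0 for a healthy drive) are assumptions made for illustration, and the sample labels are invented.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary labels.

    Assumed encoding (hypothetical): 1 = drive fails, 0 = drive is healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, not flagged

    # Recall: fraction of actual failures that the model predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: fraction of healthy drives wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Invented toy labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
recall, fpr = recall_and_fpr(y_true, y_pred)
```

Because failures are rare relative to healthy drives, plain accuracy would be dominated by the healthy class, which is presumably why recall and false positive rate are the pair reported here.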


research
01/01/2019

Large Scale Studies of Memory, Storage, and Network Failures in a Modern Data Center

The workloads running in the modern data centers of large scale Internet...
research
05/06/2020

On Failure Diagnosis of the Storage Stack

Diagnosing storage system failures is challenging even for professionals...
research
07/24/2019

Live Forensics for Distributed Storage Systems

We present Kaleidoscope an innovative system that supports live forensic...
research
04/29/2018

Investigating Power Outage Effects on Reliability of Solid-State Drives

Solid-State Drives (SSDs) are recently employed in enterprise servers an...
research
03/25/2020

NVMe and PCIe SSD Monitoring in Hyperscale Data Centers

With low latency, high throughput and enterprise-grade reliability, SSDs...
research
07/30/2023

Towards Learned Predictability of Storage Systems

With the rapid development of cloud computing and big data technologies,...
research
02/06/2021

A Data Augmented Bayesian Network for Node Failure Prediction in Optical Networks

Failures in optical network backbone can cause significant interruption ...
