Investigating Power Outage Effects on Reliability of Solid-State Drives

04/29/2018
by Saba Ahmadian, et al.

Solid-State Drives (SSDs) have recently been employed in enterprise servers and high-end storage systems to enhance the performance of the storage subsystem. Although high-speed SSDs can significantly improve system performance, they pose a significant reliability threat to write operations upon power failure. In this paper, we present a comprehensive analysis of the impact of workload-dependent parameters on the reliability of SSDs under power failure, covering a variety of SSDs from top manufacturers. To this end, we first develop a platform that provides two features required for the study: a) realistic fault injection into an SSD within a computing system and b) a mechanism for detecting data loss on the SSD after a power failure. In the proposed physical fault injection platform, SSDs experience the real discharge phase of the Power Supply Unit (PSU) that occurs during power failures in data centers, which was neglected in previous studies. The impact of workload-dependent parameters such as workload Working Set Size (WSS), request size, request type, access pattern, and sequence of accesses on the failure of SSDs is carefully studied in the presence of realistic power failures. Experimental results over thousands of fault injections show that data loss occurs even after a request has completed (up to 700 ms later), and that the failure rate is influenced by the type, size, access pattern, and sequence of I/O accesses, while other parameters such as workload WSS have no impact on the failure of SSDs.
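The abstract does not describe how the data loss detection mechanism is implemented. As a rough illustration of the general idea only, the following Python sketch writes self-describing blocks (sequence number plus checksum), records which writes were acknowledged, and later checks whether every acknowledged block is still intact. All names here (`make_block`, `write_blocks`, `find_lost_blocks`, the 4 KiB block size, the block layout) are hypothetical and not taken from the paper. In a real experiment the list of acknowledged writes would have to be kept on a separate device or host that survives the outage.

```python
import hashlib
import os
import struct

BLOCK_SIZE = 4096                      # assumed request size, not from the paper
HEADER = struct.Struct("<QI")          # sequence number, payload length
DIGEST_SIZE = hashlib.sha256().digest_size


def make_block(seq: int) -> bytes:
    """Build a block holding a header, a deterministic payload, and a SHA-256 digest."""
    payload_len = BLOCK_SIZE - HEADER.size - DIGEST_SIZE
    seed = hashlib.sha256(str(seq).encode()).digest()
    payload = (seed * (payload_len // len(seed) + 1))[:payload_len]
    body = HEADER.pack(seq, payload_len) + payload
    return body + hashlib.sha256(body).digest()


def block_is_intact(raw: bytes, expected_seq: int) -> bool:
    """Return True if the block has a valid digest and the expected sequence number."""
    if len(raw) != BLOCK_SIZE:
        return False
    body, digest = raw[:-DIGEST_SIZE], raw[-DIGEST_SIZE:]
    if hashlib.sha256(body).digest() != digest:
        return False
    seq, payload_len = HEADER.unpack_from(body)
    return seq == expected_seq and payload_len == len(body) - HEADER.size


def write_blocks(path: str, count: int) -> list[int]:
    """Write `count` blocks, flushing each one, and return the acknowledged sequence numbers."""
    acked = []
    with open(path, "wb") as f:
        for seq in range(count):
            f.seek(seq * BLOCK_SIZE)
            f.write(make_block(seq))
            f.flush()
            os.fsync(f.fileno())       # the OS reports the write as durable
            acked.append(seq)          # in practice, log this off the device under test
    return acked


def find_lost_blocks(path: str, acked: list[int]) -> list[int]:
    """After power is restored, list acknowledged blocks that are missing or corrupted."""
    lost = []
    with open(path, "rb") as f:
        for seq in acked:
            f.seek(seq * BLOCK_SIZE)
            if not block_is_intact(f.read(BLOCK_SIZE), seq):
                lost.append(seq)
    return lost
```

Any block that `find_lost_blocks` reports was acknowledged before the outage but did not survive it, which is the kind of post-completion data loss the abstract refers to.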

research
12/01/2019

Evaluating Reliability of SSD-Based I/O Caches in Enterprise Storage Systems

In this paper, we present a comprehensive analysis investigating the rel...
research
07/01/2019

Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo

We present a set of fault injection experiments performed on the ACES (L...
research
12/23/2021

A Modeling Framework for Reliability of Erasure Codes in SSD Arrays

To help reliability of SSD arrays, Redundant Array of Independent Disks ...
research
09/30/2020

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

Cloud computing systems fail in complex and unexpected ways due to unexp...
research
12/22/2020

The Life and Death of SSDs and HDDs: Similarities, Differences, and Prediction Models

Data center downtime typically centers around IT equipment failure. Stor...
research
11/27/2019

A Utilization Model for Optimization of Checkpoint Intervals in Distributed Stream Processing Systems

State-of-the-art distributed stream processing systems such as Apache Fl...
research
05/06/2020

On Failure Diagnosis of the Storage Stack

Diagnosing storage system failures is challenging even for professionals...
