A Reliability Model for Dependent and Distributed MDS Disk Array Units

10/24/2018
by Suayb S. Arslan, et al.

Archiving and systematic backup of large digital data create a rapidly growing demand for multi-petabyte-scale storage systems. As drive capacities continue to grow beyond the few-terabyte range to address the demands of today's cloud, multiple simultaneous disk failures become a realistic concern. Among the main factors causing catastrophic system failures, correlated disk failures and the network bandwidth are reported to be the two most common sources of performance degradation. The emerging trend is to use efficient, sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability and bandwidth requirements. It is known that the mean time to failure and repair rates reported by disk manufacturers cannot capture the life cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations under multi-parity protection schemes based on modern Maximum Distance Separable (MDS) EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: first, the common intuition that adding more parity disks is only meaningful when the failure domains of the storage system are sufficiently decorrelated; and second, the reliability of generic storage systems protected by multiple single-dimensional EC.
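To make the Markov-chain idea concrete, the sketch below computes the mean time to data loss (MTTDL) of a single (n, k) MDS-protected disk group using the textbook birth-death chain. It is not the generalized model developed in the paper. Everything in it is an assumption made for illustration: the function name `mttdl`, exponentially distributed failure and repair times, one repair in progress at a time, and the `stress` parameter, which is a crude, hypothetical stand-in for correlation that simply inflates the per-disk failure rate once the group is already degraded.

```python
def mttdl(n, k, lam, mu, stress=1.0):
    """Mean time to data loss of one (n, k) MDS disk group.

    Illustrative birth-death Markov chain, NOT the paper's generalized model.
    State i = number of failed disks; data is lost at n - k + 1 failures.

    n, k   : total disks and data disks (n - k parity disks)
    lam    : per-disk failure rate (assumed exponential)
    mu     : repair rate, one repair at a time (assumed exponential)
    stress : hypothetical multiplier on the failure rate in degraded
             states (i > 0), a rough proxy for correlated failures
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    if lam <= 0 or mu < 0 or stress <= 0:
        raise ValueError("require lam > 0, mu >= 0, stress > 0")

    parities = n - k
    total = 0.0   # accumulated expected time from state 0 to data loss
    prev = 0.0    # expected time to first move from state i-1 to state i
    for i in range(parities + 1):
        fail = (n - i) * lam * (stress if i > 0 else 1.0)
        repair = mu if i > 0 else 0.0
        # First-passage recurrence for a birth-death chain:
        #   fail_i * E_i = 1 + repair_i * E_{i-1}
        step = (1.0 + repair * prev) / fail
        total += step
        prev = step
    return total


if __name__ == "__main__":
    lam, mu = 0.01, 1.0  # arbitrary illustrative rates, same time unit
    for parity in (1, 2, 3):
        n = 4 + parity
        print(parity, mttdl(n, 4, lam, mu), mttdl(n, 4, lam, mu, stress=20.0))
```

For a two-disk mirror (n = 2, k = 1) the recurrence reduces to the familiar closed form (3λ + μ) / (2λ²), which is a convenient sanity check. Raising `stress` shrinks the gain obtained from each extra parity disk in this toy chain, which is the qualitative effect the abstract describes for poorly decorrelated failure domains.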
