RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial

06/14/2023
by Alexander Thomasian, et al.

This is a follow-up to the 1994 tutorial by the Berkeley RAID researchers, whose 1988 RAID paper foresaw a revolutionary change in the storage industry based on advances in magnetic disk technology, i.e., the replacement of large-capacity expensive disks with arrays of small-capacity inexpensive disks. NAND flash SSDs, which use less power, incur very low latency, provide high bandwidth, and are more reliable than HDDs, are expected to replace HDDs as their prices drop. Replication in the form of mirrored disks and erasure coding via parity and Reed-Solomon codes are two methods of achieving higher reliability through redundancy in disk arrays. RAID(4+k) arrays, k=1,2,..., use k check strips to tolerate k disk failures with maximum distance separable (MDS) coding, which attains this fault tolerance with minimum redundancy. Clustered RAID, local recovery codes, partial MDS, and multilevel RAID are proposals to improve RAID reliability and performance. We discuss RAID5 performance and reliability analysis in conjunction with HDDs without and with latent sector errors (LSEs), which can be dealt with by intradisk redundancy and disk scrubbing, the latter enhanced with machine learning algorithms. Undetected disk errors, which cause silent data corruption, are propagated by rebuild. We use the M/G/1 queueing model for RAID5 performance evaluation, present approximations for fork/join response time in degraded-mode analysis, and apply the vacationing server model to rebuild analysis. Methods and tools for reliability evaluation with Markov chain modeling and simulation are discussed. Queueing and reliability analysis are both based on probability theory and stochastic processes, so the two topics can be studied together. Their application is presented here in the context of RAID arrays in a tutorial manner.
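As a rough illustration of the M/G/1 model mentioned above, the sketch below computes the mean response time of a single disk from the standard Pollaczek-Khinchine formula. It is a minimal example under stated assumptions, not the analysis from the paper: the function name `mg1_mean_response_time` and the numeric values (50 requests/s, 10 ms mean service time) are hypothetical and chosen only for illustration.

```python
def mg1_mean_response_time(arrival_rate, mean_service, second_moment_service):
    """Mean response time of an M/G/1 queue (Pollaczek-Khinchine formula).

    arrival_rate          -- Poisson arrival rate lambda (requests per second)
    mean_service          -- E[S], mean service time (seconds)
    second_moment_service -- E[S^2], second moment of service time (seconds^2)
    """
    utilization = arrival_rate * mean_service  # rho = lambda * E[S]
    if not 0 <= utilization < 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    # Mean waiting time in queue: W = lambda * E[S^2] / (2 * (1 - rho))
    mean_wait = arrival_rate * second_moment_service / (2 * (1 - utilization))
    # Mean response time is waiting time plus the request's own service time.
    return mean_service + mean_wait


# Hypothetical disk: 50 requests/s, 10 ms mean service time (utilization 0.5).
lam, es = 50.0, 0.010

# Exponential service times (E[S^2] = 2 * E[S]^2) reduce to the M/M/1 result.
r_exponential = mg1_mean_response_time(lam, es, 2 * es**2)

# Deterministic service times (E[S^2] = E[S]^2) halve the waiting time.
r_deterministic = mg1_mean_response_time(lam, es, es**2)

print(f"exponential service:   {r_exponential * 1000:.1f} ms")
print(f"deterministic service: {r_deterministic * 1000:.1f} ms")
```

The two calls differ only in the second moment of the service time, which shows why the service time distribution, and not just its mean, matters for response time at a given utilization.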


