Modeling Impact of Human Errors on the Data Unavailability and Data Loss of Storage Systems

05/26/2018
by Mostafa Kishani, et al.

Data storage systems and their availability play a crucial role in contemporary datacenters. Despite mechanisms such as automatic fail-over, human agents remain involved in datacenter operation, and their destructive errors are therefore inevitable. Because exascale datacenters use a very large number of disk drives with high failure rates, the disk subsystem has become a major source of Data Unavailability (DU) and Data Loss (DL) initiated by human errors. In this paper, we investigate the effect of Incorrect Disk Replacement Service (IDRS) on the availability and reliability of data storage systems. To this end, we analyze the consequences of IDRS in a disk array and conduct Monte Carlo simulations to evaluate DU and DL during mission time. The proposed modeling framework can cope with a) different storage array configurations and b) Data Object Survivability (DOS), which represents the effect of system-level redundancies such as remote backups and mirrors. The model parameters are obtained from industrial and scientific reports alongside field data extracted from a datacenter operating with 70 storage racks. The results show that ignoring the impact of IDRS leads to underestimating unavailability by up to three orders of magnitude. Moreover, our study suggests that, once human errors are considered, conventional beliefs about the dependability of different Redundant Array of Independent Disks (RAID) mechanisms should be revised: RAID1 can result in lower availability than RAID5 in the presence of human errors. The results also show that employing an automatic fail-over policy (using hot spare disks) can reduce the drastic impacts of human errors by two orders of magnitude.
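To make the Monte Carlo idea concrete, the following is a minimal sketch and not the authors' model. It assumes a single RAID5 array with exponentially distributed failure, repair, and recovery times, and it treats a wrong-disk pull during replacement as a temporary outage that is undone before the service is retried. The function name `simulate_raid5` and every parameter (`fail_rate`, `repair_rate`, `hep`, `recovery_rate`) are hypothetical and chosen for illustration only.

```python
import random

def simulate_raid5(n_disks, fail_rate, repair_rate, hep, recovery_rate,
                   mission_hours, runs, seed=0):
    """Toy Monte Carlo estimate of (unavailability fraction, data-loss probability).

    Assumptions (not from the paper): exponential times, rates per hour,
    hep = probability that a replacement attempt pulls a healthy disk,
    and a run stops at the first data loss.
    """
    rng = random.Random(seed)
    total_down = 0.0
    losses = 0
    for _ in range(runs):
        t, down, lost = 0.0, 0.0, False
        while t < mission_hours and not lost:
            # Healthy array: wait for the first disk failure.
            t += rng.expovariate(n_disks * fail_rate)
            if t >= mission_hours:
                break
            # Degraded array: retry the replacement service until it succeeds.
            while True:
                if rng.random() < hep:
                    # Wrong disk pulled: data unavailable until the error is undone.
                    outage = min(rng.expovariate(recovery_rate), mission_hours - t)
                    down += outage
                    t += outage
                    if t >= mission_hours:
                        break
                    continue
                repair = rng.expovariate(repair_rate)
                second = rng.expovariate((n_disks - 1) * fail_rate)
                if second < repair and t + second < mission_hours:
                    # Second failure before the rebuild finishes: data loss.
                    lost = True
                    t += second
                else:
                    t += repair
                break
        total_down += down
        losses += lost
    return total_down / (runs * mission_hours), losses / runs

if __name__ == "__main__":
    du, dl = simulate_raid5(8, 1e-4, 0.1, 0.1, 1.0, 87600, 200)
    print(f"unavailability fraction: {du:.2e}, data-loss probability: {dl:.3f}")
```

Setting `hep` to 0 in this sketch removes the human-error outage entirely, which gives a simple way to compare estimates with and without incorrect replacements. The numbers it produces depend entirely on the assumed rates and say nothing about the paper's reported results.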


research
05/26/2018

Evaluating Impact of Human Errors on the Availability of Data Storage Systems

In this paper, we investigate the effect of incorrect disk replacement s...
research
12/23/2021

Dependability Analysis of Data Storage Systems in Presence of Soft Errors

In recent years, high availability and reliability of Data Storage Syste...
research
12/23/2021

A Modeling Framework for Reliability of Erasure Codes in SSD Arrays

To help reliability of SSD arrays, Redundant Array of Independent Disks ...
research
09/01/2018

Eliminating Boundaries in Cloud Storage with Anna

In this paper, we describe how we extended a distributed key-value store...
research
04/17/2018

Simplex Queues for Hot-Data Download

In cloud storage systems, hot data is usually replicated over multiple n...
research
11/01/2019

A Data-Assisted Reliability Model for Carrier-Assisted Cold Data Storage Systems

Cold data storage systems are used to allow long term digital preservati...
research
01/22/2023

Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance

This initial version of this document was written back in 2014 for the s...
