Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance

01/22/2023
by Suayb S. Arslan, et al.

The initial version of this document was written in 2014 for the sole purpose of providing the fundamentals of reliability theory and identifying the theoretical machinery for predicting the durability/availability of erasure-coded storage systems. Since the definition of a "system" is too broad, we specifically focus on warm/cold storage systems where the data is stored in a distributed fashion across different storage units with or without continuous operation. The contents of this document are dedicated to a review of fundamentals, a few major improved stochastic models, and several contributions of my work relevant to the field. One of the contributions of this document is the introduction of the most general form of Markov models for the estimation of mean time to failure. This work was later partially published in IEEE Transactions on Reliability. Very good approximations to the closed-form solutions of this general model are also investigated. Various storage configurations under different policies are compared using such advanced models. In a subsequent chapter, we also consider multi-dimensional Markov models to address detached drive-medium combinations such as those found in optical disk and tape storage systems. It is not hard to anticipate that such a system structure would most likely be part of future DNA storage libraries. This work is partially published in Elsevier Reliability and System Safety. Simulation modeling for more accurate estimation is covered towards the end of the document, after noting the deficiencies of both the simplified canonical and the more complex Markov models, which are due mainly to the stationary and static nature of Markovianity. Throughout the document, we focus on concurrently maintained systems, although the discussion changes only slightly for systems repaired one device at a time.
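To make the mean-time-to-failure idea concrete, the sketch below works through the simplest textbook case rather than the general model the document introduces. It assumes an (n, k) erasure-coded group of independent devices with exponentially distributed failure and repair times, so the number of failed devices forms a birth-death Markov chain that is absorbed (data lost) once more than n - k devices are down. The function name `mttf` and the parameters `lam` (per-device failure rate) and `mu` (per-device repair rate) are illustrative assumptions, not names taken from the document.

```python
def mttf(n, k, lam, mu, concurrent=True):
    """Mean time to data loss for an (n, k) erasure-coded group.

    Assumed model (not the document's general one): a birth-death
    Markov chain whose state is the number of failed devices.
      failure rate in state i: (n - i) * lam
      repair rate in state i:  i * mu if concurrent, else mu
    Data is lost on reaching n - k + 1 failures. The chain starts
    with all n devices healthy.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if lam <= 0 or mu < 0:
        raise ValueError("need lam > 0 and mu >= 0")

    total = 0.0
    passage = 0.0  # expected first-passage time from the previous state to this one
    for failed in range(n - k + 1):
        fail_rate = (n - failed) * lam
        if failed == 0:
            repair_rate = 0.0  # nothing to repair when all devices are healthy
        elif concurrent:
            repair_rate = failed * mu  # all failed devices rebuilt in parallel
        else:
            repair_rate = mu  # one device rebuilt at a time
        # First-passage recurrence D_i = (1 + mu_i * D_{i-1}) / lambda_i
        passage = (1.0 + repair_rate * passage) / fail_rate
        total += passage
    return total


if __name__ == "__main__":
    # Hypothetical rates, in events per hour, chosen only for illustration.
    lam, mu = 1e-5, 1e-2
    for policy in (True, False):
        label = "concurrent" if policy else "one-at-a-time"
        print(label, mttf(10, 8, lam, mu, concurrent=policy))
```

For a two-device mirror (n = 2, k = 1) the recurrence reduces to the familiar closed form (3·lam + mu) / (2·lam²), which is a convenient sanity check. The `concurrent` flag only changes the repair rate in each state, which mirrors the remark above that the discussion changes only slightly for systems repaired one device at a time.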


