Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems

09/14/2017
by Sungjoon Koh, et al.

Large-scale systems with arrays of solid state disks (SSDs) have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding, such as Reed-Solomon (RS) codes, as an alternative to replication, because erasure coding can offer a significantly lower storage cost than replication. To understand the impact of erasure coding on system performance and on other aspects such as CPU utilization and network traffic, we build a storage cluster consisting of approximately one hundred processor cores and more than fifty high-performance SSDs, and evaluate the cluster with Ceph, a popular open-source distributed parallel file system. We then analyze the behavior of systems that adopt erasure coding from the following five viewpoints, compared with that of systems using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes; and (5) the impact of physical data layout on the performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by the Google and Facebook file systems, and compare them with triple replication, which a typical parallel file system employs as its default fault-tolerance mechanism. Lastly, we collect 54 block-level traces from the cluster and make them available to other researchers.
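The storage-cost argument for erasure coding comes down to simple arithmetic, shown in the sketch below. It assumes the RS(6,3) and RS(10,4) configurations that are commonly attributed to Google's and Facebook's file systems. The abstract does not name these parameters, so treat them as illustrative. The helper names `replication_overhead`, `rs_overhead` and `tolerated_losses` are hypothetical. The sketch counts only raw capacity per unit of user data and ignores metadata, partial-stripe padding, and the CPU, network, and I/O-amplification costs that the study measures.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(copies)


def rs_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data under RS(k, m):
    each stripe holds k data chunks plus m parity chunks."""
    return Fraction(k + m, k)


def tolerated_losses(scheme: str, k_or_copies: int, m: int = 0) -> int:
    """Chunks (or copies) that can be lost without losing data.
    Replication survives copies - 1 losses; an MDS code such as RS(k, m)
    survives the loss of any m chunks in a stripe."""
    return k_or_copies - 1 if scheme == "replication" else m


# Assumed configurations, for illustration only.
schemes = {
    "3-way replication": (replication_overhead(3), tolerated_losses("replication", 3)),
    "RS(6,3)": (rs_overhead(6, 3), tolerated_losses("rs", 6, 3)),
    "RS(10,4)": (rs_overhead(10, 4), tolerated_losses("rs", 10, 4)),
}

user_tb = 100  # hypothetical amount of user data, in TB
for name, (overhead, losses) in schemes.items():
    raw_tb = user_tb * overhead
    print(f"{name:18s} overhead={float(overhead):.2f}x  "
          f"raw={float(raw_tb):.0f} TB  tolerates {losses} lost chunks")
```

Under these assumptions, storing 100 TB of user data takes 300 TB of raw capacity with triple replication, but 150 TB with RS(6,3) and 140 TB with RS(10,4). Both RS configurations also tolerate more simultaneous losses. The price is paid elsewhere: encoding and decoding consume CPU, and partial-stripe updates and reconstruction amplify I/O and network traffic. Those are the trade-offs the study quantifies.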



research
06/12/2019

Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters

Large-scale systems with all-flash arrays have become increasingly commo...
research
06/20/2022

Building Blocks for Network-Accelerated Distributed File Systems

High-performance clusters and datacenters pose increasingly demanding re...
research
12/27/2018

Extending TCP for Accelerating Replication on Cluster File Systems over SDNs

This paper explores the changes required of TCP to efficiently support c...
research
03/04/2018

Applied Erasure Coding in Networks and Distributed Storage

The amount of digital data is rapidly growing. There is an increasing us...
research
01/31/2022

Fragmented ARES: Dynamic Storage for Large Objects

Data availability is one of the most important features in distributed s...
research
08/01/2017

Performance Measurements of Supercomputing and Cloud Storage Solutions

Increasing amounts of data from varied sources, particularly in the fiel...
research
03/22/2023

How does SSD Cluster Perform for Distributed File Systems: An Empirical Study

As the capacity of Solid-State Drives (SSDs) is constantly being optimis...
