A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

03/21/2018
by Awais Khan, et al.

Deduplication is widely used in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design requirements of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure and can leave data and deduplication metadata inconsistent. In this paper, we propose a robust, fault-tolerant, and scalable cluster-wide deduplication scheme that eliminates duplicate copies across the cluster. We design a distributed deduplication metadata shard that provides performance scalability while preserving the design constraints of shared-nothing storage systems. Chunks and deduplication metadata are placed cluster-wide based on the content fingerprint of each chunk. To ensure transactional consistency and identify garbage, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation, as well as high robustness in the event of sudden server failure.
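The following is a minimal sketch of the two ideas named in the abstract: fingerprint-based placement of chunks and metadata, and a commit flag used to identify garbage. It is not the paper's implementation. The fixed chunk size, the SHA-256 fingerprint, the modulo placement function (a stand-in for Ceph's own placement logic), and all names such as `Node`, `ChunkEntry`, `write_object`, and `collect_garbage` are assumptions made for illustration. The flag is also set synchronously here, whereas the abstract describes an asynchronous mechanism.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo (assumption)


@dataclass
class ChunkEntry:
    """One deduplication metadata record, stored next to its chunk."""
    data: bytes
    refcount: int = 0
    committed: bool = False  # hypothetical flag: False until the write completes


@dataclass
class Node:
    """A storage node holding its own metadata shard (no central index)."""
    shard: dict = field(default_factory=dict)  # fingerprint -> ChunkEntry


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def place(fp: str, num_nodes: int) -> int:
    """Map a fingerprint to a node. Modulo is a stand-in placement function."""
    return int(fp, 16) % num_nodes


def write_object(nodes: list, data: bytes) -> list:
    """Split data into chunks, store each unique chunk once, return the recipe."""
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        node = nodes[place(fp, len(nodes))]
        entry = node.shard.get(fp)
        if entry is None:
            # First copy cluster-wide: store the chunk, flag still unset.
            entry = node.shard[fp] = ChunkEntry(chunk)
        entry.refcount += 1
        recipe.append(fp)
    # Mark entries as committed once the whole object has been written.
    for fp in recipe:
        nodes[place(fp, len(nodes))].shard[fp].committed = True
    return recipe


def read_object(nodes: list, recipe: list) -> bytes:
    """Reassemble an object by looking up each fingerprint on its node."""
    return b"".join(nodes[place(fp, len(nodes))].shard[fp].data for fp in recipe)


def collect_garbage(nodes: list) -> int:
    """Drop entries whose flag was never set, e.g. after a crash mid-write."""
    removed = 0
    for node in nodes:
        stale = [fp for fp, e in node.shard.items() if not e.committed]
        for fp in stale:
            del node.shard[fp]
            removed += 1
    return removed


def stored_bytes(nodes: list) -> int:
    """Total bytes of chunk data physically stored across the cluster."""
    return sum(len(e.data) for n in nodes for e in n.shard.values())
```

Because the owning node is derived from the fingerprint alone, any node can locate a chunk and its metadata without consulting a central index, and identical chunks written by different clients land on the same node. In this sketch, an entry left with `committed=False` models a write interrupted by a failure, which a later scan can identify and reclaim.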


