Multi-level Forwarding and Scheduling Recovery Algorithm in Rapidly-changing Network for Erasure-coded Clusters

11/03/2020
by   Hai Zhou, et al.

A key design goal of erasure-coded clusters is to reduce repair time. Existing erasure-coded data repair schemes fall roughly into two categories: (1) rapid data repair in a homogeneous environment (e.g., PPR), and (2) bandwidth-based data repair in a heterogeneous environment (e.g., PPT). However, these solutions struggle to cope with the heterogeneous and rapidly changing networks found in erasure-coded clusters. To address this problem, a bandwidth-aware multi-level forwarding repair algorithm, called BMFRepair, is proposed. BMFRepair monitors the network bandwidth in real time while data is being forwarded and selects idle nodes with high-bandwidth links to assist in forwarding, which reduces the time bottleneck caused by low-bandwidth links. Multi-node repair also becomes very complicated when the bandwidth changes drastically. For this problem, a multi-node scheduling repair algorithm, called MSRepair, is proposed; it repairs multiple failed blocks in parallel by scheduling node resources. The two algorithms adapt flexibly to a rapidly changing network environment and make full use of the bandwidth resources of idle nodes. Most importantly, they continuously adjust the repair plan as the bandwidth changes in a fast, dynamic network. The algorithms have been evaluated both in simulations on Mininet and in real experiments on the Aliyun cloud platform ECS. Results show that, compared with the state-of-the-art repair schemes PPR and PPT, the algorithms significantly reduce the repair time in a rapidly changing network.
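
The idea of routing repair traffic through idle nodes with high-bandwidth links can be illustrated with a small sketch. The code below is not the BMFRepair algorithm from the paper, whose details the abstract does not give; it is a generic widest-path (maximum-bottleneck) search, swapped in as an assumption to show how a relay path might be chosen from one snapshot of measured bandwidths. The function name `widest_relay_path`, the `bandwidth` dictionary layout, and the node names are all hypothetical.

```python
import heapq


def widest_relay_path(bandwidth, src, dst, idle_nodes):
    """Find the path from src to dst whose slowest link is as fast as possible.

    bandwidth  -- dict mapping a directed link (u, v) to its measured bandwidth
                  (hypothetical layout; any consistent unit works)
    idle_nodes -- nodes that are allowed to act as relays
    Returns (bottleneck_bandwidth, path); (0.0, []) if dst is unreachable.
    """
    allowed = set(idle_nodes) | {src, dst}

    # Keep only links whose endpoints are the source, the destination, or idle.
    neighbors = {}
    for (u, v), bw in bandwidth.items():
        if u in allowed and v in allowed:
            neighbors.setdefault(u, []).append((v, bw))

    best = {src: float("inf")}  # best known bottleneck to each node
    prev = {}                   # predecessor on the best known path
    heap = [(-float("inf"), src)]  # max-heap via negated bottleneck

    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0.0):
            continue  # stale heap entry
        if u == dst:
            break
        for v, bw in neighbors.get(u, []):
            cand = min(width, bw)  # bottleneck if we extend the path to v
            if cand > best.get(v, 0.0):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (-cand, v))

    if dst not in best:
        return 0.0, []

    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return best[dst], path


# Hypothetical snapshot: the direct link A -> D is slow, relay B is idle.
links = {("A", "D"): 10, ("A", "B"): 100, ("B", "D"): 80,
         ("A", "C"): 200, ("C", "D"): 300}
print(widest_relay_path(links, "A", "D", idle_nodes=["B"]))
# -> (80, ['A', 'B', 'D'])
```

In a rapidly changing network, a scheme of this kind would presumably re-run the search whenever new bandwidth measurements arrive, so that the chosen relays track the current link conditions.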

research
08/05/2019

Repair Pipelining for Erasure-Coded Storage: Algorithms and Evaluation

We propose repair pipelining, a technique that speeds up the repair perf...
research
05/19/2022

An Efficient Piggybacking Design Framework with Sub-packetization l≤ r for All-Node Repair

Piggybacking design has been widely applied in distributed storage syste...
research
09/20/2022

Two Piggybacking Codes with Flexible Sub-Packetization to Achieve Lower Repair Bandwidth

As a special class of array codes, (n,k,m) piggybacking codes are MDS co...
research
01/10/2019

Capacity of Distributed Storage Systems with Clusters and Separate Nodes

In distributed storage systems (DSSs), the optimal tradeoff between node...
research
11/02/2022

Node repair on connected graphs, Part II

We continue our study of regenerating codes in distributed storage syste...
research
06/30/2018

Storage-Repair Bandwidth Trade-off for Wireless Caching with Partial Failure and Broadcast Repair

Repair of multiple partially failed cache nodes is studied in a distribu...
research
11/02/2020

Fast Biconnectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance

Maintaining a robust communication network plays an important role in th...
