Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

01/27/2020
by A. Katsarakis, et al.

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort), and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB.
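The combination of logical timestamps and invalidations described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation. The message names (`on_inv`, `on_val`), the `(version, node_id)` timestamp layout, and the `Replica` and `write` helpers are assumptions made for illustration, and the sketch leaves out networking, failures, and write replay entirely.

```python
from dataclasses import dataclass

VALID, INVALID = "valid", "invalid"


@dataclass
class Entry:
    value: object = None
    ts: tuple = (0, 0)   # assumed layout: (version, node_id), compared lexicographically
    state: str = VALID


class Replica:
    """Hypothetical replica holding a per-key value, timestamp, and state."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def entry(self, key):
        return self.store.setdefault(key, Entry())

    def on_inv(self, key, ts, value):
        # Invalidation: adopt the write only if its timestamp is higher.
        # Every replica applies the same rule, so conflicts resolve locally
        # and identically everywhere, and no write has to abort.
        e = self.entry(key)
        if ts > e.ts:
            e.value, e.ts, e.state = value, ts, INVALID
        return True  # acknowledgement

    def on_val(self, key, ts):
        # Validation: re-enable local reads only for the matching write.
        e = self.entry(key)
        if e.ts == ts:
            e.state = VALID

    def read(self, key):
        # Local read; None here means "key is invalidated, caller must wait".
        e = self.entry(key)
        return e.value if e.state == VALID else None


def write(coordinator, replicas, key, value):
    """Any replica may coordinate a write; there is no central ordering point."""
    ts = (coordinator.entry(key).ts[0] + 1, coordinator.node_id)
    acks = [r.on_inv(key, ts, value) for r in replicas]
    if all(acks):
        for r in replicas:
            r.on_val(key, ts)
    return ts
```

In this sketch the invalidation carries the new value, so a replica that has seen it holds everything needed to re-issue the write if the coordinator disappears. That is one plausible reading of "replayable writes", stated here as an assumption rather than taken from the abstract.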

research
12/04/2021

Invalidation-Based Protocols for Replicated Datastores

Distributed in-memory datastores underpin cloud applications that run wi...
research
10/26/2017

Exploiting Commutativity For Practical Fast Replication

Traditional approaches to replication require client requests to be orde...
research
06/26/2023

BBCA-LEDGER: High Throughput Consensus meets Low Latency

This paper presents BBCA-LEDGER, a Byzantine log replication technology ...
research
06/25/2020

Fast General Distributed Transactions with Opacity using Global Time

Transactions can simplify distributed applications by hiding data distri...
research
09/20/2022

Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction

Combining persistent memory (PM) with RDMA is a promising approach to pe...
research
09/21/2020

Resilient Cloud-based Replication with Low Latency

Existing approaches to tolerate Byzantine faults in geo-replicated envir...
research
10/31/2022

Low-Latency, High-Throughput Garbage Collection (Extended Version)

Production garbage collectors make substantial compromises in pursuit of...
