NetChain: Scale-Free Sub-RTT Coordination (Extended Version)

by Xin Jin, et al.

Coordination services are a fundamental building block of modern cloud systems, providing critical functionality such as configuration management and distributed locking. The major challenge is to achieve low latency and high throughput while providing strong consistency and fault tolerance. Traditional server-based solutions require multiple round-trip times (RTTs) to process a query. This paper presents NetChain, a new approach that provides scale-free sub-RTT coordination in datacenters. NetChain exploits recent advances in programmable switches to store data and process queries entirely in the network data plane. This eliminates query processing at coordination servers and cuts end-to-end latency to as little as half of an RTT: clients experience only the processing delay of their own software stack plus network delay, which in a datacenter setting is typically much smaller. We design new protocols and algorithms based on chain replication to guarantee strong consistency and to handle switch failures efficiently. We implement a prototype with four Barefoot Tofino switches and four commodity servers. Evaluation results show that, compared to traditional server-based solutions such as ZooKeeper, our prototype provides orders of magnitude higher throughput and lower latency, and handles failures gracefully.
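The abstract names chain replication as the basis of NetChain's consistency protocol. As a rough illustration only, the sketch below shows textbook chain replication in plain Python: writes enter at the head and are forwarded hop by hop to the tail, and reads are answered by the tail. It is not NetChain's switch data-plane implementation. The class names `Replica` and `Chain`, the string key-value store, and the synchronous in-memory forwarding are assumptions made for this example, and NetChain's own failover and recovery protocols are not modeled.

```python
from typing import Dict, List, Optional


class Replica:
    """One node in the chain (hypothetical; in NetChain a switch plays this role)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}


class Chain:
    """Textbook chain replication over an ordered list of replicas.

    Writes enter at the head and are forwarded hop by hop to the tail;
    reads are answered by the tail, which only holds values that every
    replica before it has already applied.
    """

    def __init__(self, names: List[str]) -> None:
        if not names:
            raise ValueError("a chain needs at least one replica")
        self.replicas = [Replica(n) for n in names]

    def _tail(self) -> Replica:
        if not self.replicas:
            raise RuntimeError("no live replicas left")
        return self.replicas[-1]

    def write(self, key: str, value: str) -> str:
        # Forward the update head -> ... -> tail. The tail sends the reply,
        # so a write is acknowledged only once every replica has applied it.
        for replica in self.replicas:
            replica.store[key] = value
        return self._tail().name

    def read(self, key: str) -> Optional[str]:
        # Serving reads only at the tail yields strongly consistent reads.
        return self._tail().store.get(key)

    def remove(self, name: str) -> None:
        # Simplified failover: splice the failed replica out of the chain.
        # In this synchronous sketch no update is ever in flight, so nothing
        # is lost; a real system must also replay updates the failed node
        # had not yet forwarded.
        self.replicas = [r for r in self.replicas if r.name != name]


if __name__ == "__main__":
    chain = Chain(["S0", "S1", "S2"])
    print(chain.write("lock:a", "client-7"))  # prints S2 (the tail replies)
    chain.remove("S2")                        # tail fails; S1 becomes the tail
    print(chain.read("lock:a"))               # prints client-7
    print(chain.write("lock:a", "free"))      # prints S1
```

Because the tail replies to a write only after every upstream replica has applied it, removing a failed node from this toy chain never exposes a stale value; the cost is that write latency grows with chain length, which is one reason running the chain on in-path switches is attractive.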

RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)

Low-latency online services have strict Service Level Objectives (SLOs) ...

Scaling Out Acid Applications with Operation Partitioning

OLTP applications with high workloads that cannot be served by a single ...

Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

Distributed storage employs replication to mask failures and improve ava...

Fault Tolerance for Service Function Chains

Traffic in enterprise networks typically traverses a sequence of middleb...

Designing knowledge plane to optimize leaf and spine data center

In the last few decades, data center architecture evolved from the tradi...

LB Scalability: Achieving the Right Balance Between Being Stateful and Stateless

A high performance Layer-4 load balancer (LB) is one of the most importa...

Providing Insights for Queries affected by Failures and Stragglers

Interactive time responses are a crucial requirement for users analyzing...