Cost-effective BlackWater Raft on Highly Unreliable Nodes at Scale Out
The Raft algorithm maintains strong consistency across data replicas in Cloud. This algorithm divides nodes into leaders and followers, to satisfy read/write requests spanning geo-diverse sites. With the increase of workload, Raft shall provide scale-out performance in proportion. However, traditional scale-out techniques encounter bottlenecks in Raft, and when the provisioned sites exhaust local resources, the performance loss will grow exponentially. To provide scalability in Raft, this paper proposes a cost-effective mechanism for elastic auto-scaling in Raft, called BlackWater-Raft or BW-Raft. BW-Raft extends the original Raft with the following abstractions: (1) secretary nodes that take over expensive log synchronization operations from the leader, relaxing the performance constraints on locks. (2) massive low cost observer nodes that handle reads only, improving throughput for typical data intensive services. These abstractions are stateless, allowing elastic scale-out on unreliable yet cheap spot instances. In theory, we demonstrate that BW-Raft can maintain Raft's strong consistency guarantees when scaling out, processing a 50X increase in the number of nodes compared to the original Raft. We have prototyped the BW-Raft on key-value services and evaluated it with many state-of-the-arts on Amazon EC2 and Alibaba Cloud. Our results show that within the same budget, BW-Raft's resource footprint increments are 5-7X smaller than Multi-Raft, and 2X better than original Raft. Using spot instances, BW-Raft can reduces costs by 84.5% compared to Multi-Raft. In the real world experiments, BW-Raft improves goodput of the 95th-percentile SLO by 9.4X, thus serving as an alternative for services scaling out with strong consistency.
READ FULL TEXT