Kollaps: Decentralized and Dynamic Topology Emulation

04/05/2020
by Paulo Gouveia, et al.

The performance and behavior of large-scale distributed applications are highly influenced by network properties such as latency, bandwidth, packet loss, and jitter. For instance, an engineer might need to answer questions such as: What is the impact of an increase in network latency on application response time? How does moving a cluster between geographical regions affect application throughput? How do network dynamics affect application stability? Answering these questions in a systematic and reproducible way is very hard, given the variability of, and lack of control over, the underlying network.

Unfortunately, state-of-the-art network emulators and testbeds scale poorly (e.g., MiniNet), focus exclusively on the control plane (e.g., CrystalNet), or ignore network dynamics (e.g., EmuLab). Kollaps is a fully distributed network emulator that addresses these limitations. Kollaps hinges on two key observations. First, from an application's perspective, what matters are the emergent end-to-end properties (e.g., latency, bandwidth, packet loss, and jitter) rather than the internal state of the routers and switches leading to those properties. This premise allows us to build a simpler, dynamically adaptable emulation model that avoids maintaining the full network state. Second, this simplified model can be maintained in a fully decentralized way, allowing the emulation to scale with the number of machines running the application.

Kollaps is fully decentralized, agnostic of the application language and transport protocol, scales to thousands of processes, and is accurate when compared against a bare-metal deployment or against state-of-the-art approaches that emulate the full state of the network. We showcase how Kollaps can accurately reproduce results from the literature and predict the behavior of a complex, unmodified distributed key-value store (Cassandra) under different deployments.
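The first observation, that applications only see emergent end-to-end properties, can be illustrated by collapsing a multi-hop path into a single set of path properties. The sketch below is a minimal illustration, not Kollaps's actual code, and it assumes a simplified model: each link is described only by latency, bandwidth, and an independent packet-loss probability (jitter is omitted). Under those assumptions, path latency is the sum of link latencies, path bandwidth is the bottleneck (minimum) link bandwidth, and path loss is one minus the probability that a packet survives every hop. The names `Link`, `PathProperties`, and `collapse_path` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """One hop in the emulated topology (hypothetical, simplified model)."""
    latency_ms: float
    bandwidth_mbps: float
    loss: float  # independent drop probability in [0, 1] (assumption)


@dataclass(frozen=True)
class PathProperties:
    """Emergent end-to-end properties seen by the application."""
    latency_ms: float
    bandwidth_mbps: float
    loss: float


def collapse_path(links: List[Link]) -> PathProperties:
    """Collapse a chain of links into end-to-end path properties.

    Assumes losses on different links are independent, so a packet is
    delivered only if it survives every hop.
    """
    if not links:
        raise ValueError("a path needs at least one link")
    # Latencies accumulate hop by hop.
    latency = sum(link.latency_ms for link in links)
    # Throughput is capped by the slowest (bottleneck) link.
    bandwidth = min(link.bandwidth_mbps for link in links)
    # Probability that a packet survives all hops.
    delivered = 1.0
    for link in links:
        delivered *= 1.0 - link.loss
    return PathProperties(latency, bandwidth, 1.0 - delivered)


if __name__ == "__main__":
    path = [Link(10, 1000, 0.01), Link(25, 100, 0.0), Link(5, 1000, 0.02)]
    print(collapse_path(path))
```

For the three-hop path in the usage example, the collapsed path has a latency of 40 ms, a bottleneck bandwidth of 100 Mbps, and a loss rate of about 0.0298. Because only these per-path values need to be enforced at the endpoints, no router or switch state has to be emulated, which is what makes a decentralized implementation plausible.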
