Gleam: An RDMA-accelerated Multicast Protocol for Datacenter Networks

07/26/2023
by   Wenxue Li, et al.

RDMA has been widely adopted in high-speed datacenter networks. However, native RDMA supports only one-to-one reliable connections (RC), which is a poor match for applications with group communication patterns (e.g., one-to-many). Existing multicast enhancements address this gap, but none of them simultaneously achieves optimal multicast forwarding and fully exploits RDMA's distinctive capabilities. In this paper, we present Gleam, an RDMA-accelerated multicast protocol that simultaneously supports optimal multicast forwarding, efficient use of RDMA's capabilities, and compatibility with commodity RNICs. At its core, Gleam repurposes the existing RDMA RC logic, with careful switch coordination, as an efficient multicast transport. Using an extended multicast forwarding table structure, Gleam performs one-to-many connection maintenance and many-to-one feedback aggregation to integrate standard RC logic with in-fabric multicast. We implement a fully functional Gleam prototype. Through extensive testbed experiments and simulations, we demonstrate that Gleam significantly accelerates multicast communication in realistic applications. For instance, Gleam achieves 2.9× lower communication time for an HPC benchmark application and 2.7× higher data replication throughput.
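
The abstract does not say how many-to-one feedback aggregation works internally, so the following is only an illustrative sketch of the general idea, not Gleam's actual design. It assumes that each receiver returns cumulative ACKs carrying a packet sequence number (PSN), that a switch keeps per-group state, and that the switch forwards to the sender only the smallest PSN acknowledged by every receiver. The class name `AckAggregator`, its fields, and the receiver labels are hypothetical, and PSN wraparound is ignored for brevity.

```python
from typing import Dict, Iterable, Optional


class AckAggregator:
    """Hypothetical per-group switch state for many-to-one ACK aggregation."""

    def __init__(self, receivers: Iterable[str]) -> None:
        # Highest cumulative PSN acknowledged by each receiver (-1 = nothing yet).
        self.acked: Dict[str, int] = {r: -1 for r in receivers}
        # Last aggregated PSN forwarded toward the sender.
        self.forwarded: int = -1

    def on_ack(self, receiver: str, psn: int) -> Optional[int]:
        """Record an ACK; return the new aggregated PSN if it advanced, else None."""
        if receiver not in self.acked:
            raise KeyError(f"unknown receiver: {receiver}")
        # Cumulative ACKs never move backwards, so stale ACKs are ignored.
        self.acked[receiver] = max(self.acked[receiver], psn)
        # The sender may only treat a packet as delivered once all receivers have it.
        aggregate = min(self.acked.values())
        if aggregate > self.forwarded:
            self.forwarded = aggregate
            return aggregate
        return None  # Suppress the ACK: the sender would learn nothing new.
```

Under these assumptions, the sender sees a single ACK stream that looks like one ordinary reliable connection, which is the property that would let unmodified RC logic drive a one-to-many transfer.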

research
09/12/2023

MCQUIC: Multicast and unicast in a single transport protocol

Multicast enables efficient one-to-many communications. Several applicat...
research
04/02/2021

Efficient Replication via Timestamp Stability (Extended Version)

Modern web applications replicate their data across the globe and requir...
research
04/13/2023

Accelerating WRF I/O Performance with ADIOS2 and Network-based Streaming

With the approach of Exascale computing power for large-scale High Perfo...
research
02/27/2018

Elmo: Source-Routed Multicast for Cloud Services

Modern data-center applications frequently exhibit one-to-many communica...
research
09/21/2020

NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration

We present NetReduce, a novel RDMA-compatible in-network reduction archi...
research
06/29/2020

The Interblockchain Communication Protocol: An Overview

The interblockchain communication protocol (IBC) is an end-to-end, conne...
research
12/19/2017

Enabling Work-conserving Bandwidth Guarantees for Multi-tenant Datacenters via Dynamic Tenant-Queue Binding

Today's cloud networks are shared among many tenants. Bandwidth guarante...