GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters

08/09/2023
by Guillermo Bernárdez, et al.

Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two CC protocols: DCTCP and DCQCN. Both protocols, and their main variants, are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus crucial to the performance of CC protocols. Nowadays, network experts set static ECN parameters that are carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely alter the network state (e.g., dynamic traffic workloads, incast events, failures), and a static configuration cannot follow them. This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution relies on a novel combination of Multi-agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN), and it is compatible with widely deployed ECN-based CC protocols. GraphCC deploys distributed agents on switches that communicate with their neighbors to cooperate and optimize the global ECN configuration. In our evaluation, we test the performance of GraphCC under a wide variety of scenarios, focusing on the capability of this solution to adapt to new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades). We compare GraphCC with ACC, a state-of-the-art MARL-based solution for ECN tuning, and observe that our proposed solution outperforms this baseline in all of the evaluation scenarios, showing improvements of up to 20% in Flow Completion Time as well as significant reductions in buffer occupancy (38.0-85.7%).
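To make the role of the ECN configuration concrete, the sketch below shows a RED-style marking rule of the kind commonly used with ECN-based protocols such as DCQCN. The abstract does not spell out the marking function or its parameters, so the function name and the three parameters (`k_min_kb`, `k_max_kb`, `p_max`) are assumptions for illustration, not GraphCC's actual interface.

```python
def ecn_mark_probability(queue_len_kb: float,
                         k_min_kb: float,
                         k_max_kb: float,
                         p_max: float) -> float:
    """Probability that a switch ECN-marks an arriving packet.

    Hypothetical RED-style rule (an assumption, not taken from the paper):
    - at or below k_min_kb the packet is never marked,
    - above k_max_kb the packet is always marked,
    - in between, the probability grows linearly from 0 to p_max.
    """
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must be in [0, 1]")
    if k_max_kb <= k_min_kb:
        raise ValueError("k_max_kb must be greater than k_min_kb")

    if queue_len_kb <= k_min_kb:
        return 0.0
    if queue_len_kb > k_max_kb:
        return 1.0
    # Linear ramp between the two thresholds.
    return p_max * (queue_len_kb - k_min_kb) / (k_max_kb - k_min_kb)
```

Under this assumed rule, a static choice of thresholds fixes the trade-off between queue build-up and throughput once and for all, which is the limitation that motivates tuning the ECN configuration dynamically per switch.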


research
03/31/2023

MAGNNETO: A Graph Neural Network-based Multi-Agent system for Traffic Engineering

Current trends in networking propose the use of Machine Learning (ML) fo...
research
01/29/2023

A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers

Various congestion control protocols have been designed to achieve high ...
research
06/04/2022

MACC: Cross-Layer Multi-Agent Congestion Control with Deep Reinforcement Learning

Congestion Control (CC), as the core networking task to efficiently util...
research
04/30/2023

SFC: Near-Source Congestion Signaling and Flow Control

State-of-the-art congestion control algorithms for data centers alone do...
research
12/29/2022

Neighbor Auto-Grouping Graph Neural Networks for Handover Parameter Configuration in Cellular Network

The mobile communication enabled by cellular networks is one of the ...
research
09/22/2019

Backpressure Flow Control

Effective congestion control in a multi-tenant data center is becoming i...
research
07/09/2020

Automatic Detection of Major Freeway Congestion Events Using Wireless Traffic Sensor Data: A Machine Learning Approach

Monitoring the dynamics of traffic in major corridors can provide invalu...
