A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers

01/29/2023
by Shiva Ketabi, et al.

Various congestion control protocols have been designed to achieve high performance in different network environments. Modern online learning solutions that delegate congestion control actions to a machine cannot converge properly within the stringent time scales of data centers. We leverage multi-agent reinforcement learning to design a system that dynamically tunes congestion control parameters at end-hosts in a data center. The system includes agents at the end-hosts that monitor and report the network and traffic states, and agents that run the reinforcement learning algorithm on the reported states. Based on the state of the environment, the system generates congestion control parameters that optimize network performance metrics such as throughput and latency. As a case study, we examine BBR, a prominent, recently developed congestion control protocol. Our experiments demonstrate that the proposed system has the potential to mitigate the problems of static parameters.
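
The split between monitoring agents and learning agents described above can be illustrated with a small sketch. This is not the authors' implementation: the paper uses deep multi-agent reinforcement learning, whereas the sketch below substitutes a much simpler epsilon-greedy tabular learner to keep the example short. The candidate parameter values, the state thresholds, the reward shape, and all names (`discretize_state`, `reward`, `TuningAgent`) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical candidate values for one tunable congestion control parameter.
CANDIDATE_VALUES = [1.0, 1.25, 1.5]


def discretize_state(throughput_mbps, rtt_ms):
    """Monitoring-agent side: map raw measurements to a coarse state bucket.

    The thresholds are arbitrary placeholders, not values from the paper.
    """
    load = "high" if throughput_mbps > 500 else "low"
    delay = "high" if rtt_ms > 1.0 else "low"
    return (load, delay)


def reward(throughput_mbps, rtt_ms, latency_weight=100.0):
    """Illustrative reward: favor throughput, penalize latency."""
    return throughput_mbps - latency_weight * rtt_ms


class TuningAgent:
    """Learning-agent side: epsilon-greedy tabular value estimates."""

    def __init__(self, epsilon=0.1, learning_rate=0.5, seed=0):
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, value) -> estimated reward

    def choose(self, state):
        # Explore with probability epsilon, otherwise pick the best-known value.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(CANDIDATE_VALUES)
        return max(CANDIDATE_VALUES, key=lambda v: self.q[(state, v)])

    def update(self, state, value, observed_reward):
        # Move the estimate a fraction of the way toward the observed reward.
        key = (state, value)
        self.q[key] += self.learning_rate * (observed_reward - self.q[key])
```

In a loop, a monitoring agent would call `discretize_state` on fresh measurements, the learning agent would `choose` a parameter value for that state, and after the value had been applied for some interval the resulting `reward` would be fed back through `update`.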

research
06/04/2022

MACC: Cross-Layer Multi-Agent Congestion Control with Deep Reinforcement Learning

Congestion Control (CC), as the core networking task to efficiently util...
research
08/30/2022

ZEUS: An Experimental Toolkit for Evaluating Congestion Control Algorithms in 5G Environments

As global cellular networks converge to 5G, one question lingers: Are we...
research
08/09/2023

GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters

Congestion Control (CC) plays a fundamental role in optimizing traffic i...
research
01/11/2022

Congestion Control Mechanisms for Inter-Datacenter Networks

Applications running in geographically distributed settings are becoming ...
research
06/27/2023

Learning to Sail Dynamic Networks: The MARLIN Reinforcement Learning Framework for Congestion Control in Tactical Environments

Conventional Congestion Control (CC) algorithms, such as TCP Cubic, strug...
research
07/05/2022

Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs

Cloud datacenters are exponentially growing both in numbers and size. Th...
research
02/09/2023

RayNet: A Simulation Platform for Developing Reinforcement Learning-Driven Network Protocols

Reinforcement Learning has gained significant momentum in the developmen...
