Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

11/21/2022
by Junjie Sheng, et al.

Oversubscription is a common practice for improving cloud resource utilization. It allows the cloud service provider to sell more resources than the physical limit, assuming that not all users will fully utilize their resources simultaneously. However, how to design an oversubscription policy that improves utilization while satisfying safety constraints remains an open problem. Existing methods and industrial practices are over-conservative, ignoring both the coordination of diverse resource usage patterns and probabilistic constraints. To address these two limitations, this paper formulates cloud oversubscription as a chance-constrained optimization problem and proposes an effective Chance-Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve it. Specifically, C2MARL reduces the number of constraints by considering their upper bounds and leverages a multi-agent reinforcement learning paradigm to learn a safe and optimal coordination policy. We evaluate C2MARL on an internal cloud platform and public cloud datasets. Experiments show that C2MARL outperforms existing methods in improving utilization (by 20% to 86%) under different levels of safety constraints.
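
To make the notion of a probabilistic safety constraint concrete, the following is a minimal sketch of an empirical chance-constraint check. It is not the C2MARL method. It only illustrates the generic form "the probability that aggregate usage exceeds physical capacity is at most delta". The function names, the usage trace, the 16-core capacity, and the tolerance values are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def violation_rate(total_usage: Sequence[float], capacity: float) -> float:
    """Fraction of time steps at which aggregate usage exceeds capacity."""
    if not total_usage:
        raise ValueError("total_usage must be non-empty")
    violations = sum(1 for u in total_usage if u > capacity)
    return violations / len(total_usage)


def satisfies_chance_constraint(
    total_usage: Sequence[float], capacity: float, delta: float
) -> bool:
    """Empirical check of P(usage > capacity) <= delta on an observed trace."""
    return violation_rate(total_usage, capacity) <= delta


# Hypothetical aggregate usage trace (in cores) on one 16-core host.
trace = [10.0, 12.0, 15.0, 17.0, 11.0, 9.0, 14.0, 13.0, 16.0, 12.0]
rate = violation_rate(trace, capacity=16.0)

print(rate)                                            # share of overloaded steps
print(satisfies_chance_constraint(trace, 16.0, 0.10))  # looser tolerance
print(satisfies_chance_constraint(trace, 16.0, 0.05))  # stricter tolerance
```

Under this reading, a hard constraint corresponds to delta = 0, whereas a chance constraint tolerates a small, bounded share of overloaded time steps in exchange for selling more resources.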


research
10/05/2022

Spatial-Temporal-Aware Safe Multi-Agent Reinforcement Learning of Connected Autonomous Vehicles in Challenging Scenarios

Communication technologies enable coordination among connected and auton...
research
10/25/2019

MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding

Reinforcement learning is a promising approach to learning control polic...
research
10/26/2019

Convergent Policy Optimization for Safe Reinforcement Learning

We study the safe reinforcement learning problem with nonlinear function...
research
09/23/2020

ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources

Cloud data center capacities are over-provisioned to handle demand peaks...
research
11/10/2021

DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning

In recent years, multi-agent reinforcement learning (MARL) has presented...
research
05/13/2023

Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems

Asynchronous action coordination presents a pervasive challenge in Multi...
research
09/17/2022

A Robust and Constrained Multi-Agent Reinforcement Learning Framework for Electric Vehicle AMoD Systems

Electric vehicles (EVs) play critical roles in autonomous mobility-on-de...
