Nara: Learning Network-Aware Resource Allocation Algorithms for Cloud Data Centres

06/04/2021
by   Zacharaya Shabka, et al.

Data centres (DCs) underpin many prominent future technological trends, such as distributed training of large-scale machine learning models and internet-of-things based platforms. DCs will soon account for over 3% of global energy demand, so efficient use of DC resources is essential. Robust DC networks (DCNs) are essential to form the large-scale systems needed to handle this demand, but they can bottleneck how efficiently DC-server resources are used when servers with insufficient connectivity between them cannot be jointly allocated to a job. However, allocating servers' resources whilst accounting for their inter-connectivity maps to an NP-hard combinatorial optimisation problem, and so is often ignored in DC resource management schemes. We present Nara, a framework based on reinforcement learning (RL) and graph neural networks (GNNs) that learns network-aware allocation policies which increase the number of requests allocated over time compared to previous methods. Unique to our solution is the use of a GNN to generate representations of server-nodes in the DCN, which are then interpreted as actions by an RL policy-network that chooses the servers from which resources will be allocated to incoming requests. Nara is agnostic to the topology size and shape and is trained end-to-end. The method can accept up to 33% more requests than the best baseline when deployed on DCNs with up to the order of 10× more compute nodes than the DCN seen during training, and it maintains its policy's performance on DCNs with the order of 100× more servers than seen during training. It also generalises to unseen DCN topologies with varied network structure and to unseen request distributions without re-training.
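The idea of turning per-server graph representations into an allocation choice can be illustrated with a toy sketch. This is not the authors' implementation: the message-passing rule, the fixed mixing weight, the greedy choice, and all names (`propagate`, `masked_softmax`, `choose_server`, the example topology) are assumptions made for illustration, and nothing here is trained. It only shows the general pattern of (1) propagating node features over the DCN graph, (2) scoring server nodes, and (3) masking out servers that cannot satisfy the request.

```python
import math

def propagate(features, adjacency, rounds=2, mix=0.5):
    """Hypothetical message passing: in each round every node adds
    `mix` times the mean of its neighbours' current values to its own."""
    h = dict(features)
    for _ in range(rounds):
        new_h = {}
        for node, own in h.items():
            nbrs = adjacency.get(node, [])
            mean_nb = sum(h[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
            new_h[node] = own + mix * mean_nb
        h = new_h
    return h

def masked_softmax(scores, mask):
    """Softmax restricted to entries whose mask is True; others get 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        return [0.0] * len(scores)
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def choose_server(features, adjacency, servers, request):
    """Score servers by their propagated embedding, mask servers with too
    few free resources, and greedily pick the most probable one."""
    emb = propagate(features, adjacency)
    scores = [emb[s] for s in servers]
    mask = [features[s] >= request for s in servers]
    probs = masked_softmax(scores, mask)
    if not any(mask):
        return None, probs  # request cannot be placed on any server
    best = max(range(len(servers)), key=lambda i: probs[i])
    return servers[best], probs

# Assumed toy topology: two racks (tor0, tor1) under one core switch.
adjacency = {
    "core": ["tor0", "tor1"],
    "tor0": ["s0", "s1", "core"],
    "tor1": ["s2", "s3", "core"],
    "s0": ["tor0"], "s1": ["tor0"], "s2": ["tor1"], "s3": ["tor1"],
}
# Node feature = free resource units (switches hold none).
features = {"core": 0.0, "tor0": 0.0, "tor1": 0.0,
            "s0": 4.0, "s1": 1.0, "s2": 2.0, "s3": 2.0}
servers = ["s0", "s1", "s2", "s3"]

chosen, probs = choose_server(features, adjacency, servers, request=2.0)
print(chosen, [round(p, 3) for p in probs])
```

Because the scoring operates on per-node embeddings rather than on a fixed-size input vector, the same function applies unchanged to a graph with more servers, which is the intuition behind a policy that is agnostic to topology size. A learned policy would replace the fixed propagation rule and greedy choice with trained parameters and sampling.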


