Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

06/01/2022
by   Tanmoy Sen, et al.

Edge devices with local computation capability have made distributed deep learning (DL) training on edges possible. In one such method, the cluster head of a cluster of edges schedules DL training jobs from the edges. With this centralized scheduling, the cluster head knows the loads of all edges and can therefore avoid overloading them, but the head itself may become overloaded. To handle this problem, we propose a multi-agent reinforcement learning (MARL) system that enables each edge to schedule its own jobs using RL. However, without coordination among edges, action collisions may occur, in which multiple edges schedule tasks to the same edge and overload it. For this reason, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, a shield deployed in an edge checks for action collisions and provides alternative actions to avoid them. As a central shield for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions of the cluster and coordinate to avoid action collisions on the region boundaries. Our emulation and real device experiments show SROLE reduces training time by 59
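
The collision check described in the abstract can be illustrated with a small sketch. This is not the SROLE algorithm from the paper. It is a hypothetical, simplified shield that assumes each agent proposes a target edge and a job load, that loads and capacities are integers in arbitrary units, and that the alternative action is "the edge with the most spare capacity". The name `shield` and the tuple layout are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Proposal = Tuple[str, str, int]          # (agent_id, target_edge, job_load), hypothetical layout
Decision = Tuple[str, Optional[str]]     # (agent_id, final_edge or None if nothing fits)


def shield(proposals: List[Proposal],
           load: Dict[str, int],
           capacity: Dict[str, int]) -> List[Decision]:
    """Accept proposed actions in order; redirect any that would overload the target."""
    projected = dict(load)  # work on a copy so the caller's load table is untouched
    decisions: List[Decision] = []
    for agent, target, job in proposals:
        if projected[target] + job <= capacity[target]:
            chosen: Optional[str] = target  # no collision: keep the agent's own action
        else:
            # Collision: look for edges that can still fit the job (assumed policy).
            fits = [e for e in projected if projected[e] + job <= capacity[e]]
            # Pick the edge with the most spare capacity, or None if no edge fits.
            chosen = max(fits, key=lambda e: capacity[e] - projected[e]) if fits else None
        if chosen is not None:
            projected[chosen] += job
        decisions.append((agent, chosen))
    return decisions


if __name__ == "__main__":
    load = {"e1": 20, "e2": 50, "e3": 10}
    capacity = {"e1": 100, "e2": 100, "e3": 100}
    # Three agents all pick e1; the second one would push e1 past its capacity.
    proposals = [("a", "e1", 50), ("b", "e1", 50), ("c", "e1", 30)]
    print(shield(proposals, load, capacity))
    # prints [('a', 'e1'), ('b', 'e3'), ('c', 'e1')]
```

In this sketch, agent "b" is redirected to "e3" because "e1" would exceed its capacity, while agent "c" still fits on "e1". A decentralized variant, as the abstract outlines, would run one such check per region and coordinate only on actions that cross region boundaries; that coordination is not shown here.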


research
12/11/2021

Efficient Device Scheduling with Multi-Job Federated Learning

Recent years have witnessed a large amount of decentralized data in mult...
research
10/22/2019

Deep Learning at the Edge

The ever-increasing number of Internet of Things (IoT) devices has creat...
research
07/22/2023

Online Container Scheduling for Low-Latency IoT Services in Edge Cluster Upgrade: A Reinforcement Learning Approach

In Mobile Edge Computing (MEC), Internet of Things (IoT) devices offload...
research
11/24/2022

Multi-Job Intelligent Scheduling with Cross-Device Federated Learning

Recent years have witnessed a large amount of decentralized data in vari...
research
04/12/2019

Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing

In this paper, we propose a Distributed Intelligent Video Surveillance (...
research
03/16/2021

Distributed Deep Learning Using Volunteer Computing-Like Paradigm

Use of Deep Learning (DL) in commercial applications such as image class...
research
03/13/2022

Cluster Assignment in Multi-Agent Systems

We study cluster assignment in multi-agent networks. We consider homogen...
