GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

06/16/2022
by Yuke Wang, et al.

With the growing use of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention across many fields. However, DRL computation on modern, powerful GPU platforms remains inefficient because of its heterogeneous workloads and interleaved execution paradigm. To address this, we propose GMI-DRL, a systematic design that accelerates multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, an adaptive GMI management strategy to achieve high GPU utilization and high computation throughput simultaneously, and highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns. Comprehensive experiments show that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL support (by up to 2.81X) and with Horovod support (by up to 2.34X) in training throughput on the latest DGX-A100 platform. Our work provides an initial user experience with GPU spatial multiplexing for heterogeneous workloads that mix computation and communication.


research
06/24/2019

Modern Deep Reinforcement Learning Algorithms

Recent advances in Reinforcement Learning, grounded on combining classic...
research
09/13/2018

Deep Reinforcement Learning for Event-Triggered Control

Event-triggered control (ETC) methods can achieve high-performance contr...
research
01/17/2018

Experience-driven Networking: A Deep Reinforcement Learning based Approach

Modern communication networks have become very complicated and highly dy...
research
05/30/2022

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Training deep reinforcement learning (DRL) models usually requires high ...
research
04/02/2019

Learning a Partitioning Advisor with Deep Reinforcement Learning

Commercial data analytics products such as Microsoft Azure SQL Data Ware...
research
04/09/2022

MR-iNet Gym: Framework for Edge Deployment of Deep Reinforcement Learning on Embedded Software Defined Radio

Dynamic resource allocation plays a critical role in the next generation...
research
12/13/2017

Multi-focus Attention Network for Efficient Deep Reinforcement Learning

Deep reinforcement learning (DRL) has shown incredible performance in le...
