The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

12/08/2020
by Ahmet Inci, et al.

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems. However, improving the performance scalability and power efficiency of RL training by understanding the architectural implications of CPU-GPU systems remains an open problem. In this work, we investigate and improve the performance and power efficiency of distributed RL training on CPU-GPU systems, approaching the problem not solely from the GPU microarchitecture perspective but through a holistic, system-level analysis. We quantify overall hardware utilization on a state-of-the-art distributed RL training framework and empirically identify the bottlenecks caused by GPU microarchitectural, algorithmic, and system-level design choices. We show that the GPU microarchitecture itself is well-balanced for state-of-the-art RL frameworks, but further investigation reveals that the number of actors running the environment interactions, and the hardware resources available to them, are the primary limiters of performance and power efficiency. To this end, we introduce a new system design metric, the CPU/GPU ratio, and show how to find the optimal balance between CPU and GPU resources when designing scalable and efficient CPU-GPU systems for RL training.
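The intuition behind a CPU/GPU ratio metric can be sketched as a simple throughput-balancing calculation: size the CPU pool so that actors produce experience at the same rate the GPU learner consumes it. The function name and all numbers below are hypothetical illustrations, not measurements or an API from the paper.

```python
# Hypothetical sketch of balancing actor (CPU) experience production
# against learner (GPU) experience consumption in distributed RL.
# All throughput figures are made-up placeholders.

def cpu_gpu_ratio(actor_steps_per_cpu: float,
                  learner_steps_per_gpu: float) -> float:
    """CPU cores needed per GPU so that environment-interaction
    actors generate experience as fast as the learner consumes it.
    A larger ratio means the workload is more actor-bound."""
    return learner_steps_per_gpu / actor_steps_per_cpu


# Example: assume each CPU core's actor emits 500 env steps/s and a
# single GPU learner can consume 12,000 steps/s of experience.
ratio = cpu_gpu_ratio(actor_steps_per_cpu=500.0,
                      learner_steps_per_gpu=12_000.0)
print(ratio)  # -> 24.0 CPU cores per GPU keep the learner fed
```

With fewer cores than this ratio, the GPU learner starves waiting on experience; with more, CPU cycles (and power) are wasted, which is the balance the abstract's metric is meant to capture.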


