
High-Throughput Synchronous Deep RL

by Iou-Jen Liu, et al.

Deep reinforcement learning (RL) is computationally demanding and requires processing many data points. Synchronous methods enjoy training stability but have lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to "stale policies." To combine the advantages of both, we propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we perform learning and rollouts concurrently, devise a system design that avoids "stale policies," and ensure that actors interact with environment replicas in an asynchronous manner while maintaining full determinism. We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6× faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards.
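To make the idea concrete, here is a minimal toy sketch of the scheme the abstract describes: rollouts and learning overlap via double buffering, each actor snapshots the policy at the start of an iteration (so no batch mixes policy versions), and each actor writes to its own buffer slot (so the result is deterministic regardless of thread scheduling). All names here (`Policy`, `rollout`, `train`) are illustrative stand-ins, not the authors' implementation.

```python
import threading

class Policy:
    """Toy policy; `version` stands in for the network parameters."""
    def __init__(self):
        self.version = 0

    def update(self, batch):
        self.version += 1  # stand-in for one synchronous gradient step


def rollout(policy_version, actor_id, steps, out):
    # Each actor steps its own environment replica and writes to its own
    # buffer slot, so execution is deterministic under any scheduling.
    # The policy version is snapshotted at iteration start, so a
    # concurrent learner update never yields a mixed "stale" batch.
    out[actor_id] = [(policy_version, actor_id * steps + t)
                     for t in range(steps)]


def train(num_iters=3, num_actors=2, steps=4):
    policy = Policy()
    buffers = [[None] * num_actors, [None] * num_actors]  # double buffer
    consumed = []  # which policy version generated each learned-on batch

    # Fill buffer 0 once before the first learner step.
    for a in range(num_actors):
        rollout(policy.version, a, steps, buffers[0])

    for it in range(num_iters):
        cur, nxt = buffers[it % 2], buffers[(it + 1) % 2]
        # Actors fill the next buffer concurrently with learning...
        threads = [threading.Thread(target=rollout,
                                    args=(policy.version, a, steps, nxt))
                   for a in range(num_actors)]
        for t in threads:
            t.start()
        # ...while the learner consumes the buffer filled last iteration.
        policy.update(cur)
        consumed.append([v for traj in cur for (v, _) in traj])
        for t in threads:
            t.join()  # barrier: next iteration sees a complete buffer
    return consumed
```

Running `train()` shows that every batch the learner consumes was generated by a single policy version, at most one update behind the current one, which is the bounded-delay property that separates this design from fully asynchronous schemes with unbounded staleness.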



IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

The practical usage of reinforcement learning agents is often bottleneck...

SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning

We introduce SLM Lab, a software framework for reproducible reinforcemen...

XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training

We propose XPipe, an efficient asynchronous pipeline model parallelism a...

Make TCP Great (again?!) in Cellular Networks: A Deep Reinforcement Learning Approach

Can we instead of designing just another new TCP, design a TCP plug-in w...

An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Deep reinforcement learning (DRL) algorithms and evolution strategies (E...

NAPPO: Modular and scalable reinforcement learning in pytorch

Reinforcement learning (RL) has been very successful in recent years but...