
Distributed Distributional Deterministic Policy Gradients

by Gabriel Barth-Maron et al.

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of N-step returns and prioritized experience replay. Experimentally, we examine the contribution of each of these individual components and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
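One of the simple improvements the abstract mentions, N-step returns, can be sketched briefly. The snippet below is a hypothetical illustration (not the authors' implementation): it accumulates discounted rewards over N transitions and bootstraps from a critic's value estimate for the final state, with the discount `gamma` and the bootstrap value chosen as placeholder assumptions.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Compute the N-step return:
        sum_{i=0}^{N-1} gamma^i * r_i  +  gamma^N * bootstrap_value
    where N = len(rewards) and bootstrap_value stands in for the
    critic's estimate at the state reached after N steps.
    """
    g = bootstrap_value
    # Fold backwards so each reward is discounted the right number of times.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: two rewards of 1.0 with gamma = 0.5 and a zero bootstrap
# give 1.0 + 0.5 * 1.0 = 1.5.
print(n_step_return([1.0, 1.0], 0.0, gamma=0.5))
```

In D4PG this target is used for the distributional critic update rather than the single-step TD target; larger N trades lower bias for higher variance in the target.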


Related research

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement le...

Exploration by Distributional Reinforcement Learning

We propose a framework based on distributional reinforcement learning an...

Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

Learning in high dimensional continuous tasks is challenging, mainly whe...

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay

The experience replay mechanism allows agents to use the experiences mul...

Conjugated Discrete Distributions for Distributional Reinforcement Learning

In this work we continue to build upon recent advances in reinforcement ...

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Actor-critic algorithms that make use of distributional policy evaluatio...