The problem with DDPG: understanding failures in deterministic environments with sparse rewards

11/26/2019
by Guillaume Matheron, et al.

In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial. The reason for such failures is still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of deterministic environments with sparse rewards. First, using a very elementary control problem, we illustrate that the learning process can get stuck in a fixed point corresponding to a poor solution. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms. The resulting perspective casts new light on existing solutions to the issues we have highlighted and suggests other potential approaches.

research
06/12/2020

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Exploration in multi-agent reinforcement learning is a challenging probl...
research
06/16/2021

Unbiased Methods for Multi-Goal Reinforcement Learning

In multi-goal reinforcement learning (RL) settings, the reward for each ...
research
10/04/2020

FORK: A Forward-Looking Actor For Model-Free Reinforcement Learning

In this paper, we propose a new type of Actor, named forward-looking Act...
research
01/18/2020

Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

Actor critic methods with sparse rewards in model-based deep reinforceme...
research
09/19/2018

Deterministic limit of temporal difference reinforcement learning for stochastic games

Reinforcement learning in multi-agent systems has been studied in the fi...
research
06/10/2019

Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

In the context of learning deterministic policies in continuous domains,...
research
09/15/2019

Biased Estimates of Advantages over Path Ensembles

The estimation of advantage is crucial for a number of reinforcement lea...
