Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

06/10/2019
by Matthieu Zimmer, et al.

In the context of learning deterministic policies in continuous domains, we revisit an approach first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach relies on a policy update that differs from that of the deterministic policy gradient (DPG). Previous work has observed its excellent empirical performance, but a theoretical justification has been lacking. To fill this gap, we provide a theoretical explanation for this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss the properties of these updates in depth to gain a better understanding of the overall approach. In addition, we extend the approach and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we demonstrate experimentally on several classic control problems that it surpasses state-of-the-art algorithms for learning deterministic policies.
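The abstract does not spell out the update rule itself. As a rough illustration only, the sketch below assumes the classic online CACLA rule: when the advantage (or TD-error) estimate for an executed action is positive, the deterministic action for that state is pulled toward the executed action; otherwise the sample is ignored. Only the sign of the estimate matters, not its magnitude. The linear scalar-action policy and the names `policy` and `sign_based_update` are hypothetical choices for this example. This is not the authors' NFAC or PeNFAC implementation, which fits neural networks in batch and adds a trust region penalty.

```python
from typing import List, Tuple

# One sample: (state features, executed action, advantage or TD-error estimate).
Sample = Tuple[List[float], float, float]


def policy(w: List[float], s: List[float]) -> float:
    """Hypothetical deterministic linear policy: mu(s) = w . s (scalar action)."""
    return sum(wi * si for wi, si in zip(w, s))


def sign_based_update(w: List[float], batch: List[Sample], lr: float = 0.1) -> List[float]:
    """CACLA-style actor update (sketch, processed one sample at a time).

    For each sample with a positive advantage estimate, take one gradient
    descent step on 0.5 * (a - mu(s))**2, which moves mu(s) toward the
    executed action a. Samples with a non-positive estimate leave w unchanged.
    """
    new_w = list(w)
    for s, a, adv in batch:
        if adv > 0:  # only the sign of the estimate is used
            err = a - policy(new_w, s)
            # Gradient of 0.5 * err**2 w.r.t. w is -err * s, so descend with +lr * err * s.
            new_w = [wi + lr * err * si for wi, si in zip(new_w, s)]
    return new_w
```

For example, with `w = [0.0, 0.0]` and `lr = 0.5`, a sample `([1.0, 0.0], 1.0, 0.5)` moves `w` to `[0.5, 0.0]`, while a sample with a negative estimate is skipped. Replacing the positive estimate `0.5` by `100.0` gives the same result, which is the sense in which the update "exploits the sign" rather than the magnitude of the advantage.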
