Deep Reinforcement Learning with Relative Entropy Stochastic Search

05/22/2017
by Voot Tangkaratt, et al.

Many reinforcement learning methods for continuous control tasks update a policy function by maximizing an approximated action-value function, or Q-function. However, the Q-function itself depends on the policy, and this dependency often leads to unstable policy learning. To overcome this issue, we propose a method that does not greedily exploit the Q-function. Instead, while maximizing the Q-function, we upper-bound the Kullback-Leibler divergence between the new policy and the current policy. We also lower-bound the entropy of the new policy to maintain its exploratory behavior. We show that, with a Gaussian policy and a Q-function that is quadratic in actions, the corresponding constrained optimization problem can be solved in closed form. In addition, we show that our method can be regarded as a variant of the well-known deterministic policy gradient method. In experiments with a neural network as the function approximator, the proposed method gives more stable learning performance than the deep deterministic policy gradient method and the continuous Q-learning method.
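The abstract does not state the closed-form solution, so the following is only a minimal sketch of what such an update can look like, derived from the standard Lagrangian of a KL- and entropy-constrained problem. It assumes, for a single fixed state, a current Gaussian policy N(mu, Sigma) and a Q-function written as Q(a) = -0.5 a^T A a + a^T b + c with A symmetric positive definite. The names A, b, eta (multiplier of the KL bound) and omega (multiplier of the entropy bound) are illustrative and do not come from the source. In practice the multipliers would normally be obtained by solving a dual problem; here they are simply fixed by hand.

```python
import numpy as np

def closed_form_update(mu, Sigma, A, b, eta, omega):
    """Gaussian maximizer of E[Q] under a KL upper bound and an entropy lower bound.

    Assumed setup (not taken from the source): Q(a) = -0.5 a^T A a + a^T b + c,
    current policy N(mu, Sigma), multipliers eta > 0 and omega >= 0 given.
    The stationary point of the Lagrangian satisfies
        new_pi(a)  proportional to  old_pi(a)^(eta/(eta+omega)) * exp(Q(a)/(eta+omega)),
    which is again Gaussian when Q is quadratic in a.
    """
    Sigma_inv = np.linalg.inv(Sigma)
    F = eta * Sigma_inv + A                  # (eta + omega) times the new precision
    F_inv = np.linalg.inv(F)
    new_mu = F_inv @ (eta * Sigma_inv @ mu + b)
    new_Sigma = (eta + omega) * F_inv
    return new_mu, new_Sigma

def gaussian_kl(mu1, S1, mu0, S0):
    """KL( N(mu1, S1) || N(mu0, S0) )."""
    k = mu1.shape[0]
    S0_inv = np.linalg.inv(S0)
    diff = mu0 - mu1
    log_det_ratio = np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S1)[1]
    return 0.5 * (np.trace(S0_inv @ S1) + diff @ S0_inv @ diff - k + log_det_ratio)

def gaussian_entropy(S):
    """Differential entropy of a Gaussian with covariance S."""
    k = S.shape[0]
    return 0.5 * (k * (1.0 + np.log(2.0 * np.pi)) + np.linalg.slogdet(S)[1])

# Toy 2-D example: the greedy maximizer of Q is A^{-1} b = [1, -1].
mu = np.zeros(2)
Sigma = np.eye(2)
A = 2.0 * np.eye(2)
b = np.array([2.0, -2.0])

new_mu, new_Sigma = closed_form_update(mu, Sigma, A, b, eta=2.0, omega=1.0)
print("new mean:", new_mu)
print("KL(new || old):", gaussian_kl(new_mu, new_Sigma, mu, Sigma))
print("entropy of new policy:", gaussian_entropy(new_Sigma))
```

In this sketch, a larger eta keeps the new policy closer to the current one, while a larger omega inflates the new covariance and so preserves entropy. As eta and omega both approach zero, the mean moves to the greedy maximizer of Q and the covariance collapses, which is the greedy behavior the constraints are meant to avoid.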

research
09/07/2019

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Maximum entropy deep reinforcement learning (RL) methods have been demon...
research
05/18/2020

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Entropy augmented to reward is known to soften the greedy argmax policy ...
research
03/11/2021

Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with

This paper focuses on a class of reinforcement learning problems where s...
research
01/27/2021

OffCon^3: What is state of the art anyway?

Two popular approaches to model-free continuous control tasks are SAC an...
research
11/15/2018

Orthogonal Policy Gradient and Autonomous Driving Application

One less addressed issue of deep reinforcement learning is the lack of g...
research
09/26/2019

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

Some of the most successful applications of deep reinforcement learning ...
research
09/23/2019

Constrained Attractor Selection Using Deep Reinforcement Learning

This paper describes an approach for attractor selection in nonlinear dy...
