1 Introduction
In reinforcement learning (RL), two important classes of algorithms are value-function-based methods and policy search methods. Value-function-based methods maintain an estimate of the value of performing each action in each state, and choose the actions associated with the most value in their current state [Sutton and Barto 1998]. By contrast, policy search algorithms maintain an explicit policy, and agents draw actions directly from that policy to interact with their environment [Sutton et al. 2000]. A subset of policy search algorithms, policy gradient methods, represent the policy using a differentiable parameterized function approximator (for example, a neural network) and use stochastic gradient ascent to update its parameters to achieve more reward.
To facilitate gradient ascent, the agent interacts with its environment according to the current policy and keeps track of the outcomes of its actions. From these (potentially noisy) sampled outcomes, the agent estimates the gradient of the objective function. A critical question is how to compute an accurate gradient from these samples, which may be costly to acquire, while using as few interactions with the environment as possible.
Actor-critic algorithms compute the policy gradient using a learned value function to estimate expected future reward [Sutton et al. 2000; Konda and Tsitsiklis 2000]. Since the expected reward is a function of the environment's dynamics, which the agent does not know, it is typically estimated by executing the policy in the environment. Existing algorithms compute the policy gradient using the value of states the agent visits, and critically, these methods take into account only the actions the agent actually executes during environmental interaction.
We propose a new policy gradient algorithm, Mean Actor-Critic (or MAC), for the discrete-action, continuous-state case. MAC uses the agent's policy distribution to average the value function over all actions, rather than using the action-values of only the sampled actions. We prove that, under modest assumptions, this approach reduces variance in the policy gradient estimates relative to traditional actor-critic approaches. We implement MAC using deep neural networks, and we show empirical results on two control domains and six Atari games, where MAC is competitive with state-of-the-art policy search methods.
We note that the core idea behind MAC has also been independently and concurrently explored by Ciosek and Whiteson (2017). However, their results mainly focus on continuous action spaces and are more theoretical. We introduce a simpler proof of variance reduction that makes fewer assumptions, and we also show that the algorithm works well in discrete-action domains.
2 Background
In RL, we train an agent to select actions in its environment so that it maximizes some notion of long-term reward. We formalize the problem as a Markov decision process (MDP) [Puterman 1990], which we specify by the tuple $\langle \mathcal{S}, s_0, \mathcal{A}, R, T, \gamma \rangle$, where $\mathcal{S}$ is a set of states, $s_0$ is a fixed initial state, $\mathcal{A}$ is a set of discrete actions, the functions $R$ and $T$ respectively describe the reward and transition dynamics of the environment, and $\gamma \in [0, 1)$ is a discount factor representing the relative importance of immediate versus long-term rewards. More concretely, we denote the expected reward for performing action $a$ in state $s$ as:
$$R(s, a) = \mathbb{E}\big[\, r_t \mid s_t = s,\, a_t = a \,\big],$$
and we denote the probability that performing action $a$ in state $s$ results in state $s'$ as:
$$T(s, a, s') = \Pr\big(s_{t+1} = s' \mid s_t = s,\, a_t = a\big).$$
In the context of policy search methods, the agent maintains an explicit policy $\pi_\theta(a \mid s)$, denoting the probability of taking action $a$ in state $s$ under the policy parameterized by $\theta$. Note that for each state, the policy outputs a probability distribution over the discrete set of actions: $\sum_{a \in \mathcal{A}} \pi_\theta(a \mid s) = 1$. At each timestep $t$, the agent takes an action $a_t$ drawn from its policy $\pi_\theta(\cdot \mid s_t)$; the environment then provides a reward signal $r_t$ and transitions to the next state $s_{t+1}$. The agent's goal at every timestep is to maximize the sum of discounted future rewards, or simply the return, which we define as:
$$G_t = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k}.$$
In a slight abuse of notation, we will also denote the total return for a trajectory $\tau$ as $G(\tau)$, which is equal to $G_0$ for that same trajectory.
The agent's policy induces a value function over the state space. The expression for return allows us to define both a state value function, $V^\pi(s)$, and a state-action value function, $Q^\pi(s, a)$. Here, $V^\pi(s)$ represents the expected return starting from state $s$ and following the policy $\pi$ thereafter, and $Q^\pi(s, a)$ represents the expected return starting from $s$, executing action $a$, and then following the policy $\pi$ thereafter:
$$V^\pi(s) = \mathbb{E}\big[G_t \mid s_t = s;\, \pi\big], \qquad Q^\pi(s, a) = \mathbb{E}\big[G_t \mid s_t = s,\, a_t = a;\, \pi\big].$$
Note that:
$$V^\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, Q^\pi(s, a).$$
The agent's goal is to find a policy that maximizes the return for every timestep, so we define an objective function that allows us to score an arbitrary policy parameter $\theta$:
$$J(\theta) = \mathbb{E}\big[G(\tau) \mid \pi_\theta\big] = \sum_\tau \Pr(\tau; \theta)\, G(\tau),$$
where $\tau$ denotes a trajectory. Note that the probability of a specific trajectory depends on the policy parameters as well as the dynamics of the environment. Our goal is to be able to compute the gradient of $J$ with respect to the policy parameters $\theta$:
$$
\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \sum_\tau \Pr(\tau; \theta)\, G(\tau) \\
&= \sum_\tau \Pr(\tau; \theta)\, \nabla_\theta \log \Pr(\tau; \theta)\, G(\tau) \\
&= \sum_\tau \Pr(\tau; \theta) \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G(\tau) \\
&= \mathbb{E}_\tau\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big] \\
&= \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big], \qquad (1)
\end{aligned}
$$
where $\rho^\pi$ is the discounted state distribution. In the second and third lines we rewrite the gradient term using the score function. In the fourth line, we convert the summation to an expectation, and use the return $G_t$ in place of $G(\tau)$. Here we make use of the fact that $\mathbb{E}\big[\nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{k < t} \gamma^k r_k\big] = 0$, given by Williams (1992). Intuitively this makes sense, since the policy for a given state should depend only on the rewards achieved after that state. Finally, we invoke the definition $Q^\pi(s_t, a_t) = \mathbb{E}[G_t \mid s_t, a_t]$.
A nice property of expectation (1) is that, given access to $Q^\pi$, the expectation can be estimated by executing the policy $\pi_\theta$ in the environment. Alternatively, we can estimate $Q^\pi(s_t, a_t)$ using the return $G_t$, which is an unbiased (though usually high-variance) sample of $Q^\pi(s_t, a_t)$. This is essentially the idea behind the REINFORCE algorithm [Williams 1992], which uses the following gradient estimator:
$$\nabla_\theta J(\theta) \approx \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t. \qquad (2)$$
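To make estimator (2) concrete, the following is a minimal sketch of a single-episode REINFORCE update in PyTorch. It is illustrative only: the network shape, optimizer, and learning rate are assumptions for this sketch, not the configuration used in the experiments below.

```python
import torch
import torch.nn as nn

# Hypothetical policy network: 4-dimensional states, 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(),
                       nn.Linear(64, 2), nn.Softmax(dim=-1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards, gamma=0.99):
    """One episode: theta <- theta + alpha * sum_t grad log pi(a_t|s_t) * G_t.

    states: list of 1-D float tensors; actions: list of ints; rewards: list of floats.
    """
    # Returns G_t = sum_k gamma^k r_{t+k}, computed by a backwards scan.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    probs = policy(torch.stack(states))                       # (T, |A|)
    idx = torch.arange(len(actions))
    log_probs = torch.log(probs[idx, torch.tensor(actions)])  # log pi(a_t|s_t)
    loss = -(log_probs * returns).sum()  # minimizing this ascends estimator (2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```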
Alternatively, we can estimate $Q^\pi$ using some sort of function approximation, $\hat{Q}(s, a) \approx Q^\pi(s, a)$, which results in variants of actor-critic algorithms. Perhaps the simplest actor-critic algorithm approximates (1) as follows:
$$\nabla_\theta J(\theta) \approx \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{Q}(s_t, a_t). \qquad (3)$$
Note that value function approximation can, in general, bias the gradient estimation [Baxter and Bartlett 2001].
One way of reducing variance in both REINFORCE and actor-critic algorithms is to use an additive control variate as a baseline [Williams 1992; Sutton et al. 2000; Greensmith, Bartlett, and Baxter 2004]. The baseline is typically a function that does not depend on the action, so subtracting it from either the sampled returns or the estimated Q-values does not bias the gradient estimation. We refer to techniques that use such a baseline as advantage variations of the basic algorithms, since they approximate the advantage of choosing action $a$ over some baseline representing "typical" performance for the policy in state $s$ [Baird 1994]. The update performed by advantage REINFORCE is:
$$\theta \leftarrow \theta + \alpha \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, (G_t - b),$$
where $b$ is a scalar baseline measuring the performance of the policy, such as a running average of the observed return over the past few episodes of interaction.
Advantage actor-critic uses an approximation of the expected value of each state as its baseline, $\hat{V}(s_t) \approx V^\pi(s_t)$, which leads to the following update rule:
$$\theta \leftarrow \theta + \alpha \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \big(\hat{Q}(s_t, a_t) - \hat{V}(s_t)\big).$$
Another way of estimating the advantage function is to use the TD-error signal $\delta_t = r_t + \gamma \hat{V}(s_{t+1}) - \hat{V}(s_t)$. This approach is convenient, because it only requires estimating one set of parameters, namely those of $\hat{V}$. However, because the TD-error is a sample of the advantage function $A(s_t, a_t) = Q^\pi(s_t, a_t) - V^\pi(s_t)$, this approach has higher variance (due to the environmental dynamics) than methods that explicitly compute $\hat{Q}$. Moreover, given $\hat{Q}$ and $\pi_\theta$, $\hat{V}$ can easily be computed as $\hat{V}(s) = \sum_{a \in \mathcal{A}} \pi_\theta(a \mid s)\, \hat{Q}(s, a)$, so in practice, it is still only necessary to estimate one set of parameters (those of $\hat{Q}$).
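For instance, if `q_values` holds $\hat{Q}(s, \cdot)$ and `probs` holds $\pi_\theta(\cdot \mid s)$ for a batch of states (hypothetical tensor names, with made-up values for illustration), the value and advantage estimates fall out in two lines:

```python
import torch

probs = torch.tensor([[0.7, 0.2, 0.1]])      # pi(.|s) for a batch of one state
q_values = torch.tensor([[1.0, 5.0, -2.0]])  # Q_hat(s, .) for the same batch

# v_hat(s) = sum_a pi(a|s) q_hat(s,a); advantage(s,a) = q_hat(s,a) - v_hat(s)
v_hat = (probs * q_values).sum(dim=-1, keepdim=True)  # (batch, 1) -> 1.5
advantage = q_values - v_hat                          # (batch, |A|)
```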
3 Mean Actor-Critic
An overwhelming majority of recent actor-critic papers have computed the policy gradient using an estimate similar to Equation (3) [Degris, White, and Sutton 2012; Mnih et al. 2016; Wang et al. 2016]. This estimate samples both states and actions from trajectories executed according to the current policy in order to compute the gradient of the objective function with respect to the policy weights.
Instead of using only the sampled actions, Mean Actor-Critic (MAC) explicitly computes the probability-weighted average over all Q-values, for each state sampled from the trajectories. In doing so, MAC is able to produce an estimate of the policy gradient in which the variance due to action sampling is reduced to zero. This is exactly the difference between computing a sample mean (whose variance is inversely proportional to the number of samples) and calculating the mean directly (which is simply a scalar with no variance).
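This difference is easy to see numerically. In the toy NumPy snippet below (purely illustrative, with made-up probabilities and Q-values), the sampled-action estimate of $\sum_a \pi(a \mid s)\, Q(s, a)$ has nonzero variance, while the probability-weighted sum is a single deterministic number:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.7, 0.2, 0.1])   # policy over 3 actions in some fixed state
q = np.array([1.0, 5.0, -2.0])   # Q-values for that state

sampled = q[rng.choice(3, size=10_000, p=pi)]  # one-sample estimates, AC-style
exact = (pi * q).sum()                         # exact mean, MAC-style

print(sampled.mean(), sampled.var())  # approx 1.5, with variance approx 3.85
print(exact)                          # exactly 1.5; no sampling variance
```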
MAC is based on the observation that expectation (1), which we repeat here, can be rewritten in the following way:
$$
\begin{aligned}
\nabla_\theta J(\theta) &= \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big] \\
&= \mathbb{E}_{s \sim \rho^\pi}\Big[\sum_{a \in \mathcal{A}} \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\Big] \\
&= \mathbb{E}_{s \sim \rho^\pi}\Big[\sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, Q^\pi(s, a)\Big]. \qquad (4)
\end{aligned}
$$
We can estimate (4) by sampling states from a trajectory and using function approximation:
$$\nabla_\theta J(\theta) \approx \sum_t \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_t)\, \hat{Q}(s_t, a).$$
In our implementation, the inner summation is computed by combining two neural networks that represent the policy and state-action value function. The value function can be learned using a variety of methods, such as temporal-difference learning or Monte Carlo sampling. After performing a few updates to the value function, we update the parameters of the policy with the following update rule:
$$\theta \leftarrow \theta + \alpha \sum_t \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_t)\, \hat{Q}(s_t, a). \qquad (5)$$
To improve stability, repeated updates to the value and policy networks are interleaved, as in Generalized Policy Iteration [Sutton and Barto 1998].
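A minimal sketch of update (5) in PyTorch follows. The network sizes, optimizer, and learning rate are our own assumptions for this sketch, not the paper's released code:

```python
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2
# Two independent networks, as in the classic-control experiments below.
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, n_actions), nn.Softmax(dim=-1))
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def mac_policy_update(states):
    """MAC step: theta <- theta + alpha * sum_t sum_a grad pi(a|s_t) Q_hat(s_t, a)."""
    probs = policy(states)             # (T, |A|): pi(a|s_t) for every action
    q_values = q_net(states).detach()  # critic held fixed during the policy step
    loss = -(probs * q_values).sum()   # sums over ALL actions, sampled or not
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
```

Note that no sampled actions appear in the loss: differentiating `(probs * q_values).sum()` produces exactly $\sum_t \sum_a \nabla_\theta \pi_\theta(a \mid s_t)\, \hat{Q}(s_t, a)$.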
In traditional actor-critic approaches, which we refer to as sampled-action actor-critic, the only actions involved in the computation of the policy gradient estimate are those that were actually executed in the environment. In MAC, computing the policy gradient estimate will frequently involve actions that were not actually executed in the environment. This results in a trade-off between bias and variance. In domains where we can expect accurate Q-value predictions from our function approximator, despite not actually executing all of the relevant state-action pairs, MAC results in lower-variance gradient updates and increased sample efficiency. In domains where this assumption is not valid, MAC may perform worse than sampled-action actor-critic due to increased bias.
In some ways, MAC is similar to Expected Sarsa [Van Seijen et al. 2009]. Expected Sarsa considers all next-actions $a'$, then computes the expected TD-error, $\delta_t = r_t + \gamma \sum_{a' \in \mathcal{A}} \pi_\theta(a' \mid s_{t+1})\, \hat{Q}(s_{t+1}, a') - \hat{Q}(s_t, a_t)$, and uses the resulting error signal to update the $\hat{Q}$ function. By contrast, MAC considers all current-actions $a$, and uses the corresponding $\hat{Q}$ values to update the policy directly.
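The symmetry is easiest to see side by side. A self-contained sketch with hypothetical one-state tensors (the numbers are arbitrary):

```python
import torch

gamma = 0.99
r = torch.tensor([1.0])                      # reward for one transition
q = torch.tensor([[1.0, 5.0, -2.0]])         # Q_hat(s, .)
q_next = torch.tensor([[0.5, 2.0, 0.0]])     # Q_hat(s', .)
probs = torch.tensor([[0.7, 0.2, 0.1]])      # pi(.|s)
probs_next = torch.tensor([[0.3, 0.3, 0.4]]) # pi(.|s')

# Expected Sarsa: average over NEXT actions to form a low-variance critic target.
td_target = r + gamma * (probs_next * q_next).sum(dim=-1)

# MAC: average over CURRENT actions to form a low-variance actor objective.
mac_objective = (probs * q).sum(dim=-1)
```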
It is natural to consider whether MAC could be improved by subtracting an action-independent baseline, as in sampled-action actor-critic and REINFORCE:
$$\nabla_\theta J(\theta) \approx \sum_t \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_t)\, \big(\hat{Q}(s_t, a) - b(s_t)\big).$$
However, we can simplify the baseline term as follows:
$$\sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_t)\, b(s_t) = b(s_t)\, \nabla_\theta \sum_{a \in \mathcal{A}} \pi_\theta(a \mid s_t) = b(s_t)\, \nabla_\theta\, 1 = 0.$$
In doing so, we see that both $b(s_t)$ and the gradient operator can be moved outside of the summation, leaving just the sum of the action probabilities, which is always 1, and hence the gradient of the baseline term is always zero. This is true regardless of the choice of baseline, since the baseline cannot be a function of the actions or else it would bias the expectation. Thus, we see that subtracting a baseline is unnecessary in MAC, since it has no effect on the policy gradient estimate.
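This cancellation can also be checked mechanically with autograd. In the toy snippet below (an illustration, not from the paper), the gradient of the baseline term with respect to the policy parameters comes out (numerically) zero:

```python
import torch

logits = torch.zeros(3, requires_grad=True)  # policy parameters for one state
probs = torch.softmax(logits, dim=0)         # pi(.|s), sums to 1
baseline = 7.0                               # any action-independent constant

(probs * baseline).sum().backward()  # grad of b * sum_a pi(a|s) = grad of b
print(logits.grad)                   # all (numerically) zero
```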
4 Analysis of Bias and Variance
In this section we prove that MAC does not increase variance over sampled-action actor-critic (AC), and also that, given a fixed $\hat{Q}$, both algorithms have the same bias. We start with the bias result.
4.1 Theorem 1
If the estimated Q-values, $\hat{Q}(s, a)$, for both MAC and AC are the same in expectation, then the bias of MAC is equal to the bias of AC.
4.2 Proof
See Appendix A.
This result makes sense because in expectation, AC will choose all of the possible actions with some probability according to the policy. MAC simply calculates this expectation over actions explicitly. We now move to the variance result.
4.3 Theorem 2
If the estimated Q-values, $\hat{Q}(s, a)$, for both MAC and AC are the same in expectation, and if $\hat{Q}(s_i, a_i)$ is independent of $\hat{Q}(s_j, a_j)$ for $i \neq j$, then $\mathrm{Var}(\widehat{\mathrm{MAC}}) \leq \mathrm{Var}(\widehat{\mathrm{AC}})$. For deterministic policies, there is equality, and for stochastic policies the inequality is strict.
4.4 Proof
See Appendix B.
Intuitively, we can see that for cases where the policy is deterministic, MAC's formulation of the policy gradient is exactly equivalent to AC's, and hence we can do no better than AC. For high-entropy policies, MAC will beat AC in terms of variance.
5 Experiments
This section presents an empirical evaluation of MAC across three different problem domains. We first evaluate the performance of MAC versus popular policy gradient benchmarks on two classic control problems. We then evaluate MAC on a subset of Atari 2600 games and investigate its performance compared to state-of-the-art policy search methods.
5.1 Classic Control Experiments
In order to determine whether MAC's lower-variance policy gradient estimate translates to faster learning, we chose two classic control problems, namely Cart Pole and Lunar Lander, and compared MAC's performance against four standard sampled-action policy gradient algorithms. We used the open-source implementations of Cart Pole and Lunar Lander provided by OpenAI Gym [Brockman et al. 2016], in which both domains have continuous state spaces and discrete action spaces. Screenshots of the two domains are provided in Figure 1.
Algorithm          | Cart Pole | Lunar Lander
REINFORCE          |           |
Adv. REINFORCE     |           |
Actor-Critic       |           |
Adv. Actor-Critic  |           |
MAC                |           |
For each problem domain, we implemented MAC using two independent neural networks, representing the policy and Q function. We then performed a hyperparameter search to determine the best network architectures, optimization method, and learning rates. Specifically, the hyperparameter search considered: 0, 1, 2, or 3 hidden layers; 50, 75, 100, or 300 neurons per layer; ReLU, Leaky ReLU (with leak factor 0.3), or tanh activation; SGD, RMSProp, Adam, or Adadelta as the optimization method; and a learning rate chosen from 0.0001, 0.00025, 0.0005, 0.001, 0.005, 0.01, or 0.05. To find the best setting, we ran 10 independent trials for each combination of hyperparameters and chose the setting with the best asymptotic performance over the 10 trials. We terminated each episode after 200 and 1000 timesteps (in Cart Pole and Lunar Lander, respectively), regardless of the state of the agent.
We compared MAC against four standard benchmarks: REINFORCE, advantage REINFORCE, actor-critic, and advantage actor-critic. We implemented the REINFORCE benchmarks using just a single neural network to represent the policy, and we implemented the actor-critic benchmarks using two networks to represent both the policy and the Q function. For each benchmark algorithm, we then performed the same hyperparameter search that we had used for MAC.
In order to keep the variance as low as possible for the advantage actor-critic benchmark, we explicitly computed the advantage function $\hat{A}(s_t, a_t) = \hat{Q}(s_t, a_t) - \hat{V}(s_t)$, where $\hat{V}(s_t) = \sum_{a \in \mathcal{A}} \pi_\theta(a \mid s_t)\, \hat{Q}(s_t, a)$, rather than sampling it using the TD-error (see Section 2).
Once we had determined the best hyperparameter settings for MAC and each of the benchmark algorithms, we then ran each algorithm for 100 independent trials. Figure 2 shows learning curves for the different algorithms, and Table 1 summarizes the results using the mean performance over trials and episodes. On Cart Pole, MAC learns substantially faster than all of the benchmarks, and on Lunar Lander, it performs competitively with the best benchmark algorithm, advantage actor-critic.
5.2 Atari Experiments
To test whether MAC can scale to larger problem domains, we evaluated it on several Atari 2600 games using the Arcade Learning Environment (ALE) [Bellemare et al. 2013] and compared MAC's performance against that of state-of-the-art policy search methods, namely Trust Region Policy Optimization (TRPO) [Schulman et al. 2015], Evolution Strategies (ES) [Salimans et al. 2017], and Advantage Actor-Critic (A2C) [Wu et al. 2017]. Due to the computational load inherent in training deep networks to play Atari games, we limited our experiments to a subset of six Atari games: Beam Rider, Breakout, Pong, Q*bert, Seaquest, and Space Invaders. These six games are commonly selected for tuning hyperparameters [Mnih et al. 2015; Mnih et al. 2016; Wu et al. 2017], and thus provide a fair comparison against established benchmarks, despite our limited computational resources.
The MAC network architecture was derived from the OpenAI Baselines implementation of A2C [Wu et al. 2017]. It uses three convolutional layers (size/stride/filters: 8/4/32, 4/2/64, 3/1/64), followed by a fully-connected layer (size 512), all with ReLU activation. A final fully-connected layer is split into two batches of N outputs each, where N is the number of actions. One batch uses a linear activation and corresponds to the Q-values; the other batch uses a softmax activation and corresponds to the policy. We used this architecture for both the MAC results and the A2C results. The TRPO and ES results are taken from their respective papers.
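A sketch of this architecture in PyTorch is given below. The input of 4 stacked 84x84 grayscale frames is the conventional Atari preprocessing, which we assume here; the single split output layer described above is written as two separate linear heads, which is equivalent.

```python
import torch
import torch.nn as nn

class MACAtariNet(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 8/4/32
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 4/2/64
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 3/1/64
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),                  # FC 512
        )
        self.q_head = nn.Linear(512, n_actions)   # linear head: Q-values
        self.pi_head = nn.Linear(512, n_actions)  # softmax head: policy

    def forward(self, frames):                    # frames: (B, 4, 84, 84)
        z = self.body(frames / 255.0)
        return self.q_head(z), torch.softmax(self.pi_head(z), dim=-1)
```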
Game           | Random | TRPO   | ES     | A2C    | MAC
Beam Rider     |  363.9 | 1425.2 |  744.0 | 5846.0 | 6072.0
Breakout       |    1.7 |   10.8 |    9.5 |  370.9 |  372.7
Pong           |  -20.7 |   20.9 |   21.0 |   18.0 |   10.6
Q*bert         |  183.0 | 1973.5 |  147.5 | 1651.5 |  243.4
Seaquest       |   68.4 | 1908.6 | 1390.0 | 1702.5 | 1703.4
Space Invaders |  148.0 |  568.4 |  678.5 | 1201.2 | 1173.1
We trained the network using a variation of the multi-part loss function used in A2C [Wu et al. 2017]. The value loss at each timestep was equal to the mean squared error between the observed reward and the Q-value of the selected action. The policy entropy loss was simply the negative entropy of the policy at each timestep. For the A2C experiments, the policy improvement loss was the negative log probability of the selected action times its advantage value. For the MAC experiments, the policy improvement loss became the negative sum of action probabilities times their associated Q-values. The overall loss function was a linear combination of the policy improvement loss (coefficient 0.1), policy entropy loss (coefficient 0.001), and value loss (coefficient 0.5), and the network was trained using RMSProp with a learning rate of 1.5e-3. These coefficients trade off the importance of learning good Q-values, improving the policy, and preventing the policy from converging prematurely. This configuration of hyperparameters was found to perform well experimentally for both methods after a small hyperparameter search. The only difference between the A2C and MAC implementations was to replace A2C's sampled-action policy improvement loss with MAC's sum-over-actions loss; the algorithms used exactly the same architecture and hyperparameters.

For A2C and MAC, we trained a network for each game on 50 million frames of play, across 16 parallel threads, pausing every 200K frames to evaluate performance and compute learning curves. In each evaluation, we ran 16 agents in parallel, for 4500 frames (5 minutes) each, or 50 total episodes, whichever came first, and averaged the scores of the completed (or timed-out) episodes. Agents were trained and evaluated under the typical random-start condition, where the game is initialized with a random number of no-op ALE actions (between 0 and 30) [Mnih et al. 2015]. The A2C and MAC results in Table 2 come from the final evaluation after all 50M frames, and they are averaged across 5 trials involving separately trained networks. Learning curves for each game can be found in Figure 3 in the Appendix. In addition to A2C, we also compared MAC against TRPO (results from a single trial) [Schulman et al. 2015] and ES (results averaged over 30 trials) [Salimans et al. 2017], and found that MAC performed competitively with all three benchmark algorithms.
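For concreteness, a sketch of the combined loss described above, for the MAC variant, follows. The coefficients are those quoted in the text; the tensor names and the exact construction of the return target `returns` are our assumptions, with only the loss structure taken from the description.

```python
import torch

def mac_loss(q_values, probs, actions, returns):
    """Total loss = 0.1 * policy improvement + 0.001 * entropy + 0.5 * value."""
    # Value loss: MSE between the observed return and Q of the selected action.
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    value_loss = ((returns - q_taken) ** 2).mean()

    # Entropy loss: negative entropy of the policy (minimized, so entropy rises).
    entropy_loss = (probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()

    # MAC policy improvement loss: negative sum over all actions of pi * Q.
    policy_loss = -(probs * q_values.detach()).sum(dim=-1).mean()

    return 0.1 * policy_loss + 0.001 * entropy_loss + 0.5 * value_loss
```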
Note that MAC's performance on Pong and Q*bert was low relative to A2C. For Pong, this was due to one of the five MAC trials obtaining a final score of -20.1 and pulling the average performance down significantly. The individual Pong scores for MAC were {20.5, 19.7, 18.3, 14.7, -20.1}; the scores for A2C were {19.4, 19.4, 19.3, 16.3, 15.6}. For Q*bert, the performance for both algorithms was much more variable. A2C scored 0.0 on 3 out of 5 trials, and MAC scored 0.0 on 2 out of 5 trials. The reason A2C's average score is so much higher than MAC's is that it had one lucky trial in which it scored 7780.9 points. The individual Q*bert scores for MAC were {557.4, 504.7, 155.1, 0.0, 0.0}; the scores for A2C were {7780.9, 476.6, 0.0, 0.0, 0.0}. Additional hyperparameter tuning might lead to improved performance; however, the purpose of this Atari experiment was mainly to show that MAC is competitive with state-of-the-art policy search algorithms, and these results seem to indicate that it is.
6 Discussion
At its core, MAC offers a new way of computing the policy gradient that can substantially reduce variance and increase learning speed. There are a number of orthogonal improvements to policy gradient methods, such as using natural gradients [Kakade 2002; Peters and Schaal 2008], off-policy learning [Wang et al. 2016; Gu et al. 2016; Asadi and Williams 2016], second-order methods [Furmston, Lever, and Barber 2016], and asynchronous exploration [Mnih et al. 2016]. We have not investigated how MAC performs with these extensions; however, just as these improvements were added to basic actor-critic methods, they could be added to MAC as well, and we expect they would improve its performance in a similar way.
A typical use case for actor-critic algorithms is problem domains with continuous actions, which are awkward for value-function-based methods [Sutton and Barto 1998]. One approach to dealing with continuous actions is Deterministic Policy Gradients (DPG) [Silver et al. 2014; Lillicrap et al. 2015], which uses a deterministic policy to perform off-policy policy gradient updates. However, in settings where on-policy learning is necessary, using a deterministic policy leads to suboptimal behavior [Sutton and Barto 1998], and hence a stochastic policy is typically used instead. The recently introduced Expected Policy Gradients (EPG) [Ciosek and Whiteson 2017] addresses this problem by generalizing DPG to stochastic policies. However, while EPG has good experimental performance on domains with continuous action spaces, the authors do not provide experimental results for discrete domains. MAC's discrete results and EPG's continuous results are in some sense complementary.
7 Conclusion
The basic formulation of policy gradient estimators presented here, in which the gradient is estimated by averaging the state-action value function across actions, leads to a new family of actor-critic algorithms. This family has the advantage of not requiring an additional variance-reduction baseline, substantially reducing the design effort required to apply its members. It is also a natural fit with deep neural network function approximators, resulting in a network architecture that is identical to that of some sampled-action actor-critic algorithms, but with less variance.
We prove that for stochastic policies, the MAC algorithm (the simplest member of the resulting family) reduces variance relative to traditional actor-critic approaches, while maintaining the same bias. Our neural network implementation of MAC either outperforms, or is competitive with, state-of-the-art policy search algorithms, and our experimental results show that MAC's lower variance leads to dramatically faster training in some cases. In future work, we aim to develop this family of algorithms further by including typical elaborations of the basic actor-critic architecture, such as natural or second-order gradients. Our results so far suggest that our new approach is highly promising, and that extensions to it will provide even further improvements in performance.
References
[Asadi and Williams 2016] Asadi, K., and Williams, J. D. 2016. Sample-efficient deep reinforcement learning for dialog control. arXiv preprint arXiv:1612.06000.
[Baird 1994] Baird, L. C. 1994. Reinforcement learning in continuous time: Advantage updating. In Proceedings of the 1994 IEEE International Conference on Neural Networks (IEEE World Congress on Computational Intelligence), volume 4, 2448–2453. IEEE.
[Baxter and Bartlett 2001] Baxter, J., and Bartlett, P. L. 2001. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15:319–350.
[Bellemare et al. 2013] Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47:253–279.
[Brockman et al. 2016] Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym. CoRR abs/1606.01540.
[Ciosek and Whiteson 2017] Ciosek, K., and Whiteson, S. 2017. Expected policy gradients. arXiv preprint arXiv:1706.05374.
[Degris, White, and Sutton 2012] Degris, T.; White, M.; and Sutton, R. S. 2012. Off-policy actor-critic. arXiv preprint arXiv:1205.4839.
[Furmston, Lever, and Barber 2016] Furmston, T.; Lever, G.; and Barber, D. 2016. Approximate Newton methods for policy search in Markov decision processes. Journal of Machine Learning Research 17(227):1–51.
[Greensmith, Bartlett, and Baxter 2004] Greensmith, E.; Bartlett, P. L.; and Baxter, J. 2004. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5(Nov):1471–1530.
[Gu et al. 2016] Gu, S.; Lillicrap, T.; Ghahramani, Z.; Turner, R. E.; and Levine, S. 2016. Q-Prop: Sample-efficient policy gradient with an off-policy critic. arXiv preprint arXiv:1611.02247.
[Jensen 1906] Jensen, J. L. W. V. 1906. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica 30(1):175–193.
[Kakade 2002] Kakade, S. M. 2002. A natural policy gradient. In Advances in Neural Information Processing Systems, 1531–1538.
[Konda and Tsitsiklis 2000] Konda, V. R., and Tsitsiklis, J. N. 2000. Actor-critic algorithms. In Advances in Neural Information Processing Systems, 1008–1014.
[Lillicrap et al. 2015] Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[Mnih et al. 2015] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
[Mnih et al. 2016] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T. P.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning.
[Peters and Schaal 2008] Peters, J., and Schaal, S. 2008. Natural actor-critic. Neurocomputing 71(7):1180–1190.
[Puterman 1990] Puterman, M. L. 1990. Markov decision processes. Handbooks in Operations Research and Management Science 2:331–434.
[Salimans et al. 2017] Salimans, T.; Ho, J.; Chen, X.; and Sutskever, I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864.
[Schulman et al. 2015] Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; and Moritz, P. 2015. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1889–1897.
[Silver et al. 2014] Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; and Riedmiller, M. 2014. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 387–395.
[Sutton and Barto 1998] Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge.
[Sutton et al. 2000] Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 1057–1063.
[Van Seijen et al. 2009] Van Seijen, H.; Van Hasselt, H.; Whiteson, S.; and Wiering, M. 2009. A theoretical and empirical analysis of expected Sarsa. In Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 177–184. IEEE.
[Wang et al. 2016] Wang, Z.; Bapst, V.; Heess, N.; Mnih, V.; Munos, R.; Kavukcuoglu, K.; and de Freitas, N. 2016. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224.
[Williams 1992] Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
[Wu et al. 2017] Wu, Y.; Mansimov, E.; Grosse, R. B.; Liao, S.; and Ba, J. 2017. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems, 5285–5294.
8 Appendix
8.1 A. Proof of Theorem 1
Both AC and MAC are estimators of the true policy gradient $\nabla_\theta J(\theta)$. Given a batch of data $\mathcal{D}$, we can write the bias of AC and MAC as:
$$\mathrm{Bias}(\widehat{\mathrm{AC}}) = \mathbb{E}_{\mathcal{D}}\big[\widehat{\mathrm{AC}}(\mathcal{D})\big] - \nabla_\theta J(\theta), \qquad (6)$$
$$\mathrm{Bias}(\widehat{\mathrm{MAC}}) = \mathbb{E}_{\mathcal{D}}\big[\widehat{\mathrm{MAC}}(\mathcal{D})\big] - \nabla_\theta J(\theta). \qquad (7)$$
For clarity, we will rewrite the AC and MAC expectations (1) and (4) to explicitly denote the way that each algorithm estimates the policy gradient, given a batch of data $\mathcal{D} = \{(s_i, a_i)\}_{i=1}^{N}$ (with size $N$):
$$\widehat{\mathrm{AC}}(\mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \log \pi_\theta(a_i \mid s_i)\, \hat{Q}(s_i, a_i), \qquad (8)$$
$$\widehat{\mathrm{MAC}}(\mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_i)\, \hat{Q}(s_i, a). \qquad (9)$$
Substituting (8) and (1) into Eqn. (6) gives:
$$\mathrm{Bias}(\widehat{\mathrm{AC}}) = \mathbb{E}_{\mathcal{D}}\Big[\frac{1}{N} \sum_{i} \nabla_\theta \log \pi_\theta(a_i \mid s_i)\, \hat{Q}(s_i, a_i)\Big] - \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big]. \qquad (10)$$
Since $\mathcal{D}$ is sampled from trajectories that were carried out according to the policy $\pi_\theta$, we can drop the dependence on $\mathcal{D}$ inside the expectation, and rewrite (10) as follows:
$$
\begin{aligned}
\mathrm{Bias}(\widehat{\mathrm{AC}}) &= \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, \mathbb{E}[\hat{Q}(s, a)]\big] - \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big] \qquad (11) \\
&= \mathbb{E}_{s \sim \rho^\pi,\, a \sim \pi_\theta}\Big[\nabla_\theta \log \pi_\theta(a \mid s)\, \big(\mathbb{E}[\hat{Q}(s, a)] - Q^\pi(s, a)\big)\Big] \qquad (12) \\
&= \sum_s \rho^\pi(s) \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, \big(\mathbb{E}[\hat{Q}(s, a)] - Q^\pi(s, a)\big). \qquad (13)
\end{aligned}
$$
Now we turn our attention to MAC, and substitute (9) and (1) into Eqn. (7), to obtain:
$$
\begin{aligned}
\mathrm{Bias}(\widehat{\mathrm{MAC}}) &= \mathbb{E}_{\mathcal{D}}\Big[\frac{1}{N} \sum_{i} \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_i)\, \hat{Q}(s_i, a)\Big] - \nabla_\theta J(\theta) \qquad (14) \\
&= \mathbb{E}_{s \sim \rho^\pi}\Big[\sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, \mathbb{E}[\hat{Q}(s, a)]\Big] - \mathbb{E}_{s \sim \rho^\pi}\Big[\sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, Q^\pi(s, a)\Big] \qquad (15) \\
&= \mathbb{E}_{s \sim \rho^\pi}\Big[\sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, \big(\mathbb{E}[\hat{Q}(s, a)] - Q^\pi(s, a)\big)\Big] \qquad (16) \\
&= \sum_s \rho^\pi(s) \sum_{a \in \mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, \big(\mathbb{E}[\hat{Q}(s, a)] - Q^\pi(s, a)\big). \qquad (17)
\end{aligned}
$$
Comparing (13) and (17), we see that AC and MAC have the same bias.
8.2 B. Proof of Theorem 2
If we assume the estimated Q-values for MAC and AC are the same in expectation, then the squared expectation's contribution to the variance of each algorithm will be equal. We are only interested in determining which estimator has lower variance, so we can drop the second term and simply compare $\mathbb{E}\big[\widehat{\mathrm{AC}}^2\big]$ vs. $\mathbb{E}\big[\widehat{\mathrm{MAC}}^2\big]$, the second moments.
Again we will employ the explicit definitions of the AC and MAC estimators, for a data set $\mathcal{D} = \{(s_i, a_i)\}_{i=1}^{N}$, given by (8) and (9), respectively.
For ease of notation, we define the following two functions:
$$X_i = \nabla_{\theta_k} \log \pi_\theta(a_i \mid s_i)\, \hat{Q}(s_i, a_i), \qquad (18)$$
$$Y_i = \sum_{a \in \mathcal{A}} \nabla_{\theta_k} \pi_\theta(a \mid s_i)\, \hat{Q}(s_i, a). \qquad (19)$$
Here, $\theta_k$ represents a single parameter of the parameter vector $\theta$. We consider an arbitrary choice of $k$, so the following proof holds for all $k$. The above expressions allow us to rewrite the AC and MAC estimators (Eqn. 8 & 9) in terms of $X_i$ and $Y_i$:
$$\widehat{\mathrm{AC}} = \frac{1}{N} \sum_{i=1}^{N} X_i, \qquad (20)$$
$$\widehat{\mathrm{MAC}} = \frac{1}{N} \sum_{i=1}^{N} Y_i. \qquad (21)$$
For convenience, we drop the subscript $k$ for the rest of this analysis.
Now we are ready to compare $\mathbb{E}\big[\widehat{\mathrm{AC}}^2\big]$ vs. $\mathbb{E}\big[\widehat{\mathrm{MAC}}^2\big]$:
$$
\begin{aligned}
\mathbb{E}\big[\widehat{\mathrm{AC}}^2\big] &= \tfrac{1}{N^2}\, \mathbb{E}\Big[\big(\textstyle\sum_i X_i\big)^2\Big] \\
&= \tfrac{1}{N^2}\Big(\textstyle\sum_i \mathbb{E}[X_i^2] + \sum_{i \neq j} \mathbb{E}[X_i X_j]\Big) \\
&= \tfrac{1}{N^2}\Big(\textstyle\sum_i \mathbb{E}[X_i^2] + \sum_{i \neq j} \mathbb{E}[X_i]\, \mathbb{E}[X_j]\Big),
\end{aligned}
$$
and similarly
$$\mathbb{E}\big[\widehat{\mathrm{MAC}}^2\big] = \tfrac{1}{N^2}\Big(\textstyle\sum_i \mathbb{E}[Y_i^2] + \sum_{i \neq j} \mathbb{E}[Y_i]\, \mathbb{E}[Y_j]\Big).$$
By the assumption that $\hat{Q}(s_i, a_i)$ is independent of $\hat{Q}(s_j, a_j)$ for $i \neq j$, we can distribute the expectation through in the third line, to obtain $\mathbb{E}[X_i]\, \mathbb{E}[X_j]$. In the last line, we can drop the second term in each expression, because they are the same ($\mathbb{E}[X_i] = \mathbb{E}[Y_i]$, since $Y_i = \mathbb{E}_{a \sim \pi_\theta}[X_i \mid s_i]$). At this point we just need to compare $\mathbb{E}[X^2]$ vs. $\mathbb{E}[Y^2]$. In order to make this comparison, we make use of Jensen's inequality [Jensen 1906], which says that for a convex function $\varphi$ and a random variable $Z$:
$$\varphi\big(\mathbb{E}[Z]\big) \leq \mathbb{E}\big[\varphi(Z)\big].$$
We note that $\varphi(x) = x^2$ is convex, and as such, the following holds:
$$Y^2 = \big(\mathbb{E}_{a \sim \pi_\theta}[X \mid s]\big)^2 \leq \mathbb{E}_{a \sim \pi_\theta}\big[X^2 \mid s\big],$$
and taking the expectation over states gives $\mathbb{E}[Y^2] \leq \mathbb{E}[X^2]$. Thus, we can conclude that $\mathrm{Var}(\widehat{\mathrm{MAC}}) \leq \mathrm{Var}(\widehat{\mathrm{AC}})$. Moreover, since $x^2$ is strictly convex, this inequality is strict as long as $X$ is not almost surely constant for a given state. That means for deterministic policies, we have $\mathrm{Var}(\widehat{\mathrm{MAC}}) = \mathrm{Var}(\widehat{\mathrm{AC}})$, and for stochastic policies, $\mathrm{Var}(\widehat{\mathrm{MAC}}) < \mathrm{Var}(\widehat{\mathrm{AC}})$.