Discrete Action On-Policy Learning with Action-Value Critic

02/10/2020
by Yuguang Yue, et al.

Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy, gradient-based deep RL algorithms efficiently. To operate effectively in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete-action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space can benefit the exploration phase and hence facilitate convergence to a better locally optimal solution, thanks to the flexibility of discrete policies.
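To make the variance-control idea concrete, here is a minimal sketch of why combining action values from a critic can reduce gradient variance. It is not the algorithm of this paper: the abstract does not specify how correlated actions are generated or combined, so the sketch uses a simpler, well-known stand-in, namely summing over all actions of a single softmax policy. The names `logits`, `q`, and both helper functions are hypothetical, and `q` is assumed to be an exact action-value table rather than a learned critic.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_grads(logits, q, n, rng):
    # n independent single-sample score-function (REINFORCE) estimates of
    # d/d(logits) E_{a ~ pi}[q(a)], one estimate per row.
    pi = softmax(logits)
    actions = rng.choice(len(pi), size=n, p=pi)
    onehot = np.eye(len(pi))[actions]
    return q[actions][:, None] * (onehot - pi)

def all_action_grad(logits, q):
    # Combines the action values of every action:
    # sum_a pi(a) q(a) (onehot(a) - pi) = pi * (q - <pi, q>).
    pi = softmax(logits)
    return pi * (q - pi @ q)

rng = np.random.default_rng(0)
logits = np.array([0.5, -0.2, 0.1, 0.0])  # assumed policy logits
q = np.array([1.0, 3.0, -2.0, 0.5])       # assumed exact action values

samples = reinforce_grads(logits, q, 50_000, rng)
exact = all_action_grad(logits, q)

print("mean of sampled estimates:", samples.mean(axis=0))
print("all-action gradient:      ", exact)
print("total sampling variance:  ", samples.var(axis=0).sum())
```

Both estimators target the same gradient, but the single-sample one carries sampling variance while the all-action one has none when `q` is exact. With a learned critic the all-action estimate would instead inherit the critic's error, and enumerating every action becomes infeasible when the joint action space grows exponentially, which is the regime the abstract is concerned with.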

research
09/01/2017

Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action...
research
03/08/2021

A Crash Course on Reinforcement Learning

The emerging field of Reinforcement Learning (RL) has led to impressive ...
research
06/15/2017

Expected Policy Gradients

We propose expected policy gradients (EPG), which unify stochastic polic...
research
06/14/2019

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Direct optimization is an appealing approach to differentiating through ...
research
09/13/2023

Investigating the Impact of Action Representations in Policy Gradient Algorithms

Reinforcement learning (RL) is a versatile framework for learning to sol...
research
01/21/2023

Quasi-optimal Learning with Continuous Treatments

Many real-world applications of reinforcement learning (RL) require maki...
research
11/14/2018

Large-scale Interactive Recommendation with Tree-structured Policy Gradient

Reinforcement learning (RL) has recently been introduced to interactive ...
