Ranking Policy Gradient

06/24/2019
by   Kaixiang Lin, et al.

Sample inefficiency is a long-standing problem in reinforcement learning (RL). State-of-the-art methods derive the policy from a value function, which usually requires an extensive search over the state-action space; this is one reason for the inefficiency. Toward sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal ranking of a set of discrete actions. To accelerate the learning of policy gradient methods, we describe a novel off-policy learning framework and establish the equivalence between maximizing a lower bound of the return and imitating a near-optimal policy without accessing any oracles. These results lead to a general sample-efficient off-policy learning framework that accelerates learning and reduces variance. Furthermore, the sample complexity of RPG does not depend on the dimension of the state space, which makes RPG applicable to large-scale problems. We conduct extensive experiments showing that, when combined with the off-policy learning framework, RPG substantially reduces sample complexity compared to the state of the art.
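
To make the idea of "learning a ranking of discrete actions" concrete, here is a minimal sketch of one way a ranking over actions could be turned into action preferences. It assumes a pairwise logistic model in which the probability that action i is ranked above action j depends only on the gap between their scores; the abstract does not state the parameterization, so this form, the function names (pairwise_prob, ranking_action_scores, greedy_action), and the example scores are all illustrative assumptions rather than the authors' implementation.

```python
import math


def pairwise_prob(score_i, score_j):
    """Assumed model: P(action i is ranked above action j) is the
    logistic function of the score gap."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def ranking_action_scores(scores):
    """For each action i, multiply P(i ranked above j) over all j != i.

    The result is a preference value per action. These products are not
    normalized and need not sum to one.
    """
    prefs = []
    for i, s_i in enumerate(scores):
        p = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                p *= pairwise_prob(s_i, s_j)
        prefs.append(p)
    return prefs


def greedy_action(scores):
    """Pick the action whose preference value is largest, i.e. the action
    most likely to be ranked above every other action."""
    prefs = ranking_action_scores(scores)
    return max(range(len(prefs)), key=prefs.__getitem__)


# Hypothetical scores for three discrete actions in a single state.
scores = [0.2, 1.5, -0.3]
prefs = ranking_action_scores(scores)
best = greedy_action(scores)
```

In this sketch only the relative order of the scores matters for the greedy choice, which is the sense in which the policy depends on a ranking rather than on accurate value estimates. In a full method the scores would come from a learned function of the state; that part is omitted here.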

research
08/17/2020

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

We study the optimal sample complexity in large-scale Reinforcement Lear...
research
05/18/2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in re...
research
02/23/2021

Mixed Policy Gradient

Reinforcement learning (RL) has great potential in sequential decision-m...
research
06/02/2023

Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space

We consider the reinforcement learning (RL) problem with general utiliti...
research
05/05/2019

P3O: Policy-on Policy-off Policy Optimization

On-policy reinforcement learning (RL) algorithms have high sample comple...
research
03/02/2018

Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application

In e-commerce platforms such as Amazon and TaoBao, ranking items in a se...
research
07/10/2018

Generalized deterministic policy gradient algorithms

We study a setting of reinforcement learning, where the state transition...
