Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

05/18/2023
by   Qinghua Liu, et al.

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited – existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary. This paper proposes a simple, efficient policy optimization framework – Optimistic NPG – for online RL. Optimistic NPG can be viewed as simply combining the classic natural policy gradient (NPG) algorithm [Kakade, 2001] with optimistic policy evaluation subroutines to encourage exploration. For d-dimensional linear MDPs, Optimistic NPG is computationally efficient and learns an ε-optimal policy within Õ(d^2/ε^3) samples, which makes it the first computationally efficient algorithm whose sample complexity has the optimal dimension dependence Θ̃(d^2). It also improves over the state-of-the-art results for policy optimization algorithms [Zanette et al., 2021] by a factor of d. For general function approximation, which subsumes linear MDPs, Optimistic NPG is, to the best of our knowledge, also the first policy optimization algorithm that achieves polynomial sample complexity for learning near-optimal policies.
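To make the abstract's recipe concrete, the sketch below shows the two ingredients it names: the classic NPG update, which for tabular softmax policies reduces to a multiplicative-weights step [Kakade, 2001], and an optimistic evaluation step that adds an exploration bonus to the Q-estimate. This is a minimal illustration under assumed tabular structure; the function names, the learning rate, and the constant bonus are hypothetical stand-ins, not the paper's actual linear-MDP subroutines.

```python
import numpy as np

def npg_softmax_update(policy, q_values, lr=1.0):
    """One NPG step under softmax parameterization.

    For tabular softmax policies, natural policy gradient reduces to the
    multiplicative-weights update pi'(a|s) proportional to
    pi(a|s) * exp(lr * Q(s, a)).
    """
    logits = np.log(policy) + lr * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy

def optimistic_q(q_estimate, bonus):
    """Hypothetical optimistic evaluation: inflate the Q-estimate with an
    exploration bonus (in linear MDPs this would be an elliptical bonus)."""
    return q_estimate + bonus

# Toy usage: 2 states, 3 actions, uniform initial policy.
policy = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
policy = npg_softmax_update(policy, optimistic_q(q, bonus=0.0), lr=1.0)
```

After one update the policy remains a valid distribution per state while shifting probability mass toward higher-value actions, which is the exponentiated-gradient behavior the NPG framework builds on.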


research
06/24/2019

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning ...
research
06/15/2023

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

Policy optimization methods are powerful algorithms in Reinforcement Lea...
research
10/16/2021

Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs

Q-learning is a popular Reinforcement Learning (RL) algorithm which is w...
research
05/31/2023

Replicability in Reinforcement Learning

We initiate the mathematical study of replicability as an algorithmic pr...
research
05/23/2018

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

Efficient exploration is one of the key challenges for reinforcement lea...
research
05/05/2019

P3O: Policy-on Policy-off Policy Optimization

On-policy reinforcement learning (RL) algorithms have high sample comple...
research
02/01/2021

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

Finding the minimal structural assumptions that empower sample-efficient...
