Variance-Reduced Conservative Policy Iteration

12/12/2022
by Naman Agarwal, et al.

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in function space, whereas policy gradient algorithms converge locally in parameter space, and are therefore unaffected by a possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an ε-functional local optimum from O(ε^-4) to O(ε^-3). Under state-coverage and policy-completeness assumptions, the algorithm attains ε-global optimality after O(ε^-2) samples, improving upon the previously established O(ε^-3) sample requirement.
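To get a feel for what these rates mean, the short sketch below evaluates ε^-k for the exponents quoted above. It is only an illustration of scaling: the O(·) bounds hide constants and problem-dependent factors, so the numbers are not actual sample counts. The function name `sample_scaling` and the dictionary keys are hypothetical labels introduced here, not notation from the paper.

```python
def sample_scaling(eps: float) -> dict:
    """Evaluate eps**-k for the exponents quoted in the abstract.

    Constants and problem-dependent factors hidden by O(.) are ignored,
    so these values only show how the bounds scale with eps.
    """
    return {
        "local_previous": eps ** -4,          # O(eps^-4): earlier rate, local optimum
        "local_variance_reduced": eps ** -3,  # O(eps^-3): proposed rate, local optimum
        "global_previous": eps ** -3,         # O(eps^-3): earlier rate, global optimality
        "global_variance_reduced": eps ** -2, # O(eps^-2): proposed rate, global optimality
    }


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        s = sample_scaling(eps)
        local_gain = s["local_previous"] / s["local_variance_reduced"]
        global_gain = s["global_previous"] / s["global_variance_reduced"]
        print(f"eps={eps}: local gain ~{local_gain:.0f}x, global gain ~{global_gain:.0f}x")
```

In both settings the improvement is a factor of 1/ε, so the gap between the old and new bounds widens as the target accuracy tightens.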


