Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

06/06/2013
by Bruno Scherrer, et al.

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. It is probably commonly believed that the best one can hope for in general from such an approach is a local optimum of this criterion. In this article, we show the following surprising result: any (approximate) local optimum enjoys a global performance guarantee. We compare this guarantee with the one satisfied by Direct Policy Iteration, an approximate dynamic programming algorithm that performs a form of Policy Search: although the approximation error of Local Policy Search may generally be larger (because local search requires considering a space of stochastic policies), we argue that the concentrability coefficient appearing in the performance bound is much nicer. Finally, we discuss several practical and theoretical consequences of our analysis.
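As a rough illustration of the criterion described above (not the authors' algorithm, analysis, or experiments), the objective can be written as J(π) = ν^T v_π, where ν is the predefined state distribution and v_π is the value function of a stochastic policy π. The sketch below assumes a small tabular MDP, a softmax parameterization of stochastic policies, and exact gradient ascent computed with the policy gradient theorem. The function names (`evaluate`, `objective_and_grad`, `local_policy_search`, `optimal_objective`) and all numbers are hypothetical. The sketch only shows the local search and compares its result with the optimum found by value iteration; it does not reproduce the paper's performance bound.

```python
import numpy as np

def softmax(theta):
    """Row-wise softmax: theta[s, a] -> stochastic policy pi[s, a]."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def evaluate(pi, P, r, gamma):
    """Exact value v_pi of a stochastic policy in a tabular MDP (P has shape S x A x S)."""
    n = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * r, axis=1)          # expected one-step reward under pi
    v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    return v, P_pi

def objective_and_grad(theta, P, r, gamma, nu):
    """J(theta) = nu^T v_pi and its exact gradient for the softmax parameterization."""
    pi = softmax(theta)
    v, P_pi = evaluate(pi, P, r, gamma)
    q = r + gamma * (P @ v)                # Q[s, a]
    adv = q - v[:, None]                   # advantage A[s, a]
    # Unnormalized discounted occupancy: nu^T (I - gamma P_pi)^{-1}
    occ = np.linalg.solve((np.eye(len(nu)) - gamma * P_pi).T, nu)
    grad = occ[:, None] * pi * adv         # dJ/dtheta[s, a] for a tabular softmax
    return nu @ v, grad

def local_policy_search(P, r, gamma, nu, steps=2000, lr=0.5):
    """Plain gradient ascent from the uniform policy; returns the policy and J history."""
    theta = np.zeros(r.shape)
    history = []
    for _ in range(steps):
        J, g = objective_and_grad(theta, P, r, gamma, nu)
        history.append(J)
        theta += lr * g
    return softmax(theta), history

def optimal_objective(P, r, gamma, nu, iters=1000):
    """nu^T v* via value iteration, used only as a reference point."""
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        v = np.max(r + gamma * (P @ v), axis=1)
    return nu @ v

# Usage on a random 4-state, 2-action MDP (arbitrary illustrative numbers).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalize rows into transition probabilities
r = rng.random((S, A))
nu = np.full(S, 1.0 / S)                   # the predefined distribution over states

pi, hist = local_policy_search(P, r, gamma, nu)
print("J at start:", hist[0], "J at end:", hist[-1], "J*:", optimal_objective(P, r, gamma, nu))
```

With exact gradients, each softmax step shifts probability toward actions with positive advantage in every state, so the objective does not decrease. In large problems, the gradient would instead be estimated from samples and the policy class would not be tabular, which is the setting the paper's guarantee addresses.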
