Near Optimal Policy Optimization via REPS

03/17/2021
by Aldo Pacchiano, et al.

Since its introduction a decade ago, relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms. While REPS is well known in the community, there are no guarantees on its performance when using stochastic and gradient-based solvers. In this paper we aim to fill this gap by providing guarantees and convergence rates for the sub-optimality of a policy learned using first-order optimization methods applied to the REPS objective. We first consider the setting in which we are given access to exact gradients and demonstrate how near-optimality of the objective translates to near-optimality of the policy. We then consider the practical setting of stochastic gradients, and introduce a technique that uses generative access to the underlying Markov decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
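
The abstract does not spell out the REPS objective, so the following is only a minimal sketch of the "exact gradients" setting. It rests on several assumptions. The MDP is tabular and discounted, with known transitions `P`, rewards `r`, a strictly positive reference occupancy `q`, an initial distribution `rho`, a temperature `eta` and a discount `gamma`. REPS is taken in its entropy-regularized linear-programming form over state-action occupancies, whose dual over state potentials v is g(v) = (1/eta) log sum_{s,a} q(s,a) exp(eta A_v(s,a)) + (1 - gamma) <rho, v>, where A_v(s,a) = r(s,a) - v(s) + gamma sum_{s'} P(s'|s,a) v(s'). The sketch runs plain gradient descent on this dual and reads a policy off the resulting Gibbs occupancy. The helper names and the toy MDP are hypothetical, and this is not the authors' algorithm or code.

```python
import numpy as np


def reps_dual_and_grad(v, r, P, q, rho, eta, gamma):
    """Value and gradient of an assumed discounted REPS dual (tabular, hypothetical helper).

    g(v) = (1/eta) * log sum_{s,a} q(s,a) * exp(eta * A_v(s,a)) + (1 - gamma) * <rho, v>
    A_v(s,a) = r(s,a) - v(s) + gamma * sum_{s'} P(s'|s,a) * v(s')
    Shapes: r, q -> (S, A); P -> (S, A, S); rho, v -> (S,).
    """
    adv = r - v[:, None] + gamma * P @ v          # A_v(s, a), shape (S, A)
    logits = np.log(q) + eta * adv
    m = logits.max()                              # shift for a numerically stable log-sum-exp
    w = np.exp(logits - m)
    z = w.sum()
    mu = w / z                                    # Gibbs occupancy mu_v(s, a), sums to 1
    value = (m + np.log(z)) / eta + (1 - gamma) * rho @ v
    # The gradient is the violation of the Bellman flow constraints by mu_v:
    # (1 - gamma) * rho(t) + gamma * sum_{s,a} mu(s,a) P(t|s,a) - sum_a mu(t,a)
    grad = (1 - gamma) * rho + gamma * np.einsum("sa,sat->t", mu, P) - mu.sum(axis=1)
    return value, grad, mu


def solve_reps(r, P, q, rho, eta=1.0, gamma=0.9, iters=5000):
    """Plain gradient descent on the dual with exact gradients (hypothetical helper)."""
    v = np.zeros(r.shape[0])
    # Each row -e_s + gamma * P(.|s,a) of the map v -> A_v has norm <= 1 + gamma,
    # so the dual is (eta * (1 + gamma)**2)-smooth; use the matching 1/L step size.
    step = 1.0 / (eta * (1.0 + gamma) ** 2)
    for _ in range(iters):
        _, grad, _ = reps_dual_and_grad(v, r, P, q, rho, eta, gamma)
        v -= step * grad
    _, grad, mu = reps_dual_and_grad(v, r, P, q, rho, eta, gamma)
    policy = mu / mu.sum(axis=1, keepdims=True)   # pi(a|s) proportional to mu_v(s, a)
    return v, mu, policy, grad


# Hypothetical toy MDP: 2 states, action 0 = stay, action 1 = switch state.
S, A = 2, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
r = np.array([[0.0, 0.0], [1.0, 1.0]])   # reward 1 in state 1, 0 in state 0
q = np.full((S, A), 1.0 / (S * A))       # uniform reference occupancy
rho = np.array([0.5, 0.5])               # initial-state distribution

v, mu, policy, grad = solve_reps(r, P, q, rho, eta=5.0, gamma=0.9)
print("max flow violation:", np.abs(grad).max())  # close to 0 at convergence
print("policy:\n", policy)  # state 0 favors switch, state 1 favors stay
```

In this toy example, the recovered policy leans toward reaching and staying in the rewarding state. Because of the KL term, it still keeps some probability on the other action. A larger `eta` weakens the regularization. The fixed step size is a conservative choice that follows from the smoothness bound noted in the code comment. The paper's stochastic-gradient technique, which uses generative access to the MDP, is not reproduced here.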

research
10/29/2021

Understanding the Effect of Stochasticity in Policy Optimization

We study the effect of stochasticity in on-policy policy optimization, a...
research
11/19/2020

Provable Multi-Objective Reinforcement Learning with Generative Models

Multi-objective reinforcement learning (MORL) is an extension of ordinar...
research
07/13/2020

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used pol...
research
06/30/2020

Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid

A microgrid is an innovative system that integrates distributed energy r...
research
10/18/2021

Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient

Improving the resilience of a network protects the system from natural d...
research
02/16/2021

Improper Learning with Gradient-based Policy Optimization

We consider an improper reinforcement learning setting where the learner...
research
01/31/2022

Reinforcement Learning with Heterogeneous Data: Estimation and Inference

Reinforcement Learning (RL) has the promise of providing data-driven sup...