Near Optimal Policy Optimization via REPS

03/17/2021
by Aldo Pacchiano, et al.

Since its introduction a decade ago, relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms. While REPS is commonly known in the community, there exist no guarantees on its performance when using stochastic and gradient-based solvers. In this paper we aim to fill this gap by providing guarantees and convergence rates for the sub-optimality of a policy learned using first-order optimization methods applied to the REPS objective. We first consider the setting in which we are given access to exact gradients and demonstrate how near-optimality of the objective translates to near-optimality of the policy. We then consider the practical setting of stochastic gradients, and introduce a technique that uses generative access to the underlying Markov decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
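
To make the exact-gradient setting concrete, here is a minimal sketch of plain gradient descent on an entropy-regularized REPS dual for a tiny tabular problem. It is an illustration under stated assumptions, not the paper's algorithm: the dual form J_D(v) = (1 - gamma) <mu, v> + (1/eta) log sum_{s,a} q(s,a) exp(eta A_v(s,a)), with A_v(s,a) = r(s,a) + gamma sum_{s'} P(s'|s,a) v(s') - v(s), is a commonly used tabular formulation, and the 2-state MDP, the reference distribution `q`, the step size, and every function name are hypothetical choices made for this example.

```python
import math

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
S, A = 2, 2
gamma = 0.9                       # discount factor
eta = 2.0                         # regularization strength (assumed)
mu = [0.5, 0.5]                   # initial state distribution
r = [[1.0, 0.0], [0.0, 0.5]]      # rewards r[s][a]
P = [[[0.9, 0.1], [0.2, 0.8]],    # transitions P[s][a][s_next]
     [[0.7, 0.3], [0.1, 0.9]]]
q = [[0.25, 0.25], [0.25, 0.25]]  # reference state-action distribution

def advantage(v):
    """A_v(s,a) = r(s,a) + gamma * E[v(s')] - v(s)."""
    return [[r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(S)) - v[s]
             for a in range(A)] for s in range(S)]

def occupancy(v):
    """Candidate state-action distribution: lambda_v proportional to q * exp(eta * A_v)."""
    adv = advantage(v)
    m = max(max(row) for row in adv)  # shift by the max for numerical stability
    w = [[q[s][a] * math.exp(eta * (adv[s][a] - m)) for a in range(A)]
         for s in range(S)]
    z = sum(sum(row) for row in w)
    return [[w[s][a] / z for a in range(A)] for s in range(S)]

def dual(v):
    """Regularized dual objective J_D(v) (assumed form, see the lead-in)."""
    adv = advantage(v)
    m = max(max(row) for row in adv)
    z = sum(q[s][a] * math.exp(eta * (adv[s][a] - m))
            for s in range(S) for a in range(A))
    return (1 - gamma) * sum(mu[s] * v[s] for s in range(S)) + m + math.log(z) / eta

def dual_grad(v):
    """Exact gradient of J_D: the flow-constraint violation of lambda_v."""
    lam = occupancy(v)
    grad = []
    for t in range(S):
        inflow = sum(lam[s][a] * P[s][a][t] for s in range(S) for a in range(A))
        grad.append((1 - gamma) * mu[t] + gamma * inflow - sum(lam[t]))
    return grad

def solve(steps=5000, lr=0.1):
    """Plain gradient descent on the dual variables v."""
    v = [0.0] * S
    for _ in range(steps):
        g = dual_grad(v)
        v = [v[s] - lr * g[s] for s in range(S)]
    return v

v_hat = solve()
lam_hat = occupancy(v_hat)
# The policy is read off by normalizing the occupancy within each state.
policy = [[lam_hat[s][a] / sum(lam_hat[s]) for a in range(A)] for s in range(S)]
```

In this sketch the gradient of the dual equals the amount by which the candidate distribution violates the flow constraints, so a small gradient norm means `lam_hat` is close to a valid discounted state-action distribution. That gives one informal view of how near-optimality of the objective can carry over to the policy extracted from it.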
