Log In Sign Up

Improper Learning with Gradient-based Policy Optimization

by   Mohammadi Zaki, et al.

We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. We propose a gradient-based approach that operates over a class of improper mixtures of the controllers. The value function of the mixture and its gradient may not be available in closed-form; however, we show that we can employ rollouts and simultaneous perturbation stochastic approximation (SPSA) for explicit gradient descent optimization. We derive convergence and convergence rate guarantees for the approach assuming access to a gradient oracle. Numerical results on a challenging constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when each constituent policy at its disposal is unstable.


page 1

page 2

page 3

page 4


Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner i...

Fast Global Convergence of Policy Optimization for Constrained MDPs

We address the issue of safety in reinforcement learning. We pose the pr...

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used t...

Near Optimal Policy Optimization via REPS

Since its introduction a decade ago, relative entropy policy search (REP...

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for pol...

Off-policy Learning with Eligibility Traces: A Survey

In the framework of Markov Decision Processes, off-policy learning, that...

Robust Detection of Adaptive Spammers by Nash Reinforcement Learning

Online reviews provide product evaluations for customers to make decisio...