A Boosting Approach to Reinforcement Learning

by Nataly Brukhim et al.

We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states. This formulation succinctly captures large-scale problems, but is also known to be computationally hard in its general form. Previous approaches attempt to circumvent the computational hardness by assuming structure in either the transition function or the value function, or by relaxing the solution guarantee to a local optimality condition. We consider the methodology of boosting, borrowed from supervised learning, for converting weak learners into an accurate policy. The notion of weak learning we study is that of sample-based approximate optimization of linear functions over policies. Under this assumption of weak learnability, we give an efficient algorithm that is capable of improving the accuracy of such weak learning methods until global optimality is reached. We prove sample complexity and running time bounds on our method that are polynomial in the natural parameters of the problem: approximation guarantee, discount factor, distribution mismatch, and number of actions. In particular, our bound does not depend on the number of states. A technical difficulty in applying previous boosting results is that the value function over policy space is not convex. We show how to use a non-convex variant of the Frank-Wolfe method, coupled with recent advances in gradient boosting that allow incorporating a weak learner with a multiplicative approximation guarantee, to overcome the non-convexity and attain global convergence.
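To make the Frank-Wolfe-style boosting loop concrete, here is a minimal illustrative sketch, not the paper's algorithm: a mixture policy is repeatedly linearized, a weak learner (here an exact per-state argmax stand-in for the paper's approximate sample-based oracle) returns a policy maximizing the linearized objective, and a convex-combination step mixes it in. The function names `weak_learner` and `frank_wolfe_boost` and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def weak_learner(grad):
    # Stand-in for the paper's weak learning oracle: for each state, pick
    # the action maximizing the linearized objective <grad, policy>.
    # The paper only assumes this succeeds up to a multiplicative
    # approximation; here we use the exact argmax for illustration.
    n_states, n_actions = grad.shape
    pi = np.zeros_like(grad)
    pi[np.arange(n_states), grad.argmax(axis=1)] = 1.0
    return pi

def frank_wolfe_boost(grad_fn, n_states, n_actions, rounds=50):
    """Non-convex Frank-Wolfe-style boosting over policies (illustrative).

    pi is a stochastic policy (each row sums to 1). Each round, the weak
    learner returns a policy maximizing the linearized objective, and we
    move a small step toward it.
    """
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for t in range(rounds):
        eta = 2.0 / (t + 2)            # standard Frank-Wolfe step size
        g = grad_fn(pi)                # gradient of the value w.r.t. the policy
        h = weak_learner(g)            # weak learner's linear-optimization step
        pi = (1 - eta) * pi + eta * h  # convex combination keeps pi a policy
    return pi
```

As a sanity check, with a fixed linear objective (constant gradient equal to a reward matrix), the iterates concentrate on the per-state reward-maximizing actions while every row remains a valid probability distribution.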


