Sparse Q-learning with Mirror Descent

10/16/2012
by Sridhar Mahadevan, et al.

This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimizing convex functions in high-dimensional spaces. Unlike traditional gradient methods, mirror descent performs the gradient update of the weights in a dual space and maps the result back to the primal space; the two spaces are linked by a Legendre transform. Mirror descent can also be viewed as a proximal algorithm in which the proximal term is a Bregman divergence induced by a distance-generating function. A new class of proximal-gradient temporal-difference (TD) methods based on different Bregman divergences is presented; these methods generalize regular TD learning, which corresponds to the squared Euclidean distance, and are more powerful than it. The Bregman divergences studied include those generated by p-norm functions and the Mahalanobis distance based on the covariance of sample gradients. A new family of sparse mirror-descent reinforcement learning methods is proposed, which can find sparse fixed points of an ℓ1-regularized Bellman equation at significantly lower computational cost than previous methods based on second-order matrix computations. An experimental study of mirror-descent reinforcement learning is presented using discrete and continuous Markov decision processes.
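The abstract describes the sparse mirror-descent update only in words. As a rough illustration, and not the authors' exact algorithm, the sketch below implements a sparse mirror-descent TD(0) update for linear policy evaluation under these assumptions: the dual weights θ accumulate TD steps α·δ·φ(s); the ℓ1 penalty λ is applied by soft-thresholding θ after each step; and the primal weights are recovered with the p-norm link w = ∇(½‖θ‖_p²), with p = 2 ln d as a commonly used choice. The helper names (`pnorm_link`, `soft_threshold`, `sparse_md_td0`, `random_walk_episodes`) and the five-state random-walk problem are hypothetical. The paper itself targets Q-learning (control), which this sketch does not cover.

```python
import numpy as np


def pnorm_link(theta: np.ndarray, p: float) -> np.ndarray:
    """Gradient of 0.5 * ||theta||_p^2; maps dual weights to primal weights.

    For p = 2 this is the identity, so the update below reduces to plain TD(0).
    """
    norm = np.linalg.norm(theta, ord=p)
    if norm == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)


def soft_threshold(theta: np.ndarray, tau: float) -> np.ndarray:
    """Proximal map of tau * ||.||_1: shrink every coordinate toward zero by tau."""
    return np.sign(theta) * np.maximum(np.abs(theta) - tau, 0.0)


def sparse_md_td0(episodes, features, n_features, alpha=0.01, gamma=1.0, p=2.0, lam=0.0):
    """Hypothetical sparse mirror-descent TD(0) for linear value estimation.

    episodes: list of episodes, each a list of (s, r, s_next, done) transitions.
    features: maps a non-terminal state to a length-n_features array.
    """
    theta = np.zeros(n_features)  # dual weights
    w = np.zeros(n_features)      # primal weights, w = pnorm_link(theta, p)
    for episode in episodes:
        for s, r, s_next, done in episode:
            phi = features(s)
            v_next = 0.0 if done else features(s_next) @ w
            delta = r + gamma * v_next - phi @ w        # TD error
            theta = theta + alpha * delta * phi         # gradient step in the dual space
            theta = soft_threshold(theta, alpha * lam)  # l1 shrinkage (assumed form)
            w = pnorm_link(theta, p)                    # map back to the primal space
    return w


def random_walk_episodes(n_episodes, n_states=5, seed=0):
    """Hypothetical test problem: random walk over states 0..n_states-1, starting
    in the middle; leaving on the right pays 1, leaving on the left pays 0."""
    rng = np.random.default_rng(seed)
    episodes = []
    for _ in range(n_episodes):
        s, episode = n_states // 2, []
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0 or s_next >= n_states:
                episode.append((s, float(s_next >= n_states), None, True))
                break
            episode.append((s, 0.0, s_next, False))
            s = s_next
        episodes.append(episode)
    return episodes


if __name__ == "__main__":
    n = 5

    def tabular(s):
        return np.eye(n)[s]  # one-hot features

    episodes = random_walk_episodes(3000, n)
    p = max(2.0, 2 * np.log(n))  # p = O(log d): an assumed, commonly used choice
    w = sparse_md_td0(episodes, tabular, n, alpha=0.01, p=p, lam=0.0)
    print(np.round(w, 2))  # compare with the true values 1/6, ..., 5/6
```

With p = 2 and λ = 0 the link is the identity and the update is ordinary TD(0). Larger p moves the geometry toward ℓ1, and λ > 0 drives small dual coordinates exactly to zero; because the link maps θ_j = 0 to w_j = 0, those primal weights are zero as well, which is where the sparsity comes from.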
