On the connection between Bregman divergence and value in regularized Markov decision processes

10/21/2022
by Brendan O'Donoghue, et al.

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
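
As a rough illustration of the flavor of such a relationship (not the paper's general statement), the one-step entropy-regularized case already admits a simple identity: taking negative entropy as the Bregman generator, the value suboptimality of a policy equals the temperature times the KL divergence from that policy to the softmax-optimal policy. The Python sketch below checks this bandit special case numerically on toy data; the temperature, reward vector, and policies are all assumptions for the example, and the general statement for full MDPs and arbitrary Bregman divergences is the subject of the note itself.

    # Minimal numerical sketch of the one-step, entropy-regularized case
    # (toy data; illustrative only, not the construction used in the paper).
    import numpy as np

    rng = np.random.default_rng(0)
    tau = 0.5                   # regularization temperature (assumed)
    r = rng.normal(size=5)      # one-step rewards for 5 actions (toy data)

    # Softmax-optimal policy and the regularized optimal value.
    pi_star = np.exp(r / tau) / np.sum(np.exp(r / tau))
    v_star = tau * np.log(np.sum(np.exp(r / tau)))

    # An arbitrary suboptimal policy and its entropy-regularized value.
    pi = rng.dirichlet(np.ones(5))
    v_pi = pi @ r - tau * np.sum(pi * np.log(pi))

    # KL(pi || pi_star): the Bregman divergence generated by negative entropy.
    kl = np.sum(pi * np.log(pi / pi_star))

    print(v_star - v_pi)   # suboptimality of pi
    print(tau * kl)        # matches tau * KL(pi || pi_star)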

Related research

Detecting Spiky Corruption in Markov Decision Processes (06/30/2019)
Current reinforcement learning methods fail if the reward function is im...

Formalising the Foundations of Discrete Reinforcement Learning in Isabelle/HOL (12/11/2021)
We present a formalisation of finite Markov decision processes with rewa...

√(n)-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank (09/05/2019)
In this paper, we consider the problem of online learning of Markov deci...

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation (12/10/2019)
Q-learning with neural network function approximation (neural Q-learning...

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence (05/24/2021)
Policy optimization, which learns the policy of interest by maximizing t...

The Value Function Polytope in Reinforcement Learning (01/31/2019)
We establish geometric and topological properties of the space of value ...

Is the Bellman residual a bad proxy? (06/24/2016)
This paper aims at theoretically and empirically comparing two standard ...
