Blackwell Online Learning for Markov Decision Processes

12/28/2020
by Tao Li, et al.

This work provides a novel interpretation of Markov decision processes (MDPs) from an online optimization viewpoint. In this context, the policy of the MDP is viewed as the decision variable, while the corresponding value function is treated as payoff feedback from the environment. Based on this interpretation, we construct a Blackwell game induced by the MDP, which bridges regret minimization, Blackwell approachability theory, and learning theory for MDPs. Specifically, drawing on approachability theory, we propose (1) Blackwell value iteration for offline planning and (2) Blackwell Q-learning for online learning in MDPs, both of which are shown to converge to the optimal solution. Our theoretical guarantees are corroborated by numerical experiments.
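The abstract names the two algorithms without detail; their Blackwell updates are defined in the full text. For orientation, below is a minimal Python sketch of the classical counterparts they build on: standard value iteration (offline planning) and tabular Q-learning (online learning) on a toy MDP. The transition kernel P, reward table R, and hyperparameters (gamma, alpha, eps) are illustrative assumptions, not the paper's experimental setup, and the approachability-based update that distinguishes the Blackwell variants is not reproduced here.

```python
# Sketch of the classical algorithms the paper's Blackwell variants
# generalize. All quantities below (P, R, gamma, alpha, eps, iteration
# counts) are illustrative assumptions for a random toy MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 3, 2, 0.9
# P[s, a, s'] = transition probability; R[s, a] = expected reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# --- Classical value iteration (offline planning baseline). ---
V = np.zeros(n_states)
for _ in range(500):
    Q_vi = R + gamma * P @ V       # Q_vi[s, a] = r(s,a) + gamma * E[V(s')]
    V_new = Q_vi.max(axis=1)       # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

# --- Classical tabular Q-learning (online learning baseline). ---
Q = np.zeros((n_states, n_actions))
s, alpha, eps = 0, 0.1, 0.1
for _ in range(20_000):
    # Epsilon-greedy action selection, then one temporal-difference step.
    a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
    s_next = rng.choice(n_states, p=P[s, a])
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next

print("V* from value iteration:", V)
print("max_a Q from Q-learning (approximate):", Q.max(axis=1))
```

On a fixed tabular MDP both loops approach the same optimal values; per the abstract, the paper shows that its approachability-driven counterparts of these two updates also converge to the optimal solution.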


Related research

05/19/2019
Online Convex Optimization in Adversarial Markov Decision Processes
We consider online learning in episodic loop-free Markov decision proces...

05/27/2021
Exploitation vs Caution: Risk-sensitive Policies for Offline Learning
Offline model learning for planning is a branch of machine learning that...

04/07/2016
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Information-theoretic principles for learning and acting have been propo...

12/28/2021
Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
We consider large-scale Markov decision processes with an unknown cost f...

04/26/2022
BATS: Best Action Trajectory Stitching
The problem of offline reinforcement learning focuses on learning a good...

06/14/2019
Online Allocation and Pricing: Constant Regret via Bellman Inequalities
We develop a framework for designing tractable heuristics for Markov Dec...

04/05/2016
Bounded Optimal Exploration in MDP
Within the framework of probably approximately correct Markov decision p...
