- Online Convex Optimization in Adversarial Markov Decision Processes
- On Online Learning in Kernelized Markov Decision Processes
- On the Complexity of Value Iteration
- Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
- Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering
- Is the Bellman residual a bad proxy?
- Algorithms for Batch Hierarchical Reinforcement Learning
Blackwell Online Learning for Markov Decision Processes
This work provides a novel interpretation of Markov Decision Processes (MDPs) from the online optimization viewpoint. In this online optimization context, the policy of the MDP is viewed as the decision variable, while the corresponding value function is treated as payoff feedback from the environment. Based on this interpretation, we construct a Blackwell game induced by the MDP, which bridges the gap among regret minimization, Blackwell approachability theory, and learning theory for MDPs. Specifically, drawing on approachability theory, we propose 1) Blackwell value iteration for offline planning and 2) Blackwell Q-learning for online learning in MDPs, both of which are shown to converge to the optimal solution. Our theoretical guarantees are corroborated by numerical experiments.
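The abstract does not spell out the update rules of Blackwell value iteration or Blackwell Q-learning, so the sketch below shows only the two classical algorithms they build on: standard value iteration for offline planning and tabular Q-learning for online learning. Everything in the snippet (the two-state MDP given by `P` and `R`, the discount factor, step counts, and learning rate) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used purely for illustration; the
# transition tensor P[s, a, s'] and reward table R[s, a] are made-up numbers.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.3, 0.7], [0.6, 0.4]],   # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
])
gamma = 0.9                     # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Classical value iteration: iterate the Bellman optimality operator
    V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s, a, s') V(s') ]."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)             # Q[s, a]; P @ V sums over s'
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # value function and greedy policy
        V = V_new

def q_learning(P, R, gamma, steps=20000, alpha=0.1, eps=0.1, seed=0):
    """Classical tabular Q-learning with epsilon-greedy exploration,
    simulating transitions from the same hypothetical MDP."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(n_states, p=P[s, a])       # sample next state
        td_target = R[s, a] + gamma * Q[s_next].max()  # one-step TD target
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q

V_star, pi_star = value_iteration(P, R, gamma)
print("value iteration: V* =", V_star, "greedy policy =", pi_star)
print("Q-learning:      Q  =", q_learning(P, R, gamma))
```

In the paper's framing, updates of this kind are reinterpreted as strategies in the induced Blackwell game, but the precise Blackwell variants require the full text.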