On Online Learning in Kernelized Markov Decision Processes

11/04/2019
by Sayak Ray Chowdhury, et al.

We develop low-regret algorithms, based on kernel approximation techniques, for learning episodic Markov decision processes. The algorithms follow both the Upper Confidence Bound (UCB) and the Posterior (Thompson) Sampling (PSRL) philosophies, and they work in the general setting of continuous state and action spaces, under the assumption that the true, unknown transition dynamics have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS).
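As a rough illustration of the kernel machinery at work, the sketch below (Python, NumPy only) shows how a Gaussian-process posterior over state-action pairs yields the optimistic "mean plus confidence width" estimates that UCB-style algorithms of this kind rely on. Everything here is a simplifying assumption for exposition: the function names (rbf_kernel, gp_posterior, ucb_next_state), the squared-exponential kernel, the scalar next-state target, and the exploration parameter beta are illustrative choices, not the paper's actual algorithm.

import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel on concatenated (state, action) vectors.
    d = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * d / lengthscale**2)

def gp_posterior(Z_train, y_train, Z_query, noise=0.1, lengthscale=1.0):
    # Standard GP regression: posterior mean and std of the (assumed scalar)
    # next-state statistic at each query (state, action) pair.
    K = rbf_kernel(Z_train, Z_train, lengthscale) + noise**2 * np.eye(len(Z_train))
    K_inv = np.linalg.inv(K)
    k_star = rbf_kernel(Z_train, Z_query, lengthscale)        # shape (n, m)
    mean = k_star.T @ (K_inv @ y_train)                        # shape (m,)
    var = 1.0 - np.sum(k_star * (K_inv @ k_star), axis=0)      # k(z, z) = 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

def ucb_next_state(Z_train, y_train, Z_query, beta=2.0):
    # Optimistic estimate: posterior mean plus beta times the posterior std,
    # where beta controls the width of the confidence set.
    mean, std = gp_posterior(Z_train, y_train, Z_query)
    return mean + beta * std

# Toy usage: 20 observed (state, action) pairs with a scalar next-state statistic.
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 3))
y = np.sin(Z[:, 0])
print(ucb_next_state(Z, y, Z[:5]))

A PSRL-style variant would, instead of taking the upper confidence envelope, draw a single plausible dynamics model from the same posterior at the start of each episode and act greedily with respect to it.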


Related research

12/30/2015 · A Notation for Markov Decision Processes
This paper specifies a notation for Markov decision processes....

05/21/2018 · Online Learning in Kernelized Markov Decision Processes
We consider online learning for minimizing regret in unknown, episodic M...

03/12/2013 · Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
We study the problem of learning Markov decision processes with finite s...

11/05/2021 · Perturbational Complexity by Distribution Mismatch: A Systematic Analysis of Reinforcement Learning in Reproducing Kernel Hilbert Space
Most existing theoretical analysis of reinforcement learning (RL) is lim...

10/26/2020 · Expert Selection in High-Dimensional Markov Decision Processes
In this work we present a multi-armed bandit framework for online expert...

04/19/2018 · Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems
We consider Markov Decision Problems defined over continuous state and a...

05/04/2020 · Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
In this paper, a rather general online problem called dynamic resource a...
