A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

by Omar Darwiche Domingues, et al.

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP over time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
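To make the idea of a time-dependent kernel concrete, here is a minimal Python sketch of a kernel-weighted reward estimate in which each past sample is weighted both by its distance to the query point and by its age. It is an illustration under stated assumptions, not the KeRNS algorithm itself: the function names, the Gaussian spatial kernel, the regularizer `beta`, and all default values are hypothetical choices that do not come from the abstract. The sketch only shows how a sliding window and exponential discounting can arise as two choices of the same temporal weight.

```python
import math


def temporal_weight(age, mode="discount", eta=0.9, window=10):
    """Weight of a sample observed `age` episodes ago (hypothetical helper)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if mode == "window":
        # Sliding window: keep only the most recent `window` episodes.
        return 1.0 if age < window else 0.0
    if mode == "discount":
        # Exponential discounting: older samples decay geometrically.
        return eta ** age
    raise ValueError(f"unknown mode: {mode}")


def spatial_weight(dist, bandwidth=0.5):
    """Gaussian kernel on the metric distance (an assumed kernel choice)."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def estimate_reward(query, samples, now, metric, mode="discount",
                    eta=0.9, window=10, bandwidth=0.5, beta=0.01):
    """Kernel-weighted average of past rewards around `query`.

    `samples` is a list of (point, reward, episode) tuples and `metric`
    is a distance function on points. `beta` is an assumed regularizer
    that keeps the denominator positive when all weights vanish.
    """
    num, den = 0.0, beta
    for point, reward, episode in samples:
        w = (spatial_weight(metric(query, point), bandwidth)
             * temporal_weight(now - episode, mode, eta, window))
        num += w * reward
        den += w
    return num / den


# Toy data: the reward at point 0.0 changes from 0 to 1 after episode 1.
samples = [(0.0, 0.0, 0), (0.0, 0.0, 1), (0.0, 1.0, 2), (0.0, 1.0, 3)]
dist = lambda a, b: abs(a - b)
windowed = estimate_reward(0.0, samples, 4, dist, mode="window", window=2)
discounted = estimate_reward(0.0, samples, 4, dist, mode="discount", eta=0.5)
```

On this toy data both estimates lean toward the recent reward of 1 rather than the unweighted average of 0.5, which is the behavior one would want after a change in the environment.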




