Autonomous exploration for navigating in non-stationary CMPs

10/18/2019
by Pratik Gajane, et al.

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) whose transition probabilities may change abruptly. For this setting, we propose a performance measure called exploration steps, which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the number of exploration steps in terms of the number of changes.
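The abstract does not describe MNM's internals, so the sketch below is only a rough illustration of the kind of meta-algorithm the setting calls for: a stationary exploration learner that is periodically restarted so that stale knowledge of the transition probabilities is discarded, while a counter tallies exploration steps (time steps at which the learner lacks sufficient knowledge). Every name and threshold here (BaseExplorer, meta_learner, the visit-count test, restart_period) is hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a restart-style meta-algorithm for a non-stationary CMP.
# This is NOT the MNM algorithm from the paper; it only illustrates the setting:
# wrap a stationary exploration learner and reset it periodically so that
# knowledge invalidated by an abrupt change is eventually discarded.

import random


class BaseExplorer:
    """Stand-in for a stationary autonomous-exploration learner (hypothetical)."""

    def __init__(self, num_states):
        self.visit_counts = [0] * num_states

    def knows_environment(self, state):
        # Toy sufficiency test: the learner "knows" a state after enough visits.
        return self.visit_counts[state] >= 10

    def act(self, state):
        self.visit_counts[state] += 1
        return random.randrange(2)  # placeholder action choice


def meta_learner(env_step, num_states, horizon, restart_period):
    """Run a base explorer, restarting it every `restart_period` steps.

    Returns the number of exploration steps: time steps at which the
    learner lacked sufficient knowledge to navigate efficiently.
    """
    learner = BaseExplorer(num_states)
    exploration_steps = 0
    state = 0
    for t in range(horizon):
        if t > 0 and t % restart_period == 0:
            learner = BaseExplorer(num_states)  # forget possibly stale knowledge
        if not learner.knows_environment(state):
            exploration_steps += 1
        state = env_step(state, learner.act(state))
    return exploration_steps


if __name__ == "__main__":
    # Toy environment whose transitions are randomized, standing in for a CMP
    # with abruptly changing transition probabilities.
    def env_step(state, action, num_states=5):
        drift = 1 if random.random() < 0.5 else -1
        return (state + action * drift) % num_states

    print(meta_learner(env_step, num_states=5, horizon=1000, restart_period=200))
```

A bound of the kind stated in the abstract would then relate the count returned above to the number of abrupt changes over the horizon; the restart schedule is the simplest such mechanism, and the paper's actual meta-algorithm may be more refined.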
