Active Model Estimation in Markov Decision Processes

03/06/2020
by Jean Tarbouriech, et al.

We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first algorithm to learn an ϵ-accurate estimate of the dynamics, and provide its sample complexity analysis. While this algorithm enjoys strong guarantees in the large-sample regime, it tends to perform poorly in the early stages of exploration. To address this issue, we propose an algorithm based on maximum weighted entropy, a heuristic motivated both by intuition and by our theoretical analysis. The main idea is to cover the entire state-action space with weight proportional to the noise in each pair's transitions. Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small-sample regime, while achieving asymptotic performance similar to that of the original algorithm.
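To make the noise-proportional weighting idea concrete, here is a minimal Python sketch of such a sampling rule. It is an illustration, not the authors' algorithm: it uses the entropy of the empirical next-state distribution as the noise estimate for each state-action pair and assumes generative-model access (any pair can be queried directly, sidestepping the navigation aspect of the MDP problem); the names `empirical_entropy`, `noise_weights`, and `next_query` are ours.

```python
import numpy as np

def empirical_entropy(counts):
    """Entropy (nats) of the empirical next-state distribution of one (s, a)."""
    total = counts.sum()
    if total == 0:
        return np.log(len(counts))  # optimistic max-entropy value for unvisited pairs
    p = counts / total
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def noise_weights(transition_counts):
    """One weight per (s, a), proportional to the estimated transition noise."""
    S, A, _ = transition_counts.shape
    w = np.array([[empirical_entropy(transition_counts[s, a]) for a in range(A)]
                  for s in range(S)])
    w = np.maximum(w, 1e-8)  # keep every pair sampleable
    return w / w.sum()

def next_query(transition_counts, visit_counts):
    """Greedy rule: query the (s, a) pair most undersampled relative to its
    noise-proportional target frequency."""
    target = noise_weights(transition_counts)
    freq = visit_counts / max(visit_counts.sum(), 1)
    s, a = np.unravel_index(np.argmax(target - freq), target.shape)
    return int(s), int(a)
```

Unvisited pairs receive the maximum-entropy value, so this rule covers everything at least once before concentrating samples on the noisiest transitions, which mirrors the intended small-sample behavior of the weighted-entropy heuristic.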


Related research

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity (06/10/2020)
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algo...

Provably Efficient Maximum Entropy Exploration (12/06/2018)
Suppose an agent is in a (possibly unknown) Markov decision process (MDP...

Active Exploration in Markov Decision Processes (02/28/2019)
We introduce the active exploration problem in Markov decision processes...

Fast Rates for Maximum Entropy Exploration (03/14/2023)
We consider the reinforcement learning (RL) setting, in which the agent ...

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions (05/25/2018)
We consider reinforcement learning in changing Markov Decision Processes...

Smoother Entropy for Active State Trajectory Estimation and Obfuscation in POMDPs (08/19/2021)
We study the problem of controlling a partially observed Markov decision...

On Maximum a Posteriori Estimation of Hidden Markov Processes (06/10/2009)
We present a theoretical analysis of Maximum a Posteriori (MAP) sequence...
