Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

06/23/2020
by Dongruo Zhou, et al.

Modern reinforcement learning tasks often involve large state and action spaces. To handle them efficiently, one often uses a predefined feature mapping to represent states and actions in a low-dimensional space. In this paper, we study reinforcement learning with feature mapping for discounted Markov Decision Processes (MDPs). We propose a novel algorithm that makes use of the feature mapping and attains an Õ(d√(T)/(1-γ)^2) regret, where d is the dimension of the feature space, T is the time horizon and γ is the discount factor of the MDP. To the best of our knowledge, this is the first polynomial regret bound that requires neither access to a generative model nor strong assumptions such as ergodicity of the MDP. By constructing a special class of MDPs, we also show that for any algorithm, the regret is lower bounded by Ω(d√(T)/(1-γ)^1.5). Together, our upper and lower bounds suggest that the proposed reinforcement learning algorithm is near-optimal up to a (1-γ)^-0.5 factor.
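
As a quick check of the near-optimality claim, dividing the stated upper bound by the stated lower bound leaves only the discount-dependent gap. This is a sketch that ignores constants and the logarithmic factors hidden by the Õ notation:

```latex
\frac{d\sqrt{T}\,/\,(1-\gamma)^{2}}{d\sqrt{T}\,/\,(1-\gamma)^{1.5}}
  = (1-\gamma)^{1.5-2}
  = (1-\gamma)^{-0.5}
```

For instance, with γ = 0.99 (an illustrative value, not one taken from the paper) the gap is (0.01)^-0.5 = 10, independent of d and T.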

research
03/15/2014

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision...
research
02/12/2020

Regret Bounds for Discounted MDPs

Recently, it has been shown that carefully designed reinforcement learni...
research
11/11/2016

Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

Designing effective exploration-exploitation algorithms in Markov decisi...
research
02/21/2023

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

In this paper, we revisit the regret of undiscounted reinforcement learn...
research
02/06/2020

Near-optimal Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms for the Non-episodic Setting

We study reinforcement learning in factored Markov decision processes (F...
research
03/25/2017

Exploration--Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended ac...
research
06/29/2014

Thompson Sampling for Learning Parameterized Markov Decision Processes

We consider reinforcement learning in parameterized Markov Decision Proc...
