Provably Efficient Reinforcement Learning with Aggregated States

12/13/2019
by Shi Dong, et al.

We establish that an optimistic variant of Q-learning applied to a finite-horizon episodic Markov decision process with an aggregated state representation incurs regret Õ(√(H^5 M K) + ϵ HK), where H is the horizon, M is the number of aggregate states, K is the number of episodes, and ϵ is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions. To the best of our knowledge, this is the first such result pertaining to a reinforcement learning algorithm applied with nontrivial value function approximation without any restrictions on the Markov decision process.
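The abstract does not spell out the algorithm, but a minimal sketch of what optimistic Q-learning over aggregate states could look like is given below. Everything in it is an assumption made for illustration: the episodic environment interface (env.reset(), env.step(), env.num_actions), the aggregation map phi from state-action-step triples to aggregate indices, and the Hoeffding-style learning rate and exploration bonus are all placeholders in the spirit of standard optimistic Q-learning analyses, not the authors' exact specification.

```python
import numpy as np

def aggregated_optimistic_q_learning(env, phi, M, H, K, c=1.0, delta=0.1):
    """Sketch of optimistic Q-learning with an aggregated state representation.

    env   : hypothetical episodic environment with reset() -> state,
            step(a) -> (next_state, reward), and an integer num_actions
    phi   : hypothetical map (state, action, step) -> aggregate index in {0, ..., M-1}
    M, H, K : number of aggregate states, horizon, number of episodes
    c, delta : illustrative bonus scale and confidence parameter
    """
    # Optimistic initialization: every aggregate Q-value starts at the maximum return H.
    Q = np.full((H, M), float(H))
    N = np.zeros((H, M), dtype=int)  # visit counts per (step, aggregate state)

    total_reward = 0.0
    for _ in range(K):
        state = env.reset()
        for h in range(H):
            # Act greedily with respect to the optimistic aggregated Q-values.
            a = max(range(env.num_actions), key=lambda act: Q[h, phi(state, act, h)])
            next_state, reward = env.step(a)
            total_reward += reward

            m = phi(state, a, h)
            N[h, m] += 1
            t = N[h, m]
            alpha = (H + 1) / (H + t)  # step size of the usual optimistic Q-learning form
            bonus = c * np.sqrt(H**3 * np.log(M * H * K / delta) / t)  # illustrative optimism bonus

            # Value of the next step, clipped at H; zero beyond the horizon.
            v_next = 0.0
            if h + 1 < H:
                v_next = min(H, max(Q[h + 1, phi(next_state, act, h + 1)]
                                    for act in range(env.num_actions)))

            # Incremental update of the aggregate Q-value toward the optimistic target.
            Q[h, m] = (1 - alpha) * Q[h, m] + alpha * (reward + v_next + bonus)
            state = next_state

    return Q, total_reward
```

The point the sketch is meant to illustrate is the one made in the abstract: the table Q has only H x M entries, so memory and exploration effort scale with the number of aggregate states M rather than with the numbers of raw states or actions, while the aggregation error ϵ enters the regret through the ϵHK term.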
