Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

09/28/2019
by Wesley Cowan, et al.

In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices for the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), that involves solving a single equation of one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method based on posterior sampling (MDP-PS).
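The key computational point above is that the MDP-UCB index reduces to a fixed-size root-finding problem, a system of two non-linear equations in two unknowns, independent of the MDP's state-space cardinality. The sketch below illustrates the kind of 2x2 Newton iteration such a reduction enables; the placeholder equations are hypothetical and are not the actual index equations of Burnetas and Katehakis (1997).

```python
def solve_2x2(f, jac, x0, y0, tol=1e-10, max_iter=50):
    """Newton's method for a 2-equation, 2-unknown system.

    f(x, y)   -> (f1, f2)
    jac(x, y) -> ((df1/dx, df1/dy), (df2/dx, df2/dy))
    """
    x, y = x0, y0
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        (a, b), (c, d) = jac(x, y)
        det = a * d - b * c
        # Solve J @ (dx, dy) = (-f1, -f2) by Cramer's rule;
        # for a 2x2 system this is constant-time, which is the whole
        # point of reducing the index computation to this size.
        dx = (-f1 * d + f2 * b) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
        if abs(dx) < tol and abs(dy) < tol:
            break
    return x, y

# Placeholder system (NOT the paper's index equations):
#   x^2 + y^2 = 4,   x * y = 1
f = lambda x, y: (x * x + y * y - 4.0, x * y - 1.0)
jac = lambda x, y: ((2 * x, 2 * y), (y, x))
x, y = solve_2x2(f, jac, 2.0, 0.5)
```

The one-variable MDP-DMED equation mentioned above admits an even simpler treatment (e.g., bisection or scalar Newton), since only a single root need be located per index update.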


