1 Introduction
Reinforcement Learning (RL) refers to machine learning (ML) techniques designed for sequential decision making, in which an agent must “learn” a policy that maximizes a reward (or minimizes a cost) criterion when some parameters of the model are not known in advance; cf.
Bertsekas [8], Sutton and Barto [43], Mohri et al. [34], Alpaydin [4], Tewari and Bartlett [46, 47], Ortner et al. [37]. Reinforcement learning is experiencing significant growth in recognition due to successful applications in many areas, cf. Wiering [52], Russo and Van Roy [39], Chang et al. [11], Neu et al. [36], Munos et al. [35], Szepesvári [44, 45], Filippi et al. [19], and Tewari and Bartlett [46, 47]. In this paper we consider the basic version of a probabilistic sequential decision system: the discrete-time, finite state and action Markovian decision process (MDP), cf. Dynkin and Yushkevich [17]. We first give a very brief survey of the state of the art in computing optimal data-driven (adaptive) policies for MDPs with unknown transition probabilities. We then compare the performance of the classic UCB policy of Burnetas and Katehakis [9] with a new policy developed herein, which we call MDP Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS). The MDP-DMED algorithm is inspired by the DMED method for the multi-armed bandit problem developed in Honda and Takemura [24, 25], and is based on estimating the optimal rates at which actions should be taken. The MDP-PS method is based on ideas of greedy posterior sampling that go back to Thompson [48], cf. Osband and Van Roy [38]. Indeed, many modern ideas of RL originate in work done for the multi-armed bandit problem, cf. Gittins [22], Gittins et al. [23], Auer et al. [6], Whittle [51], Weber [50], Villar et al. [49], Sonin [41], Sonin and Steinberg [42], Mahajan and Teneketzis [33], Katehakis and Veinott Jr. [30], Katehakis and Rothblum [29], Katehakis and Derman [28]. Some additional related work and areas of potential application are contained in Cowan and Katehakis [12], Cowan and Katehakis [14], Azar et al. [7], Katehakis et al. [31], Cowan and Katehakis [15], Abbeel and Ng [1], Ferreira et al. [18], Jaksch et al. [27], Asmussen and Glynn [5].
2 Formulation
A finite MDP is specified by a quadruple $(S, A, R, P)$, where $S$ is the state space, $A = \{A(x)\}_{x \in S}$ is the action space, with $A(x)$ being the set of admissible actions (or controls) in state $x$, $R = \{r(x, a)\}$ is the reward structure, and $P = \{p(y \mid x, a)\}$ is the transition law. Here $r(x, a)$ and $p(y \mid x, a)$ are, respectively, the one-step expected reward and the transition probability from state $x$ to state $y$ under action $a$. For extensions regarding state and action spaces and continuous time we refer to [13], Lerma [21], and Dynkin and Yushkevich [17], and references therein.
When all elements of $(S, A, R, P)$ are known, the model is said to be an MDP with complete information (CIMDP). In this case, optimal policies can be obtained via the appropriate version of the optimality equations, given the prevailing optimization criterion, state, action, and time conditions, and regularity assumptions, cf. Lerma [21], Dekker et al. [16], Dynkin and Yushkevich [17].
When some of the elements of $(S, A, R, P)$ are unknown, the model is said to be an MDP with incomplete or partial information (PIMDP).
For the body of the paper, we consider the following partial-information model: each transition probability vector $p(\cdot \mid x, a)$ is taken to be an element of the parameter space $\Theta$, that is, the space of all $|S|$-dimensional probability vectors with strictly positive components. The restriction that each transition probability be strictly positive is simply to ensure that, for any control policy, the resulting Markov chain is irreducible. Additionally, for the body of the paper we will take the reward structure $R$ to be known and constant. Unknown or probabilistic reward structures are to be considered in future work. Under this model, we define a sequence of state-valued random variables $\{X_t\}_{t \geq 0}$ representing the sequence of states of the MDP (taking $X_0 = x_0$ as a given initial state), and action-valued random variables $\{A_t\}_{t \geq 0}$, with $A_t$ the action taken by the controller at time $t$ when the MDP is in state $X_t$. It is convenient to define a control policy $\pi$ as a (potentially random) history-dependent sequence of actions such that $A_t \in A(X_t)$. We may then define the value of a policy as the total expected reward over a given horizon of action:

(1) $V_\pi(x_0, T) = \mathbb{E}_\pi\Big[\sum_{t=0}^{T-1} r(X_t, A_t)\Big].$
Let $\Pi$ be the set of all feasible MDP policies $\pi$. We are interested in policies that maximize the expected reward from the MDP, in particular policies that are capable of maximizing the expected reward irrespective of the initial uncertainty that exists about the underlying MDP dynamics (i.e., for all transition laws $P$ under consideration). It is convenient then to define $V^*(x_0, T) = \sup_{\pi \in \Pi} V_\pi(x_0, T)$, the optimal value under complete information. We may then define the “regret” as the expected loss due to ignorance of the underlying dynamics,

(2) $R_\pi(x_0, T) = V^*(x_0, T) - V_\pi(x_0, T).$
We are interested in Uniformly Fast (cf. Burnetas and Katehakis [9]) policies, those that achieve $R_\pi(x_0, T) = o(T^\alpha)$ for every $\alpha > 0$ and all feasible transition laws $P$. In this case, despite the controller’s initial lack of knowledge about the underlying dynamics, she can be assured that her expected loss due to ignorance grows not only sublinearly over time, but slower than any power of $T$. It is shown in Burnetas and Katehakis [9] that any uniformly fast policy has a strict lower bound of logarithmic asymptotic growth of regret, with a bound on the order coefficient in terms of the unknown transition law $P$ and the known reward structure $R$. Policies that achieve this lower bound are Asymptotically Optimal, cf. Burnetas and Katehakis [9]; see also Cowan and Katehakis [12], Cowan et al. [13], Burnetas and Katehakis [10], and references therein.
It is additionally convenient to define the following notation: with a given policy understood, we denote by $T_x(t)$ the number of times the MDP has been in state $x$ in the first $t$ periods; by $T_{x,a}(t)$ the number of times the MDP has been in state $x$ and action $a$ taken; and by $T_{x,a,y}(t)$ the number of times the MDP has transitioned from $x$ to $y$ under action $a$.
In the next subsection, we consider the case of the controller having complete information (the best possible case) and use this to motivate notation and machinery for the remainder of the paper. The body of the paper is devoted to presenting and discussing three control policies that are either provably asymptotically optimal, or at least appear to be. While no proofs are given, numerical experiments are presented demonstrating the efficacy of these policies.
2.1 The Optimal Policy Under Known Parameters
In this section, we consider the case of complete information, when $R$ and $P$ are known. In this case, it can be shown that there is a deterministic policy, one in which the action taken at any time depends only on the current state, that realizes the maximal long-term average expected reward. Letting $\Pi_D$ be the (finite) set of all such deterministic policies:

(3) $g^* = \max_{\pi \in \Pi_D} \lim_{T \to \infty} \frac{1}{T} V_\pi(x_0, T).$
That there is such an optimal deterministic policy is a classical result, cf. Dynkin and Yushkevich [17].
We may characterize this optimal policy in terms of the solution $(g, h)$ of the following system of optimality equations:

(4) $g + h(x) = \max_{a \in A(x)} \Big\{ r(x, a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \Big\}, \quad x \in S.$
Given the solution $g$ and vector $h$ of the above equations, the asymptotically optimal policy can be characterized as: whenever in state $x$, take any action $a$ for which

(5) $a \in \operatorname*{arg\,max}_{a' \in A(x)} \Big\{ r(x, a') + \sum_{y \in S} p(y \mid x, a')\, h(y) \Big\}.$
We denote the set of such asymptotically optimal actions in state $x$ as $O(x)$. In general, $a^*$ should be taken to denote an action $a^* \in O(x)$.
The solution $g$ above represents the maximal long-term average expected reward. The vector $h$, i.e., $h(x)$ for any $x \in S$, represents in some sense the immediate value of being in state $x$ relative to the long-term average expected reward. The value $h(x)$ essentially encapsulates the future opportunities for value available due to being in state $x$.
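For concreteness, the optimality equations (4) can be solved numerically by relative value iteration. The Python sketch below is one minimal illustration, not the only method; the array layout and the normalization at a reference state 0 are our own illustrative choices:

```python
import numpy as np

def relative_value_iteration(r, P, tol=1e-9, max_iter=100000):
    """Solve g + h(x) = max_a { r(x,a) + sum_y P(y|x,a) h(y) }.

    r: (n_states, n_actions) one-step expected rewards.
    P: (n_states, n_actions, n_states) transition probabilities.
    Returns (g, h, policy), with h normalized so that h[0] = 0.
    """
    n_states = P.shape[0]
    h = np.zeros(n_states)
    g = 0.0
    for _ in range(max_iter):
        Q = r + P @ h            # Q[x, a] = r(x,a) + sum_y P(y|x,a) h(y)
        h_new = Q.max(axis=1)
        g = h_new[0]             # gain estimate, via the reference state 0
        h_new = h_new - g        # keep h[0] pinned at 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    Q = r + P @ h
    return g, h, Q.argmax(axis=1)
```

With every transition probability strictly positive, as assumed in the model above, the induced chains are irreducible and aperiodic and the iteration converges.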
It will be convenient in what is to follow to define the following notation:

(6) $L(x, a, q) = r(x, a) + \sum_{y \in S} q_y\, h(y),$ for a probability vector $q$ over $S$.
The function $L$ effectively represents the value of a given action in a given state, for a given transition vector: both the immediate reward, and the expected future value of whatever state the MDP transitions into. The value of an asymptotically optimal action $a^* \in O(x)$ for any state $x$ is thus given by $L(x, a^*, p(\cdot \mid x, a^*))$. It can be shown that the “expected loss” due to an asymptotically suboptimal action, taking action $a$ when the MDP is in state $x$, is effectively in the limit given by

(7) $\Delta(x, a) = \max_{a' \in A(x)} L\big(x, a', p(\cdot \mid x, a')\big) - L\big(x, a, p(\cdot \mid x, a)\big).$
In the general (partial or complete information) case, it is shown in [9] that the regret of a given policy $\pi$ can be expressed asymptotically as

(8) $R_\pi(x_0, T) = \sum_{x \in S} \sum_{a \in A(x)} \Delta(x, a)\, \mathbb{E}_\pi\big[T_{x,a}(T)\big] + O(1).$
Note, the above formula justifies the description of $\Delta(x, a)$ as the “average loss due to suboptimal activation of $a$ in state $x$”. Additionally, from the above it is clear that in the case of complete information, when $P$ is known and therefore the asymptotically optimal actions are computable, the total regret at any time is bounded by a constant. Any expected loss at time $T$ is due only to finite-horizon effects. In general, for the incomplete information case, we have the following bound due to [9]: for any uniformly fast policy $\pi$,

(9) $\liminf_{T \to \infty} \frac{R_\pi(x_0, T)}{\ln T} \ \geq\ \sum_{x \in S} \sum_{a \notin O(x)} \frac{\Delta(x, a)}{K(x, a)},$
where $K(x, a)$ represents the minimal Kullback-Leibler divergence between $p(\cdot \mid x, a)$ and any $q$ such that substituting $q$ for $p(\cdot \mid x, a)$ in the optimality equations renders $a$ the unique optimal action for $x$. Note, the Kullback-Leibler divergence between probability vectors $p$ and $q$ is given by $I(p, q) = \sum_y p_y \ln(p_y / q_y)$. Policies that achieve this lower bound, for all $P$, are referred to as Asymptotically Optimal.
3 The UCB Algorithm for MDPs Under Unknown Transition Distributions
The policy we present here is a simplified version of the UCB-MDP policy developed in Burnetas and Katehakis [9]. In this classical upper-confidence setting, at each time instance estimates of the value of each available action are computed based on the available data, inflated by a certain confidence term (based on the Kullback-Leibler divergence). The more data available on a given action, the tighter the confidence interval, and therefore the less the corresponding estimate is inflated.
At any time $t$, let $x = X_t$ be the current (given) state of the MDP. We construct the following estimators:

Transition Probability Estimators: for each state $x$ and action $a \in A(x)$, construct $\hat{p}_t(\cdot \mid x, a)$ based on the observed transition counts:
(10) $\hat{p}_t(y \mid x, a) = \dfrac{T_{x,a,y}(t) + \beta}{T_{x,a}(t) + |S|\,\beta},$ for a small constant $\beta > 0$. Note, the biasing terms (the $\beta$ in the numerator, the $|S|\beta$ in the denominator) serve to force the estimated transition probabilities away from $0$, and thus our estimates of $p(y \mid x, a)$ will be in $(0, 1)$.

“Good” Action Sets: construct the following subset of the available actions $A(x)$,
(11) $\hat{A}_t(x) = \big\{ a \in A(x) : T_{x,a}(t) \geq f(t) \big\},$ for a slowly growing threshold function $f$ (e.g., $f(t) = \ln^2 t$). The set $\hat{A}_t(x)$ represents the actions available from state $x$ that have been sampled frequently enough that the estimates of the associated transition probabilities should be “good”. In the limit, we expect that suboptimal actions will be taken only logarithmically often, and hence for sufficiently large $t$, $\hat{A}_t(x)$ will contain only actions that are truly optimal. If no actions have been taken sufficiently many times, we take $\hat{A}_t(x) = A(x)$ to prevent it from being empty.

Value Estimates: having constructed these estimators, we compute $\hat{g}_t$ and $\hat{h}_t$ as the solution to the optimality equations in Eq. (4), essentially treating the estimated probabilities as correct and computing the optimal values and policy for the resulting estimated MDP.
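The transition-probability estimate in Eq. (10) can be sketched in a few lines; here the smoothing constant `beta` stands in for the biasing terms:

```python
import numpy as np

def biased_transition_estimate(counts, beta=1.0):
    """counts: length-|S| vector of transition counts T_{x,a,y}(t).

    Adding beta to each count (hence |S|*beta to the total) keeps every
    estimated probability strictly inside (0, 1).
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + beta) / (counts.sum() + beta * counts.size)
```

Even with no observations at all, the estimate is the uniform vector, so the estimated MDP always has an irreducible transition law.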
At this point, we implement the following UCB index-based decision rule: for each action $a \in A(x)$, we compute the following index:

(12) $u_t(x, a) = \sup\Big\{ L(x, a, q) : q \in \Theta,\ T_{x,a}(t)\, I\big(\hat{p}_t(\cdot \mid x, a), q\big) \leq \ln t \Big\},$

where $I(\cdot, \cdot)$ is the Kullback-Leibler divergence and $L$ is computed with the estimated $\hat{h}_t$, and take action

(13) $A_t \in \operatorname*{arg\,max}_{a \in A(x)} u_t(x, a).$
This is a natural extension of several classical KL-divergence based UCB policies for the multi-armed bandit problem, cf. Cowan and Katehakis [12], Burnetas and Katehakis [10], and references therein, taking the view of the $L$ function as the “value” of taking a given action in a given state, estimated with the current data. In Burnetas and Katehakis [9], a modified version of the above policy is in fact shown to be asymptotically optimal. The modification is largely for analytical benefit; however, the pure UCB index policy defined above shows excellent performance, cf. Figure 1. Further discussion of the performance of this policy is given in the Comparison of Performance section.
4 A DMED-Type Algorithm for MDPs Under Uncertain Transitions
In the classical DMED algorithm for multi-armed bandit problems, the decision process proceeds by successively estimating the asymptotically minimal rates with which suboptimal actions must be taken, and then attempting to take actions in such a way as to realize the estimated minimal rates. As applied to MDPs, we have the following relationship from [9]: for any uniformly fast policy $\pi$, for any state $x$ and suboptimal action $a \notin O(x)$,

(14) $\liminf_{t \to \infty} \frac{\mathbb{E}_\pi\big[T_{x,a}(t)\big]}{\ln t} \ \geq\ \frac{1}{K(x, a)},$
where $K(x, a)$ is, as before, the minimal Kullback-Leibler divergence between the true transition probability vector $p(\cdot \mid x, a)$ and any transition probability vector $q$ such that substituting $q$ for $p(\cdot \mid x, a)$ would render action $a$ uniquely optimal for state $x$.
Computing the function $K$ is not easy. We therefore consider the following substitute:

(15) $\tilde{K}(x, a) = \inf\Big\{ I\big(p(\cdot \mid x, a), q\big) : q \in \Theta,\ L(x, a, q) \geq L\big(x, a^*, p(\cdot \mid x, a^*)\big) \Big\}.$

The function $K(x, a)$ measures how far the transition vector associated with $x$ and $a$ must be perturbed (under the KL-divergence) to make $a$ the optimal action for $x$. The function $\tilde{K}(x, a)$ measures how far the transition vector associated with $x$ and $a$ must be perturbed (under the KL-divergence) to make the value of $a$, as measured by the $L$ function, at least the value of an optimal action $a^*$.
In this way, we have the following approximate MDP-DMED algorithm; see Honda and Takemura [24, 25] for a multi-armed bandit version of this policy.
At any time $t$, let $x = X_t$ be the current state; construct the estimators as in the UCB-MDP algorithm of Section 3, $\hat{p}_t(\cdot \mid x, a)$ and $\hat{A}_t(x)$, and utilize these to compute the estimated optimal values $\hat{g}_t$ and $\hat{h}_t$.
Let $\hat{a}^*_t$ be the estimated “best” action to take at time $t$. For each $a \in A(x)$, compute the discrepancies $D_t(x, a) = \ln t / \hat{\tilde{K}}_t(x, a) - T_{x,a}(t)$, where $\hat{\tilde{K}}_t$ is the plug-in estimate of $\tilde{K}$ under the estimated transition probabilities.
If $D_t(x, a) \leq 0$ for all $a \neq \hat{a}^*_t$, take $A_t = \hat{a}^*_t$; otherwise, take $A_t \in \operatorname*{arg\,max}_{a \neq \hat{a}^*_t} D_t(x, a)$.
Following this algorithm, we perpetually reduce the discrepancy between the number of times estimated-suboptimal actions have been taken and the estimated minimal rate at which those actions should be taken. The exchange from $K$ to $\tilde{K}$ sacrifices some performance in pursuit of computational simplicity; however, it also seems clear from computational experiments that DMED-MDP as above is not only computationally tractable, but also produces reasonable performance in terms of achieving small regret, cf. Figure 1. Further discussion of the performance of this policy is given in the Comparison of Performance section.
5 A Posterior Sampling Algorithm for MDPs
In this section we introduce a Posterior Sampling (Thompson-type) policy for MDPs, PS-MDP. This type of policy is also known as Thompson Sampling, or probability matching. The basic idea is to generate estimates of the unknown parameters (the transition probabilities) randomly, according to the posterior distribution of those parameters given the current data. In particular, PS-MDP proceeds in the following way:
At any time $t$, let $x = X_t$ be the current state of the MDP. As in UCB-MDP and DMED-MDP previously, construct the estimators $\hat{g}_t$ and $\hat{h}_t$. In addition, generate the following random vectors.
For each action $a \in A(x)$, let $\big(T_{x,a,1}(t), \ldots, T_{x,a,|S|}(t)\big)$ be the vector of observed transition counts from state $x$ to each state under action $a$. Generate the random vector $Q_t(x, a)$ according to

(16) $Q_t(x, a) \sim \mathrm{Dirichlet}\big(T_{x,a,1}(t) + 1, \ldots, T_{x,a,|S|}(t) + 1\big).$

The $Q_t(x, a)$ are distributed according to the joint posterior distribution of $p(\cdot \mid x, a)$ with a uniform prior.
At this point, define the following values as posterior estimates of the potential value of each action:

(17) $\tilde{u}_t(x, a) = L\big(x, a, Q_t(x, a)\big),$

computed with the estimated $\hat{h}_t$, and take action $A_t \in \operatorname*{arg\,max}_{a \in A(x)} \tilde{u}_t(x, a)$.
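The sampling step in Eqs. (16) and (17) is direct to implement: `numpy.random.Generator.dirichlet` with parameters `counts + 1` draws from the posterior under a uniform prior, and the policy then acts greedily on the sampled values. A minimal sketch, with illustrative array names:

```python
import numpy as np

def ps_mdp_action(counts, rewards, h_hat, rng=None):
    """One PS-MDP decision in the current state x.

    counts:  (n_actions, n_states) transition counts T_{x,a,y}(t).
    rewards: (n_actions,) one-step rewards r(x, a).
    h_hat:   (n_states,) estimated relative values.
    Draws q(.|x,a) ~ Dirichlet(counts[a] + 1) for each action, i.e. the
    posterior under a uniform prior, then acts greedily on the sampled
    values r(x,a) + sum_y q_y h_hat[y].
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_actions = counts.shape[0]
    sampled_values = np.empty(n_actions)
    for a in range(n_actions):
        q = rng.dirichlet(counts[a] + 1.0)   # one posterior draw
        sampled_values[a] = rewards[a] + q @ h_hat
    return int(np.argmax(sampled_values))
```

Because the draws concentrate around the empirical transition frequencies as counts grow, the randomization in the action choice vanishes along frequently sampled state-action pairs.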
6 Comparison of Performance
In this section we discuss the results of our simulation tests of these policies on a small example with 3 states ($x = 1, 2, 3$) and 2 available actions ($a = 1, 2$) in each state. Below we show the transition probabilities, as well as the reward, under each action.
Transition probabilities under action $a = 1$ (rows: current state $x = 1, 2, 3$; columns: next state):

0.04 0.69 0.27
0.88 0.01 0.11
0.02 0.46 0.52

Transition probabilities under action $a = 2$:

0.28 0.68 0.04
0.26 0.33 0.41
0.43 0.35 0.22

Rewards $r(x, a)$ (rows: state $x$; columns: action $a$):

0.13 0.47
0.89 0.18
0.71 0.63
If these transition probabilities were known, the optimal policy for this MDP could be computed directly from the optimality equations (4), fixing the optimal action in each state.
We simulated each policy 100 times over a time horizon of 10,000, and for each time step we computed the mean regret as well as the variance. In Figure 1, we plot the mean regret over time for each policy, [1] PS, [2] UCB, and [3] DMED, along with a confidence interval over all sample paths. We can see that all policies seem to exhibit logarithmic growth of regret. There are a few interesting differences that the plot highlights, at least for these specific parameter values:
DMED-MDP has not only the highest finite-time regret, but also a large variance that seems to increase over time. This seems primarily due to the “epoch”-based nature of the policy, which results in exponentially long periods during which the policy may get trapped taking suboptimal actions, incurring large regret until the true optimal actions are discovered. The benefit of this epoch structure is that once the optimal actions are discovered, they are taken for exponentially long periods, to the exclusion of suboptimal actions.
PS-MDP seems to perform best, exhibiting the lowest finite-time regret as well as the tightest variance. This is largely in agreement with the performance of PS-type policies in other bandit problems, in which they are frequently asymptotically optimal, cf. Agrawal and Goyal [3, 2], Honda and Takemura [26], Kaufmann et al. [32], and references therein.
6.1 Policy Robustness: Inaccurate Priors
How do these policies respond to potentially “unlucky” or nonrepresentative streaks of data? Can these policies be fooled, and what are the resulting costs before they recover?
To test the robustness of these policies with respect to prior information, we “rigged” the first 60 actions and transitions, such that under the estimated transition probabilities the optimal policy would be to activate a suboptimal action in each state. In more detail, let $T_{x,a,y}$ be the number of times we transitioned from state $x$ to state $y$ under action $a$. Then we “rigged” the counts so that they started like so:
Counts under action $a = 1$ (rows: state $x$; columns: next state $y$):

8 1 1
1 1 8
8 1 1

Counts under action $a = 2$:

1 1 8
8 1 1
1 1 8
Under the resulting (bad) estimated transition probabilities, the estimated optimal policy chooses the suboptimal action in each state.
The subsequent performances of the MDP policies are plotted in Figure 2. All policies still appear to have logarithmic growth in regret, suggesting they can all “recover” from the initial bad estimates. It is striking, though, the extent to which the average regrets for DMED-MDP and PS-MDP are affected, increasing dramatically as a result, with PS-MDP demonstrating an increase in variance as well. The UCB-MDP policy, however, seems relatively stable: its average regret has barely increased, and it maintains a small variance. Empirically, this phenomenon appears common for the UCB-MDP policy under other extreme conditions.
Acknowledgments
We acknowledge support for this work from the National Science Foundation, NSF grant CMMI-1662629.
References
 Abbeel and Ng [2004] Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, page 1. ACM.
 Agrawal S and Goyal N. [2012] Agrawal, S. and Goyal, N. (2012). Analysis of Thompson sampling for the multiarmed bandit problem. In Conference on Learning Theory 39–1, Springer.

 Agrawal S and Goyal N. [2013] Agrawal, S. and Goyal, N. (2013). Further optimal regret bounds for Thompson sampling. In Artificial Intelligence and Statistics, 99–107.
 Alpaydin [2014] Alpaydin, E. (2014). Introduction to machine learning. MIT press.
 Asmussen and Glynn [2007] Asmussen, S. and Glynn, P. W. (2007). Stochastic simulation: algorithms and analysis, volume 57. Springer Science & Business Media.
 Auer et al. [2002] Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235–256.
 Azar et al. [2017] Azar, M. G., Osband, I., and Munos, R. (2017). Minimax regret bounds for reinforcement learning. arXiv preprint arXiv:1703.05449.
 Bertsekas [2019] Bertsekas, D. P. (2019). Reinforcement learning and optimal control. Athena Scientific, Belmont, Massachusetts.

 Burnetas and Katehakis [1997] Burnetas, A. N. and Katehakis, M. N. (1997). Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1), 222–255.
 Burnetas and Katehakis [1996] Burnetas, A. N. and Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122–142.
 Chang et al. [2006] Chang, M., Chow, S.C., and Pong, A. (2006). Adaptive design in clinical research: issues, opportunities, and recommendations. Journal of biopharmaceutical statistics, 16(3), 299–309.
 Cowan and Katehakis [2019] Cowan W., and M.N. Katehakis (2019). Exploration–exploitation policies with almost sure, arbitrarily slow growing asymptotic regret, DOI=10.1017/S0269964818000529. Probability in the Engineering and Informational Sciences, 1–23.
 Cowan et al. [2018] Cowan W., Honda Y. and M.N. Katehakis (2018). Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem, Journal of Machine Learning Research(JMLR), 18, 1–28.
 Cowan and Katehakis [2015] Cowan W., and M.N. Katehakis (2015). Asymptotically Optimal Sequential Experimentation Under Generalized Ranking. arXiv:1510.02041
 Cowan and Katehakis [2015] Cowan W., and M.N. Katehakis (2015). Multiarmed Bandits under General Depreciation and Commitment, Probability in the Engineering and Informational Sciences, 29 (1), 51–76.
 Dekker et al. [1994] Dekker, R., Hordijk, A., and Spieksma, F. M. (1994). On the relation between recurrence and ergodicity properties in denumerable Markov decision chains. Mathematics of Operations Research, 19, 3.
 Dynkin and Yushkevich [1979] Dynkin, E. and Yushkevich, A. (1979). Controlled Markov processes, volume 235. Springer.
 Ferreira et al. [2018] Ferreira, K. J., SimchiLevi, D., and Wang, H. (2018). Online network revenue management using thompson sampling. Operations research, 66(6), 1586–1602.
 Filippi et al. [2010] Filippi, S., Cappé, O., and Garivier, A. (2010). Optimism in reinforcement learning based on Kullback Leibler divergence. In 48th Annual Allerton Conference on Communication, Control, and Computing.
 Henderson et al. [2018] Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., and Meger, D. (2018). Deep reinforcement learning that matters. In ThirtySecond AAAI Conference on Artificial Intelligence.
 Lerma [2012] Lerma, H.O. (2012). Adaptive Markov control processes, volume 79. Springer Science & Business Media.
 Gittins [1979] Gittins, J. (1979) Bandit processes and dynamic allocation indices (with discussion). J. Roy. Stat. Soc. Ser. B, 41:335–340.
 Gittins et al. [2011] John C. Gittins, Kevin Glazebrook, and Richard R. Weber (2011). Multi-armed Bandit Allocation Indices. John Wiley & Sons, West Sussex, U.K.
 Honda and Takemura [2010] Honda J. and Takemura A. (2010). An asymptotically optimal bandit algorithm for bounded support models. In COLT, 67–79.
 Honda and Takemura [2011] Honda J. and Takemura A. (2011). An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning, 85(3):361–391.
 Honda and Takemura [2013] Honda J. and Takemura A. (2013). Optimality of Thompson sampling for Gaussian bandits depends on priors. arXiv preprint arXiv:1311.1894.
 Jaksch et al. [2010] Jaksch, T., Ortner, R., and Auer, P. (2010). Nearoptimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11(Apr), 1563–1600.
 Katehakis and Derman [1986] Katehakis, Michael N. and C. Derman (1986). Computing optimal sequential allocation rules in clinical trials. Lecture NotesMonograph Series, 29 – 39.
 Katehakis and Rothblum [1996] Katehakis, Michael N. and Uriel G. Rothblum (1996). Finite state multiarmed bandit problems: Sensitivediscount, averagereward and averageovertaking optimality. The Annals of Applied Probability, 6, 1024–1034.
 Katehakis and Veinott Jr [1987] Katehakis, Michael N. and Arthur F Veinott Jr (1987). The multiarmed bandit problem: decomposition and computation. Math. Oper. Res., 12, 262 – 68.
 Katehakis et al. [2016] Katehakis, M., Smit, L. C., and Spieksma, F. (2016). A comparative analysis of the successive lumping and the lattice path counting algorithms. Journal of Applied Probability, 53(1), 106–120.
 Kaufmann et al. [2012] Kaufmann, E., Korda, N. and Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory, 199–213. Springer, Berlin, Heidelberg.
 Mahajan and Teneketzis [2008] Mahajan, A. and Teneketzis, D. (2008). Multiarmed bandit problems. In Foundations and Applications of Sensor Management, 121–151. Springer.
 Mohri et al. [2018] Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2018). Foundations of machine learning. MIT press.
 Munos et al. [2016] Munos, R., Stepleton, T., Harutyunyan, A., and Bellemare, M. (2016). Safe and efficient offpolicy reinforcement learning. In Advances in Neural Information Processing Systems, 1054–1062.
 Neu et al. [2010] Neu, G., Antos, A., György, A., and Szepesvári, C. (2010). Online markov decision processes under bandit feedback. In Advances in Neural Information Processing Systems, 1804–1812.
 Ortner et al. [2014] Ortner, R., Ryabko, D., Auer, P., and Munos, R. (2014). Regret bounds for restless Markov bandits. Theoretical Computer Science, 558, 62–76.
 Osband and Van Roy [2017] Osband, I. and Van Roy, B. (2017). Why is posterior sampling better than optimism for reinforcement learning? In Proceedings of the 34th International Conference on Machine LearningVolume 70, 2701–2710. JMLR. org.
 Russo and Van Roy [2014] Russo, D. J. and Van Roy, B. (2014). Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4), 1221–1243.
 Sonin [2008] Sonin, I.M. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78, 1526 – 1533.
 Sonin [2011] Sonin, I.M. (2011). Optimal stopping of Markov chains and three abstract optimization problems. Stochastics An International Journal of Probability and Stochastic Processes, 83, 405 – 414.
 Sonin and Steinberg [2016] Sonin, Isaac M and Constantine Steinberg (2016). Continue, quit, restart probability model. Annals of Operations Research, 241, 295–318.
 Sutton and Barto [2018] Sutton, R. and Barto, A. (2018). Reinforcement learning: An introduction. MIT press.
 Szepesvári [2009] Szepesvári, C. (2009). Algorithms for reinforcement learning. Morgan and Claypool.
 Szepesvári [2010] Szepesvári, C. (2010). Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1–103.

 Tewari and Bartlett [2008a] Tewari, A. and Bartlett, P. (2008a). Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems, 1505–1512.
 Tewari and Bartlett [2008b] Tewari, A. and Bartlett, P. (2008b). Optimistic linear programming gives logarithmic regret for irreducible MDPs. In J.C. Platt, D. Koller, Y. Singer and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20, 1505–1512. NIPS, New York.
 Thompson [1933] Thompson W.R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285–294.
 Villar et al. [2015] Villar, Sofía S, Jack Bowden, and James Wason (2015). Multiarmed bandit models for the optimal design of clinical trials: benefits and challenges. Statistical Science, 30, 199–215.
 Weber [1992] Weber, R. (1992). On the Gittins index for multiarmed bandits. The Annals of Applied Probability, 2, 1024 – 1033.
 Whittle [1980] Whittle, P. (1980). Multiarmed bandits and the Gittins index. J. R. Statist. Soc. B, 42, 143 – 49.
 Wiering [2018] Wiering, M. (2018). Reinforcement learning: from methods to applications. Nieuw Archief voor Wiskunde, 5(19), 157–167.