Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

11/16/2020
by Akshay Mete, et al.

The Reward-Biased Maximum Likelihood Estimate (RBMLE) principle for adaptive control, proposed in Kumar and Becker (1982), is an alternative to the Upper Confidence Bound (UCB) approach (Lai and Robbins, 1985) for implementing the principle now known as "optimism in the face of uncertainty" (Auer et al., 2002). It uses a modified maximum likelihood estimate that is biased towards those Markov Decision Process (MDP) models that yield a higher average reward. However, its regret has not previously been analyzed for reinforcement learning (RL; Sutton et al., 1998) tasks that involve the optimal control of unknown MDPs. We show that it has a learning regret of O(log T), where T is the time horizon, similar to state-of-the-art algorithms. It thus provides an alternative general-purpose method for solving RL problems.
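
To make the idea of a reward-biased estimate concrete, the following is a minimal Python sketch, not the algorithm analyzed in the paper. It assumes a finite set of candidate MDP models whose optimal average rewards have already been computed (for example by value iteration), and it assumes a bias weight that grows as log t. The names `log_likelihood`, `rbmle_select`, `optimal_rewards`, and `bias_scale` are hypothetical and introduced only for illustration.

```python
import math


def log_likelihood(model, counts):
    """Log-likelihood of observed transition counts under one candidate model.

    model[(s, a)][s_next] is a transition probability (hypothetical layout);
    counts[(s, a, s_next)] is how often that transition was observed.
    """
    total = 0.0
    for (s, a, s_next), n in counts.items():
        p = model[(s, a)][s_next]
        if p <= 0.0:
            return -math.inf  # the model rules out an observed transition
        total += n * math.log(p)
    return total


def rbmle_select(models, optimal_rewards, counts, t, bias_scale=1.0):
    """Return the index of the candidate with the largest reward-biased score.

    score = log-likelihood + alpha(t) * (optimal average reward of the model).
    The schedule alpha(t) = bias_scale * log(t + 1) is an assumption of this
    sketch, not a value taken from the source.
    """
    alpha = bias_scale * math.log(t + 1)
    scores = [
        log_likelihood(model, counts) + alpha * reward
        for model, reward in zip(models, optimal_rewards)
    ]
    return max(range(len(models)), key=scores.__getitem__)


# Two made-up candidate models with a single state-action pair (0, 0).
model_a = {(0, 0): {0: 0.5, 1: 0.5}}  # assumed optimal average reward 1.0
model_b = {(0, 0): {0: 0.9, 1: 0.1}}  # assumed optimal average reward 0.2
models = [model_a, model_b]
optimal_rewards = [1.0, 0.2]

# With no data, the bias term decides and the higher-reward model is chosen.
choice_without_data = rbmle_select(models, optimal_rewards, {}, t=1)

# With 100 observations that fit model_b much better, the likelihood dominates.
observed = {(0, 0, 0): 90, (0, 0, 1): 10}
choice_with_data = rbmle_select(models, optimal_rewards, observed, t=100)
```

In this toy setup the estimate is optimistic when data is scarce and falls back to the best-fitting model as evidence accumulates, because the log-likelihood gap grows linearly in the number of observations while the assumed bias grows only logarithmically.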

research
06/12/2019

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...
research
09/23/2022

Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning

Finding unified complexity measures and algorithms for sample-efficient ...
research
01/25/2022

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

We consider the problem of controlling a stochastic linear system with q...
research
03/08/2022

Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation

Reward-biased maximum likelihood estimation (RBMLE) is a classic princip...
research
07/02/2019

Bandit Learning Through Biased Maximum Likelihood Estimation

We propose BMLE, a new family of bandit algorithms, that are formulated ...
research
12/15/2010

Adaptive Parallel Tempering for Stochastic Maximum Likelihood Learning of RBMs

Restricted Boltzmann Machines (RBM) have attracted a lot of attention of...
research
10/08/2020

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally propose...
