Online Reinforcement Learning in Markov Decision Process Using Linear Programming

03/31/2023
by Vincent Leon, et al.

We consider online reinforcement learning in an episodic Markov decision process (MDP) with an unknown transition matrix and stochastic rewards drawn from a fixed but unknown distribution. The learner aims to learn the optimal policy and minimize regret over a finite time horizon by interacting with the environment. We devise a simple and efficient model-based algorithm that achieves Õ(LX√(TA)) regret with high probability, where L is the episode length, T is the number of episodes, and X and A are the cardinalities of the state space and the action space, respectively. The proposed algorithm, which is based on the principle of "optimism in the face of uncertainty", maintains confidence sets for the transition and reward functions and uses occupancy measures to connect the online MDP with linear programming. It achieves a tighter regret bound than existing works that use a similar confidence-set framework, and it requires less computation than works that use a different framework and obtain a slightly tighter regret bound.
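The abstract does not spell out how occupancy measures link the MDP to linear programming, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a known transition function, a step-indexed randomized policy, and a tiny two-state, two-action example; every name and number in it (`occupancy_measure`, `P`, `r`, `policy`, `mu0`) is hypothetical. It computes the occupancy measure q[h][x][a], the probability of being in state x and taking action a at step h, and checks that the expected return is the linear function Σ q·r, which is what lets policy optimization be written as a linear program over q.

```python
def occupancy_measure(P, policy, mu0, L):
    """Return q with q[h][x][a] = Pr(state x and action a at step h).

    P[x][a][y]      : transition probability (assumed known in this sketch)
    policy[h][x][a] : probability of choosing action a in state x at step h
    mu0[x]          : initial state distribution
    L               : episode length
    """
    X, A = len(P), len(P[0])
    q = [[[0.0] * A for _ in range(X)] for _ in range(L)]
    state_dist = list(mu0)
    for h in range(L):
        for x in range(X):
            for a in range(A):
                q[h][x][a] = state_dist[x] * policy[h][x][a]
        # Flow conservation: mass entering y at step h + 1 equals the mass
        # that leaves the state-action pairs of step h through P.
        state_dist = [
            sum(q[h][x][a] * P[x][a][y] for x in range(X) for a in range(A))
            for y in range(X)
        ]
    return q


def return_from_occupancy(q, r):
    """Expected return as a linear function of the occupancy measure."""
    return sum(
        q[h][x][a] * r[x][a]
        for h in range(len(q))
        for x in range(len(r))
        for a in range(len(r[0]))
    )


def return_from_backward_induction(P, r, policy, mu0, L):
    """Same quantity via policy evaluation, as an independent cross-check."""
    X, A = len(P), len(P[0])
    V = [0.0] * X
    for h in reversed(range(L)):
        V = [
            sum(
                policy[h][x][a]
                * (r[x][a] + sum(P[x][a][y] * V[y] for y in range(X)))
                for a in range(A)
            )
            for x in range(X)
        ]
    return sum(mu0[x] * V[x] for x in range(X))


# Hypothetical example: 2 states, 2 actions, episode length 3.
L = 3
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.1, 0.9]],  # transitions from state 1 under actions 0, 1
]
r = [[0.0, 0.1], [1.0, 0.5]]   # mean rewards r[x][a]
mu0 = [1.0, 0.0]               # every episode starts in state 0
policy = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(L)]  # uniform policy

q = occupancy_measure(P, policy, mu0, L)
value_linear = return_from_occupancy(q, r)
value_dp = return_from_backward_induction(P, r, policy, mu0, L)
print(value_linear, value_dp)
```

Because the return is linear in q and the flow-conservation conditions are linear equalities, maximizing the return over valid occupancy measures is a linear program when the model is known. In the online setting described above, the transition and reward functions would instead range over confidence sets, which this sketch does not model.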

research
03/16/2023

Online Reinforcement Learning in Periodic MDP

We study learning in periodic Markov Decision Process (MDP), a special t...
research
06/20/2019

Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

We tackle the problem of acting in an unknown finite and discrete Markov...
research
12/27/2022

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

We study model-based reinforcement learning (RL) for episodic Markov dec...
research
10/09/2019

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Leveraging an equivalence property in the state-space of a Markov Decisi...
research
08/18/2011

Feature Reinforcement Learning In Practice

Following a recent surge in using history-based methods for resolving pe...
research
07/25/2022

Online Reinforcement Learning for Periodic MDP

We study learning in periodic Markov Decision Process (MDP), a special ty...
research
09/28/2019

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

In this paper we derive an efficient method for computing the indices as...
