Regret Analysis in Deterministic Reinforcement Learning

06/27/2021
by Damianos Tranos, et al.

We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameters (in contrast to previous minimax approaches) and thus truly quantify the fundamental limit of performance achievable by any learning algorithm. Deterministic MDPs can be interpreted as graphs and analyzed in terms of their cycles, a fact we leverage to identify a class of deterministic MDPs whose regret lower bound can be determined numerically. We further illustrate this result on a deterministic line search problem and on a deterministic MDP with state-dependent rewards, for both of which we state the regret lower bounds explicitly. These bounds share similarities with the known problem-specific bound for the multi-armed bandit problem and suggest that navigation in a deterministic MDP need not affect the performance of a learning algorithm.
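The observation that deterministic MDPs can be read as graphs and analyzed through their cycles can be made concrete with a small sketch. Under the average-reward criterion, and assuming the MDP is communicating (every state can reach every other), the best achievable long-run reward per step equals the maximum mean reward over the cycles of the transition graph, which Karp's mean-cycle algorithm computes in O(|S|·|E|) time; regret is then measured against this benchmark. This is not the authors' method (the paper derives lower bounds, not this computation). The Transition tuple encoding, the functions optimal_gain and regret, the toy three-state line MDP, and the use of realized rather than expected regret are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical encoding: (state, action, next_state, reward). Transitions are
# deterministic: taking `action` in `state` always leads to `next_state`.
Transition = Tuple[int, int, int, float]


def optimal_gain(n_states: int, transitions: List[Transition]) -> float:
    """Maximum mean reward over all cycles of the transition graph.

    Uses Karp's mean-cycle algorithm (max-weight version). For a communicating
    deterministic MDP this equals the optimal long-run average reward g*.
    """
    neg_inf = -math.inf
    # best[k][v]: largest total reward of a walk with exactly k steps ending in v.
    # Starting every state at 0 acts like a virtual zero-reward source state.
    best = [[neg_inf] * n_states for _ in range(n_states + 1)]
    best[0] = [0.0] * n_states
    for k in range(1, n_states + 1):
        for s, _action, s_next, r in transitions:
            if best[k - 1][s] > neg_inf:
                best[k][s_next] = max(best[k][s_next], best[k - 1][s] + r)

    gain = neg_inf
    n = n_states
    for v in range(n):
        if best[n][v] == neg_inf:
            continue  # no n-step walk ends in v
        # Karp: max cycle mean = max over v of min over k of
        # (best[n][v] - best[k][v]) / (n - k), skipping unreachable k.
        ratio = min(
            (best[n][v] - best[k][v]) / (n - k)
            for k in range(n)
            if best[k][v] > neg_inf
        )
        gain = max(gain, ratio)
    return gain


def regret(rewards: List[float], g_star: float) -> float:
    """Realized regret after T = len(rewards) steps: T * g_star - sum(rewards)."""
    return len(rewards) * g_star - sum(rewards)


# Hypothetical toy MDP: three states on a line. Action 0 stays put and earns a
# state-dependent reward; actions 1 (right) and 2 (left) move and earn 0.
STAY_REWARD = [0.2, 0.5, 0.9]
LINE_MDP: List[Transition] = [(s, 0, s, STAY_REWARD[s]) for s in range(3)]
LINE_MDP += [(0, 1, 1, 0.0), (1, 1, 2, 0.0), (1, 2, 0, 0.0), (2, 2, 1, 0.0)]

if __name__ == "__main__":
    g = optimal_gain(3, LINE_MDP)
    # Move 0 -> 1 -> 2 (two zero-reward steps), then stay at state 2.
    trajectory_rewards = [0.0, 0.0] + [STAY_REWARD[2]] * 8
    print(round(g, 6), round(regret(trajectory_rewards, g), 6))
```

In the toy line MDP the best cycle is the self-loop at state 2 (mean reward 0.9), so an agent that spends two steps moving from state 0 to state 2 and then stays there for eight steps incurs a regret of about 1.8 over T = 10 steps, a one-time navigation cost that does not grow with T.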


