Exploration in Structured Reinforcement Learning

06/03/2018
by   Jungseul Ok, et al.
0

We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration of suboptimal (state, action) pairs. For any arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructured MDPs and for those whose transition probabilities and average reward function are Lipschitz continuous w.r.t. the state and action. For Lipschitz MDPs, the bounds are shown not to scale with the sizes S and A of the state and action spaces, i.e., they are smaller than c T where T is the time horizon and the constant c only depends on the Lipschitz structure, the span of the bias function, and the minimal action sub-optimality gap. This contrasts with unstructured MDPs where the regret lower bound typically scales as SA T . We devise DEL (Directed Exploration Learning), an algorithm that matches our regret lower bounds. We further simplify the algorithm for Lipschitz MDPs, and show that the simplified version is still able to efficiently exploit the structure.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/23/2022

Logarithmic regret bounds for continuous-time average-reward Markov decision processes

We consider reinforcement learning for continuous-time Markov decision p...
research
04/12/2020

Regret Bounds for Kernel-Based Reinforcement Learning

We consider the exploration-exploitation dilemma in finite-horizon reinf...
research
06/24/2021

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

We derive a novel asymptotic problem-dependent lower-bound for regret mi...
research
10/07/2020

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

In this paper, we propose new problem-independent lower bounds on the sa...
research
10/24/2022

Opportunistic Episodic Reinforcement Learning

In this paper, we propose and study opportunistic reinforcement learning...
research
10/25/2021

Can Q-Learning be Improved with Advice?

Despite rapid progress in theoretical reinforcement learning (RL) over t...
research
09/07/2022

On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs

In reinforcement learning, Monte Carlo algorithms update the Q function ...

Please sign up or login with your details

Forgot password? Click here to reset