Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

by Andrea Zanette, et al.

Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately, in practice, RL algorithms perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features of what makes an RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this, we derive an algorithm for finite-horizon discrete MDPs, with associated analysis, that both yields state-of-the-art worst-case regret bounds in the dominant terms and yields substantially tighter bounds if the RL environment has a small environmental norm, which is a function of the variance of the next-state value functions. An important benefit of our algorithm is that it does not require a priori knowledge of a bound on the environmental norm. As a result of our analysis, we also help address an open learning theory question (Jiang and Agarwal, 2018) about episodic MDPs with a constant upper bound on the sum of rewards, providing a regret bound with no H-dependence in the leading term that scales as a polynomial function of the number of episodes.
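To make "variance of the next-state value functions" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a fully known tabular MDP and computes, by backward induction, the largest variance of the optimal next-step value over all time steps, states, and actions. The function name, the array layout, and the toy MDP are hypothetical, and the paper's precise definition of the environmental norm may include further terms (for example, reward variance) that are omitted here.

```python
def max_next_state_value_variance(P, R, H):
    """Largest Var_{s' ~ P[s][a]}[V*_{t+1}(s')] over all steps t, states s, actions a.

    P[s][a][s2] : probability of moving from s to s2 under action a (assumed known)
    R[s][a]     : expected immediate reward (assumed known)
    H           : horizon, i.e. number of steps per episode
    """
    S, A = len(R), len(R[0])
    V_next = [0.0] * S  # optimal value after the final step is zero
    worst = 0.0
    for _ in range(H):  # backward induction from the last step to the first
        V_now = []
        for s in range(S):
            q_values = []
            for a in range(A):
                mean = sum(P[s][a][s2] * V_next[s2] for s2 in range(S))
                second = sum(P[s][a][s2] * V_next[s2] ** 2 for s2 in range(S))
                worst = max(worst, second - mean ** 2)  # variance of next-state value
                q_values.append(R[s][a] + mean)
            V_now.append(max(q_values))  # greedy (optimal) value at this step
        V_next = V_now
    return worst


# Toy MDP (hypothetical): state 0 pays reward 1 and then stays or falls into
# the absorbing zero-reward state 1 with equal probability.
P_noisy = [[[0.5, 0.5]], [[0.0, 1.0]]]
P_deterministic = [[[0.0, 1.0]], [[0.0, 1.0]]]
R = [[1.0], [0.0]]

noisy = max_next_state_value_variance(P_noisy, R, H=2)
deterministic = max_next_state_value_variance(P_deterministic, R, H=2)
```

In this toy example the deterministic variant has zero next-state value variance, while the noisy variant does not, which illustrates the kind of environment-dependent quantity the abstract says the tighter bounds scale with.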






Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

This paper studies regret minimization with randomized value functions i...

Can Q-Learning be Improved with Advice?

Despite rapid progress in theoretical reinforcement learning (RL) over t...

Regret Bounds for Discounted MDPs

Recently, it has been shown that carefully designed reinforcement learni...

An Analysis of Frame-skipping in Reinforcement Learning

In the practice of sequential decision making, agents are often designed...

Branching Reinforcement Learning

In this paper, we propose a novel Branching Reinforcement Learning (Bran...

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

This paper studies a recent proposal to use randomized value functions t...

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic vari...