When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

05/23/2018
by   Yao Liu, et al.
0

Efficient exploration is one of the key challenges for reinforcement learning (RL) algorithms. Most traditional sample efficiency bounds require strategic exploration. Recently many deep RL algorithm with simple heuristic exploration strategies that have few formal guarantees, achieve surprising success in many domains. These results pose an important question about understanding these exploration strategies such as e-greedy, as well as understanding what characterize the difficulty of exploration in MDPs. In this work we propose problem specific sample complexity bounds of Q learning with random walk exploration that rely on several structural properties. We also link our theoretical results to some empirical benchmark domains, to illustrate if our bound gives polynomial sample complexity or not in these domains and how that is related with the empirical performance in these domains.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/18/2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in re...
research
01/31/2012

Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration

We present an implementation of model-based online reinforcement learnin...
research
05/25/2016

A PAC RL Algorithm for Episodic POMDPs

Many interesting real world domains involve reinforcement learning (RL) ...
research
01/20/2020

Reinforcement Learning with Probabilistically Complete Exploration

Balancing exploration and exploitation remains a key challenge in reinfo...
research
06/19/2022

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian...
research
05/29/2018

Depth and nonlinearity induce implicit exploration for RL

The question of how to explore, i.e., take actions with uncertain outcom...
research
09/29/2020

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

Existing exploration strategies in reinforcement learning (RL) often eit...

Please sign up or login with your details

Forgot password? Click here to reset