Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

06/19/2022
by Christoph Dann, et al.

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks, yet they perform well in many others. In practice, they are often the default choice due to their simplicity. But for which tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure, the myopic exploration gap, denoted by alpha, that captures a structural property of the MDP, the exploration policy, and the given value-function class. We show that the sample complexity of myopic exploration scales quadratically with the inverse of this quantity, i.e., as 1/alpha^2. We further demonstrate through concrete examples that the myopic exploration gap is indeed favorable in several tasks where myopic exploration succeeds, owing to the corresponding dynamics and reward structure.
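As a concrete reference for what "myopic exploration" means here, the following is a minimal sketch of the two most common such rules mentioned in the abstract: epsilon-greedy and softmax action selection on top of estimated Q-values. The code is illustrative only; the function names, NumPy setup, and example numbers are assumptions, not artifacts of the paper.

import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng):
    # With probability epsilon, explore uniformly at random;
    # otherwise act greedily w.r.t. the current Q-value estimates.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature, rng):
    # Sample an action with probability proportional to exp(Q / temperature);
    # higher temperature means more exploration.
    logits = np.asarray(q_values) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))

# Example: myopic exploration at a single state with 4 actions.
rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2, 0.4])
print(epsilon_greedy_action(q, epsilon=0.1, rng=rng))
print(softmax_action(q, temperature=0.5, rng=rng))

Both rules are myopic in the sense that they only randomize locally around the current greedy policy, with no multi-step exploration plan; the paper's myopic exploration gap alpha quantifies when such local randomization suffices for efficient learning.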


