One Policy is Enough: Parallel Exploration with a Single Policy is Minimax Optimal for Reward-Free Reinforcement Learning

05/31/2022
by   Pedro Cisneros-Velarde, et al.
0

While parallelism has been extensively used in Reinforcement Learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL for linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature focused on approaches that encourage agents to explore over a diverse set of policies, we show that using a single policy to guide exploration across all agents is sufficient to obtain an almost-linear speedup in all cases compared to their fully sequential counterpart. Further, we show that this simple procedure is minimax optimal up to logarithmic factors in the reward-free setting for both linear MDPs and two-player zero-sum MGs. From a practical perspective, our paper shows that a single policy is sufficient and provably optimal for incorporating parallelism during the exploration phase.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/28/2023

Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

We develop several provably efficient model-free reinforcement learning ...
research
10/19/2021

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

To achieve sample efficiency in reinforcement learning (RL), it necessit...
research
03/17/2023

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Many model-based reinforcement learning (RL) algorithms can be viewed as...
research
06/23/2023

Active Coverage for PAC Reinforcement Learning

Collecting and leveraging data with good coverage properties plays a cru...
research
06/20/2023

Reward Shaping via Diffusion Process in Reinforcement Learning

Reinforcement Learning (RL) models have continually evolved to navigate ...
research
06/16/2019

Solution of Two-Player Zero-Sum Game by Successive Relaxation

We consider the problem of two-player zero-sum game. In this setting, th...
research
05/30/2023

Solving Robust MDPs through No-Regret Dynamics

Reinforcement Learning is a powerful framework for training agents to na...

Please sign up or login with your details

Forgot password? Click here to reset