Reward-Free Exploration for Reinforcement Learning

02/07/2020
by   Chi Jin, et al.

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new "reward-free RL" framework. In the exploration phase, the agent first collects trajectories from an MDP M without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies for M under a collection of given reward functions. This framework is particularly suitable when there are many reward functions of interest, or when the reward function is shaped by an external agent to elicit desired behavior. We give an efficient algorithm that conducts Õ(S^2 A poly(H)/ϵ^2) episodes of exploration and returns ϵ-suboptimal policies for an arbitrary number of reward functions. We achieve this by finding exploratory policies that visit each "significant" state with probability proportional to its maximum visitation probability under any possible policy. Moreover, our planning procedure can be instantiated by any black-box approximate planner, such as value iteration or natural policy gradient. We also give a nearly-matching Ω(S^2 A H^2/ϵ^2) lower bound, demonstrating the near-optimality of our algorithm in this setting.
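The two-phase structure described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the construction of exploratory policies entirely and only shows how trajectories collected without rewards could be turned into an empirical transition model, which is then reused by a value-iteration planner for any reward function supplied later. The function names, the stationary (step-independent) transitions and rewards, and the uniform fallback for unvisited state-action pairs are all assumptions made for brevity.

```python
def estimate_transitions(trajectories, num_states, num_actions):
    """Build an empirical model p_hat[s][a][s'] from reward-free data.

    Each trajectory is a list of (state, action, next_state) tuples.
    """
    counts = [[[0] * num_states for _ in range(num_actions)]
              for _ in range(num_states)]
    for trajectory in trajectories:
        for s, a, s_next in trajectory:
            counts[s][a][s_next] += 1

    p_hat = []
    for s in range(num_states):
        row = []
        for a in range(num_actions):
            n = sum(counts[s][a])
            if n == 0:
                # Assumption: fall back to uniform for unvisited pairs.
                row.append([1.0 / num_states] * num_states)
            else:
                row.append([c / n for c in counts[s][a]])
        p_hat.append(row)
    return p_hat


def value_iteration(p_hat, reward, horizon):
    """Finite-horizon value iteration on the estimated model.

    reward[s][a] is a reward function given only after exploration.
    Returns (policy, values) where policy[h][s] is the greedy action at
    step h and values[s] is the estimated value of s at step 0.
    """
    num_states = len(p_hat)
    num_actions = len(p_hat[0])
    v_next = [0.0] * num_states
    policy = [[0] * num_states for _ in range(horizon)]
    for h in reversed(range(horizon)):
        v = [0.0] * num_states
        for s in range(num_states):
            best_a, best_q = 0, float("-inf")
            for a in range(num_actions):
                q = reward[s][a] + sum(
                    p * v_next[s2] for s2, p in enumerate(p_hat[s][a]))
                if q > best_q:
                    best_a, best_q = a, q
            policy[h][s] = best_a
            v[s] = best_q
        v_next = v
    return policy, v_next


# Toy MDP (hypothetical): action 0 stays in place, action 1 switches state.
data = [[(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]]
model = estimate_transitions(data, num_states=2, num_actions=2)

# The same model serves several reward functions without new exploration.
reward_prefers_state_0 = [[1.0, 1.0], [0.0, 0.0]]
reward_prefers_state_1 = [[0.0, 0.0], [1.0, 1.0]]
for reward in (reward_prefers_state_0, reward_prefers_state_1):
    policy, values = value_iteration(model, reward, horizon=3)
    print(policy[0], values)
```

The point of the sketch is the separation of concerns: exploration data is gathered once, and the planner is a replaceable component that is called once per reward function.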


research
06/16/2020

Task-agnostic Exploration in Reinforcement Learning

Efficient exploration is one of the main challenges in reinforcement lea...
research
06/11/2020

Exploration by Maximizing Rényi Entropy for Zero-Shot Meta RL

Exploring the transition dynamics is essential to the success of reinfor...
research
10/19/2021

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

To achieve sample efficiency in reinforcement learning (RL), it necessit...
research
10/12/2020

Nearly Minimax Optimal Reward-free Reinforcement Learning

We study the reward-free reinforcement learning framework, which is part...
research
03/14/2021

Learning One Representation to Optimize All Rewards

We introduce the forward-backward (FB) representation of the dynamics of...
research
06/28/2022

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL

While the primary goal of the exploration phase in reward-free reinforce...
research
02/10/2021

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

We design a simple reinforcement learning agent that, with a specificati...
