Efficient Exploration via State Marginal Matching

06/12/2019
by Lisa Lee, et al.

To solve tasks with sparse rewards, reinforcement learning algorithms must be equipped with suitable exploration techniques. However, it is unclear what underlying objective is being optimized by existing exploration algorithms, or how they can be altered to incorporate prior knowledge about the task. Most importantly, it is difficult to use exploration experience from one task to acquire exploration strategies for another task. We address these shortcomings by learning a single exploration policy that can quickly solve a suite of downstream tasks in a multi-task setting, amortizing the cost of learning to explore. We recast exploration as a problem of State Marginal Matching (SMM): we learn a mixture of policies for which the state marginal distribution matches a given target state distribution, which can incorporate prior knowledge about the task. Without any prior knowledge, the SMM objective reduces to maximizing the marginal state entropy. We optimize the objective by reducing it to a two-player, zero-sum game, where we iteratively fit a state density model and then update the policy to visit states with low density under this model. While many previous algorithms for exploration employ a similar procedure, they omit a crucial historical averaging step, without which the iterative procedure does not converge to a Nash equilibrium. To parallelize exploration, we extend our algorithm to use mixtures of policies, wherein we discover connections between SMM and previously proposed skill learning methods based on mutual information. On complex navigation and manipulation tasks, we demonstrate that our algorithm explores faster and adapts more quickly to new tasks.
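
The iterative procedure the abstract describes (fit a state density model, reward the policy for reaching states that are likely under the target distribution but rare under the fitted model, and keep a historical average of policies) can be illustrated with a small toy example. The sketch below is a minimal, assumed tabular instantiation on a 1-D chain, not the authors' implementation: the histogram density model, the per-step REINFORCE update, and the uniform target distribution are all illustrative choices.

```python
import numpy as np

# Toy sketch of the SMM loop on a 1-D chain of states (illustrative assumptions
# throughout: tabular policy, histogram density model, uniform target p*).
N_STATES, N_ACTIONS = 20, 2          # chain of 20 states; actions: left, right
HORIZON, N_ITERS, LR = 30, 200, 0.1

rng = np.random.default_rng(0)
log_target = np.zeros(N_STATES)      # uniform p*(s) => maximize state entropy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rollout(logits):
    """Collect one episode under the tabular policy; return states and actions."""
    s, states, actions = 0, [], []
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=softmax(logits[s]))
        states.append(s)
        actions.append(a)
        s = int(np.clip(s + (1 if a == 1 else -1), 0, N_STATES - 1))
    return states, actions

logits = np.zeros((N_STATES, N_ACTIONS))   # policy parameters
counts = np.ones(N_STATES)                 # histogram density model q(s)
history = []                               # snapshots for historical averaging

for it in range(N_ITERS):
    states, actions = rollout(logits)
    # Player 1: fit the density model to all states visited so far.
    for s in states:
        counts[s] += 1
    log_q = np.log(counts / counts.sum())
    # Player 2: REINFORCE step with pseudo-reward log p*(s) - log q(s),
    # which pushes the policy toward states that are still rarely visited.
    for s, a in zip(states, actions):
        r = log_target[s] - log_q[s]
        grad = -softmax(logits[s])          # d/d(logits) of log pi(a|s)
        grad[a] += 1.0
        logits[s] += LR * r * grad
    history.append(logits.copy())

# Deploy the exploration policy as the historical mixture: sample one snapshot
# uniformly per episode (the averaging step the abstract calls out as necessary
# for convergence to the game's equilibrium).
deployed_logits = history[rng.integers(len(history))]
```

With a uniform target, the pseudo-reward reduces to -log q(s), so the policy is driven toward maximizing state entropy, matching the no-prior-knowledge special case noted in the abstract.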

Related research

Meta-Reinforcement Learning of Structured Exploration Strategies (02/20/2018)
Exploration is a fundamental challenge in reinforcement learning (RL). M...

k-Means Maximum Entropy Exploration (05/31/2022)
Exploration in high-dimensional, continuous spaces with sparse rewards i...

Hierarchical Skills for Efficient Exploration (10/20/2021)
In reinforcement learning, pre-trained low-level skills have the potenti...

Marginalized State Distribution Entropy Regularization in Policy Optimization (12/11/2019)
Entropy regularization is used to get improved optimization performance ...

MAME: Model-Agnostic Meta-Exploration (11/11/2019)
Meta-Reinforcement learning approaches aim to develop learning procedure...

Simple Sensor Intentions for Exploration (05/15/2020)
Modern reinforcement learning algorithms can learn solutions to increasi...

Contextual Policy Reuse using Deep Mixture Models (02/29/2020)
Reinforcement learning methods that consider the context, or current sta...
