Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

05/31/2023
by   Dongyoung Kim, et al.
0

A promising technique for exploration is to maximize the entropy of visited state distribution, i.e., state entropy, by encouraging uniform coverage of visited state space. While it has been effective for an unsupervised setup, it tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states to exploit the task reward. Such a preference can cause an imbalance between the distributions of high-value states and low-value states, which biases exploration towards low-value state regions as a result of the state entropy increasing when the distribution becomes more uniform. This issue is exacerbated when high-value states are narrowly distributed within the state space, making it difficult for the agent to complete the tasks. In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which separately estimates the state entropies that are conditioned on the value estimates of each state, then maximizes their average. By only considering the visited states with similar value estimates for computing the intrinsic bonus, our method prevents the distribution of low-value states from affecting exploration around high-value states, and vice versa. We demonstrate that the proposed alternative to the state entropy baseline significantly accelerates various reinforcement learning algorithms across a variety of tasks within MiniGrid, DeepMind Control Suite, and Meta-World benchmarks. Source code is available at https://sites.google.com/view/rl-vcse.

READ FULL TEXT

page 9

page 15

research
12/11/2019

Marginalized State Distribution Entropy Regularization in Policy Optimization

Entropy regularization is used to get improved optimization performance ...
research
02/18/2021

State Entropy Maximization with Random Encoders for Efficient Exploration

Recent exploration methods have proven to be a recipe for improving samp...
research
10/05/2017

Exploration in Feature Space for Reinforcement Learning

The infamous exploration-exploitation dilemma is one of the oldest and m...
research
01/30/2019

InfoBot: Transfer and Exploration via the Information Bottleneck

A central challenge in reinforcement learning is discovering effective p...
research
07/15/2021

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Exploration in reinforcement learning is a challenging problem: in the w...
research
08/24/2021

Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning

Encouraging exploration is a critical issue in deep reinforcement learni...
research
08/07/2023

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

Offline reinforcement learning (RL) methods strike a balance between exp...

Please sign up or login with your details

Forgot password? Click here to reset