Exploring Unknown States with Action Balance

03/10/2020
by Yan Song, et al.

Exploration is a key problem in reinforcement learning. Recently, bonus-based methods, which assign an additional bonus (e.g., an intrinsic reward) to guide the agent toward rarely visited states, have achieved considerable success in hard-exploration environments such as Montezuma's Revenge. Since the bonus is computed from the novelty of the next state reached after performing an action, we call such methods next-state bonus methods. However, next-state bonus methods introduce new issues: they may trap the agent in states that have been visited only a few times and neglect the exploration of unknown states, and the bonus added to the state (or state-action) values only influences the agent's behavior policy indirectly. In contrast to bonus-based methods, which explore among known states, in this paper we focus on the other part of exploration: finding unknown states. We propose the action balance exploration method to overcome these defects; it balances the number of times each action is chosen in each state and can be viewed as an extension of the upper confidence bound (UCB) to deep reinforcement learning. To combine the advantages of the next-state bonus method and our action balance exploration method, we further propose the action balance RND method, which takes both parts of exploration into consideration. Experiments on grid worlds and Atari games demonstrate that action balance exploration is better at finding unknown states and can improve the performance of RND in some hard-exploration environments.
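The abstract does not give the exact formulation, but the description "balances the number of times each action is chosen in each state" together with the stated connection to UCB suggests a bonus of the familiar form c * sqrt(log N(s) / N(s, a)) applied at action-selection time. The sketch below is a minimal, hypothetical illustration of that idea in a tabular setting; the class and variable names (ActionBalanceBonus, counts, c) are illustrative rather than taken from the paper, and the paper's actual method extends this counting idea to deep reinforcement learning.

```python
import numpy as np

# Hypothetical tabular sketch of a UCB-style "action balance" bonus:
# actions that have been chosen less often in a state receive a larger
# bonus, steering the behavior policy toward unknown states.

class ActionBalanceBonus:
    def __init__(self, num_states, num_actions, c=1.0):
        self.c = c                                          # exploration coefficient
        self.counts = np.zeros((num_states, num_actions))   # N(s, a)

    def bonus(self, state):
        """UCB-style bonus per action: c * sqrt(log N(s) / N(s, a))."""
        n_s = self.counts[state].sum() + 1.0    # N(s), +1 avoids log(0)
        n_sa = self.counts[state] + 1.0         # N(s, a), +1 avoids division by zero
        return self.c * np.sqrt(np.log(n_s) / n_sa)

    def update(self, state, action):
        self.counts[state, action] += 1


# Usage: add the bonus to the action preferences before selection, so the
# behavior policy itself (not just the value targets) favors rarely chosen
# actions. The Q-values here are random placeholders.
rng = np.random.default_rng(0)
explorer = ActionBalanceBonus(num_states=25, num_actions=4)
state = 0
action_values = rng.normal(size=4)                 # e.g. Q-values or policy logits
scores = action_values + explorer.bonus(state)
action = int(np.argmax(scores))
explorer.update(state, action)
```

Unlike a next-state bonus (as in RND), which is added to the reward after a transition and shapes the policy indirectly through learned values, this bonus acts directly on action selection; the action balance RND method described in the abstract combines both kinds of signal.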
