DQN with model-based exploration: efficient learning on environments with sparse rewards

03/22/2019
by Stephen Zhen Gou, et al.

We propose Deep Q-Networks (DQN) with model-based exploration, an algorithm combining model-free and model-based approaches that explores more effectively and learns environments with sparse rewards more efficiently. DQN is a general-purpose, model-free algorithm that has performed well on a variety of tasks, including Atari 2600 games, since it was first proposed by Mnih et al. However, like many other reinforcement learning (RL) algorithms, DQN suffers from poor sample efficiency when rewards are sparse. As a result, most of the transitions stored in the replay memory carry no informative reward signal and contribute little to the convergence of the Q-network. Our insight is that these transitions can nevertheless be used to learn the dynamics of the environment as a supervised learning problem, and that they also reveal the distribution of visited states. Our algorithm combines these two observations to perform one-step planning during exploration, picking the action that leads to the state least likely to have been visited, thereby improving exploration. We demonstrate our agent's performance in two classic sparse-reward environments from OpenAI Gym: Mountain Car and Lunar Lander.
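To make the exploration step concrete, here is a minimal sketch of the idea in Python/NumPy. It is an illustration under assumptions, not the paper's implementation: it stands in a simple least-squares regressor for the learned dynamics model and approximates the visited-state distribution with a single Gaussian fitted to replay-memory states; all names (`LinearDynamicsModel`, `exploratory_action`, and so on) are hypothetical.

```python
import numpy as np

class LinearDynamicsModel:
    """Toy stand-in for the learned dynamics model: a least-squares fit of
    next_state from [state, one-hot(action)], trained on replay transitions."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.W = None

    def fit(self, states, actions, next_states):
        # Supervised learning on stored transitions (state, action) -> next_state.
        onehot = np.eye(self.n_actions)[actions]
        X = np.hstack([states, onehot, np.ones((len(states), 1))])
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def predict(self, state, action):
        x = np.concatenate([state, np.eye(self.n_actions)[action], [1.0]])
        return x @ self.W


def fit_visited_state_gaussian(states):
    """Approximate the distribution of visited states with one Gaussian
    (an assumption made for this sketch; any density model would do)."""
    mean = states.mean(axis=0)
    cov = np.cov(states, rowvar=False) + 1e-6 * np.eye(states.shape[1])
    return mean, np.linalg.inv(cov)


def novelty(state, mean, cov_inv):
    """Mahalanobis distance to the visited-state distribution:
    larger means the state is less likely to have been visited."""
    d = state - mean
    return float(d @ cov_inv @ d)


def exploratory_action(state, n_actions, model, mean, cov_inv):
    """One-step planning: evaluate every action with the dynamics model and
    pick the one whose predicted next state is least likely to be visited."""
    scores = [novelty(model.predict(state, a), mean, cov_inv)
              for a in range(n_actions)]
    return int(np.argmax(scores))
```

In an epsilon-greedy DQN loop, `exploratory_action` would replace the uniform random action on exploration steps, while exploitation steps still take the argmax of the Q-network; the dynamics model and the Gaussian parameters are refit periodically from the replay memory.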

