DeepAI AI Chat
Log In Sign Up

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

by   Per-Arne Andersen, et al.
Universitetet Agder

Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. There are several challenges in the current state-of-the-art reinforcement learning algorithms that prevent them from converging towards the global optima. It is likely that the solution to these problems lies in short- and long-term planning, exploration and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms as they provide a flexible, reproducible, and easy to control environment. Regardless, few games feature a state-space where results in exploration, memory, and planning are easily perceived. This paper presents The Dreaming Variational Autoencoder (DVAE), a neural network based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partial and fully-observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.


page 7

page 8

page 11

page 12


A Survey of Exploration Methods in Reinforcement Learning

Exploration is an essential component of reinforcement learning algorith...

Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

We consider scenarios from the real-time strategy game StarCraft as new ...

Improving width-based planning with compact policies

Optimal action selection in decision problems characterized by sparse, d...

What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning

Long-term planning poses a major difficulty to many reinforcement learni...

Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management

Warehouse Management Systems have been evolving and improving thanks to ...

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

In model-based reinforcement learning, the agent interleaves between mod...

Plan-Space State Embeddings for Improved Reinforcement Learning

Robot control problems are often structured with a policy function that ...