DeepAI AI Chat
Log In Sign Up

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

by   Per-Arne Andersen, et al.
Universitetet Agder

Deep Reinforcement Learning (RL) is unquestionably a robust framework to train autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative- and reinforcement learning. The algorithm finds good policies from a Gaussian distributed latent-space, which is especially useful for defining options. The DVQN algorithm uses MSE with KL-divergence as regularization, combined with traditional Q-Learning updates. The algorithm learns a latent-space that represents good policies with state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow and can maintain stability when trained for extended periods after convergence.


Flexible Option Learning

Temporal abstraction in reinforcement learning (RL), offers the promise ...

The Termination Critic

In this work, we consider the problem of autonomously discovering behavi...

Discovering hierarchies using Imitation Learning from hierarchy aware policies

Learning options that allow agents to exhibit temporally higher order be...

Disentangling Options with Hellinger Distance Regularizer

In reinforcement learning (RL), temporal abstraction still remains as an...

Learning with Options that Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and...

Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space

We present a novel generative method for producing unseen and plausible ...

Efficient Online Estimation of Empowerment for Reinforcement Learning

Training artificial agents to acquire desired skills through model-free ...