Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

10/03/2022
by Per-Arne Andersen, et al.

Deep Reinforcement Learning (RL) is a robust framework for training autonomous agents in a wide variety of disciplines. However, traditional model-free RL algorithms, both deep and shallow, suffer from low sample efficiency and inadequate generalization in sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method for solving these problems, but it still has noticeable shortcomings: it only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative learning and reinforcement learning. The algorithm finds good policies from a Gaussian-distributed latent space, which is especially useful for defining options. The DVQN algorithm uses MSE with KL-divergence as regularization, combined with traditional Q-Learning updates, and learns a latent space that represents good policies with state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has performance comparable to Rainbow and can maintain stability when trained for extended periods after convergence.
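To make the loss described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which quantity the MSE term is applied to, so this sketch assumes it is the squared one-step Q-learning error, and it assumes the KL term compares a diagonal Gaussian latent code with a standard normal prior, as in a conventional variational autoencoder. The function names and the weight `beta` are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single latent vector."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def regularized_td_loss(q_pred, target, mu, log_var, beta=1.0):
    """Squared TD error plus a KL penalty on the latent code.

    Assumption: `beta` is a hypothetical weight on the regularizer;
    the abstract does not state how the two terms are balanced.
    """
    td_error = q_pred - target
    return td_error ** 2 + beta * kl_to_standard_normal(mu, log_var)
```

In a full agent, `mu` and `log_var` would come from an encoder over the observed state and `q_pred` from a Q-value head on the sampled latent code; both are left out here to keep the sketch self-contained.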

research
12/06/2021

Flexible Option Learning

Temporal abstraction in reinforcement learning (RL), offers the promise ...
research
02/26/2019

The Termination Critic

In this work, we consider the problem of autonomously discovering behavi...
research
12/01/2018

Discovering hierarchies using Imitation Learning from hierarchy aware policies

Learning options that allow agents to exhibit temporally higher order be...
research
04/15/2019

Disentangling Options with Hellinger Distance Regularizer

In reinforcement learning (RL), temporal abstraction still remains as an...
research
11/10/2017

Learning with Options that Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and...
research
01/16/2021

Hierarchical Reinforcement Learning By Discovering Intrinsic Options

We propose a hierarchical reinforcement learning method, HIDIO, that can...
research
07/15/2022

Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space

We present a novel generative method for producing unseen and plausible ...
