DeepAI AI Chat
Log In Sign Up

ODPP: A Unified Algorithm Framework for Unsupervised Option Discovery based on Determinantal Point Process

by   Jiayu Chen, et al.

Learning rich skills through temporal abstractions without supervision of external rewards is at the frontier of Reinforcement Learning research. Existing works mainly fall into two distinctive categories: variational and Laplacian-based option discovery. The former maximizes the diversity of the discovered options through a mutual information loss but overlooks coverage of the state space, while the latter focuses on improving the coverage of options by increasing connectivity during exploration, but does not consider diversity. In this paper, we propose a unified framework that quantifies diversity and coverage through a novel use of the Determinantal Point Process (DPP) and enables unsupervised option discovery explicitly optimizing both objectives. Specifically, we define the DPP kernel matrix with the Laplacian spectrum of the state transition graph and use the expected mode number in the trajectories as the objective to capture and enhance both diversity and coverage of the learned options. The proposed option discovery algorithm is extensively evaluated using challenging tasks built with Mujoco and Atari, demonstrating that our proposed algorithm substantially outperforms SOTA baselines from both diversity- and coverage-driven categories. The codes are available at


page 1

page 2

page 7

page 8

page 9


Reward-Respecting Subtasks for Model-Based Reinforcement Learning

To achieve the ambitious goals of artificial intelligence, reinforcement...

Multi-agent Covering Option Discovery based on Kronecker Product of Factor Graphs

Covering option discovery has been developed to improve the exploration ...

Option Discovery in the Absence of Rewards with Manifold Analysis

Options have been shown to be an effective tool in reinforcement learnin...

Unsupervised Skill Discovery with Bottleneck Option Learning

Having the ability to acquire inherent skills from environments without ...

Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

The Laplacian representation recently gains increasing attention for rei...

Variational Option Discovery Algorithms

We explore methods for option discovery based on variational inference a...

PODNet: A Neural Network for Discovery of Plannable Options

Learning from demonstration has been widely studied in machine learning ...