DeepAI AI Chat
Log In Sign Up

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between skills and state transitions. In contrast to most prior approaches, CIC uses a decomposition of the mutual information that explicitly incentivizes diverse behaviors by maximizing state entropy. We derive a novel lower bound estimate for the mutual information which combines a particle estimator for state entropy to generate diverse behaviors and contrastive learning to distill these behaviors into distinct skills. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC substantially improves over prior unsupervised skill discovery methods and outperforms the next leading overall exploration algorithm in terms of downstream task performance.

READ FULL TEXT

page 5

page 16

page 17

05/08/2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

In reinforcement learning, unsupervised skill discovery aims to learn di...
10/06/2021

The Information Geometry of Unsupervised Reinforcement Learning

How can a reinforcement learning (RL) agent prepare to solve downstream ...
08/31/2021

APS: Active Pretraining with Successor Features

We introduce a new unsupervised pretraining objective for reinforcement ...
10/27/2021

Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

Learning meaningful behaviors in the absence of reward is a difficult pr...
07/15/2019

Mutual Reinforcement Learning

Recently, collaborative robots have begun to train humans to achieve com...
10/05/2020

Conditional Negative Sampling for Contrastive Learning of Visual Representations

Recent methods for learning unsupervised visual representations, dubbed ...
03/21/2022

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

In reinforcement learning, the graph Laplacian has proved to be a valuab...

Code Repositories

cic

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery


view repo