1 Introduction
Deep learning is all about learning feature representations [4]. Compared to conventional end-to-end supervised learning, Self-Supervised Learning (SSL) first learns a generic feature representation (e.g., a network backbone) by training with unsupervised pretext tasks such as the prevailing contrastive objective [38, 17], and then this stage-1 feature is expected to serve various stage-2 applications with proper fine-tuning. SSL for visual representation is fascinating because, for the first time, we can obtain "good" visual features for free, just like the trending pre-training in the NLP community [27, 9]. However, most SSL works only care about how much stage-2 performance an SSL feature can improve, and overlook what feature SSL is learning, why it can be learned, what cannot be learned, what the gap between SSL and Supervised Learning (SL) is, and when SSL can surpass SL.
The crux of answering these questions is to formally understand what a feature representation is and what a good one is. We postulate the classic world model of visual generation and feature representation [1, 71] as in Figure 1. Let $\mathcal{C}$ be a set of (unseen) semantics, e.g., attributes such as "digit" and "color". There is a set of independent and causal mechanisms [68] $\varphi: \mathcal{C} \to \mathcal{I}$, generating images from semantics, e.g., writing a digit "0" when thinking of "0" [76]. A visual representation is the inference process $\phi: \mathcal{I} \to \mathcal{X}$ that maps image pixels to vector-space features, e.g., a neural network. We define the semantic representation as the functional composition $f = \phi \circ \varphi: \mathcal{C} \to \mathcal{X}$. In this paper, we are only interested in the parameterization of the inference process for feature extraction, but not the generation process, i.e., we assume that for each image $I \in \mathcal{I}$ there is a semantic $c \in \mathcal{C}$ such that $I = \varphi(c)$ is fixed as the observation of the image sample. Therefore, we consider semantic and visual representations the same as feature representation, or simply representation, and we slightly abuse the notation, i.e., $f$ and $\phi$ share the same trainable parameters. We call the vector $\boldsymbol{x} = f(c)$ the feature, where $\boldsymbol{x} \in \mathcal{X}$.

We propose to use Higgins' definition of disentangled representation [42] to define what is "good".
Definition 1. (Disentangled Representation) Let $G$ be the group acting on $\mathcal{C}$, i.e., $g \cdot c$ transforms $c \in \mathcal{C}$, e.g., a "turn green" group element changing the semantic from "red" to "green". Suppose there is a direct product decomposition $G = G_1 \times \ldots \times G_m$ and $\mathcal{C} = \mathcal{C}_1 \times \ldots \times \mathcal{C}_m$, where $G_i$ acts on $\mathcal{C}_i$ respectively. (Note that $G_i$ can also denote a cyclic subgroup such as rotation, or a countable one treated as cyclic, such as translation and color.) A feature representation is disentangled if there exists a group $G$ acting on $\mathcal{X}$ such that:
- Equivariant: $f(g \cdot c) = g \cdot f(c)$ for all $g \in G$, $c \in \mathcal{C}$, e.g., the feature of the changed semantic, "red" to "green" in $\mathcal{C}$, is equivalent to directly changing the color vector in $\mathcal{X}$ from "red" to "green".
- Decomposable: there is a decomposition $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_m$, such that each $\mathcal{X}_i$ is fixed by the action of all $G_j$, $j \neq i$, and affected only by $G_i$, e.g., changing the "color" semantic in $\mathcal{C}$ does not affect the "digit" vector in $\mathcal{X}$.
Compared to the previous definition of feature representation, which is a static mapping, the disentangled representation in Definition 1 is dynamic, as it explicitly incorporates group representation [37]: a homomorphism from a group to its group actions on a space, e.g., from $G$ to the actions on $\mathcal{X}$, and it is common to use the feature space as a shorthand, which is where our title stands.
Definition 1 captures "good" features in the common views: 1) Robustness: a good feature should be invariant to the change of environmental semantics, such as external interventions [47, 89] or domain shifts [34]. By the above definition, such a change is always retained in one subspace $\mathcal{X}_i$, while the others are not affected. Hence, the subsequent classifier will focus on the invariant features and ignore the ever-changing $\mathcal{X}_i$. 2) Zero-shot Generalization: even if a new combination of semantics is unseen in training, each semantic has been learned as a feature. So, the metric of each $\mathcal{X}_i$ trained on seen samples remains valid for unseen samples [97].

Are the existing SSL methods learning disentangled representations? No. We show in Section 4 that they can only disentangle representations according to the hand-crafted augmentations, e.g., color jitter and rotation. For example, in Figure 2 (a), even if we only use the augmentation-related feature, the classification accuracy of a standard SSL method (SimCLR [17]) does not drop much compared to using the full feature. Figure 2 (b) visualizes that the CNN features in each layer are indeed entangled (e.g., tyre, motor, and background in the motorcycle image). In contrast, our approach IP-IRM, to be introduced below, disentangles more useful features beyond augmentations.
In this paper, we propose Iterative Partition-based Invariant Risk Minimization (IP-IRM, pronounced [aɪ pɜːm]) that guarantees to learn disentangled representations in an SSL fashion. We present the algorithm in Section 3, followed by the theoretical justifications in Section 4. In a nutshell, at each iteration, IP-IRM first partitions the training data into two disjoint subsets, each of which is an orbit of the already disentangled group, and the cross-orbit group action corresponds to an entangled group element $g$. Then, we adopt Invariant Risk Minimization (IRM) [2] to implement a partition-based SSL, which disentangles the representation w.r.t. $g$. Iterating the above two steps eventually converges to a fully disentangled representation w.r.t. $G = G_1 \times \ldots \times G_m$. In Section 5, we show promising experimental results on various feature disentanglement and SSL benchmarks.
2 Related Work
Self-Supervised Learning. SSL aims to learn representations from unlabeled data with hand-crafted pretext tasks [29, 65, 35]. Recently, contrastive learning [67, 63, 40, 82, 17] prevails in most state-of-the-art methods. The key is to map positive samples closer, while pushing apart negative ones in the feature space. Specifically, the positive samples are the augmented views [84, 3, 96, 44] of each instance and the negative ones are other instances. Along this direction, follow-up methods are mainly four-fold: 1) Memory bank [92, 63, 38, 19]: storing the prototypes of all the instances computed previously in a memory bank to benefit from a large number of negative samples. 2) Siamese networks [8]: avoiding representation collapse [36, 20, 85]. 3) Clustering: assigning clusters to samples to integrate inter-instance similarity into contrastive learning [12, 13, 14, 90, 58]. 4) Hard negatives: seeking hard negative samples with adversarial training or better sampling strategies [75, 21, 46, 50]. In contrast, our proposed IP-IRM jumps out of the above frame and introduces disentangled representation into SSL with group theory, showing the limitations of existing SSL and how to break through them.
Disentangled Representation. This notion dates back to [5] and has henceforth become a high-level goal of separating the factors of variation in the data [86, 81, 88, 60]. Several works aim to provide a more precise description [28, 30, 74] by adopting an information-theoretic view [18, 28] and measuring the properties of a disentangled representation explicitly [30, 74]. We adopt the recent group-theoretic definition from Higgins et al. [42], which not only unifies the existing definitions, but also resolves previously controversial points [80, 61]. Although supervised learning of disentangled representations is a well-studied field [103, 45, 11, 72, 51], unsupervised disentanglement based on GANs [18, 66, 59, 73] or VAEs [41, 16, 102, 52] is still believed to be theoretically challenging [61]. Thanks to Higgins' definition, we prove that the proposed IP-IRM converges with full-semantic disentanglement using group representation theory. Notably, IP-IRM learns a disentangled representation with an inference process, without using generative models as in all the existing unsupervised methods, making IP-IRM applicable even on large-scale datasets.
Group Representation Learning. A group representation has two elements [49, 37]: 1) a homomorphism (e.g., a mapping function) from the group to its group action acting on a vector space, and 2) the vector space. Usually, when there is no ambiguity, we can use either element as the definition. Most existing works focus on learning the first element: they first define the group of interest, such as spherical rotations [24] or image scaling [91, 78], and then learn the parameters of the group actions [25, 48, 70]. In contrast, we focus on the second element; more specifically, we are interested in learning a map between two vector spaces: the image pixel space and the feature vector space. Our representation learning is flexible because it delays the group action learning to downstream tasks on demand. For example, in a classification task, a classifier can be seen as a group action that is invariant to class-agnostic groups but equivariant to class-specific groups (see Section 4).
3 IP-IRM Algorithm
Notations. Our goal is to learn the feature extractor $\phi$ in a self-supervised fashion. We define a partition matrix $\boldsymbol{P} \in \{0, 1\}^{N \times 2}$ that partitions the $N$ training images into 2 disjoint subsets: $P_{i,k} = 1$ if the $i$-th image belongs to the $k$-th subset and $P_{i,k} = 0$ otherwise. Suppose we have a pretext task loss $\mathcal{L}(\phi, \theta, k)$ defined on the samples in the $k$-th subset, where $\theta$ is a "dummy" parameter used to evaluate the invariance of the SSL loss across the subsets (later discussed in Step 1). For example, $\mathcal{L}$ can be defined as the contrastive loss:

$$\mathcal{L}(\phi, \theta, k) = \sum_{\boldsymbol{x} \in \mathcal{X}_k} -\log \frac{\exp(\theta \cdot \boldsymbol{x}^\top \boldsymbol{x}^* / \tau)}{\sum_{\boldsymbol{x}' \in \mathcal{X}_k} \exp(\theta \cdot \boldsymbol{x}^\top \boldsymbol{x}' / \tau)}, \tag{1}$$

where $\mathcal{X}_k$ is the feature set of the $k$-th subset, $\tau$ is a temperature hyper-parameter, and $\boldsymbol{x}^*$ is the augmented-view feature of $\boldsymbol{x}$.
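As a concrete reference, the per-subset contrastive objective above can be sketched in NumPy. This is a minimal illustration assuming l2-normalized features and a temperature `tau`; the function name `info_nce` is ours, not the paper's:

```python
import numpy as np

def info_nce(z, z_aug, tau=0.5):
    """Per-subset contrastive loss: each (z[i], z_aug[i]) is a positive
    pair; all other augmented views in the subset act as negatives.
    z, z_aug: (n, d) arrays of l2-normalized features."""
    sim = np.exp(z @ z_aug.T / tau)      # pairwise similarity logits
    pos = np.diag(sim)                   # positive-pair terms
    return float(np.mean(-np.log(pos / sim.sum(axis=1))))
```

Splitting a training batch by the partition matrix and evaluating this loss on each subset separately yields the $k$-indexed losses used in Eqs. (2) and (3) below.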
Input. $N$ training images. Randomly initialized $\phi$. A partition matrix $\boldsymbol{P}$ initialized such that the first column of $\boldsymbol{P}$ is all 1, i.e., all samples belong to the first subset. Set $\mathcal{P} = \{\boldsymbol{P}\}$.
Output. Disentangled feature extractor $\phi$.
Step 1 [Update $\phi$]. We update $\phi$ by:

$$\phi^* = \arg\min_{\phi} \sum_{\boldsymbol{P} \in \mathcal{P}} \sum_{k=1}^{2} \left[ \mathcal{L}(\phi, \theta{=}1, k) + \lambda_1 \left\| \nabla_{\theta} \mathcal{L}(\phi, \theta, k) \big|_{\theta=1} \right\|^2 \right], \tag{2}$$

where $\lambda_1$ is a hyper-parameter. The second term delineates how far the contrast in one subset is from the constant baseline $\theta = 1$. Minimizing both terms encourages $\mathcal{L}$ in different subsets to stay close to the same baseline, i.e., to be invariant across the subsets. See IRM [2] for more details. In particular, the first iteration corresponds to standard SSL, with $\mathcal{X}_1$ in Eq. (1) containing all training images.
Step 2 [Update $\boldsymbol{P}$]. We fix $\phi$ and find a new partition $\boldsymbol{P}^*$ by

$$\boldsymbol{P}^* = \arg\max_{\boldsymbol{P}} \sum_{k=1}^{2} \left[ \mathcal{L}(\phi, \theta{=}1, k) + \lambda_2 \left\| \nabla_{\theta} \mathcal{L}(\phi, \theta, k) \big|_{\theta=1} \right\|^2 \right], \tag{3}$$

where $\lambda_2$ is a hyper-parameter. In practice, we use a continuous partition matrix in $[0, 1]^{N \times 2}$ during optimization and then threshold it to $\{0, 1\}^{N \times 2}$.
We update $\mathcal{P} \leftarrow \mathcal{P} \cup \{\boldsymbol{P}^*\}$ and iterate the above two steps until convergence.
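The two steps can be sketched end-to-end in NumPy. This is an illustrative toy version under simplifying assumptions: features are precomputed and fixed (the paper updates the backbone by SGD in Step 1), the gradient w.r.t. the dummy parameter $\theta$ is taken by finite differences rather than autograd, and Step 2 uses random search instead of optimizing a continuous partition matrix. All function names are ours:

```python
import numpy as np

def subset_loss(z, z_aug, w=1.0, tau=0.5):
    """Contrastive loss on one subset, with logits scaled by the
    dummy parameter w (w = 1 recovers the plain SSL loss)."""
    sim = np.exp(w * (z @ z_aug.T) / tau)
    return float(np.mean(-np.log(np.diag(sim) / sim.sum(axis=1))))

def irm_terms(z, z_aug, p, eps=1e-4):
    """Summed per-subset loss and squared dummy-gradient penalty for
    a binary partition p in {0, 1}^n (central finite difference)."""
    loss, penalty = 0.0, 0.0
    for k in (0, 1):
        zk, zak = z[p == k], z_aug[p == k]
        loss += subset_loss(zk, zak)
        g = (subset_loss(zk, zak, 1 + eps)
             - subset_loss(zk, zak, 1 - eps)) / (2 * eps)
        penalty += g ** 2
    return loss, penalty

def step2_find_partition(z, z_aug, lam=1.0, n_restarts=50, seed=0):
    """Step 2 (Eq. 3): search for the binary partition maximizing
    loss + lam * penalty, i.e., where the SSL loss is least invariant."""
    rng = np.random.default_rng(seed)
    best_p, best_val = None, -np.inf
    for _ in range(n_restarts):
        p = rng.integers(0, 2, size=len(z))
        if 0 < p.sum() < len(z):          # keep both subsets non-empty
            loss, pen = irm_terms(z, z_aug, p)
            if loss + lam * pen > best_val:
                best_val, best_p = loss + lam * pen, p
    return best_p
```

Step 1 would then minimize, over the backbone parameters, the sum of `irm_terms` objectives across all partitions collected so far; alternating the two steps is the IP-IRM loop.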
4 Justification
Recall that IP-IRM uses training sample partitions to learn the disentangled representations w.r.t. $G = G_1 \times \ldots \times G_m$. As we have an equivariant feature map between the sample space and the feature space (the equivariance is later guaranteed by Lemma 1), we slightly abuse the notation by using $\mathcal{X}$ to denote both spaces. Also, we assume that $\mathcal{X}$ is a homogeneous space of $G$, i.e., any sample $\boldsymbol{x}'$ can be transited from another sample $\boldsymbol{x}$ by a group action: $\boldsymbol{x}' = g \cdot \boldsymbol{x}$. Intuitively, $G$ is all you need to describe the diversity of the training set. It is worth noting that $g$ is any group element in $G$, while $G_i$ is a Cartesian "building block" of $G$, e.g., $G$ can be decomposed by $G = G_1 \times \ldots \times G_m$.
We show that partition and group are tightly connected by the concept of orbit. Given a sample $\boldsymbol{x}$, its group orbit w.r.t. a subgroup $D \subset G$ is the sample set $D(\boldsymbol{x}) = \{g \cdot \boldsymbol{x} \mid g \in D\}$. As shown in Figure 3 (a), if $D$ is a set of attributes shared by classes, e.g., "color" and "pose", the orbit is the sample set of the class of $\boldsymbol{x}$; in Figure 3 (b), if $D$ denotes augmentations, the orbit is the set of augmented images. In particular, we can see that the disjoint orbits in Figure 3 naturally form a partition. Formally, we have the following definition:
Definition 2. (Orbit & Partition [49]) Given a subgroup $D \subset G$, it partitions $\mathcal{X}$ into the disjoint subsets $\{D(\boldsymbol{x}_1), \ldots, D(\boldsymbol{x}_m)\}$, where $m$ is the number of cosets of $D$, and the cosets form a factor group $G/D$. (If $D$ is a normal subgroup of $G$, the cosets indeed form the factor group $G/D$ [49]; we write $G/D$ with slight abuse of notation.) In particular, $\boldsymbol{x}_i$ can be considered as a sample of the $i$-th class, transited from any sample $\boldsymbol{x}$.
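Definition 2 can be made concrete with a toy finite group. The sketch below (our own example, not from the paper) takes $G = \mathbb{Z}_6$ under addition mod 6 and the subgroup $D = \{0, 3\}$: the orbits of $D$ partition the six elements into three disjoint cosets, matching the factor group $\mathbb{Z}_6 / D \cong \mathbb{Z}_3$:

```python
def orbits(elements, subgroup, action):
    """Partition `elements` into disjoint orbits under `subgroup`,
    where action(d, x) applies group element d to x."""
    remaining, parts = set(elements), []
    while remaining:
        x = min(remaining)                        # pick a representative
        orbit = {action(d, x) for d in subgroup}  # D(x)
        parts.append(sorted(orbit))
        remaining -= orbit
    return parts

# G = Z_6 under addition mod 6, subgroup D = {0, 3}
coset_partition = orbits(range(6), [0, 3], lambda d, x: (d + x) % 6)
print(coset_partition)  # [[0, 3], [1, 4], [2, 5]]: three cosets, G/D isomorphic to Z_3
```

The same routine applied to image features, with `subgroup` the augmentation group, would recover exactly the SSL partition of Figure 3 (b): one orbit per instance.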
Interestingly, the partition offers a new perspective on the training data format in Supervised Learning (SL) and Self-Supervised Learning (SSL). In SL, as shown in Figure 3 (a), the data is labeled with classes, each of which is an orbit $D(\boldsymbol{x}_i)$ of training samples, whose variations are depicted by the class-sharing attribute group $D$. The cross-orbit group action, e.g., $g \cdot \boldsymbol{x}$ with $g \in G/D$, can be read as "turn $\boldsymbol{x}$ into a dog", and such a "turn" is always valid due to the assumption that $\mathcal{X}$ is a homogeneous space of $G$. In SSL, as shown in Figure 3 (b), each training sample is augmented by the group $D$. So, $D(\boldsymbol{x}_i)$ consists of all the augmentations of the $i$-th sample, where the cross-orbit group action can be read as "turn $\boldsymbol{x}$ into the $i$-th sample".
Thanks to the orbit and partition view of training data, we are ready to revisit model generalization in a group-theoretic view by using invariance and equivariance, the two sides of the coin whose name is disentanglement. For SL, we expect that a good feature is disentangled into a class-agnostic part and a class-specific part: the former (latter) is invariant (equivariant) to the cross-orbit traverse $G/D$, but equivariant (invariant) to the in-orbit traverse $D$. By using such a feature, a model can generalize to diverse testing samples (limited to $D$ variations) by only keeping the class-specific feature. Formally, we prove that we can achieve such disentanglement by contrastive learning:
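The invariance/equivariance expectation above can be written compactly. In the following paraphrase (ours, not the paper's formal statement), let $D$ be the in-orbit attribute group, $G/D$ the cross-orbit factor group, and suppose the feature hypothetically splits as $f = [f_{\mathrm{spec}}, f_{\mathrm{agn}}]$ into class-specific and class-agnostic parts:

$$f_{\mathrm{spec}}(g \cdot \boldsymbol{x}) = g \cdot f_{\mathrm{spec}}(\boldsymbol{x}), \qquad f_{\mathrm{spec}}(d \cdot \boldsymbol{x}) = f_{\mathrm{spec}}(\boldsymbol{x}), \qquad \forall g \in G/D,\; d \in D,$$

and symmetrically for $f_{\mathrm{agn}}$ with the roles of $D$ and $G/D$ swapped. A classifier that keeps only $f_{\mathrm{spec}}$ is then unaffected by in-orbit (class-sharing) variations.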
Lemma 1. (Disentanglement by Contrastive Learning) Training with the contrastive loss in Eq. (1) disentangles $\mathcal{X}$ w.r.t. $D \times (G/D)$, where the positive pair $(\boldsymbol{x}, \boldsymbol{x}^*)$ is from the same orbit.
We can draw the following interesting corollaries from Lemma 1 (details in Appendix):

1. If we use all the samples in the denominator of the loss, we can approximate equivariant features given limited training samples. This is because the loss minimization guarantees a one-to-one correspondence between sample pairs and group elements, i.e., any sample pair corresponds to a group action.
2. The conventional cross-entropy loss in SL is a special case, if we define $\boldsymbol{x}^*$ as the classifier weights. So, SL does not guarantee the disentanglement of $D$, which causes a generalization error if the class domain of the downstream task is different from that of SL pre-training, e.g., a subset of $G/D$.
3. In contrastive learning based SSL, $D$ is the augmentation group (recall Figure 2), and the number of augmentations is generally much smaller than the class-wise sample diversity in SL. This enables the SL model to generalize to more diverse testing samples (a larger $D$) by filtering out the class-agnostic features (e.g., background) and focusing on the class-specific ones (e.g., foreground), which explains why SSL is worse than SL in downstream classification.
4. In SL, if the number of training samples per orbit is not enough, i.e., smaller than $|D|$, the disentanglement between $D$ and $G/D$ cannot be guaranteed, which underlies the challenges in few-shot learning [98]. Fortunately, in SSL, the number is enough as we always include all the augmented samples in training. Moreover, we conjecture that the augmentation group only contains simple cyclic group elements such as rotation and colorization, which are easier for representation learning.
Lemma 1 does not guarantee the decomposability of each $G_i$. Nonetheless, the downstream model can still generalize by keeping the class-specific features affected by $G/D$. Therefore, the key to filling the gap, or even letting SSL surpass SL, is to achieve the full disentanglement of $G = G_1 \times \ldots \times G_m$.
Theorem 1. The representation is fully disentangled w.r.t. $G$ if and only if, $\forall g \in G$, the contrastive loss in Eq. (1) is invariant across the 2 orbits of the partition $\boldsymbol{P}_g$ induced by $g$, where the two orbits are $D(\boldsymbol{x})$ and $D(g \cdot \boldsymbol{x})$.
The maximization in Step 2 is based on the contraposition of the sufficient condition of Theorem 1. Denote the currently disentangled group as $D$ (initially, $D$ is the augmentation group). If we can find a partition that maximizes the loss in Eq. (3), i.e., the SSL loss is variant across the orbits, then there exists $g \notin D$ such that the representation w.r.t. $g$ is entangled. Figure 3 (c) illustrates a discovered partition about color. The minimization in Step 1 is based on the necessary condition of Theorem 1: based on the discovered $\boldsymbol{P}^*$, if we minimize Eq. (2), we can further disentangle the representation and update $D$. Overall, IP-IRM converges as $G$ is finite. Note that an improved contrastive objective [94] can further disentangle each $G_i$ and achieve full disentanglement w.r.t. $G$.
5 Experiments
5.1 Unsupervised Disentanglement
Datasets. We used two datasets. CMNIST [2] has 60,000 digit images with semantic labels of digits (0-9) and colors (red and green). These images differ in other semantics (e.g., slant and font) that are not labeled. Moreover, there is a strong correlation between digits and colors (most 0-4 in red and 5-9 in green), increasing the difficulty of disentangling them. Shapes3D [52] contains 480,000 images with 6 labelled semantics, i.e., size, type, azimuth, as well as floor, wall and object color. Note that we only considered the first three semantics for evaluation, as the standard augmentations in SSL contaminate any color-related semantics.
Settings. We adopted 6 representative disentanglement metrics: the Disentanglement Metric for Informativeness (DCI) [30], Interventional Robustness Score (IRS) [81], Explicitness Score (EXP) [74], Modularity Score (MOD) [74], and the accuracy of predicting the ground-truth semantic labels by two classification models, logistic regression (LR) and gradient boosted trees (GBT) [61]. Specifically, DCI and EXP measure the explicitness, i.e., whether the values of semantics can be decoded from the feature using a linear transformation. MOD and IRS measure the modularity, i.e., whether each feature dimension is equivariant to the shift of a single semantic. See Appendix for the detailed formulas of the metrics. In evaluation, we trained CNN-based feature extractor backbones with a comparable number of parameters for all the baselines and our IP-IRM. The full implementation details are in Appendix.

Table 1 (mean ± std; IRS is not reported on CMNIST):

Method  DCI  IRS  MOD  EXP  LR  GBT  Average
CMNIST
VAE [53]  0.948±0.004  -  0.664±0.121  0.968±0.007  0.824±0.019  0.948±0.004  0.849±0.057
β-VAE [43]  0.945±0.002  -  0.705±0.073  0.963±0.006  0.809±0.013  0.945±0.003  0.874±0.015
AnnealVAE [10]  0.911±0.002  -  0.790±0.075  0.965±0.007  0.821±0.022  0.911±0.002  0.880±0.016
β-TCVAE [16]  0.914±0.008  -  0.864±0.095  0.962±0.010  0.801±0.024  0.914±0.008  0.891±0.014
FactorVAE [52]  0.916±0.004  -  0.893±0.056  0.947±0.011  0.770±0.025  0.916±0.005  0.888±0.014
SimCLR [17]  0.882±0.019  -  0.767±0.025  0.976±0.011  0.863±0.036  0.876±0.015  0.873±0.016
IP-IRM (Ours)  0.917±0.008  -  0.785±0.031  0.990±0.002  0.921±0.009  0.916±0.007  0.906±0.011
Shapes3D
VAE [53]  0.351±0.026  0.284±0.009  0.820±0.015  0.802±0.054  0.421±0.079  0.352±0.027  0.505±0.028
β-VAE [43]  0.369±0.021  0.283±0.012  0.782±0.034  0.807±0.018  0.427±0.025  0.368±0.023  0.506±0.011
AnnealVAE [10]  0.327±0.069  0.412±0.049  0.743±0.070  0.643±0.013  0.259±0.021  0.328±0.070  0.452±0.023
β-TCVAE [16]  0.470±0.035  0.291±0.023  0.777±0.031  0.821±0.054  0.439±0.084  0.469±0.034  0.545±0.032
FactorVAE [52]  0.340±0.021  0.316±0.016  0.815±0.041  0.738±0.043  0.319±0.045  0.339±0.021  0.478±0.020
SimCLR [17]  0.535±0.016  0.439±0.030  0.678±0.050  0.949±0.005  0.733±0.055  0.536±0.015  0.645±0.026
IP-IRM (Ours)  0.565±0.023  0.420±0.014  0.766±0.036  0.959±0.007  0.757±0.025  0.565±0.023  0.672±0.017
Results. In Table 1, we compare the proposed IP-IRM to the standard SSL method SimCLR [17] as well as several generative disentanglement methods [53, 43, 10, 16, 52]. On both CMNIST and Shapes3D, IP-IRM outperforms SimCLR on all metrics except IRS, with the largest relative gain of 8.8% on MOD. For MOD, we notice that VAE performs better than our IP-IRM by 6 points, i.e., 0.82 vs. 0.76 on Shapes3D. This is because VAE explicitly pursues a high modularity score by regularizing the dimension-wise independence in the feature space. However, this regularization is adversarial to discriminative objectives [15, 97]. Indeed, we can observe from the LR column (i.e., the performance of downstream linear classification) that VAE methods perform clearly worse, especially on the more challenging Shapes3D. We can draw the same conclusion from the GBT results. Different from VAE methods, our IP-IRM is optimized towards disentanglement without such regularization, and is thus able to outperform the others in downstream tasks while obtaining a competitive value of modularity.
What do IP-IRM features look like? Figure 4 visualizes the features learned by SimCLR and our IP-IRM on two datasets: CMNIST in Figure 4 (a) and STL10 in Figure 4 (b). In the following, we use Figure 4 (a) as the example; similar conclusions can easily be drawn from Figure 4 (b). On the left-hand side of Figure 4 (a), it is obvious that there is no clear boundary distinguishing the color semantic in the SimCLR feature space. Besides, the features of the same digit semantic are scattered in two regions. On the right-hand side of (a), we have 3 observations for IP-IRM. 1) The features are well clustered and each cluster corresponds to a specific semantic of either digit or color. This validates the equivariant property of the IP-IRM representation: it responds to any change of the existing semantics, e.g., digit and color on this dataset. 2) The feature space has a symmetrical structure for each individual semantic, validating the decomposable property of the IP-IRM representation. More specifically, i) mirroring a feature (w.r.t. "*" in the figure center) indicates a change of the color semantic only, regardless of the other semantic (digit); and ii) a counter-clockwise rotation (denoted by black arrows from a same-colored 1 to 7) indicates a change of the digit semantic only. 3) IP-IRM reveals the true distribution (similarity) of different classes. For example, digits 3, 5, 8, which share sub-parts (curved bottoms and turnings), have closer feature points in the IP-IRM feature space.
How does IP-IRM disentangle features? 1) Discovered partitions: to visualize the discovered partitions at each maximization step, we performed an experiment on a binary CMNIST (digits 0 and 1, in red and green), and show the results in Figure 5 (a). Please kindly refer to Appendix for the full results on CMNIST. First, each partition tells apart a specific semantic into two subsets, e.g., in Partition #1, red and green digits are separated. Second, besides the obvious semantics of digit and color (labelled on the dataset), we can discover new semantics, e.g., the digit slant shown in Partition #3. 2) Disentangled representation: in Figure 5 (b), we visualize how equivariant each feature dimension is to the change of each semantic, i.e., a darker color shows that a dimension is more equivariant w.r.t. the semantic indicated on the left. We can see that SimCLR fails to learn a decomposable representation, e.g., the 8th dimension captures azimuth, type and size in Shapes3D. In contrast, our IP-IRM achieves disentanglement by representing the semantics in interpretable dimensions, e.g., the 6th and 7th dimensions capture the size, the 4th the type, and the 2nd and 9th the azimuth on Shapes3D. Overall, the results support the justification in Section 4, i.e., we discover a new semantic at each iteration through the partition, and IP-IRM eventually converges to a disentangled representation.
5.2 Self-Supervised Learning
Datasets and Settings. We conducted the SSL evaluations on 2 standard benchmarks following [90, 21, 50]. Cifar100 [55] contains 60,000 images in 100 classes and STL10 [23] has 113,000 images in 10 classes. We used SimCLR [17], DCL [21] and HCL [50] as baselines, and learned the representations for 400 and 1,000 epochs. We evaluated both linear and NN classification accuracies for the downstream classification task. Implementation details are in Appendix.
Method  STL10 NN  STL10 Linear  Cifar100 NN  Cifar100 Linear
400 epoch training  
SimCLR [17]  73.60  78.89  54.94  66.63 
DCL [21]  78.82  82.56  57.29  68.59 
HCL [50]  80.06  87.60  59.61  69.22 
SimCLR+IP-IRM  79.66  84.44  59.10  69.55
DCL+IP-IRM  81.51  85.36  58.37  68.76
HCL+IP-IRM  84.29  87.81  60.05  69.95
1,000 epoch training  
SimCLR [17]  78.60  84.24  59.45  68.73 
SimCLR [57]  79.80  85.56  63.67  72.18 
SimCLR+IP-IRM  85.08  89.91  65.82  73.99
Supervised        73.72 
Supervised+MixUp [100]        74.19 
Results. We report our results and compare with baselines in Table 2. Incorporating IP-IRM into the 3 baselines brings consistent performance boosts to downstream classification models in all settings, e.g., improving the linear models by up to 5.55% on STL10 and 2.92% on Cifar100. In particular, we observe that IP-IRM brings a large performance gain with NN classifiers, e.g., 4.23% using HCL+IP-IRM on STL10, i.e., the distance metric in the IP-IRM feature space more faithfully reflects the class semantic differences. This validates that our algorithm further disentangles the representation compared to standard SSL. Moreover, by extending the training process to 1,000 epochs with MixUp [57], SimCLR+IP-IRM achieves a further performance boost on both datasets, e.g., 5.28% for the NN and 4.35% for the linear classifier over the SimCLR baseline on STL10. Notably, our SimCLR+IP-IRM surpasses vanilla supervised learning on Cifar100 under the same evaluation setting. Still, the quality of disentanglement cannot be fully evaluated when the training and test samples are identically distributed: while the improved accuracy demonstrates that the IP-IRM representation is more equivariant to class semantics, it does not reveal whether the representation is decomposable. Hence we present an out-of-distribution (OOD) setting in Section 5.3 to further show this property.
Is IP-IRM sensitive to the values of hyper-parameters? 1) $\lambda_1$ and $\lambda_2$ in Eq. (2) and Eq. (3). In Figure 6 (a), we observe that the best performance is achieved with $\lambda_1$ and $\lambda_2$ taking moderate values on both datasets, and all accuracies drop sharply when they become too large. The reason is that a higher $\lambda$ forces the model to push the induced similarity towards the fixed baseline $\theta = 1$, rather than decrease the loss on the pretext task, leading to poor convergence. 2) The number of epochs. In Figure 6 (b), we plot the Top-1 accuracies of NN classifiers along the 700-epoch training of two kinds of SSL representations: SimCLR and IP-IRM. It is obvious that IP-IRM converges faster and achieves a higher accuracy than SimCLR. It is worth highlighting that on STL10, the accuracy of SimCLR starts to oscillate and grow slowly after the 150th epoch, while ours keeps improving. This is empirical evidence that IP-IRM keeps disentangling more and more semantics in the feature space, and has the potential to improve through long-term training.
5.3 Potential on Large-Scale Data
Datasets. We evaluated on the standard supervised learning benchmark ImageNet ILSVRC2012 [26], which has in total 1,331,167 images in 1,000 classes. To further reveal whether a representation is decomposable, we used NICO [39], a real-world image dataset designed for OOD evaluations. It contains 25,000 images in 19 classes, with a strong correlation between the foreground and background in the train split (e.g., most dogs on grass). We also studied the transferability of the learned representation following [31, 54]: FGVC Aircraft (Aircraft) [62], Caltech-101 (Caltech) [33], Stanford Cars (Cars) [95], Cifar10 [56], Cifar100 [56], DTD [22], Oxford 102 Flowers (Flowers) [64], Food101 (Food) [6], Oxford-IIIT Pets (Pets) [69] and SUN397 (SUN) [93]. These datasets range from coarse- to fine-grained classification tasks, and vary in the amount of training data (2,000 to 75,000 images) and classes (10 to 397 classes), representing a wide range of transfer learning settings.
Settings. For ImageNet, all the representations were trained for 200 epochs due to limited computing resources. We followed the common setting [82, 38], using a linear classifier, and report Top-1 classification accuracies. For NICO, we fixed the ImageNet-pretrained ResNet-50 backbone and fine-tuned the classifier. See Appendix for more training details. For the transfer learning, we followed [31, 54] to report the classification accuracies on Cars, Cifar10, Cifar100, DTD, Food, SUN and the average per-class accuracies on Aircraft, Caltech, Flowers, Pets; we call them uniformly Accuracy. We used the few-shot n-way-k-shot setting for model evaluation. Specifically, we randomly sampled 2,000 episodes from the test splits of the above datasets. An episode contains n classes, each with k training samples and 15 testing samples; we fine-tuned the linear classifier (backbone weights frozen) for 100 epochs on the training samples, and evaluated it on the testing samples. We evaluated with the 5-way-5-shot setting (results of other settings in Appendix).

Method  Aircraft  Caltech  Cars  Cifar10  Cifar100  DTD  Flowers  Food  Pets  SUN  Average
InsDis [92]  35.07  75.97  37.49  51.49  57.61  69.38  77.35  50.01  66.38  74.97  59.57 
PCL [58]  36.86  90.72  39.68  59.26  60.78  69.53  67.50  57.06  88.31  84.51  65.42 
PIRL [63]  36.70  78.63  39.21  49.85  55.23  70.43  78.37  51.61  69.40  76.64  60.61 
MoCo-v1 [38]  35.31  79.60  36.35  46.96  51.62  68.76  75.42  49.77  68.32  74.77  58.69
MoCo-v2 [19]  31.98  92.32  41.47  56.50  63.33  78.00  80.05  57.25  83.23  88.10  67.22
IP-IRM (Ours)  32.98  93.16  42.87  60.73  68.54  79.30  82.68  59.61  85.23  89.38  69.44
ImageNet and NICO. In Table 3 (ImageNet accuracy), our IP-IRM achieves the best performance over all baseline models. Yet we believe that this does not show the full potential of IP-IRM, because ImageNet is a large-scale dataset with many semantics, and it is hard to achieve a full disentanglement of all of them within the limited 200 epochs. To evaluate the feature decomposability of IP-IRM, we compared the performance on NICO with various SSL baselines in Table 3, where our approach significantly outperforms the baselines by 1.5-4.2%. This validates that the IP-IRM feature is more decomposable: if each semantic feature (e.g., background) is decomposed into some fixed dimensions and some classes vary with that semantic, then the classifier will recognize it as a non-discriminative variant feature and hence focus on other, more discriminative features (i.e., foreground). In this way, even though some classes are confounded by those non-discriminative features (e.g., most of the "dog" images have a "grass" background), the fixed dimensions still help the classifier neglect the non-discriminative ones. We further visualized the CAM [101] on NICO in Figure 7, which indeed shows that IP-IRM helps the classifier focus on the foreground regions.
Few-Shot Tasks. As shown in Table 4, our IP-IRM significantly improves the performance in the 5-way-5-shot setting, e.g., we outperform the baseline MoCo-v2 by 2.2%. This is because IP-IRM further disentangles the representation over standard SSL, which is essential for representations to generalize to different downstream class domains (recall Corollary 2 of Lemma 1). This is also in line with recent works [88] showing that a disentangled representation is especially beneficial in low-shot scenarios, and further demonstrates the importance of disentanglement in downstream tasks.
6 Conclusion
We presented an unsupervised disentangled representation learning method, Iterative Partition-based Invariant Risk Minimization (IP-IRM), based on Self-Supervised Learning (SSL). IP-IRM iteratively partitions the dataset into semantic-related subsets, and learns a representation invariant across the subsets using SSL with an IRM loss. We show, with a theoretical guarantee, that IP-IRM converges to a disentangled representation under the group-theoretic view, which fundamentally surpasses the capabilities of existing SSL and fully-supervised learning. Our proposed theory is backed by strong empirical results in disentanglement metrics, SSL classification accuracy and transfer performance. IP-IRM achieves disentanglement without using generative models, making it widely applicable on large-scale visual tasks. As future directions, we will continue to explore the application of group theory in representation learning and seek additional forms of inductive bias for faster convergence.
Acknowledgments and Disclosure of Funding
The authors would like to thank all reviewers for their constructive suggestions. This research is partly supported by the Alibaba-NTU Joint Research Institute, the A*STAR under its AME YIRG Grant (Project No. A20E6c0101), and the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2 grant.
References
 [1] (1972) More is different. Science. Cited by: §1.
 [2] (2019) Invariant risk minimization. arXiv preprint arXiv:1907.02893. Cited by: §C.3.1, §1, §3, Figure 4, §5.1, Table 1.
 [3] (2019) Learning representations by maximizing mutual information across views. arXiv preprint arXiv:1906.00910. Cited by: §2.
 [4] (2013) Representation learning: a review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35 (8), pp. 1798–1828. Cited by: §1.
 [5] (2009) Learning deep architectures for ai. Now Publishers Inc. Cited by: §2.

 [6] (2014) Food-101 – mining discriminative components with random forests. In European conference on computer vision, Cited by: §5.3.
 [7] (2001) Random forests. Machine learning 45 (1), pp. 5–32. Cited by: §C.2.1.
 [8] (1993) Signature verification using a "siamese" time delay neural network. Advances in neural information processing systems 6, pp. 737–744. Cited by: §2.
 [9] (2020) Language models are few-shot learners. In Advances in Neural Information Processing Systems, Cited by: §1.
 [10] (2018) Understanding disentangling in vae. arXiv preprint arXiv:1804.03599. Cited by: Table 12, §5.1, Table 1.
 [11] (2019) Learning disentangled semantic representation for domain adaptation. In IJCAI: proceedings of the conference, Cited by: §2.

 [12] (2018) Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 132–149. Cited by: §2.
 [13] (2019) Unsupervised pre-training of image features on non-curated data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2959–2968. Cited by: §2.
 [14] (2020) Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882. Cited by: §2.
 [15] (2018) Zero-shot visual recognition using semantics-preserving adversarial embedding networks. In CVPR, Cited by: §5.1.

 [16] (2018) Isolating sources of disentanglement in variational autoencoders. In Advances in neural information processing systems, Cited by: Table 12, §2, §5.1, Table 1.
 [17] (2020) A simple framework for contrastive learning of visual representations. In International conference on machine learning, pp. 1597–1607. Cited by: Table 11, Table 12, Figure 2, §1, §1, §2, Figure 4, Figure 7, §5.1, §5.2, Table 1, Table 2.
 [18] (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems, Cited by: §2.
 [19] (2020) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297. Cited by: Table 13, Table 14, Table 15, §2, Figure 7, Figure 7, Table 4.
 [20] (2020) Exploring simple siamese representation learning. arXiv preprint arXiv:2011.10566. Cited by: §C.3.1, §2, Figure 7.
 [21] (2020) Debiased contrastive learning. arXiv preprint arXiv:2007.00224. Cited by: §C.3.1, Table 11, §2, §5.2, Table 2.

 [22] (2014) Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §5.3.
 [23] (2011) An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 215–223. Cited by: Figure 2, Figure 4, §5.2, Table 2.
 [24] (2018) Spherical cnns. In ICLR, Cited by: §2.
 [25] (2014) Learning the irreducible representations of commutative Lie groups. In International Conference on Machine Learning, Cited by: §2.
 [26] (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §C.5, Table 13, Table 14, Table 15, §5.3, Table 4.
 [27] (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Cited by: §1.

 [28] (2020) Theory and evaluation metrics for learning disentangled representations. In International conference on learning representations, Cited by: §2.
 [29] (2015) Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE international conference on computer vision, pp. 1422–1430. Cited by: §2.
 [30] (2018) A framework for the quantitative evaluation of disentangled representations. In International conference on learning representations, Cited by: §C.2.1, §2, §5.1.
 [31] (2021) How Well Do SelfSupervised Models Transfer?. In CVPR, Cited by: §C.5.1, §C.5, Table 13, §5.3, §5.3.
 [32] (2010) The pascal visual object classes (voc) challenge. International journal of computer vision. Cited by: Table 13.
 [33] (2004) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In 2004 conference on computer vision and pattern recognition workshop, Cited by: §5.3.
 [34] (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research. Cited by: §1.
 [35] (2018) Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728. Cited by: §2.
 [36] (2020) Bootstrap your own latent: a new approach to self-supervised learning. arXiv preprint arXiv:2006.07733. Cited by: §2.
 [37] (1991) Representation theory: a first course. External Links: ISBN 9780387974958 Cited by: §1, §2.
 [38] (2019) Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722. Cited by: Table 13, Table 14, Table 15, §1, §2, Figure 7, §5.3, Table 4.
 [39] (2021) Towards non-IID image classification: a dataset and baselines. Pattern Recognition 110, pp. 107383. Cited by: §C.4.1, Table 10, Figure 7, §5.3.
 [40] (2020) Data-efficient image recognition with contrastive predictive coding. In International Conference on Machine Learning, pp. 4182–4192. Cited by: §2.
 [41] (2017) Beta-VAE: learning basic visual concepts with a constrained variational framework. In ICLR, Cited by: §2.
 [42] (2018) Towards a definition of disentangled representations. arXiv preprint arXiv:1812.02230. Cited by: Self-Supervised Learning Disentangled Group Representation as Feature, §1, §2.
 [43] (2017) Beta-VAE: learning basic visual concepts with a constrained variational framework. International conference on learning representations. Cited by: Table 12, §5.1, Table 1.

 [44] (2018) Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670. Cited by: §2.
 [45] (2018) Learning to decompose and disentangle representations for video prediction. In Advances in neural information processing systems, Cited by: §2.
 [46] (2020) AdCo: adversarial contrast for efficient learning of unsupervised representations from self-trained negative adversaries. arXiv preprint arXiv:2011.08435. Cited by: §2.
 [47] (2019) Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, Cited by: §1.
 [48] (2018) Understanding image motion with group representations. Cited by: §2.
 [49] (1994) Abstract algebra: theory and applications (the prindle, weber & schmidt series in advanced mathematics). Prindle Weber & Schmidt. Cited by: §2, §4, footnote 1.
 [50] (2020) Hard negative mixing for contrastive learning. arXiv preprint arXiv:2010.01028. Cited by: §C.3.1, Table 11, §2, §5.2, Table 2.
 [51] (2015) Bayesian representation learning with oracle constraints. International conference on learning representations. Cited by: §2.
 [52] (2018) Disentangling by factorising. In International Conference on Machine Learning, Cited by: Table 12, §2, §5.1, §5.1, Table 1.
 [53] (2014) Auto-encoding variational Bayes. In ICLR, Cited by: Table 12, §5.1, Table 1.
 [54] (2019) Do better imagenet models transfer better?. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §5.3, §5.3.
 [55] (2009) Learning multiple layers of features from tiny images. Cited by: §5.2, Table 2.
 [56] (2012) Learning multiple layers of features from tiny images. Cited by: §5.3.
 [57] (2021) i-Mix: a domain-agnostic strategy for contrastive representation learning. In ICLR, Cited by: §C.3.1, §5.2, Table 2.
 [58] (2020) Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966. Cited by: Table 13, Table 14, Table 15, §2, Figure 7, Table 4.
 [59] (2020) InfoGAN-CR and ModelCentrality: self-supervised model training and selection for disentangling GANs. In International Conference on Machine Learning, Cited by: §2.
 [60] (2020) Disentangling factors of variations using few labels. In 8th International Conference on Learning Representations (ICLR), Cited by: §2.
 [61] (2019) Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning, Cited by: §C.2.1, §C.2.1, §2, §5.1.
 [62] (2013) Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151. Cited by: §5.3.
 [63] (2020) Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6707–6717. Cited by: Table 13, Table 14, Table 15, §2, Figure 7, Table 4.
 [64] (2008) Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Cited by: §5.3.
 [65] (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In European conference on computer vision, pp. 69–84. Cited by: §2.
 [66] (2020) Elastic-InfoGAN: unsupervised disentangled representation learning in class-imbalanced data. In Advances in neural information processing systems, Cited by: §2.
 [67] (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748. Cited by: §2.
 [68] (2018) Learning independent causal mechanisms. In Proceedings of the 35th International Conference on Machine Learning, pp. 4036–4044. Cited by: §1.
 [69] (2012) Cats and dogs. In 2012 IEEE conference on computer vision and pattern recognition, Cited by: §5.3.
 [70] (2020) Learning disentangled representations and group structure of dynamical environments. Advances in Neural Information Processing Systems. Cited by: §2.
 [71] (1999) Learning lie groups for invariant visual perception. Advances in neural information processing systems. Cited by: §1.
 [72] (2014) Learning to disentangle factors of variation with manifold interaction. In International conference on machine learning, Cited by: §2.
 [73] (2021) Do generative models know disentanglement? contrastive learning is all you need. arXiv preprint arXiv:2102.10543. Cited by: §2.
 [74] (2018) Learning deep disentangled embeddings with the f-statistic loss. In Advances in neural information processing systems, Cited by: §C.2.1, §2, §5.1.
 [75] (2020) Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592. Cited by: §2.
 [76] (2012) On causal and anticausal learning. In Proceedings of the 29th International Conference on Machine Learning, Cited by: §1.
 [77] (2015) Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, Cited by: §C.1, Figure 2.
 [78] (2020) Scale-equivariant steerable networks. Cited by: §2.
 [79] (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806. Cited by: §C.1, Figure 2.
 [80] (2018) Interventional robustness of deep latent variable models. arXiv. Cited by: §2.
 [81] (2019) Robustly disentangled causal mechanisms: validating deep representations for interventional robustness. In International Conference on Machine Learning, Cited by: §C.2.1, §2, §5.1.
 [82] (2019) Contrastive multiview coding. arXiv preprint arXiv:1906.05849. Cited by: §C.3.1, Table 11, §2, §5.3.
 [83] (2020) Contrastive multiview coding. In European conference on computer vision, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Cited by: Figure 2.
 [84] (2020) What makes for good views for contrastive learning. arXiv preprint arXiv:2005.10243. Cited by: §2.
 [85] (2021) Understanding self-supervised learning dynamics without contrastive pairs. arXiv preprint arXiv:2102.06810. Cited by: §2.

 [86] (2017) Disentangled representation learning GAN for pose-invariant face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
 [87] (2008) Visualizing data using t-SNE. Journal of machine learning research 9 (11). Cited by: Figure 4.
 [88] (2019) Are disentangled representations helpful for abstract visual reasoning?. In Advances in neural information processing systems, Cited by: §D.4, §2, §5.3.
 [89] (2021) Causal attention for unbiased visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Cited by: §1.
 [90] (2020) Unsupervised feature learning by cross-level discrimination between instances and groups. arXiv preprint arXiv:2008.03813. Cited by: §2, §5.2.
 [91] (2019) Deep scale-spaces: equivariance over scale. In NeurIPS, Cited by: §2.
 [92] (2018) Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742. Cited by: Table 13, Table 14, Table 15, §2, Figure 7, Table 4.

 [93] (2010) SUN database: large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, Cited by: §5.3.
 [94] (2021) What should not be contrastive in contrastive learning. In International Conference on Learning Representations, Cited by: §4.
 [95] (2015) A large-scale car dataset for fine-grained categorization and verification. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §5.3.
 [96] (2019) Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6210–6219. Cited by: §2.
 [97] (2021) Counterfactual zero-shot and open-set visual recognition. In CVPR, Cited by: §1, §5.1.
 [98] (2020) Interventional few-shot learning. In NeurIPS, Cited by: §D.5, item 4.
 [99] (2020) Measuring disentanglement: a review of metrics. arXiv preprint arXiv:2012.09276. Cited by: §C.2.1.
 [100] (2018) Mixup: beyond empirical risk minimization. In ICLR, Cited by: Table 2.

 [101] (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929. Cited by: Figure 7, §5.3.
 [102] (2020) S3VAE: self-supervised sequential VAE for representation disentanglement and data generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §2.

 [103] (2014) Multi-view perceptron: a deep model for learning face identity and view representations. Advances in Neural Information Processing Systems 27 (NIPS 2014). Cited by: §2.
Appendix A Preliminaries
A group is a set together with a binary operation that takes two elements of the set and maps them to another element of the set. For example, the set of integers is a group under addition. We formalize this notion through the following definitions.
Binary Operation. A binary operation $*$ on a set $S$ is a function mapping $S \times S$ into $S$. For each $(a, b) \in S \times S$, we denote the element $*((a, b))$ by $a * b$.
Group. A group is a set $G$, closed under a binary operation $*$, such that the following axioms hold:

Associativity. $\forall a, b, c \in G$, we have $(a * b) * c = a * (b * c)$.

Identity Element. $\exists e \in G$, such that $\forall a \in G$, $e * a = a$ and $a * e = a$.

Inverse. $\forall a \in G$, $\exists a' \in G$, such that $a * a' = a' * a = e$.
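As a concrete illustration (ours, not from the paper), the closure, associativity, identity and inverse axioms can be checked mechanically for a small group such as the integers modulo 5 under addition; the helper name `check_group` is hypothetical:

```python
# Illustration (ours): mechanically check the group axioms for
# Z_5 = {0, ..., 4} under addition modulo 5.

def check_group(elements, op):
    """True iff (elements, op) satisfies closure, associativity,
    a unique identity, and inverses for every element."""
    if not all(op(a, b) in elements for a in elements for b in elements):
        return False  # not closed
    if not all(op(op(a, b), c) == op(a, op(b, c))
               for a in elements for b in elements for c in elements):
        return False  # not associative
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if len(identities) != 1:
        return False  # no identity element
    e = identities[0]
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)  # every element has an inverse

Z5 = set(range(5))
add_mod5 = lambda a, b: (a + b) % 5
print(check_group(Z5, add_mod5))  # True
```

The same checker reports False for, e.g., {1, 2, 3} under multiplication mod 5, which is not closed.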
Groups often arise as transformations of some space, such as a set, vector space, or topological space. Consider an equilateral triangle. The clockwise rotations about its centroid that retain its appearance form a group $\{r_{120}, r_{240}, r_{360}\}$, with the last element corresponding to an identity mapping. We say this group of rotations acts on the triangle, which is formally defined below.
Group Action. Let $G$ be a group with binary operation $*$ and let $X$ be a set. An action of $G$ on $X$ is a map $\alpha: G \times X \to X$ such that $\alpha(e, \cdot)$ is the identity map on $X$ and $\alpha(g_1, \cdot) \circ \alpha(g_2, \cdot) = \alpha(g_1 * g_2, \cdot)$ for all $g_1, g_2 \in G$, where $\circ$ denotes functional composition. For brevity, $\forall g \in G, x \in X$, we denote $\alpha(g, x)$ as $g \cdot x$.
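The two action axioms can be verified numerically in a toy setting (our illustration, not the paper's): the rotation group of the triangle, encoded as $\mathbb{Z}_3$, acting on the vertex set $\{0, 1, 2\}$ by cyclic shift:

```python
# A sketch (ours, not from the paper): the rotation group Z_3 acting on
# the vertex set {0, 1, 2} of an equilateral triangle by cyclic shift,
# i.e., g . x = (x + g) mod 3, with identity element g = 0.

def act(g, x):
    return (x + g) % 3

G = range(3)  # rotations by 0, 120 and 240 degrees
X = range(3)  # triangle vertices

# Axiom 1: the identity element acts as the identity map on X.
assert all(act(0, x) == x for x in X)

# Axiom 2: acting by g2 then g1 equals acting by the composition g1 * g2.
assert all(act(g1, act(g2, x)) == act((g1 + g2) % 3, x)
           for g1 in G for g2 in G for x in X)
print("both group-action axioms hold")
```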
In our formulation, we have a group $G$ acting on the semantic space $\mathcal{A}$. For example, consider the color semantic, which can be mapped to a circle representing the hue. The group acting on it then corresponds to rotations, similar to the triangle example; e.g., a group element may correspond to rotating a color clockwise by a fixed angle. In the context of representation learning, we are interested in learning a feature space that reflects this group action, as formally defined below.
Group Representation. Let $G$ be a group. A representation of $G$ (or a $G$-representation) is a pair $(V, \rho)$, where $V$ is a vector space and $\rho$ is a group action of $G$ on $V$ such that for each $g \in G$, $\rho(g): V \to V$ is a linear map.
Intuitively, each $\rho(g)$ corresponds to a linear map, i.e., a matrix that transforms a vector $v \in V$ to $\rho(g)\,v \in V$. Finally, our definition of disentangled representation involves a decomposition of the semantic space and of the group acting on it. The decomposition of the semantic space is based on the Cartesian product, i.e., $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_m$. A similar concept is defined w.r.t. the group.
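As an illustrative sketch (ours, not from the paper), the cyclic group of order 4 can be represented on $\mathbb{R}^2$ by powers of the 90° rotation matrix: each group element becomes a linear map, and composition of group elements matches matrix multiplication:

```python
# Illustrative sketch (ours): a representation of the cyclic group Z_4 on
# the vector space R^2, where element k is represented by the linear map
# "rotate by k * 90 degrees", i.e., the k-th power of a rotation matrix.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I2  = [[1, 0], [0, 1]]    # rho(0): identity map
R90 = [[0, -1], [1, 0]]   # rho(1): rotation by 90 degrees

def rho(k):
    """The matrix representing group element k of Z_4."""
    M = I2
    for _ in range(k % 4):
        M = matmul(R90, M)
    return M

# Homomorphism property: rho(g1 + g2) = rho(g1) rho(g2).
assert all(rho((g1 + g2) % 4) == matmul(rho(g1), rho(g2))
           for g1 in range(4) for g2 in range(4))
print("rho is a group representation of Z_4 on R^2")
```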
Direct Product of Groups. Let $G_1, \ldots, G_n$ be groups with binary operations $*_1, \ldots, *_n$, respectively. Let $g_i, g_i' \in G_i$ for $i = 1, \ldots, n$. Define $(g_1, \ldots, g_n) * (g_1', \ldots, g_n')$ to be the element $(g_1 *_1 g_1', \ldots, g_n *_n g_n')$. Then $G_1 \times \cdots \times G_n$ is the direct product of the groups under the binary operation $*$.
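A minimal sketch (ours, not from the paper): the direct product $\mathbb{Z}_2 \times \mathbb{Z}_3$ under the component-wise operation is again a group, here checked for closure, identity and inverses:

```python
# Sketch (ours): the direct product Z_2 x Z_3 with the component-wise
# binary operation, which is again a group, of order 2 * 3 = 6.

def direct_product_op(g, h):
    """(g1, g2) * (h1, h2) = (g1 +_2 h1, g2 +_3 h2)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 3)

G = [(a, b) for a in range(2) for b in range(3)]

# Closure and identity element (0, 0).
assert all(direct_product_op(g, h) in G for g in G for h in G)
assert all(direct_product_op((0, 0), g) == g for g in G)

# Component-wise inverses: (-g1 mod 2, -g2 mod 3).
assert all(direct_product_op(g, ((-g[0]) % 2, (-g[1]) % 3)) == (0, 0)
           for g in G)
print(len(G))  # 6
```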
With this, we can formally define what it means for a feature subspace $V_i$ to be only affected by the action of $G_i$ and fixed by the actions of the other subgroups: $V_i$ is a trivial subrepresentation w.r.t. $\prod_{j \neq i} G_j$ ("fixed"), i.e., for each $g \in \prod_{j \neq i} G_j$, the restriction of $\rho(g)$ to $V_i$ is the identity mapping, and $V_i$ is a non-trivial subrepresentation w.r.t. $G_i$ ("affected").
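The "fixed vs. affected" condition can be illustrated with a toy example (ours, not from the paper): $G = \mathbb{Z}_2 \times \mathbb{Z}_2$ acting on $\mathbb{R}^2$ by independent sign flips, so each feature dimension is a trivial subrepresentation of the other factor:

```python
# Toy example (ours): a disentangled representation of G = Z_2 x Z_2 on
# V = R^2, where the i-th factor acts on dimension i by a sign flip.
# Each dimension is then fixed by the action of the other factor.

def rho(g, v):
    g1, g2 = g
    return ((-1) ** g1 * v[0], (-1) ** g2 * v[1])

v = (0.7, -0.4)

# Dimension 0 is a trivial subrepresentation of {e} x Z_2 ("fixed") ...
assert all(rho((0, g2), v)[0] == v[0] for g2 in range(2))
# ... and a non-trivial subrepresentation of Z_2 x {e} ("affected").
assert rho((1, 0), v)[0] != v[0]
print("dimension 0 is fixed by G_2 and affected by G_1")
```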
Appendix B Proof
B.1 Proof of Definition 2
The Group Orbits Define a Partition of $\mathcal{X}$. We will show that the orbits of a group $G$ acting on the sample set $\mathcal{X}$ define an equivalence relation $\sim$ on $\mathcal{X}$, which naturally leads to a partition of $\mathcal{X}$. For $x_1, x_2 \in \mathcal{X}$, let $x_1 \sim x_2$ if and only if $\exists g \in G$ such that $x_1 = g \cdot x_2$. We show that $\sim$ satisfies the three properties of an equivalence relation. 1) Reflexive: $\forall x \in \mathcal{X}$, we have $x = e \cdot x$, hence $x \sim x$. 2) Symmetric: suppose $x_1 \sim x_2$, i.e., $x_1 = g \cdot x_2$ for some $g \in G$. Then $x_2 = g^{-1} \cdot x_1$, i.e., $x_2 \sim x_1$. 3) Transitive: if $x_1 \sim x_2$ and $x_2 \sim x_3$, then $x_1 = g \cdot x_2$ and $x_2 = g' \cdot x_3$ for some $g, g' \in G$. Hence $x_1 = (g * g') \cdot x_3$ and $x_1 \sim x_3$.
Number of Orbits. Recall that the group acts transitively on the sample set (see Section 4). We consider the non-trivial case where the action of the relevant subgroup is faithful, i.e., the only group element that maps every sample to itself is the identity element $e$. We will show that each non-identity element corresponds to a unique orbit. 1) Suppose, for contradiction, that the action of some non-identity element on each sample of an orbit corresponds to the identity mapping. One can show that for every different orbit, the action of this element on each sample is also the identity mapping. As the orbits partition the sample set, this means that the action of this element is the identity mapping on all samples, which contradicts the action being faithful. 2) The previous step shows that non-identity group elements lead to a different orbit. We further need to show that these orbits are unique, i.e., if two elements yield the same orbit, then they are equal. Suppose $g_1$ and $g_2$ yield the same orbit; then $g_2^{-1} * g_1$ lies in the point stabilizer of the orbit. As the action is faithful, the point stabilizer is trivial, hence $g_2^{-1} * g_1 = e$, i.e., $g_1 = g_2$.
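As a toy illustration of orbits partitioning a set (ours, not the paper's exact setting): the subgroup $D = \{0, 2, 4\}$ of $\mathbb{Z}_6$ acting by addition mod 6 yields exactly $|\mathbb{Z}_6|/|D| = 2$ orbits, which are disjoint and cover the set:

```python
# Toy example (ours): the subgroup D = {0, 2, 4} of Z_6 acts on
# X = {0, ..., 5} by addition mod 6. The orbits partition X, and because
# Z_6 acts transitively and faithfully, their number is |Z_6| / |D| = 2.

D = [0, 2, 4]
X = range(6)

def orbit(x):
    """The D-orbit of x, i.e., {d . x : d in D}."""
    return frozenset((x + d) % 6 for d in D)

orbits = {orbit(x) for x in X}
print(sorted(sorted(o) for o in orbits))  # [[0, 2, 4], [1, 3, 5]]

# Disjoint orbits covering X: an equivalence-class partition.
assert sum(len(o) for o in orbits) == len(set(X))
```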
B.2 Details of Lemma 1
We will first prove Lemma 1 by showing that the representation is equivariant, followed by showing the decomposability of the two feature subspaces, and finally showing that each subspace is not further decomposable. We will then present more details on the 4 corollaries.
Proof of Equivariance. Suppose that the training loss is minimized, yet two distinct samples $x_i \neq x_j$ share the same feature $z_i = z_j$, where $z = \phi(x)$. Consider the term $z_i^\top z_j$ in the denominator of the contrastive loss: we have $z_i^\top z_j = \|z_i\| \|z_j\| \cos\theta$, where $\theta$ is the angle between the two vectors. When $z_i = z_j$, $\theta = 0$ and $\cos\theta$ is maximal. So keeping $\|z_i\|$ and $\|z_j\|$ constant (i.e., the same regularization penalty such as L2), the denominator can be further reduced if $\theta > 0$, which reduces the training loss. This contradicts the earlier assumption. Hence by minimizing the training loss, we achieve sample-equivariance, i.e., different samples have different features. Note that this does not necessarily mean group-equivariance. However, the variation of training samples is all we know about the group action of $G$, and we have established that the action of $G$ is transitive on the training set, hence we use the sample-equivariant features as the approximation of equivariant features.
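A numeric sketch of this argument (ours; it assumes an InfoNCE loss with a single positive and a single negative key, temperature 1, and unit-norm features): collapsing two distinct samples onto the same feature inflates the denominator and hence the loss:

```python
import math

# Numeric sketch (ours): InfoNCE with one positive and one negative key,
# temperature t = 1, unit-norm features. Collapsing a distinct sample onto
# the anchor's feature inflates the denominator and hence the loss.

def info_nce(anchor, positive, negative, t=1.0):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    pos = math.exp(dot(anchor, positive) / t)
    neg = math.exp(dot(anchor, negative) / t)
    return -math.log(pos / (pos + neg))

# Distinct samples get distinct features (angle theta = 90 degrees) ...
loss_distinct = info_nce((1.0, 0.0), (1.0, 0.0), (0.0, 1.0))
# ... versus collapsed features: the negative shares the anchor's feature.
loss_collapsed = info_nce((1.0, 0.0), (1.0, 0.0), (1.0, 0.0))

assert loss_collapsed > loss_distinct  # collapse increases the training loss
```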
Proof of Decomposability. Recall the semantic representation, which was shown to be equivariant in the previous step. Consider a non-decomposable representation whose feature space $V$ is affected by the actions of both groups, denoted $G_1$ and $G_2$ here. Let $V = V_1 \oplus V_2$, where both subspaces are affected by the actions of the two groups, and write the feature of a sample as $z = [z_1, z_2]$ with $z_1 \in V_1$, $z_2 \in V_2$ (recall that the group action affects the semantics and hence the features through the equivariant map in Figure 1). From here, we will construct a representation where $V_2$ is only affected by the action of $G_2$, with a lower training loss.

Specifically, we aim to assign a vector $c_i$ to the $i$-th orbit, which is given by:

(4) $\quad \{c_1^*, \ldots, c_m^*\} = \arg\min_{\{c_1, \ldots, c_m\}} \; \mathbb{E}_{i \neq j}\!\left[c_i^\top c_j\right],$

where $c_i$ is the value of $z_2$ for the $i$-th orbit and $m$ is the number of orbits. Now define the new representation given by $z' = [z_1, c_i]$ for every sample in the $i$-th orbit. Using this new representation has two outcomes:

1) The similarity in the numerator is a linear combination of the dot similarities induced from $V_1$ and $V_2$, and the dot similarity induced from $V_2$ is increased, as inside each orbit the value of $z_2$ is the same $c_i$ (maximized similarity);

2) The denominator is now reduced. This is because the denominator is proportional to the expected cross-orbit dot similarity, and we have already selected the best set $\{c_1^*, \ldots, c_m^*\}$ that minimizes the expected dot similarities across orbits.

As the in-orbit dot similarity increases (numerator) and the cross-orbit dot similarity decreases (denominator), the training loss is reduced by decomposing a separate subspace $V_2$ affected only by the action of $G_2$. Furthermore, note that a linear projector is used in SSL to project the features into lower dimensions, and a linear classifier weight is used in SL. To isolate the effect of $G_2$ to maximize the similarity of in-orbit samples (numerator) and exploit the action of $G_1$ to minimize the similarity of cross-orbit samples (denominator), the effects of $G_1$ and $G_2$ on the feature must be separable by a linear layer, i.e., decomposable. Combined with the earlier proof that $V_2$ is only affected by the action of $G_2$, without loss of generality, we have the decomposition $V = V_1 \oplus V_2$ affected by $G_1$ and $G_2$, respectively.
Proof of Non-Decomposability. We will show that for a representation with a decomposed feature subspace, there exists a non-decomposable representation that achieves the same expected dot similarity, hence having the same contrastive loss. Without loss of generality, consider two groups $G_1, G_2$ acting on the semantic attribute spaces $\mathcal{A}_1, \mathcal{A}_2$, respectively. Let $\phi$ be a decomposable representation such that there exist feature subspaces $V_1, V_2$ affected only by the actions of $G_1, G_2$, respectively, and write the feature of a sample as $z = [z_1, z_2]$. Now define a non-decomposable representation that mixes the two subspaces by an orthogonal map, e.g., $z' = \frac{1}{\sqrt{2}}[z_1 - z_2, z_1 + z_2]$, so that both output subspaces are affected by the actions of both groups. For any pair of samples with features $z$ and $\tilde{z}$, the dot similarity is preserved: $z'^\top \tilde{z}' = \frac{1}{2}(z_1 - z_2)^\top(\tilde{z}_1 - \tilde{z}_2) + \frac{1}{2}(z_1 + z_2)^\top(\tilde{z}_1 + \tilde{z}_2) = z_1^\top \tilde{z}_1 + z_2^\top \tilde{z}_2 = z^\top \tilde{z}$. Therefore, the decomposed and non-decomposed representations yield the same expected dot similarity over all pairs of samples, and have the same contrastive loss.
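A minimal numeric check of this kind of entangling construction (ours): rotating two one-dimensional feature subspaces into each other is an orthogonal map, so all pairwise dot similarities, and hence the contrastive loss, are unchanged, while neither output dimension is affected by only one group:

```python
import math

# Minimal check (ours): rotating two 1-D feature subspaces into each other
# is an orthogonal map, so every pairwise dot similarity (and hence the
# contrastive loss) is unchanged, yet both output dimensions now depend on
# both semantic factors.

def mix(z, theta=math.pi / 4):
    """Entangle z = (z1, z2) by a rotation of angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])

dot = lambda u, v: sum(a * b for a, b in zip(u, v))

# Decomposed features: dim 0 carries semantic A, dim 1 carries semantic B.
z1, z2 = (0.3, -1.2), (0.8, 0.5)

assert math.isclose(dot(mix(z1), mix(z2)), dot(z1, z2))
assert math.isclose(dot(mix(z1), mix(z1)), dot(z1, z1))
```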
Corollary 1. This follows immediately from the proof of equivariance.
Corollary 2. The same proof above holds for the SL case, with the key space being the set of classifier weights. In this view, each sample in a class can be seen as an augmented view (augmented by shared attributes such as view angle, pose, etc.) of the class prototype. In downstream learning, the shared attributes are not discriminative, hence the performance is affected mostly by the class-related feature subspace. For example, if the groups corresponding to "species" and "shape" act on the same feature subspace (entangled), such that the "species"="bird" feature always comes with a "shape"="streamlined" feature, this representation does not generalize to downstream tasks of classifying birds without a streamlined shape (e.g., "kiwi").
Corollary 3. In SSL and SL, the model essentially receives supervision on attributes that are not discriminative towards downstream tasks, through augmentations and in-class variations, respectively. The group acting on the semantic space of these attributes hence determines the amount of supervision received. With a larger such group, the model filters out more irrelevant semantics and more accurately describes the differences between classes. Note that the standard image augmentations in SSL are also used in SL, making this group even larger in SL.
Corollary 4. When the number of samples in some orbit(s) is too small, there are two consequences that prevent disentanglement: 1) equivariance is not guaranteed, as the training samples do not fully describe the group action; 2) decomposability is not guaranteed, as the decomposed subspace in the previous proof only generalizes to the seen combinations of attribute values.
B.3 Proof of Theorem 1
We will first revisit Invariant Risk Minimization (IRM). Let $\mathcal{X}$ be the image space, $\mathcal{Z}$ the feature space, $\mathcal{Y}$ the classification output space (e.g., the set of all probabilities of belonging to each class), $\Phi: \mathcal{X} \to \mathcal{Z}$ the feature extractor backbone and $w: \mathcal{Z} \to \mathcal{Y}$ the classifier. Let $\mathcal{E}$ be a set of training environments, where each $e \in \mathcal{E}$ is a set of images. IRM aims to solve the following optimization problem:

(5) $\quad \min_{\Phi, w} \sum_{e \in \mathcal{E}} R^e(w \circ \Phi) \quad \text{subject to} \quad w \in \arg\min_{\bar{w}} R^e(\bar{w} \circ \Phi), \;\; \forall e \in \mathcal{E},$

where $R^e$ is the empirical classification risk in the environment $e$ using backbone $\Phi$ and classifier $w$. Conceptually, IRM aims to find a representation $\Phi$ such that the optimal classifier on top of $\Phi$ is the same for all environments. As Eq. (5) is a challenging, bi-leveled optimization problem, it is instantiated into the practical version:

(6) $\quad \min_{\Phi} \sum_{e \in \mathcal{E}} R^e(\Phi) + \lambda \left\| \nabla_{w \mid w=1.0} R^e(w \cdot \Phi) \right\|^2,$

where $\lambda$ is the regularizer balancing between the ERM term and the invariance term.
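A simplified sketch of the practical penalty in Eq. (6) (ours, not the paper's implementation; it assumes a scalar feature, squared loss, and the analytic gradient w.r.t. the dummy classifier $w = 1.0$): an invariant feature incurs zero penalty across environments, while a spurious one does not:

```python
# Simplified sketch (ours, not the paper's implementation): the IRMv1
# penalty of Eq. (6) with a scalar "dummy" classifier w = 1.0 and squared
# loss. Here R^e(w) = mean((w * z - y)^2) over environment e, so the
# gradient at w = 1 is mean(2 * (z - y) * z), computed analytically.

def grad_at_w1(features, labels):
    n = len(features)
    return sum(2 * (z - y) * z for z, y in zip(features, labels)) / n

def irm_penalty(environments):
    """Sum over environments of the squared risk gradient at w = 1.0."""
    return sum(grad_at_w1(z, y) ** 2 for z, y in environments)

# An invariant feature predicts y identically in every environment ...
invariant = [([1.0, 2.0], [1.0, 2.0]), ([3.0, 4.0], [3.0, 4.0])]
# ... while a spurious feature would need a different optimal w per
# environment, so its gradient at the shared w = 1.0 cannot vanish.
spurious = [([1.0, 2.0], [2.0, 4.0]), ([1.0, 2.0], [0.5, 1.0])]

assert irm_penalty(invariant) == 0.0
assert irm_penalty(spurious) > 0.0
```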
The above IRM is formulated for supervised training. In SSL, there is no classifier mapping from the feature space to the output space. Instead, there is a projector network mapping features to another feature space, and Eq. (1) is used to compute the similarity with the positive key (numerator) and negative keys (denominator) in that space. Note that the projector in SSL is not equivalent to the classifier in SL, as the projector itself does not generate the probability output like the classifier does; rather, the comparison between positive and negative keys does.
In fact, the formulation of contrastive IRM is given by Corollary 2 of Lemma 1, which says that SL is a special case of contrastive learning, where the set of all classifier weights is the positive and negative key space. In IRM with SL, we are trying to find a set of weights from the classifier weight space (e.g., $\mathbb{R}^{d \times c}$ with feature dimension $d$ and number of classes $c$) that achieves invariant prediction. Hence in IRM with SSL, we are trying to find a set of keys from the key space (e.g., $\mathbb{R}^{d' \times k}$ with $d'$ being the projected feature dimension and $k$ the number of positive and negative keys) that achieves invariant prediction by differentiating a sample from the negative keys (note that the similarity with positive keys is maximized and fixed by standard SSL training, which decomposes augmentations and other semantics as in Lemma 1). Specifically, in IP-IRM, the 2 subsets in each partition form the set of training environments $\mathcal{E}$.
Proof of the Sufficient Condition. Suppose that the representation is fully disentangled w.r.t. . By Definition 1, there exists subspace affected only by the action of . For each partition given by