Contrastive Multi-Modal Clustering

06/21/2021
by Jie Xu, et al.

Multi-modal clustering, which explores complementary information from multiple modalities or views, has attracted increasing attention. However, existing work rarely focuses on extracting high-level semantic information from multiple modalities for clustering. In this paper, we propose Contrastive Multi-Modal Clustering (CMMC), which mines high-level semantic information via contrastive learning. Concretely, our framework consists of three parts. (1) Multiple autoencoders are optimized to maintain each modality's diversity so that complementary information is learned. (2) A feature contrastive module learns common high-level semantic features from different modalities. (3) A label contrastive module learns consistent cluster assignments for all modalities. With the proposed multi-modal contrastive learning, the mutual information of the high-level features is maximized, while the diversity of the low-level latent features is maintained. In addition, to utilize the learned high-level semantic features, we generate pseudo labels by solving a maximum matching problem and use them to fine-tune the cluster assignments. Extensive experiments demonstrate that CMMC has good scalability and outperforms state-of-the-art multi-modal clustering methods.
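
The abstract does not give the form of the contrastive objective, so the following is only a minimal sketch of what a feature contrastive module between two modalities could look like, assuming an InfoNCE-style loss over cosine similarities. The function name `info_nce`, the `temperature` parameter, and its default value are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def info_nce(h_a, h_b, temperature=0.5):
    """Symmetric InfoNCE-style loss between two modalities (hypothetical helper).

    h_a, h_b: arrays of shape (n_samples, dim). Row i of h_a and row i of h_b
    are assumed to be high-level features of the same sample seen through two
    different modalities (the positive pair); all other rows act as negatives.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    h_a = h_a / np.linalg.norm(h_a, axis=1, keepdims=True)
    h_b = h_b / np.linalg.norm(h_b, axis=1, keepdims=True)

    # Pairwise similarities between every sample in modality a and modality b.
    logits = h_a @ h_b.T / temperature  # shape (n_samples, n_samples)

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The positive pair for row i sits on the diagonal.
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

Under these assumptions, features that agree across modalities yield a lower loss than features whose rows are misaligned, which is the behaviour a feature contrastive module relies on to pull the modalities toward common semantics.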


research
09/02/2022

Multi-modal Contrastive Representation Learning for Entity Alignment

Multi-modal entity alignment aims to identify equivalent entities betwee...
research
08/01/2023

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

Person clustering with multi-modal clues, including faces, bodies, and v...
research
12/04/2020

Rethinking movie genre classification with fine-grained semantic clustering

Movie genre classification is an active research area in machine learnin...
research
05/28/2022

Contrastive Learning for Multi-Modal Automatic Code Review

Automatic code review (ACR), aiming to relieve manual inspection costs, ...
research
05/16/2022

Noise-Tolerant Learning for Audio-Visual Action Recognition

Recently, video recognition is emerging with the help of multi-modal lea...
research
04/21/2023

Deep Multiview Clustering by Contrasting Cluster Assignments

Multiview clustering (MVC) aims to reveal the underlying structure of mu...
research
05/29/2023

Contrastive Learning Based Recursive Dynamic Multi-Scale Network for Image Deraining

Rain streaks significantly decrease the visibility of captured images an...
