Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

08/01/2023
by Kaijian Liu, et al.

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually only weakly correlated due to the semantic gap arising from modality-specific uniqueness, so these methods are not well suited to person clustering. In this paper, we propose a Relation-Aware Distribution Representation Network (RAD-Net) to generate a distribution representation for multi-modal clues. The distribution representation of a clue is a vector consisting of the relations between this clue and all other clues from all modalities, and is therefore modality-agnostic and well suited to person clustering. Accordingly, we introduce a graph-based method to construct these distribution representations and employ a cyclic update policy to refine them progressively. Our method achieves substantial improvements, including +6% on the Video Person-Clustering Dataset, as well as gains on the VoxCeleb2 multi-view clustering dataset. Code will be released publicly upon acceptance.
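The abstract describes the core idea only at a high level. As a rough illustration, the hypothetical Python sketch below builds a relation vector for every clue against all clues from all modalities and refines it with a simple cyclic blend. The cosine-similarity relation, the shared feature dimension across modalities, the mixing weight alpha, and all function names are assumptions made for illustration, not the paper's actual graph-based construction.

```python
# Minimal sketch of a "distribution representation": each clue is represented by its
# relations (here, cosine similarities) to every clue from every modality, refined by
# a simple cyclic update. This is an illustrative approximation, not the paper's method.
import numpy as np


def cosine_relations(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between all clues (rows of `features`)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T


def build_distribution_representations(modal_features: list[np.ndarray],
                                        num_rounds: int = 3,
                                        alpha: float = 0.5) -> np.ndarray:
    """Stack clues from all modalities and build relation-based representations.

    modal_features: list of (n_i, d) arrays, one per modality (e.g. face, body, voice),
    assumed here to share the feature dimension d for simplicity.
    Returns an (N, N) matrix whose i-th row is the distribution representation of clue i,
    i.e. its relation to every clue from every modality.
    """
    all_clues = np.concatenate(modal_features, axis=0)   # (N, d)
    dist_repr = cosine_relations(all_clues)              # initial relation vectors

    # Cyclic update: recompute relations in the distribution space and blend them back,
    # a stand-in for the paper's progressive refinement policy.
    for _ in range(num_rounds):
        refined = cosine_relations(dist_repr)
        dist_repr = alpha * dist_repr + (1.0 - alpha) * refined
    return dist_repr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.normal(size=(4, 128))    # 4 face clues
    body = rng.normal(size=(3, 128))    # 3 body clues
    voice = rng.normal(size=(2, 128))   # 2 voice clues
    reprs = build_distribution_representations([face, body, voice])
    print(reprs.shape)  # (9, 9): one modality-agnostic row per clue
```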


Related research

06/21/2021 - Contrastive Multi-Modal Clustering
Multi-modal clustering, which explores complementary information from mu...

05/24/2023 - Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases
The existing collaborative recommendation models that use multi-modal in...

08/27/2023 - Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Video temporal character grouping locates appearing moments of major cha...

10/03/2016 - Multi-View Representation Learning: A Survey from Shallow Methods to Deep Methods
Recently, multi-view representation learning has become a rapidly growin...

05/25/2023 - Dynamic Enhancement Network for Partial Multi-modality Person Re-identification
Many existing multi-modality studies are based on the assumption of moda...

01/04/2019 - MultiDEC: Multi-Modal Clustering of Image-Caption Pairs
In this paper, we propose a method for clustering image-caption pairs by...

01/26/2021 - A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers
In this work, we explore different approaches to combine modalities for ...
