Self-supervised Video-centralised Transformer for Video Face Clustering

03/24/2022
by Yujiang Wang, et al.

This paper presents a novel method for face clustering in videos using a video-centralised transformer. Previous works often employed contrastive learning to learn frame-level representations and used average pooling to aggregate the features along the temporal dimension. This approach may not fully capture complicated video dynamics. In addition, despite recent progress in video-based contrastive learning, few works have attempted to learn a self-supervised, clustering-friendly face representation that benefits the video face clustering task. To overcome these limitations, our method employs a transformer to directly learn video-level representations that better reflect the temporally-varying properties of faces in videos, and we propose a video-centralised self-supervised framework to train the transformer model. We also investigate face clustering in egocentric videos, a fast-emerging field that has not yet been studied in the face clustering literature. To this end, we present and release the first large-scale egocentric video face clustering dataset, named EasyCom-Clustering. We evaluate our proposed method on both the widely used Big Bang Theory (BBT) dataset and the new EasyCom-Clustering dataset. Results show that our video-centralised transformer surpasses all previous state-of-the-art methods on both benchmarks and exhibits a self-attentive understanding of face videos.
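The contrast the abstract draws between average pooling and attention-based aggregation can be illustrated with a minimal sketch. This is not the paper's architecture: the toy 2-D features, the `query` vector, and the function names `mean_pool` and `attention_pool` are all hypothetical, and a real transformer would learn its queries, keys, and values rather than use fixed ones.

```python
import math

def mean_pool(frames):
    """Average frame-level features along the temporal dimension."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Weight each frame by scaled dot-product similarity to a query vector."""
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, f)) / math.sqrt(dim)
        for f in frames
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights

# Toy face track (assumed data): three consistent frames and one outlier,
# standing in for an occluded or blurred frame.
track = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-1.0, 2.0]]

# Hypothetical fixed query; in a trained model this would be learned.
query = [4.0, 0.0]

mean_vec = mean_pool(track)
attn_vec, attn_weights = attention_pool(track, query)
```

In this toy setting, average pooling gives every frame the same weight, so the outlier pulls the pooled feature away from the consistent frames, while the attention weights assign it almost no mass. It only illustrates why a learned, content-dependent aggregation may represent a face track better than a uniform average.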


