Unified and Dynamic Graph for Temporal Character Grouping in Long Videos

08/27/2023
by Xiujun Shu, et al.

Video temporal character grouping locates the moments at which major characters appear in a video, grouped by identity. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built on fixed affinity graphs, which introduce many inexact connections. In addition, they extract multi-modal features with several separate models, which makes deployment difficult. In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping. First, a unified representation network learns representations of multiple modalities within the same space while preserving each modality's uniqueness. Second, we present a dynamic graph clustering method in which a varying number of neighbors is dynamically constructed for each node via a cyclic matching strategy, leading to a more reliable affinity graph. Third, a progressive association method exploits spatial and temporal contexts among different modalities, allowing multi-modal clustering results to be fused well. As current datasets only provide pre-extracted features, we evaluate our UniDG method on a collected dataset named MTCG, which contains each character's face and body clips and speaking voice tracks. We also evaluate our key components on existing clustering and retrieval datasets to verify their generalization ability. Experimental results show that our method achieves promising results and outperforms several state-of-the-art approaches.
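
The abstract does not spell out how the cyclic matching strategy works, so the following is only a rough sketch of the general idea of giving each node a varying number of neighbors instead of a fixed k. It assumes a reciprocal check (node j stays a neighbor of node i only if i is also among j's top-k candidates) as a stand-in for cyclic matching, and it groups nodes by connected components. The function names `cosine`, `top_k`, `dynamic_neighbors`, and `group`, the parameter `k`, and the use of cosine similarity are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(features: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k nodes most similar to node i (excluding i itself)."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
    return others[:k]


def dynamic_neighbors(features: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """Keep candidate j for node i only if i is also a candidate of j.

    Assumption: this reciprocal test stands in for the paper's cyclic
    matching. The result is a neighbor set whose size differs per node.
    """
    n = len(features)
    candidates = [set(top_k(features, i, k)) for i in range(n)]
    return {i: {j for j in candidates[i] if i in candidates[j]} for i in range(n)}


def group(neighbors: Dict[int, Set[int]]) -> List[Set[int]]:
    """Group nodes into connected components of the neighbor graph."""
    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for start in neighbors:
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(component)
    return groups
```

With five toy feature vectors forming two identities, for example `[[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]` and `k=2`, the reciprocal check drops the cross-identity candidates, so the nodes of the smaller identity end up with one neighbor each while the others keep two.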


