A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

04/06/2020
by Anyi Rao, et al.

The scene, as a crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment. Identifying the composition of scenes is a critical step towards semantic understanding of movies. This is very challenging: compared to the videos studied in conventional vision problems, e.g. action recognition, scenes in movies usually contain much richer temporal structures and more complex semantic information. Towards this goal, we scale up the scene segmentation task by building a large-scale video dataset, MovieScenes, which contains 21K annotated scene segments from 150 movies. We further propose a local-to-global scene segmentation framework, which integrates multi-modal information across three levels, i.e. clip, segment, and movie. This framework is able to distill complex semantics from hierarchical temporal structures over a long movie, providing top-down guidance for scene segmentation. Our experiments show that the proposed network is able to segment a movie into scenes with high accuracy, consistently outperforming previous methods. We also find that pretraining on MovieScenes brings significant improvements to existing approaches.

research
12/09/2022

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Temporal video segmentation and classification have been advanced greatl...
research
02/22/2022

Movies2Scenes: Learning Scene Representations Using Movie Similarities

Automatic understanding of movie-scenes is an important problem with mul...
research
10/24/2019

A Graph-Based Framework to Bridge Movies and Synopses

Inspired by the remarkable advances in video analytics, research teams a...
research
07/26/2023

Human-centric Scene Understanding for 3D Large-scale Scenarios

Human-centric scene understanding is significant for real-world applicat...
research
10/20/2022

MovieCLIP: Visual Scene Recognition in Movies

Longform media such as movies have complex narrative structures, with ev...
research
08/08/2020

A Unified Framework for Shot Type Classification Based on Subject Centric Lens

Shots are key narrative elements of various videos, e.g. movies, TV seri...
research
05/11/2022

Scene Consistency Representation Learning for Video Scene Segmentation

A long-term video, such as a movie or TV show, is composed of various sc...
