A Unified Framework for Shot Type Classification Based on Subject Centric Lens

by Anyi Rao, et al.

Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving on the Internet. Shot types greatly influence how the underlying ideas, emotions, and messages are expressed. Techniques for analyzing shot types are therefore important to video understanding, which is in increasing demand in real-world applications. Classifying shot types is challenging because it requires information beyond the video content, such as the spatial composition of a frame and camera movement. To address these issues, we propose a learning framework, Subject Guidance Network (SGNet), for shot type recognition. SGNet separates the subject and background of a shot into two streams, which serve as separate guidance maps for scale and movement type classification, respectively. To facilitate shot type analysis and model evaluation, we build a large-scale dataset, MovieShots, which contains 46K shots from 7K movie trailers with annotations of their scale and movement types. Experiments show that our framework recognizes these two attributes of a shot accurately, outperforming all previous methods.


