MovieCLIP: Visual Scene Recognition in Movies

10/20/2022
by Digbalay Bose, et al.

Long-form media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain-specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets for movies have limited taxonomies and do not consider visual scene transitions within movie clips. In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy of 179 scene labels derived from movie scripts and auxiliary web-based video datasets. Instead of relying on manual annotations, which can be expensive, we use CLIP to weakly label 1.12 million shots from 32K movie clips based on our proposed taxonomy. We provide baseline visual models trained on the weakly labeled dataset, called MovieCLIP, and evaluate them on an independent dataset verified by human raters. We show that leveraging features from models pretrained on MovieCLIP benefits downstream tasks such as multi-label scene and genre classification of web videos and movie trailers.
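
The abstract says CLIP is used to weakly label shots against the scene taxonomy but does not describe the procedure. The following is a minimal sketch of one plausible zero-shot scheme, not the authors' pipeline. It assumes each shot is represented by a few frame embeddings and each scene label by a text embedding from the same CLIP-style encoder, scores labels by cosine similarity averaged over the frames of a shot, and keeps the top-scoring labels. The function names, the averaging step, and the toy 3-dimensional vectors are all hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weak_label_shot(frame_embeddings, label_embeddings, top_k=1):
    """Return the top_k (label, score) pairs for one shot.

    Assumption: the score of a label is the mean cosine similarity between
    its text embedding and every frame embedding in the shot.
    """
    if not frame_embeddings:
        raise ValueError("a shot needs at least one frame embedding")
    scores = {}
    for label, text_vec in label_embeddings.items():
        sims = [cosine(frame_vec, text_vec) for frame_vec in frame_embeddings]
        scores[label] = sum(sims) / len(sims)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy vectors standing in for real image/text embeddings (hypothetical values).
label_embeddings = {
    "kitchen": [1.0, 0.0, 0.0],
    "forest": [0.0, 1.0, 0.0],
    "office": [0.0, 0.0, 1.0],
}
shot_frames = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]

top = weak_label_shot(shot_frames, label_embeddings, top_k=2)
for label, score in top:
    print(f"{label}: {score:.3f}")
```

In a real setting the vectors would come from a CLIP image encoder applied to sampled frames and a CLIP text encoder applied to prompts built from the 179 taxonomy labels; a confidence threshold on the top score could then decide which shots receive a weak label at all.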


research
02/22/2022

Movies2Scenes: Learning Scene Representations Using Movie Similarities

Automatic understanding of movie-scenes is an important problem with mul...
research
09/21/2023

Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification

In the film industry, movie posters have been an essential part of adver...
research
06/05/2019

Detecting Kissing Scenes in a Database of Hollywood Films

Detecting scene types in a movie can be very useful for application such...
research
08/06/2023

"Kurosawa": A Script Writer's Assistant

Storytelling is the lifeline of the entertainment industry – movies, TV ...
research
02/14/2023

A dataset for Audio-Visual Sound Event Detection in Movies

Audio event detection is a widely studied audio processing task, with ap...
research
04/06/2020

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

Scene, as the crucial unit of storytelling in movies, contains complex a...
research
12/04/2015

Toward a Taxonomy and Computational Models of Abnormalities in Images

The human visual system can spot an abnormal image, and reason about wha...
