MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

09/12/2021
by   Alejandro Pardo, et al.
0

Understanding movies and their structural patterns is a crucial task to decode the craft of video editing. While previous works have developed tools for general analysis such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the cut type recognition task, which requires modeling of multi-modal information. To ignite research in the new task, we construct a large-scale dataset called MovieCuts, which contains more than 170K videoclips labeled among ten cut types. We benchmark a series of audio-visual approaches, including some that deal with the problem's multi-modal and multi-label nature. Our best model achieves 45.7 suggests that the task is challenging and that attaining highly accurate cut type recognition is an open research problem.

READ FULL TEXT

page 1

page 3

page 5

page 8

page 11

page 13

research
09/23/2021

Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark

Recognizing the emotional state of people is a basic but challenging tas...
research
09/03/2018

LRS3-TED: a large-scale dataset for visual speech recognition

This paper introduces a new multi-modal dataset for visual and audio-vis...
research
06/15/2021

Is this Harmful? Learning to Predict Harmfulness Ratings from Video

Automatically identifying harmful content in video is an important task ...
research
08/09/2021

Learning to Cut by Watching Movies

Video content creation keeps growing at an incredible pace; yet, creatin...
research
04/02/2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Video editing tools are widely used nowadays for digital design. Althoug...
research
06/16/2023

Multi-task 3D building understanding with multi-modal pretraining

This paper explores various learning strategies for 3D building type cla...
research
07/28/2020

Families In Wild Multimedia (FIW-MM): A Multi-Modal Database for Recognizing Kinship

Recognizing kinship - a soft biometric with vast applications - in photo...

Please sign up or login with your details

Forgot password? Click here to reset