Trailers12k: Evaluating Transfer Learning for Movie Trailer Genre Classification

10/14/2022
by   Ricardo Montalvo-Lezama, et al.

Transfer learning is a cornerstone for a wide range of computer vision problems. It has been broadly studied for image analysis tasks, but the literature on video analysis is scarce and has focused mainly on transferring representations learned on ImageNet to human action recognition. In this paper, we study transfer learning for Multi-label Movie Trailer Genre Classification (MTGC). In particular, we introduce Trailers12k, a new manually curated movie trailer dataset, and evaluate the transferability of spatial and spatio-temporal representations learned on ImageNet and/or Kinetics to MTGC on Trailers12k. To reduce the spatio-temporal structure gap between the source and target tasks and improve transferability, we propose a method that performs shot detection to segment each trailer into highly correlated clips. We study several factors that influence transferability, including the segmentation strategy, frame rate, input video extension, and spatio-temporal modeling. Our results show that representations learned on ImageNet and on Kinetics transfer comparably well to Trailers12k, yet provide complementary information that can be combined to improve classification performance. For a similar number of parameters and FLOPs, Transformers provide a better transferability base than ConvNets. Nevertheless, competitive performance can be achieved with lightweight ConvNets, making them an attractive option for low-resource environments.
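The abstract does not specify which shot-detection algorithm is used, so the following is only an illustrative sketch of the general idea of segmenting a trailer into shot-coherent clips. It uses a simple color-histogram-difference heuristic for detecting hard cuts; the threshold value and all function names are assumptions, not the paper's method.

```python
import numpy as np

def detect_shot_boundaries(frames, threshold=0.5):
    """Flag frame indices where the color histogram changes sharply,
    a common proxy for a hard cut between shots."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=32, range=(0, 256))
        hist = hist / hist.sum()  # normalize to a distribution
        if prev_hist is not None:
            # L1 distance between successive histograms; a large jump
            # suggests a cut between two shots
            if np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries

def segment_into_clips(frames, threshold=0.5):
    """Split a frame sequence into clips, one per detected shot."""
    cuts = detect_shot_boundaries(frames, threshold)
    starts = [0] + cuts
    ends = cuts + [len(frames)]
    return [frames[s:e] for s, e in zip(starts, ends)]

# Synthetic "trailer": 10 dark frames followed by 10 bright frames,
# i.e. a single hard cut at frame 10.
dark = [np.full((8, 8), 10, dtype=np.uint8) for _ in range(10)]
bright = [np.full((8, 8), 200, dtype=np.uint8) for _ in range(10)]
clips = segment_into_clips(dark + bright)
print(len(clips))  # the single cut yields two clips
```

Each resulting clip contains frames from one shot, so frames within a clip are highly correlated, which is the property the segmentation strategy described above relies on.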


