Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

12/09/2022
by   Jie Jiang, et al.
0

Temporal video segmentation and classification have been advanced greatly by public benchmarks in recent years. However, such research still mainly focuses on human actions, failing to describe videos in a holistic view. In addition, previous research tends to pay much attention to visual information yet ignores the multi-modal nature of videos. To fill this gap, we construct the Tencent `Ads Video Segmentation' (TAVS) dataset in the ads domain to escalate multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives as `presentation form', `place', and `style', and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation with three levels of categories for multi-label classification, e.g., `place' - `working place' - `office'. Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Accompanied with TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments are conducted to evaluate our proposed method as well as existing representative methods to reveal key challenges of our dataset TAVS.

READ FULL TEXT

page 2

page 7

research
09/16/2021

Overview of Tencent Multi-modal Ads Video Understanding Challenge

Multi-modal Ads Video Understanding Challenge is the first grand challen...
research
04/25/2019

Holistic Large Scale Video Understanding

Action recognition has been advanced in recent years by benchmarks with ...
research
08/23/2023

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Non-photorealistic videos are in demand with the wave of the metaverse, ...
research
04/06/2020

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

Scene, as the crucial unit of storytelling in movies, contains complex a...
research
09/25/2022

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

Advertisement video editing aims to automatically edit advertising video...
research
03/24/2022

Effectively leveraging Multi-modal Features for Movie Genre Classification

Movie genre classification has been widely studied in recent years due t...
research
10/23/2020

Short Video-based Advertisements Evaluation System: Self-Organizing Learning Approach

With the rising of short video apps, such as TikTok, Snapchat and Kwai, ...

Please sign up or login with your details

Forgot password? Click here to reset