Overview of Tencent Multi-modal Ads Video Understanding Challenge

09/16/2021
by Zhenzhi Wang, et al.

The Multi-modal Ads Video Understanding Challenge is the first grand challenge aimed at the comprehensive understanding of ads videos. It comprises two tasks: video structuring in the temporal dimension and multi-modal video classification. Participants must accurately predict both the scene boundaries and the multi-label categories of each scene, where the categories come from a fine-grained, ads-related category hierarchy. The task therefore differs from previous ones in four respects: the ads domain, multi-modal information, temporal segmentation, and multi-label classification. We expect it to advance the foundations of ads video understanding and to have a significant impact on many ads applications, such as video recommendation. This paper presents an extended overview of the challenge, covering the background of ads videos, a detailed description of the task and dataset, the evaluation protocol, and our proposed baseline. By ablating the key components of the baseline, we aim to reveal the main challenges of this task and provide useful guidance for future research in this area. The dataset will be publicly available at https://algo.qq.com/.
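To make the combined output of the two tasks concrete, the sketch below represents a prediction as a list of scenes, each a time span carrying a set of labels, and pairs predicted scenes with ground-truth scenes by temporal intersection-over-union (IoU). This is a minimal illustration under our own assumptions: the `Scene` class, the helpers `temporal_iou`, `label_f1` and `match_scenes`, the 0.5 IoU threshold, the greedy matching rule, and the example label names are all hypothetical. It does not reproduce the challenge's official evaluation protocol, which the full paper describes.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical scene record: a time span (seconds) plus its set of labels."""
    start: float
    end: float
    labels: set[str] = field(default_factory=set)


def temporal_iou(a: Scene, b: Scene) -> float:
    """Intersection-over-union of the two scenes' time spans."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def label_f1(pred: set[str], gold: set[str]) -> float:
    """F1 score between predicted and ground-truth label sets (1.0 if both are empty)."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def match_scenes(preds, golds, iou_threshold=0.5):
    """Assumed greedy matching: each ground-truth scene takes the unused prediction
    with the highest temporal IoU at or above the threshold.
    Returns a list of (gold, pred, iou) triples."""
    used = set()
    matches = []
    for g in golds:
        best_i, best_iou = None, iou_threshold
        for i, p in enumerate(preds):
            if i in used:
                continue
            iou = temporal_iou(g, p)
            if iou >= best_iou:
                best_i, best_iou = i, iou
        if best_i is not None:
            used.add(best_i)
            matches.append((g, preds[best_i], best_iou))
    return matches


# Usage with made-up scenes and hypothetical label names.
gold = [Scene(0.0, 5.0, {"presenter", "indoor"}), Scene(5.0, 12.0, {"product close-up"})]
pred = [Scene(0.0, 4.5, {"presenter"}), Scene(4.5, 12.0, {"product close-up", "text overlay"})]
for g, p, iou in match_scenes(pred, gold):
    print(round(iou, 3), round(label_f1(p.labels, g.labels), 3))
# 0.9 0.667
# 0.933 0.667
```

Scoring boundaries and labels separately, as here, is only one way to combine the two tasks; a protocol that requires both a temporal match and correct labels within a single metric would reward the same predictions differently.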
