Multi-modal Representation Learning for Video Advertisement Content Structuring

09/04/2021
by   Daya Guo, et al.

Video advertisement content structuring aims to segment a given video advertisement and label each segment along several dimensions, such as presentation form, scene, and style. Unlike real-life videos, video advertisements contain rich multi-modal content such as captions and speech, which provides crucial video semantics and can enhance the structuring process. In this paper, we propose a multi-modal encoder that learns multi-modal representations from video advertisements through interaction between video-audio and text. Based on these multi-modal representations, we then apply a Boundary-Matching Network to generate temporal proposals. To make the proposals more accurate, we refine the generated proposals by scene-guided alignment and re-ranking. Finally, we incorporate proposal-located embeddings into the introduced multi-modal encoder to capture temporal relationships between the local features of each proposal and the global features of the whole video for classification. Experimental results show that our method achieves significant improvement over several baselines and ranks first on the task of Multi-modal Ads Video Understanding in the ACM Multimedia 2021 Grand Challenge. An ablation study further shows that leveraging multi-modal content such as captions and speech in video advertisements significantly improves performance.
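
The abstract does not spell out how scene-guided alignment and re-ranking work, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes proposals are `(start, end, score)` tuples in seconds, that scene-cut timestamps are already available from some shot detector, and that "alignment" means snapping a boundary to the nearest cut when it lies within a tolerance. The names `snap_to_nearest`, `align_proposals`, and `max_shift` are hypothetical, and re-ranking is reduced here to a plain sort by score.

```python
from bisect import bisect_left


def snap_to_nearest(t, cuts):
    """Return the value in the sorted, non-empty list `cuts` closest to t."""
    i = bisect_left(cuts, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = cuts[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda c: abs(c - t))


def align_proposals(proposals, scene_cuts, max_shift=1.0):
    """Snap proposal boundaries to nearby scene cuts, then sort by score.

    Assumption: a boundary moves only if a cut lies within `max_shift`
    seconds of it; otherwise the original boundary is kept.
    """
    cuts = sorted(scene_cuts)
    aligned = []
    for start, end, score in proposals:
        if cuts:
            s = snap_to_nearest(start, cuts)
            e = snap_to_nearest(end, cuts)
            new_start = s if abs(s - start) <= max_shift else start
            new_end = e if abs(e - end) <= max_shift else end
            # Keep the snapped boundaries only if the segment stays non-empty.
            if new_end > new_start:
                start, end = new_start, new_end
        aligned.append((start, end, score))
    # Placeholder re-ranking: highest-scoring proposals first.
    return sorted(aligned, key=lambda p: p[2], reverse=True)


proposals = [(0.3, 4.8, 0.6), (5.2, 9.0, 0.9)]
scene_cuts = [0.0, 5.0, 12.0]
result = align_proposals(proposals, scene_cuts)
```

In this toy input, the boundaries at 0.3, 4.8, and 5.2 each lie within one second of a cut and are snapped, while the boundary at 9.0 is left alone because the closest cut (12.0) is too far away. A real system would likely combine the proposal score with additional evidence when re-ranking.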

