Multimodal Feature Fusion for Video Advertisements Tagging Via Stacking Ensemble

08/02/2021
by Qingsong Zhou, et al.

Automated tagging of video advertisements is a critical yet challenging problem, and it has drawn increasing interest in recent years as its applications have become evident in many fields. Although sustained efforts have been made, the tagging task still suffers from several challenges; for example, an efficient feature fusion approach is desirable but under-explored in previous studies. In this paper, we present our approach for Multimodal Video Ads Tagging in the 2021 Tencent Advertising Algorithm Competition. Specifically, we propose a novel multi-modal feature fusion framework that aims to combine complementary information from multiple modalities. The framework introduces a stacking-based ensembling approach to reduce the influence of varying levels of noise and of conflicts between different modalities. As a result, our framework can boost tagging performance compared to previous methods. To empirically investigate the effectiveness and robustness of the proposed framework, we conduct extensive experiments on the challenge datasets. The results suggest that our framework significantly outperforms related approaches, and our method ranks 1st on the final leaderboard, with a Global Average Precision (GAP) of 82.63%. To promote research in this field, we will release our code in the final version.
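The abstract reports its result as Global Average Precision (GAP) without giving the formula. The sketch below shows one common way to compute it, assuming the YouTube-8M-style definition: keep the top-k scored tags per video, pool them across all videos, rank by confidence, and average the precision at each correct prediction over the total number of ground-truth tags. The function name `global_average_precision`, the `top_k` default of 20, and the toy data are illustrative assumptions, not taken from the paper or the competition's evaluation code.

```python
from typing import List, Sequence, Tuple


def global_average_precision(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    top_k: int = 20,  # assumed cut-off; the abstract does not state one
) -> float:
    """GAP over pooled top-k predictions (YouTube-8M-style definition, assumed)."""
    pooled: List[Tuple[float, int]] = []
    total_positives = 0
    for video_scores, video_labels in zip(scores, labels):
        # Every ground-truth tag counts toward recall, even if it misses the top-k.
        total_positives += sum(video_labels)
        ranked = sorted(
            zip(video_scores, video_labels), key=lambda pair: pair[0], reverse=True
        )
        pooled.extend(ranked[:top_k])

    if total_positives == 0:
        return 0.0

    # Rank all retained predictions from all videos by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


# Toy example: two videos, three candidate tags each (hypothetical values).
toy_scores = [[0.9, 0.1, 0.8], [0.7, 0.6, 0.2]]
toy_labels = [[1, 0, 0], [0, 1, 1]]
print(global_average_precision(toy_scores, toy_labels, top_k=2))  # 0.5
```

In the toy example the pooled ranking holds correct tags at ranks 1 and 4, so the precision sum is 1/1 + 2/4 = 1.5; dividing by the three ground-truth tags gives 0.5. The third ground-truth tag never enters the top-k, which is why the score is penalized.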

research
09/25/2022

Multimodal Learning with Channel-Mixing and Masked Autoencoder on Facial Action Unit Detection

Recent studies utilizing multi-modal data aimed at building a robust mod...
research
09/06/2022

Finger Multimodal Feature Fusion and Recognition Based on Channel Spatial Attention

Due to the instability and limitations of unimodal biometric systems, mu...
research
08/19/2023

Interpretation on Multi-modal Visual Fusion

In this paper, we present an analytical framework and a novel metric to ...
research
12/21/2021

Multimodal Entity Tagging with Multimodal Knowledge Base

To enhance research on multimodal knowledge base and multimodal informat...
research
06/16/2023

M3PT: A Multi-Modal Model for POI Tagging

POI tagging aims to annotate a point of interest (POI) with some informa...
research
08/20/2021

Video Ads Content Structuring by Combining Scene Confidence Prediction and Tagging

Video ads segmentation and tagging is a challenging task due to two main...
research
05/30/2021

Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing

For multimodal tasks, a good feature extraction network should extract i...
