
Which Ads to Show? Advertisement Image Assessment with Auxiliary Information via Multi-step Modality Fusion

by   Kyung-Wha Park, et al.

Assessing aesthetic preference is a fundamental task related to human cognition, and it can contribute to practical applications such as image creation for online advertisements. Although image quality has a crucial influence, auxiliary information about ad images, such as tags and target subjects, can also determine image preference. Existing studies mainly focus on the images themselves and are thus less useful for advertisement scenarios where rich auxiliary data are available. Here we propose a modality fusion-based neural network that evaluates the aesthetic preference of images together with auxiliary information. Our method fully utilizes auxiliary data by introducing multi-step modality fusion that combines low-level fusion based on conditional batch normalization with high-level fusion based on attention, a design inspired by findings from statistical analyses of real advertisement data. Our approach achieved state-of-the-art performance on the AVA dataset, a widely used dataset for aesthetic assessment. The proposed method is also evaluated on large-scale real-world advertisement image data with rich auxiliary attributes, providing promising preference prediction results. Through extensive experiments, we investigate how image and auxiliary information together influence click-through rate.
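To make the low-level fusion idea in the abstract more concrete, the following is a minimal sketch of generic conditional batch normalization, in which the per-channel scale and shift are predicted from an auxiliary vector instead of being fixed parameters. It is not the authors' implementation: the function name `conditional_batch_norm`, the weight matrices `w_gamma` and `w_beta`, and the flat list-of-rows feature layout are all assumptions made for illustration, and the attention-based high-level fusion is not shown.

```python
import math


def conditional_batch_norm(features, aux, w_gamma, w_beta, eps=1e-5):
    """Generic conditional batch normalization (illustrative sketch only).

    features: N rows, each with C channel values (e.g. pooled image features).
    aux:      N auxiliary vectors of length D (e.g. encoded tags).
    w_gamma, w_beta: hypothetical D x C weight matrices that map the
        auxiliary vector to a per-channel scale offset and shift.
    """
    n = len(features)
    c = len(features[0])

    # Batch statistics, computed per channel over the whole batch.
    mean = [sum(row[j] for row in features) / n for j in range(c)]
    var = [sum((row[j] - mean[j]) ** 2 for row in features) / n for j in range(c)]

    out = []
    for row, a in zip(features, aux):
        # Scale starts at 1 and shift at 0, so zero weights give plain batch norm.
        gamma = [1.0 + sum(a[d] * w_gamma[d][j] for d in range(len(a))) for j in range(c)]
        beta = [sum(a[d] * w_beta[d][j] for d in range(len(a))) for j in range(c)]
        out.append([
            gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
            for j in range(c)
        ])
    return out


# Two samples, two channels, one-dimensional auxiliary input.
fused = conditional_batch_norm(
    features=[[1.0, 2.0], [3.0, 6.0]],
    aux=[[1.0], [0.0]],
    w_gamma=[[0.0, 0.0]],
    w_beta=[[0.5, -0.5]],
)
```

In this sketch only the first sample has a non-zero auxiliary input, so only its normalized features are shifted. The second sample receives ordinary batch normalization.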




M2FN: Multi-step Modality Fusion for Advertisement Image Assessment

Assessing advertisements, specifically on the basis of user preferences ...

A Cross-Modal Image Fusion Theory Guided by Human Visual Characteristics

The characteristics of feature selection, nonlinear combination and mult...

Evaluation of Retinal Image Quality Assessment Networks in Different Color-spaces

Retinal image quality assessment (RIQA) is essential for controlling the...

M5Product: A Multi-modal Pretraining Benchmark for E-commercial Product Downstream Tasks

In this paper, we aim to advance the research of multi-modal pre-trainin...

Cross Attention-guided Dense Network for Images Fusion

In recent years, various applications in computer vision have achieved s...

Cross-Modal Image Fusion Theory Guided by Subjective Visual Attention

The human visual perception system has very strong robustness and contex...

PixelCNN Models with Auxiliary Variables for Natural Image Modeling

We study probabilistic models of natural images and extend the autoregre...