Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention

07/04/2018
by Karuna Ahuja, et al.

We tackle the problem of understanding visual ads: given an ad image, our goal is to rank appropriate human-generated statements describing the purpose of the ad. This problem is generally addressed by jointly embedding images and candidate statements to establish correspondence. Decoding a visual ad requires inferring both the semantic and the symbolic nuances referenced in an image, and prior methods may fail to capture such associations, especially when symbols are only weakly annotated. To create better embeddings, we use an attention mechanism to associate image proposals with symbols and thereby aggregate information from aligned multimodal representations. We propose a multihop co-attention mechanism that iteratively refines the attention map to improve attention estimates. Our attention-based embedding model is learned end-to-end, guided by a max-margin loss function. We show that our model outperforms other baselines on the benchmark Ad dataset, and we present qualitative results that highlight the advantages of multihop co-attention.
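The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of the general idea under assumptions of our own, not the authors' formulation. It assumes image proposals and symbols are already embedded as `d`-dimensional vectors; each hop attends over symbols with the current query, then over proposals guided by the attended symbols, and refines the query from both. Statements are ranked by cosine similarity, and a hinge (max-margin) loss asks each wrong statement to score at least `margin` below the correct one. The names `multihop_coattention`, `rank_statements`, `max_margin_loss`, `hops`, and the default `margin=0.2` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def l2_normalize(v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a vector to (approximately) unit length."""
    return v / (np.linalg.norm(v) + eps)


def multihop_coattention(proposals: np.ndarray, symbols: np.ndarray, hops: int = 2) -> np.ndarray:
    """Hypothetical multi-hop co-attention between image proposals and symbols.

    proposals: (n_proposals, d) region features (assumed precomputed)
    symbols:   (n_symbols, d) symbol embeddings (assumed precomputed)
    Returns a unit-length (d,) image embedding.
    """
    # Start from the unattended average of the proposals.
    query = proposals.mean(axis=0)
    for _ in range(hops):
        # Attend over symbols using the current query.
        sym_weights = softmax(symbols @ query)
        sym_context = sym_weights @ symbols
        # Attend over proposals, guided by the attended symbol context.
        prop_weights = softmax(proposals @ sym_context)
        prop_context = prop_weights @ proposals
        # Refine the query for the next hop from both modalities.
        query = prop_context + sym_context
    return l2_normalize(query)


def rank_statements(image_emb: np.ndarray, statements: np.ndarray):
    """Return statement indices sorted by cosine similarity (best first) and the scores."""
    normed = statements / (np.linalg.norm(statements, axis=1, keepdims=True) + 1e-8)
    scores = normed @ image_emb
    return np.argsort(-scores), scores


def max_margin_loss(image_emb, pos, negs, margin: float = 0.2) -> float:
    """Hinge loss: each negative statement should score `margin` below the positive."""
    s_pos = l2_normalize(pos) @ image_emb
    s_neg = np.array([l2_normalize(n) @ image_emb for n in negs])
    return float(np.maximum(0.0, margin - s_pos + s_neg).sum())


# Toy usage with random features (illustrative only).
rng = np.random.default_rng(0)
d = 8
proposals = rng.normal(size=(5, d))
symbols = rng.normal(size=(3, d))
image_emb = multihop_coattention(proposals, symbols, hops=2)
statements = rng.normal(size=(4, d))
order, scores = rank_statements(image_emb, statements)
loss = max_margin_loss(image_emb, statements[0], statements[1:])
print("ranking:", order.tolist(), "loss:", round(loss, 4))
```

In a trained model the features would come from learned encoders and the loss would be minimized by gradient descent; here everything is random, so the ranking itself is meaningless and only the data flow is illustrated.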

research
05/25/2019

Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding

Images with visual and scene text content are ubiquitous in everyday lif...
research
11/17/2017

ADVISE: Symbolism and External Knowledge for Decoding Advertisements

In order to convey the most content in their limited space, advertisemen...
research
06/10/2022

Symbolic image detection using scene and knowledge graphs

Sometimes the meaning conveyed by images goes beyond the list of objects...
research
04/09/2018

AMNet: Memorability Estimation with Attention

In this paper we present the design and evaluation of an end-to-end trai...
research
05/09/2019

Embedding Human Knowledge in Deep Neural Network via Attention Map

Human-in-the-loop (HITL), which introduces human knowledge to machine le...
research
09/02/2023

Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers

In this paper, we propose and realize a new deep learning architecture f...
research
05/11/2022

TextMatcher: Cross-Attentional Neural Network to Compare Image and Text

We study a novel multimodal-learning problem, which we call text matchin...