Visual Explanations from Hadamard Product in Multimodal Deep Networks

12/18/2017
by   Jin-Hwa Kim, et al.
0

The visual explanation of learned representation of models helps to understand the fundamentals of learning. The attentional models of previous works used to visualize the attended regions over an image or text using their learned weights to confirm their intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known for the joint function of visual question answering tasks, implicitly performs an attentional mechanism for visual inputs. In this work, we extend their work to show that the Hadamard product in multimodal deep networks performs not only for visual inputs but also for textual inputs simultaneously using the proposed gradient-based visualization technique. The attentional effect of Hadamard product is visualized for both visual and textual inputs by analyzing the two inputs and an output of the Hadamard product with the proposed method and compared with learned attentional weights of a visual question answering model.

READ FULL TEXT

page 4

page 6

page 7

page 8

research
06/06/2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Modeling textual or visual information with vector representations train...
research
02/15/2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Deep models that are both effective and explainable are desirable in man...
research
11/17/2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Deep models are the defacto standard in visual decision problems due to ...
research
04/08/2019

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering

In this paper, we propose a novel end-to-end trainable Video Question An...
research
06/05/2016

Multimodal Residual Learning for Visual QA

Deep neural networks continue to advance the state-of-the-art of image r...
research
10/07/2016

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

We propose a technique for producing "visual explanations" for decisions...
research
04/20/2021

Towards Solving Multimodal Comprehension

This paper targets the problem of procedural multimodal machine comprehe...

Please sign up or login with your details

Forgot password? Click here to reset