Probabilistic framework for solving Visual Dialog

09/11/2019
by   Badri N. Patro, et al.
19

In this paper, we propose a probabilistic framework for solving the task of `Visual Dialog'. Solving this task requires reasoning and understanding of visual modality, language modality, and common sense knowledge to answer. Various architectures have been proposed to solve this task by variants of multi-modal deep learning techniques that combine visual and language representations. However, we believe that it is crucial to understand and analyze the sources of uncertainty for solving this task. Our approach allows for estimating uncertainty and also aids a diverse generation of answers. The proposed approach is obtained through a probabilistic representation module that provides us with representations for image, question and conversation history, a module that ensures that diverse latent representations for candidate answers are obtained given the probabilistic representations and an uncertainty representation module that chooses the appropriate answer that minimizes uncertainty. We thoroughly evaluate the model with a detailed ablation analysis, comparison with state of the art and visualization of the uncertainty that aids in the understanding of the method. Using the proposed probabilistic framework, we thus obtain an improved visual dialog system that is also more explainable.

READ FULL TEXT

page 4

page 27

page 28

page 29

research
04/15/2022

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

Visual Dialog is a challenging vision-language task since the visual dia...
research
02/26/2019

Image-Question-Answer Synergistic Network for Visual Dialog

The image, question (combined with the history for de-referencing), and ...
research
02/01/2019

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

This paper presents Recurrent Dual Attention Network (ReDAN) for visual ...
research
10/13/2019

Granular Multimodal Attention Networks for Visual Dialog

Vision and language tasks have benefited from attention. There have been...
research
01/23/2020

Deep Bayesian Network for Visual Question Generation

Generating natural questions from an image is a semantic task that requi...
research
07/08/2022

Video Dialog as Conversation about Objects Living in Space-Time

It would be a technological feat to be able to create a system that can ...
research
09/13/2021

Learning to Ground Visual Objects for Visual Dialog

Visual dialog is challenging since it needs to answer a series of cohere...

Please sign up or login with your details

Forgot password? Click here to reset