Reasoning Visual Dialogs with Structural and Partial Observations

04/11/2019
by Zilong Zheng, et al.

We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Building on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
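To make the alternating inference idea concrete, the following is a minimal toy sketch, not the model from the paper. It assumes each dialog entity is a small embedding vector, that edge strengths can be approximated by a softmax over dot-product similarities, and that the missing node can be updated as a weighted average of the observed nodes. The names `infer_missing_node`, `softmax`, and `dot`, and all of these modeling choices, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def infer_missing_node(observed, init, num_iters=10):
    """Alternate between soft edge estimation and missing-node updates.

    observed: list of embedding vectors for the observed dialog entities.
    init: initial guess for the missing (answer) node embedding.
    Returns the final missing-node embedding and the soft edge weights.
    """
    missing = list(init)
    weights = [1.0 / len(observed)] * len(observed)
    for _ in range(num_iters):
        # E-step-like update (assumption): soft edges from the missing
        # node to every observed node, based on dot-product similarity.
        weights = softmax([dot(missing, node) for node in observed])
        # M-step-like update (assumption): the missing node becomes the
        # edge-weighted average of the observed nodes.
        missing = [
            sum(w * node[d] for w, node in zip(weights, observed))
            for d in range(len(missing))
        ]
    return missing, weights


# Hypothetical 2-D embeddings for three observed dialog entities.
observed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
missing, weights = infer_missing_node(observed, init=[1.0, 0.0])
```

In this sketch the soft edge weights play the role of the inferred dialog structure, and the repeated weighted averaging resembles a simple message-passing update. A learned, differentiable GNN would replace both hand-written steps with trainable functions.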


research
09/17/2021

GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

Visual dialog, which aims to hold a meaningful conversation with humans ...
research
04/05/2020

Iterative Context-Aware Graph Inference for Visual Dialog

Visual dialog is a challenging task that requires the comprehension of t...
research
04/14/2020

DialGraph: Sparse Graph Learning Networks for Visual Dialog

Visual dialog is a task of answering a sequence of questions grounded in...
research
11/17/2014

Relations World: A Possibilistic Graphical Model

We explore the idea of using a "possibilistic graphical model" as the ba...
research
06/24/2023

Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles

We automate deep step-by step reasoning in an LLM dialog thread by recur...
research
02/22/2019

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

Answerer in Questioner's Mind (AQM) is an information-theoretic framewor...
research
04/23/2022

Supplementing Missing Visions via Dialog for Scene Graph Generations

Most current AI systems rely on the premise that the input visual data a...
