Structured Co-reference Graph Attention for Video-grounded Dialogue

03/24/2021
by   Junyeong Kim, et al.
0

A video-grounded dialogue system referred to as the Structured Co-reference Graph Attention (SCGA) is presented for decoding the answer sequence to a question regarding a given video while keeping track of the dialogue context. Although recent efforts have made great strides in improving the quality of the response, performance is still far from satisfactory. The two main challenging issues are as follows: (1) how to deduce co-reference among multiple modalities and (2) how to reason on the rich underlying semantic structure of video with complex spatial and temporal dynamics. To this end, SCGA is based on (1) Structured Co-reference Resolver that performs dereferencing via building a structured graph over multiple modalities, (2) Spatio-temporal Video Reasoner that captures local-to-global dynamics of video via gradually neighboring graph attention. SCGA makes use of pointer network to dynamically replicate parts of the question for decoding the answer sequence. The validity of the proposed SCGA is demonstrated on AVSD@DSTC7 and AVSD@DSTC8 datasets, a challenging video-grounded dialogue benchmarks, and TVQA dataset, a large-scale videoQA benchmark. Our empirical results show that SCGA outperforms other state-of-the-art dialogue systems on both benchmarks, while extensive ablation study and qualitative analysis reveal performance gain and improved interpretability.

READ FULL TEXT

page 3

page 7

research
05/30/2023

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

Video-grounded dialogue understanding is a challenging problem that requ...
research
06/27/2020

Video-Grounded Dialogues with Pretrained Generation Language Models

Pre-trained language models have shown remarkable success in improving v...
research
07/02/2019

Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems

Developing Video-Grounded Dialogue Systems (VGDS), where a dialogue is c...
research
12/12/2022

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

Video-grounded Dialogue (VGD) aims to decode an answer sentence to a que...
research
10/20/2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues

Video-grounded dialogues are very challenging due to (i) the complexity ...
research
01/01/2021

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

A video-grounded dialogue system is required to understand both dialogue...
research
10/22/2022

Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation

We study video-grounded dialogue generation, where a response is generat...

Please sign up or login with your details

Forgot password? Click here to reset