Multimodal Dialogue State Tracking By QA Approach with Data Augmentation

07/20/2020
by Xiangyang Mou, et al.

Recently, a more challenging state tracking task, Audio-Video Scene-Aware Dialogue (AVSD), has been attracting increasing attention from researchers. Unlike purely text-based dialogue state tracking, the dialogue in AVSD consists of a sequence of question-answer pairs about a video, and answering the final question requires additional understanding of the video. This paper interprets the AVSD task from an open-domain Question Answering (QA) point of view and proposes a multimodal open-domain QA system to address the problem. The proposed QA system uses a common encoder-decoder framework with multimodal fusion and attention. Teacher forcing is applied to train a natural language generator. We also propose a new data augmentation approach designed specifically under the QA assumption. Our experiments show that our model and techniques bring significant improvements over the baseline model on the DSTC7-AVSD dataset and demonstrate the potential of our data augmentation techniques.
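The abstract names multimodal fusion with attention and teacher forcing but does not specify how either is implemented. The following is a minimal pure-Python sketch of both ideas, assuming (1) dot-product attention of a question vector over per-frame video feature vectors, fused with the question by concatenation, and (2) the one-step input/target shift that teacher forcing implies for a decoder. The function names `attend_and_fuse` and `teacher_forcing_pairs` and the `<bos>` token are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_fuse(question: Vector, frames: Sequence[Vector]) -> Vector:
    """Hypothetical fusion step (an assumption, not the paper's exact design):
    score each video frame by its dot product with the question vector,
    normalize the scores with softmax, take the weighted sum of frames as a
    video context vector, and fuse by concatenating [question ; context].
    Assumes all vectors share the same dimension."""
    if not frames:
        raise ValueError("need at least one video frame feature")
    scores = [sum(q * f for q, f in zip(question, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(question)
    context = [sum(w * frame[i] for w, frame in zip(weights, frames))
               for i in range(dim)]
    return list(question) + context


def teacher_forcing_pairs(target: Sequence[str],
                          bos: str = "<bos>") -> Tuple[List[str], List[str]]:
    """Build (decoder_inputs, decoder_targets) for teacher forcing.

    At step t the decoder is fed the gold token from step t-1 (or `bos` at
    t = 0) instead of its own previous prediction, and is trained to emit
    the gold token at step t."""
    if not target:
        raise ValueError("target sequence must be non-empty")
    decoder_inputs = [bos] + list(target[:-1])
    decoder_targets = list(target)
    return decoder_inputs, decoder_targets
```

For example, `attend_and_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` weights the first frame more heavily because it aligns with the question, and `teacher_forcing_pairs(["a", "man", "is", "cooking", "<eos>"])` yields inputs starting with `<bos>` and targets shifted one position ahead. At inference time there is no gold answer, so the decoder would instead feed back its own predictions; teacher forcing applies only to training.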

research
08/14/2019

Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling

Visual question answering and visual dialogue tasks have been increasing...
research
07/04/2017

DeepStory: Video Story QA by Deep Embedded Memory Networks

Question-answering (QA) on video contents is a significant challenge for...
research
07/30/2019

LEAF-QA: Locate, Encode & Attend for Figure Question Answering

We introduce LEAF-QA, a comprehensive dataset of 250,000 densely annotat...
research
05/19/2020

Matching Questions and Answers in Dialogues from Online Forums

Matching question-answer relations between two turns in conversations is...
research
01/09/2023

MAQA: A Multimodal QA Benchmark for Negation

Multimodal learning can benefit from the representation power of pretrai...
research
12/18/2020

Trying Bilinear Pooling in Video-QA

Bilinear pooling (BLP) refers to a family of operations recently develop...
research
08/07/2018

A Joint Sequence Fusion Model for Video Question Answering and Retrieval

We present an approach named JSFusion (Joint Sequence Fusion) that can m...