SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

01/05/2023
by   Yuxing Long, et al.

Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they perform poorly when complex relative positions and information alignments are involved, which creates a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from INcremental Layout Graph (SPRING), which can reason over multi-hop spatial relations and connect them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs used during pretraining are generated from novel Incremental Layout Graphs (ILG). Difficulty labels for the QA pairs, annotated automatically by the ILG, are used to drive MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets.
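
The idea of generating QA pairs from a layout graph and ordering them by difficulty can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify how the ILG is built, how questions are phrased, or how difficulty is scored, so the `LAYOUT` graph, the `QAPair` type, the `generate_qa` helper, and the use of hop count as the difficulty label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    hops: int  # assumed difficulty label: number of spatial relations chained


# Hypothetical toy layout graph: LAYOUT[a][relation] = b means "b is <relation> a".
LAYOUT = {
    "red jacket": {"right of": "blue shirt"},
    "blue shirt": {"right of": "grey hoodie"},
    "grey hoodie": {},
}


def generate_qa(layout, max_hops=2):
    """Walk up to max_hops relations from each node, emitting one QA pair per hop."""
    pairs = []
    for start in layout:
        node = start
        description = f"the {start}"
        for hop in range(1, max_hops + 1):
            edges = layout[node]
            if not edges:
                break  # no outgoing relation, so the walk ends here
            relation, node = next(iter(edges.items()))
            pairs.append(QAPair(f"What is {relation} {description}?", node, hop))
            # Nest the description so the next question needs one more hop.
            description = f"the item {relation} {description}"
    return pairs


pairs = generate_qa(LAYOUT)
# Curriculum order under the stated assumption: fewer hops first.
curriculum = sorted(pairs, key=lambda p: p.hops)

for p in curriculum:
    print(p.hops, p.question, "->", p.answer)
```

In this sketch the single-hop questions come first and the two-hop question ("What is right of the item right of the red jacket?") comes last, mirroring the easy-to-hard ordering that curriculum learning relies on.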


