SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

07/10/2023
by Bhathiya Hemanthage, et al.

SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself, and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in the other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach to extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
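The distinction between local and de-localized tokens can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the token formats (`<OBJ_n>`, `<TYPE_x>`), the segment markers (`<CONTEXT>`, `<SCENE>`, `<RESPONSE>`), the type identifiers, and the helper names are hypothetical and are not taken from the paper. It shows only the general idea that a local token identifies an object within one scene, while a de-localized token names the object's type and therefore keeps the same meaning in every scene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    local_index: int  # position of the object within one scene only
    type_id: str      # hypothetical identifier for the kind of item, shared across scenes


def local_token(obj: SceneObject) -> str:
    # Hypothetical format: refers to a specific object in the current scene.
    return f"<OBJ_{obj.local_index}>"


def delocalized_token(obj: SceneObject) -> str:
    # Hypothetical format: refers to the object's type, independent of the scene.
    return f"<TYPE_{obj.type_id}>"


def encode_scene(objects: list[SceneObject]) -> str:
    # Pair each local token with its de-localized token to form a symbolic scene string.
    return " ".join(f"{local_token(o)} {delocalized_token(o)}" for o in objects)


def build_input(context: str, objects: list[SceneObject]) -> str:
    # Flatten dialogue context and scene into one sequence that an
    # auto-regressive language model could continue (markers are assumptions).
    return f"<CONTEXT> {context} <SCENE> {encode_scene(objects)} <RESPONSE>"


# Two scenes that happen to contain the same type of hat.
scene_a = [SceneObject(0, "jacket_17"), SceneObject(1, "hat_03")]
scene_b = [SceneObject(0, "hat_03")]

prompt = build_input("User: Do you have that hat in blue?", scene_a)
```

In this sketch `<OBJ_0>` points at a jacket in `scene_a` but at a hat in `scene_b`, whereas `<TYPE_hat_03>` denotes the same kind of item in both, which is the consistency property the abstract attributes to de-localized tokens.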


research
05/02/2020

A Simple Language Model for Task-Oriented Dialogue

Task-oriented dialogue is often decomposed into three tasks: understandi...
research
05/11/2020

SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model

This paper presents a new method SOLOIST, which uses transfer learning t...
research
06/28/2021

A Knowledge-Grounded Dialog System Based on Pre-Trained Language Models

We present a knowledge-grounded dialog system developed for the ninth Di...
research
07/16/2022

Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Text response generation for multimodal task-oriented dialog systems, wh...
research
12/07/2021

GKS: Graph-based Knowledge Selector for Task-oriented Dialog System

In previous research, knowledge selection tasks mostly rely on language ...
research
12/07/2021

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input

We present our work on the multimodal coreference resolution task of the...
research
03/16/2022

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Visual dialog has witnessed great progress after introducing various vis...
