Multimodal Dialogue State Tracking

06/16/2022
by Hung Le, et al.

Designed to track user goals in dialogues, a dialogue state tracker is an essential component of a dialogue system. However, research on dialogue state tracking has largely been limited to a single modality, in which slots and slot values are restricted by knowledge domains (e.g. a restaurant domain with slots for restaurant name and price range) and are defined by a specific database schema. In this paper, we propose to extend the definition of dialogue state tracking to multimodality. Specifically, we introduce a novel dialogue state tracking task that tracks information about the visual objects mentioned in video-grounded dialogues. Each new dialogue utterance may introduce a new video segment, new visual objects, or new object attributes, and a state tracker is required to update these information slots accordingly. We created a new synthetic benchmark and designed a novel baseline for this task, the Video-Dialogue Transformer Network (VDTN). VDTN combines object-level and segment-level features and learns contextual dependencies between videos and dialogues to generate multimodal dialogue states. We optimized VDTN for a state generation task as well as a self-supervised video understanding task that recovers video segment or object representations. Finally, we trained VDTN to use the decoded states in a response prediction task. Through comprehensive ablation and qualitative analyses, we draw insights toward building more capable multimodal dialogue systems.
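To make the task definition concrete, the following is a minimal sketch of what a multimodal dialogue state and its per-turn update might look like. It is not the paper's state schema and not VDTN, which generates states with a transformer; the class name `MultimodalDialogueState`, the frame-range representation of a video segment, the string object identifiers, and the attribute values are all hypothetical. It only illustrates the three kinds of change the abstract describes: a turn may introduce a new video segment, a new visual object, or a new attribute of an existing object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class MultimodalDialogueState:
    """Hypothetical multimodal dialogue state: the current video segment plus
    one slot per mentioned visual object holding its known attribute values."""

    # (start_frame, end_frame) of the video segment under discussion, if any
    segment: Optional[Tuple[int, int]] = None
    # object identifier -> set of attribute values mentioned so far
    objects: Dict[str, Set[str]] = field(default_factory=dict)

    def update(
        self,
        segment: Optional[Tuple[int, int]] = None,
        mentions: Optional[Dict[str, Set[str]]] = None,
    ) -> "MultimodalDialogueState":
        """Apply one turn: replace the segment if a new one is introduced, add
        newly mentioned objects, and merge new attributes into existing slots."""
        if segment is not None:
            self.segment = segment
        for obj_id, attrs in (mentions or {}).items():
            self.objects.setdefault(obj_id, set()).update(attrs)
        return self

    def as_slots(self) -> Dict[str, list]:
        """Deterministic, sorted view of the object slots for printing or comparison."""
        return {obj_id: sorted(attrs) for obj_id, attrs in sorted(self.objects.items())}


# Hypothetical three-turn dialogue, each turn updating the tracked state.
state = MultimodalDialogueState()
state.update(segment=(0, 32), mentions={"obj1": {"red", "cube"}})  # turn 1: new segment, new object
state.update(mentions={"obj1": {"small"}, "obj2": {"sphere"}})   # turn 2: new attribute, new object
state.update(segment=(32, 64))                                    # turn 3: new segment only
print(state.segment, state.as_slots())
# → (32, 64) {'obj1': ['cube', 'red', 'small'], 'obj2': ['sphere']}
```

In this sketch, object attributes simply accumulate across turns and a new segment overwrites the old one. Whether the actual benchmark keeps, resets, or re-scopes object slots when the segment changes is not stated in the abstract, so that behavior is an assumption.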
