Improving Context Modelling in Multimodal Dialogue Generation

10/20/2018
by   Shubham Agarwal, et al.
0

In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system's output.

READ FULL TEXT
research
10/20/2018

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

Multimodal search-based dialogue is a challenging new task: It extends v...
research
05/27/2023

MPCHAT: Towards Multimodal Persona-Grounded Conversation

In order to build self-consistent personalized dialogue agents, previous...
research
09/30/2019

Retrieval-based Goal-Oriented Dialogue Generation

Most research on dialogue has focused either on dialogue generation for ...
research
10/19/2021

A non-hierarchical attention network with modality dropout for textual response generation in multimodal dialogue systems

Existing text- and image-based multimodal dialogue systems use the tradi...
research
03/21/2023

cTBL: Augmenting Large Language Models for Conversational Tables

An open challenge in multimodal conversational AI requires augmenting la...
research
10/16/2021

Multimodal Dialogue Response Generation

Responsing with image has been recognized as an important capability for...
research
04/21/2022

A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

Towards building intelligent dialogue agents, there has been a growing i...

Please sign up or login with your details

Forgot password? Click here to reset