Situated and Interactive Multimodal Conversations

06/02/2020
by Seungwhan Moon, et al.

Next-generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision and memories of previous interactions, in addition to the user's utterances) and to perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context, in addition to the dialog history. We provide two SIMMC datasets totaling 13K human-human dialogs (169K utterances), collected using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment) and (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, along with contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models will be made publicly available.


