
Situated and Interactive Multimodal Conversations

06/02/2020
by Seungwhan Moon, et al.
Facebook

Next-generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, etc., in addition to the user's utterances) and to perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context, in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances), collected using a multimodal Wizard-of-Oz (WoZ) setup in two shopping domains: (a) furniture, grounded in a shared virtual environment; and (b) fashion, grounded in an evolving set of images. We also provide logs of the items appearing in each scene, along with contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models will be made publicly available.
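To make the task concrete, below is a minimal, hypothetical Python sketch of the Structural API Prediction setup: given the dialog history and the current multimodal context (the items visible in the scene), the agent predicts a structured API call. All names here (SceneItem, Turn, ApiCall, predict_api_call) are illustrative assumptions, not the released data schema or baseline code.

# Hypothetical sketch of the SIMMC problem setup: the agent consumes the
# dialog history plus a co-evolving multimodal context (items in the scene)
# and predicts a structured API call. All names below are illustrative
# assumptions, not the released schema.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SceneItem:
    """One catalog item visible in the current scene (furniture or fashion)."""
    item_id: str
    attributes: Dict[str, str]  # e.g. {"color": "brown", "type": "sofa"}


@dataclass
class Turn:
    user_utterance: str
    assistant_utterance: str
    scene_items: List[SceneItem]  # multimodal context at this turn


@dataclass
class ApiCall:
    """Target of Structural API Prediction: an action plus its arguments."""
    action: str                # e.g. "SearchFurniture"
    arguments: Dict[str, str]  # e.g. {"color": "brown"}


def predict_api_call(history: List[Turn], user_utterance: str) -> ApiCall:
    """Toy keyword heuristic standing in for a learned model; real baselines
    condition on the dialog history *and* the multimodal context."""
    if "show" in user_utterance.lower():
        return ApiCall(action="SearchFurniture",
                       arguments={"query": user_utterance})
    return ApiCall(action="None", arguments={})


if __name__ == "__main__":
    scene = [SceneItem("obj_1", {"color": "brown", "type": "sofa"})]
    history = [Turn("Hi, I need a couch.", "Sure, any preferences?", scene)]
    print(predict_api_call(history, "Show me brown sofas under $500."))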

Related Research

04/18/2021

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

We present a new corpus for the Situated and Interactive Multimodal Conv...
10/20/2018

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

Multimodal search-based dialogue is a challenging new task: It extends v...
07/08/2020

Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents

Building multimodal dialogue understanding capabilities situated in the ...
05/13/2022

Multimodal Conversational AI: A Survey of Datasets and Approaches

As humans, we experience the world with all our senses or modalities (so...
01/28/2017

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

The popularity of image sharing on social media and the engagement it cr...
04/01/2017

Multimodal Dialogs (MMD): A large-scale dataset for studying multimodal domain-aware conversations

While multimodal conversation agents are gaining importance in several d...
12/30/2021

An empirical user-study of text-based nonverbal annotation systems for human-human conversations

The substantial increase in the number of online human-human conversatio...

Code Repositories

simmc

With the aim of building next-generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), the annotation schema, the core technical tasks, and the baseline models. The code for the baselines and the datasets will be open-sourced.



simmc2

Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations



simmc

Situated Interactive MultiModal Conversations (SIMMC) Challenge 2020

