
Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

by Satwik Kottur, et al.

People capture photos and videos to relive and share memories of personal significance. Recently, media montages (stories) have become a popular mode of sharing these memories due to their intuitive and powerful storytelling capabilities. However, creating such montages usually involves many manual searches, clicks, and selections that are time-consuming and cumbersome, adversely affecting the user experience. To alleviate this, we propose task-oriented dialogs for montage creation as a novel interactive tool to seamlessly search, compile, and edit montages from a media collection. To the best of our knowledge, our work is the first to leverage multi-turn conversations for such a challenging application, extending previous literature that studies simple media retrieval tasks. We collect a new dataset, C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection. We take a simulate-and-paraphrase approach to collect these dialogs in a cost- and time-efficient manner while still drawing from the distribution of natural language. Our analysis and benchmarking of state-of-the-art language models showcase the multimodal challenges present in the dataset. Lastly, we present a mobile demo application that shows the feasibility of the proposed work in real-world settings. Our code and data will be made publicly available.



