Game-Based Video-Context Dialogue

09/12/2018
by Ramakanth Pasunuru, et al.

Current dialogue systems focus mainly on textual and speech context knowledge and usually involve only two speakers. Some recent work has investigated static image-based dialogue. However, many real-world human interactions also involve dynamic visual context (as in videos) as well as dialogue exchanges among multiple speakers. To move closer towards such multimodal conversational skills and visually-situated applications, we introduce a new video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. This challenging testbed allows us to develop visually-grounded dialogue models that should generate relevant temporal and spatial event language from the live video, while also being relevant to the chat history. As strong baselines, we present several discriminative and generative models, e.g., based on tridirectional attention flow (TriDAF). We evaluate these models via retrieval ranking-recall, automatic phrase-matching metrics, and human evaluation studies. We also present dataset analyses, model ablations, and visualizations to understand the contribution of different modalities and model components.
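Retrieval ranking-recall evaluates a discriminative model by asking it to score a set of candidate responses for each dialogue context and checking whether the ground-truth response lands in the top k. The following is a minimal, generic sketch of recall@k under stated assumptions: each example comes with a list of model scores over its candidates and the index of the single correct candidate. The function name `recall_at_k` is hypothetical, and the candidate-set size and tie-breaking used in the paper are not specified here.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    gold_indices: Sequence[int],
    k: int,
) -> float:
    """Hypothetical helper: fraction of examples whose ground-truth
    candidate is ranked within the top k by model score."""
    if len(scores) != len(gold_indices):
        raise ValueError("scores and gold_indices must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for candidate_scores, gold in zip(scores, gold_indices):
        # Rank candidate indices by descending score (stable sort keeps
        # the original order among tied scores; an assumption, not the paper's rule).
        ranked = sorted(
            range(len(candidate_scores)),
            key=lambda i: candidate_scores[i],
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Two toy examples, each with three candidate responses (made-up scores).
toy_scores = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
toy_gold = [1, 2]
print(recall_at_k(toy_scores, toy_gold, 1))  # 0.5
print(recall_at_k(toy_scores, toy_gold, 2))  # 1.0
```

In the toy data the first example's correct response is ranked first and the second example's is ranked second, so recall@1 is 0.5 and recall@2 is 1.0. Reporting several values of k shows how close a model gets even when its top choice is wrong.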
