Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review

07/02/2022
by Hao Wang, et al.

The intelligent dialogue system, which aims to communicate with humans harmoniously in natural language, holds great promise for advancing human-machine interaction in the era of artificial intelligence. As human-computer interaction requirements grow more complex (e.g., multimodal inputs, time sensitivity), traditional text-based dialogue systems struggle to meet the demand for more vivid and convenient interaction. Consequently, the Visual-Context Augmented Dialogue System (VAD), which can communicate with humans by perceiving and understanding multimodal information (i.e., visual context in images or videos, and textual dialogue history), has become a predominant research paradigm. Benefiting from the consistency and complementarity between visual and textual context, VAD has the potential to generate engaging and context-aware responses. To depict the development of VAD, we first characterize its concepts and unique features, and then present its generic system architecture to illustrate the system workflow. Subsequently, several research challenges and representative works are investigated in detail, followed by a summary of authoritative benchmarks. We conclude this paper by putting forward some open issues and promising research trends for VAD, e.g., the cognitive mechanisms of human-machine dialogue under cross-modal dialogue context, and knowledge-enhanced cross-modal semantic interaction.


research
05/24/2019

Bridging Dialogue Generation and Facial Expression Synthesis

Spoken dialogue systems that assist users to solve complex tasks such as...
research
08/11/2020

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Visual dialogue is a challenging task that needs to extract implicit inf...
research
06/02/2019

A Survey of Natural Language Generation Techniques with a Focus on Dialogue Systems - Past, Present and Future Directions

One of the hardest problems in the area of Natural Language Processing a...
research
03/15/2020

Vision-Dialog Navigation by Exploring Cross-modal Memory

Vision-dialog navigation posed as a new holy-grail task in vision-langua...
research
07/25/2023

Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection

Human-Object Interaction (HOI) detection is a challenging computer visio...
research
09/05/2018

Multimodal Dialogue Management for Multiparty Interaction with Infants

We present dialogue management routines for a system to engage in multip...
research
10/13/2020

Jointly Optimizing Sensing Pipelines for Multimodal Mixed Reality Interaction

Natural human interactions for Mixed Reality Applications are overwhelmi...
