Maria: A Visual Experience Powered Conversational Agent

05/27/2021
by Zujie Liang, et al.

Arguably, visual perception of the physical world is a key way for conversational agents to exhibit human-like intelligence. Image-grounded conversation has thus been proposed to address this challenge. Existing works focus on multimodal dialog models that ground the conversation on a given image. In this paper, we take a step further and study image-grounded conversation in a fully open-ended setting where no paired dialogs and images are assumed to be available. Specifically, we present Maria, a neural conversational agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator. The retriever aims to retrieve an image correlated with the dialog from the image index, and the visual concept detector extracts rich visual knowledge from that image. The response generator is then grounded on the extracted visual knowledge and the dialog context to generate the target response. Extensive experiments demonstrate that Maria outperforms previous state-of-the-art methods on both automatic metrics and human evaluation, and that it can generate informative responses reflecting some visual commonsense of the physical world.
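The abstract describes a three-stage pipeline: retrieve an image for the dialog, extract visual concepts from it, then condition the generator on those concepts plus the dialog context. Below is a minimal Python sketch of how those stages might connect. It is not Maria's implementation: the bag-of-words encoder, the tiny image index, the precomputed concept labels, and the `[SEP]`-joined input format are all hypothetical stand-ins for the trained retriever, detector, and neural generator described in the paper. The sketch stops at building the grounded generator input.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass

# Hypothetical toy vocabulary: a bag-of-words "encoder" stands in for the
# trained text/image encoders a real retriever would use.
VOCAB = ["beach", "dog", "pizza", "sun", "ball"]


def embed_text(text: str) -> list[float]:
    """Map dialog text to a vector of vocabulary word counts (toy encoder)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class IndexedImage:
    image_id: str
    embedding: list[float]  # assumed to live in the same space as embed_text
    concepts: list[str]     # precomputed labels standing in for detector output


# Hypothetical image index with hand-written embeddings.
INDEX = [
    IndexedImage("img_beach", [1.0, 0.0, 0.0, 1.0, 0.0], ["sand", "sea", "sun"]),
    IndexedImage("img_dog", [0.0, 1.0, 0.0, 0.0, 1.0], ["dog", "ball", "grass"]),
    IndexedImage("img_food", [0.0, 0.0, 1.0, 0.0, 0.0], ["pizza", "plate", "table"]),
]


def retrieve(context: str, index: list[IndexedImage]) -> IndexedImage:
    """Stage 1: text-to-image retrieval by highest cosine similarity."""
    query = embed_text(context)
    return max(index, key=lambda img: cosine(query, img.embedding))


def detect_concepts(image: IndexedImage, top_k: int = 3) -> list[str]:
    """Stage 2: stand-in for a visual concept detector (returns stored labels)."""
    return image.concepts[:top_k]


def build_generator_input(context: str, concepts: list[str]) -> str:
    """Stage 3 input: visual concepts and dialog context joined by a separator.

    The "[SEP]" format is an assumption; a real generator would consume this
    (or richer visual features) and decode a response.
    """
    return " ".join(concepts) + " [SEP] " + context


def grounded_input(context: str) -> str:
    """Run retrieval and concept detection, then build the generator input."""
    image = retrieve(context, INDEX)
    return build_generator_input(context, detect_concepts(image))


if __name__ == "__main__":
    print(grounded_input("I took my dog to the park and threw a ball"))
```

Running it prints `dog ball grass [SEP] I took my dog to the park and threw a ball`: the dialog mentions a dog and a ball, so the toy retriever selects the dog image and its concept labels are prepended to the context. Keeping each stage in its own function mirrors the abstract's description of the three components as separate, flexible modules that could each be replaced.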

research
04/06/2022

C3KG: A Chinese Commonsense Conversation Knowledge Graph

Existing commonsense knowledge bases often organize tuples in an isolate...
research
09/16/2017

Augmenting End-to-End Dialog Systems with Commonsense Knowledge

Building dialog agents that can converse naturally with humans is a chal...
research
09/11/2019

Proposal Towards a Personalized Knowledge-powered Self-play Based Ensemble Dialog System

This is the application document for the 2019 Amazon Alexa competition. ...
research
06/15/2021

Unsupervised Enrichment of Persona-grounded Dialog with Background Stories

Humans often refer to personal narratives, life experiences, and events ...
research
02/07/2017

A Knowledge-Grounded Neural Conversation Model

Neural network models are capable of generating extremely natural soundi...
research
05/23/2023

R2H: Building Multimodal Navigation Helpers that Respond to Help

The ability to assist humans during a navigation task in a supportive ro...
research
01/17/2020

Modality-Balanced Models for Visual Dialogue

The Visual Dialog task requires a model to exploit both image and conver...