
Multimodal Differential Network for Visual Question Generation

08/12/2018
by Badri N. Patro, et al.
Indian Institute of Technology Kanpur

Generating natural questions from an image is a semantic task that requires using visual and language modalities to learn multimodal representations. Images can have multiple visual and language contexts that are relevant for generating questions, namely places, captions, and tags. In this paper, we propose the use of exemplars for obtaining the relevant context. We obtain this context with a Multimodal Differential Network that produces natural and engaging questions. The generated questions show a remarkable similarity to natural questions, as validated by a human study. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on the quantitative metrics (BLEU, METEOR, ROUGE, and CIDEr).
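The abstract describes the exemplar-driven architecture only at a high level. The sketch below, in PyTorch, shows one plausible reading of that idea, assuming a joint image-caption embedding, a triplet loss that pulls the target representation toward a supporting exemplar and away from a contrasting one, and an LSTM decoder for the question. All module names, dimensions, and the Hadamard-product fusion are illustrative assumptions, not the authors' exact design.

# Minimal sketch of an exemplar-based visual question generator.
# Assumptions: precomputed CNN image features, tokenised captions/questions,
# and a supporting/contrasting exemplar pair supplied with each training item.
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    """Fuse a precomputed image feature with an encoded caption/tag context."""
    def __init__(self, img_dim=2048, txt_dim=512, joint_dim=512, vocab_size=10000):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)
        self.txt_encoder = nn.GRU(300, txt_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)

    def forward(self, img_feat, caption_ids):
        _, h = self.txt_encoder(self.word_emb(caption_ids))
        # Hadamard-product fusion: one common multimodal fusion choice.
        return torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.txt_proj(h[-1]))

class QuestionDecoder(nn.Module):
    """Generate question tokens conditioned on the joint embedding."""
    def __init__(self, joint_dim=512, vocab_size=10000):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300 + joint_dim, 512, batch_first=True)
        self.out = nn.Linear(512, vocab_size)

    def forward(self, joint, question_ids):
        ctx = joint.unsqueeze(1).expand(-1, question_ids.size(1), -1)
        h, _ = self.lstm(torch.cat([self.word_emb(question_ids), ctx], dim=-1))
        return self.out(h)

# Training step (sketch): the target image/caption is the anchor, a nearest-
# neighbour exemplar is the positive, and a dissimilar exemplar is the negative.
encoder, decoder = JointEmbedding(), QuestionDecoder()
triplet = nn.TripletMarginLoss(margin=1.0)
xent = nn.CrossEntropyLoss()

def step(batch):
    anchor = encoder(batch["img"], batch["caption"])
    pos = encoder(batch["support_img"], batch["support_caption"])
    neg = encoder(batch["contrast_img"], batch["contrast_caption"])
    logits = decoder(anchor, batch["question"][:, :-1])  # teacher forcing
    gen_loss = xent(logits.reshape(-1, logits.size(-1)),
                    batch["question"][:, 1:].reshape(-1))
    return gen_loss + triplet(anchor, pos, neg)

In a setup like this, the triplet term is what makes the exemplar context matter: removing it reduces the model to a plain image-plus-caption encoder-decoder question generator.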

Related research

Deep Bayesian Network for Visual Question Generation (01/23/2020)
Generating natural questions from an image is a semantic task that requi...

Generating Highly Relevant Questions (10/08/2019)
The neural seq2seq based question generation (QG) is prone to generating...

From Captions to Visual Concepts and Back (11/18/2014)
This paper presents a novel approach for automatically generating image ...

ViTA: Visual-Linguistic Translation by Aligning Object Tags (06/01/2021)
Multimodal Machine Translation (MMT) enriches the source text with visua...

Joint Learning of Distributed Representations for Images and Texts (04/13/2015)
This technical report provides extra details of the deep multimodal simi...

Multi-VQG: Generating Engaging Questions for Multiple Images (11/14/2022)
Generating engaging content has drawn much recent attention in the NLP c...

VICSOM: VIsual Clues from SOcial Media for psychological assessment (05/15/2019)
Sharing multimodal information (typically images, videos or text) in Soc...