
Multimodal Differential Network for Visual Question Generation

by Badri N. Patro, et al.
Indian Institute of Technology Kanpur

Generating natural questions from an image is a semantic task that requires combining the visual and language modalities to learn multimodal representations. Images can have multiple visual and language contexts that are relevant for generating questions, namely places, captions, and tags. In this paper, we propose the use of exemplars for obtaining the relevant context. We obtain this context by using a Multimodal Differential Network to produce natural and engaging questions. The generated questions show a remarkable similarity to natural questions, as validated by a human study. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on quantitative metrics (BLEU, METEOR, ROUGE, and CIDEr).
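As a rough illustration of the kind of quantitative scoring mentioned above, the following is a minimal, self-contained sketch of BLEU-1 (modified unigram precision with a brevity penalty), the simplest member of the BLEU family used to compare a generated question against a reference question. This is a simplification for illustration, not the paper's evaluation code; full BLEU also combines higher-order n-gram precisions.

```python
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    """BLEU-1 sketch: clipped unigram precision times a brevity penalty.

    `candidate` is a generated question, `reference` a ground-truth one.
    Real evaluations use multiple references and n-grams up to 4.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate word count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(bleu1("what is the man holding", "what is the man holding"))   # identical -> 1.0
print(bleu1("what is the man holding", "what is the woman holding")) # 4/5 overlap -> 0.8
```

METEOR, ROUGE, and CIDEr each refine this idea (synonym matching, recall weighting, TF-IDF-weighted consensus, respectively), which is why papers typically report all four.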

