What value do explicit high level concepts have in vision to language problems?

06/03/2015
by Qi Wu, et al.

Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. We propose here a method of incorporating high-level concepts into the very successful CNN-RNN approach, and show that it achieves a significant improvement over state-of-the-art performance in both image captioning and visual question answering. We also show that the same mechanism can be used to introduce external semantic information, and that doing so further improves performance. In doing so, we provide an analysis of the value of high-level semantic information in V2L problems.
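
The abstract does not spell out how the high-level concepts enter the CNN-RNN pipeline, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the concepts are a fixed vocabulary of attributes, that a multi-label layer turns image features into one independent probability per attribute, and that this probability vector (rather than the raw image features) initializes a plain tanh RNN decoder. All names, sizes, and the random weights are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def predict_concepts(image_features, W_attr):
    """Assumed concept layer: one independent probability per concept."""
    return [sigmoid(z) for z in matvec(W_attr, image_features)]


def decode_greedy(concept_probs, W_in, W_hh, W_emb, W_out, max_len, end_id):
    """Greedy decoding with a tanh RNN whose initial hidden state is
    computed from the concept probabilities only (an assumption)."""
    h = [math.tanh(z) for z in matvec(W_in, concept_probs)]
    prev = [0.0] * len(h)  # stand-in embedding for a start token
    tokens = []
    for _ in range(max_len):
        pre = [a + b for a, b in zip(matvec(W_hh, h), prev)]
        h = [math.tanh(z) for z in pre]
        scores = matvec(W_out, h)
        tok = max(range(len(scores)), key=scores.__getitem__)
        if tok == end_id:
            break
        tokens.append(tok)
        prev = W_emb[tok]  # feed the chosen word back in
    return tokens


# Hypothetical sizes: CNN feature length, concept count, RNN width, vocabulary.
FEAT, CONCEPTS, HIDDEN, VOCAB = 8, 5, 6, 10
MAX_LEN, END_ID = 7, 0

rng = random.Random(0)
image_features = [rng.random() for _ in range(FEAT)]  # stand-in for CNN output
W_attr = rand_matrix(CONCEPTS, FEAT, rng)
W_in = rand_matrix(HIDDEN, CONCEPTS, rng)
W_hh = rand_matrix(HIDDEN, HIDDEN, rng)
W_emb = rand_matrix(VOCAB, HIDDEN, rng)
W_out = rand_matrix(VOCAB, HIDDEN, rng)

concept_probs = predict_concepts(image_features, W_attr)
caption_ids = decode_greedy(concept_probs, W_in, W_hh, W_emb, W_out, MAX_LEN, END_ID)
```

The design point the sketch tries to convey is the intermediate `concept_probs` vector: it is an explicit, inspectable representation sitting between image features and text, which a direct CNN-to-RNN pipeline lacks. Under the same assumption, external semantic information could be combined with this vector before decoding.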

research
03/09/2016

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Much recent progress in Vision-to-Language problems has been achieved th...
research
07/25/2018

Distinctive-attribute Extraction for Image Captioning

Image captioning, an open research issue, has been evolved with the prog...
research
06/20/2021

Exploring Semantic Relationships for Unpaired Image Captioning

Recently, image captioning has aroused great interest in both academic a...
research
05/29/2019

Vision-to-Language Tasks Based on Attributes and Attention Mechanism

Vision-to-language tasks aim to integrate computer vision and natural la...
research
10/30/2018

Gated Hierarchical Attention for Image Captioning

Attention modules connecting encoder and decoders have been widely appli...
research
12/21/2013

Intriguing properties of neural networks

Deep neural networks are highly expressive models that have recently ach...
research
10/10/2016

End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering

We propose a high-level concept word detector that can be integrated wit...
