Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning

03/30/2016
by Andrew Shin, et al.

The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de facto standard for the image captioning task. However, since CNN features are originally designed for classification, they are mostly concerned with the main conspicuous element of the image and often fail to correctly convey information on local, secondary elements. We propose to incorporate coding with vector of locally aggregated descriptors (VLAD) on a spatial pyramid for CNN features of sub-regions, in order to generate image representations that better reflect the local information of the images. Our results show that our method of compact VLAD coding can match CNN features with as little as 3% of the dimensionality and, when combined with a spatial pyramid, results in image captions that more accurately take local elements into account.
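To make the idea of VLAD coding over a spatial pyramid more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a CNN feature vector has already been extracted for each sub-region and that a codebook of cluster centers has been learned separately (for example with k-means). The function names `vlad_encode`, `spatial_pyramid_regions`, and `pyramid_vlad`, the pyramid levels, and the choice of power and L2 normalization are all illustrative assumptions.

```python
import numpy as np


def vlad_encode(descriptors, centers):
    """Encode a set of descriptors as one VLAD vector.

    descriptors: (n, d) array, e.g. CNN features of n sub-regions (assumed given).
    centers:     (k, d) codebook, assumed learned beforehand (e.g. by k-means).
    """
    k, d = centers.shape
    # Squared Euclidean distance from every descriptor to every center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Accumulate residuals (descriptor minus its nearest center) per center.
    v = np.zeros((k, d))
    for j in range(k):
        assigned = descriptors[nearest == j]
        if len(assigned) > 0:
            v[j] = (assigned - centers[j]).sum(axis=0)

    v = v.ravel()
    # Power normalization followed by L2 normalization (a common choice,
    # assumed here rather than taken from the source).
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def spatial_pyramid_regions(height, width, levels=(1, 2)):
    """Return (y0, y1, x0, x1) boxes for an n-by-n grid at each pyramid level."""
    boxes = []
    for n in levels:
        ys = np.linspace(0, height, n + 1).astype(int)
        xs = np.linspace(0, width, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                boxes.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return boxes


def pyramid_vlad(features_per_level, centers):
    """Concatenate one VLAD code per pyramid level.

    features_per_level: list of (n_l, d) arrays, one per level, holding the
    sub-region features of that level (hypothetical input layout).
    """
    return np.concatenate([vlad_encode(f, centers) for f in features_per_level])
```

With `k` centers of dimension `d` and `L` pyramid levels, `pyramid_vlad` returns a vector of length `L * k * d`. Any further dimensionality reduction of the code would be a separate step that this sketch leaves out.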


