Semantic Regularisation for Recurrent Image Annotation

11/16/2016
by Feng Liu, et al.

The "CNN-RNN" design pattern is increasingly applied to a variety of image annotation tasks, including multi-label classification and captioning. Existing models use a weakly semantic CNN hidden layer, or a transform of it, as the image embedding that provides the interface between the CNN and the RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations to generate structured annotation output. Importantly, this makes end-to-end training of the CNN and RNN slow and ineffective, because it is difficult to back-propagate gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and the RNN. Regularising the interface can partially or completely decouple the two learning problems, allowing each to be trained more effectively and making joint training much more efficient. Extensive experiments show that state-of-the-art performance is achieved on multi-label classification as well as image captioning.
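To make the idea of a semantically regularised interface concrete, here is a minimal sketch in plain Python. It is one plausible reading of the abstract, not the authors' implementation: it assumes the embedding is a sigmoid layer over CNN features whose outputs are supervised directly with ground-truth concept labels, and that this supervision is added to the RNN's own loss. The names `semantic_embedding`, `concept_loss`, `joint_loss` and the weighting parameter `reg_weight` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_embedding(cnn_features, weights, biases):
    """Map CNN features to one probability per visual concept.

    Assumption: the interface layer is a linear map followed by a sigmoid,
    so each output unit can be read as "concept k is present".
    """
    return [
        sigmoid(sum(w * f for w, f in zip(row, cnn_features)) + b)
        for row, b in zip(weights, biases)
    ]


def concept_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between concept probabilities and 0/1 labels.

    Assumption: this is the regulariser that ties the embedding to semantics.
    """
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)


def joint_loss(rnn_loss, probs, labels, reg_weight=1.0):
    """Combine the RNN's sequence loss with the regulariser on the interface."""
    return rnn_loss + reg_weight * concept_loss(probs, labels)


# Toy example: 3 CNN features, 2 concepts, only the first concept is present.
features = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
labels = [1, 0]

probs = semantic_embedding(features, weights, biases)
loss = joint_loss(rnn_loss=2.0, probs=probs, labels=labels, reg_weight=0.5)
```

Because the concept loss acts on the embedding directly, the CNN side receives a training signal that does not have to pass through the RNN, which is the decoupling the abstract describes. In this sketch, setting `reg_weight` to zero falls back to the unregularised CNN-RNN pattern.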


research
07/18/2017

Order-Free RNN with Visual Attention for Multi-Label Classification

In this paper, we propose the joint learning attention and recurrent neu...
research
04/15/2016

CNN-RNN: A Unified Framework for Multi-label Image Classification

While deep convolutional neural networks (CNNs) have shown a great succe...
research
08/28/2019

Image Captioning with Sparse Recurrent Neural Network

Recurrent Neural Network (RNN) has been deployed as the de facto model t...
research
09/11/2018

End-to-end Image Captioning Exploits Multimodal Distributional Similarity

We hypothesize that end-to-end neural image captioning systems work seem...
research
04/15/2022

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorpora...
research
12/04/2016

Multi-Label Image Classification with Regional Latent Semantic Dependencies

Deep convolution neural networks (CNN) have demonstrated advanced perfor...
research
03/16/2019

Fast Interactive Object Annotation with Curve-GCN

Manually labeling objects by tracing their boundaries is a laborious pro...
