Recurrent Relational Memory Network for Unsupervised Image Captioning

06/24/2020
by Dan Guo, et al.

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, and existing approaches usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network (R^2M). Unlike complicated and sensitive adversarial learning, which performs poorly on long sentence generation, R^2M implements a concepts-to-sentence memory translator through two-stage memory mechanisms, fusion memory and recurrent memory, which correlate the relational reasoning between common visual concepts and the generated words over long periods. R^2M encodes visual context through unsupervised training on images, while enabling the memory to learn from an unrelated textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer heavily from parameter sensitivity. We experimentally validate the superiority of R^2M over state-of-the-art methods on all benchmark datasets.

research
11/27/2018

Unsupervised Image Captioning

Deep neural networks have achieved great successes on the image captioni...
research
03/26/2019

Unpaired Image Captioning via Scene Graph Alignments

Deep neural networks have achieved great success on the image captioning...
research
04/21/2017

Attend to You: Personalized Image Captioning with Context Sequence Memory Networks

We address personalization issues of image captioning, which have not be...
research
10/31/2019

Can adversarial training learn image captioning ?

Recently, generative adversarial networks (GAN) have gathered a lot of i...
research
07/28/2021

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Image Captioning is a task that combines computer vision and natural lan...
research
04/25/2015

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

In this paper, we address the task of learning novel visual concepts, an...
research
03/21/2017

Recurrent Topic-Transition GAN for Visual Paragraph Generation

A natural image usually conveys rich semantic content and can be viewed ...
