Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

01/26/2023
by Dong-Jin Kim, et al.

We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is expensive in terms of labor, time, and cost. Compared with manually annotating every training sample, separately collecting uni-modal datasets, e.g., a large-scale image dataset and a sentence dataset, is immensely easier. We leverage such massive unpaired image and caption data on top of standard paired data by learning to associate them. To this end, our proposed semi-supervised learning method assigns pseudo-labels to unpaired samples in an adversarial learning fashion, where the joint distribution of image and caption is learned. Our method trains a captioner to learn from paired data and to progressively associate unpaired data. This approach shows noticeable performance improvement even in challenging scenarios, including out-of-task data (i.e., relational captioning, where the target task is different from the unpaired data) and web-crawled data. We also show that our proposed method is theoretically well-motivated and has a favorable global optimality property. Extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis on the scarcely-paired COCO dataset, demonstrate the consistent effectiveness of our semi-supervised learning method with unpaired data compared to competing methods.
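The pseudo-labeling idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the scoring function `dot_score` (a stand-in for a learned discriminator), the `threshold` parameter, the function name `assign_pseudo_labels`, and the feature vectors are all hypothetical choices made for illustration only. The sketch shows just one step: pairing each unpaired image with the unpaired caption that a scorer rates as the most plausible match.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot_score(image_feat: Vector, caption_feat: Vector) -> float:
    """Hypothetical stand-in for a learned discriminator.

    A higher value means the (image, caption) pair looks more like a true pair.
    A real system would use a trained network here, not a dot product.
    """
    return sum(a * b for a, b in zip(image_feat, caption_feat))


def assign_pseudo_labels(
    unpaired_images: List[Vector],
    unpaired_captions: List[Vector],
    score: Callable[[Vector, Vector], float] = dot_score,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (image_index, caption_index, score) for confident matches."""
    pseudo_pairs: List[Tuple[int, int, float]] = []
    if not unpaired_captions:
        return pseudo_pairs
    for i, img in enumerate(unpaired_images):
        scores = [score(img, cap) for cap in unpaired_captions]
        # Pick the caption the scorer considers the most plausible match.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Keep only matches the scorer is reasonably confident about.
        if scores[best] >= threshold:
            pseudo_pairs.append((i, best, scores[best]))
    return pseudo_pairs


# Toy feature vectors (made up for illustration).
images = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
captions = [[0.9, 0.1], [0.2, 0.8]]
pairs = assign_pseudo_labels(images, captions)
matched = [(i, j) for i, j, _ in pairs]
```

In this toy run the first two images receive pseudo-labels and the third is left unpaired because no caption scores above the assumed threshold. In an adversarial setup the scorer itself would be trained alongside the captioner, which this sketch does not attempt to show.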

research
09/05/2019

Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

Constructing an organized dataset comprised of a large number of images ...
research
11/16/2016

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised trai...
research
06/20/2021

Exploring Semantic Relationships for Unpaired Image Captioning

Recently, image captioning has aroused great interest in both academic a...
research
03/14/2018

Unpaired Image Captioning by Language Pivoting

Image captioning is a multimodal task involving computer vision and natu...
research
09/10/2021

Partially-supervised novel object captioning leveraging context from paired data

In this paper, we propose an approach to improve image captioning soluti...
research
05/21/2018

Turbo Learning for Captionbot and Drawingbot

We study in this paper the problems of both image captioning and text-to...
research
10/22/2021

Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning

The task of image captioning aims to generate captions directly from ima...
