Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

09/05/2019
by Dong-Jin Kim, et al.

Constructing an organized dataset comprising a large number of images and several captions for each image is a laborious task that requires vast human effort. On the other hand, collecting a large number of images and sentences separately may be immensely easier. In this paper, we develop a novel data-efficient semi-supervised framework for training an image captioning model. We leverage massive unpaired image and caption data by learning to associate them. To this end, our proposed semi-supervised learning method assigns pseudo-labels to unpaired samples via Generative Adversarial Networks to learn the joint distribution of images and captions. For evaluation, we construct the scarcely-paired COCO dataset, a modified version of the MS COCO caption dataset. The empirical results show the effectiveness of our method compared to several strong baselines, especially when the amount of paired samples is scarce.
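
The idea of assigning pseudo-labels to unpaired samples can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how pseudo-labels are selected, so the sketch simply assumes that each unpaired image receives the unpaired caption with the highest compatibility score. The function `assign_pseudo_captions`, the dot-product scorer standing in for a learned adversarial critic, and the two-dimensional toy features are all hypothetical.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Toy compatibility score; a stand-in (assumption) for a learned critic."""
    return sum(x * y for x, y in zip(a, b))


def assign_pseudo_captions(
    image_feats: Dict[str, Vector],
    caption_feats: Dict[str, Vector],
    score: Callable[[Vector, Vector], float] = dot,
) -> Dict[str, str]:
    """Give every unpaired image the caption with the highest score.

    Hypothetical helper: the selection rule (argmax over all unpaired
    captions) is an assumption, not taken from the paper.
    """
    pseudo: Dict[str, str] = {}
    for image_id, img in image_feats.items():
        pseudo[image_id] = max(
            caption_feats,
            key=lambda caption: score(img, caption_feats[caption]),
        )
    return pseudo


# Toy unpaired pools with made-up 2-D features.
unpaired_images = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0]}
unpaired_captions = {
    "a dog runs on grass": [0.9, 0.1],
    "a plate of food": [0.2, 0.8],
}

pseudo_pairs = assign_pseudo_captions(unpaired_images, unpaired_captions)
print(pseudo_pairs)
```

In a real system the scorer would be trained, and the resulting pseudo-pairs would be used as additional, noisier training data alongside the few genuinely paired samples.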

research
01/26/2023

Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

We present a novel data-efficient semi-supervised framework to improve t...
research
11/16/2016

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised trai...
research
09/10/2021

Partially-supervised novel object captioning leveraging context from paired data

In this paper, we propose an approach to improve image captioning soluti...
research
05/21/2018

Turbo Learning for Captionbot and Drawingbot

We study in this paper the problems of both image captioning and text-to...
research
06/23/2019

Semi-Supervised Learning for Cancer Detection of Lymph Node Metastases

Pathologists find it tedious to examine the status of the sentinel lymph no...
research
05/10/2020

A Simple Semi-Supervised Learning Framework for Object Detection

Semi-supervised learning (SSL) has promising potential for improving the...
research
10/22/2021

Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning

The task of image captioning aims to generate captions directly from ima...
