
Alleviating Noisy Data in Image Captioning with Cooperative Distillation

by   Pierre Dognin, et al.

Image captioning systems have made substantial progress, largely due to the availability of curated datasets such as Microsoft COCO and VizWiz, whose images come with accurate descriptions. Unfortunately, the scarcity of such cleanly labeled data means that trained models often produce captions that are terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation, that combines clean curated datasets with the web-scale, automatically extracted captions of the Google Conceptual Captions (GCC) dataset. GCC captions can describe their images poorly, but the dataset is abundant in size and therefore provides a rich vocabulary, resulting in more expressive captions.
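The abstract does not detail the training objective, but the general idea of combining supervision from a clean dataset with distilled signal from a model exposed to noisy web-scale data can be sketched as a weighted loss. The function below is a hedged illustration, not the paper's method: `alpha`, `temp`, and the teacher/student framing are assumptions chosen to show a standard distillation-style combination of hard labels (clean data) and soft teacher targets (noisy data).

```python
import numpy as np

def softmax(logits, temp=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(student_logits, teacher_logits, clean_labels,
                  alpha=0.5, temp=2.0):
    """Illustrative loss mixing two supervision sources:
    - cross-entropy against hard labels from a clean, curated dataset;
    - KL distillation toward a teacher trained with noisy web-scale captions.
    `alpha` trades off the two terms; `temp` softens the distributions.
    """
    n = len(clean_labels)
    # Hard-label cross-entropy on the clean data.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), clean_labels] + 1e-12))
    # Soft-target KL divergence from the teacher (scaled by temp^2,
    # the usual correction for temperature-softened gradients).
    p_t = softmax(teacher_logits, temp)
    p_s = softmax(student_logits, temp)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1))
    return alpha * ce + (1.0 - alpha) * (temp ** 2) * kl
```

In this sketch, captions from GCC never contribute hard labels directly; their influence arrives only through the teacher's softened distribution, which is one common way to exploit abundant but noisy supervision without letting its errors dominate training.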
