Concadia: Tackling image accessibility with context

04/16/2021
by Elisa Kreiss, et al.

Images have become an integral part of online media. This has enhanced self-expression and the dissemination of knowledge, but it poses serious accessibility challenges. Adequate textual descriptions are rare. Captions are more abundant, but they do not consistently provide the needed descriptive details, and systems trained on such texts inherit these shortcomings. To address this, we introduce the publicly available Wikipedia-based corpus Concadia, which consists of 96,918 images with corresponding English-language descriptions, captions, and surrounding context. We use Concadia to further characterize the commonalities and differences between descriptions and captions, and this leads us to the hypothesis that captions, while not substitutes for descriptions, can provide a useful signal for creating effective descriptions. We substantiate this hypothesis by showing that image captioning systems trained on Concadia benefit from having caption embeddings as part of their inputs. These experiments also begin to show how Concadia can be a powerful tool in addressing the underlying accessibility issues posed by image data.
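As a rough illustration of what "caption embeddings as part of their inputs" could look like, the sketch below joins an image feature vector and a caption embedding into one input vector for a description generator. The abstract does not say how the two are combined, so the concatenation, the zero-filled baseline, and every name here (`Example`, `build_input`, the field names) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One hypothetical training record; field names are assumptions."""
    image_features: List[float]     # e.g. the output of an image encoder
    caption_embedding: List[float]  # e.g. the output of a text encoder
    description: str                # target text the system learns to generate


def build_input(example: Example, use_caption: bool = True) -> List[float]:
    """Concatenate image features with the caption embedding (assumed fusion)."""
    if use_caption:
        return example.image_features + example.caption_embedding
    # Image-only baseline: fill the caption slot with zeros so the
    # input size stays the same across both conditions.
    return example.image_features + [0.0] * len(example.caption_embedding)


example = Example([0.1, 0.2], [0.3], "A red barn in a field.")
with_caption = build_input(example)
image_only = build_input(example, use_caption=False)
```

Keeping the input size fixed in the image-only condition is one simple way to compare a model with and without the caption signal while leaving the rest of the architecture unchanged.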

