#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

03/21/2021
by   Gabriel Oliveira dos Santos, et al.
0

Automatically describing images using natural sentences is an important task to support visually impaired people's inclusion onto the Internet. It is still a big challenge that requires understanding the relation of the objects present in the image and their attributes and actions they are involved in. Then, visual interpretation methods are needed, but linguistic models are also necessary to verbally describe the semantic relations. This problem is known as Image Captioning. Although many datasets were proposed in the literature, the majority contains only English captions, whereas datasets with captions described in other languages are scarce. Recently, a movement called PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Thus, inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images. Further, the captions in our dataset bring additional challenges to the problem: first, in contrast to popular datasets such as MS COCO Captions, #PraCegoVer has only one reference to each image; also, both mean and variance of our reference sentence length are significantly greater than those in the MS COCO Captions. These two characteristics contribute to making our dataset interesting due to the linguistic aspect and the challenges that it introduces to the image captioning problem. We publicly-share the dataset at https://github.com/gabrielsantosrv/PraCegoVer.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 4

page 6

page 10

page 11

page 12

page 13

page 14

page 17

05/02/2017

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

In recent years, automatic generation of image descriptions (captions), ...
09/28/2021

CIDEr-R: Robust Consensus-based Image Description Evaluation

This paper shows that CIDEr-D, a traditional evaluation metric for image...
02/20/2020

Captioning Images Taken by People Who Are Blind

While an important problem in the vision community is to design algorith...
10/02/2020

CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns

Recently, Deep Learning (DL) methods have shown an excellent performance...
11/07/2015

Generation and Comprehension of Unambiguous Object Descriptions

We propose a method that can generate an unambiguous description (known ...
12/22/2016

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

We introduce a new multi-modal task for computer systems, posed as a com...
12/21/2020

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Image captioning has recently demonstrated impressive progress largely o...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.