ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO

04/07/2022
by   Sanghyuk Chun, et al.

Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models. However, existing ITM benchmarks have a significant limitation: they have many missing correspondences, which originate from the data construction process itself. For example, a caption is matched with only one image even though it could be matched with other similar images, and vice versa. To correct these massive false negatives, we construct the Extended COCO Validation (ECCV) Caption dataset by supplying the missing associations with machine and human annotators. We employ five state-of-the-art ITM models with diverse properties for our annotation process. Our dataset provides 3.6 times as many positive image-to-caption associations and 8.5 times as many caption-to-image associations as the original MS-COCO. We also propose to use an informative ranking-based metric, rather than the popular Recall@K (R@K). We re-evaluate 25 existing VL models on the existing and proposed benchmarks. We find that the existing benchmarks, such as COCO 1K R@K, COCO 5K R@K, and CxC R@1, are highly correlated with each other, while the rankings change when we shift to the ECCV mAP. Lastly, we delve into the effect of the bias introduced by the choice of machine annotator. Source code and dataset are available at https://github.com/naver-ai/eccv-caption
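
To illustrate why a ranking-based metric can be more informative than R@K once a query has several positives, the following is a minimal sketch, not the authors' evaluation code. It assumes the common ITM convention that R@K counts a query as a hit if any positive appears in the top K, and it uses plain average precision as a stand-in for the ranking-based metric; the exact metric definition used for ECCV Caption may differ. The function names and the toy item identifiers are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], positives: Set[str], k: int) -> float:
    """Return 1.0 if any positive item appears in the top-k results, else 0.0.

    Assumption: this is the usual ITM R@K convention, scored per query.
    """
    return 1.0 if any(item in positives for item in ranked[:k]) else 0.0


def average_precision(ranked: Sequence[str], positives: Set[str]) -> float:
    """Average precision of one ranked list against a set of positives.

    Precision is accumulated at the rank of every retrieved positive and
    divided by the total number of positives, so positives that are ranked
    low (or not retrieved at all) reduce the score.
    """
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)


# Toy query with three positives (hypothetical identifiers).
positives = {"a", "b", "c"}
ranking_good = ["a", "b", "c", "x", "y"]  # all positives ranked first
ranking_poor = ["a", "x", "y", "b", "c"]  # only one positive near the top

r1_good = recall_at_k(ranking_good, positives, k=1)
r1_poor = recall_at_k(ranking_poor, positives, k=1)
ap_good = average_precision(ranking_good, positives)
ap_poor = average_precision(ranking_poor, positives)

print(f"R@1: good={r1_good:.2f} poor={r1_poor:.2f}")
print(f"AP:  good={ap_good:.2f} poor={ap_poor:.2f}")
```

In this toy case both rankings score the same R@1, because each places one positive at the top, while average precision separates them because it also accounts for where the remaining positives land.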


