Where Does the Performance Improvement Come From? – A Reproducibility Concern about Image-Text Retrieval

03/08/2022
by   Jun Rao, et al.
0

This paper seeks to provide the information retrieval community with some reflections on the current improvements of retrieval learning through the analysis of the reproducibility aspects of image-text retrieval models. For the latter part of the past decade, image-text retrieval has gradually become a major research direction in the field of information retrieval because of the growth of multi-modal data. Many researchers use benchmark datasets like MS-COCO and Flickr30k to train and assess the performance of image-text retrieval algorithms. Research in the past has mostly focused on performance, with several state-of-the-art methods being proposed in various ways. According to their claims, these approaches achieve better modal interactions and thus better multimodal representations with greater precision. In contrast to those previous works, we focus on the repeatability of the approaches and the overall examination of the elements that lead to improved performance by pretrained and nonpretrained models in retrieving images and text. To be more specific, we first examine the related reproducibility concerns and why the focus is on image-text retrieval tasks, and then we systematically summarize the current paradigm of image-text retrieval models and the stated contributions of those approaches. Second, we analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models. Based on this, we conducted ablation experiments and obtained some influencing factors that affect retrieval recall more than the improvement claimed in the original paper. Finally, we also present some reflections and issues that should be considered by the retrieval community in the future. Our code is freely available at https://github.com/WangFei-2019/Image-text-Retrieval.

READ FULL TEXT
research
04/21/2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Image-text retrieval, as a fundamental and important branch of informati...
research
09/28/2022

Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text

Multimodal learning is a recent challenge that extends unimodal learning...
research
08/27/2023

Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment

Image-text retrieval requires the system to bridge the heterogenous gap ...
research
04/20/2020

Transformer Reasoning Network for Image-Text Matching and Retrieval

Image-text matching is an interesting and fascinating task in modern AI ...
research
04/06/2023

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

Cross-modal retrieval methods are the preferred tool to search databases...
research
01/11/2023

HADA: A Graph-based Amalgamation Framework in Image-text Retrieval

Many models have been proposed for vision and language tasks, especially...
research
08/02/2022

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Extracting knowledge from unlabeled texts using machine learning algorit...

Please sign up or login with your details

Forgot password? Click here to reset