Image-text Retrieval via preserving main Semantics of Vision

04/20/2023
by   Xu Zhang, et al.

Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches to this task map images and texts into a common space to establish correspondences between the two modalities. However, because an image is semantically rich, its redundant secondary content can cause false matches. To address this issue, this paper presents a semantic optimization approach, implemented as a Visual Semantic Loss (VSL), that helps the model focus on an image's main content. The approach is inspired by how people typically annotate an image: they describe its main content. We therefore use the annotated texts corresponding to an image to help the model capture the image's main content and reduce the negative impact of secondary content. Extensive experiments on two benchmark datasets (MSCOCO and Flickr30K) demonstrate the superior performance of our method. The code is available at: https://github.com/ZhangXu0963/VSL.
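As a rough illustration of the common-space idea mentioned in the abstract, the sketch below ranks captions by cosine similarity to an image embedding. It is not the paper's VSL and is not taken from the linked repository; the function names `cosine` and `rank_captions` and the three-dimensional toy vectors are assumptions made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_captions(image_vec, caption_vecs):
    """Return caption indices sorted by descending similarity to the image."""
    scores = [cosine(image_vec, c) for c in caption_vecs]
    return sorted(range(len(caption_vecs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings in a shared 3-d space (hypothetical values, for illustration only).
image = [0.9, 0.1, 0.3]
captions = [
    [0.1, 0.9, 0.0],  # caption unrelated to the image
    [0.8, 0.2, 0.3],  # caption describing the image's main content
    [0.2, 0.1, 0.9],  # caption describing secondary content
]

ranking = rank_captions(image, captions)
print(ranking)
```

In this toy setup the caption aligned with the main content is ranked first. A false match of the kind the abstract describes would correspond to a secondary-content caption scoring higher than it.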
