Adaptive Offline Quintuplet Loss for Image-Text Matching

03/07/2020
by Tianlang Chen, et al.

Existing image-text matching approaches typically train the model with a triplet loss using online hard negatives. For each image or text anchor in a training mini-batch, the model is trained to distinguish between a positive and the most confusing negative of the anchor mined from that mini-batch (i.e., the online hard negative). This strategy improves the model's capacity to discover fine-grained correspondences and non-correspondences between image and text inputs. However, this training approach has the following drawbacks: (1) the negative selection strategy still provides limited chances for the model to learn from very hard-to-distinguish cases; (2) the trained model generalizes weakly from the training set to the testing set; and (3) the penalty lacks hierarchy and adaptiveness for hard negatives with different degrees of “hardness”. In this paper, we propose solutions by sampling negatives offline from the whole training set, which provides “harder” offline negatives than online hard negatives for the model to distinguish. Based on the offline hard negatives, a quintuplet loss is proposed to improve the model's generalization capability to distinguish positives and negatives. In addition, a novel loss function that combines the knowledge of positives, offline hard negatives and online hard negatives is created. It leverages offline hard negatives as an intermediary to adaptively penalize them based on their distance relations to the anchor. We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets. Significant performance improvements are observed for all the models, demonstrating the effectiveness and generality of the proposed approach.
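
To make the baseline described above concrete, the following is a minimal NumPy sketch of a triplet loss with online (in-batch) hard negatives, plus a simple offline miner that ranks negatives over a whole candidate set. It is an illustration only, not the paper's quintuplet loss or adaptive penalty, whose exact form the abstract does not give. The function names, the cosine-similarity scoring, the margin of 0.2, the sum reduction, and the single-positive-per-anchor simplification are all assumptions made for this example.

```python
import numpy as np


def online_hard_negative_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Triplet loss using the hardest in-batch negative for each anchor.

    img_emb, txt_emb: (B, D) arrays, assumed L2-normalized, where row i of
    each array forms a matching image-text pair (an assumption of this sketch).
    """
    sim = img_emb @ txt_emb.T                 # (B, B) similarity matrix
    pos = np.diag(sim)                        # similarities of matching pairs
    is_pos = np.eye(sim.shape[0], dtype=bool)
    neg = np.where(is_pos, -np.inf, sim)      # mask out the positives
    hardest_txt = neg.max(axis=1)             # hardest text for each image anchor
    hardest_img = neg.max(axis=0)             # hardest image for each text anchor
    loss_i2t = np.maximum(0.0, margin + hardest_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_img - pos)
    return float((loss_i2t + loss_t2i).sum())


def mine_offline_hard_negatives(anchor_emb, candidate_emb, positive_idx, k=5):
    """Return indices of the k most similar non-matching candidates per anchor.

    Candidates are drawn from the whole set rather than one mini-batch.
    positive_idx[i] is the single matching candidate of anchor i
    (hypothetical simplification: one positive per anchor).
    """
    sim = anchor_emb @ candidate_emb.T        # (N_anchors, N_candidates)
    rows = np.arange(len(positive_idx))
    sim[rows, positive_idx] = -np.inf         # never return the positive
    return np.argsort(-sim, axis=1)[:, :k]    # most similar first
```

In this sketch the online variant can only pick negatives that happen to share a mini-batch with the anchor, whereas the offline miner ranks every candidate in the set, which is the sense in which offline negatives can be "harder".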
