Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching

03/01/2023
by Zheng Li, et al.

Recently, a series of Image-Text Matching (ITM) methods have achieved impressive performance. However, we observe that most existing ITM models suffer from gradient vanishing at the beginning of training, which makes them prone to falling into local minima. Most ITM models adopt triplet loss with Hard Negative mining (HN) as the optimization objective. We find that optimizing an ITM model using only the hard negative samples can easily lead to gradient vanishing. In this paper, we derive the condition under which the gradient vanishes during training: when the difference between the positive-pair similarity and the negative-pair similarity is close to 0, the gradients on both the image and text encoders approach 0. To alleviate the gradient vanishing problem, we propose a Selectively Hard Negative Mining (SelHN) strategy, which chooses whether to mine hard negative samples according to the gradient vanishing condition. SelHN can be applied to existing ITM models in a plug-and-play manner to give them better training behavior. To further ensure the back-propagation of gradients, we construct a Residual Visual Semantic Embedding model with SelHN, denoted as RVSE++. Extensive experiments on two ITM benchmarks demonstrate the strength of RVSE++, achieving state-of-the-art performance.
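To make the idea concrete, here is a minimal numpy sketch of a triplet loss with a SelHN-style switch. It is an illustration under assumptions, not the authors' exact formulation: the names `triplet_loss_selhn`, `margin`, and `eps` are hypothetical, and the fallback rule (sum over all negatives when the hardest negative's similarity is within `eps` of the positive's, i.e. the regime the abstract identifies as gradient-vanishing) is one plausible reading of "choosing whether to mine hard negatives according to the gradient vanishing condition".

```python
import numpy as np

def triplet_loss_selhn(sim, margin=0.2, eps=0.05):
    """Triplet loss over a similarity matrix sim (B x B), where sim[i, i]
    is the positive-pair similarity for anchor i and off-diagonal entries
    are negative-pair similarities.

    SelHN-style switch (a sketch, not the paper's exact rule): if the
    hardest negative is within `eps` of the positive similarity -- the
    regime where the gradient is said to vanish -- fall back to summing
    the hinge over all negatives; otherwise mine only the hardest one.
    """
    B = sim.shape[0]
    pos = np.diag(sim)                       # positive-pair similarities
    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)           # mask out the positives
    hardest = neg.max(axis=1)                # hardest negative per anchor

    losses = np.zeros(B)
    for i in range(B):
        if pos[i] - hardest[i] < eps:        # gradient-vanishing condition met
            negs = np.delete(sim[i], i)      # use all negatives instead
            losses[i] = np.maximum(0.0, margin + negs - pos[i]).sum()
        else:                                # safe to mine the hardest negative
            losses[i] = max(0.0, margin + hardest[i] - pos[i])
    return losses.mean()
```

With well-separated pairs (e.g. diagonal 0.9/0.8 vs. off-diagonal 0.2/0.1) the hardest-negative branch fires and the hinge is inactive; when positive and hardest-negative similarities nearly coincide, the all-negatives branch keeps a nonzero loss signal flowing.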


Related research

08/08/2023
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination
Most existing image-text matching methods adopt triplet loss as the opti...

09/28/2022
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval
There are two popular loss functions used for vision-language retrieval,...

10/21/2022
Dissecting Deep Metric Learning Losses for Image-Text Retrieval
Visual-Semantic Embedding (VSE) is a prevalent approach in image-text re...

03/10/2023
Self-supervised Training Sample Difficulty Balancing for Local Descriptor Learning
In the case of an imbalance between positive and negative samples, hard ...

04/06/2020
Vanishing Point Guided Natural Image Stitching
Recently, works on improving the naturalness of stitching images gain mo...

10/06/2018
Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences
Learning a matching function between two text sequences is a long standi...

02/07/2023
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
The strength of modern generative models lies in their ability to be con...
