HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

11/22/2019
by   Fangyu Liu, et al.

The hubness problem is widespread in high-dimensional embedding spaces and is a fundamental source of error in cross-modal matching tasks. In this work, we study the emergence of hubs in Visual Semantic Embeddings (VSE), with application to text-image matching. We analyze the pros and cons of two widely adopted optimization objectives for training VSE and propose a novel hubness-aware loss function (HAL) that addresses the defects of previous methods. Unlike (Faghri et al. 2018), which simply takes the hardest sample within a mini-batch, HAL takes all samples into account, using both local and global statistics to scale up the weights of "hubs". We evaluate our method across various configurations of model architectures and datasets. The method is highly robust and brings consistent improvement on text-image matching across all settings. Specifically, under the same model architectures as (Faghri et al. 2018) and (Lee et al. 2018), by switching only the learning objective, we report a maximum R@1 improvement of 7.4 and 8.3
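
The contrast between taking only the hardest sample and taking all samples into account can be illustrated with a small sketch. The code below is not the HAL formulation from the paper: `hardest_negative_loss` follows the max-of-hinges idea attributed to (Faghri et al. 2018), while `soft_all_negatives_loss` is a generic log-sum-exp smoothing chosen here purely for illustration. The function names, the `margin` and `gamma` parameters, and the toy similarity matrix are all assumptions. The sketch covers only in-batch (local) statistics and omits the global statistics the abstract mentions.

```python
import math


def hardest_negative_loss(sim, margin=0.2):
    """Max-of-hinges loss over a square similarity matrix.

    sim[i][j] is the similarity between image i and text j; matched
    pairs lie on the diagonal. Only the single hardest negative per
    direction contributes to the loss.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        hardest_text = max(sim[i][j] for j in range(n) if j != i)
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_text - pos)
        total += max(0.0, margin + hardest_image - pos)
    return total


def soft_all_negatives_loss(sim, margin=0.2, gamma=10.0):
    """Illustrative smooth alternative (an assumption, not the paper's HAL).

    Every negative contributes through a log-sum-exp, so its gradient
    weight grows with its similarity. A sample that scores highly against
    many queries (a hub-like sample) is therefore up-weighted. As gamma
    grows, the value approaches the hardest-negative loss above.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        text_negs = [sim[i][j] for j in range(n) if j != i]
        image_negs = [sim[j][i] for j in range(n) if j != i]
        for negs in (text_negs, image_negs):
            s = sum(math.exp(gamma * (margin + neg - pos)) for neg in negs)
            total += math.log(1.0 + s) / gamma
    return total


# Toy batch: text 1 (column 1) is similar to every image, i.e. hub-like.
toy_sim = [
    [0.90, 0.80, 0.10],
    [0.20, 0.70, 0.30],
    [0.10, 0.75, 0.60],
]
hard = hardest_negative_loss(toy_sim)
soft = soft_all_negatives_loss(toy_sim)
```

In this toy batch the hardest-negative loss ignores every negative except one per row and column, whereas the smooth variant lets all negatives contribute in proportion to how strongly they violate the margin.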


research
06/04/2019

A Strong and Robust Baseline for Text-Image Matching

We review the current schemes of text-image matching models and propose ...
research
06/11/2021

Step-Wise Hierarchical Alignment Network for Image-Text Matching

Image-text matching plays a central role in bridging the semantic gap be...
research
10/21/2022

Dissecting Deep Metric Learning Losses for Image-Text Retrieval

Visual-Semantic Embedding (VSE) is a prevalent approach in image-text re...
research
08/12/2019

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

A major challenge in matching images and text is that they have intrinsi...
research
11/11/2020

PHONI: Streamed Matching Statistics with Multi-Genome References

Computing the matching statistics of patterns with respect to a text is ...
research
10/23/2020

Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization

Matching information across image and text modalities is a fundamental c...
