Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization

10/23/2020
by   Li Ren, et al.
0

Matching information across image and text modalities is a fundamental challenge for many applications that involve both vision and natural language processing. The objective is to find efficient similarity metrics to compare the similarity between visual and textual information. Existing approaches mainly match the local visual objects and the sentence words in a shared space with attention mechanisms. The matching performance is still limited because the similarity computation is based on simple comparisons of the matching features, ignoring the characteristics of their distribution in the data. In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and sentence words. Specifically, we propose a novel Adversarial Discriminative Domain Regularization (ADDR) learning framework, beyond the paradigm metric learning objective, to construct a set of discriminative data domains within each image-text pairs. Our approach can generally improve the learning efficiency and the performance of existing metrics learning frameworks by regulating the distribution of the hidden space between the matching pairs. The experimental results show that this new approach significantly improves the overall performance of several popular cross-modal matching techniques (SCAN, VSRN, BFAN) on the MS-COCO and Flickr30K benchmarks.

READ FULL TEXT

page 1

page 8

research
10/07/2020

Universal Weighting Metric Learning for Cross-Modal Matching

Cross-modal matching has been a highlighted research topic in both visio...
research
03/22/2023

BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

As one of the most fundamental techniques in multimodal learning, cross-...
research
03/21/2017

Cross-modal Deep Metric Learning with Multi-task Regularization

DNN-based cross-modal retrieval has become a research hotspot, by which ...
research
03/22/2023

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

Text-to-image person retrieval aims to identify the target person based ...
research
03/21/2018

Stacked Cross Attention for Image-Text Matching

In this paper, we study the problem of image-text matching. Inferring th...
research
10/12/2020

FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching

Detection of semantic similarity plays a vital role in sentence matching...
research
11/22/2019

HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

The hubness problem widely exists in high-dimensional embedding space an...

Please sign up or login with your details

Forgot password? Click here to reset