Universal Weighting Metric Learning for Cross-Modal Matching

10/07/2020
by Jiwei Wei, et al.

Cross-modal matching has been a prominent research topic in both the vision and language communities. Learning an appropriate mining strategy to sample and weight informative pairs is crucial to cross-modal matching performance. However, most existing metric learning methods were developed for unimodal matching and are unsuitable for cross-modal matching on multimodal data with heterogeneous features. To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which also provides a tool for analyzing the interpretability of various loss functions. Furthermore, we introduce a new polynomial loss under this framework, which defines separate weight functions for positive and negative informative pairs. Experimental results on two image-text matching benchmarks and two video-text matching benchmarks validate the efficacy of the proposed method.
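The abstract does not state the loss formulas, so the following is only a minimal sketch of the general idea of sampling informative pairs and weighting positives and negatives with separate weight functions. It is not the authors' polynomial loss. The selection rule (a margin around each row's positive), the margin value, the coefficients, and the names `polynomial`, `informative_weights`, and `weighted_loss` are all hypothetical assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]


def polynomial(coeffs: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical weight function w(s) = sum_k coeffs[k] * s**k."""
    def w(s: float) -> float:
        return sum(c * s ** k for k, c in enumerate(coeffs))
    return w


def informative_weights(
    sim: Matrix,
    w_pos: Callable[[float], float],
    w_neg: Callable[[float], float],
    margin: float = 0.1,
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Select informative pairs from an n x n similarity matrix and weight them.

    sim[i][j] scores image i against text j; the diagonal holds the matched
    (positive) pairs. Assumed rule (not from the paper): a negative is
    informative if it scores within `margin` of its row's positive, and a
    positive is informative if at least one negative in its row is.
    """
    pos_w: Dict[Pair, float] = {}
    neg_w: Dict[Pair, float] = {}
    n = len(sim)
    for i in range(n):
        pos = sim[i][i]
        hard_found = False
        for j in range(n):
            if j != i and sim[i][j] > pos - margin:
                neg_w[(i, j)] = w_neg(sim[i][j])
                hard_found = True
        if hard_found:
            pos_w[(i, i)] = w_pos(pos)
    return pos_w, neg_w


def weighted_loss(sim: Matrix, pos_w: Dict[Pair, float],
                  neg_w: Dict[Pair, float]) -> float:
    """Weighted sum whose gradient w.r.t. each selected similarity equals its
    weight (+w for negatives, -w for positives) when the weights are held
    constant, e.g. detached from the graph in an autodiff framework."""
    neg_term = sum(w * sim[i][j] for (i, j), w in neg_w.items())
    pos_term = sum(w * sim[i][j] for (i, j), w in pos_w.items())
    return neg_term - pos_term


# Toy 3 x 3 image-text similarity matrix (made-up numbers).
sim = [[0.80, 0.75, 0.10],
       [0.20, 0.90, 0.30],
       [0.60, 0.10, 0.65]]
pos_w, neg_w = informative_weights(sim, w_pos=polynomial([1.0]),
                                   w_neg=polynomial([0.0, 2.0]))
print(pos_w)  # {(0, 0): 1.0, (2, 2): 1.0}
print(neg_w)  # {(0, 1): 1.5, (2, 0): 1.2}
print(round(weighted_loss(sim, pos_w, neg_w), 3))  # 0.395
```

Holding the weights fixed makes each weight equal to the gradient magnitude on its pair, which is one common way to read "pair weighting" in metric learning. Here the hypothetical negative weight `2s` grows with similarity, so harder negatives are pushed away more strongly, while row 1 contributes nothing because all of its negatives fall outside the margin.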

10/23/2020

Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization

Matching information across image and text modalities is a fundamental c...
09/10/2021

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

Temporal grounding aims to localize a video moment which is semantically...
07/07/2019

Informative Visual Storytelling with Cross-modal Rules

Existing methods in the Visual Storytelling field often suffer from the ...
06/10/2020

Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems

With the explosive growth of transaction activities in online payment sy...
03/22/2023

BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

As one of the most fundamental techniques in multimodal learning, cross-...
03/25/2021

Rethinking Deep Contrastive Learning with Embedding Memory

Pair-wise loss functions have been extensively studied and shown to cont...
10/11/2020

Boosting Continuous Sign Language Recognition via Cross Modality Augmentation

Continuous sign language recognition (SLR) deals with unaligned video-te...
