Keep the Caption Information: Preventing Shortcut Learning in Contrastive Image-Caption Retrieval

04/28/2022
by Maurits Bleeker, et al.

Contrastive loss functions are a common optimization choice for training image-caption retrieval (ICR) methods. Unfortunately, contrastive ICR methods are vulnerable to learning shortcuts: decision rules that perform well on the training data but fail to transfer to other testing conditions. We introduce an approach to reduce shortcut feature representations for the ICR task: latent target decoding (LTD). We add a decoder to the learning framework to reconstruct the input caption, which prevents the image and caption encoders from learning shortcut features. Instead of reconstructing captions in the input space, we decode the semantics of the caption in a latent space. We implement the LTD objective as an optimization constraint, ensuring that the reconstruction loss stays below a threshold value while the contrastive loss is primarily optimized. Importantly, LTD does not depend on additional training data or expensive (hard) negative mining strategies. Our experiments show that, unlike reconstructing the input caption, LTD reduces shortcut learning and improves generalizability, obtaining higher recall@k and r-precision scores. Additionally, we show that the evaluation scores benefit from implementing LTD as an optimization constraint rather than as a dual loss.
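As a rough illustration of the constrained objective described in the abstract, the sketch below shows one way such a training step could look in PyTorch. It is not the authors' implementation: the encoders `img_enc` and `cap_enc`, the latent decoder `ltd_dec`, the loss functions, the threshold `eta`, and the multiplicative multiplier update (in the style of GECO-like constrained optimization) are all assumptions made for illustration.

```python
import math
import torch

def ltd_training_step(images, captions, targets,
                      img_enc, cap_enc, ltd_dec,
                      contrastive_loss, recon_loss,
                      lam, eta=0.1, alpha=0.01):
    """One training step with LTD enforced as an optimization constraint.

    Minimizes the contrastive loss subject to the latent reconstruction
    loss staying below the threshold `eta`, using a Lagrange-style
    multiplier `lam` with a multiplicative (GECO-style) update.
    All component names here are hypothetical.
    """
    z_img = img_enc(images)    # image representations
    z_cap = cap_enc(captions)  # caption representations

    # Main objective: contrastive image-caption matching loss.
    l_con = contrastive_loss(z_img, z_cap)

    # LTD: decode the caption semantics in a latent space and compare
    # against latent targets (e.g., embeddings from a frozen encoder),
    # instead of reconstructing the caption tokens in the input space.
    l_rec = recon_loss(ltd_dec(z_cap), targets)

    # Constraint l_rec <= eta, added as a weighted penalty; the gradient
    # of `loss` pushes the encoders to satisfy both objectives.
    constraint = l_rec - eta
    loss = l_con + lam * constraint

    # Multiplier update: grow `lam` while the constraint is violated,
    # shrink it once the reconstruction loss falls below the threshold.
    with torch.no_grad():
        lam = min(lam * math.exp(alpha * constraint.item()), 1e4)

    return loss, lam
```

Treating the reconstruction term as a constraint rather than a second loss term means the multiplier adapts during training: it only stays large while the reconstruction loss exceeds the threshold, so the contrastive objective remains the primary optimization target.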
