Noisy-Correspondence Learning for Text-to-Image Person Re-identification

08/19/2023
by Yang Qin, et al.

Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal community that aims to retrieve a target person based on a textual query. Although numerous TIReID methods have been proposed and have achieved promising performance, they implicitly assume that the training image-text pairs are correctly aligned, which is not always the case in real-world scenarios. In practice, some image-text pairs are inevitably under-correlated or even falsely correlated, a problem known as noisy correspondence (NC), owing to low image quality and annotation errors. To address this problem, we propose a novel Robust Dual Embedding method (RDE) that can learn robust visual-semantic associations even in the presence of NC. Specifically, RDE consists of two main components: 1) a Confident Consensus Division (CCD) module that leverages the dual-grained decisions of dual embedding modules to obtain a consensus set of clean training data, which enables the model to learn correct and reliable visual-semantic associations; and 2) a Triplet-Alignment Loss (TAL) that relaxes the conventional triplet-ranking loss with hardest negatives, which tends to rapidly overfit NC, to a log-exponential upper bound over all negatives, thus preventing the model from overemphasizing false image-text pairs. We conduct extensive experiments on three public benchmarks, namely CUHK-PEDES, ICFG-PEDES, and RSTPReID, to evaluate the performance and robustness of RDE. Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on all three datasets.
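To make the TAL idea concrete, the following is a minimal sketch of how a hardest-negative triplet-ranking loss can be relaxed to a log-exponential upper bound over all negatives. It is one plausible reading of the abstract, not the paper's exact formulation: the function names, the `margin` and `tau` parameters, the toy similarity matrix, and the convention that matched pairs sit on the diagonal are all assumptions made for illustration.

```python
import math


def hardest_negative_triplet(sim, margin=0.2):
    """Per-query triplet-ranking loss that uses only the hardest negative.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the matched candidate for query i is on the diagonal (j == i).
    """
    losses = []
    for i, row in enumerate(sim):
        negatives = [s for j, s in enumerate(row) if j != i]
        losses.append(max(0.0, margin - row[i] + max(negatives)))
    return losses


def log_exp_relaxed_triplet(sim, margin=0.2, tau=0.05):
    """Relaxation: replace max over negatives with tau * log-sum-exp(s / tau).

    Because tau * log(sum_j exp(s_j / tau)) >= max_j s_j, this loss is an
    upper bound on the hardest-negative loss above for every query.
    """
    losses = []
    for i, row in enumerate(sim):
        scaled = [s / tau for j, s in enumerate(row) if j != i]
        peak = max(scaled)  # subtract the peak for numerical stability
        lse = peak + math.log(sum(math.exp(x - peak) for x in scaled))
        losses.append(max(0.0, margin - row[i] + tau * lse))
    return losses


# Toy similarity matrix (hypothetical values). Row 2 imitates a noisy pair:
# its "matched" candidate (0.3) scores well below an unmatched one (0.8).
sim = [
    [0.9, 0.3, 0.2],
    [0.4, 0.5, 0.45],
    [0.1, 0.8, 0.3],
]

hard = hardest_negative_triplet(sim)
soft = log_exp_relaxed_triplet(sim)

for i, (h, s) in enumerate(zip(hard, soft)):
    print(f"query {i}: hardest-negative={h:.4f}  log-exp bound={s:.4f}")
```

The gradient of the log-sum-exp term with respect to the negative similarities is a softmax over those negatives, so the update is shared among all negatives instead of being driven by a single one. As `tau` shrinks, the bound tightens toward the hardest-negative loss.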


