Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

09/13/2022
by   Zijie Wang, et al.

The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches attempt to learn a latent common manifold mapping paradigm in a cross-modal distribution consensus prediction (CDCP) manner. When mapping features from the distribution of one modality into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods it is inevitable that the multi-modal data cannot be well aligned in the common manifold, which ultimately leads to sub-optimal retrieval performance. To overcome this CDCP dilemma, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C^3M) for text-based person retrieval. The core idea of our method, as a Chinese saying goes, is to 'san si er hou xing', that is, to Look Before yoU Leap (LBUL). The common manifold mapping mechanism of LBUL contains a looking step and a leaping step. Compared to CDCP-based methods, LBUL considers the distribution characteristics of both the visual and textual modalities before embedding data from either modality into C^3M, achieving a more solid cross-modal distribution consensus and hence superior retrieval accuracy. We evaluate our proposed method on two text-based person retrieval datasets, CUHK-PEDES and RSTPReid. Experimental results demonstrate that LBUL outperforms previous methods and achieves state-of-the-art performance.
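The abstract describes LBUL's mapping mechanism only at a high level (a "looking" step followed by a "leaping" step), so the snippet below is a minimal PyTorch sketch of that idea under our own assumptions, not the authors' implementation: before each modality's features are projected (the "leap") into the common manifold, a "looking" network summarizes distribution statistics of both modalities and modulates the projections, so that neither mapping is blind to the opposite distribution as in CDCP-style methods. All names and design choices here (LookBeforeLeapMapper, the mean-based batch statistics, the gating scheme) are hypothetical.

```python
# Hypothetical sketch of a "look before you leap" common-manifold mapping.
# Not the authors' LBUL implementation; names and design are assumptions.
import torch
import torch.nn as nn


class LookBeforeLeapMapper(nn.Module):
    """Maps visual/textual features into a shared manifold, conditioning
    each projection on distribution statistics of BOTH modalities."""

    def __init__(self, dim_v: int, dim_t: int, dim_common: int):
        super().__init__()
        # "Leaping" projections, one per modality.
        self.proj_v = nn.Linear(dim_v, dim_common)
        self.proj_t = nn.Linear(dim_t, dim_common)
        # "Looking" network: consumes summary statistics of both
        # modalities and produces a shared modulation vector.
        self.look = nn.Sequential(
            nn.Linear(dim_v + dim_t, dim_common),
            nn.Tanh(),
        )

    def forward(self, feats_v: torch.Tensor, feats_t: torch.Tensor):
        # Looking step: summarize the batch-level distribution of each
        # modality (here simply by its mean), so neither projection is
        # blind to the opposite modality, unlike CDCP-style mappings.
        stats = torch.cat([feats_v.mean(dim=0), feats_t.mean(dim=0)], dim=-1)
        gate = 1.0 + self.look(stats)

        # Leaping step: project into the common manifold, modulated by
        # the cross-modal statistics gathered above.
        z_v = nn.functional.normalize(self.proj_v(feats_v) * gate, dim=-1)
        z_t = nn.functional.normalize(self.proj_t(feats_t) * gate, dim=-1)
        return z_v, z_t  # ready for a cross-modal alignment loss


# Usage with random features for a batch of 8 image-text pairs.
mapper = LookBeforeLeapMapper(dim_v=2048, dim_t=768, dim_common=512)
z_v, z_t = mapper(torch.randn(8, 2048), torch.randn(8, 768))
sim = z_v @ z_t.t()  # cosine similarity matrix for retrieval
```

The key departure from a CDCP-style mapper in this sketch is the `gate` computed from both modalities: a plain CDCP mapping would project `feats_v` and `feats_t` independently, each without access to the other distribution.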

Related research

11/17/2017  Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Textual-visual cross-modal retrieval has been a hot research topic in bo...

09/12/2021  DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval
Many previous methods on text-based person retrieval tasks are devoted t...

10/11/2022  Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion
Technology videos contain rich multi-modal information. In cross-modal i...

12/19/2016  Cross-Modal Manifold Learning for Cross-modal Retrieval
This paper presents a new scalable algorithm for cross-modal similarity ...

02/04/2017  Simple to Complex Cross-modal Learning to Rank
The heterogeneity-gap between different modalities brings a significant ...

05/12/2021  Cross-Modal and Multimodal Data Analysis Based on Functional Mapping of Spectral Descriptors and Manifold Regularization
Multimodal manifold modeling methods extend the spectral geometry-aware ...

09/01/2020  Practical Cross-modal Manifold Alignment for Grounded Language
We propose a cross-modality manifold alignment procedure that leverages ...