Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

08/05/2018
by Dapeng Chen, et al.

Person re-identification is an important task that requires learning discriminative visual features to distinguish different person identities. Diverse auxiliary information has been used to improve visual feature learning. In this paper, we propose to exploit natural language descriptions as additional training supervision for learning effective visual features. Compared with other auxiliary information, language can describe a specific person from more compact and semantic visual aspects, and is thus complementary to the pixel-level image data. Our method not only learns a better global visual feature under the supervision of the overall description but also enforces semantic consistency between local visual and linguistic features, which is achieved by building global and local image-language associations. The global image-language association is established according to the identity labels, while the local association is based on the implicit correspondences between image regions and noun phrases. Extensive experiments demonstrate the effectiveness of employing language as training supervision with the two association schemes. Our method achieves state-of-the-art performance without using any auxiliary information during testing and outperforms other joint embedding methods for image-language association.
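
The abstract gives no formulas, so the following is only a minimal sketch of how the two association schemes might be set up, not the paper's confirmed formulation. It assumes that the global association can be treated as a binary same-identity decision over image-description pairs, and that the local association can be approximated by softmax attention of a noun-phrase feature over image region features. All function names, the cosine similarity, the `scale` parameter, and the cross-entropy loss are illustrative assumptions.

```python
import numpy as np


def global_association_labels(image_ids, text_ids):
    """Label pair (i, j) as 1 when image i and description j share an identity."""
    image_ids = np.asarray(image_ids)
    text_ids = np.asarray(text_ids)
    return (image_ids[:, None] == text_ids[None, :]).astype(np.float32)


def cosine_similarity_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T


def global_association_loss(image_feats, text_feats, image_ids, text_ids, scale=5.0):
    """Assumed loss: binary cross-entropy over all image-description pairs."""
    labels = global_association_labels(image_ids, text_ids)
    logits = scale * cosine_similarity_matrix(image_feats, text_feats)
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7  # guards against log(0)
    bce = -(labels * np.log(probs + eps) + (1.0 - labels) * np.log(1.0 - probs + eps))
    return float(bce.mean())


def phrase_attention(region_feats, phrase_feat):
    """Assumed local step: softmax attention of one noun phrase over image regions.

    Returns the attention weights and the phrase-attended visual feature.
    """
    scores = region_feats @ phrase_feat
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    attended = weights @ region_feats
    return weights, attended
```

In this sketch the identity labels alone decide which image-description pairs count as positives, while the region-phrase correspondence is never labeled and emerges only through the attention weights, which is one way to read "implicit correspondences" in the abstract.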

research
12/24/2022

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

In person re-identification (ReID) tasks, many works explore the learnin...
research
08/04/2023

Exploring Part-Informed Visual-Language Learning for Person Re-Identification

Recently, visual-language learning has shown great potential in enhancin...
research
05/15/2020

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Person search by natural language aims at retrieving a specific person i...
research
02/27/2015

Hybrid coding of visual content and local image features

Distributed visual analysis applications, such as mobile visual search o...
research
08/31/2020

Receptive Multi-granularity Representation for Person Re-Identification

A key for person re-identification is achieving consistent local details...
research
10/27/2020

SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Spatial Description Resolution, as a language-guided localization task, ...
research
05/22/2022

Evidence for Hypodescent in Visual Semantic AI

We examine the state-of-the-art multimodal "visual semantic" model CLIP ...
