ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

05/15/2020
by Zhe Wang, et al.

Person search by natural language aims to retrieve, from a large-scale image pool, the specific person who matches a given textual description. While most current methods treat the task as holistic visual and textual feature matching, we approach it from an attribute-aligning perspective that grounds specific attribute phrases to the corresponding visual regions. This yields both successful grounding and a performance boost, because robust feature learning lets the referred identity be accurately characterized by a bundle of attribute-level visual cues. Concretely, our Visual-Textual Attribute Alignment model (dubbed ViTAA) learns to disentangle the feature space of a person into subspaces corresponding to attributes, using a lightweight auxiliary attribute segmentation branch. It then aligns these visual features with the textual attributes parsed from the sentences, using a novel contrastive learning loss. We validate the ViTAA framework through extensive experiments on person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performance. Code will be publicly available upon publication.
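To make the idea of attribute-wise alignment more concrete, here is a minimal sketch in plain Python. It is not the loss proposed in the paper, which the abstract does not spell out; it swaps in a generic InfoNCE-style contrastive loss, applied separately to each attribute subspace and then averaged. The function names, the attribute keys ("upper", "lower"), and the temperature value are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(visual, textual, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's loss).

    visual[i] and textual[i] form a matched pair; every other textual
    feature in the batch acts as a negative for visual[i].
    """
    n = len(visual)
    total = 0.0
    for i in range(n):
        logits = [cosine(visual[i], textual[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / n


def attribute_alignment_loss(visual_by_attr, textual_by_attr, temperature=0.1):
    """Average the contrastive loss over the attributes present in both inputs.

    Each argument maps a (hypothetical) attribute name to a batch of
    feature vectors living in that attribute's subspace.
    """
    attrs = sorted(set(visual_by_attr) & set(textual_by_attr))
    if not attrs:
        raise ValueError("no shared attributes to align")
    losses = [
        info_nce(visual_by_attr[a], textual_by_attr[a], temperature)
        for a in attrs
    ]
    return sum(losses) / len(losses)


# Toy batch of two people, two hypothetical attribute subspaces.
visual = {"upper": [[1.0, 0.0], [0.0, 1.0]], "lower": [[1.0, 0.0], [0.0, 1.0]]}
aligned_text = {"upper": [[1.0, 0.0], [0.0, 1.0]], "lower": [[1.0, 0.0], [0.0, 1.0]]}
swapped_text = {"upper": [[0.0, 1.0], [1.0, 0.0]], "lower": [[0.0, 1.0], [1.0, 0.0]]}

aligned_loss = attribute_alignment_loss(visual, aligned_text)
swapped_loss = attribute_alignment_loss(visual, swapped_text)
```

On this toy batch, the loss is lower when each person's textual attribute features match their visual ones than when the descriptions are swapped between the two people. Computing the loss per attribute rather than on one holistic vector is what lets a mismatch in a single attribute be penalized on its own.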


