PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition

04/14/2023
by Xinwen Fan, et al.

Pedestrian attribute recognition (PAR) has received increasing attention because of its wide application in video surveillance and pedestrian analysis. Extracting robust feature representations is one of the key challenges in this task. Existing methods mainly use a convolutional neural network (CNN) as the backbone to extract features. However, these methods mainly focus on small discriminative regions while ignoring the global perspective. To overcome these limitations, we propose a pure transformer-based multi-task PAR network named PARFormer, which comprises four modules. In the feature extraction module, we build a strong transformer-based baseline for feature extraction, which achieves competitive results on several PAR benchmarks compared with existing CNN-based baseline methods. In the feature processing module, we propose an effective data augmentation strategy, the batch random mask (BRM) block, to reinforce the attentive feature learning of random patches. Furthermore, we propose a multi-attribute center loss (MACL) to enhance the inter-attribute discriminability of the feature representations. In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes and propose a multi-view contrastive loss (MCVL) that enables the network to exploit viewpoint information. In the attribute recognition module, we alleviate the negative-positive imbalance problem when generating the attribute predictions. The above modules interact to jointly learn a highly discriminative feature space and supervise the generation of the final features. Extensive experimental results show that the proposed PARFormer network performs well compared to state-of-the-art methods on several public datasets, including PETA, RAP, and PA100K. Code will be released at https://github.com/xwf199/PARFormer.
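The abstract does not say how the attribute recognition module handles the negative-positive imbalance. As an illustration only, the sketch below shows one common approach in multi-label attribute recognition: a binary cross-entropy whose per-attribute weights depend on how rare positive labels are. The function names, the exponential weighting scheme, and the toy data are assumptions made for this example and are not taken from PARFormer.

```python
import math


def positive_ratios(labels):
    """Return the fraction of positive samples for each attribute.

    `labels` is a list of rows, one per pedestrian image, each row a list
    of 0/1 attribute labels.
    """
    n = len(labels)
    num_attrs = len(labels[0])
    return [sum(row[j] for row in labels) / n for j in range(num_attrs)]


def weighted_bce(probs, labels, ratios, eps=1e-7):
    """Weighted binary cross-entropy averaged over samples (hypothetical helper).

    Assumed weighting: a positive label of an attribute with positive ratio r
    gets weight exp(1 - r), and a negative label gets exp(r), so errors on
    rare positives cost more than errors on common negatives.
    """
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y, r in zip(p_row, y_row, ratios):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            w = math.exp(1.0 - r) if y == 1 else math.exp(r)
            total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Toy batch: 4 images, 2 attributes, each attribute positive in 1 of 4 images.
toy_labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
toy_ratios = positive_ratios(toy_labels)
toy_probs = [[0.8, 0.1], [0.2, 0.1], [0.3, 0.6], [0.1, 0.2]]
toy_loss = weighted_bce(toy_probs, toy_labels, toy_ratios)
```

With this weighting, missing a rare positive (predicting 0.1 for a true label) is penalized more heavily than an equally confident mistake on a negative label of the same attribute, which is the effect an imbalance-aware loss is meant to have.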
