
VTBR: Semantic-based Pretraining for Person Re-Identification

by Suncheng Xiang, et al.

Pretraining is a dominant paradigm in computer vision. Supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent work has shown, surprisingly, that ImageNet pretraining has limited impact on Re-ID systems because of the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, we manually construct FineGPR-C, a diversified caption dataset and the first of its kind for person Re-ID. Based on it, we propose a purely semantic-based pretraining approach named VTBR, which uses dense captions to learn visual representations from fewer images. Specifically, we train convolutional networks from scratch on the captions of the FineGPR-C dataset and transfer them to downstream Re-ID tasks. Comprehensive experiments on benchmarks show that VTBR achieves performance competitive with ImageNet pretraining despite using up to 1.4x fewer images, revealing its potential for Re-ID pretraining.




VirTex: Learning Visual Representations from Textual Annotations

The de-facto approach to many vision tasks is to start from pretrained v...

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

Person re-identification (re-ID) aims to recognize a person-of-interest ...

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

Recent years have witnessed great progress in person re-identification (...

Stronger Baseline for Person Re-Identification

Person re-identification (re-ID) aims to identify the same person of int...

CytoImageNet: A large-scale pretraining dataset for bioimage transfer learning

Motivation: In recent years, image-based biological assays have steadily...

Learning Visual Representations with Caption Annotations

Pretraining general-purpose visual features has become a crucial part of...

PASS: An ImageNet replacement for self-supervised pretraining without humans

Computer vision has long relied on ImageNet and other large datasets of ...