VirTex: Learning Visual Representations from Textual Annotations

06/11/2020
by Karan Desai, et al.

The de facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet. Recent methods have explored unsupervised pretraining to scale to vast quantities of unlabeled images. In contrast, we aim to learn high-quality visual representations from fewer images. To this end, we revisit supervised pretraining, and seek data-efficient alternatives to classification-based pretraining. We propose VirTex – a pretraining approach using semantically dense captions to learn visual representations. We train convolutional networks from scratch on COCO Captions, and transfer them to downstream recognition tasks including image classification, object detection, and instance segmentation. On all tasks, VirTex yields features that match or exceed those learned on ImageNet – supervised or unsupervised – despite using up to ten times fewer images.

research
10/11/2021

VTBR: Semantic-based Pretraining for Person Re-Identification

Pretraining is a dominant paradigm in computer vision. Generally, superv...
research
08/04/2020

Learning Visual Representations with Caption Annotations

Pretraining general-purpose visual features has become a crucial part of...
research
08/11/2022

MILAN: Masked Image Pretraining on Language Assisted Representation

Self-attention based transformer models have been dominating many comput...
research
08/26/2021

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

Computer vision tasks such as object detection and semantic/instance seg...
research
08/05/2022

RadTex: Learning Efficient Radiograph Representations from Text Reports

Automated analysis of chest radiography using deep learning has tremendo...
research
06/11/2020

What makes instance discrimination good for transfer learning?

Unsupervised visual pretraining based on the instance discrimination pre...
research
07/17/2023

Does Visual Pretraining Help End-to-End Reasoning?

We aim to investigate whether end-to-end learning of visual reasoning ca...
