LiT: Zero-Shot Transfer with Locked-image Text Tuning

11/15/2021
by Xiaohua Zhai et al.

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Text tuning" (LiT-tuning), which simply teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT-tuned model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT-tuning is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT-tuned model achieves 84.5% zero-shot transfer accuracy on the ImageNet test set, and 81.1% on the challenging out-of-distribution ObjectNet test set.
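The core idea, freezing a pre-trained image tower and contrastively training only a text tower against it, can be sketched in a few lines of PyTorch. This is a minimal toy illustration, not the authors' implementation: the two linear layers stand in for a pre-trained vision backbone (e.g. a ViT) and a text transformer, and the `LiTSketch` class name, dimensions, and temperature initialization are assumptions made here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiTSketch(nn.Module):
    """Toy locked-image text tuning: frozen image tower, trainable text tower."""

    def __init__(self, img_dim=64, txt_dim=32, embed_dim=16):
        super().__init__()
        # Stand-ins for a pre-trained image encoder and a text encoder.
        self.image_encoder = nn.Linear(img_dim, embed_dim)
        self.text_encoder = nn.Linear(txt_dim, embed_dim)
        # Learnable softmax temperature (hypothetical initial value).
        self.temperature = nn.Parameter(torch.tensor(0.07))
        # "Locked image": freeze the image tower so only the text tower learns.
        for p in self.image_encoder.parameters():
            p.requires_grad = False

    def forward(self, images, texts):
        # L2-normalized embeddings from both towers.
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(texts), dim=-1)
        # Pairwise cosine-similarity logits for the batch.
        logits = img @ txt.t() / self.temperature
        labels = torch.arange(images.size(0))
        # Symmetric contrastive (InfoNCE) loss: matched pairs lie on the diagonal.
        return (F.cross_entropy(logits, labels)
                + F.cross_entropy(logits.t(), labels)) / 2
```

At inference time, zero-shot classification works as in CLIP: embed the class names with the tuned text tower and pick the class whose text embedding is most similar to the (frozen) image embedding.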
