LiT Tuned Models for Efficient Species Detection

02/12/2023
by Andre Nakkab, et al.

Recent advances in training vision-language models have demonstrated unprecedented robustness and transfer learning effectiveness; however, standard computer vision datasets are image-only and are therefore not well suited to such training methods. Our paper introduces a simple methodology for adapting any fine-grained image classification dataset for distributed vision-language pretraining. We implement this methodology on the challenging iNaturalist-2021 dataset, which comprises approximately 2.7 million images of macro-organisms across 10,000 classes, and achieve a new state-of-the-art model in terms of zero-shot classification accuracy. Somewhat surprisingly, our model (trained using a new method called locked-image text tuning, or LiT) uses a pre-trained, frozen vision representation, showing that language alignment alone can attain strong transfer learning performance, even on fractious, long-tailed datasets. Our approach opens the door to using high-quality vision-language pretrained models in agriculturally relevant applications involving species detection.
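
As a rough illustration of the two ideas in the abstract (turning an image-only classification dataset into image-text pairs, and zero-shot classification against text embeddings while the image representation stays fixed), here is a minimal sketch. The caption template, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration; the abstract does not specify the caption format or the encoders the authors used.

```python
import math

def label_to_caption(species_name, template="a photo of {}"):
    """Turn a class label into a text caption. The template is an assumption."""
    return template.format(species_name.replace("_", " ").strip())

def build_image_text_pairs(samples, template="a photo of {}"):
    """Map (image_path, label) pairs to (image_path, caption) pairs."""
    return [(path, label_to_caption(label, template)) for path, label in samples]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_embedding, caption_embeddings):
    """Return the caption whose embedding is closest to the image embedding.

    image_embedding stands in for the output of a frozen image encoder;
    caption_embeddings maps each caption to the output of a text encoder.
    Both encoders are hypothetical here.
    """
    return max(
        caption_embeddings,
        key=lambda caption: cosine(image_embedding, caption_embeddings[caption]),
    )

# Toy usage with hand-written embeddings in place of real encoder outputs.
pairs = build_image_text_pairs([("img/001.jpg", "Danaus_plexippus")])
toy_text_embeddings = {
    "a photo of Danaus plexippus": [1.0, 0.0],
    "a photo of Apis mellifera": [0.0, 1.0],
}
prediction = zero_shot_predict([0.9, 0.1], toy_text_embeddings)
```

In a locked-image setup, only the text encoder that produces the caption embeddings would be trained; the image embeddings would come from a pre-trained encoder whose weights are not updated.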

research
12/16/2021

RegionCLIP: Region-based Language-Image Pretraining

Contrastive language-image pretraining (CLIP) using image-text pairs has...
research
03/09/2022

Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning

Despite achieving state-of-the-art zero-shot performance, existing visio...
research
11/22/2021

Florence: A New Foundation Model for Computer Vision

Automated visual understanding of our diverse and open world demands com...
research
10/09/2018

Bird Species Classification using Transfer Learning with Multistage Training

Bird species classification has received more and more attention in the ...
research
08/09/2023

Transferable Models for Bioacoustics with Human Language Supervision

Passive acoustic monitoring offers a scalable, non-invasive method for t...
research
10/09/2022

Learning to Decompose Visual Features with Latent Textual Prompts

Recent advances in pre-training vision-language models like CLIP have sh...
research
12/02/2020

Chair Segments: A Compact Benchmark for the Study of Object Segmentation

Over the years, datasets and benchmarks have had an outsized influence o...
