Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

03/25/2023
by Denis Kuznedelev, et al.

Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to use modern, highly accurate models, but are often computationally constrained. To address this, we ask: can we quickly compress large generalist models into accurate and efficient specialists? For this, we propose a simple and versatile technique called Few-Shot Task-Aware Compression (TACO). Given a large vision model that is pretrained to be accurate on a broad task, such as classification over ImageNet-22K, TACO produces a smaller model that is accurate on specialized tasks, such as classification across vehicle types or animal species. Crucially, TACO works in a few-shot fashion, i.e., only a few task-specific samples are used, and the procedure has low computational overhead. We validate TACO on highly accurate ResNet, ViT/DeiT, and ConvNeXt models, originally trained on ImageNet, LAION, or iNaturalist, which we specialize and compress for a diverse set of "downstream" subtasks. TACO can reduce the number of non-zero parameters in existing models by up to 20× relative to the original models, leading to inference speedups of up to 3×, while remaining accuracy-competitive with the uncompressed models on the specialized tasks.
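The abstract does not spell out the compression procedure, so the sketch below only illustrates the general few-shot task-aware pruning idea it describes: score each weight's importance using gradients from a handful of task-specific samples, then zero out the least important weights globally. The function `few_shot_compress`, the first-order Taylor importance criterion, and all hyperparameters here are illustrative assumptions, not the paper's actual TACO algorithm.

```python
# Minimal sketch of few-shot task-aware compression (NOT the paper's
# exact TACO procedure): prune a pretrained generalist model using
# gradients computed on a small task-specific calibration batch.
import torch
import torch.nn.functional as F
import torchvision.models as models

def few_shot_compress(model, samples, labels, sparsity=0.95):
    """Zero out the `sparsity` fraction of least task-relevant weights.

    samples, labels: a small few-shot batch from the specialized task.
    sparsity=0.95 removes 95% of weights, i.e., roughly a 20x
    reduction in non-zero parameters.
    """
    model.zero_grad()
    F.cross_entropy(model(samples), labels).backward()

    # First-order Taylor importance per weight, |w * dL/dw| -- a common
    # stand-in for more sophisticated pruning criteria.
    scores = [
        (p.detach() * p.grad).abs().flatten()
        for p in model.parameters()
        if p.grad is not None and p.dim() > 1  # weight tensors only
    ]
    all_scores = torch.cat(scores)
    k = int(sparsity * all_scores.numel())
    threshold = all_scores.kthvalue(k).values  # global pruning threshold

    # Keep only weights whose importance exceeds the global threshold.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None and p.dim() > 1:
                p.mul_(((p * p.grad).abs() > threshold).float())
    return model

# Usage: specialize an ImageNet-pretrained ResNet-50 with 32 samples.
model = models.resnet50(weights="IMAGENET1K_V2")
x = torch.randn(32, 3, 224, 224)   # stand-in for real task images
y = torch.randint(0, 1000, (32,))  # stand-in for task labels
model = few_shot_compress(model, x, y, sparsity=0.95)
```

In a realistic pipeline, the pruning step would be followed by brief fine-tuning on the same few-shot samples, and the sparse model would be exported to a sparsity-aware inference runtime to realize the reported speedups.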

