Fine-tuning Image Transformers using Learnable Memory

03/29/2022
by Mark Sandler et al.

In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens". We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy over conventional head-only fine-tuning, and falls only slightly short of the much more expensive full fine-tuning. We then propose an attention-masking approach that enables extension to new downstream tasks with computation reuse. In this setup, in addition to being parameter-efficient, models can execute both old and new tasks as part of a single inference, at a small incremental cost.
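To make the mechanism concrete, here is a minimal PyTorch sketch of per-layer memory tokens, assuming a standard encoder block that maps a (batch, tokens, dim) tensor to the same shape. The wrapper class and its names (MemoryAugmentedBlock, num_memory) are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    """Wraps a ViT encoder block with learnable memory tokens (sketch).

    `block` is assumed to be any module mapping (B, N, D) -> (B, N, D),
    e.g. a standard transformer encoder block. Names are illustrative.
    """

    def __init__(self, block: nn.Module, embed_dim: int, num_memory: int = 4):
        super().__init__()
        self.block = block
        self.num_memory = num_memory
        # Per-layer learnable memory, shared across the batch.
        self.memory = nn.Parameter(torch.randn(1, num_memory, embed_dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.size(0)
        mem = self.memory.expand(b, -1, -1)
        # Concatenate memory so self-attention inside `block` can attend to it.
        x = torch.cat([x, mem], dim=1)
        x = self.block(x)
        # Discard the memory outputs; fresh memory is introduced at each layer.
        return x[:, : -self.num_memory]
```

Under head-plus-memory fine-tuning, only each layer's memory parameter and the classification head would receive gradients, with the backbone weights frozen.

The attention-masking extension can be sketched the same way: a boolean mask blocks the base tokens (patches, original class token, old memory) from attending to newly added task tokens, so the old task's outputs are unchanged while the new task runs in the same forward pass. The helper below follows the nn.MultiheadAttention convention that True marks disallowed attention; placing the base tokens first in the sequence is an assumption.

```python
def extension_attn_mask(n_base: int, n_new: int) -> torch.Tensor:
    """Boolean attention mask (True = blocked) for non-destructive extension.

    Base tokens may only attend to base tokens, keeping old-task outputs
    intact; new tokens (new class token + new memory) attend to everything.
    """
    n = n_base + n_new
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_base, n_base:] = True  # block base -> new attention
    return mask
```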

Related research

05/24/2023
Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Fine-tuning large pre-trained language models on various downstream task...

05/25/2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Autoregressive Transformers adopted in Large Language Models (LLMs) are ...

05/26/2023
Do We Really Need a Large Number of Visual Prompts?
Due to increasing interest in adapting models on resource-constrained ed...

12/06/2022
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Intermediate features of a pre-trained model have been shown informative...

11/17/2022
How to Fine-Tune Vision Models with SGD
SGD (with momentum) and AdamW are the two most used optimizers for fine-...

08/09/2019
The role of cue enhancement and frequency fine-tuning in hearing impaired phone recognition
A speech-based hearing test is designed to identify the susceptible erro...

11/29/2022
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Fine-tuning pre-trained language models (PLMs) achieves impressive perfo...
