Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

04/13/2023
by Mohit Sharma, et al.

Recent work has shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as for a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that full end-to-end fine-tuning can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation and causes representational drift towards the fine-tuned task, losing the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter-efficient adapters can significantly close the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changing the original representation, thus preserving the original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), with both supervised (ImageNet-1K classification) and self-supervised (CLIP, BYOL, Visual MAE) pretrained weights, on 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.
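The abstract describes inserting small trainable adapters into a frozen pretrained backbone so that the original weights, and hence the original representation, are never modified. A minimal sketch of one common adapter design, a bottleneck residual MLP with a zero-initialized up-projection (the names, dimensions, and placement here are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual skip.
    The up-projection is zero-initialized, so at initialization the adapter is
    an identity map and the frozen pretrained representation passes through
    unchanged ("lossless" at the start of training)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Toy stand-in for a pretrained backbone; freeze all of its parameters.
backbone = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
for p in backbone.parameters():
    p.requires_grad = False

# Interleave adapters with the frozen layers: only the adapters receive
# gradients, so the pretrained weights are preserved exactly.
adapted = nn.Sequential(
    backbone[0], Adapter(32),
    backbone[1], backbone[2], Adapter(32),
)
```

Because the adapters start as identities, `adapted` initially reproduces the frozen backbone's outputs exactly; training then moves only the small adapter parameters, which can later be removed to recover the original model.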


Related research

- On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation (06/06/2021)
- Policy-Induced Self-Supervision Improves Representation Finetuning in Visual RL (02/12/2023)
- ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data (08/01/2023)
- Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression (03/25/2023)
- Supervised Fine-tuning Evaluation for Long-term Visual Place Recognition (11/14/2022)
- Standardizing and Centralizing Datasets to Enable Efficient Training of Agricultural Deep Learning Models (08/04/2022)
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (03/14/2019)
