Alleviating Representational Shift for Continual Fine-tuning

04/22/2022
by Shibo Jie, et al.

We study a practical continual learning setting: continually fine-tuning a pre-trained model on a sequence of tasks. Previous work has found that, when training on new tasks, the features (penultimate-layer representations) of previous data change, a phenomenon called representational shift. Beyond the shift of final features, we reveal that the intermediate layers' representational shift (IRS) also matters, since it disrupts batch normalization and is thus another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method with two components: cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones, and recovers the post-convolution means before testing, which corrects the inaccurate mean estimates caused by IRS. Hierarchical fine-tuning uses a multi-stage strategy to fine-tune the pre-trained network, preventing massive changes in the convolutional layers and thus alleviating IRS. Experimental results on four datasets show that our method remarkably outperforms several state-of-the-art methods with lower storage overhead.
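The abstract gives no code, but the Xconv BN mechanism follows from the linearity of convolution: the mean of a conv output equals the conv applied to the mean of its input (bias included), so a stored pre-convolution mean can be pushed through the current weights at test time. Below is a minimal PyTorch sketch of this idea under stated assumptions; the class name XconvBN2d, the buffer pre_mean, and the recover_post_conv_mean helper are hypothetical names for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class XconvBN2d(nn.Module):
    """Hypothetical sketch of cross-convolution batch normalization (Xconv BN).

    Standard BN after a conv layer stores the running mean of the conv
    *output*; when the conv weights drift across tasks, that stored mean
    goes stale. Here we instead track the running mean of the conv *input*
    and, before testing, recover the post-conv mean from the current
    weights, using the linearity of convolution:
        E[conv(x)] = conv(E[x])   (per output channel, bias included).
    Only the means are treated this way; the running variance stays
    post-conv as in standard BN, matching the abstract's description.
    """

    def __init__(self, conv: nn.Conv2d, momentum: float = 0.1):
        super().__init__()
        self.conv = conv
        self.bn = nn.BatchNorm2d(conv.out_channels, momentum=momentum)
        self.momentum = momentum
        # Pre-convolution running mean: one value per *input* channel.
        self.register_buffer("pre_mean", torch.zeros(conv.in_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                # Mean over batch and spatial dimensions, per input channel.
                batch_mean = x.mean(dim=(0, 2, 3))
                self.pre_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return self.bn(self.conv(x))

    @torch.no_grad()
    def recover_post_conv_mean(self) -> None:
        """Overwrite the BN running mean with the post-conv mean implied by
        the stored pre-conv mean and the *current* conv weights. Call once
        on every such module before evaluation."""
        # The conv of a spatially constant map is constant in its interior,
        # so build a constant map at pre_mean and read one interior pixel.
        k = max(self.conv.kernel_size)
        size = 4 * k  # large enough that the centre pixel is unaffected by padding
        const = self.pre_mean.view(1, -1, 1, 1).expand(1, -1, size, size)
        out = self.conv(const)
        ch, cw = out.shape[-2] // 2, out.shape[-1] // 2
        self.bn.running_mean.copy_(out[0, :, ch, cw])
```

In use, one would replace each conv-plus-BN pair in the backbone with XconvBN2d(conv) and call recover_post_conv_mean() on every such module before switching to eval mode, so the BN means reflect the current convolution weights rather than the weights at the time the statistics were accumulated. The hierarchical fine-tuning component is not sketched here, since the abstract specifies it only as a multi-stage strategy that restricts changes in the convolutional layers.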

Related research

10/20/2022 · Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
A common approach to transfer learning under distribution shift is to fi...

06/05/2023 · Continual Learning with Pretrained Backbones by Tuning in the Input Space
The intrinsic difficulty in adapting deep learning models to non-station...

08/15/2023 · Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability
Fine-tuning pre-trained neural network models has become a widely adopte...

07/16/2023 · Tangent Model Composition for Ensembling and Continual Fine-tuning
Tangent Model Composition (TMC) is a method to combine component models ...

06/19/2023 · Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference
Fine-tuning has been proven to be a simple and effective technique to tr...

02/15/2023 · The Expressive Power of Tuning Only the Norm Layers
Feature normalization transforms such as Batch and Layer-Normalization h...

05/03/2023 · SimSC: A Simple Framework for Semantic Correspondence with Temperature Learning
We propose SimSC, a remarkably simple framework, to address the problem ...
