Augmenting Sub-model to Improve Main Model

06/20/2023
by Byeongho Heo, et al.

Image classification has improved with the development of training techniques. However, these techniques often require careful parameter tuning to balance the strength of regularization, limiting their potential benefits. In this paper, we propose a novel way to use regularization called Augmenting Sub-model (AugSub). AugSub consists of two models: the main model and the sub-model. While the main model follows conventional training recipes, the sub-model benefits from additional regularization. AugSub mitigates the adverse effects of this regularization through a relaxed loss function similar to a self-distillation loss. We demonstrate the effectiveness of AugSub with three drop techniques: dropout, drop-path, and random masking. Our analysis shows that all three AugSub variants improve performance, with the training loss converging even faster than in regular training. Among the three, AugSub with random masking, referred to as AugMask, is identified as the most practical method due to its performance and cost efficiency. We further validate AugMask across diverse training recipes, including DeiT-III, ResNet, MAE fine-tuning, and Swin Transformer. The results show that AugMask consistently provides significant performance gains. AugSub offers a practical and effective solution for introducing additional regularization under a variety of training recipes. Code is available at <https://github.com/naver-ai/augsub>.
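To make the two-branch setup concrete, here is a minimal PyTorch-style sketch of a single AugSub training step with random masking (AugMask). This is an illustration under stated assumptions, not the authors' implementation: the `num_tokens` attribute, the `keep_indices` keyword, and the KL-divergence form of the relaxed loss are hypothetical stand-ins, since the abstract only describes the loss as similar to self-distillation; see the repository linked above for the actual code.

```python
import torch
import torch.nn.functional as F

def augsub_step(model, images, labels, mask_ratio=0.5):
    """One AugSub training step with random masking (AugMask) -- a sketch."""
    # Main branch: conventional supervised training, no extra regularization.
    logits_main = model(images)
    loss_main = F.cross_entropy(logits_main, labels)

    # Sub-model branch: the same weights, but a random subset of patch tokens
    # is dropped before the forward pass (random masking). `model.num_tokens`
    # and the `keep_indices` keyword are hypothetical hooks into a ViT that
    # supports token dropping.
    batch = images.size(0)
    num_tokens = model.num_tokens
    num_keep = max(1, int(num_tokens * (1.0 - mask_ratio)))
    noise = torch.rand(batch, num_tokens, device=images.device)
    keep_indices = noise.argsort(dim=1)[:, :num_keep]  # kept-token indices
    logits_sub = model(images, keep_indices=keep_indices)

    # Relaxed, self-distillation-style loss: instead of the hard labels, the
    # sub-model matches the main model's softened predictions. detach() keeps
    # gradients from flowing into the target.
    target = F.softmax(logits_main.detach(), dim=-1)
    loss_sub = F.kl_div(F.log_softmax(logits_sub, dim=-1), target,
                        reduction="batchmean")

    return loss_main + loss_sub
```

Because the sub-model's forward pass runs only on the kept tokens, it costs less than a full second forward pass, which is consistent with the abstract's point about AugMask's cost efficiency.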
