giMLPs: Gate with Inhibition Mechanism in MLPs

08/01/2022
by   Cheng Kang, et al.

This paper presents a new model architecture, gate with inhibition MLP (giMLP). The gate with inhibition on CycleMLP (giCycleMLP) produces equal performance on the ImageNet classification task, and it also improves the BERT, RoBERTa, and DeBERTaV3 models through two novel techniques. The first is the gating MLP, where matrix multiplications between the MLP branch and the trunk Attention input further adjust a model's adaptation. The second is inhibition, which inhibits or enhances the branch adjustment; as the inhibition level increases, it imposes stronger restrictions on the features. We show that giCycleMLP with a lower inhibition level can be competitive with the original CycleMLP in terms of ImageNet classification accuracy. In addition, we show through a comprehensive empirical study that these techniques significantly improve performance when fine-tuning on NLU downstream tasks. For fine-tuning gate-with-inhibition MLPs on DeBERTa (giDeBERTa), we find that appealing results can be achieved on most NLU tasks without any further pretraining. We also find that, when using gate with inhibition, the activation function should have a short and smooth negative tail, so that unimportant or harmful features can be moderately inhibited. Experiments on ImageNet and twelve language downstream tasks demonstrate the effectiveness of gate with inhibition, both for image classification and for enhancing the capacity of natural language fine-tuning without any further pretraining.
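
To make the two techniques concrete, the following is a minimal sketch of a gate-with-inhibition branch, not the authors' reference implementation. It assumes the gate is an MLP branch whose output is multiplied elementwise with the trunk (attention) input, and it models "inhibition" as a scalar level that damps the branch adjustment before gating; the names GateWithInhibition and inhibition_level are illustrative, not from the paper.

```python
# Hedged sketch of a gate-with-inhibition branch (illustrative assumptions, see above).
import torch
import torch.nn as nn

class GateWithInhibition(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, inhibition_level: float = 0.1):
        super().__init__()
        # Branch MLP that produces a gating signal from the trunk input.
        self.branch = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),  # activation with a short, smooth negative tail
            nn.Linear(hidden_dim, dim),
        )
        self.inhibition_level = inhibition_level

    def forward(self, trunk: torch.Tensor) -> torch.Tensor:
        gate = self.branch(trunk)
        # Higher inhibition_level -> stronger damping of the branch adjustment.
        gate = gate * (1.0 - self.inhibition_level)
        # Elementwise gating of the trunk (attention) input by the branch output.
        return trunk * gate

# Usage example with arbitrary sizes:
# layer = GateWithInhibition(dim=768, hidden_dim=3072, inhibition_level=0.2)
# out = layer(torch.randn(2, 128, 768))
```

In this reading, a lower inhibition level leaves the branch adjustment mostly intact, while a higher level restricts how much the branch can reshape the trunk features.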


