FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer

06/30/2022
by   Jingping Liu, et al.

Prompt tuning is an emerging way of adapting pre-trained language models to downstream tasks. However, existing studies mainly add prompts to the input sequence, which may not work as expected: the intermediate multi-head self-attention and feed-forward network computations make model optimization less smooth. Hence, we propose a novel tuning method called layer tuning, which adds learnable parameters inside Transformer layers. Specifically, we focus on layer tuning for the feed-forward network (FFN) in the Transformer, namely FL-tuning. It introduces additional units into the hidden layer of each FFN. We conduct extensive experiments on the public CLUE benchmark. The results show that: 1) Our FL-tuning outperforms prompt tuning methods under both full-data and few-shot settings in almost all cases; in particular, it improves accuracy by 17.93% and F1 by 16.142% over P-tuning v2. 2) Our FL-tuning is more stable and converges about 1.17 times faster than P-tuning v2. 3) With only about 3% of the Transformer's parameters trained, FL-tuning is comparable with fine-tuning on most datasets and significantly outperforms fine-tuning on several of them (e.g., accuracy improved by 12.9%). The source codes are available at https://github.com/genggui001/FL-Tuning.
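A minimal PyTorch sketch of the idea, assuming the added hidden units contribute additively on top of a frozen pre-trained feed-forward block (the module and parameter names such as FLTunedFFN and n_extra are illustrative, not taken from the paper's released code, and the activation is assumed to be ReLU):

import torch
import torch.nn as nn

class FLTunedFFN(nn.Module):
    """Feed-forward block with extra trainable hidden units (FL-tuning sketch).

    Concatenating n_extra new hidden units and extending the second projection
    matrix accordingly is equivalent to adding their contribution on top of the
    frozen FFN output, which is what this module computes.
    """

    def __init__(self, pretrained_ffn: nn.Module, d_model: int, n_extra: int):
        super().__init__()
        self.ffn = pretrained_ffn              # frozen pre-trained FFN
        for p in self.ffn.parameters():
            p.requires_grad = False
        # new hidden units: d_model -> n_extra -> d_model
        self.extra_in = nn.Linear(d_model, n_extra)
        self.extra_out = nn.Linear(n_extra, d_model, bias=False)
        nn.init.zeros_(self.extra_out.weight)  # start exactly at the pre-trained model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # original FFN output plus the contribution of the added hidden units
        return self.ffn(x) + self.extra_out(torch.relu(self.extra_in(x)))

# usage sketch: wrap the FFN of each Transformer layer before training
# layer.ffn = FLTunedFFN(layer.ffn, d_model=768, n_extra=64)

Zero-initializing the output projection of the added units means the wrapped layer initially reproduces the pre-trained model exactly, so tuning only has to learn a task-specific correction; only the added units' parameters receive gradients.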


