Transformer-Patcher: One Mistake worth One Neuron

01/24/2023
by Zeyu Huang, et al.

Large Transformer-based Pretrained Language Models (PLMs) dominate almost all Natural Language Processing (NLP) tasks. Nevertheless, they still make mistakes from time to time. For a model deployed in an industrial environment, fixing these mistakes quickly and robustly is vital for improving the user experience. Previous works formalize such problems as Model Editing (ME) and mostly focus on fixing one mistake. However, the one-mistake-fixing scenario is not an accurate abstraction of the real-world challenge. In the deployment of AI services, there are ever-emerging mistakes, and the same mistake may recur if not corrected in time. A preferable solution is therefore to rectify mistakes continually, as soon as they appear. We thus extend the existing ME setting into Sequential Model Editing (SME) to help develop more practical editing methods. Our study shows that most current ME methods yield unsatisfactory results in this scenario. We then introduce Transformer-Patcher, a novel model editor that can shift the behavior of transformer-based models by simply adding and training a few neurons in the last Feed-Forward Network layer. Experimental results on both classification and generation tasks show that Transformer-Patcher can successively correct up to thousands of errors (Reliability) and generalize to their equivalent inputs (Generality) while retaining the model's accuracy on irrelevant inputs (Locality). Our method outperforms previous fine-tuning and HyperNetwork-based methods and achieves state-of-the-art performance for SME. The code is available at https://github.com/ZeroYuHuang/Transformer-Patcher.
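The core mechanism described above, adding a few trainable neurons to the last feed-forward layer while keeping the original weights frozen, can be pictured with the minimal PyTorch sketch below. This is an illustration of the general idea, not the authors' implementation: the module name PatchedFFN, the GELU activation, and the default of one patch neuron per edit are assumptions made for the example.

```python
# Minimal sketch (not the official Transformer-Patcher code): the last FFN of a
# Transformer is wrapped and extended with a few extra "patch" neurons whose
# key/value vectors are the only parameters trained during an edit.
import torch
import torch.nn as nn


class PatchedFFN(nn.Module):
    """Wraps a frozen two-layer FFN and appends `num_patches` new neurons."""

    def __init__(self, ffn_in: nn.Linear, ffn_out: nn.Linear, num_patches: int = 1):
        super().__init__()
        self.ffn_in = ffn_in      # original W1: d_model -> d_ff (frozen)
        self.ffn_out = ffn_out    # original W2: d_ff -> d_model (frozen)
        for p in list(ffn_in.parameters()) + list(ffn_out.parameters()):
            p.requires_grad = False

        d_model = ffn_in.in_features
        # New patch keys (d_model -> num_patches) and values (num_patches -> d_model).
        self.patch_key = nn.Linear(d_model, num_patches)
        self.patch_val = nn.Linear(num_patches, d_model, bias=False)
        self.act = nn.GELU()  # activation choice is an assumption for this sketch

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        original = self.ffn_out(self.act(self.ffn_in(hidden)))
        # The patch neurons should fire only on the erroneous input and its
        # paraphrases, shifting the output just enough to correct the mistake.
        patch = self.patch_val(self.act(self.patch_key(hidden)))
        return original + patch


# Usage with toy dimensions: swap the model's last FFN for the patched version.
ffn_in, ffn_out = nn.Linear(768, 3072), nn.Linear(3072, 768)
patched = PatchedFFN(ffn_in, ffn_out, num_patches=1)
out = patched(torch.randn(2, 16, 768))   # output has the same shape as the input
```

In a sequential-editing loop, one such patch would be added and trained per mistake, with objectives encouraging the new neuron to activate on the erroneous input and its equivalents (Reliability, Generality) while staying silent on unrelated inputs (Locality). See the paper and the repository linked above for the actual method and training details.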

Related research:

- 11/20/2022: Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors. "Large pre-trained models decay over long-term deployment as input distri..."
- 08/17/2023: PMET: Precise Model Editing in a Transformer. "Model editing techniques modify a minor proportion of knowledge in Large..."
- 06/30/2022: FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer. "Prompt tuning is an emerging way of adapting pre-trained language models..."
- 02/10/2022: Locating and Editing Factual Knowledge in GPT. "We investigate the mechanisms underlying factual knowledge recall in aut..."
- 08/24/2022: DPTDR: Deep Prompt Tuning for Dense Passage Retrieval. "Deep prompt tuning (DPT) has gained great success in most natural langua..."
- 05/22/2023: Editing Large Language Models: Problems, Methods, and Opportunities. "Recent advancements in deep learning have precipitated the emergence of ..."
- 12/21/2022: Automatic Emotion Modelling in Written Stories. "Telling stories is an integral part of human communication which can evo..."
