K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

02/05/2020
by Ruize Wang, et al.

We study the problem of injecting knowledge into large pre-trained models such as BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this, we propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so different adapters can be trained efficiently in a distributed way. We inject two kinds of knowledge: factual knowledge obtained from automatically aligned text triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (six datasets in total), including relation classification, entity typing, and question answering, demonstrate that each adapter improves performance and that combining both adapters brings further improvements. Probing experiments further show that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.
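The abstract above describes the core mechanism: the pre-trained RoBERTa parameters stay frozen, and each kind of knowledge gets its own adapter whose output is combined with the RoBERTa features downstream. The following is a minimal PyTorch sketch of that idea, assuming the Hugging Face transformers library; the KnowledgeAdapter and KAdapterSketch names, the bottleneck size, and attaching adapters only at the final hidden layer are illustrative assumptions (the paper attaches adapters at several intermediate RoBERTa layers), not the authors' reference implementation.

```python
# Minimal sketch of the K-Adapter idea: frozen RoBERTa + independent adapters.
# Class names, sizes, and the single attachment point are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel


class KnowledgeAdapter(nn.Module):
    """Small bottleneck module attached to a frozen pre-trained encoder."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 128):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)      # project down
        self.encoder = nn.TransformerEncoderLayer(
            d_model=bottleneck, nhead=4, batch_first=True
        )                                                    # adapter-internal encoder
        self.up = nn.Linear(bottleneck, hidden_size)         # project back up

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the original RoBERTa features intact.
        return hidden_states + self.up(self.encoder(self.down(hidden_states)))


class KAdapterSketch(nn.Module):
    """Frozen RoBERTa plus independently trainable knowledge adapters."""

    def __init__(self, adapter_names=("factual", "linguistic")):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        for p in self.roberta.parameters():
            p.requires_grad = False                          # original parameters stay fixed
        self.adapters = nn.ModuleDict(
            {name: KnowledgeAdapter() for name in adapter_names}
        )

    def forward(self, input_ids, attention_mask=None):
        hidden = self.roberta(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # No information flows between adapters: each sees only the frozen
        # RoBERTa features; their outputs are concatenated for downstream tasks.
        outputs = [adapter(hidden) for adapter in self.adapters.values()]
        return torch.cat([hidden] + outputs, dim=-1)
```

Because each adapter depends only on the frozen backbone and never on another adapter, the factual and linguistic adapters can be trained separately (even on different machines) and combined afterwards, which is what enables the continual knowledge infusion described in the abstract.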

Related research

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning (12/05/2021)
Continual learning (CL) learns a sequence of tasks incrementally with th...

Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion (10/16/2022)
Transformer-based pre-trained models like BERT have achieved great progr...

Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers (01/27/2023)
Since the recent advent of regulations for data protection (e.g., the Ge...

CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision (07/21/2021)
Recent work has shown success in incorporating pre-trained models like B...

MathBERT: A Pre-Trained Model for Mathematical Formula Understanding (05/02/2021)
Large-scale pre-trained models like BERT, have obtained a great success ...

Hierarchical Inductive Transfer for Continual Dialogue Learning (03/20/2022)
Pre-trained models have achieved excellent performance on the dialogue t...

CrysGNN: Distilling pre-trained knowledge to enhance property prediction for crystalline materials (01/14/2023)
In recent years, graph neural network (GNN) based approaches have emerge...
