Kformer: Knowledge Injection in Transformer Feed-Forward Layers

01/15/2022
by Yunzhi Yao, et al.

Knowledge-enhanced models have developed a diverse set of techniques for integrating different knowledge sources. However, most previous work neglects the language model's own capacity and simply concatenates external knowledge at the input. Recent work has proposed that the feed-forward network (FFN) in a pre-trained language model can be viewed as a memory that stores factual knowledge. In this work, we explore the FFN in the Transformer and propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the Transformer's feed-forward layers. We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and facilitate current knowledge fusion methods. Our results on two benchmarks, commonsense reasoning (SocialIQA) and medical question answering (MedQA-USMLE), demonstrate that Kformer can exploit external knowledge deeply and achieves absolute improvements on these tasks.
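The FFN-as-memory view gives a concrete handle on what "injecting knowledge through the feed-forward layer" can look like: since FFN(x) = f(x W1) W2 behaves like a key-value store, retrieved knowledge embeddings can be projected into extra key and value slots of that store. The PyTorch sketch below illustrates this reading of the abstract; the module, the two projection layers, and the GELU activation are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeInjectedFFN(nn.Module):
    """Feed-forward block treated as a key-value memory, with projected
    knowledge embeddings appended as extra keys and values.
    A sketch of the idea; all names and dimensions are illustrative."""

    def __init__(self, d_model: int, d_ff: int, d_know: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)   # FFN "keys"
        self.w2 = nn.Linear(d_ff, d_model)   # FFN "values"
        # Hypothetical projections mapping retrieved knowledge embeddings
        # into the key space and the value space of the FFN memory.
        self.key_proj = nn.Linear(d_know, d_model)
        self.value_proj = nn.Linear(d_know, d_model)

    def forward(self, x: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, d_model)
        # knowledge: (batch, n_facts, d_know), e.g. top-k retrieved embeddings
        k = self.key_proj(knowledge)    # (batch, n_facts, d_model)
        v = self.value_proj(knowledge)  # (batch, n_facts, d_model)

        # Standard FFN activations plus activations over the knowledge keys.
        h_ffn = self.w1(x)                               # (batch, seq, d_ff)
        h_know = torch.einsum("bsd,bnd->bsn", x, k)      # (batch, seq, n_facts)
        h = F.gelu(torch.cat([h_ffn, h_know], dim=-1))

        # Read out: original FFN values plus a weighted sum of knowledge values.
        d_ff = h_ffn.size(-1)
        out_ffn = self.w2(h[..., :d_ff])
        out_know = torch.einsum("bsn,bnd->bsd", h[..., d_ff:], v)
        return out_ffn + out_know
```

In this reading, injection adds no new attention machinery: the knowledge simply widens the FFN's hidden dimension at inference time, which is consistent with the abstract's claim that injecting into the FFN alone is enough to help the model.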

Related research

- 05/15/2023 · Knowledge Rumination for Pre-trained Language Models. Previous studies have revealed that vanilla pre-trained language models ...
- 09/19/2019 · Exploring ways to incorporate additional knowledge to improve Natural Language Commonsense Question Answering. DARPA and Allen AI have proposed a collection of datasets to encourage r...
- 10/29/2020 · Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model. This paper presents a novel fusion method for integrating an external la...
- 04/08/2021 · Revisiting Simple Neural Probabilistic Language Models. Recent progress in language modeling has been driven not only by advance...
- 09/06/2021 · Enhancing Language Models with Plug-and-Play Large-Scale Commonsense. We study how to enhance language models (LMs) with textual commonsense k...
- 10/14/2019 · Pruning a BERT-based Question Answering Model. We investigate compressing a BERT-based question answering system by pru...
- 08/05/2020 · Working Memory for Online Memory Binding Tasks: A Hybrid Model. Working Memory is the brain module that holds and manipulates informatio...
