Neural Knowledge Bank for Pretrained Transformers

07/31/2022
by Damai Dai, et al.

The ability of pretrained Transformers to remember factual knowledge is essential, yet it remains limited in existing models. Inspired by existing work that regards Feed-Forward Networks (FFNs) in Transformers as key-value memories, we design a Neural Knowledge Bank (NKB) and a knowledge injection strategy to introduce extra factual knowledge into pretrained Transformers. The NKB takes the form of additional knowledgeable memory slots appended to the FFN, and its memory-like architecture makes it highly interpretable and flexible. When injecting extra knowledge with the Salient Span Masking (SSM) pretraining objective, we freeze the original pretrained model and train only the NKB. This training strategy ensures that the general language modeling ability of the original pretrained model is not affected. By mounting the NKB onto the T5 model, we verify its strong ability to store extra factual knowledge on three closed-book question answering datasets. We also show that mounting the NKB does not degrade the general language modeling ability of T5 on two representative tasks, summarization and machine translation. Further, we thoroughly analyze the interpretability of the NKB and reveal the meaning of its keys and values in a human-readable way. Finally, we demonstrate the flexibility of the NKB by directly modifying its value vectors to update the factual knowledge stored in it.

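The abstract describes the NKB as extra key-value memory slots mounted on a Transformer FFN, with the original model frozen and only the new slots trained. Below is a minimal, illustrative PyTorch sketch of that idea; the class name `FFNWithNKB`, the slot count `num_nkb_slots`, and the choice of ReLU activation are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FFNWithNKB(nn.Module):
    """An FFN viewed as key-value memory, extended with extra memory slots
    (a Neural Knowledge Bank). Only the NKB parameters are trainable.
    Illustrative sketch; not the authors' reference implementation."""

    def __init__(self, ffn_keys: nn.Linear, ffn_values: nn.Linear,
                 num_nkb_slots: int = 1024):
        super().__init__()
        d_model = ffn_keys.in_features
        # Original FFN projections, frozen so general LM ability is untouched.
        self.ffn_keys = ffn_keys
        self.ffn_values = ffn_values
        for p in self.parameters():
            p.requires_grad_(False)
        # Additional knowledgeable memory slots (keys and values), trainable.
        self.nkb_keys = nn.Linear(d_model, num_nkb_slots, bias=False)
        self.nkb_values = nn.Linear(num_nkb_slots, d_model, bias=False)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Memory addressing: activations over original FFN keys and NKB keys.
        scores = torch.cat([self.ffn_keys(x), self.nkb_keys(x)], dim=-1)
        weights = self.act(scores)
        d_ffn = self.ffn_values.in_features
        # Weighted sum of the corresponding value vectors from both memories.
        return (self.ffn_values(weights[..., :d_ffn])
                + self.nkb_values(weights[..., d_ffn:]))


if __name__ == "__main__":
    # Example: mount an NKB onto a toy FFN; only NKB weights remain trainable.
    layer = FFNWithNKB(nn.Linear(768, 3072), nn.Linear(3072, 768),
                       num_nkb_slots=256)
    print([n for n, p in layer.named_parameters() if p.requires_grad])
    # -> ['nkb_keys.weight', 'nkb_values.weight']
```

Because the stored knowledge lives in the NKB value vectors, directly editing rows of `nkb_values.weight` would correspond to the kind of knowledge updating the abstract mentions.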
