Entity Cloze By Date: What LMs Know About Unseen Entities

05/05/2022
by Yasumasa Onoe, et al.

Language models (LMs) are typically trained once on a large-scale corpus and then used for years without being updated. In a dynamic world, however, new entities constantly arise. We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained. We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity. We evaluate LMs' perplexity on masked spans within these sentences. We show that models that are better informed about the entities, such as those with access to a textual definition of them, achieve lower perplexity on this benchmark. Our experimental results demonstrate that making inferences about new entities remains difficult for LMs. Given its wide coverage of entity knowledge and its temporal indexing, our dataset can be used to evaluate LMs and techniques designed to modify or extend their knowledge. Our automatic data collection pipeline can easily be used to continually update our benchmark.
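The two ideas in the abstract, selecting entities by origination date and scoring a masked span by perplexity, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The `Entity` class, the function names, the example entities, and the cutoff date are all hypothetical, and the per-token normalization of the span perplexity is an assumption rather than something the abstract specifies.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Entity:
    """Hypothetical record: an entity name plus its origination date."""
    name: str
    origination: date


def unseen_entities(entities, pretraining_cutoff):
    """Keep entities that originated strictly after the assumed cutoff date."""
    return [e for e in entities if e.origination > pretraining_cutoff]


def span_perplexity(token_log_probs):
    """Perplexity of a masked span, given the natural-log probability the
    model assigns to each gold token in the span.

    Assumption: perplexity is normalized per token, i.e.
    exp(-(1/n) * sum(log p_i)).
    """
    if not token_log_probs:
        raise ValueError("span must contain at least one token")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Made-up entities and cutoff, for illustration only.
entities = [
    Entity("Example Entity A", date(2018, 3, 1)),
    Entity("Example Entity B", date(2021, 7, 15)),
]
new_entities = unseen_entities(entities, pretraining_cutoff=date(2019, 12, 31))

# A span whose tokens each receive probability 0.5 has perplexity 2.
ppl = span_perplexity([math.log(0.5)] * 4)
```

In practice the log-probabilities would come from the LM under evaluation. Lower span perplexity indicates that the model finds the masked text about the entity easier to predict.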


