E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks

11/10/2020
by Nikolaos Stylianou, et al.

In the last decade, the field of Neural Language Modelling has witnessed enormous change, driven by novel models built on Transformer architectures. Even these models, however, struggle to model long sequences due to memory constraints and growing computational complexity. Coreference annotations over the training data can provide context far beyond the modelling limitations of such language models. In this paper we present an extension of the Transformer-block architecture used in neural language models, specifically in GPT2, that incorporates entity annotations during training. Our model, GPT2E, extends the Transformer-layer architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present. We thereby obtain richer representations for entity mentions, at negligible training cost. We compare GPT2 and GPT2E in terms of perplexity on the CoNLL 2012 and LAMBADA datasets, and examine the key differences in the resulting entity representations and their effect on downstream tasks such as Named Entity Recognition. Furthermore, our approach can be adopted by the majority of Transformer-based language models.
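The abstract describes extending a Transformer block so that tokens inside coreference mentions can draw on a shared, running entity representation. The paper's exact mechanism is not given here, so the following is only a minimal sketch of the general idea, with every name (`entity_transformer_block`, `entity_memory`, the residual gating and the memory-update rule) being an illustrative assumption rather than the authors' design:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, Wq, Wk, Wv):
    # single-head scaled dot-product self-attention over a (T, d) sequence
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def entity_transformer_block(h, entity_ids, entity_memory, Wq, Wk, Wv, We):
    """Hypothetical Entity-Transformer block (sketch, not the paper's code).

    h             : (T, d) token hidden states
    entity_ids[t] : coreference cluster id of token t, or -1 if the token
                    is not part of any annotated mention
    entity_memory : (E, d) running representation per entity cluster
    """
    # standard Transformer sub-layer: self-attention with a residual connection
    h = h + self_attention(h, Wq, Wk, Wv)

    out = h.copy()
    for t, eid in enumerate(entity_ids):
        if eid >= 0:
            # enrich mention tokens with their entity's running representation
            out[t] = h[t] + entity_memory[eid] @ We
            # update the entity memory with the new mention state
            # (simple moving average; the actual update rule is an assumption)
            entity_memory[eid] = 0.5 * (entity_memory[eid] + h[t])
    return out, entity_memory

# toy usage: 5 tokens, 8-dim states, two coreference clusters
rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=(5, d))
Wq, Wk, Wv, We = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
entity_ids = [-1, 0, 0, -1, 1]          # tokens 1-2 mention entity 0, token 4 entity 1
entity_memory = np.zeros((2, d))
out, entity_memory = entity_transformer_block(h, entity_ids, entity_memory,
                                              Wq, Wk, Wv, We)
```

The key property the sketch illustrates is the one the abstract emphasizes: when no coreference annotation is present (`entity_ids[t] == -1`), the block reduces to an ordinary Transformer sub-layer, so the extension degrades gracefully on unannotated text.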
