Temporal Attention for Language Models

02/04/2022
by Guy D. Rosin, et al.

Pretrained language models based on the transformer architecture have shown great success in NLP. Textual training data often comes from the web and is thus tagged with time-specific information, but most language models ignore this information: they are trained on the text alone, which limits their ability to generalize temporally. In this work, we extend the key component of the transformer architecture, the self-attention mechanism, and propose temporal attention, a time-aware self-attention mechanism. Temporal attention can be applied to any transformer model and requires only that the input texts be accompanied by their relevant time points. It allows the transformer to capture this temporal information and create time-specific contextualized word representations. We leverage these representations for the task of semantic change detection: we apply our proposed mechanism to BERT and experiment on three datasets in different languages (English, German, and Latin) that also vary in time span, size, and genre. Our proposed model achieves state-of-the-art results on all the datasets.
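
The abstract does not give the exact formulation of temporal attention, so the following is only a minimal sketch of what a time-aware self-attention layer could look like in PyTorch: each input sequence is tagged with a discrete time-point index, a learned time embedding conditions the queries and keys, and the rest is standard scaled dot-product attention. The class name, the per-sequence time_ids argument, and the multiplicative conditioning are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch of a time-aware self-attention layer (PyTorch).
# Assumption: each sequence carries one discrete time point (e.g. a year index),
# whose learned embedding is folded into the attention scores so that the
# resulting contextualized representations become time-specific.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAwareSelfAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_time_points: int):
        super().__init__()
        self.q_proj = nn.Linear(hidden_dim, hidden_dim)
        self.k_proj = nn.Linear(hidden_dim, hidden_dim)
        self.v_proj = nn.Linear(hidden_dim, hidden_dim)
        # One learned embedding per discrete time point.
        self.time_emb = nn.Embedding(num_time_points, hidden_dim)
        self.scale = math.sqrt(hidden_dim)

    def forward(self, x: torch.Tensor, time_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim); time_ids: (batch,) one time point per sequence
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Condition queries and keys on the sequence's time point, so the
        # attention weights depend on when the text was written.
        t = self.time_emb(time_ids).unsqueeze(1)            # (batch, 1, hidden_dim)
        scores = torch.matmul(q * t, (k * t).transpose(-2, -1)) / self.scale
        weights = F.softmax(scores, dim=-1)
        return torch.matmul(weights, v)

# Usage: a batch of 2 sequences, each associated with a different time point.
layer = TimeAwareSelfAttention(hidden_dim=768, num_time_points=10)
x = torch.randn(2, 16, 768)
time_ids = torch.tensor([0, 7])
out = layer(x, time_ids)
print(out.shape)  # torch.Size([2, 16, 768])
```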

Related research

06/08/2020 - Linformer: Self-Attention with Linear Complexity
Large transformer models have shown extraordinary success in achieving s...

11/07/2019 - Explicit Pairwise Word Interaction Modeling Improves Pretrained Transformers for English Semantic Similarity Tasks
In English semantic similarity tasks, classic word embedding-based appro...

11/07/2022 - How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
The attention mechanism is considered the backbone of the widely-used Tr...

10/04/2022 - Memory in humans and deep language models: Linking hypotheses for model augmentation
The computational complexity of the self-attention mechanism in Transfor...

10/12/2021 - Time Masking for Temporal Language Models
Our world is constantly evolving, and so is the content on the web. Cons...

06/23/2023 - Knowledge-Infused Self Attention Transformers
Transformer-based language models have achieved impressive success in va...

04/07/2022 - Accelerating Attention through Gradient-Based Learned Runtime Pruning
Self-attention is a key enabler of state-of-art accuracy for various tra...
