TimeBERT: Enhancing Pre-Trained Language Representations with Temporal Information

04/27/2022
by Jiexin Wang, et al.

Time is an important aspect of text documents. It has been widely exploited in natural language processing and has a strong influence in, for example, temporal information retrieval, where the temporal information of queries or documents needs to be identified for relevance estimation. Event-related tasks like event ordering, which aims to order events by their occurrence time, also need to determine the temporal information of events. In this work, we investigate methods for incorporating temporal information during pre-training to further improve performance on time-related tasks. In contrast to BERT, which uses synchronic document collections (BooksCorpus and English Wikipedia) as its training corpora, we use a long-span temporal news collection for building word representations, since temporal information constitutes one of the most significant features of news articles. We then introduce TimeBERT, a novel language representation model trained on a temporal collection of news articles via two new pre-training tasks, which harness two distinct temporal signals to construct time-aware language representations. The experimental results show that TimeBERT consistently outperforms BERT and other existing pre-trained models, with substantial gains on different downstream NLP tasks and applications for which time is of importance.


