Training Temporal Word Embeddings with a Compass

06/05/2019
by Valerio Di Carlo, et al.

Temporal word embeddings have been proposed to support the analysis of word meaning shifts over time and to study the evolution of languages. Different approaches have been proposed to generate vector representations of words that embed their meaning during a specific time interval. However, the training process used in these approaches is complex, may be inefficient, or may require large text corpora. As a consequence, these approaches may be difficult to apply in resource-scarce domains or by scientists without in-depth knowledge of embedding models. In this paper, we propose a new heuristic to train temporal word embeddings based on the Word2vec model. The heuristic consists of using atemporal vectors as a reference, i.e., as a compass, when training the representations specific to a given time interval. The use of the compass simplifies the training process and makes it more efficient. Experiments conducted using state-of-the-art datasets and methodologies suggest that our approach outperforms or equals comparable approaches while being more robust in terms of the required corpus size.
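A minimal sketch of the compass idea described above, assuming skip-gram with negative sampling (the abstract only says the heuristic is based on Word2vec; the authors' exact architecture and training details may differ). The key point is that one of Word2vec's two matrices is trained once on the whole corpus and then frozen as the shared "compass", so the vectors trained separately on each time slice land in a common, directly comparable space. The function name `train_sgns` and all hyperparameters here are illustrative, not from the paper:

```python
import numpy as np

def train_sgns(corpus, vocab, dim=50, epochs=5, lr=0.05, window=2,
               neg=5, ctx=None, rng=None):
    """Minimal skip-gram with negative sampling.

    If `ctx` is given, the context matrix is frozen (it acts as the
    'compass') and only the target matrix is updated.
    """
    rng = rng or np.random.default_rng(0)
    V = len(vocab)
    W = rng.normal(0, 0.1, (V, dim))  # target vectors (always trained)
    C = ctx if ctx is not None else rng.normal(0, 0.1, (V, dim))
    freeze = ctx is not None
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        for sent in corpus:
            ids = [vocab[w] for w in sent]
            for i, t in enumerate(ids):
                lo, hi = max(0, i - window), min(len(ids), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # one positive pair plus `neg` random negatives
                    pairs = [(ids[j], 1.0)]
                    pairs += [(int(rng.integers(V)), 0.0) for _ in range(neg)]
                    for s, label in pairs:
                        g = lr * (label - sigmoid(W[t] @ C[s]))
                        grad_w = g * C[s]          # use pre-update C
                        if not freeze:
                            C[s] += g * W[t]
                        W[t] += grad_w
    return W, C

# Toy corpus; in the temporal setting each slice would hold the
# documents written in one time interval.
sents = [["cell", "phone"], ["cell", "biology"], ["phone", "call"]]
vocab = {w: i for i, w in enumerate({w for s in sents for w in s})}

# 1) Compass: train atemporal vectors on the concatenated corpus.
_, compass = train_sgns(sents, vocab)

# 2) Per-slice models reuse the frozen compass, so their target
#    vectors are directly comparable across slices.
w_slice1, _ = train_sgns(sents[:2], vocab, ctx=compass.copy())
w_slice2, _ = train_sgns(sents[1:], vocab, ctx=compass.copy())
```

Because every slice is trained against the same frozen matrix, no post-hoc alignment step (e.g. an orthogonal Procrustes rotation between slices) is needed, which is what makes the training process simpler and cheaper.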


