Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

by Yixiao Wang, et al.

One long-standing challenge in lexical semantics is learning word representations that reflect the semantic properties of words. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts in which words are mentioned. In this paper, we propose a method for learning word representations that follows this basic strategy but differs from standard word embeddings in two important ways. First, we encode contexts with contextualized language models (CLMs) rather than with bags of word vectors. Second, rather than learning a single vector for each word directly, we use a topic model to partition the contexts in which words appear, and then learn a separate vector for each word in each topic. We then use a task-specific supervision signal to make a soft selection among the resulting vectors. We show that this simple strategy yields high-quality word vectors that are more predictive of semantic properties than word embeddings and existing CLM-based strategies.
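As a rough illustration of the pipeline described in the abstract, the Python sketch that follows groups the contextualized vectors of one word's mentions by topic, averages each group into a topic-specific vector, and then combines those vectors with softmax attention weights. It assumes the mention vectors were already produced by a CLM and that each mention comes with a topic distribution from a topic model. The hard assignment of each mention to its most probable topic, the dot-product scoring, and the names `topic_specific_vectors`, `soft_select`, and `query` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def topic_specific_vectors(
    mention_vecs: Sequence[Vector],
    topic_dists: Sequence[Sequence[float]],
) -> Dict[int, Vector]:
    """Average one word's mention vectors separately for each topic.

    Assumption for this sketch: every mention is hard-assigned to the
    topic with the highest probability in its topic distribution.
    """
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for vec, dist in zip(mention_vecs, topic_dists):
        best_topic = max(range(len(dist)), key=lambda k: dist[k])
        groups[best_topic].append(vec)
    # Component-wise mean of the vectors assigned to each topic.
    return {
        topic: [sum(col) / len(vecs) for col in zip(*vecs)]
        for topic, vecs in groups.items()
    }


def soft_select(topic_vecs: Dict[int, Vector], query: Vector) -> Vector:
    """Weight topic-specific vectors by softmax(query . vector) and sum them.

    `query` is a hypothetical stand-in for a parameter vector that would be
    learned from the task-specific supervision signal.
    """
    topics = sorted(topic_vecs)
    scores = [sum(q * x for q, x in zip(query, topic_vecs[t])) for t in topics]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * topic_vecs[t][i] for w, t in zip(weights, topics))
        for i in range(len(query))
    ]


# Toy example: four mentions of one word with 2-d contextual vectors
# and distributions over two topics (all numbers are made up).
mentions = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
dists = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

per_topic = topic_specific_vectors(mentions, dists)
print(per_topic)                            # {0: [2.0, 0.0], 1: [0.0, 3.0]}
print(soft_select(per_topic, [0.0, 0.0]))   # [1.0, 1.5]
```

With an all-zero query the two topic vectors receive equal weight; a query aligned with one topic's vector shifts almost all of the weight to that topic. Hard assignment keeps the sketch short; weighting every mention by its topic probabilities would be an equally plausible variant.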




Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models

Learning vectors that capture the meaning of concepts remains a fundamen...

Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates

Semantic representations of words have been successfully extracted from ...

Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings

While the success of pre-trained language models has largely eliminated ...

Rehabilitation of Count-based Models for Word Vector Representations

Recent works on word representations mostly rely on predictive models. D...

Nonparametric Spherical Topic Modeling with Word Embeddings

Traditional topic models do not account for semantic regularities in lan...

Word embeddings for idiolect identification

The term idiolect refers to the unique and distinctive use of language o...

Semantic projection: recovering human knowledge of multiple, distinct object features from word embeddings

The words of a language reflect the structure of the human mind, allowin...
