Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

06/15/2021
by Yixiao Wang, et al.

One of the long-standing challenges in lexical semantics is to learn representations of words that reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs), rather than bags of word vectors, to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. Finally, we use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than standard word embeddings and existing CLM-based strategies.
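The abstract does not give implementation details, but the pipeline it outlines (encode each mention of a word with a CLM, partition the mentions by topic, average within each topic, then softly weight the resulting topic vectors) can be sketched roughly as follows. The choice of bert-base-uncased, the use of LDA as the topic model, the subtoken averaging, and the softmax-weighted combination noted at the end are all illustrative assumptions, not the paper's exact setup.

```python
# Rough sketch (not the paper's exact method): topic-specific word vectors
# from a contextualized language model, with topics assigned by LDA.
from collections import defaultdict

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def mention_vector(sentence: str, target: str) -> np.ndarray:
    """Average the CLM's final-layer states over the target word's subtokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        states = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Simple subsequence match to locate the target's subtokens in context.
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return states[i:i + len(target_ids)].mean(dim=0).numpy()
    return states.mean(dim=0).numpy()                       # fallback: sentence average


def topic_specific_vectors(word: str, sentences: list, n_topics: int = 5) -> dict:
    """Partition the word's mention contexts with LDA, then average per topic."""
    counts = CountVectorizer(stop_words="english").fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(counts)                  # (n_sentences, n_topics)
    by_topic = defaultdict(list)
    for sent, topic_dist in zip(sentences, doc_topics):
        by_topic[int(topic_dist.argmax())].append(mention_vector(sent, word))
    return {t: np.mean(vecs, axis=0) for t, vecs in by_topic.items()}


# A task-specific soft selection could then combine the topic vectors with
# learned, softmax-normalised weights:  v_word = sum_t a_t * v_{word, t}.
```

In the paper itself the selection weights come from a task-specific supervision signal; the softmax combination indicated in the closing comment is only meant to show the general form of such a soft selection.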

Related research

Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models (05/16/2023)
Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates (10/18/2019)
Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings (12/04/2020)
Rehabilitation of Count-based Models for Word Vector Representations (12/16/2014)
Nonparametric Spherical Topic Modeling with Word Embeddings (04/01/2016)
Word embeddings for idiolect identification (02/10/2019)
Semantic projection: recovering human knowledge of multiple, distinct object features from word embeddings (02/05/2018)
