Contextually Propagated Term Weights for Document Representation

06/03/2019
by Casper Hansen, et al.

Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score, or term weight, given to the predicted word. We present a novel model that, given a target word, redistributes part of that word's weight (computed with word embeddings) across words occurring in contexts similar to those of the target word. Our model thus aims to simulate how semantic meaning is shared by words occurring in similar contexts, and this shared meaning is incorporated into bag-of-words document representations. Experimental evaluation in an unsupervised setting against 8 state-of-the-art baselines shows that our model yields the best micro and macro F1 scores across datasets of increasing difficulty.
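
The abstract does not give the propagation formula, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes that "similar contexts" can be approximated by cosine similarity between embedding vectors, and that a fixed fraction `alpha` of each term's weight is shared among its `k` most similar terms in proportion to that similarity. The function names, `alpha`, `k`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_weights(weights, vectors, alpha=0.3, k=2):
    """Share a fraction `alpha` of each term's weight with its k nearest terms.

    weights: term -> weight for the terms of one document (assumed given).
    vectors: term -> embedding vector for the whole vocabulary (assumed given).
    Every term in `weights` must also appear in `vectors`.
    """
    out = {term: 0.0 for term in vectors}
    for term, w in weights.items():
        # Rank all other vocabulary terms by similarity; keep positive ones only.
        sims = [(cosine(vectors[term], vectors[other]), other)
                for other in vectors if other != term]
        top = sorted((s, o) for s, o in sims if s > 0)[::-1][:k]
        total = sum(s for s, _ in top)
        if total == 0:
            out[term] += w  # no similar term found: keep the full weight
            continue
        out[term] += (1 - alpha) * w
        for s, other in top:
            out[other] += alpha * w * s / total  # share proportional to similarity
    return out


# Toy vocabulary with made-up 2-d embeddings (hypothetical values).
vectors = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "banana": [-1.0, 0.5],
}
doc_weights = {"car": 1.0}  # the document mentions only "car"
propagated = propagate_weights(doc_weights, vectors)
```

In this toy case "automobile" receives a non-zero weight even though it does not occur in the document, while the total weight is unchanged. This is the effect the abstract describes at a high level; the actual weighting scheme should be taken from the full paper.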

