Relevance-based Word Embedding

05/09/2017
by Hamed Zamani, et al.

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance rather than term proximity, syntactic similarity, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we use over six million unique queries and the top-ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.
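As a rough illustration of the first model described above (learning a relevance distribution over the vocabulary for each query), the following NumPy sketch fits word vectors so that a softmax over the vocabulary, conditioned on the mean embedding of a query's terms, matches a pseudo-relevance target distribution. This is a minimal toy reconstruction under stated assumptions, not the authors' implementation: the vocabulary size, embedding dimension, learning rate, target distribution, and all function names are illustrative.

```python
# Toy sketch of a relevance-distribution embedding objective.
# Assumption: the target distribution would come from pseudo-relevance
# feedback over top-retrieved documents; here it is hard-coded.
import numpy as np

rng = np.random.default_rng(0)

V, D = 10, 8                                 # toy vocabulary size, embedding dim
W_in = rng.normal(scale=0.1, size=(V, D))    # query-side term embeddings
W_out = rng.normal(scale=0.1, size=(V, D))   # vocabulary-side term embeddings

def softmax(z):
    z = z - z.max()                          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def train_step(query_terms, target_dist, lr=0.1):
    """One gradient step on the cross-entropy between the model's
    softmax distribution over the vocabulary and target_dist."""
    q_vec = W_in[query_terms].mean(axis=0)   # mean query-term embedding
    p = softmax(W_out @ q_vec)               # model distribution over vocab
    err = p - target_dist                    # dLoss/dlogits for cross-entropy
    grad_q = W_out.T @ err                   # gradient w.r.t. the query vector
    W_out[:] -= lr * np.outer(err, q_vec)    # update vocabulary-side vectors
    W_in[query_terms] -= lr * grad_q / len(query_terms)  # update query terms
    return float(-(target_dist * np.log(p + 1e-12)).sum())

# Toy pseudo-relevance target: terms 3 and 4 dominate for query [0, 1].
target = np.full(V, 0.01)
target[3], target[4] = 0.5, 0.4
target /= target.sum()

for _ in range(200):
    loss = train_step([0, 1], target)
print(f"cross-entropy after training: {loss:.3f}")
```

The second model in the abstract would swap this listwise softmax objective for a per-term binary classification loss (relevant vs. non-relevant for the query); in both cases the weak supervision comes from treating top-retrieved documents as relevant.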


Related research

07/20/2017 · Toward Incorporation of Relevant Documents in word2vec
Recent advances in neural word embedding provide significant benefit to ...

10/18/2022 · On the Information Content of Predictions in Word Analogy Tests
An approach is proposed to quantify, in bits of information, the actual ...

11/06/2015 · Towards a Better Understanding of Predict and Count Models
In a recent paper, Levy and Goldberg pointed out an interesting connecti...

08/18/2022 · Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion
To recommend relevant merchandises for seasonal retail events, we rely o...

06/20/2016 · Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity
Word embedding, specially with its recent developments, promises a quant...

11/14/2019 · Query Expansion for Patent Searching using Word Embedding and Professional Crowdsourcing
The patent examination process includes a search of previous work to ver...

04/21/2020 · Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance
The search of information in large text repositories has been plagued by...
