A data-driven strategy to combine word embeddings in information retrieval

05/26/2021
by Alfredo Silva, et al.

Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. Representing queries is one of the most critical challenges in this area because a query consists of only a few terms and therefore has little descriptive capacity. Strategies such as averaging word embeddings can enrich a query's descriptive capacity, since the continuous vector representations underlying these approaches favor the identification of related terms. We propose a data-driven strategy to combine word embeddings. We use IDF (inverse document frequency) combinations of embeddings to represent queries and show that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word embeddings are a promising line of research in ad-hoc information retrieval.
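
To make the contrast between the two query representations concrete, here is a minimal Python sketch of a plain average versus an IDF-weighted average of term embeddings. The abstract does not give the exact combination used in the paper, so everything here is an assumption for illustration: the smoothed IDF formula, the normalization by the sum of weights, the function names, and the toy corpus and two-dimensional vectors.

```python
import math
from collections import Counter


def idf_table(documents):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1 (assumed variant)."""
    n_docs = len(documents)
    # Count each term once per document to get document frequencies.
    df = Counter(term for doc in documents for term in set(doc))
    return {t: math.log((n_docs + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def average_embedding(terms, embeddings):
    """Unweighted mean of the embeddings of the in-vocabulary query terms."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def idf_weighted_embedding(terms, embeddings, idf):
    """Mean of term embeddings weighted by IDF, normalized by the weight sum."""
    pairs = [(idf.get(t, 0.0), embeddings[t]) for t in terms if t in embeddings]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None
    dim = len(pairs[0][1])
    return [sum(w * v[i] for w, v in pairs) / total for i in range(dim)]


# Hypothetical toy corpus and embeddings, not data from the paper.
documents = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
embeddings = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "dog": [0.0, -1.0]}

idf = idf_table(documents)
query = ["the", "cat"]
plain = average_embedding(query, embeddings)
weighted = idf_weighted_embedding(query, embeddings, idf)
print(plain, weighted)
```

In this toy setup "the" occurs in every document and so receives the lowest IDF, which pulls the weighted query vector toward the rarer term "cat" compared with the plain average.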

