Improving average ranking precision in user searches for biomedical research datasets

09/10/2017
by Douglas Teodoro, et al.

The availability of research datasets is a keystone of study reproducibility and scientific progress in the health and life sciences. Because these data are heterogeneous and complex, a major challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus of 800k datasets and 21 annotated user queries. Our system provides competitive results compared with the other challenge participants. In the official run, it achieved the highest infAP among the participants, +22.3% [...] best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method had a positive impact on the system's performance, improving our baseline by up to +5.0% [...]. Our similarity measure algorithm seems to be robust, in particular when compared with the Divergence From Randomness (DFR) framework, showing smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, data-driven query expansion methods could offer an alternative to the complexity of biomedical terminologies.
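The abstract names three ranking components: embedding-based query expansion, a similarity measure that weights query terms by relevance, and a category boost for datasets matching query constraints. The sketch below is not the authors' implementation; it only illustrates these general techniques under stated assumptions. The toy `EMBEDDINGS` and `DATASETS` tables are hypothetical, a BM25-style IDF with saturated term frequency stands in for the paper's own term-relevance similarity measure, and the multiplicative `boost` parameter is a hypothetical choice for the categorisation step.

```python
import math
from collections import Counter

# Hypothetical toy word vectors; a real system would use embeddings
# trained on a large biomedical corpus.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "mouse": [0.1, 0.9, 0.0],
    "murine": [0.12, 0.88, 0.05],
    "rna": [0.0, 0.1, 0.95],
}

# Hypothetical toy dataset descriptions, each with one assigned category.
DATASETS = [
    {"id": "D1", "text": "tumor rna expression in murine model", "category": "mouse"},
    {"id": "D2", "text": "cancer cohort clinical survey", "category": "human"},
    {"id": "D3", "text": "rna sequencing of yeast", "category": "yeast"},
]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_query(terms, k=1, min_sim=0.95):
    """Add up to k embedding neighbours per term, weighted by cosine similarity.

    Original terms keep weight 1.0, so expansion terms never outweigh them.
    """
    weights = {t: 1.0 for t in terms}
    for t in terms:
        if t not in EMBEDDINGS:
            continue
        # Rank candidate words not already in the query by similarity to t.
        candidates = sorted(
            ((cosine(EMBEDDINGS[t], vec), word)
             for word, vec in EMBEDDINGS.items() if word not in weights),
            reverse=True,
        )
        for sim, word in candidates[:k]:
            if sim >= min_sim:
                weights[word] = sim
    return weights


def idf(term, docs):
    """BM25-style IDF, kept positive by the +1 inside the logarithm."""
    df = sum(1 for d in docs if term in d)
    n = len(docs)
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)


def rank(query_terms, datasets, required_category=None, boost=0.5, k1=1.2):
    """Score datasets by weighted, IDF-scaled, saturated term frequency.

    Datasets whose category equals required_category get score * (1 + boost).
    """
    weights = expand_query(query_terms)
    docs = [Counter(d["text"].split()) for d in datasets]
    results = []
    for d, tf in zip(datasets, docs):
        score = sum(
            w * idf(t, docs) * tf[t] / (tf[t] + k1)
            for t, w in weights.items() if tf[t] > 0
        )
        if required_category is not None and d["category"] == required_category:
            score *= 1.0 + boost
        results.append((d["id"], score))
    # Highest score first; ties keep input order because sorted() is stable.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, `rank(["cancer", "mouse"], DATASETS, required_category="mouse")` expands the query with "tumor" and "murine", so D1 matches even though it shares no literal term with the original query. Keeping the original terms at weight 1.0 and thresholding neighbours with `min_sim` is one simple way to limit query drift; the paper's actual weighting and thresholds are not given in the abstract.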

10/08/2018

A Vertical PRF Architecture for Microblog Search

In microblog retrieval, query expansion can be essential to obtain good ...
02/22/2022

Query Expansion and Entity Weighting for Query Reformulation Retrieval in Voice Assistant Systems

Voice assistants such as Alexa, Siri, and Google Assistant have become i...
11/08/2018

Deep Neural Networks for Query Expansion using Word Embeddings

Query expansion is a method for alleviating the vocabulary mismatch prob...
08/28/2018

Automated Query Expansion using High Dimensional Clustering

The exponential growth of information on the Internet has created a big ...
11/17/2019

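Which training corpora for query expansion with word embeddings: application to cultural microblog search (Quels corpus d'entraînement pour l'expansion de requêtes par plongement de mots : application à la recherche de microblogs culturels)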

We describe here an experimental framework and the results obtained on m...
06/07/2020

Interactive Extractive Search over Biomedical Corpora

We present a system that allows life-science researchers to search a lin...
