TripClick: The Log Files of a Large Health Web Search Engine

03/14/2021
by Navid Rekabsaz, et al.

Click logs are valuable resources for a variety of information retrieval (IR) tasks, including query understanding and analysis as well as learning effective IR models, particularly when the models require large amounts of training data. We release a large-scale domain-specific dataset of click logs, obtained from user interactions with the Trip Database health web search engine. Our click log dataset comprises approximately 5.2 million user interactions collected between 2013 and 2020. We use this dataset to create a standard IR evaluation benchmark, TripClick, with around 700,000 unique free-text queries and 1.3 million pairs of query-document relevance signals, where relevance is estimated by two click-through models. As such, the collection is one of the few datasets offering the necessary data richness and scale to train neural IR models with a large number of parameters, and notably the first in the health domain. Using TripClick, we conduct experiments to evaluate a variety of IR models, showing the benefits of exploiting this data to train neural architectures. In particular, the evaluation results show that the best-performing neural IR model significantly outperforms classical IR models, by a large margin, especially for more frequent queries.
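
To make the idea of estimating relevance from clicks concrete, the following is a minimal sketch of one simple click-through estimate: the fraction of impressions of a document for a query that led to a click. It is an illustration only and is not claimed to match either of the two click-through models used for TripClick. The session layout (query, shown document IDs, clicked document IDs) and the function name `click_through_rates` are assumptions made for this example.

```python
from collections import Counter


def click_through_rates(sessions):
    """Estimate a relevance signal per (query, doc) pair from click logs.

    `sessions` is a hypothetical log format assumed for this sketch: an
    iterable of (query, shown_doc_ids, clicked_doc_ids) tuples.
    The score is clicks divided by impressions for each pair.
    """
    shown = Counter()
    clicked = Counter()
    for query, shown_docs, clicked_docs in sessions:
        for doc in shown_docs:
            shown[(query, doc)] += 1
        # Count at most one click per document per session, and ignore
        # clicks on documents that were not listed as shown.
        for doc in set(clicked_docs) & set(shown_docs):
            clicked[(query, doc)] += 1
    return {pair: clicked[pair] / count for pair, count in shown.items()}


# Toy log with made-up queries and document IDs.
toy_sessions = [
    ("asthma treatment", ["d1", "d2"], ["d1"]),
    ("asthma treatment", ["d1", "d2"], ["d1"]),
    ("asthma treatment", ["d1", "d3"], ["d3"]),
]
rates = click_through_rates(toy_sessions)
```

A ratio like this is only as reliable as the number of impressions behind it, so pairs seen once or twice give noisy estimates. That is one reason frequent queries tend to carry stronger signals than rare ones.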

research
11/01/2020

CURE: Collection for Urdu Information Retrieval Evaluation and Ranking

Urdu is a widely spoken language with 163 million speakers worldwide acr...
research
06/19/2018

End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings

We consider the problem of retrieving and ranking items in an eCommerce ...
research
12/21/2020

Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Neural networks with deep architectures have demonstrated significant pe...
research
01/21/2022

Reproducing Personalised Session Search over the AOL Query Log

Despite its troubled past, the AOL Query Log continues to be an importan...
research
06/09/2020

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

Users of Web search engines reveal their information needs through queri...
research
07/01/2017

An Approach for Weakly-Supervised Deep Information Retrieval

Recent developments in neural information retrieval models have been pro...
research
04/29/2019

On the Effect of Low-Frequency Terms on Neural-IR Models

Low-frequency terms are a recurring challenge for information retrieval ...
