TripJudge: A Relevance Judgement Test Collection for TripClick Health Retrieval

08/14/2022
by Sophia Althammer, et al.

Robust test collections are crucial for Information Retrieval research. Recently, there has been growing interest in evaluating retrieval systems for domain-specific retrieval tasks; however, these tasks often lack a reliable test collection with human-annotated relevance assessments following the Cranfield paradigm. In the medical domain, the TripClick collection was recently proposed, which contains click log data from the Trip search engine and includes two click-based test sets. However, the clicks are biased towards the retrieval model used, which remains unknown, and a previous study shows that the test sets have low judgement coverage for the top-10 results of lexical and neural retrieval models. In this paper, we present TripJudge, a novel relevance judgement test collection for TripClick health retrieval. We collect relevance judgements in an annotation campaign and ensure the quality and reusability of TripJudge by using a variety of ranking methods for pool creation, by collecting multiple judgements per query-document pair, and by achieving at least moderate inter-annotator agreement. We compare system evaluation with TripJudge and TripClick and find that click-based and judgement-based evaluation can lead to substantially different system rankings.
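
Two quantities mentioned in the abstract, judgement coverage of the top-10 results and agreement between system rankings, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, document IDs, system names, and scores are all hypothetical, and Kendall's tau is used here only as one common way to compare two system rankings (the abstract does not say which measure the paper uses).

```python
from itertools import combinations


def judgement_coverage(ranking, judged_docs, k=10):
    """Fraction of the top-k retrieved documents that have a relevance judgement."""
    top_k = ranking[:k]
    if not top_k:
        return 0.0
    return sum(doc in judged_docs for doc in top_k) / len(top_k)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system orderings, given per-system scores.

    Both dicts must contain the same system names. Tied pairs count as
    neither concordant nor discordant.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical toy data, for illustration only.
ranking = ["d1", "d2", "d3", "d4"]
judged_docs = {"d1", "d3"}
coverage = judgement_coverage(ranking, judged_docs, k=4)

click_scores = {"bm25": 0.30, "dense": 0.25, "rerank": 0.35}
judged_scores = {"bm25": 0.20, "dense": 0.30, "rerank": 0.35}
tau = kendall_tau(click_scores, judged_scores)
```

A tau of 1 would mean both evaluations order the systems identically; lower values indicate that the two kinds of evaluation disagree on more system pairs.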
