Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

01/21/2022
by   Gabriella Kazai, et al.

Traditional information retrieval (IR) ranking models process the full text of documents. Newer Transformer-based models, however, incur a high computational cost on long texts and therefore typically process only snippets of a document. A model input built from a document's URL, title, and snippet (UTS) is akin to the summary shown on a search engine results page (SERP) to help searchers decide which result to click. This raises the questions of when such summaries suffice for relevance estimation, whether by a ranking model or a human assessor, and whether humans and machines benefit from the document's full text in similar ways. To answer these questions, we study human and neural-model relevance assessments on 12k query-document pairs sampled from Bing's search logs. We compare how relevance assessments change when assessors see only the document summaries versus when they are also shown the full text, across a range of query and document properties, e.g., query type and snippet length. Our findings show that the full text benefits both humans and a BERT model for similar query and document types, e.g., tail and long queries. A closer look, however, reveals that humans and machines respond to the additional input in very different ways. Adding the full text can also hurt the ranker's performance, e.g., on navigational queries.
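The UTS-style model input described above can be sketched as a simple preprocessing step. The field separator, whitespace tokenization, and length budget below are illustrative assumptions, not the paper's exact configuration; a real system would use the model's own subword tokenizer.

```python
def build_uts_input(url, title, snippet, max_tokens=128):
    """Build a SERP-summary-like input sequence from URL, title, and snippet.

    This mimics how a Transformer ranker might consume a document summary
    (UTS) instead of the full text. The [SEP] marker and the max_tokens
    budget are illustrative choices, not the paper's configuration.
    """
    # Keep only non-empty fields and join them with a separator token.
    fields = [url, title, snippet]
    text = " [SEP] ".join(f.strip() for f in fields if f)
    # Crude whitespace tokenization as a stand-in for a real subword
    # tokenizer; truncate to the model's input budget.
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


example = build_uts_input(
    "https://example.com/page",
    "Example Page Title",
    "A short snippet extracted from the document body.",
)
print(example)
```

The full-text condition studied in the paper would instead append (a truncated portion of) the document body to this sequence, which is exactly where the computational cost of long inputs arises.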


