Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets

03/14/2018
by Xinzhi Han, et al.
With the rapid advance of the Internet, search engines (e.g., Google, Bing, Yahoo!) are used by billions of users every day. The main function of a search engine is to locate the webpages most relevant to what the user requests. This report focuses on the core problem of information retrieval: how to learn the relevance between a document (very often a webpage) and a query given by the user. Our analysis consists of two parts: 1) we use standard statistical methods to select important features among 137 candidates provided by information retrieval researchers from Microsoft. We find that not all of the features are useful, and we give interpretations of the top-selected features; 2) we give prediction baselines on the real-world dataset MSLR-WEB using various learning algorithms. We find that boosted trees and random forests generally achieve the best predictive performance. This agrees with the mainstream opinion in the information retrieval community that tree-based algorithms outperform the other candidates for this problem.


