Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets

03/14/2018
by Xinzhi Han, et al.

With the rapid advance of the Internet, search engines (e.g., Google, Bing, Yahoo!) are used by billions of users each day. The main function of a search engine is to locate the webpages most relevant to a user's request. This report focuses on the core problem of information retrieval: how to learn the relevance between a document (very often a webpage) and a query given by the user. Our analysis consists of two parts: 1) we use standard statistical methods to select important features among 137 candidates provided by information retrieval researchers at Microsoft. We find that not all of the features are useful, and we interpret the top-selected features; 2) we give prediction baselines on the real-world dataset MSLR-WEB using various learning algorithms. We find that boosted trees and random forests generally achieve the best predictive performance. This agrees with the mainstream opinion in the information retrieval community that tree-based algorithms outperform the other candidates for this problem.
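
To make the feature-selection step concrete, here is a minimal sketch, not the authors' procedure. It assumes the LibSVM-style line layout used by the MSLR-WEB files (`label qid:ID index:value ...`) and substitutes a simple filter, the absolute Pearson correlation between each feature and the relevance label, for the unspecified "standard statistical methods". The helper names (`parse_letor_line`, `pearson`, `rank_features`) and the toy rows are hypothetical, invented purely for illustration.

```python
import math


def parse_letor_line(line):
    """Parse 'label qid:ID idx:val ...' into (label, qid, {idx: val})."""
    # Drop any trailing '#' comment (defensive; some LETOR-style files have one).
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for token in tokens[2:]:
        idx, val = token.split(":", 1)
        features[int(idx)] = float(val)
    return label, qid, features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows):
    """Rank feature indices by |correlation with the relevance label|."""
    labels = [label for label, _, _ in rows]
    indices = sorted({i for _, _, feats in rows for i in feats})
    scores = {}
    for i in indices:
        # Assumption: a feature missing from a row is treated as 0.0.
        column = [feats.get(i, 0.0) for _, _, feats in rows]
        scores[i] = abs(pearson(column, labels))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy rows in the assumed format (made-up values, not taken from MSLR-WEB).
SAMPLE = [
    "2 qid:1 1:3 2:0.5 3:7",
    "0 qid:1 1:0 2:0.4 3:7",
    "1 qid:1 1:1 2:0.9 3:7",
    "3 qid:2 1:4 2:0.1 3:7",
    "0 qid:2 1:1 2:0.6 3:7",
]

rows = [parse_letor_line(line) for line in SAMPLE]
ranking = rank_features(rows)
for index, score in ranking:
    print(f"feature {index}: |r| = {score:.3f}")
```

A univariate filter like this ignores query grouping and interactions between features, so it only illustrates the idea that some features carry little signal. Comparing models would additionally require fitting each learner on the selected features and scoring it on held-out queries.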
