BanditRank: Learning to Rank Using Contextual Bandits

10/23/2019
by Phanideep Gampa, et al.

We propose an extensible deep learning method that uses reinforcement learning to train neural networks for offline ranking in information retrieval (IR). We call our method BanditRank because it treats ranking as a contextual bandit problem. In learning to rank for IR, current deep learning models are trained on objective functions that differ from the measures on which they are evaluated. Because most evaluation measures are discrete quantities, gradient descent algorithms cannot optimize them directly without an approximation. BanditRank bridges this gap by directly optimizing a task-specific measure, such as mean average precision (MAP), using gradient descent. Specifically, a contextual bandit whose action is to rank the input documents is trained with a policy gradient algorithm to directly maximize the reward. The reward can be a single measure, such as MAP, or a combination of several measures. The notion of ranking is also inherent in BanditRank, as it is in current listwise approaches. To evaluate the effectiveness of BanditRank, we conducted a series of experiments on datasets for three different tasks: web search, community question answering, and factoid question answering. BanditRank performed better than state-of-the-art methods on the question answering datasets. On the web search dataset, it performed better than four strong listwise baselines: LambdaMART, AdaRank, ListNet, and Coordinate Ascent.
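
The general idea of training a ranking policy with a policy gradient can be illustrated with a minimal sketch. This is not the authors' implementation: the linear scorer, the Plackett-Luce sampling of rankings, the running-average baseline, and all function names (`average_precision`, `sample_ranking`, `train`) are assumptions chosen for illustration. The policy samples a full ranking of the documents for one query (the bandit's action), receives average precision as the reward, and updates its weights with a REINFORCE-style gradient step.

```python
import math
import random


def average_precision(ranking, labels):
    """Average precision of a full ranking (list of doc indices), binary labels."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if labels[doc]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def sample_ranking(scores, rng):
    """Sample a permutation from a Plackett-Luce model over the scores.

    Returns the ranking and the gradient of its log-probability
    with respect to each score.
    """
    remaining = list(range(len(scores)))
    ranking = []
    grad = [0.0] * len(scores)
    while remaining:
        top = max(scores[i] for i in remaining)  # subtract max for stability
        weights = [math.exp(scores[i] - top) for i in remaining]
        z = sum(weights)
        probs = [w / z for w in weights]
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        # d log p(pick) / d score_j = 1[j == pick] - p_j over remaining docs
        for k, j in enumerate(remaining):
            grad[j] -= probs[k]
        grad[remaining[pick]] += 1.0
        ranking.append(remaining.pop(pick))
    return ranking, grad


def train(features, labels, steps=2000, lr=0.1, seed=0):
    """REINFORCE on one query with a linear scorer (illustrative assumption)."""
    rng = random.Random(seed)
    dim = len(features[0])
    w = [0.0] * dim
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
        ranking, grad_scores = sample_ranking(scores, rng)
        reward = average_precision(ranking, labels)  # the discrete IR measure
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for d in range(dim):
            # chain rule: scores are linear in w, so d score_j / d w_d = x_j[d]
            g = sum(gs * x[d] for gs, x in zip(grad_scores, features))
            w[d] += lr * advantage * g  # gradient ascent on expected reward
    return w


if __name__ == "__main__":
    # Toy query: hypothetical 2-dimensional features, docs 0 and 2 relevant.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [0.0, 0.8]]
    rels = [1, 0, 1, 0]
    print(train(feats, rels))
```

The reward enters the update only as a scalar multiplier on the gradient of the log-probability, so the measure itself never needs to be differentiable. Under the same assumptions, swapping `average_precision` for another measure, or for a weighted combination of measures, requires no other change.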

research
05/04/2017

Machine Comprehension by Text-to-Text Neural Question Generation

We propose a recurrent neural model that generates natural-language ques...
research
08/27/2018

A strong baseline for question relevancy ranking

The best systems at the SemEval-16 and SemEval-17 community question ans...
research
01/19/2022

Improving Biomedical Information Retrieval with Neural Retrievers

Information retrieval (IR) is essential in search engines and dialogue s...
research
02/02/2020

Safe Exploration for Optimizing Contextual Bandits

Contextual bandit problems are a natural fit for many information retrie...
research
04/12/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

In stochastic contextual bandit (SCB) problems, an agent selects an acti...
research
06/14/2019

Microsoft AI Challenge India 2018: Learning to Rank Passages for Web Question Answering with Deep Attention Networks

This paper describes our system for The Microsoft AI Challenge India 201...
research
08/31/2020

Optimize What You Evaluate With: A Simple Yet Effective Framework For Direct Optimization Of IR Metrics

Learning-to-rank has been intensively studied and has shown significantl...
