Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

02/24/2017
by Stephen N. Pallone, et al.

We analyze the problem of learning a single user's preferences in an active learning setting, sequentially and adaptively querying the user over a finite time horizon. Learning is conducted via choice-based queries, where the user selects her preferred option among a small subset of offered alternatives. These queries have been shown to be a robust and efficient way to learn an individual's preferences. We take a parametric approach and model the user's preferences through a linear classifier, using a Bayesian prior to encode our current knowledge of this classifier. The rate at which we learn depends on the alternatives offered at every time epoch. Under certain noise assumptions, we show that the Bayes-optimal policy for maximally reducing entropy of the posterior distribution of this linear classifier is a greedy policy, and that this policy achieves a linear lower bound when alternatives can be constructed from the continuum. Further, we analyze a different metric called misclassification error, proving that the performance of the optimal policy that minimizes misclassification error is bounded below by a linear function of differential entropy. Lastly, we numerically compare the greedy entropy reduction policy with a knowledge gradient policy under a number of scenarios, examining their performance under both differential entropy and misclassification error.
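
To make the greedy entropy-reduction idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a finite set of candidate classifiers ("particles") with weights standing in for the posterior, pairwise queries drawn from a finite list of alternatives, and a symmetric noise model in which the user picks her truly preferred option with probability 1 - eps. Under those assumptions the expected one-step reduction in (Shannon) entropy equals the entropy of the predicted response minus the binary entropy of eps, so the greedy step selects the pair whose predicted response is closest to a coin flip. All names here (`greedy_query`, `bayes_update`, `eps`) are hypothetical.

```python
import numpy as np


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) response."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def response_likelihood(thetas, x_a, x_b, eps):
    """P(user picks x_a | theta) for every particle, under symmetric noise eps."""
    prefers_a = thetas @ (x_a - x_b) > 0
    return np.where(prefers_a, 1 - eps, eps)


def greedy_query(thetas, weights, alternatives, eps):
    """Return the pair (i, j) with the largest expected entropy reduction."""
    best_pair, best_gain = None, -np.inf
    n = len(alternatives)
    for i in range(n):
        for j in range(i + 1, n):
            lik = response_likelihood(thetas, alternatives[i], alternatives[j], eps)
            p_a = float(weights @ lik)  # predictive probability of choosing i
            # Mutual information between the response and theta:
            # H(response) - H(response | theta), where the second term is h(eps).
            gain = binary_entropy(p_a) - binary_entropy(eps)
            if gain > best_gain:
                best_pair, best_gain = (i, j), gain
    return best_pair, best_gain


def bayes_update(thetas, weights, x_a, x_b, chose_a, eps):
    """Reweight the particles after observing the user's choice."""
    lik = response_likelihood(thetas, x_a, x_b, eps)
    if not chose_a:
        lik = 1 - lik
    posterior = weights * lik
    return posterior / posterior.sum()


if __name__ == "__main__":
    # Toy setup (assumed for illustration): two candidate classifiers, three alternatives.
    thetas = np.array([[1.0, 0.0], [0.0, 1.0]])
    weights = np.array([0.5, 0.5])
    alternatives = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
    pair, gain = greedy_query(thetas, weights, alternatives, eps=0.1)
    i, j = pair
    weights = bayes_update(thetas, weights, alternatives[i], alternatives[j], True, 0.1)
    print(pair, gain, weights)
```

The sketch enumerates a finite menu of pairs, whereas the abstract's linear bound concerns alternatives constructed from the continuum; the discrete particle posterior likewise replaces differential entropy with Shannon entropy.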

research
10/14/2022

Continuous-in-time Limit for Bayesian Bandits

This paper revisits the bandit problem in the Bayesian setting. The Baye...
research
03/22/2023

Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

A novel Policy Gradient (PG) algorithm, called Matryoshka Policy Gradien...
research
05/23/2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

We study human-in-the-loop reinforcement learning (RL) with trajectory p...
research
05/08/2020

Active Preference Learning using Maximum Regret

We study active preference learning as a framework for intuitively speci...
research
01/15/2021

Deciding What to Learn: A Rate-Distortion Approach

Agents that learn to select optimal actions represent a prominent focus ...
research
05/26/2023

Reputation-based Persuasion Platforms

In this paper, we introduce a two-stage Bayesian persuasion model in whi...
research
07/16/2014

Probabilistic Group Testing under Sum Observations: A Parallelizable 2-Approximation for Entropy Loss

We consider the problem of group testing with sum observations and noise...
