Finding Favourite Tuples on Data Streams with Provably Few Comparisons

07/06/2023
by Guangyi Zhang, et al.

One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely adopted approach is to ask the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively requesting user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that both the memory size and the number of requested comparisons are, in the worst case, logarithmic in n, where n is the total number of tuples. An important variant of the problem, which can help reduce human error in comparisons, allows users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the theoretical guarantees of our method are maintained for this variant. In addition, we show how to enhance existing pruning techniques in the literature by leveraging powerful tools from mathematical programming. Finally, we systematically evaluate all proposed algorithms on both synthetic and real-life datasets, examine their scalability, and demonstrate their superior performance over existing methods.
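The abstract does not spell out the algorithm itself, so the sketch below only illustrates the problem setting: a user answers pairwise comparisons, possibly declaring ties, and tuples arrive as a stream. It assumes a hidden linear utility u(x) = <w, x> and a tie threshold eps, both illustrative assumptions; `make_oracle` and `stream_favourite` are hypothetical names. This is a naive single-pass baseline, not the authors' method: it keeps one "champion" in O(1) memory but asks n - 1 comparisons, which is the cost the paper reduces to a number logarithmic in n.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Vector = Sequence[float]
Oracle = Callable[[Vector, Vector], int]


def make_oracle(weights: Vector, eps: float = 0.0) -> Oracle:
    """Hypothetical simulated user with a hidden linear utility u(x) = <weights, x>.

    Returns +1 if a is preferred, -1 if b is preferred, and 0 (a declared tie)
    when the two utilities differ by at most eps.
    """
    def compare(a: Vector, b: Vector) -> int:
        diff = sum(w * (x - y) for w, x, y in zip(weights, a, b))
        if abs(diff) <= eps:
            return 0
        return 1 if diff > 0 else -1
    return compare


def stream_favourite(stream: Iterable[Vector], compare: Oracle) -> Tuple[Optional[Vector], int]:
    """Naive single-pass baseline: keep one champion, compare each new tuple to it.

    Ties are resolved in favour of the current champion. Returns the final
    champion and the number of comparisons requested (n - 1 for n tuples).
    """
    best: Optional[Vector] = None
    comparisons = 0
    for t in stream:
        if best is None:
            best = t  # first tuple becomes the initial champion
            continue
        comparisons += 1
        if compare(t, best) > 0:  # replace only on a strict preference
            best = t
    return best, comparisons


if __name__ == "__main__":
    data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.3)]
    print(stream_favourite(data, make_oracle((0.5, 0.5))))
    print(stream_favourite(data, make_oracle((0.5, 0.5), eps=0.2)))
```

With eps = 0 the baseline returns (0.6, 0.6) after 3 comparisons. With eps = 0.2 the user declares (0.6, 0.6) a tie with (1.0, 0.0), so the champion stays (1.0, 0.0), whose utility of 0.5 is within eps of the best utility of 0.6. Because the champion changes only on a strict preference, its utility never decreases, so every tuple it beat or tied with has utility at most eps above the final answer.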


09/04/2018

A comparative study of top-k high utility itemset mining methods

High Utility Itemset (HUI) mining problem is one of the important proble...
02/16/2015

Clustering and Inference From Pairwise Comparisons

Given a set of pairwise comparisons, the classical ranking problem compu...
03/25/2023

Targeted Mining of Top-k High Utility Itemsets

Finding high-importance patterns in data is an emerging data mining task...
12/03/2019

Rank Aggregation via Heterogeneous Thurstone Preference Models

We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranke...
01/30/2013

Utility Elicitation as a Classification Problem

We investigate the application of classification techniques to utility e...
10/30/2021

TargetUM: Targeted High-Utility Itemset Querying

Traditional high-utility itemset mining (HUIM) aims to determine all hig...
11/26/2017

Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Multileaved comparison methods generalize interleaved comparison methods...
