Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions

02/24/2021
by Junyang Gao, et al.

One way to find interesting or exceptional records in instant-stamped temporal data is to consider their "durability": intuitively, how well they compare with other records that arrived earlier or later, and how long they retain their supremacy. For example, people are naturally fascinated by claims with long durability, such as: "On January 22, 2006, Kobe Bryant dropped 81 points against the Toronto Raptors. Since then, this scoring record has yet to be broken." In general, given a sequence of instant-stamped records, suppose that we can rank them by a user-specified scoring function f, which may combine multiple attributes of a record into a single score for ranking. This paper studies "durable top-k queries," which find records whose scores were within the top k among the records within a "durability window" of a given length, e.g., a 10-year window starting or ending at the timestamp of the record. The parameter k, the length of the durability window, and the parameters of the scoring function (which capture user preference) can all be given at query time. We illustrate why this problem formulation yields more meaningful answers in some practical situations than similar types of queries considered previously. We propose new algorithms for solving this problem and provide a comprehensive theoretical analysis of the complexity of the problem itself and of our algorithms. On real and synthetic datasets, our algorithms outperform various baselines by up to two orders of magnitude.
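To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms proposed in the paper; it only illustrates the semantics described above, under several assumptions: the durability window looks backward and is closed, i.e., it covers `[t - tau, t]` for a record stamped `t`; a record is in the top k if fewer than k records in its window score strictly higher; and the scoring function is a linear combination of attributes. The names `linear_score` and `durable_topk` are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# A record is (timestamp, attributes). This layout is an assumption of the sketch.
Record = Tuple[float, Sequence[float]]


def linear_score(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Build a scoring function f as a weighted sum of a record's attributes."""
    def f(attrs: Sequence[float]) -> float:
        return sum(w * a for w, a in zip(weights, attrs))
    return f


def durable_topk(
    records: List[Record],
    f: Callable[[Sequence[float]], float],
    k: int,
    tau: float,
) -> List[Record]:
    """Naive O(n^2) baseline: return every record that ranks within the top k
    among all records whose timestamps fall in [t - tau, t]."""
    result = []
    for t, attrs in records:
        s = f(attrs)
        # Count records in the look-back window that strictly beat this one.
        better = sum(
            1
            for t2, attrs2 in records
            if t - tau <= t2 <= t and f(attrs2) > s
        )
        if better < k:
            result.append((t, attrs))
    return result


# Toy data: one attribute per record (e.g., points scored in a game).
games = [(1, [30]), (2, [50]), (3, [40]), (4, [45]), (5, [20]), (6, [60])]
f = linear_score([1.0])

top1 = [t for t, _ in durable_topk(games, f, k=1, tau=2)]
top2 = [t for t, _ in durable_topk(games, f, k=2, tau=2)]
print(top1)  # timestamps of records that were the best in their window
print(top2)  # timestamps of records that were among the best two
```

Because k, tau, and the weights are all ordinary arguments, the sketch mirrors the point that every query parameter can be supplied at query time. Its quadratic cost is the kind of baseline behavior that motivates more efficient algorithms.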


