Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization

07/06/2023
by Anh L. Mai, et al.

A package query returns a package (a multiset of tuples) that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward scalable prescriptive analytics inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that scales efficiently to billions of tuples and gracefully handles tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups, until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as the KD-tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
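
To make the package-query-to-ILP correspondence concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it enumerates every candidate package by brute force, which only works for a handful of tuples, and exists purely to show what the decision variables, objective, and constraints look like. The `MEALS` relation, its attributes, and the function name `solve_package_query` are all hypothetical. Each tuple i gets an integer multiplicity x_i; the query maximizes total protein subject to a calorie budget and a fixed package size.

```python
from itertools import product

# Hypothetical toy relation: (name, calories, protein).
MEALS = [
    ("oats", 300, 10),
    ("eggs", 200, 14),
    ("salad", 150, 4),
    ("steak", 600, 40),
    ("rice", 350, 6),
]


def solve_package_query(rows, max_calories, package_size, max_repeat=1):
    """Brute-force the ILP behind a toy package query.

    maximize    sum(protein_i * x_i)
    subject to  sum(calories_i * x_i) <= max_calories
                sum(x_i) == package_size
                x_i integer, 0 <= x_i <= max_repeat
    Returns (best objective value, multiplicity vector), or (None, None)
    if no package satisfies the constraints.
    """
    best_value, best_x = None, None
    # Enumerate every multiplicity vector; exponential in len(rows).
    for x in product(range(max_repeat + 1), repeat=len(rows)):
        if sum(x) != package_size:
            continue
        calories = sum(row[1] * m for row, m in zip(rows, x))
        if calories > max_calories:
            continue
        protein = sum(row[2] * m for row, m in zip(rows, x))
        if best_value is None or protein > best_value:
            best_value, best_x = protein, x
    return best_value, best_x


value, multiplicities = solve_package_query(MEALS, max_calories=1000, package_size=3)
# Expand the multiplicity vector into the package (a multiset of tuples).
package = [row[0] for row, m in zip(MEALS, multiplicities) for _ in range(m)]
print(value, package)
```

Raising `max_repeat` above 1 lets a tuple appear more than once, which is what makes the result a multiset rather than a set. The exponential enumeration also illustrates why a real system needs an ILP solver and, at the scale the abstract targets, partitioning to shrink the problem first.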

