Scalable k-d trees for distributed data

01/20/2022
by Aritra Chakravorty, et al.

Data structures known as k-d trees have numerous applications in scientific computing, particularly in modern statistics and data science: range search in decision trees, clustering, nearest-neighbor search, and local regression, among others. In this article we present a scalable mechanism for constructing k-d trees over distributed data, based on approximating the median at each recursive subdivision of the data. We provide theoretical guarantees on the quality of this approximation, along with a simulation study quantifying the accuracy and scalability of the approach in practice.
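To make the idea of median-approximating subdivision concrete, here is a minimal Python sketch. The abstract does not say how the medians are approximated, so the sketch assumes one simple choice, the median of per-shard medians, which is not necessarily the authors' method. The names `approx_median`, `build`, `shards`, and `leaf_size` are hypothetical, and the "distributed" data is simulated as a list of in-memory lists of points.

```python
from statistics import median


def approx_median(shards, axis):
    """Approximate the global median along `axis`.

    Assumption for illustration: take the median of each shard's local
    median, so only one number per shard has to be communicated.
    """
    local = [median(p[axis] for p in shard) for shard in shards if shard]
    return median(local)


def build(shards, depth=0, leaf_size=4):
    """Recursively build a k-d tree over points spread across `shards`.

    Each shard is a list of equal-length tuples. The tree is returned as
    nested dicts; leaves hold the points they contain.
    """
    n = sum(len(s) for s in shards)
    if n == 0:
        return None
    if n <= leaf_size:
        return {"leaf": [p for s in shards for p in s]}

    k = len(next(s for s in shards if s)[0])  # dimensionality of the points
    axis = depth % k                          # cycle through the axes
    split = approx_median(shards, axis)

    # Every shard partitions its own points locally; no point moves shards.
    left = [[p for p in s if p[axis] <= split] for s in shards]
    right = [[p for p in s if p[axis] > split] for s in shards]

    # Degenerate split (e.g. many duplicate coordinates): stop recursing.
    if sum(len(s) for s in right) == 0:
        return {"leaf": [p for s in shards for p in s]}

    return {
        "axis": axis,
        "split": split,
        "left": build(left, depth + 1, leaf_size),
        "right": build(right, depth + 1, leaf_size),
    }
```

Calling `build(shards)` on a list of point lists returns the root node. Because the split value is only an approximate median, the two children need not be perfectly balanced; bounding that imbalance is the kind of guarantee the abstract refers to.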


research
06/14/2020

Application of Data Science to Discover Violence-Related Issues in Iraq

Data science has been satisfactorily used to discover social issues in s...
research
01/30/2023

Optimal Decision Trees For Interpretable Clustering with Constraints

Constrained clustering is a semi-supervised task that employs a limited ...
research
08/19/2020

LMFAO: An Engine for Batches of Group-By Aggregates

LMFAO is an in-memory optimization and execution engine for large batche...
research
09/26/2017

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

Generalized linear models (GLMs) -- such as logistic regression, Poisson...
research
12/31/2021

Statistical scalability and approximate inference in distributed computing environments

Harnessing distributed computing environments to build scalable inferenc...
research
05/02/2023

Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making

Experts advising decision-makers are likely to display expertise which v...
research
02/07/2020

Recursive PGFs for BSTs and DSTs

We review fundamentals underlying binary search trees and digital search...
