Stars: Tera-Scale Graph Building for Clustering and Graph Learning

12/05/2022
by CJ Carey, et al.

A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role in many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs that are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: first, constructing dense graphs is infeasible in practice for large datasets, and second, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present Stars: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning-based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars on multiple datasets, enabling graph building at the tera-scale, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10 to 1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2 to 10-fold improvement in running time without quality loss.
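The two-hop spanner idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the algorithm, so the code below assumes, purely for illustration, that points have already been grouped into buckets of likely-similar items (here by a toy 1-D grid; `bucket_by_grid`, `clique_edges`, `star_edges`, and `within_two_hops` are all hypothetical names). It contrasts connecting every pair inside a bucket with connecting each bucket member to a single randomly chosen center, which keeps all bucket-mates within two hops while using far fewer edges and comparisons.

```python
import random
from collections import defaultdict
from itertools import combinations


def bucket_by_grid(points, cell):
    """Group 1-D points into buckets of width `cell`.

    Assumption: a toy stand-in for whatever bucketing scheme groups
    likely-similar points; it is not taken from the paper.
    """
    buckets = defaultdict(list)
    for idx, x in enumerate(points):
        buckets[int(x // cell)].append(idx)
    return list(buckets.values())


def clique_edges(buckets):
    """Baseline: connect every pair of points that share a bucket."""
    edges = set()
    for members in buckets:
        for u, v in combinations(members, 2):
            edges.add((min(u, v), max(u, v)))
    return edges


def star_edges(buckets, rng):
    """Sparse alternative: connect each bucket member to one random center."""
    edges = set()
    for members in buckets:
        if len(members) < 2:
            continue
        center = rng.choice(members)
        for v in members:
            if v != center:
                edges.add((min(center, v), max(center, v)))
    return edges


def within_two_hops(edges, u, v):
    """Return True if u and v are equal, adjacent, or share a neighbor."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return u == v or v in adj[u] or bool(adj[u] & adj[v])


if __name__ == "__main__":
    points = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3]
    buckets = bucket_by_grid(points, cell=1.0)
    dense = clique_edges(buckets)
    sparse = star_edges(buckets, random.Random(0))
    print(len(dense), len(sparse))
```

For a bucket of size k, the clique uses k(k-1)/2 edges while the star uses k-1, yet any two bucket-mates remain connected through the center by a path of length at most two.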


