In-database connected component analysis

02/26/2018
by Harald Bögeholz, et al.

We describe an SQL-implementable algorithm, practical at Big Data scale, for efficiently determining the connected components of graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm is a linear-space, randomised algorithm that always terminates with the correct answer but has a stochastic running time: for any ϵ > 0 and any input graph G = ⟨V, E⟩, it terminates after O(log |V|) SQL queries with probability at least 1 − ϵ. We show empirically that this translates to a quasi-linear runtime in practice.
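
To make the idea of set-oriented, randomised contraction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the authors' algorithm and makes no claim to their O(log |V|) bound; it only illustrates the general pattern of shrinking a graph with a fixed number of SQL statements per round. The table names (`label`, `edge`, `vrank`, `pick`, `contracted`) and the function `connected_components` are hypothetical, and the random ranks are drawn client-side purely to keep the sketch portable.

```python
import random
import sqlite3


def connected_components(vertices, edges, seed=0):
    """Map each vertex to a representative vertex of its connected component."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE label (v INTEGER PRIMARY KEY, rep INTEGER)")
    db.execute("CREATE TABLE edge (a INTEGER, b INTEGER)")
    db.executemany("INSERT INTO label VALUES (?, ?)", [(v, v) for v in vertices])
    # Store every edge in both directions and drop self-loops.
    sym = [(a, b) for a, b in edges if a != b]
    sym += [(b, a) for a, b in sym]
    db.executemany("INSERT INTO edge VALUES (?, ?)", sym)

    # Each round merges at least two vertices, so the loop always terminates.
    while db.execute("SELECT COUNT(*) FROM edge").fetchall()[0][0] > 0:
        # Give every vertex that still has an edge a distinct random rank.
        # Assumption: ranks are generated client-side, not inside the database.
        live = [r[0] for r in db.execute("SELECT DISTINCT a FROM edge").fetchall()]
        ranks = list(range(len(live)))
        rng.shuffle(ranks)
        db.execute("CREATE TABLE vrank (v INTEGER PRIMARY KEY, r INTEGER)")
        db.executemany("INSERT INTO vrank VALUES (?, ?)", list(zip(live, ranks)))

        # Each live vertex picks the lowest-ranked vertex among itself
        # and its neighbours as its merge target.
        db.execute("""
            CREATE TABLE pick AS
            SELECT m.v AS v, t.v AS target
            FROM (SELECT n.v AS v, MIN(k.r) AS r
                  FROM (SELECT a AS v, b AS w FROM edge
                        UNION ALL
                        SELECT v, v FROM vrank) AS n
                  JOIN vrank AS k ON k.v = n.w
                  GROUP BY n.v) AS m
            JOIN vrank AS t ON t.r = m.r
        """)
        # Redirect the labels of all original vertices to the new targets.
        db.execute("""
            UPDATE label
            SET rep = (SELECT target FROM pick WHERE pick.v = label.rep)
            WHERE rep IN (SELECT v FROM pick)
        """)
        # Contract the graph: rewrite endpoints, drop self-loops and duplicates.
        db.execute("""
            CREATE TABLE contracted AS
            SELECT DISTINCT pa.target AS a, pb.target AS b
            FROM edge
            JOIN pick AS pa ON pa.v = edge.a
            JOIN pick AS pb ON pb.v = edge.b
            WHERE pa.target <> pb.target
        """)
        db.execute("DROP TABLE edge")
        db.execute("ALTER TABLE contracted RENAME TO edge")
        db.execute("DROP TABLE vrank")
        db.execute("DROP TABLE pick")

    labels = dict(db.execute("SELECT v, rep FROM label").fetchall())
    db.close()
    return labels
```

Calling `connected_components(range(1, 8), [(1, 2), (2, 3), (4, 5)])` returns a dictionary in which vertices 1, 2 and 3 share one label, 4 and 5 share another, and the isolated vertices 6 and 7 keep their own. Which vertex ends up as the representative depends on the random ranks, but the grouping does not, mirroring the "always correct, stochastic running time" behaviour the abstract describes.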
