[Technical Report] Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

03/25/2021
by Kyoungmin Kim, et al.

Graph pattern cardinality estimation is the problem of estimating the number of embeddings of a query graph in a data graph. This fundamental problem arises, for example, during query planning in subgraph matching algorithms. There are two major approaches to solving it: sampling and synopses. Synopsis-based (or summary-based) methods are fast and accurate when the synopses capture the information in the graph well. However, these methods suffer from large errors caused by the loss of information during summarization and by their inherent assumptions. Sampling-based methods are unbiased but suffer from large estimation variance because the sample space is large. To address these limitations, we propose Alley, a hybrid method that combines sampling and synopses. Alley employs 1) a novel sampling strategy, random walk with intersection, which effectively reduces the sample space, 2) branching to further reduce variance, and 3) a novel mining approach that extracts and indexes, as synopses, tangled patterns that are inherently difficult to estimate by sampling. By using these synopses in the online estimation phase, we can effectively reduce the sample space while still ensuring unbiasedness. We establish that Alley has worst-case optimal runtime and approximation quality guarantees for any given error bound ϵ and required confidence μ. Beyond these theoretical guarantees, our extensive experiments show that Alley is up to orders of magnitude more accurate than state-of-the-art methods at similar efficiency.
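To make the trade-off between unbiasedness and variance in sampling-based estimation concrete, the following is a minimal sketch of a generic random-walk estimator with inverse-probability weighting. It is not Alley's algorithm: it uses neither intersection, branching, nor synopses. It assumes an undirected data graph stored as an adjacency dictionary and a query that is a simple path on k vertices; the function names `estimate_path_embeddings` and `exact_path_embeddings` are hypothetical and chosen only for this illustration.

```python
import random
from itertools import permutations


def estimate_path_embeddings(adj, k, num_samples, rng):
    """Estimate the number of injective embeddings of a k-vertex path query.

    Illustrative baseline only (an assumption of this sketch, not Alley):
    each sample is a random walk, weighted by the inverse of the
    probability with which it was drawn.
    """
    vertices = list(adj)
    total = 0.0
    for _ in range(num_samples):
        walk = [rng.choice(vertices)]
        weight = float(len(vertices))  # inverse probability of the first pick
        while len(walk) < k:
            # Candidates: neighbors of the last vertex not already in the walk.
            candidates = [v for v in adj[walk[-1]] if v not in walk]
            if not candidates:
                weight = 0.0  # a failed walk contributes zero
                break
            weight *= len(candidates)
            walk.append(rng.choice(candidates))
        total += weight
    return total / num_samples


def exact_path_embeddings(adj, k):
    """Brute-force count of ordered vertex tuples that form a k-vertex path."""
    return sum(
        all(b in adj[a] for a, b in zip(p, p[1:]))
        for p in permutations(adj, k)
    )


if __name__ == "__main__":
    # Toy data graph: the path 0 - 1 - 2 - 3.
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    rng = random.Random(0)
    print("exact:", exact_path_embeddings(graph, 3))
    print("estimate:", estimate_path_embeddings(graph, 3, 20000, rng))
```

Each successful walk is weighted by the product of the candidate-set sizes along it, so the expected weight equals the true count. Walks that reach a dead end contribute zero, and it is this mix of zero and large weights that produces the variance the abstract attributes to a large sample space.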

research
06/11/2022

Sampling-based Estimation of the Number of Distinct Values in Distributed Environment

In data mining, estimating the number of distinct values (NDV) is a fund...
research
07/23/2020

Sampling connected subgraphs: nearly-optimal mixing time bounds, nearly-optimal ε-uniform sampling, and perfect uniform sampling

We study the connected subgraph sampling problem: given an integer k ≥ 3...
research
01/29/2018

Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation

Estimating the cardinality (i.e., the number of answers) of conjunctive ...
research
07/14/2019

The FAST Algorithm for Submodular Maximization

In this paper we describe a new algorithm called Fast Adaptive Sequencin...
research
08/11/2021

A General Cardinality Estimation Framework for Subgraph Matching in Property Graphs

Many techniques have been developed for the cardinality estimation probl...
research
07/25/2023

Duet: efficient and scalable hybriD neUral rElation undersTanding

Learned cardinality estimation methods have achieved high precision comp...
research
08/05/2021

Q-error Bounds of Random Uniform Sampling for Cardinality Estimation

Random uniform sampling has been studied in various statistical tasks bu...
