Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling

03/08/2022
by   Yuxiang Wang, et al.

A knowledge graph (KG) manages large-scale, real-world facts as one big graph in a schema-flexible manner. Aggregate queries are a fundamental query type over KGs, e.g., "what is the average price of cars produced in Germany?". Despite their importance, answering aggregate queries on KGs has received little attention in the literature. Aggregate queries can be supported on top of factoid queries, e.g., "find all cars produced in Germany", by applying an additional aggregate operation to the factoid queries' answers. However, this straightforward method is problematic because both the accuracy and the efficiency of factoid query processing seriously affect the performance of aggregate queries. In this paper, we propose a "sampling-estimation" model to answer aggregate queries over KGs, which is the first work to provide an approximate aggregate result with an effective accuracy guarantee without relying on factoid queries. Specifically, we first present a semantic-aware sampling method that collects a high-quality random sample through a random walk based on knowledge graph embedding. Then, we propose unbiased estimators for COUNT and SUM and a consistent estimator for AVG to compute the approximate aggregate results from the random sample, with an accuracy guarantee in the form of a confidence interval. We extend our approach to support iterative improvement of accuracy and more complex queries with filters, GROUP-BY, and different graph shapes, e.g., chain, cycle, star, and flower. Extensive experiments over real-world KGs demonstrate the effectiveness and efficiency of our approach.
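
To make the "estimation" half of the sampling-estimation idea concrete, the sketch below shows one generic way to turn a random sample into COUNT, SUM, and AVG estimates with a confidence interval. It is not the paper's estimator: it assumes a Hansen-Hurwitz-style setup in which entities are drawn independently with replacement and each draw's probability is known, and the function name `estimate_aggregates` and the tuple layout `(value, prob, is_answer)` are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def estimate_aggregates(sample, confidence=0.95):
    """Estimate COUNT, SUM and AVG from a with-replacement random sample.

    Assumption (not from the paper): each element of `sample` is a tuple
    (value, prob, is_answer), where `prob` is the known per-draw probability
    of sampling that entity and `is_answer` says whether it matches the query.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two draws to estimate a variance")

    # Hansen-Hurwitz terms: weighting each draw by 1/prob makes the mean of
    # the terms an unbiased estimate of the population total.
    count_terms = [(1.0 / p) if ok else 0.0 for (_, p, ok) in sample]
    sum_terms = [(v / p) if ok else 0.0 for (v, p, ok) in sample]

    count_hat = sum(count_terms) / n
    sum_hat = sum(sum_terms) / n
    # AVG is a ratio of two unbiased estimates: consistent, not unbiased.
    avg_hat = sum_hat / count_hat if count_hat > 0 else float("nan")

    # Normal-approximation confidence interval half-width for a sample mean.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    def half_width(terms, centre):
        variance = sum((t - centre) ** 2 for t in terms) / (n - 1)
        return z * math.sqrt(variance / n)

    return {
        "count": count_hat,
        "count_ci": half_width(count_terms, count_hat),
        "sum": sum_hat,
        "sum_ci": half_width(sum_terms, sum_hat),
        "avg": avg_hat,
    }


# Toy usage: four entities sampled uniformly (prob = 1/4), three match the query.
toy_sample = [(10.0, 0.25, True), (20.0, 0.25, True),
              (30.0, 0.25, False), (40.0, 0.25, True)]
toy_result = estimate_aggregates(toy_sample)
```

The sketch reports interval half-widths only for COUNT and SUM; an interval for the ratio-based AVG needs extra care (for example a delta-method variance) and is left out here. Under a non-uniform sampler, such as a random walk biased toward semantically relevant entities, the same weighting applies as long as each draw's probability is known or can be estimated.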

research
09/05/2019

Random Sampling for Group-By Queries

Random sampling has been widely used in approximate query processing on ...
research
07/23/2019

Efficient Knowledge Graph Accuracy Evaluation

Estimation of the accuracy of a large-scale knowledge graph (KG) often r...
research
08/06/2019

RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

Analysts commonly investigate the data distributions derived from statis...
research
07/10/2021

NeuroDB: A Neural Network Framework for Answering Range Aggregate Queries and Beyond

Range aggregate queries (RAQs) are an integral part of many real-world a...
research
10/22/2017

Natural Language Aggregate Query over RDF Data

Natural language question-answering over RDF data has received widesprea...
research
03/29/2021

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Sample-based approximate query processing (AQP) suffers from many pitfal...
research
11/28/2018

Approximate Evaluation of Label-Constrained Reachability Queries

The current surge of interest in graph-based data models mirrors the usa...
