Provenance for Large-scale Datalog

07/11/2019
by David Zhao, et al.

Logic programming languages such as Datalog have become popular as Domain-Specific Languages (DSLs) for solving large-scale, real-world problems, in particular static program analysis and network analysis. The logic specifications that model these analysis problems process millions of tuples of data and contain hundreds of highly recursive rules; as a result, they are notoriously difficult to debug. While the database community has proposed several data-provenance techniques that address the Declarative Debugging Challenge for Databases, these state-of-the-art techniques do not scale to analysis problems of this size. In this paper, we introduce a novel bottom-up Datalog evaluation strategy for debugging: our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and on a new fixed-point semantics for semi-naive evaluation. A debugging query mechanism allows arbitrary provenance queries, constructing partial proof trees of tuples with minimal height. We integrate our technique into Souffle, a Datalog engine that synthesizes C++ code, and achieve high performance by using specialized parallel data structures. Experiments are conducted with DOOP/DaCapo, producing proof annotations for tens of millions of output tuples. We show that our method has a runtime overhead of 1.27x on average while being more flexible than existing state-of-the-art techniques.
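The abstract only outlines the idea of annotating tuples during semi-naive evaluation and then answering proof queries from those annotations. The Python sketch below illustrates that idea on a hypothetical two-rule transitive-closure program. The (rule id, height) annotation, the functions `evaluate`, `prove` and `tree_height`, and the program itself are illustrative assumptions; they are not Souffle's API or the paper's exact provenance lattice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical program (illustration only, not taken from the paper):
#   path(x, y) :- edge(x, y).              % rule 1
#   path(x, z) :- path(x, y), edge(y, z).  % rule 2
Fact = Tuple[str, str]
Annotation = Tuple[int, int]  # (id of the rule that derived the tuple, proof height)
Tree = Tuple[str, Fact, list]  # (relation name, tuple, child subtrees)


def evaluate(edges: Set[Fact]) -> Dict[Fact, Annotation]:
    """Semi-naive evaluation that stores a (rule, height) annotation per path tuple."""
    succ: Dict[str, List[str]] = {}
    for y, z in edges:
        succ.setdefault(y, []).append(z)

    # Rule 1: input edge tuples have height 0, so each derived path tuple has height 1.
    path: Dict[Fact, Annotation] = {e: (1, 1) for e in edges}
    delta = set(path)
    while delta:
        new: Dict[Fact, Annotation] = {}
        # Rule 2: join only the tuples found in the previous iteration (the delta).
        # In this linear program every delta tuple has the same height, so the
        # first derivation found for a new tuple is one of minimal height.
        for x, y in delta:
            height = path[(x, y)][1] + 1  # max(body heights) + 1; edge height is 0
            for z in succ.get(y, []):
                if (x, z) not in path and (x, z) not in new:
                    new[(x, z)] = (2, height)
        path.update(new)
        delta = set(new)
    return path


def prove(fact: Fact, path: Dict[Fact, Annotation], edges: Set[Fact]) -> Tree:
    """Rebuild one proof tree of `fact`, only descending into strictly lower subproofs."""
    rule, height = path[fact]
    x, z = fact
    if rule == 1:
        return ("path", fact, [("edge", fact, [])])
    for y, z2 in sorted(edges):
        sub = path.get((x, y))
        if z2 == z and sub is not None and sub[1] < height:
            return ("path", fact, [prove((x, y), path, edges), ("edge", (y, z), [])])
    raise ValueError(f"no subproof found for {fact}")


def tree_height(tree: Tree) -> int:
    """Height of a proof tree; input (edge) leaves have height 0."""
    _, _, children = tree
    return 0 if not children else 1 + max(tree_height(c) for c in children)


edges = {("a", "b"), ("b", "c"), ("c", "d")}
path = evaluate(edges)
print(path[("a", "d")])  # (2, 3)
print(tree_height(prove(("a", "d"), path, edges)))  # 3
```

Because each tuple records the height of a minimal proof, the query only has to look for body tuples whose recorded height is strictly smaller, which keeps proof reconstruction from looping on recursive rules and yields a tree whose height matches the annotation.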

research
05/15/2014

Application of Methods for Syntax Analysis of Context-Free Languages to Query Evaluation of Logic Programs

My research goal is to employ a parser generation algorithm based on the...
research
12/10/2018

Scaling-Up In-Memory Datalog Processing: Observations and Techniques

Recursive query processing has experienced a recent resurgence, as a res...
research
07/24/2019

A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation

A large class of traditional graph and data mining algorithms can be con...
research
02/25/2023

Suspension Analysis and Selective Continuation-Passing Style for Higher-Order Probabilistic Programming Languages

Probabilistic programming languages (PPLs) make encoding and automatical...
research
06/21/2023

Automatic Inference of Resource Leak Specifications

A resource leak occurs when a program fails to free some finite resource...
research
08/29/2023

PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Binary similarity analysis determines if two binary executables are from...
research
05/03/2020

BCFA: Bespoke Control Flow Analysis for CFA at Scale

Many data-driven software engineering tasks such as discovering programm...