Explaining Wrong Queries Using Small Examples

04/09/2019
by   Zhengjie Miao, et al.
0

For testing the correctness of SQL queries, e.g., evaluating student submissions in a database course, a standard practice is to execute the query in question on some test database instance and compare its result with that of the correct query. Given two queries Q_1 and Q_2, we say that a database instance D is a counterexample (for Q_1 and Q_2) if Q_1(D) differs from Q_2(D); such a counterexample can serve as an explanation of why Q_1 and Q_2 are not equivalent. While the test database instance may serve as a counterexample, it may be too large or complex to read and understand where the inequivalence comes from. Therefore, in this paper, given a known counterexample D for Q_1 and Q_2, we aim to find the smallest counterexample D' ⊆ D where Q_1(D') ≠ Q_2(D'). The problem in general is NP-hard. We give a suite of algorithms for finding the smallest counterexample for different classes of queries, some more tractable than others. We also present an efficient provenance-based algorithm for SPJUD queries that uses a constraint solver, and extend it to more complex queries with aggregation, group-by, and nested queries. We perform extensive experiments indicating the effectiveness and scalability of our solution on student queries from an undergraduate database course and on queries from the TPC-H benchmark. We also report a user study from the course where we deployed our tool to help students with an assignment on relational algebra.

READ FULL TEXT
research
12/19/2019

Edit Based Grading of SQL Queries

Grading student SQL queries manually is a tedious and error-prone proces...
research
06/23/2023

Relational Playground: Teaching the Duality of Relational Algebra and SQL

Students in introductory data management courses are often taught how to...
research
05/15/2018

NEURON: Query Optimization Meets Natural Language Processing For Augmenting Database Education

Relational database management system (RDBMS) is a major undergraduate c...
research
10/18/2022

Efficient Evaluation of Arbitrary Relational Calculus Queries

The relational calculus (RC) is a concise, declarative query language. H...
research
02/22/2022

Understanding Queries by Conditional Instances

A powerful way to understand a complex query is by observing how it oper...
research
08/13/2023

MDB: Interactively Querying Datasets and Models

As models are trained and deployed, developers need to be able to system...
research
03/14/2023

It Takes One to Tango but More Make Trouble? In-Context Training with Different Number of Demonstrations

Large language models (LLMs) are capable to perform complex reasoning by...

Please sign up or login with your details

Forgot password? Click here to reset