Semantic Evaluation for Text-to-SQL with Distilled Test Suites

by Ruiqi Zhong, et al.

We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models. Our method distills a small test suite of databases that achieves high code coverage for the gold query from a large number of randomly generated databases. At evaluation time, it computes the denotation accuracy of the predicted queries on the distilled test suite, which efficiently yields a tight upper bound for semantic accuracy. We use our proposed method to evaluate 21 models submitted to the Spider leaderboard and manually verify that our method is always correct on 100 examples. In contrast, the current Spider metric leads to a 2.5% false negative rate on average and 8.1% in the worst case, indicating that test suite accuracy is needed. Our implementation, along with distilled test suites for eleven Text-to-SQL datasets, is publicly available.





Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

We present Spider, a large-scale, complex and cross-domain semantic pars...

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task

Most existing studies in text-to-SQL tasks do not require generating com...

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

Detecting online hate is a complex task, and low-performing models have ...

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

We present a sequence-to-action parsing approach for the natural languag...

Experiences with Some Benchmarks for Deductive Databases and Implementations of Bottom-Up Evaluation

OpenRuleBench is a large benchmark suite for rule engines, which include...

Improving Text-to-SQL Evaluation Methodology

To be informative, an evaluation must measure how well systems generaliz...

One SQL to Rule Them All

Real-time data analysis and management are increasingly critical for tod...