ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems

06/07/2023
by Yi Zhang, et al.

Natural Language to SQL (NL-to-SQL) systems have recently become significantly more accurate at translating natural language into SQL queries. This improvement is due to the emergence of transformer-based language models and the popularity of the Spider benchmark, the de facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider mainly contains simple databases with few tables, columns, and entries, which does not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL pairs, which leads to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL pairs for each domain. To obtain more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top-performing systems on Spider perform very poorly on it. Thus, the challenge is manifold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.

research
12/12/2021

Weakly Supervised Mapping of Natural Language to SQL through Question Decomposition

Natural Language Interfaces to Databases (NLIDBs), where users pose quer...
research
04/26/2021

Evaluating Query Languages and Systems for High-Energy Physics Data

In the domain of high-energy physics (HEP), query languages in general a...
research
06/06/2023

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory

Large language models (LLMs) with memory are computationally universal. ...
research
09/05/2023

Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data

Existing approaches to automatic data transformation are insufficient to...
research
09/21/2022

T5QL: Taming language models for SQL generation

Automatic SQL generation has been an active research area, aiming at str...
research
03/21/2022

Paraphrasing Techniques for Maritime QA system

There has been an increasing interest in incorporating Artificial Intell...
research
06/09/2021

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Most available semantic parsing datasets, comprising of pairs of natural...
