UNITE: A Unified Benchmark for Text-to-SQL Evaluation

05/25/2023
by   Wuwei Lan, et al.

A practical text-to-SQL system should generalize well across a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets and contains natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark <cit.>, we introduce ∼120K additional examples and a threefold increase in SQL patterns, including patterns for comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g., constrained beam search) can improve performance in both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges in compositional generalization and robustness that these SOTA models cannot address well. [Our code and data processing script will be available at <https://github.com/XXXX.>]
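The abstract counts "SQL patterns" without saying how a pattern is derived. As a rough illustration only, one plausible reading is that two queries share a pattern when they match after literal values and schema names are masked. The sketch below assumes that reading; `sql_pattern`, `count_patterns`, and the keyword list are hypothetical and are not taken from the UNITE code.

```python
import re

# Assumption: a small, incomplete set of SQL keywords kept verbatim in a pattern.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "HAVING", "LIMIT",
    "JOIN", "ON", "AS", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "DISTINCT", "COUNT", "SUM", "AVG", "MIN", "MAX", "ASC", "DESC",
    "UNION", "INTERSECT", "EXCEPT", "IS", "NULL", "EXISTS",
}

# Token kinds, tried in order: quoted literal, number, identifier,
# two-character comparison operator, any other single symbol.
TOKEN_RE = re.compile(
    r"'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|[A-Za-z_][\w.]*|[<>!]=|<>|[^\s\w]"
)


def sql_pattern(query: str) -> str:
    """Mask literals as VALUE and non-keyword names as ID (hypothetical scheme)."""
    parts = []
    for token in TOKEN_RE.findall(query):
        if token[0] in "'\"" or token[0].isdigit():
            parts.append("VALUE")          # string or numeric literal
        elif token[0].isalpha() or token[0] == "_":
            upper = token.upper()
            # Keywords survive (upper-cased); table/column names are masked.
            parts.append(upper if upper in SQL_KEYWORDS else "ID")
        else:
            parts.append(token)            # operators and punctuation
    return " ".join(parts)


def count_patterns(queries):
    """Number of distinct masked patterns in a collection of queries."""
    return len({sql_pattern(q) for q in queries})


if __name__ == "__main__":
    examples = [
        "SELECT name FROM singer WHERE age > 30",
        "select title from song where year > 1999",
        "SELECT count(*) FROM singer",
    ]
    for q in examples:
        print(sql_pattern(q))
    print(count_patterns(examples))
```

Under this masking scheme the first two example queries collapse to the same pattern even though they touch different tables, which is the sense in which a benchmark can have many more examples than patterns. The actual pattern definition used by the benchmark may differ.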


research
01/21/2023

Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Neural text-to-SQL models have achieved remarkable performance in transl...
research
10/23/2022

Towards Generalizable and Robust Text-to-SQL Parsing

Text-to-SQL parsing tackles the problem of mapping natural language ques...
research
06/23/2018

Improving Text-to-SQL Evaluation Methodology

To be informative, an evaluation must measure how well systems generaliz...
research
10/06/2020

Semantic Evaluation for Text-to-SQL with Distilled Test Suites

We propose test suite accuracy to approximate semantic accuracy for Text...
research
09/14/2022

SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers

This paper aims to improve the performance of text-to-SQL parsing by exp...
research
03/14/2019

LIKE Patterns and Complexity

We investigate the expressive power and complexity questions for the LIK...
research
06/22/2021

KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers

The goal of database question answering is to enable natural language qu...
