Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization

09/11/2021
by Yujian Gan, et al.

Recently, there has been significant progress in studying neural networks for translating text descriptions into SQL queries under the zero-shot cross-domain setting. Although existing text-to-SQL models achieve good performance on some public benchmarks, we observe that they do not generalize well when facing domain knowledge that appears infrequently in the training data, which may lead to worse prediction performance on unseen domains. In this work, we investigate the robustness of text-to-SQL models when the questions require rarely observed domain knowledge. In particular, we define five types of domain knowledge and introduce Spider-DK (DK stands for domain knowledge), a human-curated dataset based on the Spider benchmark for text-to-SQL translation. The natural language (NL) questions in Spider-DK are selected from Spider, and we modify some samples by adding domain knowledge that reflects real-world question paraphrases. We demonstrate that prediction accuracy drops dramatically on samples that require such domain knowledge, even when the domain knowledge appears in the training set and the model predicts related training samples correctly.
