Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

01/21/2023
by Shuaichen Chang, et al.

Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations, and previously curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure robustness from different angles. To collect more diversified natural question perturbations, we utilize large pretrained language models (PLMs) to simulate human behaviors in creating natural questions. We conduct a diagnostic study of state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers from a 14.0% performance drop on the most challenging perturbation. We also present a breakdown analysis regarding text-to-SQL model designs and provide insights for improving model robustness.
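The abstract reports robustness as a performance drop between evaluation on original examples and evaluation on their perturbed counterparts. As a rough illustration of that metric only, here is a minimal Python sketch; the accuracy values and all names are hypothetical and are not taken from the Dr.Spider data format or evaluation code.

```python
# Minimal sketch of the robustness metric implied by the abstract: the relative
# performance drop of a model between the pre-perturbation and post-perturbation
# evaluation sets. All numbers below are illustrative, not results from the paper.

def relative_drop(acc_before: float, acc_after: float) -> float:
    """Relative performance drop in percent caused by a perturbation."""
    return 100.0 * (acc_before - acc_after) / acc_before

if __name__ == "__main__":
    # Hypothetical execution accuracies of one model on the original Spider-style
    # examples and on the same examples after, e.g., a question paraphrase.
    acc_pre, acc_post = 0.80, 0.62
    print(f"relative drop: {relative_drop(acc_pre, acc_post):.1f}%")  # -> 22.5%
```

In practice, a drop like this would be computed separately for each of the 17 database, question, and SQL perturbations to locate where a model is most fragile.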

Related research

05/25/2023
UNITE: A Unified Benchmark for Text-to-SQL Evaluation
A practical text-to-SQL system should generalize well on a wide variety ...

12/20/2022
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
The robustness of Text-to-SQL parsers against adversarial perturbations ...

01/12/2023
On the Structural Generalization in Text-to-SQL
Exploring the generalization of a text-to-SQL parser is essential for a ...

07/30/2020
Photon: A Robust Cross-Domain Text-to-SQL System
Natural language interfaces to databases (NLIDB) democratize end user ac...

06/02/2021
Towards Robustness of Text-to-SQL Models against Synonym Substitution
Recently, there has been significant progress in studying neural network...

03/15/2022
Evaluating the Text-to-SQL Capabilities of Large Language Models
We perform an empirical evaluation of Text-to-SQL capabilities of the Co...

06/23/2018
Improving Text-to-SQL Evaluation Methodology
To be informative, an evaluation must measure how well systems generaliz...
