Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

12/17/2022
by Yiyun Zhao, et al.

Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examine existing synthesized datasets and find that state-of-the-art text-to-SQL algorithms do not improve further on popular benchmarks when trained with augmented synthetic data. We observe two shortcomings: illogical synthetic SQL queries that result from independent column sampling, and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from the schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models achieve significant accuracy gains on popular benchmarks, including new state-of-the-art performance on Spider.
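
To make the idea of schema-distance-weighted column sampling concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy schema, the names `FK_EDGES`, `COLUMNS`, `table_distances`, `column_weights`, and `sample_column`, and the exponential `decay` weighting are all assumptions chosen for illustration. The sketch measures distance as the number of foreign-key hops from an anchor table and makes columns in nearby tables more likely to be sampled, while tables that cannot be reached through key relationships are never sampled.

```python
import random
from collections import deque

# Hypothetical toy schema: each pair is a foreign-key link between two tables.
FK_EDGES = [
    ("singer", "concert_singer"),
    ("concert_singer", "concert"),
    ("concert", "stadium"),
]

# Hypothetical columns per table; "orphan" has no key relationship at all.
COLUMNS = {
    "singer": ["singer.name", "singer.age"],
    "concert_singer": ["concert_singer.singer_id"],
    "concert": ["concert.year"],
    "stadium": ["stadium.name"],
    "orphan": ["orphan.value"],
}


def table_distances(start, edges):
    """Breadth-first search: foreign-key hop count from `start` to each reachable table."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor in graph.get(table, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[table] + 1
                queue.append(neighbor)
    return dist


def column_weights(anchor_table, columns, edges, decay=0.5):
    """Assumed weighting: decay ** hops, so closer tables get larger weights."""
    dist = table_distances(anchor_table, edges)
    weights = {}
    for table, cols in columns.items():
        if table not in dist:
            continue  # unreachable via keys, so joining it would be arbitrary
        for col in cols:
            weights[col] = decay ** dist[table]
    return weights


def sample_column(anchor_table, columns, edges, rng, decay=0.5):
    """Draw one column with probability proportional to its distance-based weight."""
    weights = column_weights(anchor_table, columns, edges, decay)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)
picked = sample_column("singer", COLUMNS, FK_EDGES, rng)
```

With `decay=0.5`, a column in `singer` has weight 1, while `stadium.name`, three hops away, has weight 0.125. Any monotonically decreasing function of distance would serve the same purpose; the exponential form is only one convenient choice.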

research
08/21/2019

X-SQL: reinforce schema representation with context

In this work, we present X-SQL, a new network architecture for the probl...
research
10/29/2022

Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers

Text-to-SQL parsers typically struggle with databases unseen during the ...
research
09/29/2020

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

We present GraPPa, an effective pre-training approach for table semantic...
research
10/24/2020

Structure-Grounded Pretraining for Text-to-SQL

Learning to capture text-table alignment is essential for table related ...
research
04/12/2021

Learning to Synthesize Data for Semantic Parsing

Synthesizing data for semantic parsing has gained increasing attention r...
research
10/21/2022

STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing

In this paper, we propose a novel SQL guided pre-training framework STAR...
research
11/01/2018

Embedding Individual Table Columns for Resilient SQL Chatbots

Most of the world's data is stored in relational databases. Accessing th...
