Structure-Grounded Pretraining for Text-to-SQL

10/24/2020
by Xiang Deng, et al.

Learning to capture text-table alignment is essential for table-related tasks such as text-to-SQL. The model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL that can effectively learn to capture text-table alignment from a parallel text-table corpus. We identify a set of novel prediction tasks (column grounding, value grounding, and column-value mapping) and train the model on them using weak supervision, without requiring complex SQL annotation. Additionally, to evaluate the model under a more realistic setting, we create a new evaluation set, Spider-Realistic, based on Spider with explicit mentions of column names removed, and adopt two existing single-database text-to-SQL datasets. StruG significantly outperforms BERT-LARGE on Spider and the realistic evaluation sets, while bringing consistent improvement on the large-scale WikiSQL benchmark.
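
To make the three prediction tasks concrete, here is a minimal sketch of what weak labels for a single text-table pair might look like. The abstract does not say how the labels are derived, so the case-insensitive string matching used here is purely an illustrative assumption, and the function name `weak_labels` and the example table are hypothetical.

```python
def weak_labels(text, table):
    """Derive toy weak labels for one (text, table) pair.

    `table` maps a column name to a list of cell values (all strings).
    Matching is case-insensitive substring search: an illustrative
    assumption, not the labeling procedure used in the paper.
    """
    lowered = text.lower()

    # Column-value mapping: each cell value found in the text -> its column.
    value_to_column = {}
    for column, cells in table.items():
        for cell in cells:
            if cell.lower() in lowered:
                value_to_column[cell] = column

    # Column grounding: a column counts as referenced if its name appears
    # in the text or if one of its values was matched above.
    named = {column for column in table if column.lower() in lowered}
    grounded_columns = sorted(named | set(value_to_column.values()))

    # Value grounding: one binary label per whitespace-separated token.
    value_tokens = {tok for value in value_to_column for tok in value.lower().split()}
    token_labels = [
        int(tok.strip(".,?!").lower() in value_tokens) for tok in text.split()
    ]
    return grounded_columns, token_labels, value_to_column


# Hypothetical example data.
question = "Which films did Christopher Nolan direct after 2010?"
films = {
    "Title": ["Inception", "Dunkirk"],
    "Director": ["Christopher Nolan"],
    "Year": ["2010", "2017"],
}
columns, tokens, mapping = weak_labels(question, films)
```

In this example the question never names the `Director` or `Year` columns outright; both are grounded only through their values. That is the kind of implicit reference the Spider-Realistic setting is meant to stress.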

