TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation

06/10/2020
by Ningyuan Sun, et al.

Parsing natural language into corresponding SQL (NL2SQL) with data-driven approaches such as deep neural networks has attracted much attention in recent years. Existing NL2SQL datasets assume that condition values appear exactly in the natural language question and that every query is answerable given the table. However, these assumptions may fail in practical scenarios, because users may use different expressions for the same content in the table, and may ask for information outside the table when they lack a full picture of its contents. We therefore present TableQA, a large-scale cross-domain Natural Language to SQL dataset in Chinese consisting of 64,891 questions and 20,311 unique SQL queries on over 6,000 tables. Unlike existing NL2SQL datasets, TableQA requires models to generalize well not only to the SQL skeletons of different questions and table schemas, but also to the various expressions for condition values. Experimental results show that the state-of-the-art model with 95.1% condition value accuracy on WikiSQL only gets 46.8% and 43.0% on the condition value and logic form tasks of TableQA, indicating that the proposed dataset is challenging and necessary to handle. Two table-aware approaches are proposed to alleviate the problem; the end-to-end approach obtains 51.3% accuracy on the condition value and logic form tasks, with improvements of 4.7% and 3.4%, respectively.
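
To make the condition-value problem concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a hypothetical table column of company names and uses plain string similarity from Python's standard `difflib` module to link a user's wording to a cell value. The function names, the example values, and the 0.5 similarity cutoff are all illustrative assumptions. When no cell is similar enough, the sketch treats the question as unanswerable from the table.

```python
import difflib


def link_condition_value(mention, column_values, cutoff=0.5):
    """Return the cell value most similar to `mention`, or None.

    `cutoff` is an arbitrary illustrative threshold, not a tuned value.
    """
    matches = difflib.get_close_matches(mention, column_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def build_where_clause(column, mention, column_values):
    """Build a WHERE condition from a linked cell value.

    Returns None when the mention cannot be linked to any cell, which
    this sketch interprets as "not answerable from this table".
    """
    value = link_condition_value(mention, column_values)
    if value is None:
        return None
    escaped = value.replace("'", "''")  # escape single quotes for SQL literals
    return f"{column} = '{escaped}'"


# Hypothetical column contents, for illustration only.
company_column = ["Tencent Holdings", "Alibaba Group", "Baidu Inc"]

# The user writes "Tencent", while the table stores "Tencent Holdings".
print(build_where_clause("company", "Tencent", company_column))

# "Xiaomi" is not in the table, so no condition can be built.
print(build_where_clause("company", "Xiaomi", company_column))
```

Character-level similarity is only a stand-in here. Mismatches such as synonyms or abbreviations generally need more than surface matching, which is the gap that table-aware approaches are meant to address.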

research
04/23/2018

Semantic Parsing with Syntax- and Table-Aware SQL Generation

We present a generative model to map natural language questions into SQL...
research
09/02/2019

Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions

We focus on the cross-domain context-dependent text-to-SQL generation ta...
research
11/11/2022

DocuT5: Seq2seq SQL Generation with Table Documentation

Current SQL generators based on pre-trained language models struggle to ...
research
11/13/2018

Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters

Translating natural language to SQL queries for table-based question ans...
research
05/10/2023

SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation

Converting text into the structured query language (Text2SQL) is a resea...
research
08/26/2022

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

As the first session-level Chinese dataset, CHASE contains two separate ...
research
10/21/2020

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries

Large-scale semantic parsing datasets annotated with logical forms have ...
