SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

08/26/2022
by Saihao Huang, et al.

As the first session-level Chinese text-to-SQL dataset, CHASE contains two separate parts: 2,003 sessions manually constructed from scratch (CHASE-C) and 3,456 sessions translated from the English SParC dataset (CHASE-T). We find that the two parts differ substantially and are incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL dataset in Chinese, consisting of 5,028 sessions, all manually constructed from scratch. To guarantee data quality, we adopt an iterative annotation workflow that facilitates intensive and timely review of the previous round's natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, allowing SeSQL to be used as the largest dataset for single-round multi-DB text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL with three competitive session-level parsers and present a detailed analysis.
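To illustrate how completed questions could turn session-level data into single-round data, here is a minimal sketch. The record layout (`Session`, `Turn`, `completed_question`), the `flatten` helper, and the example session are all hypothetical and invented for illustration; they are not SeSQL's actual release format or content.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    question: str            # context-dependent question as asked in the session
    completed_question: str  # hypothetical field: the question rewritten to stand alone
    sql: str                 # SQL query annotated for this turn


@dataclass
class Session:
    db_id: str               # hypothetical identifier of the database being queried
    turns: List[Turn]


def flatten(sessions: List[Session]) -> List[Tuple[str, str, str]]:
    """Convert sessions into context-independent (db_id, question, sql) triples."""
    return [
        (session.db_id, turn.completed_question, turn.sql)
        for session in sessions
        for turn in session.turns
    ]


# Made-up two-turn session; the second question only makes sense in context.
example = Session(
    db_id="library",
    turns=[
        Turn(
            question="List all books.",
            completed_question="List all books.",
            sql="SELECT title FROM book",
        ),
        Turn(
            question="Only those published after 2010.",
            completed_question="List all books published after 2010.",
            sql="SELECT title FROM book WHERE year > 2010",
        ),
    ],
)

pairs = flatten([example])
```

Under these assumptions, each turn yields one standalone question/SQL pair, so a single-round parser can be trained on `pairs` without access to earlier turns.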

research
05/25/2023

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

The cross-domain text-to-SQL task aims to build a system that can parse ...
research
06/10/2020

TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation

Parsing natural language to corresponding SQL (NL2SQL) with data driven ...
research
09/29/2019

A Pilot Study for Chinese SQL Semantic Parsing

The task of semantic parsing is highly useful for dialogue and question ...
research
10/21/2020

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries

Large-scale semantic parsing datasets annotated with logical forms have ...
research
05/11/2023

QURG: Question Rewriting Guided Context-Dependent Text-to-SQL Semantic Parsing

Context-dependent Text-to-SQL aims to translate multi-turn natural langu...
research
12/27/2022

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Text-to-SQL semantic parsing is an important NLP task, which greatly fac...
research
01/03/2023

Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge

In this paper, we study the problem of knowledge-intensive text-to-SQL, ...
