CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

05/25/2023
by   Hanchong Zhang, et al.

The cross-domain text-to-SQL task aims to build a system that can parse user questions into SQL on completely unseen databases, whereas the single-domain text-to-SQL task evaluates performance on databases identical to those seen in training. Both setups face unavoidable difficulties in real-world applications. To this end, we introduce the cross-schema text-to-SQL task, where the databases in the evaluation data differ from those in the training data but come from the same domain. Furthermore, we present CSS, a large-scale CrosS-Schema Chinese text-to-SQL dataset, to support the corresponding studies. CSS originally consisted of 4,340 question/SQL pairs across 2 databases. To generalize models to different medical systems, we extend CSS and create 19 new databases along with 29,280 corresponding dataset examples. Moreover, CSS is also a large corpus for single-domain Chinese text-to-SQL studies. We present the data collection approach and a series of analyses of the data statistics. To show the potential and usefulness of CSS, we run and report benchmark baselines. Our dataset is publicly available at <https://huggingface.co/datasets/zhanghanchong/css>.
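
The three evaluation setups named in the abstract differ only in how the training and evaluation databases relate to each other. The following is a minimal sketch of that distinction, not the authors' code: the `Example` record, its `db_id` and `domain` fields, and the `classify_setup` helper are all hypothetical names introduced for illustration, and treating "cross-domain" as "at least one evaluation domain is unseen" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One question/SQL pair. All field names are hypothetical."""
    question: str
    sql: str
    db_id: str    # identifier of the database the SQL runs against
    domain: str   # coarse domain label, e.g. "medical"


def classify_setup(train, test):
    """Label a train/test split by how its databases and domains overlap."""
    train_dbs = {ex.db_id for ex in train}
    test_dbs = {ex.db_id for ex in test}
    train_domains = {ex.domain for ex in train}
    test_domains = {ex.domain for ex in test}

    # Single-domain: every evaluation database was also seen in training.
    if test_dbs <= train_dbs:
        return "single-domain"
    if test_dbs.isdisjoint(train_dbs):
        # Cross-schema: unseen databases, but only domains seen in training.
        if test_domains <= train_domains:
            return "cross-schema"
        # Cross-domain (simplified): unseen databases and an unseen domain.
        return "cross-domain"
    # Some evaluation databases are seen and some are not.
    return "mixed"
```

Under these assumptions, a split that trains on some medical databases and evaluates on other medical databases is labeled `"cross-schema"`, which is the setting the abstract introduces.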

research
09/24/2018

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

We present Spider, a large-scale, complex and cross-domain semantic pars...
research
08/26/2022

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

As the first session-level Chinese dataset, CHASE contains two separate ...
research
06/23/2018

Improving Text-to-SQL Evaluation Methodology

To be informative, an evaluation must measure how well systems generaliz...
research
10/21/2022

Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

The task of context-dependent text-to-SQL aims to convert multi-turn use...
research
12/27/2022

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Text-to-SQL semantic parsing is an important NLP task, which greatly fac...
research
05/19/2023

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

Large language models (LLMs) with in-context learning have demonstrated ...
research
06/06/2023

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory

Large language models (LLMs) with memory are computationally universal. ...
