Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment

05/04/2022
by Yujian Gan, et al.

In text-to-SQL tasks, as in much of NLP, compositional generalization is a major challenge: neural networks struggle to generalize compositionally when the training and test distributions differ. However, most recent attempts to address this rely on word-level synthetic data or on specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional example generation method. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences and annotate each sub-sentence with its corresponding SQL clause, resulting in a new dataset, Spider-SS. We then construct a further dataset, Spider-CG, by composing Spider-SS sub-sentences in different combinations, to test the ability of models to generalize compositionally. Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG, even though every sub-sentence is seen during training. To address this problem, we modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves generalization performance.
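The clause-level idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `SubSentence` and `Example` classes, the `compose` helper, and the sample questions and SQL are hypothetical and are not taken from Spider-SS, Spider-CG, or the authors' code. The sketch only shows how sub-sentences aligned with SQL clauses might be recombined into a question/query pair that was never seen as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubSentence:
    """Hypothetical container: a question fragment and its aligned SQL clause."""
    text: str
    clause: str


@dataclass
class Example:
    """A question represented as an ordered list of aligned sub-sentences."""
    parts: List[SubSentence]

    def question(self) -> str:
        # Join the natural-language fragments back into one question.
        return " ".join(p.text for p in self.parts)

    def sql(self) -> str:
        # Join the aligned clauses back into one query string.
        return " ".join(p.clause for p in self.parts)


def compose(parts: List[SubSentence]) -> Example:
    """Build a new example from sub-sentences drawn from existing examples."""
    return Example(parts=list(parts))


# Two toy "training" examples over the same (made-up) table.
a = Example([
    SubSentence("Show the names of singers", "SELECT name FROM singer"),
    SubSentence("ordered by age", "ORDER BY age"),
])
b = Example([
    SubSentence("List all singers", "SELECT * FROM singer"),
    SubSentence("who are older than 30", "WHERE age > 30"),
])

# A new combination: every part appears in training, the pairing does not.
novel = compose([a.parts[0], b.parts[1]])
print(novel.question())
print(novel.sql())
```

Plain string concatenation works here only because the toy clauses happen to fit together in order; a real pipeline would also need to check that the combined fragments are compatible with each other and with the database schema.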


research
08/29/2019

Zero-shot Text-to-SQL Learning with Auxiliary Task

Recent years have seen great success in the use of neural seq2seq models...
research
01/12/2023

On the Structural Generalization in Text-to-SQL

Exploring the generalization of a text-to-SQL parser is essential for a ...
research
04/21/2023

DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

We study the problem of decomposing a complex text-to-sql task into smal...
research
05/27/2023

Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques

Compositional and domain generalization present significant challenges i...
research
06/20/2023

On Evaluating Multilingual Compositional Generalization with Translated Datasets

Compositional generalization allows efficient learning and human-like in...
research
09/06/2021

Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization

Modern semantic parsers suffer from two principal limitations. First, tr...
research
06/08/2021

Meta-Learning to Compositionally Generalize

Natural language is compositional; the meaning of a sentence is a functi...
