On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries

by Tianze Shi, et al.

Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.
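The abstract names the two training signals but does not spell out their loss functions. The sketch below is a minimal illustration, assuming a cross-entropy formulation that is common in the supervised-attention literature. Specific assumptions, not taken from the source: the helper names `supervised_attention_loss` and `column_disambiguation_loss` are hypothetical; gold attention mass is spread uniformly over all question tokens aligned to a SQL token; decoder steps without an annotated alignment are skipped; column references are scored by a plain softmax over columns. The exact formulation used with Squall may differ.

```python
import math
from typing import Dict, Sequence, Set


def supervised_attention_loss(
    attention: Sequence[Sequence[float]],
    alignments: Dict[int, Set[int]],
    eps: float = 1e-12,
) -> float:
    """Hypothetical helper: cross-entropy between attention rows and gold alignments.

    attention[i][j]: weight that decoder step i (a SQL token) puts on question token j;
    each row is assumed to be a probability distribution already.
    alignments[i]: question-token indices annotated as aligned to SQL token i.
    Assumption: gold mass is uniform over aligned tokens; unaligned steps are skipped.
    """
    total, steps = 0.0, 0
    for i, gold in alignments.items():
        if not gold:
            continue
        target = 1.0 / len(gold)  # uniform gold distribution over aligned tokens
        total += -sum(target * math.log(attention[i][j] + eps) for j in gold)
        steps += 1
    # Average over the decoder steps that actually carry alignment supervision.
    return total / steps if steps else 0.0


def column_disambiguation_loss(column_scores: Sequence[float], gold_column: int) -> float:
    """Hypothetical helper: softmax cross-entropy for which column a question span refers to."""
    m = max(column_scores)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in column_scores))
    return log_z - column_scores[gold_column]


if __name__ == "__main__":
    # Toy example: 2 SQL tokens attending over 3 question tokens.
    attn = [[0.7, 0.2, 0.1],
            [0.1, 0.1, 0.8]]
    align = {0: {0}, 1: {1, 2}}
    print(supervised_attention_loss(attn, align))
    print(column_disambiguation_loss([2.0, 0.5, -1.0], gold_column=0))
```

In an encoder-decoder parser, either term would typically be added to the usual token-level generation loss with a weighting coefficient; that weighting is likewise an assumption here.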

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

We present Spider, a large-scale, complex and cross-domain semantic pars...

TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation

Parsing natural language to corresponding SQL (NL2SQL) with data driven ...

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

As the first session-level Chinese dataset, CHASE contains two separate ...

Syntactic Question Abstraction and Retrieval for Data-Scarce Semantic Parsing

Deep learning approaches to semantic parsing require a large amount of l...

You Say 'What', I Hear 'Where' and 'Why' --- (Mis-)Interpreting SQL to Derive Fine-Grained Provenance

SQL declaratively specifies what (not how) the desired output of a query...

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Most available semantic parsing datasets, comprising of pairs of natural...

Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture

An intelligent machine that can answer human questions based on electron...