ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries

10/19/2020
by   Karthik Radhakrishnan, et al.

Translating natural language utterances to executable queries is a helpful technique in making the vast amount of data stored in relational databases accessible to a wider range of non-tech-savvy end users. Prior work in this area has largely focused on textual input that is linguistically correct and semantically unambiguous. However, real-world user queries are often succinct, colloquial, and noisy, resembling the input of a search engine. In this work, we introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL) to achieve robust text-to-SQL modeling over natural language search (NLS) questions. Due to the lack of evaluation data, we curate a new dataset of NLS questions and demonstrate the efficacy of our approach. ColloQL's superior performance extends to well-formed text, achieving 84.9% (logical) and 90.7% (execution) accuracy on the WikiSQL dataset, making it, to the best of our knowledge, the highest performing model that does not use execution guided decoding.

research
04/25/2018

TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation

Interacting with relational databases through natural language helps use...
research
03/03/2021

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Data augmentation has attracted a lot of research attention in the deep ...
research
07/30/2020

Photon: A Robust Cross-Domain Text-to-SQL System

Natural language interfaces to databases (NLIDB) democratize end user ac...
research
11/09/2020

"What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL

In Natural Language Interfaces to Databases systems, the text-to-SQL tec...
research
11/07/2020

SeqGenSQL – A Robust Sequence Generation Model for Structured Query Language

We explore using T5 (Raffel et al. (2019)) to directly translate natural...
research
06/09/2021

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Most available semantic parsing datasets, comprising of pairs of natural...
research
04/06/2022

Sigma Workbook: A Spreadsheet for Cloud Data Warehouses

Cloud data warehouses (CDWs) bring large-scale data and compute power cl...
