A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

06/25/2023
by Marcelo Archanjo José, et al.

Long text sequences are challenging for transformers because the memory required by the self-attention mechanism grows quadratically with sequence length. This issue directly affects the translation of natural language into SQL queries, since such techniques usually take as input a single concatenated text containing the question and the database schema. We present techniques that allow these long text sequences to be handled by transformers whose input is limited to 512 tokens. We propose a training process with database schema pruning (removal of the table and column names that are useless for the query of interest). In addition, we use a multilingual approach with the mT5-large model, fine-tuned on a data-augmented Spider dataset in four languages simultaneously: English, Portuguese, Spanish, and French. On the Spider dataset, our proposed technique increased exact set match accuracy from 0.718 to 0.736 on the validation set (Dev). Source code, evaluations, and checkpoints are available at: https://github.com/C4AI/gap-text2sql.
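The idea behind schema pruning can be illustrated with a minimal Python sketch. It assumes a hypothetical schema format (a dict mapping each table name to its column names), a hypothetical `prune_schema` function that keeps only the tables and columns whose names appear as identifiers in the gold SQL query (which is known only for training examples), and a hypothetical `serialize` function that concatenates the question with the schema. This is not the pruning or serialization code from the gap-text2sql repository; it only shows how dropping unused tables and columns shortens the text the transformer must encode.

```python
import re
from typing import Dict, List

# Hypothetical schema format: table name -> list of column names.
Schema = Dict[str, List[str]]

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def prune_schema(schema: Schema, gold_sql: str) -> Schema:
    """Keep only tables and columns whose names occur as identifiers in gold_sql.

    Assumption: the gold query is available, so this applies to training data.
    Matching is case-insensitive and does not resolve aliases (e.g. T1 in T1.name).
    """
    tokens = {tok.lower() for tok in IDENTIFIER.findall(gold_sql)}
    pruned: Schema = {}
    for table, columns in schema.items():
        if table.lower() not in tokens:
            continue  # the query never references this table, so drop it
        pruned[table] = [col for col in columns if col.lower() in tokens]
    return pruned


def serialize(question: str, schema: Schema) -> str:
    """Concatenate question and schema into one input text (hypothetical format)."""
    parts = [question]
    for table, columns in schema.items():
        parts.append(f"| {table} : " + ", ".join(columns))
    return " ".join(parts)


if __name__ == "__main__":
    schema = {
        "singer": ["singer_id", "name", "country", "age"],
        "concert": ["concert_id", "concert_name", "theme", "year"],
        "stadium": ["stadium_id", "location", "capacity"],
    }
    question = "What are the names and countries of singers older than 20?"
    gold_sql = "SELECT name, country FROM singer WHERE age > 20"

    pruned = prune_schema(schema, gold_sql)
    print(pruned)  # {'singer': ['name', 'country', 'age']}
    print(serialize(question, pruned))
    # Compare input lengths (in characters) before and after pruning.
    print(len(serialize(question, schema)), "->", len(serialize(question, pruned)))
```

In this toy example the `concert` and `stadium` tables disappear from the input entirely, which is what frees room when the encoder cannot see more than 512 tokens. Matching on identifier tokens is a deliberate simplification; a parser-based approach would resolve aliases and qualified names to specific tables.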
