mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

10/07/2021
by Marcelo Archanjo José, et al.

The translation of natural language questions to SQL queries has attracted growing attention, in particular in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we therefore investigate translation to SQL when input questions are given in Portuguese. To do so, we adapted state-of-the-art tools and resources. We changed the RAT-SQL+GAP system to rely on a multilingual BART model (we also report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with the original and translated training datasets together, even if a single target language is desired. The multilingual BART model fine-tuned with a double-size training dataset (English and Portuguese) achieved 83% of the baseline when making inferences for the Portuguese test dataset. This investigation can help other researchers produce machine learning results in languages other than English. Our multilingual-ready version of RAT-SQL+GAP and the data are available as open source, under the name mRAT-SQL+GAP, at: https://github.com/C4AI/gap-text2sql
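The idea of training on the original and translated datasets together can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes Spider-style examples with `db_id`, `question`, and `query` fields. The `merge_training_sets` helper and the `lang` tag are hypothetical names introduced for illustration, and the two examples are toy data.

```python
def merge_training_sets(english, portuguese):
    """Concatenate two lists of Spider-style examples into one training set.

    Each example is copied and tagged with a hypothetical "lang" field so
    the origin of every question stays traceable after merging.
    """
    merged = []
    for lang, examples in (("en", english), ("pt", portuguese)):
        for example in examples:
            merged.append({**example, "lang": lang})
    return merged


# Toy data (assumption): one English example and its Portuguese translation.
# Only the question is translated; the target SQL query is unchanged.
english = [{
    "db_id": "concert_singer",
    "question": "How many singers do we have?",
    "query": "SELECT count(*) FROM singer",
}]
portuguese = [{
    "db_id": "concert_singer",
    "question": "Quantos cantores nós temos?",
    "query": "SELECT count(*) FROM singer",
}]

combined = merge_training_sets(english, portuguese)
print(len(combined))
```

The merged list is twice the size of either input, which corresponds to the "double-size training dataset" described above. How the merged examples are shuffled or batched during fine-tuning is left open here.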
