Evaluating the Text-to-SQL Capabilities of Large Language Models

03/15/2022
by Nitarshan Rajkumar, et al.

We perform an empirical evaluation of the Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark, and we analyze its failure modes in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to outperform state-of-the-art models finetuned on those same few-shot examples.
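The few-shot setup the abstract describes can be sketched as follows: serialize the database schema, prepend a handful of in-domain question/SQL pairs, and ask the model to complete the query for a new question. This is a minimal illustrative sketch, assuming a prompt format of schema plus commented questions; the function name, schema, and examples are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of few-shot Text-to-SQL prompting: the prompt
# concatenates the database schema (as CREATE TABLE statements) with a
# few in-domain question/SQL pairs, then the new question. The model
# (e.g. Codex) would be asked to complete the final query.

def build_prompt(schema, examples, question):
    """Assemble a few-shot Text-to-SQL prompt for a code language model."""
    parts = [schema.strip(), ""]
    for q, sql in examples:
        parts.append(f"-- {q}")       # natural-language question as a SQL comment
        parts.append(sql.strip())     # gold SQL for the few-shot example
        parts.append("")
    parts.append(f"-- {question}")
    parts.append("SELECT")            # seed the completion with the query prefix
    return "\n".join(parts)

schema = "CREATE TABLE city (name TEXT, state TEXT, population INT);"
examples = [("How many cities are in Texas?",
             "SELECT COUNT(*) FROM city WHERE state = 'Texas';")]
prompt = build_prompt(schema, examples, "What is the largest city in Ohio?")
print(prompt)
```

In the zero-shot variant the paper also evaluates, `examples` would simply be empty and the prompt would contain only the schema and the question.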

Related research

04/23/2023
Divide and Prompt: Chain of Thought Prompting for Text-to-SQL
Chain-of-thought (CoT) prompting combined with large language models (LL...

10/13/2022
Assessing Out-of-Domain Language Model Performance from Few Examples
While pretrained language models have exhibited impressive generalizatio...

01/21/2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Neural text-to-SQL models have achieved remarkable performance in transl...

02/23/2023
Language Model Crossover: Variation through Few-Shot Prompting
This paper pursues the insight that language models naturally enable an ...

05/25/2023
Uncovering and Categorizing Social Biases in Text-to-SQL
Content Warning: This work contains examples that potentially implicate ...

05/24/2023
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
We investigate the predictability of large language model (LLM) capabili...

07/07/2018
Recommender system for learning SQL using hints
Today's software industry requires individuals who are proficient in as ...
