TAPEX: Table Pre-training via Learning a Neural SQL Executor

07/16/2021
by Qian Liu, et al.

In recent years, pre-trained language models have achieved great success at modeling natural language sentences and (semi-)structured tables. However, existing table pre-training techniques typically suffer from low data quality and low pre-training efficiency. In this paper, we show that table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. By pre-training on the synthetic corpus, our approach TAPEX dramatically improves performance on downstream tasks, boosting existing language models by up to 19.5% while yielding strong results even when using a small pre-training corpus. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin, and our model achieves new state-of-the-art results on four well-known datasets, including improving the WikiSQL denotation accuracy to 89.6% and the SQA denotation accuracy to 74.5% (+3.6%). Our work opens the way to exploit table pre-training via synthetic executable programs.
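
Concretely, each pre-training example pairs an executable SQL query and a flattened table as the model's input with the query's execution result as the generation target, so the language model learns to mimic a SQL executor. Below is a minimal sketch of how such an instance can be constructed, assuming an in-memory SQLite engine as the executor; the toy table, the query, and the flatten_table linearization are illustrative stand-ins, not the authors' exact corpus-synthesis pipeline.

```python
# Sketch of a TAPEX-style pre-training instance, assuming SQLite as the
# executor. Table contents and query are hypothetical examples.
import sqlite3

# Toy table standing in for a web table from the pre-training corpus.
header = ["city", "country", "population"]
rows = [
    ("Paris", "France", 2148000),
    ("Berlin", "Germany", 3645000),
    ("Madrid", "Spain", 3223000),
]

# An executable SQL query; the paper synthesizes such queries
# automatically rather than writing them by hand.
sql = "SELECT city FROM t WHERE population > 3000000"

# Execute the query over the table to obtain the target denotation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, country TEXT, population INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
answer = [r[0] for r in conn.execute(sql)]

def flatten_table(header, rows):
    """Linearize the table so a seq2seq language model can consume it."""
    out = "col : " + " | ".join(header)
    for i, row in enumerate(rows, 1):
        out += f" row {i} : " + " | ".join(str(v) for v in row)
    return out

# Pre-training pair: the model reads (SQL query, flattened table) and
# must generate the execution result, i.e. it learns to execute SQL.
source = sql + " " + flatten_table(header, rows)
target = ", ".join(answer)
print(source)
print(target)  # -> Berlin, Madrid
```

Because the target is produced by an actual SQL engine rather than human annotation, such a corpus can be synthesized at scale with noise-free labels, which underlies the data-quality and pre-training-efficiency advantages claimed above.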
