REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers

02/04/2023
by Aivin V. Solatorio, et al.

Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is challenging as it requires modeling both a "parent" table and its relationships across tables. We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq) model. We implement target masking to prevent data copying and propose the Q_δ statistic and statistical bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on prediction tasks, "out-of-the-box", for large non-relational datasets without needing fine-tuning.
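
The two-stage structure described in the abstract (generate a parent table first, then generate child rows conditioned on each parent row) can be illustrated with a toy sketch. This is not the REaLTabFormer implementation: the GPT-2 and Seq2Seq models are replaced here by simple resampling of observed rows, and the table contents, column names, and the helpers `fit_child_model` and `generate` are all hypothetical. Because it resamples observed rows, the stand-in also copies training data, which is the behavior the target masking mentioned above is meant to prevent.

```python
import random
from collections import defaultdict

# Hypothetical toy data: a parent table of stores and a child table of sales.
parents = [
    {"store_id": 1, "store_type": "a"},
    {"store_id": 2, "store_type": "b"},
    {"store_id": 3, "store_type": "a"},
]
children = [
    {"store_id": 1, "sales": 100},
    {"store_id": 1, "sales": 120},
    {"store_id": 2, "sales": 40},
    {"store_id": 3, "sales": 90},
    {"store_id": 3, "sales": 110},
    {"store_id": 3, "sales": 95},
]


def fit_child_model(parents, children, key, context_col):
    """Collect the observed child-row sequences for each parent context value."""
    by_parent = defaultdict(list)
    for row in children:
        # Drop the foreign key; it is re-attached at generation time.
        by_parent[row[key]].append({k: v for k, v in row.items() if k != key})
    by_context = defaultdict(list)
    for parent in parents:
        by_context[parent[context_col]].append(by_parent[parent[key]])
    return dict(by_context)


def generate(parents, child_model, n, key, context_col, seed=0):
    """Generate n synthetic parents, then child rows conditioned on each parent."""
    rng = random.Random(seed)
    synth_parents, synth_children = [], []
    for new_id in range(1, n + 1):
        # Stage 1: sample a parent row (stand-in for the autoregressive model).
        parent = {**rng.choice(parents), key: new_id}
        synth_parents.append(parent)
        # Stage 2: sample a child sequence given the parent's context
        # (stand-in for the Seq2Seq model conditioned on the parent row).
        for child in rng.choice(child_model[parent[context_col]]):
            synth_children.append({key: new_id, **child})
    return synth_parents, synth_children


child_model = fit_child_model(parents, children, "store_id", "store_type")
synth_parents, synth_children = generate(
    parents, child_model, n=5, key="store_id", context_col="store_type"
)
```

Every synthetic child row carries the identifier of a synthetic parent, so the foreign-key relationship holds by construction; a learned model would replace both `rng.choice` calls with conditional sampling.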

research
11/14/2022

Row Conditional-TGAN for generating synthetic relational databases

Besides reproducing tabular data properties of standalone tables, synthe...
research
11/30/2022

Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders

Synthetic data generation has recently gained widespread attention as a ...
research
06/26/2020

TURL: Table Understanding through Representation Learning

Relational tables on the Web store a vast amount of knowledge. Owing to ...
research
09/09/2021

AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data

Temporal relational data, perhaps the most commonly used data type in in...
research
06/07/2023

Privately generating tabular data using language models

Privately generating synthetic data from a table is an important brick o...
research
09/14/2021

A Novel Global Feature-Oriented Relational Triple Extraction Model based on Table Filling

Table filling based relational triple extraction methods are attracting ...
research
04/06/2013

Client-Driven Content Extraction Associated with Table

The goal of the project is to extract content within table in document i...
