Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders

11/30/2022
by Ciro Antonio Mami, et al.

Synthetic data generation has recently gained widespread attention as a more reliable alternative to traditional data anonymization. The underlying methods were originally developed for image synthesis, so applying them to the tabular and relational datasets typical of healthcare, finance and other industries is non-trivial. While substantial research has been devoted to generating realistic tabular datasets, the study of synthetic relational databases is still in its infancy. In this paper, we combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases. We then apply the resulting method to two publicly available databases in computational experiments. The results indicate that the structures of the real databases are accurately preserved in the synthetic datasets, even for large datasets with advanced data types.

research
02/04/2023

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers

Tabular data is a common form of organizing data. Multiple models are av...
research
02/06/2020

Supervised Learning on Relational Databases with Graph Neural Networks

The majority of data scientists and machine learning practitioners use r...
research
05/19/2023

RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design

Identifying molecules that exhibit some pre-specified properties is a di...
research
03/02/2016

Probabilistic Relational Model Benchmark Generation

The validation of any database mining methodology goes through an evalua...
research
10/11/2019

Persistence and Big Data Analytics Architectures for Smart Connected Vehicles

Up until recently, relational databases were considered as the de-facto ...
research
11/24/2021

SchemaDB: Structures in Relational Datasets

In this paper we introduce the SchemaDB data-set; a collection of relati...
research
07/04/2018

Generating Synthetic but Plausible Healthcare Record Datasets

Generating datasets that "look like" given real ones is an interesting t...
