DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data

03/07/2022
by Gael Lederrey, et al.

Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes. Generative Adversarial Networks (GANs) are considered state-of-the-art for developing generative models. However, these deep learning models are data-driven, which makes it difficult to control the generation process. This can lead to several issues: a lack of representativity in the generated data, the introduction of bias, and the possibility of overfitting the sample's noise. This article presents the Directed Acyclic Tabular GAN (DATGAN), which addresses these limitations by integrating expert knowledge into deep learning models for synthetic tabular data generation. This approach allows the interactions between variables to be specified explicitly using a Directed Acyclic Graph (DAG). The DAG is then converted into a network of Long Short-Term Memory (LSTM) cells, modified to accept multiple inputs. Several versions of the DATGAN are systematically tested on multiple assessment metrics. We show that the best versions of the DATGAN outperform state-of-the-art generative models on multiple case studies. Finally, we show how the DAG can be used to create hypothetical synthetic datasets.
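The core idea above, that a DAG over the variables fixes which variables may influence which, can be illustrated with a small sketch. It assumes a dict-of-parents representation and hypothetical travel-survey variable names; it is not the DATGAN library's API. It only derives a generation order in which every variable's parents come first, so a variable with several parents corresponds to a cell that has to accept several inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical variables (assumption, for illustration only).
# Each key maps a variable to its parents: an edge parent -> child means
# the child's generation is allowed to depend on the parent.
dag = {
    "age": set(),
    "gender": set(),
    "income": {"age", "gender"},
    "car_ownership": {"income", "age"},
    "travel_mode": {"car_ownership", "income"},
}


def generation_plan(dag):
    """Return (variable, sorted parents) pairs with every parent listed before its child."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = TopologicalSorter(dag).static_order()
    return [(var, sorted(dag.get(var, set()))) for var in order]


for var, parents in generation_plan(dag):
    # A variable with several parents would need a cell that accepts several inputs.
    print(f"{var:<14} <- {', '.join(parents) if parents else 'no parents (root)'}")
```

Editing `dag`, for instance dropping `income` from the parents of `travel_mode`, changes which inputs each variable receives; this loosely mirrors the idea of using a modified DAG to produce hypothetical datasets.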

Code Repositories

DATGAN

Directed Acyclic Tabular GAN (DATGAN) for integrating expert knowledge in synthetic tabular data generation

SynthPop

Code for the article "DATGAN: Integrating Expert Knowledge into Deep Learning for Synthetic Tabular Data"