Copula Flows for Synthetic Data Generation

01/03/2021
by Sanket Kamthe, et al.

The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited, or when privacy and data protection standards allow only limited use of the given data, e.g., in medical and financial data-sets. Current state-of-the-art methods for synthetic data generation are based on generative models, such as Generative Adversarial Networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret. Furthermore, GAN-based methods can suffer when used with mixed real and categorical variables. Moreover, the design of the loss function (discriminator loss) is itself problem-specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. In this paper, we propose to use a probabilistic model as a synthetic data generator. Learning the probabilistic model for the data is equivalent to estimating the density of the data. Based on copula theory, we divide the density estimation task into two parts, i.e., estimating the univariate marginals and estimating the multivariate copula density over the univariate marginals. We use normalising flows to learn both the copula density and the univariate marginals. We benchmark our method on both simulated and real data-sets in terms of density estimation as well as the ability to generate high-fidelity synthetic data.
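The two-part split described in the abstract rests on the standard copula factorisation of a joint density, p(x_1, ..., x_d) = c(F_1(x_1), ..., F_d(x_d)) * p_1(x_1) * ... * p_d(x_d), where F_i and p_i are the marginal CDF and density of variable i and c is the copula density. The sketch below illustrates that split for two variables only. It is not the paper's method: it assumes empirical CDFs in place of the flow-based marginals and a Gaussian copula in place of the flow-based copula density, and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used by the Gaussian copula stand-in


def empirical_cdf(sorted_col, x):
    # Rank-based CDF estimate, kept strictly inside (0, 1) so inv_cdf stays finite.
    return (bisect_right(sorted_col, x) + 0.5) / (len(sorted_col) + 1)


def empirical_quantile(sorted_col, u):
    # Approximate inverse of the empirical CDF: the order statistic at level u.
    index = min(int(u * len(sorted_col)), len(sorted_col) - 1)
    return sorted_col[index]


def fit_gaussian_copula(xs, ys):
    # Part 1: univariate marginals (assumption: empirical CDFs, not flows).
    sx, sy = sorted(xs), sorted(ys)
    # Part 2: dependence structure, estimated on normal scores of the
    # marginal CDF values (assumption: Gaussian copula, not a flow).
    zx = [STD_NORMAL.inv_cdf(empirical_cdf(sx, x)) for x in xs]
    zy = [STD_NORMAL.inv_cdf(empirical_cdf(sy, y)) for y in ys]
    n = len(zx)
    mx, my = sum(zx) / n, sum(zy) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    var_x = sum((a - mx) ** 2 for a in zx)
    var_y = sum((b - my) ** 2 for b in zy)
    rho = cov / math.sqrt(var_x * var_y)
    return sx, sy, rho


def sample_synthetic(model, n, rng):
    sx, sy, rho = model
    rows = []
    for _ in range(n):
        # Draw correlated normal scores from the fitted copula.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho ** 2)) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then back through the marginal quantile functions.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        rows.append((empirical_quantile(sx, u1), empirical_quantile(sy, u2)))
    return rows


# Toy usage: a skewed variable and a noisy copy of it.
rng = random.Random(0)
real_x = [rng.expovariate(1.0) for _ in range(500)]
real_y = [x + rng.gauss(0.0, 0.3) for x in real_x]
model = fit_gaussian_copula(real_x, real_y)
synthetic = sample_synthetic(model, 5, rng)
```

Because the marginals and the dependence structure are fitted separately, either part can be swapped for a more expressive estimator without touching the other, which is the property that makes it possible to replace both with normalising flows as the abstract proposes.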

research
02/06/2020

Using generative adversarial networks to synthesize artificial financial datasets

Generative Adversarial Networks (GANs) became very popular for generatio...
research
04/25/2022

PhysioGAN: Training High Fidelity Generative Model for Physiological Sensor Readings

Generative models such as the variational autoencoder (VAE) and the gene...
research
10/29/2021

Improving the quality of generative models through Smirnov transformation

Solving the convergence issues of Generative Adversarial Networks (GANs)...
research
05/19/2022

Smooth densities and generative modeling with unsupervised random forests

Density estimation is a fundamental problem in statistics, and any attem...
research
09/10/2023

A supervised generative optimization approach for tabular data

Synthetic data generation has emerged as a crucial topic for financial i...
research
09/04/2023

FinDiff: Diffusion Models for Financial Tabular Data Generation

The sharing of microdata, such as fund holdings and derivative instrumen...
research
10/21/2021

CaloFlow II: Even Faster and Still Accurate Generation of Calorimeter Showers with Normalizing Flows

Recently, we introduced CaloFlow, a high-fidelity generative model for G...
