FinDiff: Diffusion Models for Financial Tabular Data Generation

09/04/2023
by Timur Sattarov, et al.

Sharing microdata, such as fund holdings and derivative instruments, is a particular challenge for regulatory institutions because of strict data confidentiality and privacy regulations. These constraints often prevent academics and practitioners from conducting collaborative research effectively. Generative models, particularly diffusion models, can synthesize data that mimics the underlying distributions of real-world data, and therefore offer a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate realistic financial tabular data for a variety of regulatory downstream tasks, such as economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to represent mixed-modality financial data comprising both categorical and numeric attributes. FinDiff's performance in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models on three real-world financial datasets (two publicly available and one proprietary). Empirical results demonstrate that FinDiff generates synthetic tabular financial data with high fidelity, privacy, and utility.
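The abstract only states that FinDiff uses embedding encodings for mixed categorical and numeric attributes inside a diffusion model. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. It maps each mixed-type row to one continuous vector (an embedding per categorical value plus standardized numerics), applies the standard DDPM forward-noising step, and maps a denoised embedding back to a category by nearest neighbour. The toy schema (`instrument_type`, `currency`, `notional`, `maturity_years`), `EMBED_DIM`, the linear beta schedule, the random untrained embeddings, and the nearest-embedding decoder are all hypothetical choices; the learned denoising network is omitted.

```python
import math
import random

# Hypothetical toy schema, for illustration only (not taken from the paper).
CATEGORICAL = {"instrument_type": ["bond", "equity", "swap"],
               "currency": ["EUR", "USD"]}
NUMERIC = ["notional", "maturity_years"]
EMBED_DIM = 2  # assumed embedding size per categorical attribute


def make_embedding_tables(rng):
    """Random embedding vector per category value; a real model would learn these."""
    return {col: {val: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for val in values}
            for col, values in CATEGORICAL.items()}


def standardize(column):
    """Zero-mean, unit-variance scaling of one numeric column."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column]


def encode_rows(rows, tables):
    """Map each mixed-type row to [categorical embeddings..., standardized numerics...]."""
    scaled = {c: standardize([r[c] for r in rows]) for c in NUMERIC}
    encoded = []
    for i, row in enumerate(rows):
        vec = []
        for col in CATEGORICAL:
            vec.extend(tables[col][row[col]])
        vec.extend(scaled[c][i] for c in NUMERIC)
        encoded.append(vec)
    return encoded


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def decode_category(vec, table):
    """Map a (denoised) embedding back to the closest category value (Euclidean)."""
    return min(table, key=lambda val: sum((p - q) ** 2 for p, q in zip(vec, table[val])))


# Usage on three hypothetical rows.
rng = random.Random(0)
rows = [
    {"instrument_type": "bond", "currency": "EUR", "notional": 1e6, "maturity_years": 5},
    {"instrument_type": "swap", "currency": "USD", "notional": 2.5e6, "maturity_years": 10},
    {"instrument_type": "equity", "currency": "EUR", "notional": 4e5, "maturity_years": 1},
]
tables = make_embedding_tables(rng)
x0 = encode_rows(rows, tables)
alpha_bars = linear_alpha_bars(num_steps=500)
xt, eps = q_sample(x0[0], t=250, alpha_bars=alpha_bars, rng=rng)
print(len(x0[0]))  # 2 categorical cols * EMBED_DIM + 2 numeric cols = 6
```

In a trained model of this kind, a network would learn to predict `eps` from `xt` and `t`; after reverse sampling, categorical slices would be decoded with something like `decode_category` and numerics un-standardized.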

research
02/28/2023

Synthesizing Mixed-type Electronic Health Records using Diffusion Models

Electronic Health Records (EHRs) contain sensitive patient information, ...
research
03/08/2023

Diffusing Gaussian Mixtures for Generating Categorical Data

Learning a categorical distribution comes with its own set of challenges...
research
01/03/2021

Copula Flows for Synthetic Data Generation

The ability to generate high-fidelity synthetic data is crucial when ava...
research
02/27/2023

Differentially Private Diffusion Models Generate Useful Synthetic Images

The ability to generate privacy-preserving synthetic versions of sensiti...
research
08/28/2023

Generating tabular datasets under differential privacy

Machine Learning (ML) is accelerating progress across fields and industr...
research
09/30/2019

Generating High-fidelity, Synthetic Time Series Datasets with DoppelGANger

Limited data access is a substantial barrier to data-driven networking r...
research
09/14/2023

Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context

Financial simulators play an important role in enhancing forecasting acc...