On the causality-preservation capabilities of generative modelling

01/03/2023
by Yves-Cédric Bauwelinckx, et al.

Modeling lies at the core of both the financial and the insurance industries for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often require large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible way to alleviate this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such class of models, and they increase our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, such as economics and insurance. The reason is that in these fields most questions are inherently about the identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causality-preservation capabilities of GANs and whether the synthetic data they produce can reliably be used to answer causal questions. This is done by performing causal analyses on GAN-generated synthetic data under increasingly lenient assumptions. We consider the cross-sectional case, the time series case, and the case with a complete structural model. It is shown that in the simple cross-sectional scenario, where correlation equals causation, the GAN preserves causality, but that challenges arise for more advanced analyses.
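
The cross-sectional check described above can be illustrated with a minimal sketch: estimate the same effect on real and on synthetic data and compare the two estimates. Everything in it is an assumption made for illustration and is not taken from the paper: the data-generating process, the effect size `true_beta`, the helper `ols_slope`, and in particular the generator, which is a Gaussian fitted to the joint distribution standing in for a trained GAN.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept (hypothetical helper)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 5000
true_beta = 2.0  # assumed causal effect of x on y, chosen for illustration

# "Real" data: x is exogenous, so correlation equals causation in this setup.
x = rng.normal(size=n)
y = true_beta * x + rng.normal(size=n)
real = np.column_stack([x, y])

# Stand-in generator (NOT a GAN): a Gaussian fitted to the joint distribution
# of (x, y), sampled to obtain synthetic rows.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=n)

# Run the same analysis on both datasets and compare the estimates.
beta_real = ols_slope(real[:, 0], real[:, 1])
beta_synth = ols_slope(synthetic[:, 0], synthetic[:, 1])
print(f"real: {beta_real:.3f}, synthetic: {beta_synth:.3f}")
```

Because the stand-in generator only reproduces the joint correlation structure, the two estimates agree here precisely because the regressor is exogenous. With confounding or time dependence, matching correlations would no longer be enough, which is the kind of setting where the abstract reports that challenges arise.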


research
04/21/2021

Causal-TGAN: Generating Tabular Data Using Causal Generative Adversarial Networks

Synthetic data generation becomes prevalent as a solution to privacy lea...
research
03/02/2023

Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems

Deep learning models frequently suffer from various problems such as cla...
research
09/26/2021

Synthetic Data Generation for Fraud Detection using GANs

Detecting money laundering in gambling is becoming increasingly challeng...
research
01/31/2023

A Bayesian Generative Adversarial Network (GAN) to Generate Synthetic Time-Series Data, Application in Combined Sewer Flow Prediction

Despite various breakthroughs in machine learning and data analysis tech...
research
07/12/2022

Generative Adversarial Networks Applied to Synthetic Financial Scenarios Generation

The finance industry is producing an increasing amount of datasets that ...
research
07/01/2023

CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis

Generative adversarial networks (GANs) have drawn considerable attention...
research
08/24/2022

GAN-based generative modelling for dermatological applications – comparative study

The lack of sufficiently large open medical databases is one of the bigg...
