Generative Trees: Adversarial and Copycat

01/26/2022
by Richard Nock, et al.

While Generative Adversarial Networks (GANs) achieve spectacular results on unstructured data like images, there is still a gap on tabular data, for which state-of-the-art supervised learning still largely favours decision tree (DT)-based models. This paper proposes a new path forward for the generation of tabular data, exploiting the decades-old understanding of the supervised task's best components for DT induction: losses (properness), models (tree-based) and algorithms (boosting). The properness condition on the supervised loss, which postulates the optimality of Bayes rule, leads us to a variational GAN-style loss formulation that is tight when discriminators meet a calibration property trivially satisfied by DTs and that, under common assumptions about the supervised loss, yields "one loss to train against them all" for the generator: the χ². We then introduce tree-based generative models, generative trees (GTs), meant to mirror on the generative side the good properties of DTs for classifying tabular data, together with a boosting-compliant adversarial training algorithm for GTs. We also introduce copycat training, in which the generator copies at run time the underlying tree (graph) of the discriminator DT and completes it for the hardest discriminative task, with boosting-compliant convergence. We test our algorithms on tasks including fake/real distinction, training from fake data and missing data imputation. On each of these tasks, GTs provide comparatively simple, interpretable contenders to sophisticated state-of-the-art methods for data generation (using neural network models) or missing data imputation (relying on multiple imputation by chained equations with complex tree-based modeling).
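
The abstract does not spell out how a generative tree produces a sample, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes numeric features only, a binary tree whose internal nodes split one feature at a threshold, a probability attached to each left branch, and uniform sampling inside the box reached at a leaf. The names `GTNode`, `p_left`, `sample`, `domain` and `tree` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# A box maps each feature name to its (low, high) range. (Assumed representation.)
Box = Dict[str, Tuple[float, float]]


@dataclass
class GTNode:
    """Hypothetical generative-tree node; a node with feature=None is a leaf."""
    feature: Optional[str] = None      # feature tested at this node
    threshold: Optional[float] = None  # split point, assumed to lie inside the box
    p_left: float = 0.5                # probability of following the left branch
    left: Optional["GTNode"] = None
    right: Optional["GTNode"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def sample(node: GTNode, box: Box, rng: random.Random) -> Dict[str, float]:
    """Walk from the root to a leaf, shrinking the box, then sample uniformly in it."""
    box = dict(box)  # copy so the caller's domain is not modified
    while not node.is_leaf():
        low, high = box[node.feature]
        if rng.random() < node.p_left:
            box[node.feature] = (low, node.threshold)   # left: feature <= threshold
            node = node.left
        else:
            box[node.feature] = (node.threshold, high)  # right: feature >= threshold
            node = node.right
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in box.items()}


# Toy domain and tree, made up for illustration.
domain: Box = {"age": (18.0, 90.0), "income": (0.0, 200.0)}
tree = GTNode(
    feature="age", threshold=40.0, p_left=0.7,
    left=GTNode(),  # leaf: age in [18, 40], income unrestricted
    right=GTNode(feature="income", threshold=50.0, p_left=0.2,
                 left=GTNode(), right=GTNode()),
)

rows = [sample(tree, domain, random.Random(seed)) for seed in range(5)]
```

In this sketch the tree structure and branch probabilities are fixed by hand; in the paper they are learned by adversarial or copycat training, which this example does not attempt to reproduce.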


research
08/07/2023

Generative Forests

Tabular data represents one of the most prevalent form of data. When it ...
research
08/11/2020

IGANI: Iterative Generative Adversarial Networks for Imputation Applied to Prediction of Traffic Data

Generative adversarial networks (GANs) are implicit generative models th...
research
08/03/2021

Categorical EHR Imputation with Generative Adversarial Nets

Electronic Health Records often suffer from missing data, which poses a ...
research
06/07/2018

GAIN: Missing Data Imputation using Generative Adversarial Nets

We propose a novel method for imputing missing data by adapting the well...
research
11/04/2020

Learning to Rank with Missing Data via Generative Adversarial Networks

We explore the role of Conditional Generative Adversarial Networks (GAN)...
research
01/10/2022

Differentiable and Scalable Generative Adversarial Models for Data Imputation

Data imputation has been extensively explored to solve the missing data ...
research
02/26/2019

HexaGAN: Generative Adversarial Nets for Real World Classification

Most deep learning classification studies assume clean data. However, di...
