Generative Forests

08/07/2023
by   Richard Nock, et al.
0

Tabular data represents one of the most prevalent form of data. When it comes to data generation, many approaches would learn a density for the data generation process, but would not necessarily end up with a sampler, even less so being exact with respect to the underlying density. A second issue is on models: while complex modeling based on neural nets thrives in image or text generation (etc.), less is known for powerful generative models on tabular data. A third problem is the visible chasm on tabular data between training algorithms for supervised learning with remarkable properties (e.g. boosting), and a comparative lack of guarantees when it comes to data generation. In this paper, we tackle the three problems, introducing new tree-based generative models convenient for density modeling and tabular data generation that improve on modeling capabilities of recent proposals, and a training algorithm which simplifies the training setting of previous approaches and displays boosting-compliant convergence. This algorithm has the convenient property to rely on a supervised training scheme that can be implemented by a few tweaks to the most popular induction scheme for decision tree induction with two classes. Experiments are provided on missing data imputation and comparing generated data to real data, displaying the quality of the results obtained by our approach, in particular against state of the art.

READ FULL TEXT

page 2

page 3

page 12

page 13

page 15

page 36

page 38

page 41

research
01/26/2022

Generative Trees: Adversarial and Copycat

While Generative Adversarial Networks (GANs) achieve spectacular results...
research
03/27/2020

MCFlow: Monte Carlo Flow Models for Data Imputation

We consider the topic of data imputation, a foundational task in machine...
research
07/12/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Auto-regressive sequence generative models trained by Maximum Likelihood...
research
05/11/2019

Boosting Generative Models by Leveraging Cascaded Meta-Models

Deep generative models are effective methods of modeling data. However, ...
research
02/27/2017

Boosted Generative Models

We propose a new approach for using unsupervised boosting to create an e...
research
02/13/2023

Variational Mixture of HyperGenerators for Learning Distributions Over Functions

Recent approaches build on implicit neural representations (INRs) to pro...
research
05/19/2022

Smooth densities and generative modeling with unsupervised random forests

Density estimation is a fundamental problem in statistics, and any attem...

Please sign up or login with your details

Forgot password? Click here to reset