Evaluation of Categorical Generative Models – Bridging the Gap Between Real and Synthetic Data

10/28/2022
by Florence Regol, et al.

The machine learning community has mainly relied on real data to benchmark algorithms, as it provides compelling evidence of model applicability. Evaluation on synthetic datasets can be a powerful tool for better understanding a model's strengths, weaknesses, and overall capabilities. Gaining these insights can be particularly important for generative modeling because, with real data, the target distribution is completely unknown. Multiple issues related to the evaluation of generative models have been reported in the literature. We argue that those problems can be avoided by an evaluation based on ground truth. Common criticisms of synthetic experiments are that they are too simplified and not representative of practical scenarios. We therefore tailor our experimental setting to a realistic generative task. We focus on categorical data and introduce an appropriately scalable evaluation method. Our method tasks a generative model with learning a distribution in a high-dimensional setting. We then successively bin the large space to obtain smaller probability spaces where meaningful statistical tests can be applied. We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks, and compare the generative models based on the highest task difficulty they can reach before being detected as too far from the ground truth. We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
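
The abstract does not say how the space is binned or which statistical test is applied, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes (1) the ground-truth categorical distribution over K outcomes is known, (2) outcomes are grouped into B bins by a fixed random partition, with B doubling at each difficulty level, and (3) each level is checked with a Pearson chi-square goodness-of-fit test whose critical value is approximated by the Wilson-Hilferty transformation. All function names (chi2_critical, pearson_rejects, hardest_passed_level) are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Approximate upper-alpha quantile of chi-square(df) (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def pearson_rejects(sample_bins, true_binned, alpha):
    """Pearson chi-square test of binned samples against binned ground truth.
    Returns True when the samples are detected as too far from the truth."""
    n = len(sample_bins)
    counts = Counter(sample_bins)
    stat, df = 0.0, -1
    for b, p in enumerate(true_binned):
        observed = counts.get(b, 0)
        if p == 0.0:
            if observed > 0:
                return True  # the model puts mass where the truth has none
            continue
        expected = n * p
        stat += (observed - expected) ** 2 / expected
        df += 1
    if df < 1:
        return False  # a single non-empty bin cannot be tested
    return stat > chi2_critical(df, alpha)


def hardest_passed_level(model_samples, true_probs, levels, alpha=0.01, seed=0):
    """Return the largest bin count in `levels` (assumed increasing) that the
    model passes before its first rejection, or None if it fails the first."""
    k = len(true_probs)
    perm = list(range(k))
    random.Random(seed).shuffle(perm)  # assumed: fixed random partition

    def bin_index(x, num_bins):
        # Power-of-two bin counts dividing k give nested (coarser-to-finer) bins.
        return perm[x] * num_bins // k

    passed = None
    for num_bins in levels:  # more bins = larger space = harder task
        true_binned = [0.0] * num_bins
        for x, p in enumerate(true_probs):
            true_binned[bin_index(x, num_bins)] += p
        sample_bins = [bin_index(x, num_bins) for x in model_samples]
        if pearson_rejects(sample_bins, true_binned, alpha):
            break
        passed = num_bins
    return passed


if __name__ == "__main__":
    rng = random.Random(1)
    K = 4096
    weights = [rng.expovariate(1.0) + 1e-3 for _ in range(K)]
    truth = [w / sum(weights) for w in weights]
    levels = [2 ** j for j in range(1, 10)]  # 2 ... 512 bins

    # Hypothetical "good" model: samples drawn from the ground truth itself.
    good = rng.choices(range(K), weights=truth, k=20_000)
    # Hypothetical "weak" model: a 70/30 mix of the truth and a uniform law.
    weak_probs = [0.7 * p + 0.3 / K for p in truth]
    weak = rng.choices(range(K), weights=weak_probs, k=20_000)

    for name, samples in [("good", good), ("weak", weak)]:
        print(name, hardest_passed_level(samples, truth, levels))
```

Because several levels are tested in sequence, the chance of a false rejection grows with the number of levels; a Bonferroni correction (testing each level at alpha divided by the number of levels) is one simple option. The chi-square approximation also degrades when expected counts per bin are small (a common rule of thumb is at least 5), which limits how fine the bins can be for a given sample size.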
