Diffusing Gaussian Mixtures for Generating Categorical Data

03/08/2023
by Florence Regol, et al.

Learning a categorical distribution comes with its own set of challenges. A successful approach taken by state-of-the-art works is to cast the problem in a continuous domain to take advantage of the impressive performance of generative models for continuous data. Amongst them are the recently emerging diffusion probabilistic models, which have the observed advantage of generating high-quality samples. Recent advances for categorical generative models have focused on log-likelihood improvements. In this work, we propose a generative model for categorical data based on diffusion models with a focus on high-quality sample generation, and propose sample-based evaluation methods. The efficacy of our method stems from performing diffusion in the continuous domain while having its parameterization informed by the categorical structure of the target distribution. Our evaluation method highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.

research
11/28/2022

Continuous diffusion for categorical data

Diffusion models have quickly become the go-to paradigm for generative m...
research
10/28/2022

Evaluation of Categorical Generative Models – Bridging the Gap Between Real and Synthetic Data

The machine learning community has mainly relied on real data to benchma...
research
08/03/2022

AdaCat: Adaptive Categorical Discretization for Autoregressive Models

Autoregressive generative models can estimate complex continuous data di...
research
09/04/2023

FinDiff: Diffusion Models for Financial Tabular Data Generation

The sharing of microdata, such as fund holdings and derivative instrumen...
research
09/02/2022

First Hitting Diffusion Models

We propose a family of First Hitting Diffusion Models (FHDM), deep gener...
research
02/20/2023

DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises

While diffusion models have achieved great success in generating continu...
research
06/01/2021

Hybrid Generative Models for Two-Dimensional Datasets

Two-dimensional array-based datasets are pervasive in a variety of domai...
