Continuous diffusion for categorical data

11/28/2022
by   Sander Dieleman, et al.

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
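To make "continuous both in time and input space" concrete, the toy sketch below embeds token ids as continuous vectors, perturbs them with Gaussian noise whose scale depends on a real-valued time t, and maps the result back to tokens by nearest embedding. It is only an illustration under stated assumptions, not the CDCD method: the fixed embedding table, the geometric noise schedule `sigma_at`, its `sigma_min` and `sigma_max` values, and the nearest-neighbour decoding are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def embed(tokens, table):
    """Map each token id to its continuous embedding vector."""
    return [list(table[t]) for t in tokens]


def sigma_at(t, sigma_min=0.01, sigma_max=10.0):
    """Hypothetical geometric noise schedule over continuous time t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t


def add_noise(vectors, sigma, rng):
    """Perturb every coordinate with Gaussian noise of standard deviation sigma."""
    return [[x + sigma * rng.gauss(0.0, 1.0) for x in v] for v in vectors]


def nearest_token(vector, table):
    """Return the id of the embedding closest to vector (squared Euclidean distance)."""
    def sq_dist(k):
        return sum((a - b) ** 2 for a, b in zip(vector, table[k]))
    return min(range(len(table)), key=sq_dist)


# Assumed toy setup: 5 categories with fixed one-hot embeddings.
# A real model would typically learn its embeddings instead.
vocab_size = 5
table = [[1.0 if i == j else 0.0 for j in range(vocab_size)]
         for i in range(vocab_size)]

rng = random.Random(0)
tokens = [3, 1, 4, 1]
clean = embed(tokens, table)

# t is a real number, so any noise level between the two extremes is reachable.
noisy = add_noise(clean, sigma_at(0.0), rng)
decoded = [nearest_token(v, table) for v in noisy]
```

At t = 0 the noise is small relative to the spacing between embeddings, so nearest-embedding decoding returns the original tokens. At larger t the noisy vectors drift away from every embedding, and a trained denoising model would be needed to recover the tokens.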


