T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities

05/24/2023
by Kangfu Mei, et al.

Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. While DPF shows great potential for unifying data generation across modalities including images, videos, and 3D geometry, it does not scale to higher data resolutions. This can be attributed to the “scaling property”, whereby uniform sampling makes it difficult for the model to capture local structures. To this end, we propose a new model that combines a view-wise sampling algorithm, which focuses on local structure learning, with additional guidance, e.g., text descriptions, to complement the global geometry. The model can be scaled to generate high-resolution data while unifying multiple modalities. Experimental results on data generation in various modalities demonstrate the effectiveness of our model, as well as its potential as a foundation framework for scalable modality-unified visual content generation.
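The contrast between uniform and view-wise sampling can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a video stored as a dense array of shape (frames, height, width, channels), treats a single frame as a "view", and uses hypothetical helper names (`uniform_sample`, `view_wise_sample`) chosen only for illustration.

```python
import numpy as np

def uniform_sample(field, n, rng):
    """Draw n (coordinate, value) pairs uniformly over the whole field."""
    t, h, w, c = field.shape
    idx = rng.choice(t * h * w, size=n, replace=False)
    # Convert flat indices back to (frame, row, column) coordinates.
    coords = np.stack(np.unravel_index(idx, (t, h, w)), axis=-1)
    values = field.reshape(-1, c)[idx]
    return coords, values

def view_wise_sample(field, n, rng):
    """Pick one view (assumed here: one frame) and draw n pairs inside it only."""
    t, h, w, c = field.shape
    view = int(rng.integers(t))
    idx = rng.choice(h * w, size=n, replace=False)
    ys, xs = np.unravel_index(idx, (h, w))
    coords = np.stack([np.full(n, view), ys, xs], axis=-1)
    values = field[view, ys, xs]
    return coords, values

rng = np.random.default_rng(0)
video = rng.random((8, 32, 32, 3))  # toy field: 8 frames of 32x32 RGB
u_coords, u_values = uniform_sample(video, 256, rng)
v_coords, v_values = view_wise_sample(video, 256, rng)
```

With the same budget of 256 samples, the uniform sampler covers about 3% of the pixels in each frame on average, while the view-wise sampler covers 25% of the one frame it selects. Under these assumptions, that denser local coverage is what lets a model see local structure as the field grows.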


