Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback

05/25/2023
by   Yiqi Lin, et al.
0

Generating and editing a 3D scene guided by natural language poses a challenge, primarily due to the complexity of specifying the positional relations and volumetric changes within the 3D space. Recent advancements in Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities across various domains. Surprisingly, these models also show great potential in realizing and interpreting the 3D space. In light of this, we propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter into the off-the-shelf layout-to-3D generative models, allowing users to flexibly and interactively generate visual content. Specifically, we design a versatile layout structure base on the bounding boxes and semantics to prompt the LLMs to model the spatial generation and reasoning from language. Our system also incorporates LLaVA, a large language and vision assistant, to provide generative feedback from the visual aspect for improving the visual quality of generated content. We validate the effectiveness of LI3D, primarily in 3D generation and editing through multi-round interactions, which can be flexibly extended to 2D generation and editing. Various experiments demonstrate the potential benefits of incorporating LLMs in generative AI for applications, e.g., metaverse. Moreover, we benchmark the layout reasoning performance of LLMs with neural visual artist tasks, revealing their emergent ability in the spatial layout domain.

READ FULL TEXT

page 6

page 7

page 8

page 16

page 19

page 21

page 25

page 26

research
05/24/2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Attaining a high degree of user controllability in visual generation oft...
research
12/25/2019

Controllable and Progressive Image Extrapolation

Image extrapolation aims at expanding the narrow field of view of a give...
research
06/23/2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Large-scale generative models such as GPT and DALL-E have revolutionized...
research
05/24/2023

Visual Programming for Text-to-Image Generation and Evaluation

As large language models have demonstrated impressive performance in man...
research
03/24/2023

CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout

Recent research endeavors have shown that combining neural radiance fiel...
research
03/08/2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

ChatGPT is attracting a cross-field interest as it provides a language i...
research
03/04/2022

Interactive Image Synthesis with Panoptic Layout Generation

Interactive image synthesis from user-guided input is a challenging task...

Please sign up or login with your details

Forgot password? Click here to reset