Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

09/01/2023
by Ziyu Guo, et al.

We introduce Point-Bind, a 3D multi-modality model that aligns point clouds with 2D images, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and the other modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. Building on this, we present Point-LLM, the first 3D large language model (LLM) to follow 3D multi-modal instructions. Using parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA; it requires no 3D instruction data yet exhibits superior 3D and multi-modal question-answering capacity. We hope our work sheds light for the community on extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
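
To make the idea of 3D embedding arithmetic in a joint embedding space concrete, here is a minimal sketch. It is not the authors' implementation: the 4-dimensional vectors are hand-written stand-ins for encoder outputs, the labels (car, sea waves) are hypothetical, and the helper names `l2_normalize` and `compose_and_retrieve` are invented for illustration. It assumes only that embeddings from different modalities live in one shared space, so they can be summed and compared by cosine similarity.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def compose_and_retrieve(query_embs, gallery_embs, top_k=1):
    """Sum unit-length query embeddings, then rank gallery items by cosine similarity."""
    composed = l2_normalize(np.sum(l2_normalize(query_embs), axis=0))
    scores = l2_normalize(gallery_embs) @ composed
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Hypothetical stand-ins for encoder outputs in a shared 4-d space.
point_cloud_car = np.array([1.0, 0.0, 0.0, 0.0])  # assumed 3D encoder output
audio_sea_waves = np.array([0.0, 1.0, 0.0, 0.0])  # assumed audio encoder output

# Hypothetical gallery of image embeddings in the same space.
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],  # image: car
    [0.0, 1.0, 0.0, 0.0],  # image: sea
    [1.0, 1.0, 0.0, 0.0],  # image: car by the sea
    [0.0, 0.0, 1.0, 0.0],  # image: unrelated
])

queries = np.stack([point_cloud_car, audio_sea_waves])
indices, scores = compose_and_retrieve(queries, gallery)
print(indices, scores)
```

In this toy setup the composed query ranks the gallery item that combines both concepts first. With real encoders the same pattern would apply to their output embeddings, provided those encoders were trained into a common space.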


