Scalable 3D Captioning with Pretrained Models

06/12/2023
by   Tiange Luo, et al.

We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. The approach uses pretrained models for image captioning and image-text alignment, together with a large language model (LLM), to consolidate captions from multiple views of a 3D asset, completely side-stepping the time-consuming and costly process of manual annotation. We apply Cap3D to the recently introduced large-scale 3D dataset Objaverse, resulting in 660k 3D-text pairs. Our evaluation, conducted using 41k human annotations from the same dataset, demonstrates that Cap3D surpasses human-authored descriptions in terms of quality, cost, and speed. Through effective prompt engineering, Cap3D rivals human performance in generating geometric descriptions on 17k collected annotations from the ABO dataset. Finally, we finetune text-to-3D models on Cap3D and human captions and show that Cap3D captions yield better results; we also benchmark state-of-the-art models including Point-E, Shap-E, and DreamFusion.
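The three stages named in the abstract (per-view captioning, image-text alignment, LLM consolidation) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function `caption_3d_asset` and its parameters `propose_captions`, `alignment_score`, and `consolidate` are hypothetical names, and the rule of keeping the single best-aligned caption per view is an assumption made here for concreteness. The models themselves are passed in as plain callables so the sketch stays independent of any particular library.

```python
from typing import Callable, List, Sequence, TypeVar

# A rendered view of the 3D asset; its concrete type (image array, file
# path, ...) is left open because the abstract does not specify it.
View = TypeVar("View")


def caption_3d_asset(
    views: Sequence[View],
    propose_captions: Callable[[View], List[str]],
    alignment_score: Callable[[View, str], float],
    consolidate: Callable[[List[str]], str],
) -> str:
    """Produce one caption for a 3D asset from several rendered views.

    All three callables are hypothetical stand-ins:
      propose_captions: an image captioning model returning candidates.
      alignment_score:  an image-text alignment model (higher is better).
      consolidate:      an LLM call that merges per-view captions.
    """
    if not views:
        raise ValueError("at least one rendered view is required")

    best_per_view: List[str] = []
    for view in views:
        candidates = propose_captions(view)
        if not candidates:
            continue  # skip views for which the captioner returned nothing
        # Assumed selection rule: keep the candidate best aligned with the view.
        best = max(candidates, key=lambda text: alignment_score(view, text))
        best_per_view.append(best)

    if not best_per_view:
        raise ValueError("no captions were produced for any view")

    # Merge the selected per-view captions into one description.
    return consolidate(best_per_view)
```

Passing the models in as arguments makes it possible to exercise the control flow with toy functions before plugging in real captioning, alignment, and language models.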
