CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

by Kevis-Kokitsi Maninis, et al.

We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor. Many steps are performed automatically, and the tasks performed by humans are simple, well-specified, and require only limited reasoning in 3D. This makes them feasible for crowd-sourcing and has allowed us to construct a large-scale dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate offers 108K instances of 12K unique CAD models placed in the 3D representations of 21K videos. In comparison to Scan2CAD, the largest existing dataset with CAD model annotations on real scenes, CAD-Estate has 8x more instances and 4x more unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on CAD-Estate for the task of automatic 3D object reconstruction and pose estimation, demonstrating that it leads to improvements on the popular Scan2CAD benchmark. We will release the data by mid July 2023.
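
To make the 9-DoF pose mentioned above concrete, here is a minimal sketch of one way nine parameters (3 for translation, 3 for rotation, 3 for anisotropic scale) can be composed into a single 4x4 transform that places a CAD model in the scene frame. The abstract does not specify a parameterization, so the Euler-angle convention, the composition order (translate * rotate * scale), and the function names `pose_9dof` and `apply_pose` are assumptions made for illustration, not the dataset's actual format.

```python
import math

def pose_9dof(translation, euler_xyz, scale):
    """Build a 4x4 matrix M = T * R * S from a 9-DoF pose.

    Assumed convention (not taken from the paper): angles in radians,
    rotation composed as Rz * Ry * Rx, per-axis scale applied first.
    """
    tx, ty, tz = translation
    rx, ry, rz = euler_xyz
    cx, sin_x = math.cos(rx), math.sin(rx)
    cy, sin_y = math.cos(ry), math.sin(ry)
    cz, sin_z = math.cos(rz), math.sin(rz)
    # Rotation matrix R = Rz * Ry * Rx.
    r = [
        [cz * cy, cz * sin_y * sin_x - sin_z * cx, cz * sin_y * cx + sin_z * sin_x],
        [sin_z * cy, sin_z * sin_y * sin_x + cz * cx, sin_z * sin_y * cx - cz * sin_x],
        [-sin_y, cy * sin_x, cy * cx],
    ]
    # Multiplying R by diag(scale) scales each column of R.
    m = [[r[i][j] * scale[j] for j in range(3)] for i in range(3)]
    return [
        m[0] + [tx],
        m[1] + [ty],
        m[2] + [tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply_pose(matrix, point):
    """Map a 3D point from CAD-model coordinates to scene coordinates."""
    x, y, z = point
    return tuple(
        matrix[i][0] * x + matrix[i][1] * y + matrix[i][2] * z + matrix[i][3]
        for i in range(3)
    )
```

Under this convention, a model vertex is first scaled along the model's own axes, then rotated, then translated into the scene. Other conventions (for example quaternions for rotation) would carry the same nine degrees of freedom.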


