Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

12/08/2020
by Kevis-Kokitsi Maninis, et al.

We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recovers the 9 DoF pose of each object appearing in them, thus aligning all objects in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration resolves the scale and depth ambiguities in the per-frame predictions and generally improves the estimates of all pose parameters. By leveraging multi-view constraints, our method also resolves occlusions and handles objects that are out of view in individual frames, thus reconstructing all objects into a single globally consistent CAD representation of the scene. In comparison to the state-of-the-art single-frame method Mask2CAD that we build on, we achieve substantial improvements on Scan2CAD (from 11.6 class average accuracy).
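
To illustrate why combining several views can resolve the depth ambiguity of a single-frame prediction, here is a minimal sketch in Python. It is not the paper's optimization: it assumes known camera centers and, per frame, a viewing ray toward the predicted object center, and it recovers the 3D point closest to all rays in the least-squares sense. The function name `triangulate_from_rays` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def triangulate_from_rays(centers, directions):
    """Return the 3D point closest (in least squares) to a set of rays.

    centers:    (N, 3) camera centers, one per frame (assumed known).
    directions: (N, 3) ray directions toward the object, any nonzero length.
    """
    centers = np.asarray(centers, dtype=float)
    directions = np.asarray(directions, dtype=float)
    # Normalize so each ray contributes a proper orthogonal projector.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        # Projector onto the plane orthogonal to the ray direction.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ c
    # Solvable as long as the rays are not all parallel.
    return np.linalg.solve(A, b)


# Toy data: three cameras observing the same object center.
target = np.array([0.5, -0.2, 3.0])
cams = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.5]])
# Each ray has an arbitrary length, mimicking unknown per-frame depth.
rays = (target - cams) * np.array([[0.1], [2.0], [5.0]])

estimate = triangulate_from_rays(cams, rays)
```

A single ray constrains the object center only up to an unknown depth; with two or more non-parallel rays the linear system becomes well-posed. The method described above optimizes full 9 DoF poses over the whole video, which this sketch does not attempt.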
