Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries

11/28/2022
by Yuanwen Yue, et al.

We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it, we develop a novel Transformer architecture that generates the polygons of multiple rooms in parallel, in a holistic manner, without hand-crafted intermediate stages. The model features two-level queries for polygons and corners, and includes polygon matching to make the network end-to-end trainable. Our method sets a new state of the art on two challenging datasets, Structured3D and SceneCAD, with significantly faster inference than previous methods. Moreover, it can readily be extended to predict additional information, such as semantic room types and architectural elements like doors and windows. Our code and models will be available at: https://github.com/ywyue/RoomFormer.
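The abstract does not spell out how a fixed set of two-level queries yields a variable-size set of variable-length polygons, or what polygon matching compares. The following is a minimal Python sketch under stated assumptions, not RoomFormer's implementation. It assumes M polygon queries, each with N corner queries that predict an (x, y) coordinate plus a validity probability, and assumes corners above 0.5 are kept. The helpers `decode_polygons`, `polygon_cost` and `match_polygons`, the cyclic-shift L1 cost, and the length-mismatch penalty are all hypothetical illustrations. The brute-force search stands in for the Hungarian algorithm commonly used for set matching.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def decode_polygons(
    corners: Sequence[Sequence[Point]],
    scores: Sequence[Sequence[float]],
    threshold: float = 0.5,
    min_vertices: int = 3,
) -> List[Polygon]:
    """Hypothetical decoding: turn an M x N grid of corner predictions into a
    variable-size set of variable-length polygons.

    corners[m][n] is the (x, y) from corner query n of polygon query m;
    scores[m][n] is its assumed validity probability.
    """
    polygons: List[Polygon] = []
    for poly_corners, poly_scores in zip(corners, scores):
        # Keep only the corners marked valid, preserving query order.
        kept = [c for c, s in zip(poly_corners, poly_scores) if s > threshold]
        # Drop polygon queries that do not form at least a triangle.
        if len(kept) >= min_vertices:
            polygons.append(kept)
    return polygons


def polygon_cost(pred: Polygon, gt: Polygon, mismatch_penalty: float = 1e6) -> float:
    """Illustrative cost: mean per-vertex L1 distance, minimized over the
    starting vertex and traversal direction of gt (a closed polygon has no
    canonical first vertex). Different vertex counts get a fixed penalty."""
    if len(pred) != len(gt):
        return mismatch_penalty
    n = len(gt)
    best = math.inf
    for seq in (gt, gt[::-1]):  # both traversal directions
        for shift in range(n):  # every possible starting vertex
            rolled = seq[shift:] + seq[:shift]
            dist = sum(
                abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(pred, rolled)
            ) / n
            best = min(best, dist)
    return best


def match_polygons(preds: List[Polygon], gts: List[Polygon]) -> List[Tuple[int, int]]:
    """One-to-one (pred_idx, gt_idx) assignment with minimum total cost.
    Exhaustive search; assumes len(preds) >= len(gts)."""
    best_total, best_pairs = math.inf, []
    for perm in permutations(range(len(preds)), len(gts)):
        total = sum(polygon_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if total < best_total:
            best_total, best_pairs = total, [(p, g) for g, p in enumerate(perm)]
    return best_pairs


# Toy example with M = 3 polygon queries and N = 4 corner queries each.
gts = [
    [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],  # rectangular room
    [(5.0, 0.0), (8.0, 0.0), (5.0, 3.0)],              # triangular room
]
corners = [
    [(5.1, 0.1), (8.0, 0.0), (5.0, 2.9), (6.0, 6.0)],
    [(4.0, 3.1), (0.1, 3.0), (0.0, 0.0), (3.9, 0.0)],  # starts at another vertex
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],  # no room here
]
scores = [
    [0.9, 0.8, 0.95, 0.1],
    [0.9, 0.9, 0.9, 0.9],
    [0.2, 0.1, 0.3, 0.05],
]
preds = decode_polygons(corners, scores)
print(len(preds))                  # 2
print(match_polygons(preds, gts))  # [(1, 0), (0, 1)]
```

The rectangle is matched even though its predicted vertices start at a different corner, which is why the cost minimizes over cyclic shifts. The exhaustive search grows factorially with the number of rooms; a practical version would build the full cost matrix and pass it to `scipy.optimize.linear_sum_assignment`.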

research
06/18/2021

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and...
research
03/08/2021

End-to-End Human Object Interaction Detection with HOI Transformer

We propose HOI Transformer to tackle human object interaction (HOI) dete...
research
04/10/2022

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and...
research
08/17/2021

End-to-End Dense Video Captioning with Parallel Decoding

Dense video captioning aims to generate multiple associated captions wit...
research
07/10/2021

Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition

Surgical phase recognition is of particular interest to computer assiste...
research
08/12/2022

Class-attention Video Transformer for Engagement Intensity Prediction

In order to deal with variant-length long videos, prior works extract mu...
research
09/06/2023

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

3D dense captioning requires a model to translate its understanding of a...