TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

07/21/2022
by Gabriel Sarch, et al.

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense priors for object placement and room arrangement. TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible contexts for them, localizes those contexts in the current scene, and repositions the objects. The commonsense priors are encoded in three modules: i) visuo-semantic detectors that detect out-of-place objects, ii) an associative neural graph memory of objects and spatial relations that proposes plausible semantic receptacles and surfaces for repositioning an object, and iii) a visual search network that guides the agent's exploration so that it can efficiently localize the receptacle of interest in the current scene. We test TIDEE on tidying up disorganized scenes in the AI2THOR simulation environment. TIDEE carries out the task directly from pixel and raw depth input without ever having observed the same room beforehand, relying only on priors learned from a separate set of training houses. Human evaluations of the resulting room reorganizations show that TIDEE outperforms ablated versions of the model that omit one or more of the commonsense priors. On a related room rearrangement benchmark that allows the agent to view the goal state prior to rearrangement, a simplified version of our model outperforms a top-performing method by a large margin. Code and data are available at the project website: https://tidee-agent.github.io/.
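The detect, propose, localize, and reposition loop described in the abstract can be pictured with a small control-flow sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `Detection`, `tidy`, `propose_receptacles`, and `find_in_scene` are hypothetical, and the hand-written dictionaries stand in for the learned detector, graph memory, and visual search network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Detection:
    """One observed object (hypothetical structure, for illustration only)."""
    name: str           # object category, e.g. "Mug"
    location: str       # where it was observed, e.g. "Floor"
    out_of_place: bool  # verdict of a stand-in out-of-place detector


def tidy(
    detections: List[Detection],
    propose_receptacles: Callable[[Detection], List[str]],
    find_in_scene: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each out-of-place object to the first proposed receptacle found in the scene."""
    placements: Dict[str, str] = {}
    for det in detections:
        if not det.out_of_place:
            continue  # objects judged to be in place are left alone
        # Try candidate receptacles in the order the prior ranks them.
        for receptacle in propose_receptacles(det):
            if find_in_scene(receptacle) is not None:
                placements[det.name] = receptacle
                break
    return placements


# Toy stand-ins (assumptions): a lookup table replaces the learned memory,
# and a dictionary of known receptacles replaces visual search.
PRIOR = {"Mug": ["CoffeeMachine", "CounterTop"], "Pillow": ["Bed", "Sofa"]}
SCENE = {"CounterTop": "kitchen", "Sofa": "living room"}

detections = [
    Detection("Mug", "Floor", True),
    Detection("Pillow", "Floor", True),
    Detection("Book", "Shelf", False),
]
result = tidy(detections, lambda d: PRIOR.get(d.name, []), SCENE.get)
print(result)
```

In this toy scene the mug falls back to its second-ranked receptacle because the first is absent, which mirrors the idea of proposing several plausible placements and then searching the current scene for one that exists.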


