
Building Generalizable Agents with a Realistic and Rich 3D Environment

by Yi Wu, et al.

Towards bridging the gap between machine and human intelligence, it is of utmost importance to introduce environments that are visually realistic and rich in content. In such environments, one can evaluate and improve a crucial property of practical intelligent systems, namely generalization. In this work, we build House3D, a rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of houses, ranging from single-room studios to multi-storeyed houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al., 2017). With an emphasis on semantic-level generalization, we study the task of concept-driven navigation, RoomNav, using a subset of houses in House3D. In RoomNav, an agent navigates towards a target specified by a semantic concept. To succeed, the agent learns to comprehend the scene it lives in by developing perception, understand the concept by mapping it to the correct semantics, and navigate to the target by obeying the underlying physical rules. We train RL agents with both continuous and discrete action spaces and show their ability to generalize in new unseen environments. In particular, we observe that (1) training is substantially harder on large house sets but results in better generalization, (2) using semantic signals (e.g., segmentation mask) boosts the generalization performance, and (3) gated networks on semantic input signal lead to improved training performance and generalization. We hope House3D, including the analysis of the RoomNav task, serves as a building block towards designing practical intelligent systems and we wish it to be broadly adopted by the community.
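
The abstract reports that gated networks on the semantic input signal help both training and generalization, but does not spell out the architecture. The following is only a minimal sketch of one common form of gating, assuming the target concept is encoded as a one-hot vector and used to scale visual features element-wise through a sigmoid gate. The names `gated_fusion`, `one_hot`, the concept list and the random weights are all hypothetical illustrations, not House3D's API or the authors' model.

```python
import math
import random


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(name, vocabulary):
    """Encode a concept name as a one-hot vector over the vocabulary."""
    return [1.0 if name == entry else 0.0 for entry in vocabulary]


def gated_fusion(visual_features, concept_embedding, gate_weights):
    """Scale each visual feature by a gate in (0, 1) computed from the concept.

    gate_weights holds one row per visual feature; each row has one weight
    per concept dimension. This is an assumed, simplified form of gating.
    """
    gated = []
    for feature, weight_row in zip(visual_features, gate_weights):
        pre_activation = sum(w * c for w, c in zip(weight_row, concept_embedding))
        gated.append(feature * sigmoid(pre_activation))
    return gated


# Hypothetical target concepts; the real RoomNav concept set is not given here.
concepts = ["kitchen", "bedroom", "bathroom"]

# Stand-in for features produced by a visual encoder.
visual_features = [0.5, -1.2, 2.0, 0.3]

# Randomly initialised gate weights; in a trained agent these would be learned.
rng = random.Random(0)
gate_weights = [[rng.uniform(-1.0, 1.0) for _ in concepts] for _ in visual_features]

gated = gated_fusion(visual_features, one_hot("kitchen", concepts), gate_weights)
```

In this sketch the concept decides how strongly each visual feature passes through to the policy, so the same visual input can be weighted differently for different targets.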

