Decentralized Distributed PPO: Solving PointGoal Navigation

11/01/2019
by Erik Wijmans, et al.

We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever "stale"), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling – achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) – over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially "solves" the task – near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks – the analog of "ImageNet pre-training + task-specific fine-tuning" for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models + code will be publicly available).
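The scaling figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper's code: the function names are hypothetical, the 365.25-day year is an assumption, and the derived quantities (scaling efficiency, GPU-day budget, steps per second of experience) are inferences from the quoted numbers rather than values the abstract reports.

```python
# Rough arithmetic on the numbers quoted in the abstract.
# All function names are hypothetical helpers, not part of any DD-PPO code.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / num_gpus


def gpu_days(wall_clock_days: float, num_gpus: int) -> float:
    """Total GPU time consumed, in GPU-days."""
    return wall_clock_days * num_gpus


def steps_per_second(total_steps: float, years: float) -> float:
    """Experience rate implied by equating a step count to a time span."""
    return total_steps / (years * SECONDS_PER_YEAR)


# 107x speedup on 128 GPUs over a serial implementation.
efficiency = scaling_efficiency(107, 128)

# "Under 3 days" on 64 GPUs gives an upper bound on the GPU-day budget.
budget = gpu_days(3, 64)

# 2.5 billion steps described as 80 years of human experience.
rate = steps_per_second(2.5e9, 80)

print(f"scaling efficiency: {efficiency:.1%}")
print(f"GPU-day upper bound: {budget:.0f}")
print(f"implied steps per second: {rate:.2f}")
```

Under these assumptions, 107x on 128 GPUs corresponds to roughly 84% of ideal linear scaling, 3 days on 64 GPUs is at most 192 GPU-days (a little over 6 months, consistent with the abstract), and the 80-year comparison implies about one step per second of experience.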


