Learning Object-conditioned Exploration using Distributed Soft Actor Critic

by Ayzaan Wahid et al.

Object navigation is the task of navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses several challenges for robotics: semantic exploration of unknown environments in search of an object, and low-level control. In this work we study object-guided exploration and low-level control, and present an end-to-end trained navigation policy achieving a success rate of 0.68 and SPL of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy reinforcement learning algorithm, distributed Soft Actor-Critic, which allows the system to utilize 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential-drive mobile base in simulation from a stack of high-dimensional observations commonly used on robotic platforms. The learned policy is capable of object-guided exploratory behaviors and low-level control, learned purely from experience in realistic environments.
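At the core of the distributed Soft Actor-Critic setup described above is the entropy-regularized Bellman backup used to train the critics. The sketch below is a minimal, generic illustration of that target (with clipped double-Q estimates, as in standard SAC); the function name, arguments, and toy values are illustrative assumptions, not details from the paper.

```python
def sac_critic_target(reward, done, gamma, alpha,
                      q1_next, q2_next, logp_next):
    """Entropy-regularized TD target used to train SAC critics:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    `alpha` is the entropy temperature; the min over the two target
    critics mitigates Q-value overestimation (clipped double-Q).
    """
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (0.0 if done else soft_value)

# Toy transition: reward 1.0, non-terminal, next-action log-prob -1.0.
y = sac_critic_target(reward=1.0, done=False, gamma=0.99, alpha=0.2,
                      q1_next=2.0, q2_next=1.5, logp_next=-1.0)
# y = 1.0 + 0.99 * (1.5 - 0.2 * (-1.0)) = 1.0 + 0.99 * 1.7
```

In a distributed setting, many actors generate transitions in parallel into a shared replay buffer, while learners repeatedly sample batches and minimize the squared error between the critics' predictions and this target; that decoupling is what makes the high experience throughput reported above possible.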




1 Introduction