Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

01/21/2023
by Shaofei Cai, et al.

We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible, yet challenging open-ended environment for developing human-level multi-task agents. We first identify two main challenges of learning such policies: 1) the indistinguishability of tasks from the state distribution, due to the vast scene diversity, and 2) the non-stationary nature of environment dynamics caused by partial observability. To tackle the first challenge, we propose a Goal-Sensitive Backbone (GSB) for the policy that encourages the emergence of goal-relevant visual state representations. To tackle the second challenge, we further equip the policy with an adaptive horizon prediction module that helps alleviate the learning uncertainty brought by the non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method significantly outperforms the best baseline so far; on many of them, it doubles the baseline's performance. Our ablation and exploratory studies then explain how our approach beats its counterparts and also reveal a surprising bonus: zero-shot generalization to new scenes (biomes). We hope our agent can help shed some light on learning goal-conditioned, multi-task agents in challenging, open-ended environments like Minecraft.
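The abstract does not spell out how GSB or the horizon prediction module work internally. As a rough illustration only, the sketch below shows two generic ideas in the same spirit, both assumptions rather than the authors' method: feature-wise linear modulation (FiLM), where a goal embedding scales and shifts each visual feature channel so that the representation depends on the goal, and per-step horizon targets, i.e. the number of steps remaining until the goal is reached in a recorded trajectory. The names `film_modulate` and `horizon_targets` and the `max_horizon` cap are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical sketch: neither function reproduces the paper's GSB or its
# horizon prediction module; they illustrate generic ideas only.


def film_modulate(
    features: Sequence[float],
    goal: Sequence[float],
    w_gamma: Sequence[Sequence[float]],
    w_beta: Sequence[Sequence[float]],
) -> List[float]:
    """FiLM-style conditioning: out_c = (1 + gamma_c) * f_c + beta_c,
    where gamma and beta are linear functions of the goal embedding."""
    out = []
    for c, f in enumerate(features):
        gamma = sum(w * g for w, g in zip(w_gamma[c], goal))
        beta = sum(w * g for w, g in zip(w_beta[c], goal))
        out.append((1.0 + gamma) * f + beta)
    return out


def horizon_targets(success: Sequence[bool], max_horizon: int) -> List[int]:
    """For each step t, count the steps until the next step where the goal
    is achieved, capped at max_horizon (also used if it is never achieved)."""
    targets = [max_horizon] * len(success)
    next_success: Optional[int] = None
    # Scan backwards so each step sees the nearest future success.
    for t in reversed(range(len(success))):
        if success[t]:
            next_success = t
        if next_success is not None:
            targets[t] = min(next_success - t, max_horizon)
    return targets


print(horizon_targets([False, False, True, False], max_horizon=5))  # [2, 1, 0, 5]
```

Under these assumptions, the same visual features yield different modulated outputs for different goals, and the horizon targets could serve as a supervised signal for a module that predicts how far the agent is from completing its task.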
