Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring

06/07/2021
by Yichi Zhang, et al.

Despite recent progress, learning new tasks through language instructions remains an extremely challenging problem. On the ALFRED benchmark for task learning, the published state-of-the-art system achieves a task success rate of less than 10% in an unseen environment, compared to human performance of over 90%. To address this issue, this paper takes a closer look at task learning. In a departure from the widely applied end-to-end architecture, we decompose task learning into three sub-problems: sub-goal planning, scene navigation, and object manipulation. We develop a model, HiTUT (Hierarchical Tasks via Unified Transformers), that addresses each sub-problem in a unified manner to learn a hierarchical task structure. On the ALFRED benchmark, HiTUT achieves the best performance with remarkably higher generalization ability. In the unseen environment, HiTUT achieves over 160% performance gain in success rate compared to the previous state of the art. The explicit representation of task structures also enables an in-depth understanding of the nature of the problem and the ability of the agent, which provides insight for future benchmark development and evaluation.
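The three-way decomposition described in the abstract can be pictured with a toy sketch. This is not HiTUT itself: the plan table, the `SubGoal` class, the function names, and the action strings are all hypothetical stand-ins, and the learned transformer is replaced by hard-coded lookups. The sketch only shows how a sub-goal planner can sit above separate navigation and manipulation routines to yield a hierarchical task structure.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class SubGoal:
    kind: str    # "navigate" or "manipulate" (hypothetical labels)
    verb: str    # e.g. "goto", "pickup", "put" (hypothetical verbs)
    target: str  # object or receptacle name
    actions: List[str] = field(default_factory=list)


# Hypothetical hand-written plan table; a learned planner would replace it.
TOY_PLANS: Dict[str, List[SubGoal]] = {
    "put a mug on the desk": [
        SubGoal("navigate", "goto", "mug"),
        SubGoal("manipulate", "pickup", "mug"),
        SubGoal("navigate", "goto", "desk"),
        SubGoal("manipulate", "put", "desk"),
    ],
}


def plan_subgoals(instruction: str) -> List[SubGoal]:
    """Sub-goal planning: map an instruction to an ordered list of sub-goals."""
    # Copy each entry so that filling in actions never mutates the table.
    return [replace(goal, actions=[]) for goal in TOY_PLANS[instruction]]


def navigate(target: str) -> List[str]:
    """Scene navigation: a fixed two-step route stands in for real navigation."""
    return ["MoveAhead", f"Arrive({target})"]


def manipulate(verb: str, target: str) -> List[str]:
    """Object manipulation: a single interaction action."""
    return [f"{verb}({target})"]


def run_task(instruction: str) -> List[SubGoal]:
    """Expand every sub-goal into low-level actions, keeping the hierarchy."""
    plan = plan_subgoals(instruction)
    for goal in plan:
        if goal.kind == "navigate":
            goal.actions = navigate(goal.target)
        else:
            goal.actions = manipulate(goal.verb, goal.target)
    return plan


if __name__ == "__main__":
    for goal in run_task("put a mug on the desk"):
        print(goal.kind, goal.verb, goal.target, goal.actions)
```

Because the sub-goals stay explicit in the returned structure, each level can be inspected separately, which is the kind of analysis the abstract attributes to an explicit task representation.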

