A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

07/24/2023
by Izzeddin Gür, et al.

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web navigation. However, performance on real-world websites still suffers from (1) open-domainness, (2) limited context length, and (3) a lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that completes tasks on real websites by following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those sub-instructions and snippets. We design WebAgent with two models: Flan-U-PaLM for grounded code generation, and HTML-T5 for planning and summarization. HTML-T5 models are new pre-trained LLMs for long HTML documents that use local and global attention mechanisms and a mixture of long-span denoising objectives. We empirically demonstrate that our recipe improves the success rate on a real website by over 50%, and that HTML-T5 is the best model for solving HTML-based tasks, achieving a 14.9% higher success rate than the prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.
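The three stages described in the abstract (plan, summarize, act via a generated program) can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`run_episode`, `Step`, `toy_planner`, `toy_summarizer`, `toy_coder`, the `"DONE"` sentinel, and the sample HTML) is hypothetical, and the toy functions are keyword-based stand-ins for the roles the abstract assigns to HTML-T5 and Flan-U-PaLM.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical page used only for illustration.
SAMPLE_HTML = """<input id="q" type="text" name="search">
<button id="go">Search</button>
<footer>Contact us</footer>"""


@dataclass
class Step:
    sub_instruction: str  # output of the planning stage
    snippet: str          # task-relevant HTML kept by the summarization stage
    program: str          # program text produced by the code-generation stage


def run_episode(
    instruction: str,
    html: str,
    planner: Callable[[str, List[str], str], str],
    summarizer: Callable[[str, str], str],
    coder: Callable[[str, str], str],
    max_steps: int = 3,
) -> List[Step]:
    """Plan one sub-instruction at a time, summarize the page, emit a program."""
    history: List[str] = []
    steps: List[Step] = []
    for _ in range(max_steps):
        sub = planner(instruction, history, html)
        if sub == "DONE":  # assumed stop signal, not taken from the paper
            break
        snippet = summarizer(sub, html)
        steps.append(Step(sub, snippet, coder(sub, snippet)))
        history.append(sub)
    return steps


def toy_planner(instruction: str, history: List[str], html: str) -> str:
    # Stand-in for the planning model: replays a fixed two-step plan.
    plan = ["type the query into the search box", "click the search button"]
    return plan[len(history)] if len(history) < len(plan) else "DONE"


def toy_summarizer(sub_instruction: str, html: str) -> str:
    # Stand-in for the summarization model: keep lines sharing a keyword.
    keywords = {w for w in sub_instruction.lower().split() if len(w) > 3}
    kept = [ln for ln in html.splitlines() if any(k in ln.lower() for k in keywords)]
    return "\n".join(kept)


def toy_coder(sub_instruction: str, snippet: str) -> str:
    # Stand-in for the code-generation model: returns a comment-only program.
    return f"# sub-instruction: {sub_instruction}\n# grounded on {len(snippet.splitlines())} HTML line(s)"
```

Calling `run_episode("search for shoes", SAMPLE_HTML, toy_planner, toy_summarizer, toy_coder)` yields one `Step` per sub-instruction. The point of the structure is that the code generator only ever sees the short snippet rather than the full page, which is how the abstract's pipeline works around limited context length.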


