Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

by Evan Zheran Liu, et al.

Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to "warm-start" the agent by pre-training it to mimic expert demonstrations, but this is prone to overfitting. Instead, we propose to constrain exploration using demonstrations. From each demonstration, we induce high-level "workflows" which constrain the allowable actions at each time step to be similar to those in the demonstration (e.g., "Step 1: click on a textbox; Step 2: enter some text"). Our exploration policy then learns to identify successful workflows and samples actions that satisfy these workflows. Workflows prune out bad exploration directions and accelerate the agent's ability to discover rewards. We use our approach to train a novel neural policy designed to handle the semi-structured nature of websites, and evaluate on a suite of web tasks, including the recent World of Bits benchmark. We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x.
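
To make the idea of a workflow constraining exploration concrete, the following is a minimal sketch, not the authors' implementation. It assumes a workflow can be modeled as a list of per-step predicates over actions, and it uses uniform random sampling among the allowed actions as a stand-in for the learned exploration policy described above. The names `Action`, `click_on`, `type_into`, and `sample_constrained_episode` are hypothetical and introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click" or "type" (illustrative action types)
    target: str  # hypothetical element tag, e.g. "textbox"


# A workflow step is modeled here as a predicate that says whether an action
# is allowed at that time step (an assumption made for this sketch).
WorkflowStep = Callable[[Action], bool]


def click_on(tag: str) -> WorkflowStep:
    """Step constraint: click on any element with the given tag."""
    return lambda a: a.kind == "click" and a.target == tag


def type_into(tag: str) -> WorkflowStep:
    """Step constraint: enter text into any element with the given tag."""
    return lambda a: a.kind == "type" and a.target == tag


def sample_constrained_episode(
    workflow: Sequence[WorkflowStep],
    candidates_per_step: Sequence[Sequence[Action]],
    rng: random.Random,
) -> List[Action]:
    """Sample one action per time step, keeping only actions the workflow allows.

    Uniform sampling stands in for a learned exploration policy.
    Sampling stops early if no candidate satisfies the current step.
    """
    episode: List[Action] = []
    for step, candidates in zip(workflow, candidates_per_step):
        allowed = [a for a in candidates if step(a)]
        if not allowed:
            break
        episode.append(rng.choice(allowed))
    return episode


# Usage: "Step 1: click on a textbox; Step 2: enter some text".
candidates = [
    Action("click", "textbox"),
    Action("click", "button"),
    Action("type", "textbox"),
]
workflow = [click_on("textbox"), type_into("textbox")]
episode = sample_constrained_episode(
    workflow, [candidates, candidates], random.Random(0)
)
print(episode)
```

In this toy setting the workflow removes the "click on button" action at step 1 and every non-typing action at step 2, which is the sense in which workflows prune bad exploration directions.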


Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration

In complex environments with high dimension, training a reinforcement le...

Learning to Navigate the Web

Learning in environments with large state and action spaces, and sparse ...

Policy Learning Using Weak Supervision

Most existing policy learning solutions require the learning agents to r...

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Recent progress in deep reinforcement learning (RL) and computer vision ...

Depth and nonlinearity induce implicit exploration for RL

The question of how to explore, i.e., take actions with uncertain outcom...

Play with Emotion: Affect-Driven Reinforcement Learning

This paper introduces a paradigm shift by viewing the task of affect mod...

DOM-Q-NET: Grounded RL on Structured Language

Building agents to interact with the web would allow for significant imp...
