Model-Based Offline Planning

08/12/2020
by Arthur Argenson, et al.

Offline learning is a key part of making reinforcement learning (RL) usable in real systems. Offline RL looks at scenarios where there is data from a system's operation, but no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and less easy to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This allows us to have easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and create zero-shot goal-conditioned policies on a series of environments.
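
To make the general idea of "planning on top of a model learnt from offline data" concrete, here is a minimal sketch. It is not the MBOP algorithm from the paper: the 1-D toy system, the least-squares linear dynamics model, the random-shooting planner, and every name in it (`true_step`, `plan`, `W`) are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    """Hypothetical real system; used only to log data and to evaluate."""
    return s + a

# 1) Offline dataset: logged (state, action, next_state) transitions.
states = rng.uniform(-3.0, 3.0, size=500)
actions = rng.uniform(-1.0, 1.0, size=500)
next_states = true_step(states, actions) + rng.normal(0.0, 0.01, size=500)

# 2) Fit a dynamics model from the logged data alone
#    (assumed linear here: s' ~ W[0] * s + W[1] * a).
X = np.stack([states, actions], axis=1)
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def plan(s, goal, horizon=3, n_samples=2000):
    """Random-shooting planner: sample bounded action sequences, roll them
    out in the learnt model, and return the first action of the cheapest one."""
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))  # action limits
    sim = np.full(n_samples, s, dtype=float)
    cost = np.zeros(n_samples)
    for t in range(horizon):
        sim = W[0] * sim + W[1] * seqs[:, t]   # model rollout, not the real system
        cost += (sim - goal) ** 2              # goal-conditioned cost
    return seqs[np.argmin(cost), 0]

# 3) Control the system by replanning at every step.
s = 0.0
for _ in range(10):
    s = true_step(s, plan(s, goal=2.0))
final_state = s
```

Because the goal and the action limits enter only through the planner, they can be changed at run time without refitting the model, which is the kind of external controllability the abstract contrasts with model-free policies.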


research
05/16/2021

Model-Based Offline Planning with Trajectory Pruning

Offline reinforcement learning (RL) enables learning policies using pre-...
research
11/15/2021

Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

Applications of Reinforcement Learning (RL) in robotics are often limite...
research
01/14/2022

Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning

Offline reinforcement learning (RL) Algorithms are often designed with e...
research
06/16/2021

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL

Offline Reinforcement Learning (RL) aims to extract near-optimal policie...
research
11/10/2019

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
11/10/2019

Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
07/26/2021

The Holy Grail of Multi-Robot Planning: Learning to Generate Online-Scalable Solutions from Offline-Optimal Experts

Many multi-robot planning problems are burdened by the curse of dimensio...
