Model-Based Offline Planning with Trajectory Pruning

by Xianyuan Zhan, et al.

Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction, which provides a promising direction for making RL usable in real-world systems. Although recent offline RL studies have made considerable progress, existing methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the need for extra control flexibility. The model-based planning framework provides an attractive solution for such tasks. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollout guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches, and allows easy adaptation to varying objectives and extra constraints.
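The rollout-and-prune idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the pruning criterion, so this sketch assumes a trajectory is pruned whenever the disagreement among an ensemble of learned dynamics models exceeds a threshold, used as a proxy for out-of-distribution samples. It also simply returns the first action of the best surviving trajectory instead of using any particular trajectory optimizer. All names (`plan_with_pruning`, `ensemble_step`, `toy_models`, `toy_policy`, `toy_reward`) are hypothetical, and states and actions are scalars to keep it short.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical aliases: scalar states/actions keep the sketch small.
Model = Callable[[float, float], float]           # (state, action) -> next state
Policy = Callable[[float, random.Random], float]  # (state, rng) -> action


def ensemble_step(models: List[Model], s: float, a: float) -> Tuple[float, float]:
    """Return the ensemble-mean next state and the ensemble disagreement
    (max minus min prediction), used here as an uncertainty proxy."""
    preds = [m(s, a) for m in models]
    return sum(preds) / len(preds), max(preds) - min(preds)


def plan_with_pruning(s0: float, policy: Policy, models: List[Model],
                      reward: Callable[[float, float], float],
                      horizon: int, n_traj: int, threshold: float,
                      rng: random.Random) -> Tuple[Optional[float], int]:
    """Sample n_traj rollouts with the behavior policy, drop any rollout whose
    ensemble disagreement exceeds `threshold` at some step, and return the
    first action of the highest-return survivor plus the number kept."""
    best_action: Optional[float] = None
    best_return = float("-inf")
    kept = 0
    for _ in range(n_traj):
        s, total, first_action, pruned = s0, 0.0, None, False
        for t in range(horizon):
            a = policy(s, rng)
            if t == 0:
                first_action = a
            s_next, disagreement = ensemble_step(models, s, a)
            if disagreement > threshold:
                pruned = True  # models disagree: treat as out-of-distribution
                break
            total += reward(s, a)
            s = s_next
        if pruned:
            continue
        kept += 1
        if total > best_return:
            best_action, best_return = first_action, total
    return best_action, kept


def toy_models() -> List[Model]:
    # Hypothetical ensemble whose members disagree more for larger |a|.
    return [lambda s, a, k=k: s + a + 0.01 * k * a * a for k in range(3)]


def toy_policy(s: float, rng: random.Random) -> float:
    # Hypothetical stand-in for a behavior policy learned from data.
    return rng.gauss(0.5 * (1.0 - s), 0.5)


def toy_reward(s: float, a: float) -> float:
    # Hypothetical objective: stay close to state 1.0.
    return -abs(s - 1.0)


if __name__ == "__main__":
    action, kept = plan_with_pruning(0.0, toy_policy, toy_models(), toy_reward,
                                     horizon=5, n_traj=64, threshold=0.02,
                                     rng=random.Random(0))
    print(action, kept)
```

Lowering `threshold` makes the planner more conservative (more rollouts are discarded), while raising it admits more aggressive rollouts. This is one way to read the trade-off between offline restrictions and planning performance that the abstract describes.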



Model-Based Offline Planning

Offline learning is a key part of making reinforcement learning (RL) use...

UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning

Offline reinforcement learning (RL) provides a framework for learning de...

Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

Applications of Reinforcement Learning (RL) in robotics are often limite...

Offline Reinforcement Learning for Road Traffic Control

Traffic signal control is an important problem in urban mobility with a ...

The Holy Grail of Multi-Robot Planning: Learning to Generate Online-Scalable Solutions from Offline-Optimal Experts

Many multi-robot planning problems are burdened by the curse of dimensio...

Offline Equilibrium Finding

Offline reinforcement learning (Offline RL) is an emerging field that ha...

Trajectory Inspection: A Method for Iterative Clinician-Driven Design of Reinforcement Learning Studies

Treatment policies learned via reinforcement learning (RL) from observat...