Towards Model-based Reinforcement Learning for Industry-near Environments

by Per-Arne Andersen et al.

Deep reinforcement learning has over the past few years shown great potential for learning near-optimal control in complex simulated environments with limited observable information. Rainbow (Q-learning) and PPO (policy optimisation) have shown outstanding performance on a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. While these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity that, in practice, make them unsuitable for critical operations in industry. Model-based reinforcement learning, on the other hand, focuses on learning the transition dynamics between states in an environment. If these dynamics are adequately learned, a model-based approach is perhaps the most sample-efficient way for an agent to learn to act optimally in an environment. These traits make model-based reinforcement learning ideal for real-world environments where sampling is slow and for mission-critical operations. In the warehouse industry, there is increasing motivation to minimise time and maximise production. Currently, autonomous agents act suboptimally, using handcrafted policies for significant portions of the state space. In this paper, we present the Dreaming Variational Autoencoder v2 (DVAE-2), a model-based reinforcement learning algorithm that increases sample efficiency, thereby enabling algorithms with low sample efficiency to function better in real-world environments. We introduce Deep Warehouse, a simulated environment for industry-near testing of autonomous agents in grid-based warehouses. Finally, we show that DVAE-2 improves sample efficiency in the Deep Warehouse compared to model-free methods.
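To make the model-based idea in the abstract concrete, here is a minimal, illustrative sketch (not the paper's DVAE-2): an agent collects a small batch of random transitions from a toy grid environment, fits a transition model P(s' | s, a) from counts, and then plans entirely inside the learned model with value iteration, requiring no further environment samples. All names and the toy environment are assumptions for illustration.

```python
import numpy as np

# Toy 1-D grid world: states 0..4, actions {0: left, 1: right}; reward 1 for
# reaching state 4. Illustrative only -- a count-based transition model stands
# in for the learned dynamics model described in the abstract.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    """True environment dynamics (hidden from the agent)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, float(s2 == GOAL)

# 1) Collect a small batch of random transitions from the environment.
rng = np.random.default_rng(0)
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
rewards = np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):
    s = int(rng.integers(N_STATES))
    a = int(rng.integers(N_ACTIONS))
    s2, r = step(s, a)
    counts[s, a, s2] += 1
    rewards[s, a] = r

# 2) Fit the transition model P(s' | s, a) by normalising the counts.
P = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)

# 3) Plan inside the learned model with value iteration -- no new env samples.
V = np.zeros(N_STATES)
for _ in range(50):
    Q = rewards + 0.9 * (P @ V)   # batched matmul -> shape (N_STATES, N_ACTIONS)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
print(policy)  # the greedy policy should move right, towards the goal
```

The sample-efficiency argument is visible here: once the 200 transitions are collected, all further policy improvement happens inside the learned model; a model-free method would need fresh environment interaction for each update.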

