Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems

05/31/2022
by Nasrin Sultana, et al.

Model-free deep reinforcement learning algorithms have been applied to a range of combinatorial optimisation problems (COPs) <cit.> <cit.> <cit.>. However, these approaches face two key challenges when applied to combinatorial problems: insufficient exploration, and the need for many training examples from the search space to achieve reasonable performance. Combinatorial optimisation problems can be complex, with large search spaces that contain many optima. A new method is therefore needed that finds good solutions while being more sample efficient. This paper presents a new reinforcement learning approach based on entropy. In addition, we design an off-policy reinforcement learning technique that maximises the expected return and improves sample efficiency, so that learning is faster during training. We systematically evaluate our approach on a range of route optimisation tasks typically used to evaluate learning-based optimisation, such as the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). We show that our model can generalise to other routing problems, such as the split-delivery VRP (SDVRP), and we compare the performance of our method with that of current state-of-the-art approaches. The empirical results show that the proposed method can improve on state-of-the-art methods in terms of solution quality and computation time, and that it generalises to problems of different sizes.
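The abstract does not state the exact objective, so the following is only a minimal sketch of what an entropy-based routing objective could look like, assuming a maximum-entropy style formulation in which the return is the negative tour length plus a weighted sum of per-step policy entropies. All function names, the weight `alpha`, and the masking scheme are hypothetical illustrations, not the authors' implementation.

```python
import math


def masked_softmax(logits, visited):
    """Turn node scores into probabilities over unvisited nodes only.

    Visited nodes receive probability 0 (assumed masking scheme).
    """
    m = max(l for l, v in zip(logits, visited) if not v)
    exps = [0.0 if v else math.exp(l - m) for l, v in zip(logits, visited)]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tour_length(coords, tour):
    """Length of the closed tour that visits nodes in the given order."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n)
    )


def entropy_regularised_return(coords, tour, step_entropies, alpha=0.1):
    """Negative tour length plus an entropy bonus weighted by alpha.

    alpha is a hypothetical temperature: larger values reward more
    exploratory (higher-entropy) node-selection distributions.
    """
    return -tour_length(coords, tour) + alpha * sum(step_entropies)
```

On a TSP instance, `step_entropies` would hold the entropy of the node-selection distribution at each decoding step. A uniform choice among many unvisited nodes contributes a large bonus, while a forced final step contributes none.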


