Scalable and transferable learning of algorithms via graph embedding for multi-robot reward collection

05/29/2019
by Hyunwook Kang, et al.

Can the success of reinforcement learning methods for combinatorial optimization problems be extended to multi-robot scheduling problems in stochastic contexts? Three issues are particularly important in this context: the quality of the resulting decisions, scalability, and transferability. To achieve these ends, we generalize the concept of clique potential to that of stochastic clique potential. We extend a mean-field inference fixed-point iteration with this new concept and use it to modify the structure2vec method. We next propose a new reinforcement learning framework combining a graph representation of the problem with a consensus auction inspired by heuristics in the problem domain. This representation enables transferability in terms of the number of robots. Sequential encoding of information through multiple layers of our extended structure2vec results in 96% optimality in decision quality. While training tractability is inherited from single-robot methods in the literature, the use of a multi-robot consensus-auction-based relaxation of the maximum operation in the Bellman optimality equation allows for scalable selection of actions in fitted Q-iteration. We apply our framework to multi-robot reward collection (MRRC) problems in stochastic environments with linear or non-linear rewards. In stochastic environments with non-linear rewards, the new method achieves 20% better performance than the popular sequential greedy assignment (SGA) algorithm. Linear scalability in terms of training is achieved and demonstrated. Transferability is demonstrated by a heuristic trained with three robots that continues to achieve 95% of that performance when applied to larger numbers of robots. We further describe the results obtained when extending the approach to identical parallel machine scheduling (IPMS) problems.
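For intuition, the following is a minimal, illustrative sketch (not the authors' implementation) of the two mechanisms the abstract combines: a structure2vec-style mean-field embedding iteration over a task graph, and an auction-based relaxation of the max operator used when selecting robot-task assignments from learned Q-values. All function names, tensor shapes, and the simplified single-round auction (a greedy stand-in for the paper's consensus auction) are assumptions made for illustration.

    import numpy as np

    def structure2vec_embed(adj, features, W1, W2, iters=4):
        """Fixed-point-style embedding update: mu <- relu(features @ W1 + adj @ mu @ W2).
        adj      : (n, n) adjacency/affinity matrix of the task graph
        features : (n, d_in) per-node features (e.g., expected reward, task age)
        W1, W2   : learned weights of shapes (d_in, d) and (d, d)
        """
        n, d = adj.shape[0], W2.shape[0]
        mu = np.zeros((n, d))
        for _ in range(iters):  # "multiple layers" = repeated fixed-point steps
            mu = np.maximum(0.0, features @ W1 + adj @ mu @ W2)
        return mu

    def auction_assign(q_values):
        """Single-round greedy auction over Q(s, robot -> task) estimates.
        Robots bid in order of their best Q-value; each takes its best task
        still available. This is a simplified stand-in for the consensus
        auction: it replaces the exponential joint argmax in the Bellman
        optimality equation with an O(n_robots * n_tasks) selection.
        """
        n_robots, n_tasks = q_values.shape
        ranking = np.argsort(-q_values, axis=1)  # each robot's preference order
        taken, assignment = set(), [None] * n_robots
        for robot in sorted(range(n_robots),
                            key=lambda r: -q_values[r, ranking[r, 0]]):
            for task in ranking[robot]:
                if int(task) not in taken:
                    assignment[robot] = int(task)
                    taken.add(int(task))
                    break
        return assignment

    # Toy usage: 5 tasks, 3 robots. In the real method the Q-values would be
    # decoded from the learned embeddings; here they are random for brevity.
    rng = np.random.default_rng(0)
    mu = structure2vec_embed(rng.random((5, 5)), rng.random((5, 3)),
                             rng.random((3, 8)), rng.random((8, 8)))
    print(auction_assign(rng.random((3, 5))))

The point of the auction relaxation is cost: a joint argmax over all robot-task assignments grows exponentially with the number of robots, while a per-robot auction scales roughly linearly in robots and tasks, which is what makes action selection inside fitted Q-iteration tractable at scale.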
