Learning Permutations with Sinkhorn Policy Gradient

05/18/2018
by   Patrick Emami, et al.

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uses a temperature-controlled Sinkhorn layer to decouple representation learning of the state space from the highly structured action space of permutations. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean TSP, and matching tasks. We also observe that SPG is significantly more data-efficient at the matching task than the baseline methods, which indicates that SPG is conducive to learning representations that are useful for reasoning about permutations.


research
09/01/2017

Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action...
research
04/10/2017

DeepPermNet: Visual Permutation Learning

We present a principled approach to uncover the structure of visual data...
research
09/15/2018

Sampled Policy Gradient for Learning to Play the Game Agar.io

In this paper, a new offline actor-critic learning algorithm is introduc...
research
10/09/2019

Investigation on the generalization of the Sampled Policy Gradient algorithm

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-criti...
research
10/26/2017

Reparameterizing the Birkhoff Polytope for Variational Permutation Inference

Many matching, tracking, sorting, and ranking problems require probabili...
research
01/24/2019

AutoShuffleNet: Learning Permutation Matrices via an Exact Lipschitz Continuous Penalty in Deep Convolutional Neural Networks

ShuffleNet is a state-of-the-art light weight convolutional neural netwo...
research
06/10/2019

Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

In the context of learning deterministic policies in continuous domains,...
