A new Sinkhorn algorithm with Deletion and Insertion operations

by Luc Brun, et al.

This technical report is devoted to the continuous estimation of an epsilon-assignment. Roughly speaking, an epsilon-assignment between two sets V1 and V2 may be understood as a bijective mapping between a subset of V1 and a subset of V2. The remaining elements of V1 (not included in this mapping) are mapped onto an epsilon pseudo-element of V2. We say that such elements are deleted. Conversely, the remaining elements of V2 correspond to the image of the epsilon pseudo-element of V1. We say that these elements are inserted. Our method thus provides a result similar to that of the Sinkhorn algorithm, with the additional ability to reject some elements, which are then either inserted or deleted. It naturally handles sets V1 and V2 of different sizes and decides mappings, insertions and deletions in a unified way. Our algorithms are iterative and differentiable and may thus be easily inserted within a backpropagation-based learning framework such as artificial neural networks.


1 Introduction

This report is devoted to the continuous estimation of an $\epsilon$-assignment (Definition 1). Roughly speaking, an $\epsilon$-assignment between two sets $V_1$ and $V_2$ may be understood as a bijective mapping between a subset of $V_1$ and a subset of $V_2$. The remaining elements of $V_1$ (not included in this mapping) are mapped onto an $\epsilon$ pseudo-element of $V_2$. We say that such elements are deleted. Conversely, the remaining elements of $V_2$ correspond to the image of the $\epsilon$ pseudo-element of $V_1$ (Figure 1). We say that these elements are inserted.

Figure 1: (a) An example of an $\epsilon$-assignment function: 1 is mapped onto b, 2 onto a, and 3 is deleted. (b) Its associated $\epsilon$-assignment matrix.

Let us note that if $V_1$ and $V_2$ have the same size, the bijective mapping induced by an $\epsilon$-assignment may involve all elements of $V_1$, each element being mapped onto a single element of $V_2$. In this sense, an $\epsilon$-assignment is more general than a bijective mapping. Moreover, the main advantage of an $\epsilon$-assignment is the freedom to leave an element unmapped: such an element is then assigned to the $\epsilon$ element of $V_2$ or belongs to the image of the $\epsilon$ element of $V_1$. This last property allows us to reject some mappings if, for example, these mappings are associated with a large cost.

An $\epsilon$-assignment function may be associated with an $\epsilon$-assignment matrix (Figure 1(b)), just like any bijective mapping is associated with a permutation matrix. Given two sets $V_1$ and $V_2$ of respective sizes $n$ and $m$, an $\epsilon$-assignment matrix is encoded by an $(n+1) \times (m+1)$ matrix, where $n+1$ and $m+1$ play respectively the roles of the $\epsilon$ element of $V_1$ and the one of $V_2$. The last column, of index $m+1$, encodes the deletions while the last line encodes the insertions. By construction, there is a single $1$ in each of the first $n$ rows and $m$ columns, the remaining elements being set to $0$.
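The construction just described can be illustrated by a minimal sketch that builds the matrix of Figure 1(b) from its mapping. The function name `eps_assignment_matrix`, the 0-based indices and the use of `None` for a deletion are illustrative assumptions; setting the bottom-right entry to 1 follows the constraint of Definition 1 that the $\epsilon$ element is mapped onto $\epsilon$.

```python
import numpy as np

def eps_assignment_matrix(n, m, mapping):
    """Build an (n+1) x (m+1) epsilon-assignment matrix (hypothetical helper).

    mapping[i] is the 0-based column assigned to row i, or None if row i
    is deleted. Columns that receive no row are marked as inserted.
    """
    used = [j for j in mapping if j is not None]
    if len(used) != len(set(used)):
        raise ValueError("two elements are mapped onto the same target")
    X = np.zeros((n + 1, m + 1), dtype=int)
    for i, j in enumerate(mapping):
        X[i, m if j is None else j] = 1   # last column encodes deletions
    for j in range(m):
        if j not in used:
            X[n, j] = 1                   # last row encodes insertions
    X[n, m] = 1                           # epsilon mapped onto epsilon (Definition 1)
    return X

# Figure 1: 1 -> b, 2 -> a, 3 deleted (columns ordered a, b, epsilon).
figure1 = eps_assignment_matrix(3, 2, [1, 0, None])
print(figure1)
```

Each of the first three rows and each of the first two columns of `figure1` contains a single 1, as stated above.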

Given $V_1$ and $V_2$, one can define a cost matrix $C$ encoding the cost of the mapping of any element of $V_1$ onto an element of $V_2$ as well as the cost of deleting each element of $V_1$ and inserting each element of $V_2$. Finding an $\epsilon$-assignment minimizing the sum of mapping, deletion and insertion costs is a direct extension of the Linear Sum Assignment Problem (LSAP) called the Linear Sum Assignment Problem with Edition [1] (LSAPE). Given an $\epsilon$-assignment matrix $X$ and a cost matrix $C$, this cost may be formulated as:

$$\min_{X} \sum_{i,j} C_{i,j} X_{i,j}$$

where the minimum is taken over all $\epsilon$-assignment matrices $X$.
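To make the objective concrete, the following sketch enumerates every $\epsilon$-assignment matrix of a tiny instance and keeps the cheapest one. It is an exhaustive search for illustration only, not the Hungarian-type algorithm of [2, 1]; the function name `lsape_brute_force` and the two cost matrices are assumptions, and the bottom-right cost is taken to be 0.

```python
import itertools
import numpy as np

def lsape_brute_force(C):
    """Exhaustive LSAPE on a small (n+1) x (m+1) cost matrix (illustrative)."""
    C = np.asarray(C, dtype=float)
    n, m = C.shape[0] - 1, C.shape[1] - 1
    best_cost, best_X = np.inf, None
    # choice[i] in {0..m-1} is a substitution, choice[i] == m is a deletion.
    for choice in itertools.product(range(m + 1), repeat=n):
        used = [j for j in choice if j < m]
        if len(used) != len(set(used)):
            continue                      # a target may have one antecedent only
        X = np.zeros((n + 1, m + 1))
        for i, j in enumerate(choice):
            X[i, j] = 1
        for j in range(m):
            if j not in used:
                X[n, j] = 1               # unmatched targets are inserted
        X[n, m] = 1                       # epsilon onto epsilon, cost assumed 0
        cost = float(np.sum(C * X))
        if cost < best_cost:
            best_cost, best_X = cost, X
    return best_cost, best_X

cheap_subst = [[1, 5, 2], [5, 1, 2], [2, 2, 0]]
costly_subst = [[9, 9, 1], [9, 9, 1], [1, 1, 0]]
print(lsape_brute_force(cheap_subst)[0])   # substitutions are kept
print(lsape_brute_force(costly_subst)[0])  # everything is deleted and inserted
```

In the second instance every substitution is more expensive than a deletion followed by an insertion, so the optimal $\epsilon$-assignment rejects all mappings.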

In previous works [2, 1], we defined an adaptation of the Hungarian algorithm which finds an optimal solution to the above problem in polynomial time. However, while providing an optimal solution, this algorithm does not readily allow the computation of the gradient of the associated operation. This drawback makes it difficult to insert such an algorithm into a deep learning pipeline. On the other hand, the Sinkhorn algorithm [6] is based on a continuous relaxation of the problem where permutation matrices are replaced by bi-stochastic matrices with an entropic regularization. This algorithm is the workhorse of computational optimal transport [4] and is based on iterative matrix multiplications, thereby allowing the backpropagation of the gradient [3]. The aim of this technical report is to transpose the results of the Sinkhorn algorithm to $\epsilon$-assignment matrices. Just like the Sinkhorn algorithm, which does not provide a permutation matrix but rather a bi-stochastic matrix, our algorithm will provide an $\epsilon$-bi-stochastic matrix (Definition 3). This last point may be an advantage within the neural network framework, where the hard decisions corresponding to $\epsilon$-assignment matrices may not allow a proper propagation of the gradient.

More formally, given a similarity matrix $S$ (which may be easily deduced from a cost matrix), we aim at finding two diagonal matrices $D_1$ and $D_2$ such that $D_1 S D_2$ is an $\epsilon$-bi-stochastic matrix. Section 2 provides the main definitions and notations used in the remaining part of this report. The existence and uniqueness of a solution are demonstrated in Section 3, while Section 4 provides a constructive algorithm whose convergence is demonstrated. Let us note that while Section 3 is a simple adaptation of the original proof [6], Section 4 is significantly different from [6] since the arguments used for bi-stochastic matrices in the original proof do not hold for $\epsilon$-bi-stochastic matrices.
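For reference, the classical Sinkhorn iteration that this report extends can be sketched as follows. This is the usual square, bi-stochastic case and not the $\epsilon$ variant studied here; the function name `sinkhorn`, the iteration count and the choice of similarity $\exp(-C/\lambda)$ with $\lambda = 0.5$ are illustrative assumptions (the latter being one common way to deduce a similarity from a cost).

```python
import numpy as np

def sinkhorn(K, n_iter=1000):
    """Scale a positive square matrix K into a bi-stochastic matrix.

    Returns diag(u) @ K @ diag(v), obtained by alternately normalizing
    rows and columns.
    """
    K = np.asarray(K, dtype=float)
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(n_iter):
        u = 1.0 / (K @ v)        # make every row sum to one
        v = 1.0 / (K.T @ u)      # make every column sum to one
    return u[:, None] * K * v[None, :]

cost = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])
similarity = np.exp(-cost / 0.5)  # assumed entropic similarity
P = sinkhorn(similarity)
print(P.sum(axis=0), P.sum(axis=1))
```

Every step is a matrix-vector product followed by an element-wise inverse, which is what makes the iteration straightforward to differentiate.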

2 Definitions and notations

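Definition 1 ($\epsilon$-assignment).

Let $n$ and $m$ be two strictly positive integers. An $\epsilon$-assignment is a mapping $\varphi : \{1,\dots,n+1\} \to \mathcal{P}(\{1,\dots,m+1\})$ satisfying the following constraints:

$$\forall i \in \{1,\dots,n\},\ |\varphi(i)| = 1, \qquad \forall j \in \{1,\dots,m\},\ |\varphi^{-1}(j)| = 1, \qquad m+1 \in \varphi(n+1),$$

where $\mathcal{P}(\{1,\dots,m+1\})$ is the power set of $\{1,\dots,m+1\}$ and $\varphi^{-1}(j) = \{ i \mid j \in \varphi(i) \}$ denotes the set of antecedents of $j$.

Each element of $\{1,\dots,n\}$ is thus mapped onto a set composed of a single element of $\{1,\dots,m+1\}$, and in the same way the set of antecedents of each $j \in \{1,\dots,m\}$ is reduced to one element $i \in \{1,\dots,n+1\}$. Hence the only element of $\{1,\dots,n+1\}$ which can be mapped onto a set composed of several elements is $n+1$. In the same way, $m+1$ is the only element which may have several antecedents. The constraint $m+1 \in \varphi(n+1)$ ensures that $n+1$ is mapped to at least one element and that $m+1$ has at least one antecedent.

In the example of Figure 1 we have $n = 3$ and $m = 2$. Elements 1, 2, 3 are respectively mapped onto $b$, $a$ and $\epsilon$, where the last mapping corresponds to a deletion of 3 (which is mapped onto $\epsilon$). Consequently $\epsilon$ has two antecedents, $3$ and $\epsilon$.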
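The three constraints of Definition 1 can be checked mechanically. The sketch below assumes that a mapping is stored as a Python dictionary from each index of $\{1,\dots,n+1\}$ to a set of indices of $\{1,\dots,m+1\}$; the function name `is_eps_assignment` and this encoding are illustrative choices.

```python
def is_eps_assignment(phi, n, m):
    """Check the constraints of Definition 1 on a dict {i: set of j} (sketch)."""
    if set(phi) != set(range(1, n + 2)):
        return False                      # phi must be defined on 1..n+1
    targets = set(range(1, m + 2))
    if any(not phi[i] <= targets for i in phi):
        return False                      # images must lie in 1..m+1
    # Each i in 1..n is mapped onto a set with a single element.
    if any(len(phi[i]) != 1 for i in range(1, n + 1)):
        return False
    # Each j in 1..m has exactly one antecedent in 1..n+1.
    for j in range(1, m + 1):
        if sum(1 for i in phi if j in phi[i]) != 1:
            return False
    # The epsilon element n+1 is mapped onto the epsilon element m+1.
    return (m + 1) in phi[n + 1]

# Figure 1 with a = 1, b = 2 and epsilon = 3 on the target side,
# epsilon = 4 on the source side.
figure1 = {1: {2}, 2: {1}, 3: {3}, 4: {3}}
all_rejected = {1: {3}, 2: {3}, 3: {3}, 4: {1, 2, 3}}
clash = {1: {1}, 2: {1}, 3: {3}, 4: {3}}
print(is_eps_assignment(figure1, 3, 2),
      is_eps_assignment(all_rejected, 3, 2),
      is_eps_assignment(clash, 3, 2))
```

The second mapping deletes every source element and inserts every target element, which is still a valid $\epsilon$-assignment; the third is rejected because $a$ has two antecedents and $b$ has none.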

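Definition 2 ($\epsilon$-row/column stochastic matrix).

A non negative $(n+1) \times (m+1)$ matrix $A$ is called an $\epsilon$-row stochastic matrix iff:

$$\forall i \in \{1,\dots,n\}, \quad \sum_{j=1}^{m+1} A_{i,j} = 1.$$

$A$ is called an $\epsilon$-column stochastic matrix iff:

$$\forall j \in \{1,\dots,m\}, \quad \sum_{i=1}^{n+1} A_{i,j} = 1.$$

Definition 3 ($\epsilon$-bi-stochastic matrix).

A non negative $(n+1) \times (m+1)$ matrix $A$ is called an $\epsilon$-bi-stochastic matrix iff:

$$\forall i \in \{1,\dots,n\}, \quad \sum_{j=1}^{m+1} A_{i,j} = 1 \qquad \text{and} \qquad \forall j \in \{1,\dots,m\}, \quad \sum_{i=1}^{n+1} A_{i,j} = 1.$$

If $A \in \{0,1\}^{(n+1) \times (m+1)}$, $A$ is called an $\epsilon$-assignment matrix and there is a one-to-one mapping between $\epsilon$-assignments and $\epsilon$-assignment matrices.

Let us note that any $\epsilon$-bi-stochastic matrix is a bi-stochastic matrix on which the bi-stochastic constraints are relaxed on the last line and last column. So any square bi-stochastic matrix is also an $\epsilon$-bi-stochastic matrix (the reverse being obviously false).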
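Definitions 2 and 3 translate directly into a few array operations. The following sketch uses hypothetical function names and a numerical tolerance `tol`, both of which are assumptions, and it illustrates the remark above on a matrix that is $\epsilon$-bi-stochastic without being bi-stochastic.

```python
import numpy as np

def is_eps_row_stochastic(A, tol=1e-9):
    """All rows except the last one sum to one (Definition 2)."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A >= 0) and np.allclose(A[:-1, :].sum(axis=1), 1.0, atol=tol))

def is_eps_col_stochastic(A, tol=1e-9):
    """All columns except the last one sum to one (Definition 2)."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A >= 0) and np.allclose(A[:, :-1].sum(axis=0), 1.0, atol=tol))

def is_eps_bistochastic(A, tol=1e-9):
    """Both relaxed constraints hold (Definition 3)."""
    return is_eps_row_stochastic(A, tol) and is_eps_col_stochastic(A, tol)

figure1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]   # Figure 1(b)
bistochastic = [[0.3, 0.7], [0.7, 0.3]]                  # square bi-stochastic
relaxed = [[0.5, 0.5], [0.5, 3.0]]                       # last row/column unconstrained
print(is_eps_bistochastic(figure1),
      is_eps_bistochastic(bistochastic),
      is_eps_bistochastic(relaxed))
```

The matrix `relaxed` satisfies Definition 3 although its last row and last column sum to 3.5.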

Definition 4.

If $A$ is a matrix and $\varphi$ an $\epsilon$-assignment, then the set of entries of $A$ selected by $\varphi$ is called an $\epsilon$-diagonal of $A$ corresponding to $\varphi$. If $A$ is square and $\varphi$ is the identity, the diagonal is called the main diagonal.

Note that the entries selected in the first $n$ rows form a sequence (as $\varphi(i)$ is unique for $i \le n$), while $\varphi(n+1)$ is a set. The above definition is a straightforward generalization of the usual notion of diagonal, where $\varphi$ is required to be a permutation. In the following we will only consider $\epsilon$-diagonals of matrices, which will simply be called diagonals.

Definition 5 (Total support).

If $A$ is a nonnegative matrix, $A$ is said to have total support if $A \neq 0$ and if every positive element of $A$ lies on a positive $\epsilon$-diagonal. A nonnegative matrix that contains a positive diagonal is said to have a support.

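If $A$ is an $(n+1) \times (m+1)$ matrix and $X$ and $Z$ denote sets of indices respectively contained in $\{1,\dots,n+1\}$ and $\{1,\dots,m+1\}$, then:

  • $A[X,Z]$ denotes the sub matrix of $A$ restricted to indices $X$ and $Z$,

  • $A(X,Z]$ denotes the sub matrix of $A$ restricted to indices not contained in $X$, i.e. $\{1,\dots,n+1\} \setminus X$, and to the indices contained in $Z$,

  • $A[X,Z)$ denotes the sub matrix of $A$ restricted to indices contained in $X$ and not contained in $Z$, i.e. $\{1,\dots,m+1\} \setminus Z$,

  • $A(X,Z)$ denotes the sub matrix of $A$ restricted to the indices not contained in $X$ and $Z$.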

Definition 6 (Secable rectangular matrix).

A rectangular non negative $(n+1) \times (m+1)$ matrix $A$ is said to be secable if one can find:

  • a partition of $\{1,\dots,n+1\}$ into two sets $X$ and $Y$ and

  • a partition of $\{1,\dots,m+1\}$ into two sets $Z$ and $T$

such that:

$$
A = \begin{array}{c|cc}
  & Z & T \\ \hline
X & A[X,Z] & 0 \\
Y & 0 & A[Y,T]
\end{array}
$$

Let us note that this notion of secable matrix is quite close to that of a block diagonal matrix. However, neither $A$ nor its blocks are required to be square.
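One way to test Definition 6 numerically is to view the positive entries of the matrix as the edges of a bipartite graph between rows and columns: a decomposition with non-empty parts exists exactly when this graph is disconnected. The sketch below relies on two assumptions that are not stated in the definition itself, namely that the sets $X$ and $Y$ must be non-empty and that the matrix has no row or column filled with 0; the function name and the 0-based indices are also illustrative.

```python
from collections import deque
import numpy as np

def find_secable_partition(A):
    """Return (X, Y, Z, T) with A[X,T] = 0 and A[Y,Z] = 0, or None (sketch).

    Assumes A has no zero row or column and that X and Y must be non-empty.
    """
    A = np.asarray(A)
    n_rows, n_cols = A.shape
    seen_rows, seen_cols = {0}, set()
    queue = deque([("row", 0)])
    # Breadth-first search over the bipartite graph of positive entries.
    while queue:
        kind, k = queue.popleft()
        if kind == "row":
            for j in np.flatnonzero(A[k, :] > 0):
                j = int(j)
                if j not in seen_cols:
                    seen_cols.add(j)
                    queue.append(("col", j))
        else:
            for i in np.flatnonzero(A[:, k] > 0):
                i = int(i)
                if i not in seen_rows:
                    seen_rows.add(i)
                    queue.append(("row", i))
    if len(seen_rows) == n_rows:
        return None                       # connected: no such decomposition
    X = sorted(seen_rows)
    Y = [i for i in range(n_rows) if i not in seen_rows]
    Z = sorted(seen_cols)
    T = [j for j in range(n_cols) if j not in seen_cols]
    return X, Y, Z, T

print(find_secable_partition([[1, 2, 0], [0, 0, 3]]))
print(find_secable_partition(np.ones((2, 3))))
```

The first matrix splits into a $1 \times 2$ block and a $1 \times 1$ block, which shows that the blocks need not be square.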

3 Existence and uniqueness

Theorem 3.1.

Let $A$ be a nonnegative $(n+1) \times (m+1)$ matrix such that $A$ does not contain any line or column filled with 0. A necessary and sufficient condition that there exists an $\epsilon$-bi-stochastic matrix $B$ of the form $D_1 A D_2$, where $D_1$ and $D_2$ are diagonal matrices with positive main diagonals and a last entry equal to $1$, is that $A$ has total support. If $B$ exists then it is unique. Also $D_1$ and $D_2$ are unique if and only if $A$ is non secable.
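As a small worked instance of the statement, consider the smallest case $n = m = 1$ with an all-ones matrix. The symbols $x$ and $y$ below are introduced for illustration only and denote the free diagonal entries of $D_1$ and $D_2$.

```latex
\[
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
D_1 = \operatorname{diag}(x, 1), \qquad
D_2 = \operatorname{diag}(y, 1), \qquad
B = D_1 A D_2 = \begin{pmatrix} xy & x \\ y & 1 \end{pmatrix}.
\]
% Only the first row and the first column are constrained (Definition 3):
\[
xy + x = 1, \qquad xy + y = 1
\;\Longrightarrow\; x = y, \quad x^2 + x - 1 = 0
\;\Longrightarrow\; x = y = \frac{\sqrt{5} - 1}{2} \approx 0.618 .
\]
% The positivity of the diagonals selects the positive root, so the scaling is unique:
\[
B = \begin{pmatrix} x^2 & x \\ x & 1 \end{pmatrix}
\approx \begin{pmatrix} 0.382 & 0.618 \\ 0.618 & 1 \end{pmatrix}.
\]
```

The last row and the last column of $B$ sum to $1 + x \approx 1.618$, which is allowed since the constraints are relaxed there, and the bottom-right entry of $A$ is left unchanged because the last entries of $D_1$ and $D_2$ are equal to $1$.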


Let us suppose that and are -bi-stochastic matrices where , and . If and :


Let and put

Let us note that since we have , and . Moreover, we have by hypothesis and for all .

Let us first show that .

If , then for all . Then using equation (3), we have for any :

Hence for all and thus .

Conversely, if , we have for all . Using equation (4) for :

Hence, for all and .

In this case we consider the alternative definitions for and :

Since for all and for all and are non empty.

Using initial definitions for and , let us thus consider and . Then using (3):

where the last equality comes from (1). Similarly, using (4):

where the last equality comes from (2). Whence . But in this case, we have using (3):

This last equality is compatible with (1) only if for all . Dropping sub indices, we have for all and all . Thus

Hence if and . So . More concisely, we have: .

In the same way, implies using (4):

which is compatible with (2) only if for all . Thus for all and for all we have . Thus

On we have . Thus:


Moreover, for any we have:

In the same way, we have for any . But in this case using , we have for :

Thus which induces which imposes . Indeed, since , we have and .

In the same way for :

and we have in the same way for , . We then obtain for :

Using the previous equality , hence and (since .

Thus is an $\epsilon$-bi-stochastic matrix (where and play the roles of the last row and column respectively).

Let us briefly show that and are simultaneously empty or non empty. Let us first suppose that and let us consider . Since by hypothesis, there exists such that . But since , we have and thus a contradiction. In the same way, if , let us consider . Since , there exists such that . Again a contradiction.

If is non secable, the configuration where both and are non empty corresponds to a partition of into and its complement and a partition of into and its complement, with no connections between and the complement of nor any connection between and the complement of (Figure 2). Such a decomposition being excluded, we have and Hence and and and are unique ().

If the non secable property of does not hold and and exist, and exist, include the row and the column , are bi-stochastic matrices and have a size lower than the one of . Furthermore, and where and have like and (from which they are derived) a positive main diagonal with a at last position. The argument may be repeated on these submatrices until is established. Given that , we already know that and that and hence and are zeros elsewhere (Figure 2). Hence is equal to . Note however, that since (and the same for ) and are no longer unique.

Figure 2: Decomposition of matrix .

4 A constructive algorithm

For any and any let us consider the sequences and defined as follows:

Moreover we also define:

with for all :

Let us denote by the matrix whose entries in are equal to and whose last row is filled with zeros except for a 1 at position .

In the same way let us denote by the matrix whose entries in are equal to and whose last column is filled with zeros except for a 1 at position .

If and denote the vectors encoding respectively and we have for :

Figure 3: and matrices.

is row stochastic. Indeed, for any and :

and the last line of contains a single entry equal to .

Moreover, is column stochastic for . Indeed for each :

Note that is not column stochastic. One noticeable effect of this negative property is that while .

Moreover the last column contains a single positive entry equal to . Hence is row stochastic and equation (5) involves two row stochastic matrices.

Combining both equations of (5), we have:

where the inverse notation applied to a vector denotes the element-wise inverse operation.
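The alternation between row and column scalings discussed in this section can be illustrated by a minimal sketch. It assumes updates of the form $x_i \leftarrow 1 / \sum_j A_{i,j} y_j$ for $i \le n$ and $y_j \leftarrow 1 / \sum_i x_i A_{i,j}$ for $j \le m$, with the last entries of both scaling vectors fixed to 1; the function name, the update order and the iteration count are illustrative assumptions and not necessarily the exact sequences of this report.

```python
import numpy as np

def eps_sinkhorn_sketch(A, n_iter=2000):
    """Alternately scale the first n rows and first m columns of A (sketch).

    Returns (x, y, B) with B = diag(x) @ A @ diag(y) and x[-1] = y[-1] = 1.
    """
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(n_iter):
        # Element-wise inverse of a matrix-vector product, rows 1..n only.
        x[:-1] = 1.0 / (A[:-1, :] @ y)
        # Same operation on the transposed matrix, columns 1..m only.
        y[:-1] = 1.0 / (A[:, :-1].T @ x)
    return x, y, x[:, None] * A * y[None, :]

# On the 2 x 2 all-ones matrix the fixed point satisfies x^2 + x = 1.
x, y, B = eps_sinkhorn_sketch(np.ones((2, 2)))
print(x[0], (np.sqrt(5) - 1) / 2)

rng = np.random.default_rng(0)
x2, y2, B2 = eps_sinkhorn_sketch(rng.uniform(0.5, 1.5, size=(4, 3)))
print(B2[:-1].sum(axis=1), B2[:, :-1].sum(axis=0))
```

Only the first $n$ rows and first $m$ columns of the result are normalized; the last row and last column keep whatever mass the scaling leaves them, which is how insertions and deletions are absorbed.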

Since is row stochastic, we have:

Using we obtain:

where is the element-wise product also known as Hadamard product. Since, for , is row stochastic we have and thus: