1 Introduction
This report is devoted to the continuous estimation of an ε-assignment (Definition 1). Roughly speaking, an ε-assignment between two sets $\mathcal{X}$ and $\mathcal{Y}$ may be understood as a bijective mapping between a subset of $\mathcal{X}$ and a subset of $\mathcal{Y}$. The remaining elements of $\mathcal{X}$ (not included in this mapping) are mapped onto a pseudo element $\epsilon$ of $\mathcal{Y}$. We say that such elements are deleted. Conversely, the remaining elements of $\mathcal{Y}$ correspond to the image of the pseudo element $\epsilon$ of $\mathcal{X}$ (Figure 1). We say that these elements are inserted.
Let us note that if $\mathcal{X}$ and $\mathcal{Y}$ have the same size, the bijective mapping induced by an ε-assignment may involve all elements of $\mathcal{X}$, each element being mapped onto a single element of $\mathcal{Y}$. In this sense, an ε-assignment is more general than a bijective mapping. Moreover, the main advantage of an ε-assignment is that it provides the freedom not to map an element, which is then either assigned to the pseudo element $\epsilon$ of $\mathcal{Y}$ or belongs to the image of the pseudo element $\epsilon$ of $\mathcal{X}$. This last property allows us to reject some mappings, for example when these mappings are associated with a large cost.
An ε-assignment function may be associated with an ε-assignment matrix (Figure 1(b)), just like any bijective mapping is associated with a permutation matrix. Given two sets $\mathcal{X}$ and $\mathcal{Y}$ of respective sizes $n$ and $m$, an ε-assignment matrix is encoded by an $(n+1)\times(m+1)$ matrix, where the indices $n+1$ and $m+1$ play respectively the roles of the $\epsilon$ element of $\mathcal{Y}$ and the one of $\mathcal{X}$. The last column, of index $m+1$, of such a matrix encodes the deletions while the last line encodes the insertions. By construction, there is a single $1$ in each of the first $n$ rows and first $m$ columns, the remaining elements being set to $0$.
Given $\mathcal{X}$ and $\mathcal{Y}$, one can define a cost matrix $C$ encoding the cost of mapping any element of $\mathcal{X}$ onto an element of $\mathcal{Y}$, as well as the cost of deleting each element of $\mathcal{X}$ and inserting each element of $\mathcal{Y}$. Finding an ε-assignment minimizing the sum of mapping, deletion and insertion costs is a direct extension of the Linear Sum Assignment Problem (LSAP) called the Linear Sum Assignment Problem with Edition (LSAPE) [1]. Given an ε-assignment matrix $X$ and a cost matrix $C$, this cost may be formulated as:
$$\min_X \sum_{i=1}^{n+1}\sum_{j=1}^{m+1} C_{i,j}\,X_{i,j},$$
where the minimum is taken over all ε-assignment matrices $X$.
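As an illustration, the LSAPE objective above can be evaluated and minimized by brute force on tiny instances. The sketch below assumes the encoding of Figure 1(b) (last column for deletions, last row for insertions, a $1$ in the corner cell for the ε-to-ε mapping); all function names are illustrative, not the report's notation.

```python
import itertools
import numpy as np

def lsape_cost(C, X):
    """Cost <C, X> of an epsilon-assignment matrix X under the cost
    matrix C, both of size (n+1) x (m+1)."""
    return float(np.sum(C * X))

def brute_force_lsape(C):
    """Enumerate all epsilon-assignment matrices for a tiny
    (n+1) x (m+1) cost matrix and return the cheapest one."""
    n, m = C.shape[0] - 1, C.shape[1] - 1
    best, best_X = np.inf, None
    # Each of the n real rows is assigned either a distinct real column
    # or the epsilon column m (deletions may repeat).
    for choice in itertools.product(range(m + 1), repeat=n):
        real = [j for j in choice if j < m]
        if len(real) != len(set(real)):
            continue  # a real column used twice: not an assignment
        X = np.zeros_like(C)
        for i, j in enumerate(choice):
            X[i, j] = 1
        for j in range(m):
            if j not in real:
                X[n, j] = 1  # unmatched columns are insertions
        X[n, m] = 1          # epsilon is always mapped onto epsilon
        c = lsape_cost(C, X)
        if c < best:
            best, best_X = c, X
    return best, best_X
```

For instance, with $n = m = 2$ and substitution costs cheaper than deletion plus insertion, the optimum maps both real elements and leaves only the ε-to-ε entry in the last row and column.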
In previous works [2, 1], we defined an adaptation of the Hungarian algorithm which finds an optimal solution to the above problem in
$O(\min(n,m)^2 \max(n,m))$ time. However, while providing an optimal solution, this algorithm does not readily allow the computation of the gradient of the associated operation. This last drawback prevents such an algorithm from being easily inserted into a deep learning pipeline. On the other hand, the Sinkhorn algorithm
[6] is based on a continuous relaxation of the problem where permutation matrices are replaced by bistochastic matrices with an entropic regularization. This algorithm is the workhorse of computational optimal transport [4] and is based on iterative matrix multiplications, thereby allowing the backpropagation of the gradient [3]. The aim of this technical report is to transpose the results of the Sinkhorn algorithm to ε-assignment matrices. Just like the Sinkhorn algorithm, which does not provide a permutation matrix but rather a bistochastic matrix, our algorithm will provide an
ε-bistochastic matrix (Definition 3). This last point may be of advantage within the neural network framework, where the hard decisions corresponding to ε-assignment matrices may not allow a proper propagation of the gradient. More formally, given a similarity matrix $A$ (which may be easily deduced from a cost matrix), we aim at finding two diagonal matrices $D_1$ and $D_2$ such that $D_1 A D_2$ is an ε-bistochastic matrix. Section 2 provides the main definitions and notations used in the remaining part of this report. The existence and uniqueness of a solution is demonstrated in Section 3, while Section 4 provides a constructive algorithm whose convergence is demonstrated. Let us note that while Section 3 is a simple adaptation of the original proof [6], Section 4 is significantly different from [6] since the arguments used for bistochastic matrices in the original proof do not hold for ε-bistochastic matrices.
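For reference, the classical Sinkhorn iteration alternately rescales the rows and columns of a square positive matrix until it becomes bistochastic; the ε-variant studied below differs in how the last row and column are handled. A minimal sketch:

```python
import numpy as np

def sinkhorn(A, n_iter=1000):
    """Classical Sinkhorn iteration: returns diagonal scalings r, c such
    that diag(r) @ A @ diag(c) is (approximately) bistochastic."""
    n = A.shape[0]
    r = np.ones(n)
    c = np.ones(n)
    for _ in range(n_iter):
        r = 1.0 / (A @ c)      # make every row sum equal to 1
        c = 1.0 / (A.T @ r)    # make every column sum equal to 1
    return r, c

A = np.array([[1.0, 2.0], [3.0, 1.0]])
r, c = sinkhorn(A)
B = np.diag(r) @ A @ np.diag(c)
# all row and column sums of B are close to 1
```

Because every step is a matrix-vector product, the whole iteration is differentiable, which is the property this report seeks to preserve for ε-assignments.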
2 Definitions and notations
Definition 1 (ε-assignment).
Let $n$ and $m$ be two strictly positive integers and let $\epsilon$ denote a pseudo element. An ε-assignment is a mapping $\varphi : \{1,\dots,n\}\cup\{\epsilon\} \to \mathcal{P}(\{1,\dots,m\}\cup\{\epsilon\})$ satisfying the following constraints:
$$\forall i \in \{1,\dots,n\}:\ |\varphi(i)| = 1, \qquad \forall j \in \{1,\dots,m\}:\ |\varphi^{-1}(j)| = 1, \qquad \epsilon \in \varphi(\epsilon),$$
where $\mathcal{P}(\cdot)$ is the power set and $\varphi^{-1}(j) = \{i : j \in \varphi(i)\}$.
Each element of $\{1,\dots,n\}$ is thus mapped onto a set composed of a single element of $\{1,\dots,m\}\cup\{\epsilon\}$ and, in the same way, the set of antecedents of each $j \in \{1,\dots,m\}$ is reduced to one element. Hence the only element which can be mapped onto a set composed of several elements is $\epsilon$. In the same way, $\epsilon$ is the only element which may have several antecedents. The constraint $\epsilon \in \varphi(\epsilon)$ ensures that $\epsilon$ is mapped to at least one element and that $\epsilon$ has at least one antecedent.
In the example of Figure 1, elements 1, 2, 3 of $\mathcal{X}$ are respectively mapped onto elements of $\mathcal{Y}\cup\{\epsilon\}$, where the last mapping corresponds to a deletion of 3 (which is mapped onto $\{\epsilon\}$). Consequently, $\epsilon$ has two antecedents: the deleted element 3 and $\epsilon$ itself.
Definition 2 (ε-row/column stochastic matrix).
A non-negative $(n+1)\times(m+1)$ matrix $A$ is called an ε-row stochastic matrix iff:
$$\forall i \in \{1,\dots,n\}:\ \sum_{j=1}^{m+1} a_{i,j} = 1.$$
$A$ is called an ε-column stochastic matrix iff:
$$\forall j \in \{1,\dots,m\}:\ \sum_{i=1}^{n+1} a_{i,j} = 1.$$
Definition 3 (ε-bistochastic matrix).
A non-negative $(n+1)\times(m+1)$ matrix $A$ is called an ε-bistochastic matrix iff it is both ε-row and ε-column stochastic:
$$\forall i \in \{1,\dots,n\}:\ \sum_{j=1}^{m+1} a_{i,j} = 1 \quad\text{and}\quad \forall j \in \{1,\dots,m\}:\ \sum_{i=1}^{n+1} a_{i,j} = 1.$$
If $A \in \{0,1\}^{(n+1)\times(m+1)}$, $A$ is called an ε-assignment matrix and there is a one-to-one mapping between ε-assignments and ε-assignment matrices.
Let us note that any ε-bistochastic matrix is a bistochastic matrix on which the bistochastic constraints are relaxed on the last line and the last column. So any square bistochastic matrix is also an ε-bistochastic matrix (the reverse being obviously false).
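The relaxed constraints can be checked numerically. The sketch below follows the definitions above (first $n$ row sums and first $m$ column sums equal to $1$, last row and column free); the function name is illustrative.

```python
import numpy as np

def is_eps_bistochastic(B, atol=1e-8):
    """Check the epsilon-bistochastic constraints on a non-negative
    (n+1) x (m+1) matrix: the first n row sums and the first m column
    sums equal 1; the last row and column are unconstrained."""
    if (B < 0).any():
        return False
    return (np.allclose(B[:-1, :].sum(axis=1), 1.0, atol=atol)
            and np.allclose(B[:, :-1].sum(axis=0), 1.0, atol=atol))

# A 0/1 epsilon-bistochastic matrix is an epsilon-assignment matrix:
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],   # second element deleted
              [1.0, 0.0, 1.0]])  # first column inserted; corner is 1
```

As noted above, any square bistochastic matrix (e.g. the identity) also passes this test, while the converse fails in general.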
Definition 4 (ε-diagonal).
If $A$ is an $(n+1)\times(m+1)$ matrix and $\varphi$ an ε-assignment, then the set
$$\{a_{i,j} : i \in \{1,\dots,n\}\cup\{\epsilon\},\ j \in \varphi(i)\}$$
(with $\epsilon$ identified with the indices $n+1$ and $m+1$) is called an ε-diagonal
of $A$ corresponding to $\varphi$. If $A$ is square and
$\varphi$ is the identity, the ε-diagonal is called the main diagonal.
Note that $(a_{i,\varphi(i)})_{1 \le i \le n}$ is a sequence (as $\varphi(i)$ is unique for $i \in \{1,\dots,n\}$) while the entries selected by $\varphi(\epsilon)$ form a set. The above definition is a straightforward extension of the usual notion of diagonal, where $\varphi$ is required to be a permutation. In the following we will only consider ε-diagonals of $(n+1)\times(m+1)$ matrices, which will be simply called diagonals.
Definition 5 (total support).
If $A$ is a non-negative matrix, $A$ is said to have total support if $A \neq 0$ and if every positive element of $A$ lies on a positive diagonal. A non-negative matrix that contains a positive diagonal is said to have support.
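For tiny matrices, support and total support can be checked by brute force over all ε-assignments. The sketch below follows the definitions above (a positive diagonal is one whose selected entries, including the insertion/deletion entries of the last row and column, are all positive); names and the exhaustive search are illustrative.

```python
import itertools
import numpy as np

def eps_assignments(n, m):
    """Yield all epsilon-assignment matrices of size (n+1) x (m+1)
    (brute force, tiny sizes only)."""
    for choice in itertools.product(range(m + 1), repeat=n):
        real = [j for j in choice if j < m]
        if len(real) != len(set(real)):
            continue  # a real column used twice
        X = np.zeros((n + 1, m + 1))
        for i, j in enumerate(choice):
            X[i, j] = 1
        for j in range(m):
            if j not in real:
                X[n, j] = 1  # insertion
        X[n, m] = 1          # epsilon mapped onto epsilon
        yield X

def has_total_support(A):
    """A != 0 and every positive entry of A lies on a positive diagonal,
    i.e. on the support of an epsilon-assignment matrix X such that
    every entry of A selected by X is positive."""
    A = np.asarray(A, dtype=float)
    if not A.any():
        return False
    n, m = A.shape[0] - 1, A.shape[1] - 1
    covered = np.zeros_like(A, dtype=bool)
    for X in eps_assignments(n, m):
        if (A[X == 1] > 0).all():   # positive diagonal
            covered |= (X == 1)
    return bool(((A > 0) <= covered).all())
```

For example, a strictly positive matrix has total support, whereas a matrix with a positive entry lying on no positive diagonal only has support (or neither).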
If $X$ and $Z$ define sets of indices respectively contained in $\{1,\dots,n+1\}$ and $\{1,\dots,m+1\}$, then:
- $A[X,Z]$ denotes the sub-matrix of $A$ restricted to the indices contained in $X$ and $Z$,
- $A[\bar{X},Z]$ denotes the sub-matrix of $A$ restricted to the indices not contained in $X$, i.e. $\bar{X} = \{1,\dots,n+1\}\setminus X$, and to the indices contained in $Z$,
- $A[X,\bar{Z}]$ denotes the sub-matrix of $A$ restricted to the indices contained in $X$ and not contained in $Z$, i.e. $\bar{Z} = \{1,\dots,m+1\}\setminus Z$,
- $A[\bar{X},\bar{Z}]$ denotes the sub-matrix of $A$ restricted to the indices contained neither in $X$ nor in $Z$.
Definition 6 (secable rectangular matrix).
A rectangular non-negative matrix $A$ is said to be secable if one can find:
- a partition of its row indices into two non-empty sets $X$ and $Y$, and
- a partition of its column indices into two non-empty sets $Z$ and $T$,
such that, up to a permutation of its rows and columns:
$$A = \begin{pmatrix} A[X,Z] & 0 \\ 0 & A[Y,T] \end{pmatrix}.$$
Let us note that this notion of secable matrix is quite close to that of a block diagonal matrix. However, $A$ is not required to be square.
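On tiny matrices, secability can be tested by exhaustively trying all partitions; a minimal sketch (the function name and the brute-force search are illustrative, not the report's method):

```python
import itertools
import numpy as np

def is_secable(A):
    """Brute-force secability test for a tiny non-negative matrix A:
    look for partitions of the rows into (X, Y) and of the columns into
    (Z, T), all four parts non-empty, such that A[X, T] == 0 and
    A[Y, Z] == 0."""
    A = np.asarray(A)
    rows, cols = range(A.shape[0]), range(A.shape[1])
    for k in range(1, A.shape[0]):
        for X in itertools.combinations(rows, k):
            Y = [i for i in rows if i not in X]
            for l in range(1, A.shape[1]):
                for Z in itertools.combinations(cols, l):
                    T = [j for j in cols if j not in Z]
                    if ((A[np.ix_(X, T)] == 0).all()
                            and (A[np.ix_(list(Y), list(Z))] == 0).all()):
                        return True
    return False
```

For instance, a block-diagonal matrix such as the identity is secable, while a strictly positive matrix is not.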
3 Existence and uniqueness
Theorem 3.1.
Let $A$ be a non-negative $(n+1)\times(m+1)$ matrix which does not contain any line or column filled with 0. A necessary and sufficient condition that there exists an ε-bistochastic matrix $B$ of the form $B = D_1 A D_2$, where $D_1$ and $D_2$ are diagonal matrices with positive main diagonals and a last entry equal to $1$, is that $A$ has total support. If $B$ exists then it is unique. Moreover, $D_1$ and $D_2$ are unique if and only if $A$ is non-secable.
Proof.
Let us suppose that and are bistochastic matrices where , and . If and :
(1)  
(2)  
(3)  
(4) 
Let and put
Let us note that since we have , and . Moreover, we have by hypothesis and for all .
Let us first show that .
In this case we consider the alternative definitions for and :
Since for all and for all and are non empty.
Using initial definitions for and , let us thus consider and . Then using (3):
where the last equality comes from (1). Similarly, using (4):
where the last equality comes from (2). Whence . But in this case, we have using (3):
This last equality is compatible with (1) only if for all . Dropping sub indices, we have for all and all . Thus
Hence if and . So . More concisely, we have: .
In the same way, implies using (4):
which is compatible with (2) only if for all . Thus for all and for all we have . Thus
On we have . Thus:
Hence
Moreover, for any we have:
In the same way, we have for any . But in this case using , we have for :
Thus which induces which imposes . Indeed, since , we have and .
In the same way for :
and we have in the same way for , . We then obtain for :
Using the previous equality , hence and (since .
Thus is an bistochastic matrix (where and plays the role of the last row and column respectively).
Let us briefly show that and are simultaneously empty or non-empty. Let us first suppose that and let us consider . Since by hypothesis, there exists such that . But since , we have , and thus a contradiction. In the same way, if , let us consider . Since , there exists such that . Again a contradiction.
If is non-secable, the configuration where both and are non-empty corresponds to a partition of into and its complement and a partition of into and its complement, with no connection between and the complement of , nor any connection between and the complement of (Figure 2). Such a decomposition being excluded, we have and . Hence and , and and are unique ().
If the non-secability of does not hold and and exist, then and exist, include the row and the column , are bistochastic matrices and have a size lower than the one of . Furthermore, and , where and have, like the matrices from which they are derived, a positive main diagonal with a $1$ at the last position. The argument may be repeated on these sub-matrices until is established. Given that , we already know that and that , and hence and are zero elsewhere (Figure 2). Hence is equal to . Note however that, since (and the same for ), and are no longer unique.
∎
4 A constructive algorithm
For any and any , let us consider the sequences and defined as follows:
Moreover we also define:
with for all :
Let us denote by the matrix whose entries in are equal to and whose last row is filled with zeros except for a $1$ at position .
In the same way, let us denote by the matrix whose entries in are equal to and whose last column is filled with zeros except for a $1$ at position .
is row stochastic. Indeed, for any and :
and the last line of contains a single entry equal to .
Moreover, is column stochastic for . Indeed for each :
Note that is not column stochastic. One noticeable effect of this negative property is that while .
Moreover the last column contains a single positive entry equal to . Hence is row stochastic and equation 5 involves two row stochastic matrices.
Combining both equations of 5 we have:
where the inverse notation applied to a vector denotes the elementwise inverse operation.
Since is row stochastic, we have:
Using we obtain:
where $\odot$ is the elementwise product, also known as the Hadamard product. Since, for , is row stochastic, we have and thus:
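The alternating normalization described in this section can be sketched as follows, with the last entries of both scaling vectors pinned to $1$ as in Theorem 3.1. This is a simplified numerical illustration, not the report's exact pseudo-code; names and the stopping rule (a fixed number of iterations) are illustrative.

```python
import numpy as np

def eps_sinkhorn(A, n_iter=2000):
    """Sketch of an epsilon-Sinkhorn iteration: find positive diagonal
    scalings r (size n+1) and c (size m+1), with r[-1] = c[-1] = 1
    fixed, such that diag(r) @ A @ diag(c) is approximately
    epsilon-bistochastic (first n row sums and first m column sums
    equal to 1). Assumes A is non-negative with total support."""
    r = np.ones(A.shape[0])
    c = np.ones(A.shape[1])
    for _ in range(n_iter):
        r[:-1] = 1.0 / (A @ c)[:-1]    # normalize the first n rows
        c[:-1] = 1.0 / (A.T @ r)[:-1]  # normalize the first m columns
    return r, c

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
r, c = eps_sinkhorn(A)
B = np.diag(r) @ A @ np.diag(c)
# the first n row sums and first m column sums of B are close to 1,
# while the last row and column remain unconstrained
```

As in the classical case, every step is a matrix-vector product followed by an elementwise inverse, so the whole iteration remains differentiable and can be inserted into a deep learning pipeline.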