1 Introduction
2 Problem Formulation
We begin by formalizing the problem of matching two sets of features $X = \{X_1, \ldots, X_n\}$ and $X^\# = \{X^\#_1, \ldots, X^\#_m\}$ with different sizes $n$ and $m$ such that $n \le m$. In what follows we assume that the observed features are randomly generated from the model
$$X_i = \theta_i + \sigma_i \xi_i, \qquad X^\#_j = \theta^\#_j + \sigma^\#_j \xi^\#_j, \qquad i \in [n],\ j \in [m], \tag{1}$$
where
- $\theta_1, \ldots, \theta_n$ and $\theta^\#_1, \ldots, \theta^\#_m$ are two collections of vectors from $\mathbb{R}^d$, corresponding to the original features, which are unavailable,
- $\sigma_1, \ldots, \sigma_n$ and $\sigma^\#_1, \ldots, \sigma^\#_m$ are positive real numbers corresponding to the levels of noise contaminating each feature,
- $\xi_1, \ldots, \xi_n$ and $\xi^\#_1, \ldots, \xi^\#_m$ are two independent sets of i.i.d. random vectors drawn from the Gaussian distribution $\mathcal N_d(0, I_d)$ with zero mean and identity covariance matrix.
In the most generic setting both $X$ and $X^\#$ may contain outliers, i.e., features that do not have a corresponding pair in the other set. In other words, the task is to find two subsets $S \subseteq [n]$ and $S^\# \subseteq [m]$ of common cardinality $k$ such that $S^\# = \pi(S)$ and $\theta_i = \theta^\#_{\pi(i)}$ for every $i \in S$, where the parameter $\pi$ gives a correspondence between these sets. In this case there are $(n - k) + (m - k)$ outliers and the value of $k$ is unknown. Thus, for simplicity, we only consider the case where $k = n$, hence only the larger set of features, namely $X^\#$, contains outliers.
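To fix ideas, the following minimal sketch simulates the observation model (1) with $m - n$ outliers in $X^\#$; the function name, the noise range, and the Gaussian choice for the unavailable features are illustrative assumptions, not part of the model.

```python
import numpy as np

def simulate(n, m, d, rng=None):
    """Draw (X, X_sharp, sigma, sigma_sharp, pi_star) from model (1), n <= m."""
    rng = np.random.default_rng(rng)
    theta_sharp = rng.normal(size=(m, d))        # original features (unavailable)
    sigma_sharp = rng.uniform(0.1, 1.0, size=m)  # heteroscedastic noise levels
    pi_star = rng.permutation(m)[:n]             # injection pi*: [n] -> [m]
    theta, sigma = theta_sharp[pi_star], sigma_sharp[pi_star]
    X = theta + sigma[:, None] * rng.normal(size=(n, d))
    X_sharp = theta_sharp + sigma_sharp[:, None] * rng.normal(size=(m, d))
    return X, X_sharp, sigma, sigma_sharp, pi_star
```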
To formalize the task of feature matching with outliers, we aim to find an injection $\pi : [n] \to [m]$ such that
$$X_i \asymp X^\#_{\pi(i)} \qquad \text{for every } i \in [n], \tag{2}$$
where $\asymp$ is an equivalence relation that we call the matching criterion. We assume that there exists an injection $\pi^* : [n] \to [m]$ such that we have $\theta_i = \theta^\#_{\pi^*(i)}$ and $\sigma_i = \sigma^\#_{\pi^*(i)}$ for every $i \in [n]$.
In what follows we call a feature $X^\#_j$ an outlier if $j \in O^\#$, where $O^\# = [m] \setminus \pi^*([n])$, so that we have $|O^\#| = m - n$. Let us also define $I^\# = \pi^*([n])$, for which we have $|I^\#| = n$. Consequently, in addition to the features $X_1, \ldots, X_n$, the rest of the features from $X^\#$, namely those indexed by $I^\#$, are called inliers.
In this formulation, the data generating distribution is defined by the (unknown) parameters^1 $\theta^\#$, $\sigma^\#$ and $\pi^*$. In the problem of matching, we focus our attention on the problem of estimating the parameter $\pi^*$ only, considering $\theta^\#$ and $\sigma^\#$ as nuisance parameters. In what follows, we denote by $\mathbf P_{\theta^\#,\sigma^\#,\pi^*}$ the probability distribution of the vector $(X_1, \ldots, X_n, X^\#_1, \ldots, X^\#_m)$ defined by (1) under the condition (2). We write $\mathbf E_{\theta^\#,\sigma^\#,\pi^*}$ for the expectation with respect to $\mathbf P_{\theta^\#,\sigma^\#,\pi^*}$. We use the binary loss function, which equals $1$ if $\hat\pi \ne \pi^*$ and $0$ otherwise, i.e., the 0–1 distance between $\hat\pi$ and $\pi^*$, given by
$$\ell(\hat\pi, \pi^*) = \mathbf 1(\hat\pi \ne \pi^*). \tag{3}$$
^1 We omit the parameters $\theta$ and $\sigma$, since they are automatically contained in $\theta^\#$ and $\sigma^\#$, respectively, due to the fact that $\pi^*$ is included in the set of parameters.
Our ultimate goal is to design estimators that have an expected error smaller than a prescribed level $\delta$ under the weakest possible conditions on the nuisance parameters $\theta^\#$ and the noise levels $\sigma^\#$. The problem of matching becomes more difficult when the features are hardly distinguishable. To quantify this phenomenon, we introduce the relative separation distance $\bar\kappa$ and the relative outlier separation distance $\bar\kappa_{\rm out}$, which measure the minimal distance-to-noise ratio between inlier features and the minimal distance-to-noise ratio between inlier and outlier features, respectively. Denoting by $I^\#$ and $O^\#$ the index sets of inliers and outliers defined above, the precise definitions read as
$$\bar\kappa(\theta^\#,\sigma^\#) = \min_{\substack{i, j \in I^\# \\ i \ne j}} \frac{\|\theta^\#_i - \theta^\#_j\|_2}{\big((\sigma^\#_i)^2 + (\sigma^\#_j)^2\big)^{1/2}}, \qquad \bar\kappa_{\rm out}(\theta^\#,\sigma^\#) = \min_{\substack{i \in I^\# \\ j \in O^\#}} \frac{\|\theta^\#_i - \theta^\#_j\|_2}{\big((\sigma^\#_i)^2 + (\sigma^\#_j)^2\big)^{1/2}}. \tag{4}$$
Notice that $\bar\kappa$ can be rewritten as
$$\bar\kappa(\theta^\#,\sigma^\#) = \min_{\substack{i, j \in [n] \\ i \ne j}} \frac{\|\theta_i - \theta_j\|_2}{\big(\sigma_i^2 + \sigma_j^2\big)^{1/2}},$$
and the dependence of $\bar\kappa$ on $\theta^\#$ and $\sigma^\#$ is redundant: it depends on them only through $(\theta, \sigma)$. Hence, we will refer to $\bar\kappa(\theta^\#,\sigma^\#)$ as $\bar\kappa(\theta,\sigma)$ to ease the notation.
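The quantities in (4) are straightforward to compute from the latent parameters, which is convenient for sanity-checking simulations. A minimal sketch, reusing the conventions of the `simulate` sketch above:

```python
import numpy as np

def separation_distances(theta_sharp, sigma_sharp, pi_star):
    """Return (kappa_bar, kappa_bar_out) as defined in (4)."""
    m = len(theta_sharp)
    inliers = np.asarray(pi_star)                   # I^# = pi*([n])
    outliers = np.setdiff1d(np.arange(m), inliers)  # O^# = [m] \ I^#
    def ratio(i, j):
        dist = np.linalg.norm(theta_sharp[i] - theta_sharp[j])
        return dist / np.sqrt(sigma_sharp[i]**2 + sigma_sharp[j]**2)
    kappa = min((ratio(i, j) for i in inliers for j in inliers if i != j),
                default=np.inf)
    kappa_out = min((ratio(i, j) for i in inliers for j in outliers),
                    default=np.inf)                 # inf if there are no outliers
    return kappa, kappa_out
```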
Clearly, if $\bar\kappa = 0$ (or, respectively, $\bar\kappa_{\rm out} = 0$) and the $\sigma_i$'s are all equal, then the parameter $\pi^*$ is non-identifiable, in the sense that there exist two different injections $\pi_1$ and $\pi_2$ such that the distributions $\mathbf P_{\theta^\#,\sigma^\#,\pi_1}$ and $\mathbf P_{\theta^\#,\sigma^\#,\pi_2}$ coincide. Therefore, the conditions $\bar\kappa > 0$ and $\bar\kappa_{\rm out} > 0$ are necessary for the existence of consistent estimators of $\pi^*$. Furthermore, good estimators are those that consistently estimate $\pi^*$ even when $\bar\kappa$ and $\bar\kappa_{\rm out}$ are small.
In the setting of permutation estimation, i.e., when there are no outliers in either $X$ or $X^\#$ (so that $n = m$), (collier2016minimax) established the optimal minimax rate for the relative separation distance $\bar\kappa$. The minimax rate is defined through the smallest separation threshold
$$\bar\kappa^*(n, d, \delta) = \inf\Big\{\lambda > 0 \,:\, \inf_{\hat\pi}\ \sup_{(\theta,\sigma,\pi^*)\,:\,\bar\kappa(\theta,\sigma)\ge\lambda} \mathbf P_{\theta,\sigma,\pi^*}\big(\hat\pi \ne \pi^*\big) \le \delta\Big\},$$
where $\pi^*$ ranges over $\mathfrak S_n$, the symmetric group over the set $[n]$. This rate is proved to be proportional to $(d\log n)^{1/4} \vee (\log n)^{1/2}$. Since this rate is optimal, we proceed analogously for the outlier separation: we seek a threshold $\lambda$ such that, for every parameter configuration with $\bar\kappa_{\rm out} \ge \lambda$ and a given estimator $\hat\pi$, we have $\mathbf P_{\theta^\#,\sigma^\#,\pi^*}(\hat\pi \ne \pi^*) \le \delta$.
The minimax outlier separation distance is the smallest possible outlier separation distance achieved by an estimator $\hat\pi$, i.e.,
$$\lambda^*(n, m, d, \delta) = \inf_{\hat\pi}\ \inf\Big\{\lambda > 0 \,:\, \sup_{(\theta^\#,\sigma^\#,\pi^*)\,:\,\bar\kappa_{\rm out}\ge\lambda} \mathbf P_{\theta^\#,\sigma^\#,\pi^*}\big(\hat\pi \ne \pi^*\big) \le \delta\Big\}, \tag{5}$$
where the infimum is taken over all possible estimators $\hat\pi$ of $\pi^*$.
In what follows we devote our main attention to the heteroscedastic noise scenario, where the $\sigma_i$'s can be different.

3 Main results
The general approach here is based on maximum likelihood estimation. Writing the log-likelihood for the vector $(X_1, \ldots, X_n, X^\#_1, \ldots, X^\#_m)$ and maximizing it with respect to the nuisance parameter $\theta^\#$ yields, up to additive terms not depending on $\pi$, the profile log-likelihood
$$\ell(\pi) = -\sum_{i=1}^n \frac{\|X_i - X^\#_{\pi(i)}\|_2^2}{2\big(\sigma_i^2 + (\sigma^\#_{\pi(i)})^2\big)}.$$
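To make the profiling step explicit, here is the one-pair computation behind the display above (a sketch; it assumes the matched pair shares its mean, as guaranteed by (2)). For a pair $(X_i, X^\#_{\pi(i)})$ with common mean $\theta$, maximizing
$$-\frac{\|X_i - \theta\|_2^2}{2\sigma_i^2} - \frac{\|X^\#_{\pi(i)} - \theta\|_2^2}{2(\sigma^\#_{\pi(i)})^2}$$
over $\theta$ gives the weighted average $\hat\theta = \frac{(\sigma^\#_{\pi(i)})^2 X_i + \sigma_i^2 X^\#_{\pi(i)}}{\sigma_i^2 + (\sigma^\#_{\pi(i)})^2}$, and substituting $\hat\theta$ back leaves exactly $-\frac{\|X_i - X^\#_{\pi(i)}\|_2^2}{2(\sigma_i^2 + (\sigma^\#_{\pi(i)})^2)}$, i.e., the $i$-th summand of the criterion minimized in (6) below.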
3.1 Upper bound for $\lambda^*$

In this section we obtain an upper bound for the minimax outlier separation distance $\lambda^*(n, m, d, \delta)$ defined in (5). Further, we consider first the case of known noise variances, i.e., we assume that all $\sigma_i$'s and $\sigma^\#_j$'s are known.

3.1.1 Known variance
In the case of known $\sigma_i$'s and $\sigma^\#_j$'s, the least sum of normalized squares (LSNS) estimator of $\pi^*$ reads as
$$\hat\pi_{\rm LSNS} \in \arg\min_{\pi}\ \sum_{i=1}^n \frac{\|X_i - X^\#_{\pi(i)}\|_2^2}{\sigma_i^2 + (\sigma^\#_{\pi(i)})^2}, \tag{6}$$
where the minimum is taken over all injective functions $\pi$ mapping from $[n]$ to $[m]$.
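Since the minimization in (6) runs over injections, it is a rectangular linear assignment problem with cost matrix $C_{ij} = \|X_i - X^\#_j\|_2^2 / (\sigma_i^2 + (\sigma^\#_j)^2)$, solvable exactly by the Hungarian method (the SciPy solver mentioned in Section 4). A minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsns(X, X_sharp, sigma, sigma_sharp):
    """LSNS estimator (6): minimize the sum of normalized squared distances."""
    # C[i, j] = ||X_i - X^#_j||^2 / (sigma_i^2 + sigma^#_j^2)
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    C = sq_dist / (sigma[:, None] ** 2 + sigma_sharp[None, :] ** 2)
    _, cols = linear_sum_assignment(C)  # handles the rectangular (n <= m) case
    return cols                         # cols[i] = hat_pi(i)
```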
The next theorem states that if both $\bar\kappa$ and $\bar\kappa_{\rm out}$ are large enough, then the probability of finding the correct matching using the LSNS method is at least $1 - \delta$.

Theorem 1 (Upper bound for LSNS)
Let $\delta \in (0, 1)$ and condition (2) be fulfilled. If
$$\bar\kappa^2 \wedge \bar\kappa_{\rm out}^2 \ \ge\ c\Big(\sqrt{d\,\log(4nm/\delta)} + \log(4nm/\delta)\Big)$$
for a large enough universal constant $c > 0$, then
$$\mathbf P_{\theta^\#,\sigma^\#,\pi^*}\big(\hat\pi_{\rm LSNS} \ne \pi^*\big) \le \delta. \tag{7}$$
We prove the upper bound for $\hat\pi_{\rm LSNS}$ in the presence of outliers. Without loss of generality we can assume that $\pi^*(i) = i$ for every $i \in [n]$. We wish to bound the probability of the event $\mathcal E = \{\hat\pi \ne \pi^*\}$, where $\hat\pi = \hat\pi_{\rm LSNS}$. It is evident that
$$\mathcal E \subseteq \bigcup_{\pi \ne \pi^*} \mathcal E_\pi, \qquad \text{where} \qquad \mathcal E_\pi = \bigg\{\sum_{i=1}^n \frac{\|X_i - X^\#_{\pi(i)}\|_2^2}{\sigma_{i\pi(i)}^2} \le \sum_{i=1}^n \frac{\|X_i - X^\#_i\|_2^2}{\sigma_{ii}^2}\bigg\}$$
and the union is over all injections $\pi : [n] \to [m]$ different from $\pi^*$. Denote $\sigma_{ij}^2 = \sigma_i^2 + (\sigma^\#_j)^2$ and
$$\zeta_{ij} = \frac{\sigma_i \xi_i - \sigma^\#_j \xi^\#_j}{\sigma_{ij}}, \qquad \kappa_{ij} = \frac{\|\theta_i - \theta^\#_j\|_2}{\sigma_{ij}}, \qquad i \in [n],\ j \in [m]. \tag{8}$$
Since $\theta_i = \theta^\#_i$ for every $i \in [n]$, then
$$\frac{\|X_i - X^\#_i\|_2^2}{\sigma_{ii}^2} = \|\zeta_{ii}\|_2^2. \tag{9}$$
Similarly, for every $i \in [n]$ and $j \ne i$, writing $u_{ij} = (\theta_i - \theta^\#_j)/\|\theta_i - \theta^\#_j\|_2$,
$$\frac{\|X_i - X^\#_j\|_2^2}{\sigma_{ij}^2} = \big\|\kappa_{ij} u_{ij} + \zeta_{ij}\big\|_2^2 = \kappa_{ij}^2 + 2\kappa_{ij}\, u_{ij}^\top \zeta_{ij} + \|\zeta_{ij}\|_2^2.$$
On the event $\mathcal E_\pi$, from the previous inequality we have
$$\sum_{i : \pi(i) \ne i} \Big(\kappa_{i\pi(i)}^2 + 2\kappa_{i\pi(i)}\, u_{i\pi(i)}^\top \zeta_{i\pi(i)} + \|\zeta_{i\pi(i)}\|_2^2\Big) \le \sum_{i : \pi(i) \ne i} \|\zeta_{ii}\|_2^2, \tag{10}$$
where we used (9) and the fact that the summands with $\pi(i) = i$ are the same on both sides. Hence, combining the obtained bounds (9) and (10) we get that
$$\sum_{i : \pi(i) \ne i} \kappa_{i\pi(i)}^2 \le \sum_{i : \pi(i) \ne i} \Big(2\kappa_{i\pi(i)} \big|u_{i\pi(i)}^\top \zeta_{i\pi(i)}\big| + \|\zeta_{ii}\|_2^2 - \|\zeta_{i\pi(i)}\|_2^2\Big). \tag{11}$$
Using (11), at least one summand on the right-hand side must be at least as large as the corresponding $\kappa_{i\pi(i)}^2$; taking the union over all injections $\pi$, we can show that
$$\mathbf P(\mathcal E) \le \mathbf P\Big(\exists\, i \in [n],\, j \in [m] \setminus \{i\} :\ \kappa_{ij}^2 \le 2\kappa_{ij} \big|u_{ij}^\top \zeta_{ij}\big| + \big|\|\zeta_{ii}\|_2^2 - d\big| + \big|\|\zeta_{ij}\|_2^2 - d\big|\Big). \tag{12}$$
For suitably chosen standard Gaussian random variables $W_{ij}$ it holds that $u_{ij}^\top \zeta_{ij} = W_{ij}$. Therefore, using the tail bound for the standard Gaussian distribution and the union bound, we get
$$\mathbf P\Big(\max_{i,j} \big|u_{ij}^\top \zeta_{ij}\big| \ge \sqrt{2\log(4nm/\delta)}\Big) \le \frac{\delta}{2}. \tag{13}$$
To bound the random variables $\|\zeta_{ij}\|_2^2$, we use the following result [LaurentMassart2000, Eq. (4.3) and (4.4)]: if $Y$ is drawn from the chi-squared distribution $\chi^2_d$, where $d \ge 1$, then, for every $x > 0$,
$$\mathbf P\big(Y - d \ge 2\sqrt{dx} + 2x\big) \le e^{-x} \qquad \text{and} \qquad \mathbf P\big(d - Y \ge 2\sqrt{dx}\big) \le e^{-x}.$$
As a consequence, $\mathbf P\big(|Y - d| \ge 2\sqrt{dx} + 2x\big) \le 2e^{-x}$, for every $x > 0$. This inequality, combined with the union bound, yields
$$\mathbf P\Big(\max_{i,j} \big|\|\zeta_{ij}\|_2^2 - d\big| \ge 2\sqrt{d\log(4nm/\delta)} + 2\log(4nm/\delta)\Big) \le \frac{\delta}{2}. \tag{14}$$
Using inequalities (12)–(14), we get that with probability at least $1 - \delta$ every pair $i \ne j$ satisfies
$$2\kappa_{ij} \big|u_{ij}^\top \zeta_{ij}\big| + \big|\|\zeta_{ii}\|_2^2 - d\big| + \big|\|\zeta_{ij}\|_2^2 - d\big| \le 2\kappa_{ij}\sqrt{2\log(4nm/\delta)} + 4\sqrt{d\log(4nm/\delta)} + 4\log(4nm/\delta) < \kappa_{ij}^2,$$
where the last inequality uses $\kappa_{ij} \ge \bar\kappa \wedge \bar\kappa_{\rm out}$ together with the condition of the theorem, which implies
$$\mathbf P_{\theta^\#,\sigma^\#,\pi^*}\big(\hat\pi_{\rm LSNS} \ne \pi^*\big) \le \delta.$$
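As a quick numerical sanity check of the Laurent–Massart bound invoked for (14), one can compare the empirical chi-squared tail with $e^{-x}$; the sample size and the value of $x$ below are arbitrary illustrative choices.

```python
import numpy as np

# Check P(chi2_d >= d + 2*sqrt(d*x) + 2*x) <= exp(-x) by Monte Carlo.
rng = np.random.default_rng(0)
d, x, n_mc = 50, 3.0, 200_000
y = rng.chisquare(d, size=n_mc)
empirical = np.mean(y >= d + 2 * np.sqrt(d * x) + 2 * x)
print(f"empirical tail {empirical:.5f} <= bound {np.exp(-x):.5f}")
```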
3.1.2 Unknown variance
In the setup where the observation variances are unknown, the minimization problem displayed in (6) can be further minimized with respect to the noise levels $\sigma$ and $\sigma^\#$, taking into account the constraint (2). This readily yields the least sum of logarithms (LSL) estimator
$$\hat\pi_{\rm LSL} \in \arg\min_{\pi}\ \sum_{i=1}^n \log \|X_i - X^\#_{\pi(i)}\|_2^2, \tag{15}$$
where, as before, the minimum is taken over all injections $\pi : [n] \to [m]$.
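The criterion (15) fits the same assignment framework with cost $C_{ij} = \log\|X_i - X^\#_j\|_2^2$, so no variance estimates are needed. A minimal sketch (the small `eps` guarding $\log 0$ is an implementation detail, not part of the estimator):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsl(X, X_sharp, eps=1e-12):
    """LSL estimator (15): minimize the sum of log squared distances."""
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    _, cols = linear_sum_assignment(np.log(sq_dist + eps))
    return cols
```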
Theorem 2 (Upper bound for LSL)
Let $\delta \in (0, 1)$, condition (2) be fulfilled, and $d \ge c_1 \log(4nm/\delta)$ for a suitable universal constant $c_1 > 0$. Then, for
$$\bar\kappa^2 \wedge \bar\kappa_{\rm out}^2 \ \ge\ c_2 \sqrt{d\,\log(4nm/\delta)}$$
with a large enough universal constant $c_2 > 0$, we have
$$\mathbf P_{\theta^\#,\sigma^\#,\pi^*}\big(\hat\pi_{\rm LSL} \ne \pi^*\big) \le \delta. \tag{16}$$
In the same spirit as in the proof of Theorem 1, we denote the event of wrong matching by $\mathcal E = \{\hat\pi_{\rm LSL} \ne \pi^*\}$, with $\pi^* = \mathrm{id}$ without loss of generality. Notice that
$$\mathcal E \subseteq \bigcup_{\pi \ne \pi^*} \mathcal E_\pi, \qquad \mathcal E_\pi = \big\{S(\pi) \le S(\pi^*)\big\}, \tag{17}$$
where
$$S(\pi) = \sum_{i=1}^n \log \|X_i - X^\#_{\pi(i)}\|_2^2, \tag{18}$$
$$S(\pi^*) = \sum_{i=1}^n \log \|X_i - X^\#_i\|_2^2. \tag{19}$$
For the same random variables $\zeta_{ij}$ and $\kappa_{ij}$ defined in (8), we can upper bound the term $S(\pi^*)$ and lower bound $S(\pi)$ as follows:
$$S(\pi^*) = \sum_{i=1}^n \log \sigma_{ii}^2 + \sum_{i=1}^n \log \|\zeta_{ii}\|_2^2$$
and
$$S(\pi) = \sum_{i=1}^n \log \sigma_{i\pi(i)}^2 + \sum_{i=1}^n \log \big\|\kappa_{i\pi(i)} u_{i\pi(i)} + \zeta_{i\pi(i)}\big\|_2^2.$$
Further, on the event $\mathcal E_\pi$, from the last display we get that
$$\sum_{i : \pi(i) \ne i} \log \frac{\big\|\kappa_{i\pi(i)} u_{i\pi(i)} + \zeta_{i\pi(i)}\big\|_2^2}{\|\zeta_{ii}\|_2^2} \le \sum_{i : \pi(i) \ne i} \log R_{i\pi(i)},$$
with $R_{ij}$ defined as $R_{ij} = \sigma_{ii}^2 / \sigma_{ij}^2$. Therefore, combining these bounds we get that there is at least one index $i$ with $j = \pi(i) \ne i$ such that
$$\log \frac{\big\|\kappa_{ij} u_{ij} + \zeta_{ij}\big\|_2^2}{\|\zeta_{ii}\|_2^2} \le \frac{1}{N_\pi} \sum_{i : \pi(i) \ne i} \log R_{i\pi(i)},$$
which implies
$$\kappa_{ij}^2 + 2\kappa_{ij}\, u_{ij}^\top \zeta_{ij} + \|\zeta_{ij}\|_2^2 \le \bar R_\pi\, \|\zeta_{ii}\|_2^2,$$
where $N_\pi = |\{i : \pi(i) \ne i\}|$ and $\bar R_\pi = \big(\prod_{i : \pi(i) \ne i} R_{i\pi(i)}\big)^{1/N_\pi}$. Notice that the term $\bar R_\pi$ is bounded; namely, it is easy to show that $R_{ij} \le 2$ for any combination of $i$ and $j$ involved in the definition of $\bar R_\pi$, since $\sigma_{ii}^2 = 2\sigma_i^2$ and $\sigma_{ij}^2 \ge \sigma_i^2$. Similarly, it can be verified that along every cycle of $\pi$ the AM–GM inequality $\sigma_i^2 + \sigma_j^2 \ge 2\sigma_i \sigma_j$ makes the corresponding product of the $R_{i\pi(i)}$'s at most one. Hence, combining both bounds obtained from the two previous displays with the concentration inequalities (13) and (14), and taking $d$ to be greater than $c_1 \log(4nm/\delta)$, we get that if $\bar\kappa^2 \wedge \bar\kappa_{\rm out}^2 \ge c_2 \sqrt{d\log(4nm/\delta)}$, then $\mathbf P(\hat\pi_{\rm LSL} \ne \pi^*) \le \delta$.
3.1.3 Outlier detection
In this section we show that the outlier detection accuracy depends only on the quantity $\bar\kappa_{\rm out}$. In Theorem 1 we showed that if both $\bar\kappa$ and $\bar\kappa_{\rm out}$ are large enough, then the LSNS procedure finds the correct matching with high probability. Assume now that for two distinct inliers $i \ne j \in [n]$ the probability of a mismatch between the two pairs $(X_i, X^\#_{\pi^*(i)})$ and $(X_j, X^\#_{\pi^*(j)})$ is larger than some universal constant $c_0 \in (0, 1/2)$. Considering the LSS (least sum of squares) procedure, i.e. the minimizer of $\sum_i \|X_i - X^\#_{\pi(i)}\|_2^2$, the last statement formally reads as
$$\mathbf P\Big(\|X_i - X^\#_{\pi^*(j)}\|_2^2 + \|X_j - X^\#_{\pi^*(i)}\|_2^2 \le \|X_i - X^\#_{\pi^*(i)}\|_2^2 + \|X_j - X^\#_{\pi^*(j)}\|_2^2\Big) \ge c_0. \tag{20}$$
Then, for a constant $c_1$ depending only on $c_0$,
$$\kappa_{ij} \le c_1\, d^{1/4}, \tag{21}$$
where $\kappa_{ij}$ is the relative separation of the two inliers, as in (8). In other words, mismatches among inliers occur with constant probability only when their relative separation is below the $d^{1/4}$ threshold, which does not affect the detection of outliers governed by $\bar\kappa_{\rm out}$.
First notice that the condition (20) can be rewritten as follows:
$$\mathbf P\Big(\|X_i - X^\#_{\pi^*(j)}\|_2^2 + \|X_j - X^\#_{\pi^*(i)}\|_2^2 - \|X_i - X^\#_{\pi^*(i)}\|_2^2 - \|X_j - X^\#_{\pi^*(j)}\|_2^2 \le 0\Big) \ge c_0. \tag{22}$$
Under the assumption that $\sigma_k = \sigma$ for all the involved indices $k$, expanding the squares shows that the inequality from (22) reads as
$$2\|\Delta\|_2^2 + 2\sigma\, \Delta^\top(\nu + \nu') + 2\sigma^2\, \nu^\top \nu' \le 0,$$
where $\Delta = \theta_i - \theta_j$, $\nu = \xi_i - \xi_j$ and $\nu' = \xi^\#_{\pi^*(i)} - \xi^\#_{\pi^*(j)}$. We introduce the normalized version of the left-hand side, denoted by $T$:
$$T = \kappa_{ij}^2 + \frac{\Delta^\top(\nu + \nu')}{2\sigma} + \frac{\nu^\top \nu'}{2}, \tag{23}$$
where $\kappa_{ij} = \|\Delta\|_2/(\sigma\sqrt 2)$. Due to the independence of $\nu$ and $\nu'$ we can decompose the sum in (23) in the following way:
$$T = \kappa_{ij}^2 + \frac{\Delta^\top \nu'}{2\sigma} + \Big(\frac{\Delta}{2\sigma} + \frac{\nu'}{2}\Big)^{\!\top} \nu, \tag{24}$$
where $\nu \sim \mathcal N_d(0, 2I_d)$ is independent of $\nu'$. The second term from the last display can be dealt with as follows: $\Delta^\top \nu' / (2\sigma) = \kappa_{ij} Z'$, where $Z'$ is standard normal, while the last term, conditionally on $\nu'$, is a zero-mean Gaussian with variance $2\|\Delta/(2\sigma) + \nu'/2\|_2^2$; the two terms are (conditionally) independent, which follows from the fact that $\nu$ and $\nu'$ are independent. Thus, on the event $\{T \le 0\}$ we have
$$\kappa_{ij}^2 \le -\kappa_{ij} Z' - \sqrt 2\, \Big\|\frac{\Delta}{2\sigma} + \frac{\nu'}{2}\Big\|_2 Z, \tag{25}$$
$$\mathbf P(T \le 0) \le \mathbf P\Big(\kappa_{ij} |Z'| + \sqrt 2\, \Big\|\frac{\Delta}{2\sigma} + \frac{\nu'}{2}\Big\|_2 |Z| \ge \kappa_{ij}^2\Big), \tag{26}$$
where $Z$ is a standard normal random variable. Since $\|\Delta/(2\sigma) + \nu'/2\|_2^2$ concentrates around $\kappa_{ij}^2/2 + d/2$, the probability in (26) is controlled by Gaussian tails at level $t$ of order $\kappa_{ij}^2/\sqrt{\kappa_{ij}^2 + d}$. From the last display we derive the upper bound for $\kappa_{ij}$, by using the lower bound for the complementary error function: for $Z$ standard normal and $t > 0$,
$$\mathbf P(Z \ge t) \ge \frac{t}{t^2 + 1} \cdot \frac{e^{-t^2/2}}{\sqrt{2\pi}}. \tag{27}$$
Taking $t$ of order $\kappa_{ij}^2/\sqrt{\kappa_{ij}^2 + d}$ and comparing the resulting two-sided Gaussian tail estimates with the lower bound $c_0$ in (22) yields $c_0 \le \exp\big(-c\, \kappa_{ij}^4/(\kappa_{ij}^2 + d)\big)$ up to constant factors, which forces $\kappa_{ij} \le c_1 d^{1/4}$ and proves (21).
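The mismatch probability in (20) is also easy to estimate by Monte Carlo, which gives a direct empirical view of the $d^{1/4}$ scaling in (21). A sketch for two homoscedastic inliers at relative separation $\kappa$ (all parameter choices illustrative):

```python
import numpy as np

def mismatch_probability(kappa, d, sigma=1.0, n_mc=100_000, seed=0):
    """Estimate the probability in (20) that swapping two inlier pairs lowers the LSS cost."""
    rng = np.random.default_rng(seed)
    theta_i = np.zeros(d)
    theta_j = np.zeros(d)
    theta_j[0] = kappa * np.sqrt(2) * sigma  # ||theta_i - theta_j|| / (sigma*sqrt(2)) = kappa
    X_i = theta_i + sigma * rng.normal(size=(n_mc, d))
    X_j = theta_j + sigma * rng.normal(size=(n_mc, d))
    Xs_i = theta_i + sigma * rng.normal(size=(n_mc, d))  # X^#_{pi*(i)}
    Xs_j = theta_j + sigma * rng.normal(size=(n_mc, d))  # X^#_{pi*(j)}
    swapped = ((X_i - Xs_j) ** 2).sum(1) + ((X_j - Xs_i) ** 2).sum(1)
    correct = ((X_i - Xs_i) ** 2).sum(1) + ((X_j - Xs_j) ** 2).sum(1)
    return np.mean(swapped <= correct)
```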
3.2 Lower bounds for $\lambda^*$
4 Numerical results
We performed several numerical experiments to corroborate our theoretical results. All experiments were implemented in Python (NumPy). For the linear sum assignment problem we used the generalized Hungarian algorithm implemented in the SciPy library. The experiments were carried out in the following manner. First we generate $\pi^*$ (the original matching) and $\theta^\#$ (note that $\theta$ can be derived using $\theta^\#$ and $\pi^*$). Each $\theta^\#_j$ is sampled from a Gaussian distribution with zero mean and a variance that is itself sampled from a uniform distribution on a fixed interval. Additionally, for every $j$ such that $j \notin \pi^*([n])$ (i.e., $X^\#_j$ is an outlier), we increment every coordinate of $\theta^\#_j$ by a fixed shift. The noise levels $\sigma$ (and likewise $\sigma^\#$) are sampled from a uniform distribution over a fixed interval. Afterwards we generate $X$ and $X^\#$ according to the procedure described in Section 2. We then solve the problem with four algorithms (Greedy, LSS, LSNS, LSL). The results are summed up in Figure 1. The values of the remaining parameters were kept fixed across all experiments.
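For completeness, here is a minimal driver in the spirit of the experiment described above. It assumes the `simulate`, `lsns`, and `lsl` sketches from the previous sections; the greedy baseline shown here (matching each $X_i$ to its nearest still-unused feature) is our reading of the Greedy method, which the text does not define.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def greedy(X, X_sharp):
    """Greedy baseline: match each X_i to the nearest still-unused X^#_j."""
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    match, used = np.empty(len(X), dtype=int), set()
    for i in range(len(X)):
        j = next(j for j in np.argsort(sq_dist[i]) if j not in used)
        match[i] = j
        used.add(j)
    return match

def lss(X, X_sharp):
    """LSS: least sum of (unnormalized) squared distances."""
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    return linear_sum_assignment(sq_dist)[1]

def accuracy(estimator, trials=50, n=20, m=30, d=16):
    """Fraction of trials where the whole matching is recovered exactly."""
    hits = 0
    for t in range(trials):
        X, X_sharp, s, s_sharp, pi_star = simulate(n, m, d, rng=t)
        args = (X, X_sharp, s, s_sharp) if estimator is lsns else (X, X_sharp)
        hits += np.array_equal(estimator(*args), pi_star)
    return hits / trials
```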
Acknowledgments
This work was partially supported by the grants Investissements d'Avenir (ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047) and CALLISTO. The authors thank the reviewers for many valuable suggestions.