# Optimal detection of the feature matching map in presence of noise and outliers

We consider the problem of finding the matching map between two sets of $d$-dimensional vectors from noisy observations, where the second set contains outliers. The matching map is then an injection, which can be consistently estimated only if the vectors of the second set are well separated. The main result shows that, in the high-dimensional setting, a detection region of the unknown injection can be characterized by the sets of vectors for which the inlier-inlier distance is of order at least $d^{1/4}$ and the inlier-outlier distance is of order at least $d^{1/2}$. These rates are achieved using the estimated matching minimizing the sum of logarithms of distances between matched pairs of points. We also prove lower bounds establishing optimality of these rates. Finally, we report results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the estimators studied in this work.

## 2 Problem Formulation

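We begin by formalizing the problem of matching two sets of features $X = (X_1,\dots,X_n)$ and $X^{\#} = (X^{\#}_1,\dots,X^{\#}_m)$ with different sizes $n$ and $m$ such that $n \le m$. In what follows we assume that the observed features are randomly generated from the model

$$
\begin{cases}
X_i = \theta_i + \sigma_i \xi_i, \\
X^{\#}_j = \theta^{\#}_j + \sigma^{\#}_j \xi^{\#}_j,
\end{cases}
\qquad i = 1,\dots,n \ \text{ and } \ j = 1,\dots,m, \tag{1}
$$

where

• $\theta = (\theta_1,\dots,\theta_n)$ and $\theta^{\#} = (\theta^{\#}_1,\dots,\theta^{\#}_m)$ are two collections of vectors from $\mathbb{R}^d$, corresponding to the original features, which are unavailable,

• $\sigma = (\sigma_1,\dots,\sigma_n)$ and $\sigma^{\#} = (\sigma^{\#}_1,\dots,\sigma^{\#}_m)$ are positive real numbers corresponding to the levels of noise contaminating each feature,

• $\xi_1,\dots,\xi_n$ and $\xi^{\#}_1,\dots,\xi^{\#}_m$ are two independent sets of i.i.d. random vectors drawn from the Gaussian distribution with zero mean and identity covariance matrix.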

In the most generic setting both $X$ and $X^{\#}$ may contain outliers, i.e. features that do not have a corresponding pair. In other words, the task is to find two sets of features and of cardinality such that and , where the parameter gives a correspondence between these sets. In this case there are outliers and the value of is unknown. Thus, for simplicity we only consider the case where , hence only the larger set of features, namely $X^{\#}$, contains outliers.

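To formalize the task of feature matching with outliers we aim to find an injection $\pi^* : \{1,\dots,n\} \to \{1,\dots,m\}$ such that

$$
\forall\, i \in \{1,\dots,n\}, \qquad \theta_i \equiv \theta^{\#}_{\pi^*(i)}, \tag{2}
$$

where $\equiv$ is an equivalence relation that we call the matching criterion. We assume that there exists an injection $\pi^*$ such that we have $\theta_i = \theta^{\#}_{\pi^*(i)}$ and $\sigma_i = \sigma^{\#}_{\pi^*(i)}$ for every $i \in \{1,\dots,n\}$.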
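
The data-generating mechanism described by (1) and (2) can be illustrated with a short simulation. The following is only a minimal sketch: it assumes that the matching criterion is exact equality of the matched parameters, and the function name `sample_model`, the sizes, and the ranges of the parameters are hypothetical choices made for illustration, not values taken from the paper.

```python
import numpy as np

def sample_model(theta_sharp, sigma_sharp, pi_star, rng):
    """Draw (X, X_sharp) from model (1), assuming exact matches under pi_star.

    theta_sharp : (m, d) array of original features of the larger set
    sigma_sharp : (m,) array of positive noise levels
    pi_star     : length-n integer array with distinct entries in {0, ..., m-1}
    """
    theta = theta_sharp[pi_star]   # assumption: theta_i = theta#_{pi*(i)}
    sigma = sigma_sharp[pi_star]   # assumption: sigma_i = sigma#_{pi*(i)}
    n, d = theta.shape
    m = theta_sharp.shape[0]
    # Independent standard Gaussian noise for the two sets of features.
    X = theta + sigma[:, None] * rng.standard_normal((n, d))
    X_sharp = theta_sharp + sigma_sharp[:, None] * rng.standard_normal((m, d))
    return X, X_sharp

rng = np.random.default_rng(0)
n, m, d = 5, 8, 20                            # hypothetical sizes
theta_sharp = 10.0 * rng.standard_normal((m, d))
sigma_sharp = rng.uniform(0.5, 2.0, size=m)
pi_star = rng.permutation(m)[:n]              # a random injection (0-based indices)
X, X_sharp = sample_model(theta_sharp, sigma_sharp, pi_star, rng)
```

Indices are 0-based here, so the entries of `pi_star` are the paper's values of $\pi^*(i)$ shifted by one; the $m - n$ columns not hit by `pi_star` play the role of outliers.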

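In what follows we call a feature $X^{\#}_j$ an outlier if $j \in O_{\pi^*}$, where $O_{\pi^*} \triangleq \{1,\dots,m\} \setminus \operatorname{Im}(\pi^*)$, so that we have $|O_{\pi^*}| = m - n$. Let us also define $J_{\pi^*} \triangleq \operatorname{Im}(\pi^*)$, for which we have $|J_{\pi^*}| = n$. Consequently, in addition to the features $X$, the rest of the features from $X^{\#}$ are called inliers.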

In this formulation, the data generating distribution is defined by the (unknown) parameters $\theta^{\#}$, $\sigma^{\#}$ and $\pi^*$. (We omit the sets of parameters $\theta$ and $\sigma$, since they are automatically contained in $\theta^{\#}$ and $\sigma^{\#}$, respectively, due to the fact that $\pi^*$ is included in the set of parameters.) In the problem of matching, we focus our attention on estimating the parameter $\pi^*$ only, considering $\theta^{\#}$ and $\sigma^{\#}$ as nuisance parameters. In what follows, we denote by $P_{\theta^{\#},\sigma^{\#},\pi^*}$ the probability distribution of the vector $(X, X^{\#})$ defined by (1) under the condition (2). We write $E_{\theta^{\#},\sigma^{\#},\pi^*}$ for the expectation with respect to $P_{\theta^{\#},\sigma^{\#},\pi^*}$.

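We use the binary loss function, which equals $0$ if $\hat\pi = \pi^*$ and $1$ otherwise, i.e. the 0-1 distance between $\hat\pi$ and $\pi^*$, given by

$$
\delta_{0\text{-}1}(\hat\pi, \pi^*) \triangleq \mathbb{1}\{\hat\pi \neq \pi^*\}. \tag{3}
$$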

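Our ultimate goal is to design estimators that have an expected error smaller than a prescribed level $\alpha$ under the weakest possible conditions on the nuisance parameter $\theta^{\#}$ and the noise level $\sigma^{\#}$. The problem of matching becomes more difficult when the features are hardly distinguishable. To quantify this phenomenon, we introduce the relative separation distance $\bar\kappa$ and the relative outlier separation distance $\bar\kappa_{\mathrm{out}}$, which measure the minimal distance-to-noise ratio between inlier features and the minimal distance-to-noise ratio between inlier and outlier features, respectively. With $[n]$ denoting the set $\{1,\dots,n\}$, the precise definitions read as

$$
\begin{aligned}
\bar\kappa(\theta^{\#},\sigma^{\#},\pi^*) &\triangleq \min_{\substack{i\in[n],\ j\in J_{\pi^*} \\ \pi^*(i)\neq j}} \frac{\|\theta_i-\theta^{\#}_j\|}{\big(\sigma_i^2+(\sigma^{\#}_j)^2\big)^{1/2}}, \\
\bar\kappa_{\mathrm{out}}(\theta^{\#},\sigma^{\#},\pi^*) &\triangleq \min_{i\in[n],\ j\in O_{\pi^*}} \frac{\|\theta_i-\theta^{\#}_j\|}{\big(\sigma_i^2+(\sigma^{\#}_j)^2\big)^{1/2}}.
\end{aligned}
\tag{4}
$$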
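
To make the two quantities in (4) concrete, here is a minimal sketch that evaluates them for given parameters. It assumes that $J_{\pi^*}$ is the image of $\pi^*$ and that $O_{\pi^*}$ is its complement; the function name `separation_distances` and the toy values are hypothetical.

```python
import numpy as np

def separation_distances(theta, sigma, theta_sharp, sigma_sharp, pi_star):
    """Return (kappa_bar, kappa_bar_out) as in (4), with 0-based indices.

    Assumption: inlier columns are the image of pi_star, outlier columns
    are all remaining columns.
    """
    n, m = len(theta), len(theta_sharp)
    # Pairwise distance-to-noise ratios between theta_i and theta#_j.
    dist = np.linalg.norm(theta[:, None, :] - theta_sharp[None, :, :], axis=2)
    ratio = dist / np.sqrt(sigma[:, None] ** 2 + sigma_sharp[None, :] ** 2)
    inlier_cols = np.zeros(m, dtype=bool)
    inlier_cols[pi_star] = True
    # Inlier-inlier pairs: j in the image of pi_star, excluding j = pi_star(i).
    mask = np.tile(inlier_cols, (n, 1))
    mask[np.arange(n), pi_star] = False
    kappa = ratio[mask].min()
    # Inlier-outlier pairs: j outside the image of pi_star.
    outlier_cols = ~inlier_cols
    kappa_out = ratio[:, outlier_cols].min() if outlier_cols.any() else np.inf
    return kappa, kappa_out

# Toy one-dimensional example with unit noise levels.
theta_sharp = np.array([[0.0], [3.0], [10.0]])
sigma_sharp = np.ones(3)
pi_star = np.array([0, 1])
theta, sigma = theta_sharp[pi_star], sigma_sharp[pi_star]
kappa, kappa_out = separation_distances(theta, sigma, theta_sharp, sigma_sharp, pi_star)
```

In this toy example the closest pair of distinct inliers is at distance 3 and the closest inlier-outlier pair at distance 7, both divided by the noise scale $\sqrt{2}$.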

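Notice that $\bar\kappa$ can be rewritten as

$$
\bar\kappa(\theta,\sigma) \triangleq \min_{\substack{i,j\in[n] \\ i\neq j}} \frac{\|\theta_i-\theta_j\|}{(\sigma_i^2+\sigma_j^2)^{1/2}}
$$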

and the dependence of on and is redundant. Hence, we will refer to as to ease the notation.

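Clearly, if $\bar\kappa = 0$ (or, $\bar\kappa_{\mathrm{out}} = 0$) and the $\sigma_i$'s are all equal, then the parameter $\pi^*$ is nonidentifiable, in the sense that there exist two different permutations $\pi$ and $\pi'$ such that the distributions $P_{\theta^{\#},\sigma^{\#},\pi}$ and $P_{\theta^{\#},\sigma^{\#},\pi'}$ coincide. Therefore, the conditions $\bar\kappa > 0$ and $\bar\kappa_{\mathrm{out}} > 0$ are necessary for the existence of consistent estimators of $\pi^*$. Furthermore, good estimators are those consistently estimating $\pi^*$ even if $\bar\kappa$ and $\bar\kappa_{\mathrm{out}}$ are small.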

In the setting of permutation estimation, i.e. when there are no outliers in either $X$ or $X^{\#}$, Collier and Dalalyan (2016) established the optimal minimax rate of the relative separation distance $\bar\kappa$. The minimax rate is defined as

where is the symmetric group over the set . The minimax rate is proved to be proportional to . Since this rate is optimal we establish a threshold such that for all and given estimator we have

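$$
\bar\kappa^{\alpha}_{\mathrm{out}}(\hat\pi) = \inf\Big\{\kappa>0 \ \Big|\ \max_{\pi}\ \sup_{\bar\kappa_{\mathrm{out}}(\theta^{\#},\sigma^{\#},\pi^*)>\kappa} P_{\theta^{\#},\sigma^{\#},\pi}(\hat\pi\neq\pi)\le\alpha\Big\}.
$$

The minimax outlier separation distance is the smallest possible outlier separation distance achieved by an estimator $\hat\pi$, i.e.

$$
\bar\kappa^{\alpha}_{\mathrm{out}} \triangleq \inf_{\hat\pi}\ \bar\kappa^{\alpha}_{\mathrm{out}}(\hat\pi), \tag{5}
$$

where the infimum is taken over all possible estimators of $\pi^*$.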

In what follows we devote our main attention to the heteroscedastic noise scenario, in which the $\sigma_i$'s can be different.

## 3 Main results

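The general approach here is based on maximum likelihood estimation. Writing the negative log-likelihood for the vector $(X, X^{\#})$ yields

$$
-\ell_n(\theta^{\#},\sigma^{\#},\pi,X,X^{\#}) = \sum_{i=1}^{n}\Big(\frac{\|X_i-\theta_i\|_2^2}{2\sigma_i^2}+\frac12\log(\sigma_i^2)\Big) + \sum_{j=1}^{m}\Big(\frac{\|X^{\#}_j-\theta^{\#}_j\|_2^2}{2(\sigma^{\#}_j)^2}+\frac12\log\big((\sigma^{\#}_j)^2\big)\Big)
$$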

### 3.1 Upper bound for $\bar\kappa_{\mathrm{out}}$

In this section we obtain an upper bound for $\bar\kappa_{\mathrm{out}}$. We first consider the case of known noise variances, i.e. we assume that all $\sigma_i$'s are known.

#### 3.1.1 Known variance

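In the case of known $\sigma_i$'s the estimator of $\pi^*$ reads as

$$
\bar\pi^{\mathrm{LSNS}} \triangleq \arg\min_{\pi:[n]\to[m]} \sum_{i=1}^{n} \frac{\|X_i-X^{\#}_{\pi(i)}\|^2}{\sigma_i^2+(\sigma^{\#}_{\pi(i)})^2}, \tag{6}
$$

where the minimum is taken over all injective functions mapping from $[n]$ to $[m]$.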
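
The minimization in (6) is a rectangular linear sum assignment problem, so it can be solved exactly with a standard solver. The sketch below uses `scipy.optimize.linear_sum_assignment`, which accepts rectangular cost matrices; the function name `lsns_matching` and the toy data are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsns_matching(X, X_sharp, sigma, sigma_sharp):
    """Least sum of normalized squares, as in (6), with 0-based indices."""
    # cost[i, j] = ||X_i - X#_j||^2 / (sigma_i^2 + (sigma#_j)^2)
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    cost = sq_dist / (sigma[:, None] ** 2 + sigma_sharp[None, :] ** 2)
    # With n <= m every row is assigned to a distinct column (an injection).
    rows, cols = linear_sum_assignment(cost)
    pi_hat = np.empty(len(X), dtype=int)
    pi_hat[rows] = cols
    return pi_hat

# Noise-free toy example: X_0 coincides with X#_2 and X_1 with X#_0.
X_sharp = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = X_sharp[[2, 0]]
pi_hat = lsns_matching(X, X_sharp, np.ones(2), np.ones(3))
```

Here column 1 of `X_sharp` is left unmatched, which is how an outlier shows up in the estimated injection.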

The next theorem states that if both $\bar\kappa$ and $\bar\kappa_{\mathrm{out}}$ are large enough then the probability of finding the correct matching using the LSNS method is at least $1-\alpha$.

###### Theorem 1 (Upper bound for LSNS)

Let and condition (2) be fulfilled. If and then

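$$
P_{\theta^{\#},\sigma^{\#},\pi^*}(\hat\pi\neq\pi^*)\le\alpha. \tag{7}
$$

Proof. We prove the upper bound for LSNS in the presence of outliers. Without loss of generality we can assume that $\pi^*(i) = i$ for every $i\in[n]$. We wish to bound the probability of the event $\Omega = \{\hat\pi\neq\pi^*\}$, where $\hat\pi$ is the LSNS estimator defined in (6). It is evident that

$$
\Omega\subset\bigcup_{\pi\neq\pi^*}\Omega_\pi,
$$

where

$$
\Omega_\pi = \Big\{\sum_{i=1}^{n}\frac{\|X_i-X^{\#}_i\|^2}{2\sigma_i^2} \ge \sum_{i=1}^{n}\frac{\|X_i-X^{\#}_{\pi(i)}\|^2}{\sigma_i^2+(\sigma^{\#}_{\pi(i)})^2}\Big\}.
$$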

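Denote $\sigma_{i,j}^2 = \sigma_i^2 + (\sigma^{\#}_j)^2$ and

$$
\zeta_1 = \max_{i\neq j}\bigg|\frac{(\theta_i-\theta^{\#}_j)^\top(\sigma_i\xi_i-\sigma^{\#}_j\xi^{\#}_j)}{\|\theta_i-\theta^{\#}_j\|\,\sigma_{i,j}}\bigg|, \qquad
\zeta_2 = d^{-1/2}\max_{i,j}\bigg|\Big\|\frac{\sigma_i\xi_i-\sigma^{\#}_j\xi^{\#}_j}{\sigma_{i,j}}\Big\|^2-d\bigg|. \tag{8}
$$

Since $\theta_i = \theta^{\#}_i$ and $\sigma_i = \sigma^{\#}_i$ for every $i\in[n]$, we have

$$
\|X_i-X^{\#}_i\|^2 = \sigma_i^2\|\xi_i-\xi^{\#}_i\|^2 \le 2\sigma_i^2(d+\sqrt{d}\,\zeta_2). \tag{9}
$$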

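Similarly, for every $i\in[n]$ and $j\in[m]$ with $j\neq i$,

$$
\begin{aligned}
\|X_i-X^{\#}_j\|^2 &= \|\theta_i-\theta^{\#}_j\|^2+\|\sigma_i\xi_i-\sigma^{\#}_j\xi^{\#}_j\|^2+2(\theta_i-\theta^{\#}_j)^\top(\sigma_i\xi_i-\sigma^{\#}_j\xi^{\#}_j) \\
&\ge \|\theta_i-\theta^{\#}_j\|^2+\sigma_{i,j}^2(d-\sqrt{d}\,\zeta_2)-2\|\theta_i-\theta^{\#}_j\|\,\sigma_{i,j}\,\zeta_1.
\end{aligned}
$$

On the event $\Omega_1 = \{\zeta_1 < \bar\kappa(\theta,\sigma)\}$, from the previous inequality we have

$$
\frac{\|X_i-X^{\#}_j\|^2}{\sigma_{i,j}^2} \ge \bar\kappa(\theta,\sigma)^2-2\bar\kappa(\theta,\sigma)\zeta_1+d-\sqrt{d}\,\zeta_2, \tag{10}
$$

where

$$
\bar\kappa(\theta,\sigma) = \min_{\substack{i\in[n],\ j\in[m] \\ i\neq j}} \frac{\|\theta_i-\theta^{\#}_j\|}{\sigma_{i,j}}.
$$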

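Hence, combining the bounds (9) and (10) we get that

$$
\Omega\cap\Omega_1 \subset \big\{d+\sqrt{d}\,\zeta_2 \ge \bar\kappa(\theta,\sigma)^2-2\bar\kappa(\theta,\sigma)\zeta_1+d-\sqrt{d}\,\zeta_2\big\}. \tag{11}
$$

Using (11) we can show that

$$
\begin{aligned}
P(\Omega) &\le P(\Omega_1^{\complement})+P(\Omega\cap\Omega_1) \\
&\le P\big(\zeta_1\ge\bar\kappa(\theta,\sigma)\big)+P\big(2\sqrt{d}\,\zeta_2+2\bar\kappa(\theta,\sigma)\zeta_1\ge\bar\kappa(\theta,\sigma)^2\big) \\
&\le 2P\Big(\zeta_1\ge\frac{\bar\kappa(\theta,\sigma)}{4}\Big)+P\Big(\zeta_2\ge\frac{\bar\kappa(\theta,\sigma)^2}{4\sqrt{d}}\Big).
\end{aligned}
\tag{12}
$$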

For suitably chosen standard Gaussian random variables

it holds that . Therefore, using the tail bound for the standard Gaussian distribution and the union bound, we get

 P(ζ1≥14¯κ(θ,σ)) (13)

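To bound the random variable $\zeta_2$, we use the following result [Laurent and Massart, 2000, Eq. (4.3) and (4.4)]. If $Y$ is drawn from the chi-squared distribution $\chi^2(D)$, where $D$ is a positive integer, then, for every $x > 0$,

$$
\begin{cases}
P\big(Y-D\le-2\sqrt{Dx}\big)\le e^{-x}, \\
P\big(Y-D\ge 2\sqrt{Dx}+2x\big)\le e^{-x}.
\end{cases}
$$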
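
As a sanity check of the two chi-squared tail bounds stated above, one can compare them with exact tail probabilities. The following sketch uses `scipy.stats.chi2`; the function name `laurent_massart_check` and the grid of values of $D$ and $x$ are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import chi2

def laurent_massart_check(D, x):
    """Return exact lower tail, exact upper tail, and the bound exp(-x)."""
    # P(Y - D <= -2 sqrt(D x)); the cdf is 0 when the argument is negative.
    lower = chi2.cdf(D - 2.0 * np.sqrt(D * x), df=D)
    # P(Y - D >= 2 sqrt(D x) + 2 x), computed via the survival function.
    upper = chi2.sf(D + 2.0 * np.sqrt(D * x) + 2.0 * x, df=D)
    return lower, upper, np.exp(-x)

results = [laurent_massart_check(D, x)
           for D in (1, 10, 100)
           for x in (0.5, 2.0, 5.0)]
```

Each triple in `results` contains the two exact tail probabilities followed by the common bound $e^{-x}$, so the inequalities can be read off directly.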

As a consequence, , . This inequality, combined with the union bound, yields

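$$
P\Big(\zeta_2\ge\frac{\bar\kappa(\theta,\sigma)^2}{4\sqrt{d}}\Big) \le 2nm\exp\Big\{-\frac{(\bar\kappa(\theta,\sigma)/16)^2}{d}\big(\bar\kappa^2(\theta,\sigma)\wedge 8d\big)\Big\}. \tag{14}
$$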

Using inequalities (12)-(14), we get that

which implies

#### 3.1.2 Unknown variance

In the setup when the observation variances are unknown the minimization problem displayed in (6) can be further minimized with respect to parameters and , taking into account the constraint (2). This readily yields

$$
\bar\pi^{\mathrm{LSL}} \triangleq \arg\min_{\pi:[n]\to[m]} \sum_{i=1}^{n}\log\|X_i-X^{\#}_{\pi(i)}\|^2. \tag{15}
$$
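
Like (6), the criterion in (15) is a linear sum assignment problem, now with logarithmic costs and no dependence on the noise levels. The following is a minimal sketch with `scipy.optimize.linear_sum_assignment`; the function name `lsl_matching` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsl_matching(X, X_sharp):
    """Least sum of logarithms, as in (15), with 0-based indices."""
    sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
    # An exact zero distance would give a cost of -inf, which the solver
    # rejects; under Gaussian noise this happens with probability zero.
    cost = np.log(sq_dist)
    rows, cols = linear_sum_assignment(cost)
    pi_hat = np.empty(len(X), dtype=int)
    pi_hat[rows] = cols
    return pi_hat

# Toy example: X_0 is a slightly perturbed X#_2 and X_1 a perturbed X#_0.
X_sharp = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.array([[0.1, 10.0], [0.2, 0.1]])
pi_hat = lsl_matching(X, X_sharp)
```

No variance enters the cost matrix, which is what makes this procedure usable when the $\sigma_i$'s are unknown.
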
###### Theorem 2 (Upper bound for LSL)

Let then for we have

 (16)
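Proof. In the same spirit as in the proof of Theorem 1, we denote the event of wrong matching by $\Omega = \{\hat\pi\neq\pi^*\}$, with $\hat\pi$ being the LSL estimator defined in (15). Notice that

$$
\Omega\subset\bigcup_{\pi\neq\pi^*}\Omega_\pi, \tag{17}
$$

where

$$
\begin{aligned}
\Omega_\pi &= \Big\{\sum_{i=1}^{n}\log\|X_i-X^{\#}_i\|^2 \ge \sum_{i=1}^{n}\log\|X_i-X^{\#}_{\pi(i)}\|^2\Big\} &\qquad (18) \\
&\subset \bigcup_{i=1}^{n}\ \bigcup_{j\in[m]\setminus\{i\}}\Big\{\log\|X_i-X^{\#}_i\|^2 \ge \log\|X_i-X^{\#}_j\|^2\Big\}. &\qquad (19)
\end{aligned}
$$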

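For the same random variables $\zeta_1$ and $\zeta_2$ defined in (8) we can upper bound the term $\|X_i-X^{\#}_i\|^2$ and lower bound $\|X_i-X^{\#}_j\|^2$ as follows:

$$
\|X_i-X^{\#}_i\|^2 \le 2\sigma_i^2(d+\sqrt{d}\,\zeta_2)
$$

and

$$
\|X_i-X^{\#}_j\|^2 \ge \|\theta_i-\theta^{\#}_j\|^2+\sigma_{i,j}^2(d-\sqrt{d}\,\zeta_2)-2\zeta_1\|\theta_i-\theta^{\#}_j\|\,\sigma_{i,j}.
$$

Further, on the event $\Omega_1 = \{\zeta_1 < \bar\kappa(\theta,\sigma)\}$, from the last display we get that

$$
\frac{\|X_i-X^{\#}_j\|^2}{\sigma_{i,j}^2} \ge \bar\kappa(\theta,\sigma)^2-2\zeta_1\bar\kappa(\theta,\sigma)+d-\sqrt{d}\,\zeta_2
$$

with $\bar\kappa(\theta,\sigma)$ defined as

$$
\bar\kappa(\theta,\sigma) = \min_{\substack{i\in[n],\ j\in[m] \\ i\neq j}} \frac{\|\theta_i-\theta^{\#}_j\|}{\sigma_{i,j}}.
$$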

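Therefore, combining these bounds we get that

$$
\Omega\cap\Omega_1 \subset \big\{2\sigma_i^2(d+\sqrt{d}\,\zeta_2) \ge \sigma_{i,j}^2\big(\bar\kappa(\theta,\sigma)^2-2\zeta_1\bar\kappa(\theta,\sigma)+d-\sqrt{d}\,\zeta_2\big)\big\},
$$

which implies

$$
\begin{aligned}
P(\Omega) &\le P(\Omega_1^{\complement})+P(\Omega\cap\Omega_1) \\
&\le P\big(\zeta_1\ge\bar\kappa(\theta,\sigma)\big)+P\big(\sqrt{d}\,\zeta_2(3\sigma_i^2+\sigma_j^2)+2\sigma_{i,j}^2\zeta_1\bar\kappa(\theta,\sigma) \ge \sigma_{i,j}^2\bar\kappa(\theta,\sigma)^2+d(\sigma_j^2-\sigma_i^2)\big) \\
&\le 2P\Big(\zeta_1\ge\frac14\bar\kappa(\theta,\sigma)\Big)+P\big(\zeta_2\ge\bar\kappa(\theta,\sigma)A(\sigma)+\sqrt{d}\,|B(\sigma)|\big),
\end{aligned}
$$

where

$$
A(\sigma) = \frac{\sigma_i^2+\sigma_j^2}{2(3\sigma_i^2+\sigma_j^2)}, \qquad B(\sigma) = \frac{\sigma_j^2-\sigma_i^2}{3\sigma_i^2+\sigma_j^2}.
$$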

Notice that the term is bounded, namely it is easy to show that for any combination of and involved in the definition of . Similarly, it can be verified that . Hence, combining both bounds obtained from (3.1.2) and taking to be greater than we get that if

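$$
\bar\kappa(\theta,\sigma)^2 \ge \bar\kappa^2 \triangleq d/2+\Big(\sqrt{2d\log\frac{8nm}{\alpha}}\ \vee\ 32\log\frac{4nm}{\alpha}\Big),
$$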

then .

#### 3.1.3 Outlier detection

In this section we show that the outlier detection accuracy depends only on the quantity $\bar\kappa_{\mathrm{out}}$. In Theorem 1 we showed that if both $\bar\kappa$ and $\bar\kappa_{\mathrm{out}}$ are large enough then the LSNS procedure finds the correct matching with high probability.

Assume that for two distinct inliers the probability of the mismatch between the two pairs $(X_i, X^{\#}_i)$ and $(X_j, X^{\#}_j)$ is larger than some universal constant $c$. For the LSS procedure, this statement formally reads as

$$
P\big(\|X_i-X^{\#}_j\|^2+\|X_j-X^{\#}_i\|^2 \le \|X_i-X^{\#}_i\|^2+\|X_j-X^{\#}_j\|^2\big) \ge c. \tag{20}
$$

Then,

 κi,j≜∥θi−θj∥√σ2i+σ2j≤2√log2c+1.5. (21)

TODO: put

Proof. First notice that the condition

$$
\|X_i-X^{\#}_j\|^2+\|X_j-X^{\#}_i\|^2 \le \|X_i-X^{\#}_i\|^2+\|X_j-X^{\#}_j\|^2
$$

can be rewritten as follows:

$$
(X_i-X_j)^\top(X^{\#}_i-X^{\#}_j) \le 0. \tag{22}
$$
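
For completeness, the equivalence between the mismatch condition and (22) can be checked by expanding the squared norms. The short derivation below uses only the identity $\|a-b\|^2 = \|a\|^2 - 2a^\top b + \|b\|^2$; the terms $\|X_i\|^2$, $\|X_j\|^2$, $\|X^{\#}_i\|^2$ and $\|X^{\#}_j\|^2$ appear once on each side and cancel.

```latex
\begin{align*}
& \|X_i - X^{\#}_j\|^2 + \|X_j - X^{\#}_i\|^2
  \le \|X_i - X^{\#}_i\|^2 + \|X_j - X^{\#}_j\|^2 \\
\iff\ & -2 X_i^\top X^{\#}_j - 2 X_j^\top X^{\#}_i
  \le -2 X_i^\top X^{\#}_i - 2 X_j^\top X^{\#}_j \\
\iff\ & X_i^\top X^{\#}_i - X_i^\top X^{\#}_j
  - X_j^\top X^{\#}_i + X_j^\top X^{\#}_j \le 0 \\
\iff\ & (X_i - X_j)^\top (X^{\#}_i - X^{\#}_j) \le 0 .
\end{align*}
```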

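Under the assumption that $\pi^*(i) = i$ for all $i\in[n]$, the inequality from (22) reads as

$$
D_{ij} \triangleq \|\theta_i-\theta_j\|^2+(\eta_{ij}+\eta^{\#}_{ij})^\top(\theta_i-\theta_j)+\eta_{ij}^\top\eta^{\#}_{ij} \le 0,
$$

where $\eta_{ij} = \sigma_i\xi_i-\sigma_j\xi_j$ and $\eta^{\#}_{ij} = \sigma_i\xi^{\#}_i-\sigma_j\xi^{\#}_j$. We introduce the normalized version of $D_{ij}$, denoted by $\bar D_{ij}$:

$$
\bar D_{ij} = \frac{D_{ij}}{\sigma_i^2+\sigma_j^2} = \kappa_{ij}^2+\frac{(\eta_{ij}+\eta^{\#}_{ij})^\top(\theta_i-\theta_j)}{\sigma_{i,j}^2}+\frac{\eta_{ij}^\top\eta^{\#}_{ij}}{\sigma_{i,j}^2}, \tag{23}
$$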

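where $\sigma_{i,j}^2 = \sigma_i^2+\sigma_j^2$. Due to the independence of $\eta_{ij}$ and $\eta^{\#}_{ij}$ we can decompose the sum in (23) in the following way:

$$
\bar D_{ij} = \zeta_1+\frac{\eta_{ij}^\top\eta^{\#}_{ij}}{\sigma_{i,j}^2}, \tag{24}
$$

where $\zeta_1 \triangleq \kappa_{ij}^2+(\eta_{ij}+\eta^{\#}_{ij})^\top(\theta_i-\theta_j)/\sigma_{i,j}^2$. The second term from the last display can be dealt with as follows:

$$
\zeta_2 \triangleq \frac{\eta_{ij}^\top\eta^{\#}_{ij}}{\sigma_{i,j}^2} = \frac{1}{4\sigma_{i,j}^2}\Big[\|\eta_{ij}+\eta^{\#}_{ij}\|^2-\|\eta_{ij}-\eta^{\#}_{ij}\|^2\Big] \sim \frac12\big(Q-Q^{\#}\big),
$$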

where and are independent and . The independence of and follows from the fact that . Thus, on the event we have

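$$
\begin{aligned}
P(\bar D_{ij}\le 0) &\ge P\Big(\zeta_1+\log\frac1\delta+2\sqrt{\log\frac1\delta}\le 0\Big) &\qquad (25) \\
&= P\bigg(Z_0\ge\frac{\kappa_{ij}^2+\log\frac1\delta+2\sqrt{\log\frac1\delta}}{\kappa_{ij}\sqrt2}\bigg) \ge c, &\qquad (26)
\end{aligned}
$$

where $Z_0$ is a standard normal random variable. From the last display we derive the upper bound for $\kappa_{ij}$, by using the lower bound for the complementary error function. (Footnote 2: proof or citation needed.)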

 e≥c. (27)

Taking

## 4 Numerical results

We performed several numerical experiments to corroborate our theoretical results. All experiments were implemented in Python (NumPy). For the linear sum assignment problem we used the generalized Hungarian algorithm implemented in the SciPy library. The experiments were carried out in the following manner. First we generate $\pi^*$ (the original matching) and $\theta^{\#}$ (note that $\theta$ can be derived using $\theta^{\#}$ and $\pi^*$). $\theta^{\#}$ is sampled from a Gaussian distribution with 0 mean and

variance (which is also sampled randomly from uniform distribution on

). Additionally, for every for which ( is an outlier) we increment every coordinate of by . (also ) is sampled from uniform distribution over . Afterwards we generate and according to the procedure described in section 2. We try to solve the problem with four aforementioned algorithms (Greedy, LSS, LSNS, LSL). The results are summed up in Figure 1. In all experiments and . TODO: Add explanation of obtained results.
TODO: Average the plots over 50 independent trials.
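
An experiment of this kind, averaged over independent trials, could be organized along the following lines. This is only a sketch under stated assumptions: it evaluates the LSL procedure from (15) alone, and the function name `lsl_error_rate`, the sizes, the feature scale, and the range of noise levels are hypothetical placeholders, not the settings used to produce Figure 1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsl_error_rate(n=5, m=8, d=10, scale=100.0, trials=50, seed=0):
    """Fraction of trials in which the LSL estimate differs from pi_star."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(trials):
        # Hypothetical design: Gaussian features, uniform noise levels.
        theta_sharp = scale * rng.standard_normal((m, d))
        sigma_sharp = rng.uniform(0.5, 2.0, size=m)
        pi_star = rng.permutation(m)[:n]
        # Observations from model (1), assuming exact matches under pi_star.
        noise = rng.standard_normal((n, d))
        X = theta_sharp[pi_star] + sigma_sharp[pi_star][:, None] * noise
        X_sharp = theta_sharp + sigma_sharp[:, None] * rng.standard_normal((m, d))
        # LSL: assignment with costs log ||X_i - X#_j||^2.
        sq_dist = ((X[:, None, :] - X_sharp[None, :, :]) ** 2).sum(axis=2)
        _, pi_hat = linear_sum_assignment(np.log(sq_dist))
        errors += int(not np.array_equal(pi_hat, pi_star))
    return errors / trials

rate = lsl_error_rate()
```

Decreasing `scale` relative to the noise levels makes the features harder to distinguish, so sweeping it gives a simple way to trace how the empirical error depends on the separation.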

## Acknowledgments

This work was partially supported by the grants Investissements d’Avenir (ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047) and CALLISTO. The authors thank the Reviewers for many valuable suggestions.