1 Introduction
Recent years have witnessed extensive research on metric learning, which aims at learning a semantic distance and embeddings such that similar examples are mapped to nearby points on a manifold and dissimilar examples are mapped apart from each other [20, 27, 30, 39]. Compared to conventional distance metric learning, deep metric learning learns a nonlinear embedding of the data using deep neural networks, and it has shown significant benefits by exploring more loss structures. With the development of these learning techniques, deep metric learning has been widely applied to face recognition [29, 28] and to image clustering and retrieval [33, 20].
Deep metric learning has achieved remarkable success in generating discriminative features. To improve the performance of learned features, many learning methods have explored the structures in the objective functions, such as contrastive loss [9], triplet loss [22, 36], lifted structured embedding [20], and N-pair loss [27]. These deep metric learning methods can be categorized as structure-learning methods, which focus on constructing more effective structures for objective functions by making use of training batches or increasing the number of negative examples. However, most structure-learning methods simply take the Euclidean distance as the semantic distance metric and ignore the fact that the distance metric plays a non-negligible role in deep metric learning. Different from structure-learning, some metric learning methods [37, 6] introduce new distance metrics. For example, Weinberger et al. proposed a distance metric for k-nearest neighbor (kNN) classification in metric learning, i.e., the Mahalanobis distance [37], which shows that the performance of metric learning algorithms also depends on the distance metric. In contrast to structure-learning methods, methods that explore a new distance metric can be categorized as distance-learning methods. Compared to the structure-learning methods, designing a good distance metric for measuring semantic similarity may have a more significant impact on learning discriminative embeddings. Therefore, we focus on designing a novel and effective distance metric.
Measuring similarities between pairs of examples is critical for metric learning. The most well-known distance metric is Euclidean distance, which has been widely used in learning discriminative embeddings. However, the Euclidean distance metric only measures the distance between paired examples in the embedding space, and it lacks the ability to preserve the correlation and improve the robustness of the pairs. Therefore, we devise a new distance metric by leveraging a concept from signal processing, i.e., the Signal-to-Noise Ratio (SNR), as a similarity measurement in deep metric learning. Generally, SNR in signal processing is used to compare the level of a desired signal to the level of noise, and a larger SNR value means a higher signal quality. For similarity measurement in deep metric learning, a pair of learned features $h_i$ and $h_j$ can be written as $h_j = h_i + n_{ij}$, where $n_{ij}$ can be treated as noise. Then, the SNR is the ratio of the feature variance to the noise variance. Based on this definition of SNR in deep metric learning, we find that SNR is a promising basis for a distance metric that measures the differences between paired features.
In this paper, based on the properties of SNR, we propose an SNR distance metric to replace the Euclidean distance metric in deep metric learning. Through space analysis and theoretical demonstration, we explain the advantages of SNR distance over Euclidean distance. Different from Euclidean distance, SNR distance is a more robust distance metric, which can jointly reduce the intra-class distances and enlarge the inter-class distances of the learned features, and preserve the correlations of the features. Moreover, we propose a Deep SNR-based Metric Learning (DSML) method, which uses the SNR distance metric as the similarity measurement for generating more discriminative features. To show the generality of our SNR-based metric, we also extend our approach to hashing retrieval learning.
Our main contributions can be summarized as follows. (1) To the best of our knowledge, this is the first work that employs SNR to build the distance metric in deep metric learning. By analyzing the properties of the SNR distance metric, we find that it performs better than Euclidean distance and can be widely used in deep metric learning. (2) We show how to integrate our SNR distance metric into popular learning frameworks, and propose the corresponding objective functions in our DSML. (3) We conduct extensive experiments on three widely-used benchmarks for image clustering and retrieval tasks, and the results demonstrate the superiority of our deep SNR-based metric learning approach over state-of-the-art methods. (4) We extend our SNR-based distance metric to deep hashing learning and obtain promising experimental results.
2 Related Work
2.1 Metric Learning
Metric learning methods, which have been widely applied to image retrieval, clustering and recognition tasks, have attracted much attention. With the development of deep neural networks, deep metric learning methods [5, 21, 15, 10] have shown promising performance on complex computer vision tasks. To distinguish the innovations of different deep metric learning methods, we roughly divide these approaches into structure-learning and distance-learning methods, and introduce these works briefly. As it relates to our work, we also introduce deep hashing methods based on well-known metric learning structures.
2.1.1 Structure-Learning Methods
The most well-known structure-learning approach is contrastive embedding, proposed by Hadsell et al. [9]. The main idea of contrastive loss [9] is that similar examples should be mapped to nearby points on a manifold and dissimilar examples should be mapped apart from each other. This idea has established the foundation of the objective functions in deep metric learning. Following this work, subsequent structure-learning methods have proposed various loss functions with different structures. For example, triplet loss [22, 36] is composed of triplets, and each triplet consists of an anchor example, a positive example and a negative example. The triplet loss encourages the positive distance to be smaller than the negative distance by a margin. Lifted structured loss [20] lifts the vector of pairwise distances within the batch to the matrix of pairwise distances. N-pair loss [27] generalizes triplet loss by allowing joint comparison among more than one negative example; that is, each feature pair is composed of samples with the same label, while the other pairs in the mini-batch have different labels. ALMN [1] proposes to optimize an adaptive large margin objective via generated virtual points instead of mining hard samples. Besides these works, several works [22, 26] try to mine hard negative data on the basis of triplet loss, and they can be seen as enhanced structure-learning methods. Different from these structure-learning methods, our work aims to design a new distance metric for deep metric learning. Because most structure-learning methods use the Euclidean distance as their similarity measurement (the inner product in N-pair loss can be regarded as a similar Euclidean measurement), they provide the baselines for our work.
2.1.2 Distance-Learning Methods
Different from structure-learning approaches, distance-learning methods, which explore a superior distance metric, are also promising for improving the performance of deep metric learning. In traditional metric learning [23, 24], some distance-learning methods use the Mahalanobis distance to measure the similarities of samples. For instance, Globerson et al. [8] presented an algorithm to learn a Mahalanobis distance in classification tasks. Weinberger et al. [37] showed how to learn a Mahalanobis distance metric for kNN classification from labeled examples. Davis et al. [6] presented an information-theoretic approach to learning a Mahalanobis distance function. In deep metric learning, in order to learn better features, Wang et al. proposed a distance-learning method that constrains the angle at the negative point of triplet triangles [34]. Moreover, Chen et al. [2] introduce an energy confusion metric to improve the generalization of the learned deep metric. Chen et al. [3] propose a hybrid-attention based decoupled metric learning framework for learning a discriminative and robust deep metric. However, the angle measurement for triangles has limitations when measuring the distance between two points, and it cannot be regarded as a general distance metric. In this paper, we propose a general distance-learning method, which uses an SNR-based metric for measuring the similarity of image pairs in deep metric learning.
2.2 Hashing Learning
Similar to deep metric learning, deep hashing aims to learn a discriminative embedding that preserves consistency with semantic similarity in binary features. Recently, many deep hashing methods [40, 16, 38, 41, 18, 31, 25, 42] have been proposed to learn compact binary codes and retrieve similar images in Hamming space. Benefiting from metric learning methods, some deep hashing methods [17, 14, 35] are built on contrastive embedding or triplet embedding. In this paper, in order to extend the application of our SNR-based metric and verify its generality, we also propose a deep SNR-based hashing learning method, which aims to generate similarity-preserving binary codes by training convolutional neural networks with our SNR-metric-based loss layer.
3 Proposed Approach
Pairwise distances between features are usually measured by the Euclidean distance metric, which has rarely been changed [34]. However, designing a good distance metric for measuring the similarity between images is important for improving the performance of deep metric learning. Therefore, we propose a new SNR-based metric for deep metric learning.
3.1 SNR-based Metric
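Definition: In deep metric learning, given two images $x_i$ and $x_j$, the learned features can be denoted as $h_i = f(\theta; x_i)$ and $h_j = f(\theta; x_j)$, where $f$ is the metric learning function and $\theta$ denotes the learned parameters. Consider a pair of features $h_i$ and $h_j$, where the anchor feature is $h_i$ and the compared feature is $h_j$. We treat the anchor feature as the signal and the compared feature as the noisy signal; the noise $n_{ij}$ in $h_i$ and $h_j$ can then be formulated as $n_{ij} = h_j - h_i$.
In statistical theory, a standard definition of SNR is the ratio of signal variance to noise variance [7], so we define the SNR between the anchor feature $h_i$ and the compared feature $h_j$ as:
$$\mathrm{SNR}_{i,j} = \frac{\mathrm{var}(h_i)}{\mathrm{var}(h_j - h_i)} = \frac{\mathrm{var}(h_i)}{\mathrm{var}(n_{ij})} \qquad (1)$$
where $\mathrm{var}(a) = \frac{1}{n}\sum_{k=1}^{n}(a_k - \mu)^2$ denotes the variance of $a$, and $\mu$ is the mean value of $a$. If $\mu = 0$, $\mathrm{var}(a) = \frac{1}{n}\sum_{k=1}^{n} a_k^2$.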
In information theory, the variance reflects informativeness. More explicitly, the signal variance measures the useful information, while the noise variance measures the useless information. Therefore, increasing $\mathrm{SNR}_{i,j}$ improves the ratio of useful information to useless information, which means that the compared feature becomes more similar to the anchor feature. Conversely, decreasing $\mathrm{SNR}_{i,j}$ increases the proportion of noise information, leading to a larger difference between the two features. Therefore, the value of $\mathrm{SNR}_{i,j}$ can reasonably be used to measure the difference between a pair of features, which is essential for constructing a distance metric in metric learning.
SNR distance metric: In deep metric learning, the constraint of most loss functions based on the Euclidean distance metric is that similar examples should have short distances between their features while dissimilar examples should have large distances between their features. According to this constraint, we design a new distance metric as the similarity measurement for deep metric learning. On the basis of the definition of SNR, we propose our SNR distance metric. The SNR distance between a pair of features $h_i$ and $h_j$ is defined as:
$$d_S(h_i, h_j) = \frac{1}{\mathrm{SNR}_{i,j}} = \frac{\mathrm{var}(n_{ij})}{\mathrm{var}(h_i)} \qquad (2)$$
Notably, the commutative property of Euclidean distance does not hold for our SNR distance. Because the values of $d_S(h_i, h_j)$ and $d_S(h_j, h_i)$ are usually not equal, our SNR distance is sensitive to which feature in a pair is the anchor.
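The definition of SNR and the asymmetry of the SNR distance can be illustrated with a small NumPy sketch. The function names `snr` and `snr_distance` and the toy vectors are illustrative assumptions, not part of the original method description; `np.var` is used because it computes the population variance (division by n), matching the variance definition above.

```python
import numpy as np

def snr(anchor, compared):
    """SNR of a feature pair: var(anchor) / var(compared - anchor)."""
    noise = compared - anchor
    return np.var(anchor) / np.var(noise)

def snr_distance(anchor, compared):
    """SNR distance: reciprocal of the SNR, var(noise) / var(anchor)."""
    noise = compared - anchor
    return np.var(noise) / np.var(anchor)

# Toy 4-dimensional features (hypothetical values chosen for easy arithmetic).
h_i = np.array([1.0, -1.0, 2.0, -2.0])
h_j = np.array([1.5, -0.5, 2.0, -3.0])

d_ij = snr_distance(h_i, h_j)  # h_i is the anchor
d_ji = snr_distance(h_j, h_i)  # h_j is the anchor
print(d_ij, d_ji)              # the two values differ: the distance is not symmetric
```

Swapping the roles of the two features keeps the noise variance unchanged but changes the denominator from var(h_i) to var(h_j), which is why the two directions generally disagree.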
To show how SNR distance reflects the differences between a pair of features, we synthesize 32-dimensional Gaussian data as the anchor feature $h_i$, and a series of Gaussian noises $n_{ij}$ with different variances. The compared feature $h_j = h_i + n_{ij}$ is synthesized by adding the noise data to the anchor feature, so the SNR distance between the anchor feature and the compared feature is $\mathrm{var}(n_{ij})/\mathrm{var}(h_i)$. As shown in Figure 1, a longer SNR distance reflects a larger difference between the anchor feature and the compared feature. Therefore, the SNR distance applied to the loss functions has a property similar to that of Euclidean distance (i.e., similar image pairs are supposed to have a short SNR distance between their features, while dissimilar image pairs should have a large SNR distance). As a result, we can use the SNR distance metric as the similarity measurement to replace the Euclidean distance metric in deep metric learning.
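The synthetic experiment described above can be approximated with the following sketch. The random seed, the noise scales, and the choice to rescale a single noise draw (so that the trend is exactly monotonic) are assumptions made for illustration; they are not the settings used for Figure 1.

```python
import numpy as np

def snr_distance(anchor, compared):
    """SNR distance: var(compared - anchor) / var(anchor)."""
    return np.var(compared - anchor) / np.var(anchor)

rng = np.random.default_rng(0)
anchor = rng.normal(0.0, 1.0, size=32)      # 32-dimensional Gaussian anchor feature
unit_noise = rng.normal(0.0, 1.0, size=32)  # one noise draw, rescaled below

noise_scales = [0.1, 0.5, 1.0, 2.0]         # assumed noise standard-deviation factors
distances = [snr_distance(anchor, anchor + s * unit_noise) for s in noise_scales]

for s, d in zip(noise_scales, distances):
    print(f"noise scale {s:>4}: SNR distance {d:.4f}")
```

Because the noise variance grows with the square of the scale factor, the SNR distance grows quadratically as the compared feature is corrupted more heavily.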
Superiority analysis: To indicate the superiority of SNR distance to Euclidean distance, we compare these two metrics from the view of geometry space and statistical theory.
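The Euclidean distance between two points $h_i$ and $h_j$ is defined as:
$$d_E(h_i, h_j) = \sqrt{\sum_{m=1}^{M} (h_{im} - h_{jm})^2} \qquad (3)$$
For SNR distance, according to Equations (2) and (3), if the features follow zero-mean distributions we can derive that:
$$d_S(h_i, h_j) = \frac{\mathrm{var}(n_{ij})}{\mathrm{var}(h_i)} = \frac{\frac{1}{M}\sum_{m=1}^{M} (h_{jm} - h_{im})^2}{\frac{1}{M}\sum_{m=1}^{M} h_{im}^2} = \frac{d_E(h_i, h_j)^2}{d_E(h_i)^2} \qquad (4)$$
where $d_E(h_i) = \sqrt{\sum_{m=1}^{M} h_{im}^2}$ denotes the Euclidean distance from $h_i$ to the origin, and $M$ is the dimension of the learned features. As shown in (4), besides the Euclidean distance between the paired features, the SNR distance also takes into account the Euclidean distance from the anchor feature to the origin.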
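The relationship between SNR distance and Euclidean distance for zero-mean features can be checked numerically. The sketch below is illustrative only: the variable names and the random features are assumptions, and the features are centered explicitly so that the zero-mean condition holds.

```python
import numpy as np

def snr_distance(anchor, compared):
    """SNR distance: var(compared - anchor) / var(anchor)."""
    return np.var(compared - anchor) / np.var(anchor)

rng = np.random.default_rng(1)
h_i = rng.normal(size=64)
h_j = rng.normal(size=64)
h_i -= h_i.mean()  # enforce the zero-mean assumption
h_j -= h_j.mean()

d_s = snr_distance(h_i, h_j)
# Squared Euclidean distance of the pair over squared distance of the anchor to the origin.
ratio = np.sum((h_i - h_j) ** 2) / np.sum(h_i ** 2)
print(d_s, ratio)  # the two quantities agree up to floating-point error

# Same Euclidean gap, but the anchor is twice as far from the origin:
gap = h_j - h_i
d_s_far = snr_distance(2.0 * h_i, 2.0 * h_i + gap)
print(d_s_far)     # four times smaller than d_s
```

The second part shows the extra constraint that Euclidean distance lacks: with the pairwise gap held fixed, moving the anchor away from the origin shrinks the SNR distance.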
In order to preserve the semantic similarity, loss functions with the Euclidean distance metric require that the Euclidean distances between feature pairs with the same label be reduced, while the Euclidean distances between feature pairs with different labels be increased. Different from the Euclidean distance metric, loss functions with the SNR distance metric impose an additional constraint on the Euclidean distance from the origin to the features. As shown in Figure 2, compared to the Euclidean distance metric, which only measures the Euclidean distances of feature pairs, our SNR distance not only provides the constraints on Euclidean distances, but also gives an additional constraint that enlarges the inter-class distances when dealing with similar pairs and reduces the intra-class distances when dealing with dissimilar pairs. As a result, in deep metric learning, our SNR distance metric is more powerful in increasing the discrimination and robustness of feature pairs.
We also explore the relationship between SNR distance and the correlation coefficient of paired features to further show its superiority over Euclidean distance. If the mean of each feature is zero, and the noise is independent of the signal feature, the correlation coefficient of the paired features can be computed via statistical theory as follows:
$$\mathrm{corr}(h_i, h_j) = \frac{\mathrm{cov}(h_i, h_j)}{\sqrt{\mathrm{var}(h_i)\,\mathrm{var}(h_j)}} = \frac{\mathrm{var}(h_i)}{\sqrt{\mathrm{var}(h_i)\,\big(\mathrm{var}(h_i) + \mathrm{var}(n_{ij})\big)}} = \frac{1}{\sqrt{1 + d_S(h_i, h_j)}} \qquad (5)$$
According to (5), the correlation coefficient of the paired features is a decreasing function of their SNR distance. Increasing the SNR distance reduces the correlation in dissimilar pairs, and reducing the SNR distance increases the correlation in similar pairs. Therefore, by using the SNR distance instead of Euclidean distance, deep metric learning can jointly preserve the semantic similarity and the correlations in learned features.
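The link between correlation and SNR distance can be verified numerically under the stated conditions. In this sketch the noise is made exactly zero-mean and uncorrelated with the signal by centering and projection; that construction, the seed, and the noise scale are assumptions made so the identity holds to floating-point precision rather than only in expectation.

```python
import numpy as np

rng = np.random.default_rng(2)

h_i = rng.normal(size=128)
h_i -= h_i.mean()                            # zero-mean signal feature

noise = rng.normal(scale=0.7, size=128)
noise -= noise.mean()                        # zero-mean noise
noise -= (noise @ h_i) / (h_i @ h_i) * h_i   # remove any component correlated with h_i

h_j = h_i + noise                            # compared feature

d_s = np.var(noise) / np.var(h_i)            # SNR distance with h_i as anchor
corr = np.corrcoef(h_i, h_j)[0, 1]           # empirical correlation coefficient
predicted = 1.0 / np.sqrt(1.0 + d_s)         # value predicted from the SNR distance
print(corr, predicted)
```

Since 1/sqrt(1 + d) decreases monotonically in d, pushing the SNR distance of a pair up or down moves its correlation in the opposite direction.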
3.2 Deep SNR-based Metric Learning
Because of the superiority of the SNR distance metric, the SNR distance can provide a more effective similarity measurement than the Euclidean distance. Besides, the SNR distance can be generally applied to various objective functions of deep metric learning. In order to realize deep SNR-based metric learning (DSML), we select four popular deep metric learning structures, namely contrastive loss [9], triplet loss [22, 36], lifted structured loss [20], and N-pair loss [27], to construct our SNR-based objective functions.
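In DSML, we denote the learned features as $h$. For an anchor feature $h_i$, the positive feature is $h_i^+$, and the negative one is denoted as $h_i^-$. Based on the SNR distance metric, the distance between two features $h_i$ and $h_j$ in our DSML objective functions can be represented as:
$$d_S(h_i, h_j) = \frac{\mathrm{var}(h_i - h_j)}{\mathrm{var}(h_i)} \qquad (6)$$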
We use a regularization term to constrain the features to have zero-mean distributions, and the regularization is defined as:
(7) 
where is a hyperparameter with a small value.
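A zero-mean regularizer of this kind can be sketched as follows. The exact form is an assumption: here the penalty is the batch average of the absolute value of each feature vector's entry sum, scaled by a small weight. The function name `zero_mean_penalty` and the default weight `lam=0.005` are hypothetical.

```python
import numpy as np

def zero_mean_penalty(features, lam=0.005):
    """Assumed regularizer: lam * mean over the batch of |sum of a feature's entries|.

    features: array of shape (batch_size, embedding_dim).
    The penalty is zero exactly when every feature vector has zero mean.
    """
    return lam * np.mean(np.abs(features.sum(axis=1)))

batch = np.array([[1.0, -1.0, 0.5, -0.5],   # entries sum to 0
                  [2.0,  1.0, 0.0,  1.0]])  # entries sum to 4
penalty = zero_mean_penalty(batch)

# Centering each feature vector drives the penalty to zero.
centered = batch - batch.mean(axis=1, keepdims=True)
penalty_centered = zero_mean_penalty(centered)
print(penalty, penalty_centered)
```

Keeping the weight small lets the term nudge features toward zero mean, which is the condition under which the SNR distance reduces to a ratio of squared Euclidean distances, without dominating the main loss.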
Combined with the four learning structures, the SNRbased objective functions of our DSML are detailed in the following.
DSML(cont): For SNR-based contrastive embedding, our DSML objective function is:
(8) 
where and respectively represent the numbers of positive and negative pairs, denotes the margin to constrain the negative pairs, and denotes the function .
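As a rough illustration of an SNR-based contrastive objective, the sketch below averages the SNR distance over positive pairs and a hinge on the margin over negative pairs. The averaging scheme, the function names, and the toy inputs are assumptions, and the zero-mean regularization term is omitted for brevity.

```python
import numpy as np

def snr_distance(anchors, compared):
    """Row-wise SNR distance for batches of shape (num_pairs, embedding_dim)."""
    return np.var(compared - anchors, axis=1) / np.var(anchors, axis=1)

def snr_contrastive_loss(anchors, others, is_similar, margin=1.0):
    """Assumed form: mean d_S over positive pairs + mean max(0, margin - d_S) over negatives."""
    is_similar = np.asarray(is_similar, dtype=bool)
    d = snr_distance(anchors, others)
    pos = d[is_similar]
    neg = np.maximum(0.0, margin - d[~is_similar])
    pos_term = pos.mean() if pos.size else 0.0
    neg_term = neg.mean() if neg.size else 0.0
    return pos_term + neg_term

anchors = np.array([[1.0, -1.0, 2.0, -2.0],
                    [1.0, -1.0, 2.0, -2.0]])
others = np.array([[1.5, -0.5, 2.0, -3.0],    # similar pair, SNR distance 0.15
                   [-1.0, 1.0, -2.0, 2.0]])   # dissimilar pair, SNR distance 4.0
labels = [True, False]

loss_small_margin = snr_contrastive_loss(anchors, others, labels, margin=1.0)
loss_large_margin = snr_contrastive_loss(anchors, others, labels, margin=5.0)
print(loss_small_margin, loss_large_margin)
```

With a margin of 1.0 the dissimilar pair is already far enough apart and contributes nothing; raising the margin to 5.0 makes it active again.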
DSML(tri): For SNR-based triplet embedding, the objective function is defined as:
(9) 
which constrains that the positive SNR distance should be smaller than the negative SNR distance with a margin . In triplet embedding learning, we generate all the valid triplets and average the loss over the positive ones.
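The triplet variant can be sketched in the same spirit: a hinge on the gap between positive and negative SNR distances, averaged only over triplets whose loss is positive, as described above. The function names and toy triplets are assumptions, and the regularization term is again left out.

```python
import numpy as np

def snr_distance(anchors, compared):
    """Row-wise SNR distance for batches of shape (num_triplets, embedding_dim)."""
    return np.var(compared - anchors, axis=1) / np.var(anchors, axis=1)

def snr_triplet_loss(anchors, positives, negatives, margin=1.0):
    """Hinge on d_S(anchor, positive) - d_S(anchor, negative) + margin,
    averaged over the triplets with a positive loss value."""
    d_pos = snr_distance(anchors, positives)
    d_neg = snr_distance(anchors, negatives)
    losses = np.maximum(0.0, d_pos - d_neg + margin)
    active = losses > 0
    return losses[active].mean() if active.any() else 0.0

anchors = np.array([[1.0, -1.0, 2.0, -2.0],
                    [1.0, -1.0, 2.0, -2.0]])
positives = np.array([[1.5, -0.5, 2.0, -3.0],
                      [1.5, -0.5, 2.0, -3.0]])   # d_S = 0.15 for both
negatives = np.array([[-1.0, 1.0, -2.0, 2.0],    # d_S = 4.0   (easy negative)
                      [1.0, -1.0, 2.0, -1.0]])   # d_S = 0.075 (hard negative)

loss = snr_triplet_loss(anchors, positives, negatives, margin=1.0)
print(loss)
```

Only the second triplet violates the margin, so the average is taken over that single active triplet.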
DSML(lifted): For the SNR-based lifted loss function, we deploy the SNR distance as follows:
(10) 
where and denote positive pairs and negative pairs, denotes margin, and is a hyperparameter to ensure the convergence of loss.
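Scores (%) for image clustering (F1, NMI) and image retrieval (Recall@1, Recall@2); the number in parentheses is the embedding size.
| Method | F1 (16) | F1 (32) | F1 (64) | NMI (16) | NMI (32) | NMI (64) | Recall@1 (16) | Recall@1 (32) | Recall@1 (64) | Recall@2 (16) | Recall@2 (32) | Recall@2 (64) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| contrastive | 9.2 | 10.6 | 11.0 | 31.5 | 34.4 | 33.3 | 8.9 | 14.0 | 16.3 | 10.3 | 16.1 | 18.4 |
| DSML(cont) | 12.9 | 11.9 | 11.8 | 39.9 | 37.0 | 36.1 | 15.1 | 16.5 | 18.0 | 17.5 | 18.6 | 20.1 |
| triplet | 19.4 | 16.9 | 15.4 | 50.9 | 47.9 | 46.8 | 24.8 | 20.6 | 19.5 | 28.2 | 23.5 | 22.1 |
| DSML(tri) | 25.6 | 33.1 | 34.4 | 52.5 | 56.8 | 57.4 | 38.5 | 46.3 | 49.1 | 42.0 | 49.8 | 52.4 |
| lifted | 27.1 | 29.0 | 28.1 | 53.1 | 54.4 | 53.9 | 37.2 | 39.1 | 40.6 | 41.2 | 42.9 | 44.3 |
| DSML(lifted) | 30.2 | 32.1 | 33.6 | 54.1 | 55.6 | 56.7 | 35.3 | 40.3 | 43.8 | 38.9 | 44.0 | 47.5 |
| N-pair | 26.9 | 29.9 | 29.5 | 51.8 | 53.5 | 53.6 | 32.9 | 36.3 | 38.3 | 36.7 | 39.8 | 42.1 |
| DSML(N-pair) | 30.7 | 33.1 | 32.7 | 54.5 | 54.4 | 56.4 | 37.8 | 40.4 | 44.9 | 39.8 | 44.5 | 48.6 |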
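Scores (%) for image clustering (F1, NMI) and image retrieval (Recall@1, Recall@2); the number in parentheses is the embedding size.
| Method | F1 (16) | F1 (32) | F1 (64) | NMI (16) | NMI (32) | NMI (64) | Recall@1 (16) | Recall@1 (32) | Recall@1 (64) | Recall@2 (16) | Recall@2 (32) | Recall@2 (64) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| contrastive | 14.6 | 18.7 | 19.3 | 41.6 | 46.6 | 47.4 | 15.8 | 25.7 | 29.7 | 18.0 | 28.6 | 32.7 |
| DSML(cont) | 19.6 | 19.7 | 22.7 | 47.5 | 47.8 | 50.5 | 22.2 | 27.2 | 33.1 | 25.3 | 30.6 | 36.4 |
| triplet | 23.6 | 22.1 | 21.7 | 56.5 | 55.6 | 55.3 | 33.9 | 32.8 | 32.6 | 37.8 | 36.4 | 35.6 |
| DSML(tri) | 36.1 | 39.0 | 40.3 | 63.0 | 64.0 | 65.6 | 45.7 | 49.8 | 51.6 | 49.3 | 53.5 | 54.9 |
| lifted | 36.0 | 36.5 | 37.2 | 60.9 | 61.1 | 61.4 | 43.2 | 44.5 | 46.8 | 46.4 | 47.8 | 50.4 |
| DSML(lifted) | 41.3 | 43.9 | 45.8 | 63.5 | 64.5 | 65.4 | 46.0 | 48.8 | 51.0 | 49.4 | 51.9 | 54.4 |
| N-pair | 34.7 | 35.7 | 37.6 | 59.6 | 60.0 | 61.5 | 39.9 | 40.7 | 43.1 | 43.3 | 44.4 | 46.9 |
| DSML(N-pair) | 37.6 | 38.1 | 40.5 | 62.4 | 61.9 | 63.1 | 42.3 | 46.2 | 48.5 | 48.6 | 49.7 | 51.9 |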
DSML(N-pair): In the original N-pair loss, each tuplet $T_i$ is composed of $\{x_i, x_1^+, x_2^+, \ldots, x_N^+\}$, where $x_i$ is the query for $T_i$, $x_i^+$ is the positive example, and $x_j^+$ ($j \neq i$) are the negative examples. The N-pair loss function is constructed from similarity rather than distance, and the similarity is measured by the inner product of the features, which cannot be directly replaced by our SNR distance metric. Therefore, in our DSML(N-pair), we construct an SNR-based similarity to adapt our SNR-based metric to the N-pair learning framework. The similarity of $h_i$ and $h_j$ for DSML(N-pair) is:
(11) 
Then, the objective function of DSML(N-pair) is:
(12) 
In summary, the objective functions defined in our DSML are easy to formulate by following state-of-the-art methods in deep metric learning, which implies that our SNR-based metric has good generality and is promising for wide application in deep embedding learning.
3.3 Deep SNR-based Hashing Learning
Hashing learning methods aim to generate discriminative binary codes for image samples, where the binary codes of similar images have short Hamming distances and the binary codes of dissimilar images have long Hamming distances. To demonstrate the generality of our SNR-based metric, we deploy our SNR distance metric in deep hashing learning.
Using the SNR-based contrastive loss (8) as the objective function, we propose the Deep SNR-based Hashing method (DSNRH). The main difference between deep metric learning and deep hashing learning is that the learned embeddings need to be quantized to binary features in hashing. Thus, in our DSNRH, after learning the features, we use the sign function to generate binary codes for Hamming space retrieval, where each binary code has one bit per embedding dimension. Similar to existing hashing learning methods [14, 35], the similarity labels are given as follows: if two images $x_i$ and $x_j$ share at least one label, they are similar; otherwise they are dissimilar.
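The quantization and Hamming-space retrieval step can be sketched as below. The helper names `binarize` and `hamming_distances` are hypothetical, and mapping exact zeros to +1 is an assumption made so that every code entry is ±1 (NumPy's `np.sign` would return 0 there).

```python
import numpy as np

def binarize(features):
    """Quantize real-valued embeddings to {-1, +1} codes with the sign function.
    Zeros are mapped to +1 (an assumption) so each dimension yields exactly one bit."""
    return np.where(features >= 0, 1, -1).astype(np.int8)

def hamming_distances(query_codes, db_codes):
    """Hamming distance between every query code and every database code.
    For +/-1 codes of length K, distance = (K - inner product) / 2."""
    n_bits = query_codes.shape[1]
    inner = query_codes.astype(np.int32) @ db_codes.astype(np.int32).T
    return (n_bits - inner) // 2

query = np.array([[0.3, -1.2, 0.0, 2.1]])
database = np.array([[0.1, -0.4, 0.9, 1.0],
                     [-0.5, 0.2, -0.3, -0.8]])

dist = hamming_distances(binarize(query), binarize(database))
ranking = np.argsort(dist, axis=1)  # database indices sorted from nearest to farthest
print(dist, ranking)
```

Computing Hamming distance through an inner product keeps the whole ranking step a single matrix multiplication, which is convenient for large databases.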
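Scores (%) under Euclidean ranking (E) and Hamming ranking (H); MAP denotes MAP@59000, F1 denotes F1@5000, and the number in parentheses is the embedding size.
| Method | E MAP (16) | E MAP (32) | E MAP (64) | E F1 (16) | E F1 (32) | E F1 (64) | H MAP (16) | H MAP (32) | H MAP (64) | H F1 (16) | H F1 (32) | H F1 (64) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| contrastive | 75.5 | 73.4 | 69.3 | 69.1 | 67.2 | 61.4 | 65.5 | 66.9 | 61.8 | 61.2 | 62.2 | 56.9 |
| DSML(cont) | 80.0 | 79.8 | 79.0 | 72.9 | 72.7 | 72.1 | 73.7 | 76.6 | 76.9 | 70.0 | 72.2 | 71.4 |
| triplet | 75.9 | 77.3 | 75.8 | 70.7 | 71.2 | 70.3 | 71.9 | 73.7 | 74.3 | 67.3 | 70.2 | 69.8 |
| DSML(tri) | 78.4 | 78.3 | 77.4 | 72.4 | 72.5 | 71.6 | 73.4 | 74.5 | 75.3 | 69.9 | 70.8 | 70.8 |
| lifted | 63.7 | 54.6 | 55.5 | 60.6 | 52.0 | 52.0 | 60.3 | 52.1 | 53.9 | 54.9 | 50.0 | 50.8 |
| DSML(lifted) | 78.1 | 76.2 | 76.7 | 73.5 | 71.1 | 71.8 | 66.9 | 74.3 | 70.7 | 58.1 | 70.5 | 67.1 |
| N-pair | 53.5 | 51.1 | 39.5 | 49.5 | 47.5 | 37.8 | 48.4 | 48.9 | 38.6 | 45.9 | 46.4 | 37.3 |
| DSML(N-pair) | 62.1 | 64.1 | 56.6 | 57.1 | 58.8 | 52.1 | 55.2 | 62.0 | 53.6 | 50.2 | 57.3 | 49.6 |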
4 Experiments
We mainly conduct experiments on deep metric learning, and also compare our DSNRH with some state-of-the-art deep hashing methods.
4.1 Experiments on Deep Metric Learning
4.1.1 Datasets
We choose the fine-grained CARS196 and CUB-200-2011 datasets, and the coarse-grained CIFAR-10 dataset [12], for our deep metric learning experiments. We follow the conventional way to split the training and testing data:
(1) The CARS196 dataset [11] contains 16,185 images of 196 car models. The training set and testing set are composed of 8,144 images and 8,041 images, respectively, of the 196 models.
(2) The CUB-200-2011 dataset [32] includes 11,788 images of 200 bird species. The training set and testing set are composed of 5,994 images and 5,794 images, respectively, of the 200 classes.
(3) The CIFAR-10 dataset [12] contains 60,000 32x32 color images of 10 classes. We randomly select 100 images per class as the testing set, and use the remaining 59,000 images as the database set. From the database set, we randomly choose 500 images per class as the training set.
The experimental results on CARS196 and CUB-200-2011 are reported on the testing set, and the results on CIFAR-10 are reported by querying the database set with the testing set.
4.1.2 Implementation Details and Evaluation Metrics
Our method is implemented in TensorFlow. We adopt AlexNet [13] for deep metric learning. In order to generate d-dimensional features, we replace the last classifier layer with an embedding layer of d hidden units. For training, we fine-tune all layers except the embedding layer from the model pretrained on ImageNet and train the embedding layer, all through back-propagation. We use mini-batch stochastic gradient descent (SGD) with 0.9 momentum, and fix the mini-batch size at 100 images, except for the N-pair-related methods on CIFAR-10, for which it is set to 20 instead. All input images in these experiments are resized to 227 x 227 to fit the input size of AlexNet.
To evaluate the performance of different deep metric learning methods, we follow the protocol in [20, 34] and conduct experiments on both clustering tasks and retrieval tasks. For the clustering tasks, we run experiments on CUB-200-2011 and CARS196, and use NMI and the F1 score to measure the performance of different methods. NMI is defined as the ratio of the mutual information to the average of the entropy of the clusters and the entropy of the labels. The F1 metric is the harmonic mean of precision ($P$) and recall ($R$): $F1 = \frac{2PR}{P + R}$. For image retrieval tasks, we calculate Recall@K for the experimental results on CUB-200-2011 and CARS196, and record the MAP and F1 metrics for the experimental results on CIFAR-10. For Recall@K, each query scores 1 if a semantically similar image is retrieved among its K nearest neighbors in the test data. MAP is the mean of the Average Precision (AP), and the AP of each query is computed as $AP@T = \frac{\sum_{t=1}^{T} P(t)\,\delta(t)}{\sum_{t=1}^{T} \delta(t)}$, where $T$ is the number of top-returned images, $P(t)$ denotes the precision of the top $t$ retrieved results, and $\delta(t) = 1$ if the $t$-th retrieved result is a true neighbor of the query, otherwise $\delta(t) = 0$. We use MAP@59000 and F1@5000 as evaluation criteria for CIFAR-10, where MAP@59000 means the MAP over the top 59,000 returned images, and F1@5000 means the F1 score over the top 5,000 returned images.
| Method | CIFAR-10, 16 bits | CIFAR-10, 24 bits | CIFAR-10, 32 bits | CIFAR-10, 48 bits | Method | NUS-WIDE, 16 bits | NUS-WIDE, 24 bits | NUS-WIDE, 32 bits | NUS-WIDE, 48 bits |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DSRH [44] | 0.608 | 0.611 | 0.617 | 0.618 | DSRH [44] | 0.609 | 0.618 | 0.621 | 0.631 |
| DSCH [43] | 0.609 | 0.613 | 0.617 | 0.620 | DSCH [43] | 0.592 | 0.597 | 0.611 | 0.609 |
| DRSCH [43] | 0.615 | 0.622 | 0.629 | 0.631 | DRSCH [43] | 0.618 | 0.622 | 0.623 | 0.628 |
| DTSH [35] | 0.915 | 0.923 | 0.925 | 0.926 | DTSH [35] | 0.756 | 0.776 | 0.785 | 0.799 |
| DPSH* [14] | 0.903 | 0.885 | 0.915 | 0.911 | DPSH [14] | 0.715 | 0.722 | 0.736 | 0.741 |
| DSNRH(Ours) | 0.925 | 0.932 | 0.934 | 0.940 | DSNRH(Ours) | 0.830 | 0.840 | 0.852 | 0.862 |
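The retrieval metrics described in Section 4.1.2 can be made concrete with a short sketch. It assumes the standard forms of these metrics (F1 as the harmonic mean of precision and recall, AP normalized by the number of true neighbors among the top T results); the function names and toy relevance lists are hypothetical.

```python
import numpy as np

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

def recall_at_k(ranked_relevance, k):
    """ranked_relevance: (num_queries, num_returned) 0/1 array, columns ordered by rank.
    A query scores 1 if any of its top-k results is semantically similar."""
    ranked_relevance = np.asarray(ranked_relevance, dtype=bool)
    return float(np.mean(ranked_relevance[:, :k].any(axis=1)))

def average_precision(relevance, t):
    """AP over the top-t results: sum of P(k) * delta(k) divided by the number of hits."""
    rel = np.asarray(relevance[:t], dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_rank * rel).sum() / rel.sum())

ranked = [[0, 1, 0],   # first hit at rank 2
          [0, 0, 1]]   # first hit at rank 3
r1, r2, r3 = (recall_at_k(ranked, k) for k in (1, 2, 3))
ap = average_precision([1, 0, 1, 0], t=4)   # hits at ranks 1 and 3
f1 = f1_score(0.5, 1.0)
print(r1, r2, r3, ap, f1)
```

MAP is then simply the mean of `average_precision` over all queries.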
4.1.3 Results and Analysis
Table 1 and Table 2 show the performance of deep metric learning methods on CARS196 and CUB-200-2011; we obtain the results by comparing the Euclidean-based deep metric learning methods with our DSML under embedding sizes of 16, 32, and 64. We observe that the proposed SNR-based metric boosts the performance of state-of-the-art metric learning approaches in almost all settings; the one exception in the reported tables is lifted structured loss at embedding size 16, where DSML(lifted) has lower Recall@1 and Recall@2 (35.3 vs. 37.2 and 38.9 vs. 41.2). The experimental results on the CARS196 and CUB-200-2011 datasets show a similar tendency: combined with our DSML, contrastive, triplet, lifted, and N-pair loss all improve, and the gains are largest for triplet loss.
Figure 3 shows the retrieval results of Recall@K on CARS196 and CUB-200-2011 at an embedding size of 64. The results show that our DSML clearly outperforms the corresponding Euclidean-based methods. The most prominent curve in Figure 3 is DSML(tri), which has the highest performance among all methods.
Table 3 shows the comparative results of retrieval tasks on the CIFAR-10 dataset with two retrieval strategies: Euclidean ranking and Hamming ranking. Euclidean ranking is the general retrieval approach, which computes the Euclidean distance between real-valued features to generate the rank list. Hamming ranking operates on binary features and computes the Hamming distance. To obtain the binary codes in our experiment, we quantize the real-valued embedding with the sign function. As shown in Table 3, our DSML method still outperforms the related Euclidean-distance-based metric learning methods. The unsatisfactory results of lifted loss and N-pair loss suggest that these losses are less suitable for the CIFAR-10 dataset, which has a large number of images but only ten classes.
Figure 4 shows the t-SNE visualizations [19] of the features learned by DSML(cont) and contrastive loss on CIFAR-10. The result indicates that the features learned by our DSML(cont) exhibit clearer discriminative structures, while the original contrastive loss presents relatively vague structures.
The encouraging performance of our DSML arises because our SNR distance metric is better able to enlarge the inter-class distances and reduce the intra-class distances than the traditional Euclidean distance metric. Besides, our SNR distance metric can also preserve correlation information in image pairs, which improves the quality of the learned embeddings.
4.2 Experiments on Hashing Learning
4.2.1 Datasets
We evaluate the performance on two datasets, CIFAR-10 and NUS-WIDE, and the results are reported by querying the database set with the testing set.
(1) For CIFAR-10 [12], we randomly select 1000 images per class as the test query set, and the remaining images are used as the training set and database set.
(2) NUS-WIDE [4] consists of 269,648 images associated with 81 tags. Similar to DPSH [14] and DTSH [35], we use the 21 most frequent concepts to select 195,834 images as the experimental dataset. We randomly sample 100 images in each class (2,100 images in total) as the test query images, and the remaining images are used as the training set and database set.
4.2.2 Implementation Details and Evaluation Metrics
Similar to DPSH [14] and DTSH [35], we deploy the CNN-F network architecture in our DSNRH. The input images of our experiments are resized to 224 x 224. We again use mini-batch stochastic gradient descent (SGD) with 0.9 momentum, and set the mini-batch size to 100 images.
We report MAP@50000 results based on the top 50,000 returned neighbors, at binary code lengths of 16, 24, 32, and 48 bits. In order to have a fair comparison, most of the existing experimental results are taken directly from previous works.
4.2.3 Results and Analysis
We compare the retrieval performance of our DSNRH with five deep hashing methods: DPSH [14], DTSH [35], DRSCH [43], DSCH [43], and DSRH [44]. The MAP results of our experiment are presented in Table 4. Our DSNRH outperforms all the other methods at every code length on both datasets. The performance of some deep hashing methods, including DSRH, DSCH and DRSCH, is far below that of our method, with average MAP results of only around 60% on the two datasets. DPSH and DTSH are also based on the CNN-F network architecture, but they have lower precision. The outstanding performance of our DSNRH demonstrates that our SNR-based metric can also improve the robustness of hashing code learning.
5 Conclusion
In this paper, we propose a robust distance metric based on the Signal-to-Noise Ratio (SNR) as the similarity measurement for deep metric learning. By replacing the Euclidean distance measurement with our SNR distance metric, we construct deep SNR-based metric learning, which can generate more discriminative features than Euclidean-based deep metric learning. In extensive experiments on image clustering and retrieval tasks, our DSML has shown its superiority over state-of-the-art deep metric learning methods on three benchmarks. As an extension of our SNR-based metric, we also propose a deep SNR-based hashing method, and the experiments on two benchmarks show the outstanding performance of DSNRH. Given the generality of our SNR-based similarity metric, we believe it is promising for further application to more deep learning models.
Acknowledgement
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61871052, 61573068, 61471048, and 61375031, and by the Beijing Nova Program under Grant No. Z161100004916088.
References
[1] Binghui Chen and Weihong Deng. ALMN: Deep embedding learning with geometrical virtual point generating. arXiv preprint arXiv:1806.00974, 2018.
[2] Binghui Chen and Weihong Deng. Energy confused adversarial metric learning for zero-shot image retrieval and clustering. In AAAI Conference on Artificial Intelligence, 2019.
[3] Binghui Chen, Weihong Deng, Jiani Hu, and Haifeng Shen. Hybrid-attention based decoupled metric learning for zero-shot image retrieval. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[4] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. NUS-WIDE: a real-world web image database from National University of Singapore. In ICMR, page 48, 2009.
[5] Yin Cui, Feng Zhou, Yuanqing Lin, and Serge Belongie. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop. In CVPR, pages 1153–1162, 2016.
[6] Jason V Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S Dhillon. Information-theoretic metric learning. In ICML, pages 209–216. ACM, 2007.
[7] Lee H Dicker. Variance estimation in high-dimensional linear models. Biometrika, 101(2):269–284, 2014.
[8] Amir Globerson and Sam T Roweis. Metric learning by collapsing classes. In NIPS, pages 451–458, 2006.
[9] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, pages 1735–1742. IEEE, 2006.
[10] Chen Huang, Chen Change Loy, and Xiaoou Tang. Local similarity-aware deep feature embedding. In NIPS, pages 1262–1270, 2016.
[11] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In CVPRW, pages 554–561, 2013.
[12] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
[14] Wu-Jun Li, Sheng Wang, and Wang-Cheng Kang. Feature learning based deep supervised hashing with pairwise labels. In AAAI, pages 1711–1717, 2016.
[15] Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, pages 2197–2206, 2015.
[16] Kevin Lin, Jiwen Lu, Chu-Song Chen, and Jie Zhou. Learning compact binary descriptors with unsupervised deep neural networks. In CVPR, pages 1183–1192, 2016.
[17] Haomiao Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. Deep supervised hashing for fast image retrieval. In CVPR, pages 2064–2072, 2016.
[18] Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, and Ling Shao. Deep sketch hashing: Fast free-hand sketch-based image retrieval. In CVPR, pages 2862–2871, 2017.
[19] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
[20] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, pages 4004–4012, 2016.
 [21] Sakrapee Paisitkriangkrai, Chunhua Shen, and Anton Van Den Hengel. Learning to rank in person reidentification with metric ensembles. In CVPR, pages 1846–1855, 2015.
 [22] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In CVPR, pages 815–823, 2015.
 [23] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In NIPS, pages 41–48, 2004.
 [24] Shai ShalevShwartz, Yoram Singer, and Andrew Y Ng. Online and batch learning of pseudometrics. In ICML, page 94. ACM, 2004.
 [25] Fumin Shen, Yan Xu, Li Liu, Yang Yang, Zi Huang, and Heng Tao Shen. Unsupervised deep hashing with similarityadaptive and discrete optimization. IEEE transactions on pattern analysis and machine intelligence, 2018.
 [26] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training regionbased object detectors with online hard example mining. In CVPR, pages 761–769, 2016.
 [27] Kihyuk Sohn. Improved deep metric learning with multiclass npair loss objective. In NIPS, pages 1857–1865, 2016.
 [28] Yi Sun, Yuheng Chen, Xiaogang Wang, and Xiaoou Tang. Deep learning face representation by joint identificationverification. In NIPS, pages 1988–1996, 2014.
 [29] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to humanlevel performance in face verification. In CVPR, pages 1701–1708, 2014.
 [30] Evgeniya Ustinova and Victor Lempitsky. Learning deep embeddings with histogram loss. In NIPS, pages 4170–4178, 2016.
 [31] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In CVPR, pages 5018–5027, 2017.
 [32] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The caltechucsd birds2002011 dataset. 2011.
 [33] Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jingbin Wang, James Philbin, Bo Chen, and Ying Wu. Learning finegrained image similarity with deep ranking. In CVPR, pages 1386–1393, 2014.
 [34] Jian Wang, Feng Zhou, Shilei Wen, Xiao Liu, and Yuanqing Lin. Deep metric learning with angular loss. In ICCV, pages 2612–2620. IEEE, 2017.
 [35] Xiaofang Wang, Yi Shi, and Kris M Kitani. Deep supervised hashing with triplet labels. In ACCV, pages 70–84. Springer, 2016.
 [36] Kilian Q Weinberger, John Blitzer, and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, pages 1473–1480, 2006.
 [37] Kilian Q Weinberger and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(Feb):207–244, 2009.
 [38] Rongkai Xia, Yan Pan, Hanjiang Lai, Cong Liu, and Shuicheng Yan. Supervised hashing for image retrieval via image representation learning. In AAAI, pages 2156–2162, 2014.
 [39] Eric P Xing, Michael I Jordan, Stuart J Russell, and Andrew Y Ng. Distance metric learning with application to clustering with sideinformation. In NIPS, pages 521–528, 2003.
 [40] Peng Xu, Yongye Huang, Tongtong Yuan, Kaiyue Pang, YiZhe Song, Tao Xiang, Timothy M Hospedales, Zhanyu Ma, Jun Guo, et al. Sketchmate: Deep hashing for millionscale human sketch retrieval. In CVPR, pages 8090–8098, 2018.
 [41] Tongtong Yuan, Weihong Deng, and Jiani Hu. Supervised hashing with extreme learning machine. In VCIP, pages 1–4, 2018.
 [42] Tongtong Yuan, Weihong Deng, Jiani Hu, Zhanfu An, and Yinan Tang. Unsupervised adaptive hashing based on feature clustering. Neurocomputing, 323:373–382, 2019.
 [43] Ruimao Zhang, Liang Lin, Rui Zhang, Wangmeng Zuo, and Lei Zhang. Bitscalable deep hashing with regularized similarity learning for image retrieval and person reidentification. IEEE Transactions on Image Processing, 24(12):4766–4779, 2015.
 [44] Fang Zhao, Yongzhen Huang, Liang Wang, and Tieniu Tan. Deep semantic ranking based hashing for multilabel image retrieval. In CVPR, pages 1556–1564, 2015.