GTH
Recently, learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. Most existing learning to hash methods require sufficient training images to learn precise hashing codes. However, in some real-world applications, sufficient training images are not always available in the domain of interest. In addition, some existing supervised approaches need a large amount of labeled data, and labeling is expensive in terms of time, labor, and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH), where we borrow images from a different but related domain, i.e., the source domain, to help learn precise hashing codes for the domain of interest, i.e., the target domain. Because of the domain gap, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains. Furthermore, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning to hash methods. The implementation details are available at https://github.com/liuji93/GTH.
In recent years, learning to hash algorithms have been proposed to handle large-scale information retrieval problems in the machine learning, computer vision, and big data communities [Wang et al.2017b]. The main goal of hashing techniques is to encode documents, images, and videos into a set of compact binary codes that preserve the feature similarity/dissimilarity in Hamming space. As a result, binary features bring lower storage cost and faster computation. However, most existing learning to hash methods face two problems. On the one hand, they usually require a large amount of data instances to learn a set of binary hashing codes, but in some real-world applications the data instances of the domain of interest, i.e., the target domain, may not be sufficient to learn a precise hashing model. Moreover, some supervised methods need a large number of labeled images to learn hashing codes, and it is well known that tagging images takes a lot of time, labor, and human expertise. On the other hand, they assume that the distributions of training and testing data are similar, which may not hold in many real-world applications such as cross-pose and cross-camera cases.
To handle the above problems, inspired by transfer learning, we propose a simple yet effective Optimal Projection Guided Transfer Hashing (GTH) method in this paper. Due to the distribution disparity between the source and target domains, GTH learns two hashing projections, one for each domain. In this way, the knowledge from the source domain can readily help the target domain learn precise hashing codes. In transfer hashing, it is important to guarantee that similar images across the target and source domains have similar hashing codes. In our GTH, we assume that similarity between target and source images should imply a small discrepancy between the hashing projections. To this end, we keep the hashing projection of the target domain close to that of the source domain.
It is easy to directly minimize an $\ell_2$ or $\ell_1$ loss between the two hashing projections of the source and target domains. In other words, in terms of maximum likelihood estimation, the $\ell_2$ or $\ell_1$ loss implicitly assumes that the errors between the two projections obey a Gaussian or Laplacian distribution, respectively. However, the data distributions of the source and target domains are not similar due to cross pose, cross camera, illumination variation, etc. Therefore, the distribution of errors may be far from Gaussian or Laplacian. To address this problem, we formulate the GTH model from the view of maximum likelihood estimation. Inspired by [Yang et al.2011], we design an iteratively weighted loss for the errors between the projections of the source and target domains, which makes our GTH more adaptive to the cross-domain case.
Besides, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain disparity is reduced gradually. The two domains share their hashing projections with each other: the target projection learning is guided by the source projection and, in return, the source projection learning is guided by the target projection. Finally, the optimal projections of the target and source domains are obtained. An overview of our GTH is shown in Fig. 1. The main contributions and novelties of this paper are summarized as follows.
Inspired by transfer learning, we propose a simple Optimal Projection Guided Transfer Hashing (GTH) method. To the best of our knowledge, few methods have been proposed to handle the problem of insufficient training images for learning a precise model. We develop a fully unsupervised transfer hashing method to solve the cross-domain hashing problem for image retrieval based on conventional machine learning.
We propose to learn separate hashing projections for the target and source domains due to the domain disparity. The domain gap is reduced by modeling the hashing projections rather than the data themselves.
In our GTH, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains due to the domain gap, and design an iteratively weighted loss for the errors between the projections of the source and target domains such that high errors are penalized. Besides, the projections of the target and source domains are optimized in a shared way such that the domain hashing disparity is reduced gradually.
Extensive experiments on various benchmark databases have been conducted. The experimental results verify that our method outperforms many state-of-the-art learning to hash methods.
In this section, we present related works on learning to hash and transfer learning.
Over the past decade, various hashing methods have been proposed. Based on whether prior semantic information is used, they can be categorized into two major groups: supervised hashing and unsupervised hashing. There are many supervised hashing methods, such as LDA hashing [Strecha et al.2011], Minimal Loss Hashing [Norouzi and Fleet2011], FastHash [Lin et al.2014], Kernel-based Supervised Hashing (KSH) [Liu et al.2012], Supervised Discrete Hashing (SDH) [Shen et al.2015], Kernel-based Supervised Discrete Hashing (KSDH) [Shi et al.2016], and Supervised Quantization for similarity search (SQ) [Wang et al.2016], which preserve the similarity/dissimilarity of intra-class/inter-class images by using semantic information. However, label information is often lacking for model learning due to the high cost of labor and finance in some real-world situations.
Unsupervised hashing methods aim to explore the intrinsic structure of data to preserve the similarity of neighbors without any supervised information. A number of unsupervised hashing methods have been developed in recent years. Locality-sensitive Hashing (LSH) [Gionis, Indyk, and Motwani1999], a typical data-independent method, uses a set of randomly generated projections to transform image features into hashing codes. Representative unsupervised, data-dependent hashing methods include Spectral Hashing (SH) [Weiss, Torralba, and Fergus2008], Anchor Graph Hashing (AGH) [Liu et al.2011], Iterative Quantization (ITQ) [Gong et al.2013], Density Sensitive Hashing (DSH) [Jin et al.2014], Circulant Binary Embedding (CBE) [Yu et al.2014], etc. Several ranking-preserving hashing algorithms have been proposed recently to learn more discriminative binary codes, e.g., Scalable Graph Hashing (SGH) [Jiang and Li2015] and Ordinal Constraint Hashing (OCH) [Liu et al.2018].
Transfer learning (TL) [Pan and Yang2010], a recently proposed learning paradigm, aims to transfer knowledge across two different domains such that rich source-domain knowledge can be utilized to generate better classifiers on a target domain. In transfer learning, the transferred knowledge can be labels [Zhou et al.2014], [Yang et al.2017], features [Zhang and Zhang2016], [Xu et al.2017], [Yang et al.2016], [Wang, Zhang, and Zuo2017], and cross-domain correspondences [Zhang, Zuo, and Zhang2016], [Wang et al.2017a]. Transfer learning has shown promising results in many machine learning tasks, such as classification and regression. To the best of our knowledge, there are few works studying transfer learning for hashing, and most of them are based on deep learning [Venkateswara et al.2017]. The recent work [Zhou et al.2018] proposes a transfer hashing from shallow to deep. Different from their works, we focus on how to transfer knowledge across hashing projections in an unsupervised manner. It is worth noting that the labels of neither the target nor the source domain are used in our GTH.

In this section, we present a detailed discussion of the Optimal Projection Guided Transfer Hashing (GTH) method.
Suppose that we have target data points $X_t \in \mathbb{R}^{n_t \times d}$. We aim to learn a set of binary codes $B_t \in \{-1,1\}^{n_t \times r}$ that well preserves the feature information of the original dataset, where $B_t$ is the binary code matrix corresponding to $X_t$, and $n_t$, $d$, and $r$ denote the number of target domain samples, the dimension of each sample, and the code length of the binary feature, respectively. Similar to most learning to hash methods, we also learn a hashing projection $W_t \in \mathbb{R}^{d \times r}$ to map and quantize each sample $x_i$ into a binary code $b_i$. However, when the available target training data is limited, i.e., $n_t$ is small, the binary codes learned by existing learning to hash methods cannot perform well. In our GTH, we take advantage of the knowledge (i.e., features) of another known domain (i.e., the source domain). Suppose that we have already obtained source data points $X_s \in \mathbb{R}^{n_s \times d}$.

We denote $B_t = \mathrm{sgn}(X_t W_t)$ and $B_s = \mathrm{sgn}(X_s W_s)$, where $W_t$ is the hashing projection of the target domain, $W_s$ is the hashing projection of the source domain, and $\mathrm{sgn}(x)$ equals $1$ if $x \geq 0$ and $-1$ otherwise. In our GTH, to reduce the distribution discrepancy, we let the hashing projection of the target domain be close to that of the source domain:

$\min_{W_t, W_s} \|W_t - W_s\|_F^2$.  (1)
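As a concrete illustration of this projection-and-quantization scheme, here is a minimal NumPy sketch (the released GTH code is in Matlab, and the random orthogonal projection below is only a stand-in for the learned one):

```python
import numpy as np

rng = np.random.default_rng(0)

n_t, d, r = 100, 32, 16              # target samples, feature dim, code length
X_t = rng.standard_normal((n_t, d))  # stand-in target features

# A stand-in projection with orthonormal columns; GTH learns W_t jointly
# with the source projection W_s instead of drawing it at random.
W_t = np.linalg.qr(rng.standard_normal((d, r)))[0]

# Quantize: sgn(x) = 1 if x >= 0, else -1.
B_t = np.where(X_t @ W_t >= 0, 1, -1)

def hamming(b1, b2):
    """Hamming distance between two {-1,+1} codes of length r."""
    return int((r - b1 @ b2) // 2)
```

Retrieval then amounts to ranking database codes by Hamming distance to a query code, which is cheap on binary features.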
We denote by $E = W_t - W_s$ the error matrix, and $e_{ij}$ is one element of the error matrix. As discussed above, from the view of maximum likelihood estimation (MLE), the error matrix is assumed to follow a Gaussian distribution when using Eq. 1. However, the different data distributions of the source and target domains may cause the probability distribution of the error matrix to be far from Gaussian. Without loss of generality, we let $e = \mathrm{vec}(E) = [e_1, e_2, \ldots, e_{dr}]^{\top}$. Assume that $e_1, e_2, \ldots, e_{dr}$ are independently and identically distributed according to some probability density function (PDF) $f_\theta(e_k)$, where $\theta$ denotes the parameter set that characterizes the distribution. The likelihood can be represented as $L_\theta(e) = \prod_{k=1}^{dr} f_\theta(e_k)$, and MLE aims to maximize this likelihood function or, equivalently, minimize the negative log-likelihood $-\ln L_\theta(e) = \sum_{k=1}^{dr} \rho_\theta(e_k)$, where $\rho_\theta(e_k) = -\ln f_\theta(e_k)$. With the above analysis, Eq. 1 with an uncertain probability density function can be transformed into the following minimization problem:

$\min_{W_t, W_s} \sum_{k=1}^{dr} \rho_\theta(e_k)$.  (2)
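For instance, writing $\rho_\theta(e) = -\ln f_\theta(e)$ for the negative log-density, the two familiar special cases mentioned above fall out directly (a standard observation, stated here for completeness):

```latex
f_\theta(e) \propto \exp\!\Big(-\tfrac{e^2}{2\sigma^2}\Big)
  \;\Rightarrow\; \rho_\theta(e) = \tfrac{e^2}{2\sigma^2} + \mathrm{const}
  \quad (\text{Gaussian} \Rightarrow \ell_2 \text{ loss}),
\qquad
f_\theta(e) \propto \exp\!\Big(-\tfrac{|e|}{b}\Big)
  \;\Rightarrow\; \rho_\theta(e) = \tfrac{|e|}{b} + \mathrm{const}
  \quad (\text{Laplacian} \Rightarrow \ell_1 \text{ loss}).
```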
In general, we assume that the unknown PDF $f_\theta$ is symmetric and that a bigger error is assigned a lower probability, i.e., $f_\theta(e_1) < f_\theta(e_2)$ if $|e_1| > |e_2|$. Therefore, $\rho_\theta$ has the following properties: $\rho_\theta(0)$ is the global minimum of $\rho_\theta$; specially, we denote $\rho_\theta(0) = 0$; $\rho_\theta(e) = \rho_\theta(-e)$; and $\rho_\theta(e_1) > \rho_\theta(e_2)$ if $|e_1| > |e_2|$.
Denote $F_\theta(e) = \sum_{k=1}^{dr} \rho_\theta(e_k)$. We approximate $F_\theta$ by using its first-order Taylor expansion in the neighborhood of $e_0$:

$\tilde{F}_\theta(e) = F_\theta(e_0) + (e - e_0)^{\top} F'_\theta(e_0) + R_1(e)$,  (3)

where $R_1(e)$ is the second-order remainder term, and $F'_\theta$ is the derivative of $F_\theta$. Following [Yang et al.2011], the approximation can be rewritten as

$\tilde{F}_\theta(e) = \frac{1}{2} \sum_{k=1}^{dr} \Lambda_{kk}\, e_k^2 + c_{e_0}$,  (4)

where $c_{e_0}$ is a constant determined by $e_0$, and $\Lambda$ is a diagonal matrix with

$\Lambda_{kk} = \omega_\theta(e_{0,k}) = \dfrac{\rho'_\theta(e_{0,k})}{e_{0,k}}$,  (5)

where $\rho'_\theta$ represents the first derivative of $\rho_\theta$, and we randomly assign a value to $e_{0,k}$ when $e_{0,k} = 0$ and keep it otherwise. Because $0$ is the global minimum of $\rho_\theta$, we can get $\rho'_\theta(0) = 0$. We denote $e = \mathrm{vec}(W_t - W_s)$ such that we can obtain the following objective function:

$\min_{W_t, W_s} \sum_{k=1}^{dr} \omega_\theta(e_{0,k})\, e_k^2$.  (6)
It is obvious that each element of the diagonal weight matrix can be regarded as a weight coefficient for the corresponding error value. We expect that a higher error value is assigned a lower weight coefficient.
According to [Yang et al.2011] and [Zhang et al.2003], we also choose the sigmoid function as the weight function:
$\omega_\theta(e_k) = \dfrac{\exp(\mu\delta - \mu e_k^2)}{1 + \exp(\mu\delta - \mu e_k^2)}$,  (7)

where $\mu$ and $\delta$ are positive scalars. Parameter $\mu$ controls the decreasing rate from 1 to 0, and $\delta$ controls the location of the demarcation point. For the choice of $\mu$ and $\delta$, we follow [Yang et al.2011]. Considering Eq. 5, Eq. 7, and $\rho_\theta(0) = 0$, we obtain $\rho_\theta$ as follows:

$\rho_\theta(e_k) = -\dfrac{1}{2\mu}\Big(\ln\big(1 + \exp(\mu\delta - \mu e_k^2)\big) - \ln\big(1 + \exp(\mu\delta)\big)\Big)$.  (8)
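A small NumPy sketch of this sigmoid weighting (the values of the two scalars, called `mu` and `delta` here, are illustrative rather than the settings recommended in [Yang et al.2011]):

```python
import numpy as np

def weight(e, mu=8.0, delta=0.5):
    """Sigmoid weight as in Eq. 7: errors below the demarcation point
    get weights near 1, larger errors get weights near 0. mu sets the
    decreasing rate, delta the location of the demarcation point."""
    z = mu * (delta - e ** 2)
    return np.exp(z) / (1.0 + np.exp(z))

errors = np.array([0.0, 0.3, 0.7, 1.5])
w = weight(errors)
# Weights lie in (0, 1) and shrink as |e| grows, so large cross-domain
# projection errors contribute little to the weighted loss.
```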
Therefore, we can transform Eq. 6 into matrix form as the following objective function:

$\min_{W_t, W_s} \big\|\sqrt{M} \odot (W_t - W_s)\big\|_F^2$,  (9)

where $\odot$ denotes the element-wise product, the square root is taken element-wise, and $M_{ij} = \omega_\theta(e_{0,ij})$ with $E_0$ the error matrix from the last iteration (initialized with randomly chosen values). Note that $M$ collects, in matrix form, the diagonal weight elements introduced above.

It is worth noting that Eq. 9 can be viewed as an inductive model. If we let $M$ be the all-ones matrix, Eq. 9 reduces to Eq. 1, which assumes that the errors obey a Gaussian distribution. Specially, in this paper, GTH-h refers to Eq. 9 with $M$ given by Eq. 7, and GTH-g refers to Eq. 9 with $M$ being the all-ones matrix.
The quantization loss between the hashing codes and their signed magnitudes is used as a regularization term in GTH. Besides, we impose orthogonality constraints on the hashing projections. The overall objective function is as follows:

$\min_{W_t, W_s} \big\|\sqrt{M} \odot (W_t - W_s)\big\|_F^2 + \lambda_t \big\|\mathrm{sgn}(X_t W_t) - X_t W_t\big\|_F^2 + \lambda_s \big\|\mathrm{sgn}(X_s W_s) - X_s W_s\big\|_F^2$, s.t. $W_t^{\top} W_t = I$, $W_s^{\top} W_s = I$,  (10)

where $\lambda_t$ and $\lambda_s$ denote the regularization coefficients.
In this paper, we propose a weighted loss for the errors between the projections of the source and target domains, and update the weight coefficients by using the errors from the last iteration. As the non-convex $\mathrm{sgn}$ function makes Eq. 10 an NP-hard problem, we relax the $\mathrm{sgn}$ function as its signed magnitude [Lazebnik2011]. Therefore, Eq. 10 can be rewritten as

$\min_{B_t, B_s, W_t, W_s, M} \big\|\sqrt{M} \odot (W_t - W_s)\big\|_F^2 + \lambda_t \big\|B_t - X_t W_t\big\|_F^2 + \lambda_s \big\|B_s - X_s W_s\big\|_F^2$, s.t. $W_t^{\top} W_t = I$, $W_s^{\top} W_s = I$, $B_t \in \{-1,1\}^{n_t \times r}$, $B_s \in \{-1,1\}^{n_s \times r}$.  (11)
As mentioned above, we adopt a relaxed way to solve problem (10). The solution of optimization problem (11) can be calculated by alternately updating the variables $W_t$, $W_s$, $B_t$, $B_s$, and $M$.
$W_t$-Step. By fixing $W_s$, $B_t$, $B_s$, and $M$, the projection of the target domain can be obtained by solving the following subproblem:

$\min_{W_t} \big\|\sqrt{M} \odot (W_t - W_s)\big\|_F^2 + \lambda_t \big\|B_t - X_t W_t\big\|_F^2$, s.t. $W_t^{\top} W_t = I$.  (12)

Updating $W_t$ is a typical optimization problem with orthogonality constraints. We apply the optimization procedure in [Wen and Yin2013] to update $W_t$. Let $G_t$ be the partial derivative of the objective function with respect to $W_t$:

$G_t = 2 M \odot (W_t - W_s) - 2 \lambda_t X_t^{\top} (B_t - X_t W_t)$.  (13)
To preserve the orthogonality constraint on $W_t$, we first define the skew-symmetric matrix [Armstrong2005] as $A = G_t W_t^{\top} - W_t G_t^{\top}$. Then, we adopt a Crank-Nicolson-like scheme [Wen and Yin2013] to update the orthogonal matrix $W_t$:

$W_t^{(k+1)} = W_t^{(k)} - \frac{\tau}{2} A \big(W_t^{(k)} + W_t^{(k+1)}\big)$,  (14)

where $\tau$ denotes the step size, which we set empirically. By solving Eq. 14, we can get

$W_t^{(k+1)} = \Big(I + \frac{\tau}{2} A\Big)^{-1} \Big(I - \frac{\tau}{2} A\Big) W_t^{(k)}$,  (15)

and $W_t^{(k+1)\top} W_t^{(k+1)} = I$. We iteratively update $W_t$ several times based on Eq. 15 with the Barzilai-Borwein (BB) method [Wen and Yin2013].
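The update of Eq. 15 can be sketched as follows (NumPy; the gradient `G` is a random placeholder rather than the actual gradient of Eq. 13, and `tau` is an illustrative step size). The key property is that the Cayley-style transform keeps the projection's columns orthonormal:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 4
W = np.linalg.qr(rng.standard_normal((d, r)))[0]  # W^T W = I
G = rng.standard_normal((d, r))                    # placeholder gradient
tau = 0.1

# Skew-symmetric matrix A = G W^T - W G^T, so A^T = -A.
A = G @ W.T - W @ G.T

# Crank-Nicolson-like (Cayley) update:
#   W_new = (I + tau/2 * A)^{-1} (I - tau/2 * A) W
I = np.eye(d)
W_new = np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ W)
```

Since the Cayley transform of a skew-symmetric matrix is orthogonal, `W_new` keeps orthonormal columns for any step size, which is why the constraint never has to be re-projected.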
$W_s$-Step. By fixing $W_t$, $B_t$, $B_s$, and $M$, the projection of the source domain can be solved as:

$\min_{W_s} \big\|\sqrt{M} \odot (W_t - W_s)\big\|_F^2 + \lambda_s \big\|B_s - X_s W_s\big\|_F^2$, s.t. $W_s^{\top} W_s = I$.  (16)

Updating $W_s$ is the same as updating $W_t$. Let $G_s$ be the partial derivative of the objective function with respect to $W_s$:

$G_s = -2 M \odot (W_t - W_s) - 2 \lambda_s X_s^{\top} (B_s - X_s W_s)$.  (17)
To preserve the orthogonality constraint on $W_s$, we define the skew-symmetric matrix as $A_s = G_s W_s^{\top} - W_s G_s^{\top}$. Then, we adopt the Crank-Nicolson-like scheme to update the orthogonal matrix $W_s$:

$W_s^{(k+1)} = W_s^{(k)} - \frac{\tau}{2} A_s \big(W_s^{(k)} + W_s^{(k+1)}\big)$,  (18)

where $\tau$ denotes the step size, set empirically the same as for updating $W_t$. By solving Eq. 18, we can get

$W_s^{(k+1)} = \Big(I + \frac{\tau}{2} A_s\Big)^{-1} \Big(I - \frac{\tau}{2} A_s\Big) W_s^{(k)}$,  (19)

and $W_s^{(k+1)\top} W_s^{(k+1)} = I$. We iteratively update $W_s$ several times based on Eq. 19 with the Barzilai-Borwein (BB) method.
$B_t$-Step and $B_s$-Step. As $B_t$ and $B_s$ are two binary matrices, the solutions can be directly obtained as:

$B_t = \mathrm{sgn}(X_t W_t)$,  (20)

$B_s = \mathrm{sgn}(X_s W_s)$.  (21)
$M$-Step. The weight matrix $M$ is directly computed as follows:

$M_{ij} = \omega_\theta(e_{ij}) = \dfrac{\exp\big(\mu\delta - \mu e_{ij}^2\big)}{1 + \exp\big(\mu\delta - \mu e_{ij}^2\big)}$, with $e_{ij} = (W_t - W_s)_{ij}$.  (22)
The overall solving procedures are summarized in Algorithm 1.
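A simplified NumPy mock-up of the whole alternating scheme under illustrative settings (`lam_t`, `lam_s`, `mu`, `delta`, `tau` are assumed names and values; the released Matlab code is the reference implementation and additionally uses Barzilai-Borwein step sizes rather than the fixed step used here):

```python
import numpy as np

rng = np.random.default_rng(2)

def sgn(Z):
    return np.where(Z >= 0, 1, -1)

def weight(E, mu=8.0, delta=0.5):
    # Sigmoid weighting of Eq. 22, applied element-wise.
    z = mu * (delta - E ** 2)
    return np.exp(z) / (1.0 + np.exp(z))

def cayley_step(W, G, tau=0.1):
    # Orthogonality-preserving update of Eqs. 15/19.
    d = W.shape[0]
    A = G @ W.T - W @ G.T
    I = np.eye(d)
    return np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ W)

def gth(Xt, Xs, r=16, lam_t=0.1, lam_s=1.0, iters=20):
    d = Xt.shape[1]
    Wt = np.linalg.qr(rng.standard_normal((d, r)))[0]
    Ws = np.linalg.qr(rng.standard_normal((d, r)))[0]
    for _ in range(iters):
        Bt, Bs = sgn(Xt @ Wt), sgn(Xs @ Ws)   # B-steps (Eqs. 20-21)
        M = weight(Wt - Ws)                    # M-step (Eq. 22)
        # W_t-step: gradient of the weighted projection gap plus the
        # target quantization loss (cf. Eq. 13).
        Gt = 2 * M * (Wt - Ws) - 2 * lam_t * Xt.T @ (Bt - Xt @ Wt)
        Wt = cayley_step(Wt, Gt)
        # W_s-step: symmetric gradient w.r.t. Ws (cf. Eq. 17).
        Gs = -2 * M * (Wt - Ws) - 2 * lam_s * Xs.T @ (Bs - Xs @ Ws)
        Ws = cayley_step(Ws, Gs)
    return Wt, Ws

Xt = rng.standard_normal((50, 32))  # small synthetic target domain
Xs = rng.standard_normal((80, 32))  # small synthetic source domain
Wt, Ws = gth(Xt, Xs)
```

Each pass cycles through the B-steps, M-step, and the two projection steps, mirroring the alternation of Algorithm 1 on toy data.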
In this section, extensive experiments are conducted to evaluate the image retrieval performance of the proposed hashing method. We perform the experiments on three groups of benchmark datasets: PIE-C29&PIE-C05 from PIE [Sim, Baker, and Bsat2002], Amazon&Dslr from Office [Saenko et al.2010], and VOC2007&Caltech101 from VLCS [Torralba and Efros2011]. We also choose five state-of-the-art learning-to-hash methods as baselines: LSH [Gionis, Indyk, and Motwani1999], ITQ [Gong et al.2013], CBE [Yu et al.2014], DSH [Jin et al.2014], and OCH [Liu et al.2018]. For fair comparison, we also introduce a NoDA (no domain adaptation) baseline implemented with OCH.
Description of Datasets: The PIE dataset consists of 41,368 face images from 68 subjects in total. The images are taken under five near-frontal poses (C05, C07, C09, C27 and C29). We use two subsets chosen from poses C05 and C29. Each image is resized to 32x32 and represented by a 1024-dim vector. We use pose C29 (containing 1632 images) as the target domain and pose C05 (containing 3332 images) as the source domain. Specifically, for the target domain, we randomly select 500 samples as testing images and use the rest as training images.
Table 1: MAP scores (%) of all compared methods and our GTH; each dataset pair is evaluated at code lengths from 16 to 64 bits.

|       | PIE-C29&PIE-C05                    | Amazon&Dslr                        | VOC2007&Caltech101                 |
| Bit   | 16    | 24    | 32    | 48    | 64    | 16    | 24    | 32    | 48    | 64    | 16    | 24    | 32    | 48    | 64    |
| LSH   | 18.23 | 21.79 | 25.26 | 29.91 | 32.96 | 19.69 | 28.92 | 35.12 | 46.72 | 53.07 | 11.06 | 16.51 | 20.61 | 27.41 | 33.12 |
| ITQ   | 18.17 | 21.63 | 23.74 | 26.82 | 28.86 | 43.15 | 51.74 | 56.80 | 62.47 | 65.84 | 21.69 | 28.52 | 33.46 | 39.50 | 42.34 |
| CBE   | 16.31 | 22.13 | 27.10 | 30.06 | 32.51 | 20.82 | 27.60 | 36.21 | 47.52 | 51.96 | 11.04 | 15.64 | 20.68 | 26.97 | 33.84 |
| DSH   | 17.05 | 19.60 | 22.01 | 25.65 | 28.12 | 26.51 | 32.34 | 37.39 | 48.29 | 50.12 | 8.69  | 6.23  | 13.40 | 15.56 | 20.21 |
| OCH   | 20.75 | 26.29 | 28.96 | 33.33 | 34.39 | 41.77 | 52.41 | 56.00 | 62.38 | 65.45 | 32.94 | 35.45 | 38.00 | 41.46 | 42.25 |
| NoDA  | 21.06 | 24.76 | 26.51 | 32.11 | 32.34 | 41.64 | 51.96 | 57.21 | 63.29 | 65.63 | 30.77 | 34.81 | 36.95 | 40.78 | 41.80 |
| GTH-g | 24.16 | 28.40 | 31.69 | 34.95 | 35.70 | 44.16 | 53.57 | 57.59 | 63.91 | 66.96 | 28.62 | 41.20 | 46.42 | 56.59 | 63.10 |
| GTH-h | 25.45 | 29.42 | 31.76 | 35.25 | 36.56 | 45.23 | 52.36 | 57.26 | 63.17 | 65.63 | 30.05 | 39.70 | 48.14 | 57.33 | 63.53 |
The Office dataset is one of the most popular benchmarks for object recognition in the domain adaptation community. The dataset consists of daily objects in an office environment and has 3 domains: Amazon (A), Dslr (D), and Webcam (W). We use Amazon with 2817 images as the source domain and Dslr with 498 images as the target domain. 100 images from the target domain are randomly selected as testing images, and the rest are used as training images. Each image is represented by a 4096-dim CNN feature vector [Donahue et al.2013].
The VLCS benchmark aggregates photos from Caltech, LabelMe, Pascal VOC 2007 and SUN09. It provides a 5-way multiclass benchmark on the five common classes: bird, car, chair, dog and person. The VOC 2007 dataset, containing 3376 images, is used as the source domain and Caltech, containing 1415 images, as the target domain. 100 images from the target domain are randomly selected as testing images, and the rest are used as training images. Each image is represented by a 4096-dim CNN feature vector [Donahue et al.2013].
Parameter settings and Implementation details: There are two trade-off parameters in the objective function (10), which penalize the quantization loss between the binary codes and their signed magnitudes for the target and source domains. For our GTH, we empirically set them to 0.1 and 1, respectively.
The compared baseline methods were proposed under a no-domain-adaptation assumption. For a fair comparison, we use all the source domain data and the target domain training data (excluding the queries on the target domain) as model input for all compared methods. Besides, we use OCH as the NoDA method: in the training phase, only the training images of the target domain are used as its input. We only focus on the retrieval performance on the target domain.
Retrieval evaluation: In Table 1, we report the MAP scores of all the compared methods and our GTH on the PIE-C29&PIE-C05, Amazon&Dslr, and VOC2007&Caltech101 databases. The code lengths vary from 16 to 64. From the table, we can see that our GTH outperforms the compared methods on all databases in most cases. More specifically, our GTH-h outperforms the best compared method, NoDA, by over 4% on the PIE-C29&PIE-C05 datasets when the code length is set to 16 bits. On the Amazon&Dslr datasets, our GTH-h outperforms the best compared method, OCH, by almost 4% with the code length set to 16. On the VOC2007&Caltech101 databases, our GTH outperforms the best compared method by a much larger margin when the code length is set to 24, 32, 48, and 64. The above results demonstrate the effectiveness of our GTH model and show that GTH is well suited to the condition where there are not enough training images in the domain of interest to learn precise hashing codes. We also show the PR-curve, Precision, and Recall for the PIE-C29&PIE-C05 datasets in Fig. 2, the Amazon&Dslr dataset in Fig. 3, and the VOC2007&Caltech101 databases in Fig. 4. The code length is set to 32 in Figures 2, 3, and 4. From the figures, we can see that our GTH always presents competitive retrieval performance compared to the baselines, which demonstrates the effectiveness of our GTH.
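For reference, the MAP scores above average, over all queries, the average precision (AP) of each Hamming-ranked result list. A small sketch of AP on hypothetical 0/1 relevance lists (not the benchmark data):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a 0/1 list in rank order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    cum = np.cumsum(rel)
    # Precision at each rank where a relevant item appears.
    precision_at_hits = cum[rel == 1] / (np.flatnonzero(rel) + 1)
    return float(precision_at_hits.mean())

ap_perfect = average_precision([1, 1, 0, 0])  # -> 1.0
ap_worse = average_precision([0, 1, 0, 1])    # relevant items pushed down
```

MAP is then the mean of these per-query AP values over the whole query set.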
In order to further demonstrate the effectiveness of our GTH with less target training data, we use different numbers of training samples on the target domain to learn the hashing functions. Specifically, we choose 10%, 30%, 50%, and 70% of the images from the training data of the target domain as training data, i.e., model input. After training, we use the testing hashing codes to search for the most similar hashing codes among all the training samples. The experiments are conducted on the PIE-C29&PIE-C05, Amazon&Dslr, and VOC2007&Caltech101 databases, respectively. The MAP scores of all compared methods and our GTH are shown in Fig. 5. Due to the input-number limitation of the OCH method, MAP scores are missing in some cases. The code length is set to 32. It is worth noting that our GTH always outperforms all the compared methods, which further demonstrates its effectiveness when there are fewer target domain samples available to learn precise hashing codes in the domain of interest.
In order to further investigate the properties of the proposed method, the retrieval performance versus different values of the two regularization parameters is explicitly explored. To clearly show the results, we perform experiments on the Amazon&Dslr databases to verify parameter sensitivity. Specifically, we tune the value of both parameters over {0.0001, 0.001, 0.01, 0.1, 1, 10}. The MAP scores with the code length set to 64 are shown in Fig. 6. We observe that the performance of our GTH-g and GTH-h models is not very sensitive to the settings of the two parameters: when the parameters are not very large, the MAP scores of our methods are not severely influenced. This also demonstrates that both regularization terms are indispensable for superior performance. Overall, the proposed models are not sensitive to the parameters within a reasonable range.
We propose a simple but effective transfer hashing method named Optimal Projection Guided Transfer Hashing (GTH) in this paper. Inspired by transfer learning, we propose to borrow knowledge from a related but different domain. We assume that similarity between target and source images should imply a small discrepancy between the hashing projections. Therefore, we let the projections of the target and source domains stay close to each other so that similar instances across the two domains are transformed into similar hashing codes. We formulate the GTH model from the view of maximum likelihood estimation and design an iteratively weighted loss for the errors between the projections of the source and target domains, which makes our GTH more adaptive to the cross-domain case. Extensive experiments on three groups of benchmark databases have been conducted. The results show that our GTH achieves much higher retrieval performance when there are far fewer target samples, which verifies that our method outperforms many state-of-the-art learning to hash methods.
This work is supported by National Natural Science Fund of China (Grant 61771079) and the Fundamental Research Funds for the Central Universities.
References
Smith, G. D. 1965. Numerical Solution of Partial Differential Equations. Oxford University Press.
Lin, G., et al. 2014. Fast supervised hashing with decision trees for high-dimensional data. In Computer Vision and Pattern Recognition, 1971-1978.
Yang, M., et al. 2011. Robust sparse coding for face recognition. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 625-632. IEEE.
Zhang, J., et al. 2003. Modified logistic regression: an approximation to SVM and its applications in large-scale text categorization. In Twentieth International Conference on Machine Learning, 888-895.