Regression-based Hypergraph Learning for Image Clustering and Classification

03/14/2016 · Sheng Huang et al. · Rutgers University

Inspired by the recent remarkable successes of Sparse Representation (SR), Collaborative Representation (CR) and the sparse graph, we present a novel hypergraph model named Regression-based Hypergraph (RH), which utilizes regression models to construct high-quality hypergraphs. Moreover, we plug RH into two conventional hypergraph learning frameworks, namely hypergraph spectral clustering and hypergraph transduction, to present the Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing the image clustering and classification issues. Sparse Representation and Collaborative Representation are employed to instantiate two RH instances and their RHSC and RHT algorithms. The experimental results on six popular image databases demonstrate that the proposed RH learning algorithms achieve promising image clustering and classification performance, and also validate that RH can inherit the desirable properties of both hypergraph models and regression models.


I Introduction

As the generalization of the graph model [1, 2], the hypergraph model is more flexible and more intuitive for depicting the complex relations of data, since the edge of a hypergraph, known as a hyperedge, can contain more than two vertices. Due to this desirable property, hypergraph learning has recently drawn intensive attention. Over the past decades, extensive hypergraph learning approaches have been proposed and successfully applied to tackle many fundamental tasks, such as clustering [1, 3], classification [4, 5, 6], segmentation [7], dimensionality reduction [8, 9] and multi-label learning [10, 11].

As in graph learning, the hypergraph construction process plays a vital role in hypergraph learning, and a good-quality hypergraph should well reveal the real relations of the samples. Hypergraph learning is a frequently used tool for unsupervised and semi-supervised learning. In these two cases, the previous hypergraph learning works often adopt the neighbourhood-based (distance-based) strategy to build the hypergraph [1, 12, 13]. More specifically, for each sample, a hyperedge is generated by connecting this centroid sample and its nearest neighbors. However, such neighborhood-based strategies often cannot correctly discover the real relations of the samples and are also very sensitive to noise, so the quality of the hypergraph is lowered, which directly degrades the performance of hypergraph learning.

In the recent decade, Sparse Representation (SR) has achieved remarkable successes in addressing dozens of computer vision and machine learning issues [14, 15, 16]. The main merits of SR are its strong discriminating power and its excellent robustness to noise, which endow SR with a better capacity for selecting the samples related to a given sample in comparison with the conventional neighborhood-based approaches (we refer to a toy example in our early work [17] which experimentally verifies this argument; please see Fig. 1). In other words, SR can better discover the real relations of data. Motivated by this fact, several novel graph learning approaches have been developed by leveraging SR to construct the graphs [18, 17, 19, 20, 21]. Compared to the traditional graph learning approaches, these works have achieved better performance. Since the hypergraph model is the generalization of the graph model and hypergraph learning is closely related to graph learning, we believe that the success of SR in graph learning should also carry over to hypergraph learning. Moreover, SR is essentially an $\ell_0$- or $\ell_1$-norm regularized regression model, and its success has also motivated many influential regression models which often enjoy desirable properties of their own [14, 15, 22, 23, 24, 25]. Clearly, these successful works can also provide new ways of constructing graphs or hypergraphs.

(a) Raw samples and their rank scores
(b) Samples with noise and their rank scores
Fig. 1: This figure was originally shown in our early work [17]. We use it here to intuitively illustrate the advantages of SR over the conventional neighbourhood-based method in the relevant sample selection procedure. The figure shows the top 10 most relevant face images selected by SR and K-Nearest Neighbour (KNN) for a given query face image. This experiment is conducted on a subset of the FERET database [26] (72 subjects with 6 images per subject). The first two rows of the figure are the selection results of SR, while the last two rows are the selection results of KNN. The left subfigure reports the results on the original FERET database, while the right one reports the results on the modified FERET database in which 30% of the pixels of each image have been corrupted by noise. In the figure, the first face image of each image array is the query image and the remaining ten images are the relevant face images selected by SR or KNN. The histograms above each image array indicate the confidence scores of the top ten relevant face images; if the subjects of the returned face image and the query face image are identical, the corresponding bar is positive, otherwise it is negative. SR gets five hits on both the original FERET database and the noisy FERET database, while KNN gets only three and two hits on these two datasets respectively. Clearly, this phenomenon demonstrates the advantage of SR in relevant sample selection.

In this paper, we generalize the idea of the sparse graph to present a novel hypergraph construction framework which leverages regression models to construct high-quality hypergraphs. We name this framework the Regression-based Hypergraph (RH) model. More specifically, in RH, each sample represents a vertex and constructs a regression system together with the remaining samples for measuring the correlations of samples. Then, based on the obtained correlations, each sample and its top $k$ most relevant samples are employed to define a hyperedge. Moreover, the mean of the correlations among the samples in a hyperedge is considered the weight of this hyperedge, since the correlation is an intuitive measure of the closeness between two samples. We also plug the regression-based hypergraph into two classical hypergraph learning frameworks, namely hypergraph spectral clustering and hypergraph transduction, to present the Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing the clustering and classification issues. As two of the most influential regression models for visual learning, Sparse Representation (SR) and Collaborative Representation (CR) are adopted as two examples to instantiate two RH instances. Since SR and CR are actually the $\ell_1$-norm and $\ell_2$-norm regularized regression models, we name these two instances $\ell_1$-Hypergraph ($\ell_1$H) and $\ell_2$-Hypergraph ($\ell_2$H) respectively. Similarly, their hypergraph spectral clustering and hypergraph transduction algorithms are named $\ell_p$-Hypergraph Spectral Clustering ($\ell_p$HSC) and $\ell_p$-Hypergraph Transduction ($\ell_p$HT), where $p = 1$ or $2$ depending on whether $\ell_1$H or $\ell_2$H is applied.

The Regression-based Hypergraph (RH) model inherits the advantages of both hypergraph and regression models. Compared to the conventional hypergraph models, RH can incorporate properties of the chosen regression model. For example, if Sparse Representation (SR) or Collaborative Representation (CR) is selected for hypergraph construction, the constructed RH should be more discriminative and robust, since these two regression models are better at discovering the relevances among samples than the conventional neighborhood-based hypergraph construction fashions. Compared to the regression approaches, RH constructs a hypergraph to sufficiently exploit the correlation of each pair of samples, instead of just utilizing the correlations between the target sample and the other samples as the regression approaches do. Compared to the regression-based graph approaches, such as the sparse graph, RH is a hypergraph model which has a better capability and flexibility to depict complex high-order data relations.

We employ six popular visual databases to validate our work. The experimental results demonstrate the superiority of the RH model over the conventional hypergraph models. We summarize the three main contributions of our work as follows:

  1. We provide a general framework which utilizes regression models to construct high-quality hypergraphs. To the best of our knowledge, this paper is the first to formally and systematically build a bridge between the hypergraph model and the regression model.

  2. We present two novel hypergraph learning frameworks based on the RH model to tackle the clustering and classification tasks respectively.

  3. We adopt two recently influential regression models, namely Sparse Representation (SR) and Collaborative Representation (CR), to instantiate two RH instances called $\ell_1$-Hypergraph ($\ell_1$H) and $\ell_2$-Hypergraph ($\ell_2$H), which are experimentally shown to be more discriminative and robust than the conventional hypergraphs.

The rest of the paper is organized as follows: previous works are reviewed in Section II; Section III introduces the methodology of our work; experiments are presented in Section IV; the conclusion is finally summarized in Section V.

II Previous Works

II-A Regression Models

The regression model is a common technique for data analysis and has been successfully applied to almost all areas of computer vision, machine learning and image processing [27, 14, 22]. Sparse Representation (SR) may be the most influential regression approach of the recent decade. SR is mainly inspired by the idea of compressed sensing [28]. In SR, an $\ell_0$- or $\ell_1$-norm constraint is introduced to the common regression model to compulsively select only a few relevant measurements and ignore the irrelevant ones by assigning them zero regression coefficients. This endows SR with a strong discriminating power and a good robustness. However, Zhang et al. [22, 24] argued that the collaboration of samples, rather than the sparsity, is the essential factor behind this good discriminating ability and robustness. They proposed a linear regression model named Collaborative Representation (CR) which employs a relatively mild $\ell_2$-norm constraint in place of the $\ell_1$-norm constraint to achieve the collaboration property. Many works have shown that CR is more efficient and achieves similar or even better performance. Due to their desirable properties, SR and CR have achieved remarkable successes in many areas and have promoted many impressive regression approaches for addressing different computer vision, machine learning and image processing issues. For example, Gao et al. kernelized SR for face recognition and image classification [15]. Yuan et al. presented a multitask joint sparse representation model to combine the strength of multiple features and/or instances for visual classification [29]. Huang et al. presented an SR-based classifier named Class Specific Sparse Representation (CSSR) which incorporates the properties of both SR and CR [25] by defining the homogeneous samples as a group and making the groups compete to represent the test sample. Yang et al. proposed Relaxed Collaborative Representation (RCR) to effectively exploit the similarity and distinctiveness of samples [23]. Although these regression approaches have obtained promising performances in different fields, they all share an obvious drawback: they can only utilize the correlations between the testing sample and the training samples. On the contrary, the proposed Regression-based Hypergraph (RH) model can sufficiently exploit the correlations among all samples. Another merit of RH is that the extensive family of existing regression approaches brings more flexibility to the hypergraph model.

II-B Sparse Graph

Since Sparse Representation (SR) is good at selecting the relevant samples for a test sample even under noisy conditions, some researchers have attempted to use SR to construct high-quality graphs for addressing different issues. In these works, the constructed graphs are often called $\ell_1$-graphs or sparse graphs and have achieved very promising performances. More specifically, Qiao et al. and Timofte et al. successively used SR to construct a sparse graph for dimensionality reduction [19, 20]. Huang et al. leveraged SR to measure the correlations between each pair of samples and then constructed a sparse graph for transduction [30]. The Sparse Subspace Clustering (SSC) algorithms [21, 31, 32] learn a sparse graph for clustering by considering the data self-representation problem as an SR issue. Similar to [19, 20], Cheng et al. utilized SR to construct the $\ell_1$-graph (sparse graph) for spectral clustering, subspace learning and semi-supervised learning [33]. Although the applications and the learning (or construction) procedures of these works are very different, the obtained sparse graphs are very similar, and all demonstrate better discriminative abilities and robustness than the conventional graph models. The main drawback of the sparse graph models is that they cannot intuitively describe high-order complex data relations, because they are essentially graph models whose edges can only depict simple pairwise data relations. Since the Regression-based Hypergraph (RH) model can be deemed a generalization of the sparse graph from the perspectives of both regression and hypergraph, it does not suffer from this issue.

II-C Hypergraph Models

As a generalization of the graph, the hypergraph represents the structure of data by measuring the similarity between groups of points [13, 12, 4, 34, 1, 35]. The main difference between a graph and a hypergraph is that a hyperedge can own more than two vertices, which endows the hypergraph with a high flexibility for depicting high-order relations. Benefiting from this desirable property, hypergraph models have been successfully applied in dozens of computer vision, machine learning and pattern recognition areas. In the past, researchers were keener to develop different hypergraph frameworks which define different theories to depict the hypergraph structure. The representative approaches include Clique Expansion [2], Star Expansion [2], Zhou's Normalized Laplacian [1], Clique Averaging [36], Bolla's Laplacian [37] and so on. However, as was shown in [38], all of the previous approaches, despite their very different formulations, can be proved to be equivalent to each other under specific conditions. Currently, researchers pay more attention to developing algorithms for hypergraph construction under the aforementioned hypergraph frameworks. In a hypergraph, the hyperedges define the relations of the data; therefore, hyperedge generation is crucial to the quality of the constructed hypergraph. Conventionally, most hypergraph models adopt the neighbourhood-based fashion to generate the hyperedges. For example, Huang et al. proposed a hypergraph learning framework for image retrieval in which each image and its $k$-nearest neighbors form a hyperedge [12]. Zhou et al. also adopted such a neighbourhood-based fashion to generate the hyperedges for unsupervised and semi-supervised hypergraph learning [1]. The main problems of these approaches are that they often cannot well reveal the real relations of the data and are sensitive to noise. Some researchers have also employed clustering techniques to generate the hyperedges and then construct the hypergraph. As the representative approach of this category, Gao et al. proposed a hypergraph-based 3-D object retrieval approach which utilizes $k$-means to cluster the views of the 3-D objects and considers each cluster as a hyperedge [13]. Since the hyperedges, which are formed by the clusters, cannot share intersection vertices, these hypergraphs cannot capture the correlations of the data. Another popular method is to adaptively construct the hypergraph by imposing some meaningful constraints. As an instance of this category, Yu et al. introduced an $\ell_2$-norm constraint on the hyperedge weight matrix to present a hypergraph transduction approach for image classification [4]. This method generates the hyperedges by adaptively assigning the weights to the hyperedges; however, it cannot guarantee the absence of isolated vertices. Similar to the work [4], Wang et al. imposed a Laplacian cost constraint and an $\ell_1$-norm constraint on the hyperedge weights for adaptively learning the hyperedge weights in a hypergraph model [5]. In its hyperedge generation procedure, the traditional neighbourhood-based fashion is employed to define a candidate hyperedge vertex set, and then SR is applied to prune noisy vertices from this set to form the final hyperedge. This idea is similar to, but also different from, ours: it still considers the neighbours of a sample as its relevant samples, and SR only plays the role of a noise remover. On the contrary, the Regression-based Hypergraph (RH) model thoroughly utilizes the regression model (including SR) to generate the hyperedges, and RH introduces, in a more formal and systematic way, how to use regression models to construct high-quality hypergraphs.

III Methodology

III-A Regression-based Hypergraph

In order to incorporate some desirable properties of the regression algorithms, we introduce a new hypergraph learning framework named Regression-based Hypergraph (RH), which leverages different regression models to construct the hypergraphs. Let the $d \times n$ matrix $X = [x_1, \cdots, x_n]$ be the sample matrix, where $d$ is the dimension of a sample and $n$ is the number of samples. The $d$-dimensional column vector $x_i$ is a sample, which is also the $i$-th column of the sample matrix. We apply the general formulation of the regression model to estimate the correlations between each sample and the remaining samples,

$c_i = \arg\min_{c}\; \mathcal{E}(x_i - X_{-i}\,c) + \lambda\,\Omega(c)$   (1)

where $\mathcal{E}(\cdot)$ and $\Omega(\cdot)$ are the regression error and the regularization term respectively, and $\lambda$ is a positive trade-off parameter. $X_{-i}$ is the sample matrix with the $i$-th sample $x_i$ excluded. The $(n-1)$-dimensional column vector $c_i$ is the regression coefficient vector with respect to the sample $x_i$. Each element $c_{ij}$ of the regression coefficient vector encodes the correlation between the target sample $x_i$ and the sample $x_j$. According to this correlation computation fashion, each pair of samples obtains two correlations, i.e., the samples $x_i$ and $x_j$ have the two correlations $c_{ij}$ and $c_{ji}$. Extensive literature [18, 19, 39, 30, 40] shows that such a correlation between two samples is a high-quality similarity measure. Therefore, following the sample similarity computation fashion in [18, 19], we define the sample similarity as the mean of the absolute values of the two correlations of each pair of samples, which guarantees the nonnegativity and symmetry of the similarity. More specifically, the similarity between the samples $x_i$ and $x_j$ can be mathematically denoted as follows,

$S_{ij} = \dfrac{|c_{ij}| + |c_{ji}|}{2}$   (2)
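For concreteness, the regression step of Equation 1 and the similarity of Equation 2 can be sketched as follows. This is a minimal illustration using the CR instantiation (a ridge regressor, which has a closed-form solution); the function names and the default regularization weight `lam` are our own illustrative choices, not prescribed by the model.

```python
import numpy as np

def cr_coefficients(X, i, lam=0.1):
    # Eq. 1 with a CR-style ridge regularizer:
    #   c_i = argmin_c ||x_i - X_{-i} c||_2^2 + lam * ||c||_2^2
    # Closed form: c_i = (X_{-i}^T X_{-i} + lam I)^{-1} X_{-i}^T x_i
    Xi = np.delete(X, i, axis=1)                       # d x (n-1), sample i removed
    c = np.linalg.solve(Xi.T @ Xi + lam * np.eye(Xi.shape[1]),
                        Xi.T @ X[:, i])
    return np.insert(c, i, 0.0)                        # re-insert a zero self-correlation

def similarity_matrix(X, lam=0.1):
    # Eq. 2: S_ij = (|c_ij| + |c_ji|) / 2 -- nonnegative and symmetric
    n = X.shape[1]
    C = np.stack([cr_coefficients(X, i, lam) for i in range(n)])  # row i holds c_i
    S = (np.abs(C) + np.abs(C).T) / 2.0
    np.fill_diagonal(S, 0.0)                           # self-similarity is defined separately
    return S
```

An SR instantiation would replace the closed-form ridge solve with an $\ell_1$ solver; the rest of the pipeline is unchanged.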

The regression model cannot compute the self-correlations of samples; in other words, the self-similarity of a sample is not yet defined. In such a case, we define the self-similarity of a sample as the sum of the similarities between this sample and the remaining samples, $S_{ii} = \sum_{j \neq i} S_{ij}$. From the obtained similarities among all the samples, it is not hard to construct an $n \times n$ similarity matrix (or affinity matrix) $S$, where $S_{ij}$ is the $(i,j)$-th element of $S$. Then the normalization of the sample similarities can be done as follows,

$\bar{S} = D^{-1/2} S D^{-1/2}$   (3)

where the $n \times n$ matrix $D$ is a diagonal matrix whose $i$-th diagonal element is the sum of the elements in the $i$-th row of $S$, $D_{ii} = \sum_{j} S_{ij}$.

After obtaining the similarities, we define a hypergraph $G = (V, E)$ for depicting the relations among the samples, where $V$ and $E$ are the collections of its vertices and hyperedges respectively. In this hypergraph, each sample is deemed a vertex, i.e., the sample $x_i$ corresponds to the vertex $v_i$. As in the conventional hypergraph construction fashion, a sample and its top $k$ most similar samples are employed to define a $(k{+}1)$-length hyperedge $e_i$. Therefore, for a data collection of $n$ samples, we obtain $n$ hyperedges. The weight of a hyperedge is defined as the mean similarity of the samples in this hyperedge,

$w(e_i) = \dfrac{1}{p} \sum_{v_j, v_k \in e_i,\, j < k} \bar{S}_{jk}$   (4)

where $p$ is the number of pairs of vertices in the hyperedge $e_i$.
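The hyperedge generation and the mean-similarity weighting can be sketched as below; the helper name and the default `k` are illustrative, and the input is any symmetric nonnegative similarity matrix with a zero diagonal.

```python
import numpy as np

def build_hyperedges(S, k=3):
    # Each vertex i and its top-k most similar vertices form a (k+1)-length
    # hyperedge e_i; its weight (Eq. 4) is the mean similarity over all
    # pairs of vertices in the hyperedge.
    n = S.shape[0]
    H = np.zeros((n, n))              # n x n vertex-edge incidence matrix of RH
    w = np.zeros(n)
    for i in range(n):
        s = S[i].copy()
        s[i] = -np.inf                # a vertex is not its own neighbour
        e = np.concatenate(([i], np.argsort(-s)[:k]))
        H[e, i] = 1.0
        pairs = [(u, v) for a, u in enumerate(e) for v in e[a + 1:]]
        w[i] = np.mean([S[u, v] for u, v in pairs])
    return H, w
```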

III-B Learning with Regression-based Hypergraph

Spectral clustering and hypergraph transduction are the most common unsupervised and semi-supervised hypergraph learning techniques respectively. In this subsection, we apply our Regression-based Hypergraph (RH) model to these two techniques to validate the effectiveness of our model. We develop a novel spectral clustering framework and a novel hypergraph transduction framework and name them Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) respectively. We begin by introducing some common definitions of hypergraph learning [1]. The degree of a vertex is the sum of the weights of the hyperedges incident to it, and the degree of a hyperedge is the number of vertices in the hyperedge. Mathematically, the degrees of a vertex and a hyperedge are respectively formulated as $d(v) = \sum_{e \in E,\, v \in e} w(e)$ and $\delta(e) = |e|$. The vertex-edge incidence matrix $H$ is a common tool for depicting the structure of a hypergraph: each of its rows and columns corresponds to a vertex and a hyperedge of the hypergraph respectively. More specifically, for a hypergraph consisting of $n$ vertices and $m$ hyperedges, its vertex-edge incidence matrix $H$ is an $n \times m$ binary matrix. If the vertex $v_i$ is on the hyperedge $e_j$, the $(i,j)$-th element of $H$ is 1, and otherwise 0. Due to the hyperedge generation fashion of RH, the dimension of its vertex-edge incidence matrix is $n \times n$.
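The degree definitions and the incidence matrix above are all that is needed to assemble the normalized hypergraph Laplacian used in the rest of this subsection. A sketch under those definitions (the function name is our own):

```python
import numpy as np

def hypergraph_laplacian(H, w):
    # Zhou-style normalized hypergraph Laplacian:
    #   Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    dv = H @ w                     # d(v): sum of weights of incident hyperedges
    de = H.sum(axis=0)             # delta(e): number of vertices in each hyperedge
    Dv_is = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_is @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dv_is
    return np.eye(H.shape[0]) - Theta
```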

From the perspective of graph learning, spectral clustering is actually a graph (or hypergraph) partition issue [1, 41, 42]. Thus, we can consider the regression hypergraph-based spectral clustering problem as a normalized hypergraph cut issue. According to Zhou's work [1], such an issue can be solved by the following optimization model,

$\min_{F}\; \dfrac{1}{2} \sum_{e \in E} \sum_{u, v \in e} \dfrac{w(e)}{\delta(e)} \left\| \dfrac{F_u}{\sqrt{d(u)}} - \dfrac{F_v}{\sqrt{d(v)}} \right\|^2, \quad \mathrm{s.t.}\; F^T F = I$   (5)

where the $n \times c$ matrix $F = [f_1, \cdots, f_c]$ is the collection of the hypergraph cuts of the given regression hypergraph $G$. The $n$-dimensional column vector $f$ is a hypergraph cut which introduces a binary partition to the given hypergraph, and its elements indicate the confidences with which the corresponding vertices belong to a subgraph after partition. $F_u$ is the row of the matrix $F$ which encodes the elements of the hypergraph cuts corresponding to the vertex $u$.

According to the normalized cut criterion [42], the optimal hypergraph cuts should maximize the compactness of the partitioned subgraphs and simultaneously minimize the compactness of the boundaries between the subgraphs. The compactness of subgraphs and boundaries is measured by the normalized summation of the hyperedge weights of a vertex set. With several reductions, Equation 5 can be further translated into the following matrix expression,

$\min_{F}\; \mathrm{tr}(F^T \Delta F), \quad \mathrm{s.t.}\; F^T F = I$

where the function $\mathrm{tr}(\cdot)$ returns the trace of a matrix. $W$, $D_v$ and $D_e$ are the diagonal matrix forms of $w(e)$, $d(v)$ and $\delta(e)$ respectively, $I$ is the identity matrix, and $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$ is the derived normalized hypergraph Laplacian matrix which encodes the structure of the regression hypergraph $G$. The detailed deductions of this equation can be found in the works [1, 11].

This matrix-form problem is a typical eigenvalue problem and can be easily solved by the eigenvalue decomposition technique. The top $c$ optimal hypergraph cuts are exactly the $c$ eigenvectors corresponding to the smallest nonzero eigenvalues. Finally, the learned hypergraph cut collection $F$ is deemed the new representation of the data for clustering.
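The eigen-decomposition step can be sketched as follows; `tol` (the threshold separating zero from nonzero eigenvalues) is an illustrative choice.

```python
import numpy as np

def rhsc_embedding(Delta, c, tol=1e-10):
    # The top-c hypergraph cuts of the trace-minimization problem are the
    # eigenvectors of Delta with the c smallest nonzero eigenvalues; each
    # row of the returned matrix is the new representation of one sample.
    vals, vecs = np.linalg.eigh(Delta)   # eigenvalues in ascending order
    keep = vals > tol                    # drop the (near-)zero eigenvalues
    return vecs[:, keep][:, :c]
```

The rows of the embedding would then be grouped with an off-the-shelf clustering algorithm such as $k$-means.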

Graph-based transduction is a semi-supervised learning technique which is often leveraged to address labeling and classification issues. In the semi-supervised case, the labels of some of the data are available. Therefore, an optimal hypergraph cut should not only minimize the loss of the geometric structure of the data (the loss of the data relations) but also minimize the labeling error. Conventionally, the labeling error is measured by the Euclidean distance between the labels and the hypergraph cuts, since the hypergraph cuts can be deemed the collection of the label indicators of the vertices. Thus, the original hypergraph partition model in Equation 5 can be further improved into the following regularized hypergraph partition model which takes the labeling error of the data into account,

$\min_{F}\; \mathrm{tr}(F^T \Delta F) + \mu \| F - Y \|_F^2$   (7)

where $\mu$ is a positive parameter to reconcile these two losses and the $n \times c$ matrix $Y = [y_1, \cdots, y_c]$ is the collection of labels, whose column $y_j$ is the label vector of the $j$-th class. Let $Y_{ij}$ denote the label of the vertex $v_i$. Then $Y_{ij} = 1$ or $-1$ if the vertex $v_i$ belongs to the $j$-th class or to another class respectively, and $Y_{ij} = 0$ if the vertex is unlabeled. Note that here the collection of hypergraph cuts $F$ has the same size as $Y$, namely $n \times c$.

This problem is a typical Least Squares (LS) problem which can be efficiently solved. According to the works [1, 4], its solution is as follows,

$F = \mu (\Delta + \mu I)^{-1} Y$   (8)

After obtaining $F$, the labeling (or classification) of the $i$-th sample is accomplished by assigning it to the $j$-th class that satisfies $j = \arg\max_{j} F_{ij}$.
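The closed-form transduction step and the subsequent label assignment can be sketched as below (the function name and default `mu` are illustrative).

```python
import numpy as np

def rht_predict(Delta, Y, mu=1.0):
    # Eq. 8 closed form: F = mu * (Delta + mu I)^{-1} Y, followed by
    # assigning sample i to the class j with the largest F_ij.
    n = Delta.shape[0]
    F = mu * np.linalg.solve(Delta + mu * np.eye(n), Y)
    return F.argmax(axis=1)
```

As a toy check, with a hypergraph of two disjoint three-vertex hyperedges and one labelled vertex per group, the labels propagate to the whole group.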

III-C Two RH Instances: $\ell_1$-Hypergraph and $\ell_2$-Hypergraph

Sparse Representation (SR) and Collaborative Representation (CR) are two of the most influential recent regression models in computer vision and machine learning. We employ them to instantiate two Regression-based Hypergraph (RH) instances. Since SR and CR are actually the $\ell_1$-norm and $\ell_2$-norm regularized regression models, we name these two RH instances $\ell_1$-Hypergraph ($\ell_1$H) and $\ell_2$-Hypergraph ($\ell_2$H) respectively. The detailed information of $\ell_1$H and $\ell_2$H is presented in Table I. We have also plugged $\ell_1$H and $\ell_2$H into the Regression-based Hypergraph Transduction (RHT) and Regression-based Hypergraph Spectral Clustering (RHSC) frameworks to produce four new semi-supervised or unsupervised hypergraph learning approaches, respectively named $\ell_1$-Hypergraph Transduction ($\ell_1$HT), $\ell_2$-Hypergraph Transduction ($\ell_2$HT), $\ell_1$-Hypergraph Spectral Clustering ($\ell_1$HSC), and $\ell_2$-Hypergraph Spectral Clustering ($\ell_2$HSC). The relations of these approaches are shown in Table II. In the next section, we apply these four RH instance algorithms to demonstrate the superiorities of our models over the conventional hypergraph models and to verify the assumption that RH inherits desirable properties from the chosen regression models.

Name | Regression model | $\mathcal{E}(\cdot)$ in Equation 1 | $\Omega(\cdot)$ in Equation 1
$\ell_1$H | SR [14] | $\|\cdot\|_2^2$ | $\|\cdot\|_1$
$\ell_2$H | CR [22] | $\|\cdot\|_2^2$ | $\|\cdot\|_2^2$
TABLE I: The detailed information of the two mentioned RH instances.
RH Instances | RHT (Semi-Supervised) | RHSC (Unsupervised)
$\ell_1$H (SR [14]) | $\ell_1$HT | $\ell_1$HSC
$\ell_2$H (CR [22]) | $\ell_2$HT | $\ell_2$HSC
TABLE II: The relations of the proposed four hypergraph learning approaches.

IV Experiments

In this section, we conduct experiments which employ the four aforementioned RHSC and RHT instances, namely $\ell_1$HSC, $\ell_2$HSC, $\ell_1$HT and $\ell_2$HT, to tackle the image clustering and classification tasks respectively. Since these four algorithms are all generated from Sparse Representation (SR) or Collaborative Representation (CR), which enjoy robustness to noise and occlusion, we also conduct experiments to examine whether these four algorithms inherit this desirable property.

(a) AR
(b) ORL
(c) COIL20
(d) ETH80
(e) Scene15
(f) Caltech256
Fig. 2: The samples from six involved image databases.

IV-A Datasets and Compared Methods

Six image datasets, named AR [43], ORL [44], COIL20 [45], ETH80 [46], Scene15 [47] and Caltech256 [48], are leveraged for validating our work. The AR face database consists of more than 4,000 color images of 126 subjects [43]. Following the papers [27, 8], a subset containing 2,600 images of 100 subjects is constructed for our experiments. Each subject has 26 images. The first 14 images of each subject do not involve any occlusion, while the remaining 12 images do: the faces in the first six of these are occluded by sunglasses and the faces in the other six are occluded by scarves. In the general image classification and clustering experiments, only the images without any occlusion are utilized; the whole dataset is leveraged to analyze the robustness of the proposed work to disguise. The size of a face image in the AR database is 60×43 pixels. The ORL database is a face image database which contains 400 images of 40 subjects [44]. Each subject has ten images acquired at different times. The size of a face image in the ORL database is 32×32 pixels. The COIL20 database has 20 objects, and each object has 72 images obtained by rotating the object through 360° in 5° steps (1,440 images in total) [45]. The size of each image in the COIL20 database is 32×32 pixels. The ETH80 object database [46] contains 80 objects from 8 categories. Each object is represented by 41 views spaced evenly over the upper viewing hemisphere (3,280 images in total). The original size of each image in this dataset is 128×128 pixels; we resize them to 32×32 pixels. The Scene15 database [47] is a scene database which has 15 classes with 100 samples per category. Following the paper [1], a subset of the Caltech256 database [48], which has 20 classes with 100 samples per category, is used in our experiments. We directly use the grayscale pixels as the feature on the ORL, AR, COIL20 and ETH80 databases. PiCoDes [49] is adopted to represent the images on the Scene15 and Caltech256 databases, since they are more challenging. The dimension of the PiCoDes feature is 2048. Figure 2 shows some samples from these six image databases.

Nonnegative Matrix Factorization (NMF) [50], Graph regularized Nonnegative Matrix Factorization (GNMF) [51], Normalized Cut (NCut) [42], Normalized Hypergraph Spectral Clustering (NHSC) [1], Large-scale Spectral Clustering (LSC) [52], Clique Expansion-based Hypergraph Spectral Clustering using Matrix Trace Weights (CEHSC+Trace) [3], Normalized Hypergraph Spectral Clustering using the mean of distances between the centroid and the vertices in a hyperedge (NHSC+Cent) [3], and $\ell_1$-Graph Spectral Clustering ($\ell_1$GSC) [18] are chosen as the compared approaches in the image clustering experiments. NCut can be deemed a sort of regular graph spectral clustering algorithm, so there are three graph spectral clustering algorithms: NCut, LSC and $\ell_1$GSC. NHSC, CEHSC+Trace and NHSC+Cent are three hypergraph spectral clustering methods. The only difference between NHSC and NHSC+Cent is the weighting of the hyperedges: NHSC uses the mean of the distances between each pair of samples in a hyperedge, while NHSC+Cent uses the mean of the distances between the hyperedge centroid and the samples in the hyperedge. NHSC+Cent and CEHSC+Trace are referenced from our recent work [3], which empirically studied the effect of hyperedge weighting schemes on hypergraph learning and selected the optimal weighting schemes for different hypergraph frameworks. NHSC+Cent and CEHSC+Trace are the best hyperedge weighting scheme and hypergraph spectral clustering combinations for addressing the clustering issue reported in [3].

We employ the Sparse Representation-based Classifier (SRC) [14], the Collaborative Representation-based Classifier (CRC) [22], the LIB-Support Vector Machine (LIBSVM) [53], Normalized Hypergraph Transduction (NHT) [1], Graph Transduction (GT) [54], the Adaptive Hypergraph-based Classifier (AHC) [4], Clique Expansion-based Hypergraph Transduction using Matrix Trace Weights (CEHT+Trace) [3], Normalized Hypergraph Transduction using the Volume of Simplex Weights (NHT+Volume) [3], and the Sparse Graph-based Classifier (SGC) [17] as the compared approaches for image classification. GT and SGC are graph transduction algorithms, while AHC, NHT, CEHT+Trace and NHT+Volume are hypergraph transduction algorithms. CEHT+Trace and NHT+Volume are the best hyperedge weighting scheme and hypergraph transduction combinations for addressing the classification issue reported in [3]. In the experiments, all the compared methods are well tuned.

Methods          Clustering Accuracy                          Normalized Mutual Information (NMI)
                 AR    ORL   COIL20 ETH80 Scene15 Caltech256  AR    ORL   COIL20 ETH80 Scene15 Caltech256
NMF [50]         24.29 51.25 60.59  46.86 59.33  38.25        56.12 70.31 70.89  41.13 58.41  38.89
GNMF [51]        28.86 65.75 82.22  52.16 62.87  39.80        60.52 82.19 89.99  46.93 61.32  38.45
NCut [42]        61.79 67.75 69.60  45.61 56.87  38.00        80.71 82.01 77.00  38.02 60.21  38.69
NHSC [1]         36.71 66.75 76.60  50.85 58.87  36.40        64.34 81.57 85.26  47.76 58.46  37.32
LSC [52]         35.86 66.00 76.04  55.55 66.33  43.95        65.43 82.39 86.13  56.22 64.01  41.32
CEHSC+Trace [3]  35.50 70.50 82.29  51.89 67.47  36.30        64.00 82.75 89.12  46.83 62.03  34.01
NHSC+Cent [3]    36.71 69.50 76.60  46.86 63.33  44.15        64.36 81.62 85.26  46.72 59.53  43.29
GSC [18]         59.93 69.50 68.13  51.77 56.27  34.30        78.77 82.50 77.76  50.00 56.74  37.86
ℓ1-HSC           70.21 78.25 82.85  56.34 67.60  42.15        86.71 87.21 90.99  58.72 64.40  41.88
ℓ2-HSC           73.43 79.50 82.15  53.38 69.20  44.95        86.73 87.22 89.26  53.97 65.80  43.60

TABLE III: Image clustering performance comparison (in percents) on AR, ORL, COIL20, ETH80, Scene15 and Caltech256 databases.

IV-B Image Clustering

We conduct image clustering experiments on all six image databases. For each database, the number of clusters is fixed to its number of categories. Following [51, 55], Clustering Accuracy and Normalized Mutual Information (NMI) are leveraged as the two evaluation metrics.
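The two metrics can be computed as in the following minimal sketch, assuming scikit-learn and SciPy are available; the `clustering_accuracy` helper name is ours, not one defined in the paper. Clustering Accuracy requires matching predicted cluster labels to ground-truth classes, which is done here with the Hungarian algorithm.

```python
# Minimal sketch of the two clustering metrics (assumed implementation,
# not the authors' code): best-match Clustering Accuracy and NMI.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy after optimally permuting cluster labels onto classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency counts between predicted clusters and true classes.
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    # Hungarian algorithm on negated counts finds the best label mapping.
    rows, cols = linear_sum_assignment(-cost)
    return cost[rows, cols].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same partition, labels permuted
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
```

Both metrics are invariant to how clusters are numbered, which is why the permuted labeling above still scores perfectly.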

Table III reports the clustering performances of the different clustering methods; the bold numbers indicate the best accuracy on each dataset under each metric. We can find that ℓ1-HSC and ℓ2-HSC outperform almost all compared approaches in all experiments and achieve remarkable improvements over the conventional hypergraph spectral clustering algorithms. For example, the clustering accuracy gains of ℓ1-HSC over NHSC are 33.50%, 11.50%, 6.25%, 5.49%, 8.73% and 5.75% on the AR, ORL, COIL20, ETH80, Scene15 and Caltech256 datasets respectively; similarly, such gains of ℓ2-HSC are 36.72%, 12.75%, 5.55%, 2.53%, 10.33% and 8.55%. The experimental results also show that, as generalizations of sparse graph spectral clustering, ℓ1-HSC and ℓ2-HSC obtain better performances than GSC, which is a sparse graph spectral clustering algorithm. More specifically, the NMI gains of ℓ1-HSC over GSC on the AR, ORL, COIL20, ETH80, Scene15 and Caltech256 datasets are 7.94%, 4.71%, 13.23%, 8.72%, 7.66% and 4.02% respectively, and the NMI improvements of ℓ2-HSC over GSC on the same datasets are 7.96%, 4.72%, 11.50%, 3.97%, 9.06% and 5.74%. Another interesting phenomenon observed from the experimental results is that ℓ2-HSC comprehensively performs much better than ℓ1-HSC. We attribute this to the fact that Collaborative Representation (CR) is often more discriminative than Sparse Representation (SR) [22, 23].

Methods         Classification Errors (Mean ± Standard Deviation, %)
                AR          COIL20      ETH80       Scene15
CRC [22]        25.07±1.31  11.19±1.10  20.34±6.34  26.73±1.43
SRC [14]        19.57±2.22  11.81±0.98  29.70±8.80  26.80±2.83
LIBSVM [53]     26.50±0.10  12.50±1.18  30.82±5.99  25.40±2.17
NHT [1]         32.00±1.41  4.86±0.00   27.59       26.80±1.89
GT [54]         29.57±0.81  43.13±0.49  27.32±0.95  32.40±2.07
AHC [4]         33.14±1.21  10.13±1.41  27.13±2.85  25.54±1.59
NHT+Trace [3]   32.21±1.31  3.68±0.69   26.10±4.31  25.47±1.89
NHT+Volume [3]  32.29±1.21  4.79±0.29   25.82±4.61  25.40±1.98
SGC [17]        16.79±1.91  9.44±1.18   28.38±6.25  26.13±0.42
ℓ1-HT           10.50±1.91  2.15±0.49   17.44±1.98  26.13±3.96
ℓ2-HT           4.79±0.10   4.51±4.42   17.32±0.43  25.27±2.92
TABLE IV: Image classification performance comparison (in percents) on AR, COIL20, ETH80 and Scene15 databases.

IV-C Image Classification

We employ the AR, COIL20, ETH80 and Scene15 databases to evaluate the image classification performances of the different classifiers. A two-fold cross-validation scheme is applied in the image classification experiments.
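The protocol can be sketched as follows, with a 1-nearest-neighbour classifier standing in for the compared methods (an assumption for illustration; the paper does not specify the fold splitter or tie the protocol to any particular classifier):

```python
# Sketch of two-fold cross-validation with mean ± std error reporting,
# as used for Table IV (stand-in data and classifier, not the paper's).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))            # 100 synthetic 16-d samples
y = np.repeat(np.arange(4), 25)           # 4 classes, 25 samples each

errors = []
splitter = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in splitter.split(X, y):
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X[train_idx], y[train_idx])
    # Classification error = 1 - accuracy on the held-out fold.
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

print(f"{100 * np.mean(errors):.2f} ± {100 * np.std(errors):.2f} %")
```

Each method in Table IV would replace the stand-in classifier inside the loop, with the same folds shared across methods.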

Table IV reports the classification errors of the different classifiers on the four databases. Similar to the image clustering results, the proposed approaches, ℓ1-HT and ℓ2-HT, achieve very promising classification performances and consistently outperform the other hypergraph transduction approaches. In particular, it is worthwhile to point out that ℓ2-HT ranks first on three of the four databases. More specifically, the classification accuracy gains of ℓ2-HT over the other hypergraph transduction approaches, namely NHT, AHC, NHT+Trace and NHT+Volume, on the AR database are 27.21%, 28.35%, 27.42% and 27.50% respectively, and such gains of ℓ1-HT on the ETH80 database are 10.15%, 9.69%, 8.66% and 8.50% respectively. From these observations, it is not hard to find that ℓ1-HT often performs much better than SGC, which can be deemed the pairwise version of ℓ1-HT. We believe this phenomenon verifies the better representational power of the ℓ1-hypergraph over the ℓ1-graph.

(a) Noise Level = 0.1
(b) Noise Level = 0.2
(c) Noise Level = 0.3
(d) Noise Level = 0.4
(e) Noise Level = 0.5
Fig. 3: The samples from five noisy versions of COIL20 database.
(a) Accuracy (Image Clustering)
(b) NMI (Image Clustering)
(c) Error (Image Classification)
Fig. 4: The image clustering or classification performances of different hypergraph-based approaches on the different noisy versions of the COIL20 database. ↑ means the higher the better, while ↓ means the lower the better.

IV-D Robustness Analysis

It is well known that Sparse Representation (SR) and Collaborative Representation (CR) are robust to disguise and noise. Since the ℓ1-Hypergraph (ℓ1H) and ℓ2-Hypergraph (ℓ2H) are generated by SR and CR respectively, we conduct several experiments to see whether ℓ1H and ℓ2H also enjoy these desirable properties.

IV-D1 Robustness to Noise

In order to evaluate the robustness of our work to noise, we construct five noisy versions of the COIL20 dataset by randomly setting a certain proportion of pixels in each image to 0 or 255. For these noisy COIL20 databases, we use the Noise Level, defined as the ratio of the number of noisy pixels to the total number of pixels, to measure the degree of corruption. The Noise Levels of the five noisy versions are 0.1, 0.2, 0.3, 0.4 and 0.5 respectively (see the examples in Figure 3). We follow the same protocols as in the previous sections to conduct the image classification and clustering experiments on these noisy databases.
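The corruption procedure can be sketched as below; the `corrupt` helper is our own illustration of the described salt-and-pepper scheme, not code from the paper.

```python
# Sketch of the noisy-COIL20 construction: set a fraction `noise_level`
# of randomly chosen pixels to 0 or 255 (salt-and-pepper corruption).
import numpy as np

def corrupt(image, noise_level, rng):
    """Return a copy with `noise_level` of the pixels set to 0 or 255."""
    out = image.copy()
    n_pixels = out.size
    n_noisy = int(round(noise_level * n_pixels))
    # Distinct pixel positions, so the corrupted fraction is exact.
    idx = rng.choice(n_pixels, size=n_noisy, replace=False)
    out.flat[idx] = rng.choice([0, 255], size=n_noisy)
    return out

rng = np.random.default_rng(0)
img = np.full((32, 32), 128, dtype=np.uint8)   # a flat 32x32 test image
noisy = corrupt(img, 0.3, rng)
print(np.mean(noisy != 128))                   # fraction corrupted, ≈ 0.3
```

The same call with noise levels 0.1 through 0.5 would reproduce the five corrupted versions of the dataset.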

We plot the experimental results of the different approaches under different Noise Levels in Figure 4; Figures 4(a) and 4(b) report the clustering accuracy and NMI respectively. From these observations, we can see that the clustering performances of all approaches degrade as the Noise Level increases. However, compared with the conventional hypergraph methods such as NHSC and CEHSC+Trace, the GSC, ℓ1-HSC and ℓ2-HSC methods perform much better and their clustering performances degrade noticeably more slowly with increasing Noise Level. Clearly, this shows that ℓ1-HSC and ℓ2-HSC are more robust to noise than the conventional hypergraph spectral clustering models. Figure 4(c) shows the classification errors of the different approaches under different Noise Levels. The observations in the image classification experiments are quite different from those in the clustering experiments: the performances of all approaches are very similar, and the proposed HT models only slightly outperform the others. We mainly attribute this to the fact that all training samples contain noise, and the supervision from category labels compulsively introduces the noisy samples into the hypergraph learning models; therefore, the contributions of the robust sample selection procedures inherited from CR and SR are weakened. Comprehensively speaking, ℓ1H and ℓ2H enjoy a certain amount of robustness to noise, particularly in the unsupervised setting.

Metric  NHSC   NHSC+Cent  CEHSC+Trace  ℓ1-HSC  ℓ2-HSC
AC      20.32  20.38      19.06        48.00   68.19
NMI     46.11  45.99      46.60        67.81   80.42
TABLE V: Clustering performance comparison (in percents) on AR database with real disguise.
Methods         Classification Errors (%)
CRC [22]        17.08
SRC [14]        58.25
NHT [1]         92.67
AHC [4]         92.83
NHT+Trace [3]   92.67
NHT+Volume [3]  92.92
SGC [17]        20.00
ℓ1-HT           20.42
ℓ2-HT           4.00
TABLE VI: Classification performance comparison (in percents) on AR database with real disguise.

IV-D2 Robustness to Disguise

We employ the AR database to evaluate the image clustering and classification performances under disguise, since AR provides face images with natural disguises (sunglasses and scarves) for each subject. In the image clustering experiments, all the face images are used. In the image classification case, we use the unoccluded face images for training and the occluded ones for testing.

Table V lists the clustering performances of our models and three conventional hypergraph approaches. In these experiments, ℓ1-HSC and ℓ2-HSC show significant advantages. For example, the clustering accuracy gains of ℓ1-HSC over NHSC, NHSC+Cent and CEHSC+Trace are 27.68%, 27.62% and 28.94% respectively. ℓ2-HSC performs even better: its clustering accuracy improvements over these conventional hypergraph approaches are 47.87%, 47.81% and 49.13% respectively. Table VI tabulates the classification errors of the different approaches in the disguise case. From these observations, ℓ1-HT and ℓ2-HT obtain very promising performances while all the other traditional hypergraph transduction approaches fail completely. The classification error of ℓ2-HT is about 23 times lower than those of NHT, AHC, NHT+Trace and NHT+Volume. Clearly, these results demonstrate that ℓ1-HT and ℓ2-HT are robust to disguise.

(a) The impact of λ on ℓ1-HSC
(b) The impact of λ on ℓ2-HSC
Fig. 5: The impacts of different parameters on the clustering performances of ℓ1-HSC and ℓ2-HSC.
Database    Candidates (k)         Optimal k (ℓ1-HSC)  Optimal k (ℓ2-HSC)
ORL         {2:11}
AR          {2:14}
COIL20      {3,5,10,20,36,54,72}
ETH80       {3,5,10,41:41:410}
Scene15     {5:5:50,60:10:100}
Caltech256  {5:5:50,60:10:100}
TABLE VII: The hyperedge length (k) candidate collections of ℓ1-HSC and ℓ2-HSC and their optimal hyperedge lengths on different datasets.
(a) The impact of λ on ℓ1-HT
(b) The impact of λ on ℓ2-HT
(c) The impact of μ on ℓ1-HT
(d) The impact of μ on ℓ2-HT
Fig. 6: The impacts of different parameters on the classification performances of ℓ1-HT and ℓ2-HT.
Database  Candidates (k)         Optimal k (ℓ1-HT)  Optimal k (ℓ2-HT)
AR        {2:14}
COIL20    {3,5,10,20,36,54,72}
ETH80     {3,5,10,41:41:410}
Scene15   {5:5:50,60:10:100}
TABLE VIII: The hyperedge length (k) candidate collections of ℓ1-HT and ℓ2-HT and their optimal hyperedge lengths on different datasets.

IV-E Parameter Settings

In this section, we discuss the influence of the different parameters on our models. The RHSC methods, ℓ1-HSC and ℓ2-HSC, mainly involve two parameters, λ and k, where λ controls the regression regularization term in the regression model and k is the hyperedge length. The RHT methods, ℓ1-HT and ℓ2-HT, have one more parameter, μ, which balances the labeling error of the data and the loss of the hypergraph partition. Although the hyperedge length is an important parameter that deeply influences the quality of the hypergraph, it has numerous possible values and it is impossible to evaluate every one of them. So, in our experiments, we define a hyperedge length candidate collection and then conduct experiments to empirically find the optimal value. Tables VII and VIII respectively list the hyperedge length candidate collections of the proposed RH models and report their optimal hyperedge lengths. Figure 5 shows the influence of λ on the clustering performances of ℓ1-HSC and ℓ2-HSC, where we choose the mean of clustering accuracy and NMI as a comprehensive clustering performance metric. From the observations in Figure 5, we can find that the performance of ℓ1-HSC is quite insensitive to λ. With regard to ℓ2-HSC, a higher value of λ is better on the Scene15, Caltech256, COIL20 and ETH80 databases, while a medium λ is more suitable for the AR and ORL databases. Figure 6 shows the impacts of λ and μ on the classification performances of ℓ1-HT and ℓ2-HT, and phenomena similar to those in Figure 5 are observed: ℓ1-HT is not very sensitive to λ, a medium λ works well for ℓ2-HT on the AR and COIL20 databases, and a higher λ gives ℓ2-HT better performances on the Scene15 and ETH80 databases. Moreover, from the observations in Figures 6(c) and 6(d), both ℓ1-HT and ℓ2-HT achieve their best classification performances when μ is larger than 1.
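The candidate collections in Tables VII and VIII use MATLAB-style range notation, e.g. `5:5:50` for 5, 10, …, 50 and `2:11` for 2, 3, …, 11. A small helper to expand such specifications into explicit candidate lists, assuming this reading of the notation (the parser is ours, not from the paper):

```python
# Expand MATLAB-style range specs like "2:11" (step 1) and
# "5:5:50,60:10:100" (start:step:stop, comma-separated) into lists.
def expand(spec):
    values = []
    for part in spec.split(","):
        pieces = [int(p) for p in part.split(":")]
        if len(pieces) == 1:
            values.append(pieces[0])            # a single value
        elif len(pieces) == 2:
            values.extend(range(pieces[0], pieces[1] + 1))   # start:stop
        else:
            values.extend(range(pieces[0], pieces[2] + 1, pieces[1]))
    return values

print(expand("2:11"))              # [2, 3, ..., 11]
print(expand("5:5:50,60:10:100"))  # [5, 10, ..., 50, 60, 70, ..., 100]
```

A grid search over such a list, scoring each candidate k with the chosen metric, reproduces the empirical selection described above.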

V Conclusion

In this paper, we presented a new solution for hypergraph construction in which regression models are used to measure the closeness among samples. We named this new hypergraph framework the Regression-based Hypergraph (RH). Based on two conventional hypergraph learning models, namely Hypergraph Spectral Clustering (HSC) and Hypergraph Transduction (HT), we also developed the Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing the clustering and classification issues. As two influential regression approaches in visual learning, Sparse Representation (SR) and Collaborative Representation (CR) were employed to instantiate two instances of RH and their RHSC and RHT algorithms. Six popular image databases were leveraged to validate the effectiveness of our models. It can be concluded from the experimental observations that RH inherits the desirable properties of both the hypergraph model and the regression model.

There remain many interesting future directions based on our models, since RH is a general framework for hypergraph construction in which researchers can flexibly choose appropriate regression models to construct RHs for different tasks. For example, RH can be further applied to hypergraph-based subspace learning [8], feature selection [56], multi-label learning [6, 57] or attribute learning [11]. Moreover, investigating adaptive and efficient RH construction is also a meaningful direction for improving utility and efficiency [4, 58].

References

  • [1] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in neural information processing systems (NIPS), 2006, pp. 1601–1608.
  • [2] J. Y. Zien, M. D. Schlag, and P. K. Chan, “Multilevel spectral hypergraph partitioning with arbitrary vertex sizes,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 9, pp. 1389–1399, 1999.
  • [3] S. Huang, A. Elgammal, and D. Yang, “On the effect of hyperedge weights on hypergraph learning,” arXiv preprint arXiv:1410.6736, 2014.
  • [4] J. Yu, D. Tao, and M. Wang, “Adaptive hypergraph learning and its application in image classification,” IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3262–3272, 2012.
  • [5] M. Wang, X. Wu et al., “Visual classification by l1-hypergraph modeling,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2564–2574, 2015.
  • [6] G. Chen, J. Zhang, F. Wang, C. Zhang, and Y. Gao, “Efficient multi-label classification with hypergraph regularization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1658–1665.
  • [7] P. Ochs and T. Brox, “Higher order motion models and spectral clustering,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 614–621.
  • [8] S. Huang, D. Yang, Y. Ge, D. Zhao, and X. Feng, “Discriminant hyper-laplacian projections with its applications to face recognition,” in IEEE conference on Multimedia and Expo Workshop on HIM (ICMEW), 2014, pp. 1–6.
  • [9] S. Huang, D. Yang, J. Zhou, and X. Zhang, “Graph regularized linear discriminant analysis and its generalization,” Pattern Analysis and Applications, vol. 18, no. 3, pp. 639–650, 2015.
  • [10] L. Sun, S. Ji, and J. Ye, “Hypergraph spectral learning for multi-label classification,” in ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.
  • [11] S. Huang, M. Elhoseiny, A. Elgammal, and D. Yang, “Learning hypergraph-regularized attribute predictors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 409–417.
  • [12] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas, “Image retrieval via probabilistic hypergraph ranking,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3376–3383.
  • [13] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, “3-D object retrieval and recognition with hypergraph analysis,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4290–4303, 2012.
  • [14] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
  • [15] S. Gao, I. W.-H. Tsang, and L.-T. Chia, “Kernel sparse representation for image classification and face recognition,” in European Conference on Computer Vision (ECCV), 2010, pp. 1–14.
  • [16] C.-Y. Lu, H. Min, J. Gui, L. Zhu, and Y.-K. Lei, “Face recognition via weighted sparse representation,” Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 111–116, 2013.
  • [17] S. Huang, D. Yang, J. Zhou, and L. Huangfu, “Sparse graph-based transduction for image classification,” Journal of Electronic Imaging, vol. 24, no. 2, p. 023007, 2015.
  • [18] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with L1-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.
  • [19] R. Timofte and L. Van Gool, “Sparse representation based projections,” in British machine vision conference (BMVC), 2011, pp. 61–1.
  • [20] L. Qiao, S. Chen, and X. Tan, “Sparsity preserving projections with applications to face recognition,” Pattern Recognition, vol. 43, no. 1, pp. 331–341, 2010.
  • [21] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 2790–2797.
  • [22] L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: Which helps face recognition?” in International Conference on Computer Vision (ICCV), 2011, pp. 471–478.
  • [23] M. Yang, L. Zhang, D. Zhang, and S. Wang, “Relaxed collaborative representation for pattern classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2224–2231.
  • [24] R. Timofte and L. Van Gool, “Weighted collaborative representation and classification of images,” in International Conference on Pattern Recognition (ICPR), 2012, pp. 1606–1610.
  • [25] S. Huang, Y. Yang, D. Yang, L. Huangfu, and X. Zhang, “Class specific sparse representation for classification,” Signal Processing, vol. 116, pp. 38–42, 2015.
  • [26] P. J. Phillips, H. Wechsler, J. Huang, and P. J. Rauss, “The feret database and evaluation procedure for face-recognition algorithms,” Image and Vision Computing, vol. 16, no. 5, pp. 295–306, 1998.
  • [27] I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
  • [28] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [29] X.-T. Yuan, X. Liu, and S. Yan, “Visual classification with multitask joint sparse representation,” IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4349–4360, 2012.
  • [30] S. Huang, D. Yang, J. Zhou, L. Huangfu, and X. Zhang, “Sparse graph-based transduction for image classification,” Journal of Electronic Imaging, vol. 24, no. 2, p. 023007, 2014.
  • [31] X. Peng, L. Zhang, and Z. Yi, “Scalable sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 430–437.
  • [32] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in International Conference on Machine Learning (ICML), 2010, pp. 663–670.
  • [33] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with L1-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.
  • [34] L. Pu and B. Faltings, “Hypergraph learning with hyperedge expansion,” in Machine Learning and Knowledge Discovery in Databases, 2012, pp. 410–425.
  • [35] A. Panagopoulos, C. Wang, D. Samaras, and N. Paragios, “Simultaneous cast shadows, illumination and geometry inference using hypergraphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 437–449, 2013.
  • [36] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie, “Beyond pairwise clustering,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 838–845.
  • [37] M. Bolla, “Spectra, euclidean representations and clusterings of hypergraphs,” Discrete Mathematics, vol. 117, 1993.
  • [38] S. Agarwal, K. Branson, and S. Belongie, “Higher order learning with graphs,” in International Conference on Machine Learning (ICML), 2006, pp. 17–24.
  • [39] S. Yan and H. Wang, “Semi-supervised learning by sparse representation.” in SIAM International Conference on Data Mining (SDM).   SIAM, 2009, pp. 792–801.
  • [40] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Non-negative low rank and sparse graph for semi-supervised learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR).   IEEE, 2012, pp. 2328–2335.
  • [41] A. Y. Ng, M. I. Jordan, Y. Weiss et al., “On spectral clustering: Analysis and an algorithm,” Advances in neural information processing systems (NIPS), vol. 2, pp. 849–856, 2002.
  • [42] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
  • [43] A. Martínez and R. Benavente, “The AR face database,” CVC Technical Report, Bellaterra, Jun. 1998.
  • [44] F. S. Samaria and A. C. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of the Second IEEE Workshop on Applications of Computer Vision, 1994, pp. 138–142.
  • [45] S. A. Nene, S. K. Nayar, H. Murase et al., “Columbia object image library (COIL-20),” Technical Report CUCS-005-96, Tech. Rep., 1996.
  • [46] B. Leibe and B. Schiele, “Analyzing appearance and contour based methods for object categorization,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2003, pp. II–409.
  • [47] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2006, pp. 2169–2178.
  • [48] G. Griffin, A. Holub, and P. Perona, “Caltech-256 object category dataset,” 2007.
  • [49] A. Bergamo, L. Torresani, and A. W. Fitzgibbon, “Picodes: Learning a compact code for novel-category recognition,” in Advances in Neural Information Processing Systems (NIPS), 2011, pp. 2088–2096.
  • [50] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
  • [51] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548–1560, 2011.
  • [52] X. Chen and D. Cai, “Large scale spectral clustering with landmark-based representation,” in AAAI Conference on Artificial Intelligence (AAAI), 2011.
  • [53] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.
  • [54] O. Duchenne, J.-Y. Audibert, R. Keriven, J. Ponce, and F. Ségonne, “Segmentation by transduction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
  • [55] D. Cai, X. He, and J. Han, “Document clustering using locality preserving indexing,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 12, pp. 1624–1637, 2005.
  • [56] Z. Zhang, P. Ren, and E. R. Hancock, “Unsupervised feature selection via hypergraph embedding,” in British Machine Vision Conference (BMVC), 2012, pp. 1–11.
  • [57] L. Sun, S. Ji, and J. Ye, “Hypergraph spectral learning for multi-label classification,” in ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.
  • [58] W. Liu, J. He, and S.-F. Chang, “Large graph construction for scalable semi-supervised learning,” in International conference on machine learning (ICML), 2010, pp. 679–686.