I Introduction
As the generalization of the graph model [1, 2], the hypergraph model is more flexible and more intuitive for depicting complex relations among data, since the edge of a hypergraph, known as a hyperedge, can contain more than two vertices. Due to this desirable property, hypergraph learning has recently drawn intensive attention. Over the past decades, extensive hypergraph learning approaches have been proposed and successfully applied to tackle many fundamental tasks, such as clustering [1, 3], classification [4, 5, 6], segmentation [7], dimensionality reduction [8, 9] and multi-label learning [10, 11].
As in graph learning, the hypergraph construction process plays a vital role in hypergraph learning, and a good-quality hypergraph should well reveal the real relations among samples. Hypergraph learning is a frequently used tool for unsupervised and semi-supervised learning. In these two cases, previous hypergraph learning works often adopt a neighbourhood-based (distance-based) strategy to build the hypergraph
[1, 12, 13]. More specifically, for each sample, a hyperedge is generated by connecting this centroid sample and its $k$ nearest neighbors. However, such neighborhood-based strategies often cannot well or even correctly discover the real relations among samples and are also very sensitive to noise; the quality of the hypergraph is therefore lowered, which directly degrades the performance of hypergraph learning.

In the recent decade, Sparse Representation (SR) has achieved remarkable successes in addressing dozens of computer vision and machine learning issues
[14, 15, 16]. The main merits of SR are its strong discriminating power and its excellent robustness to noise, which endow SR with a better relevant-sample selection capacity for a given sample in comparison with the conventional neighborhood-based approaches (we refer to a toy example in our early work [17] to experimentally verify this argument; please see Fig. 1). In other words, SR can better discover the real relations of data. Motivated by this fact, several novel graph learning approaches have been developed via leveraging SR to construct the graphs [18, 17, 19, 20, 21]. Compared to the traditional graph learning approaches, these works have achieved better performances. Since the hypergraph model is the generalization of the graph model and hypergraph learning is closely related to graph learning, we believe that the success of SR in graph learning can also be transferred to hypergraph learning. Moreover, SR is essentially an $\ell_1$ or $\ell_0$ norm regularized regression model, and its success has also motivated the presentation of many influential regression models which often enjoy desirable properties of their own [14, 15, 22, 23, 24, 25]. Clearly, these successful works can also provide some new ways for graph or hypergraph construction.

Fig. 1: A toy example intuitively specifying the advantages of SR over the conventional neighbourhood-based method in the relevant sample selection procedure. The figure shows the top 10 most relevant face images selected by SR and K-Nearest Neighbour (KNN) given a query face image. This experiment is conducted on a subset of the FERET database [26] (72 subjects with 6 images per subject). The first two rows of the figure are the selection results of SR while the last two rows are the selection results of KNN. The left subfigure reports the results on the original FERET database while the right one reports the results on a modified FERET database in which 30% of the pixels of each image have been corrupted by noise. In each image array, the first face image is the query image and the remaining ten images are the relevant face images selected by SR or KNN. The histograms above each image array show the confidence scores of the top ten relevant face images: if the subjects of the returned face image and the query face image are identical, the corresponding bar is positive; otherwise it is negative. SR gets five hits both on the original FERET database and on the noisy FERET database, while KNN only gets three and two hits on these two datasets respectively. Clearly, this phenomenon demonstrates the advantage of SR in relevant sample selection.

In this paper, we generalize the idea of the sparse graph to present a novel hypergraph construction framework which can leverage a regression model to construct a high-quality hypergraph. We name this framework the Regression-based Hypergraph (RH) model. More specifically, in RH, each sample represents a vertex and constructs a regression system together with the rest of the samples for measuring the correlations among samples. Then, based on the obtained correlations, each sample and its top $k$ most relevant samples are employed to define a hyperedge. Moreover, the mean of the correlations among the samples in a hyperedge is taken as the weight of this hyperedge, since the correlation is an intuitive measure of the closeness between two samples.
We also plug the regression-based hypergraph into two classical hypergraph learning frameworks, namely hypergraph spectral clustering and hypergraph transduction, to present the Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing clustering and classification issues. As two of the most influential regression models for visual learning, Sparse Representation (SR) and Collaborative Representation (CR) are adopted as two examples to instantiate two RH instances. Since SR and CR are actually the $\ell_1$ norm and $\ell_2$ norm regularized regression models, we name these two instances $\ell_1$-Hypergraph ($\ell_1$-H) and $\ell_2$-Hypergraph ($\ell_2$-H) respectively. Similarly, their hypergraph spectral clustering and hypergraph transduction algorithms are named $\ell_p$-Hypergraph Spectral Clustering ($\ell_p$-HSC) and $\ell_p$-Hypergraph Transduction ($\ell_p$-HT), where $p = 1$ or $2$ according to whether $\ell_1$-H or $\ell_2$-H is applied.
The Regression-based Hypergraph (RH) model inherits the advantages of both the hypergraph and regression models. Compared to the conventional hypergraph models, RH can incorporate properties from the chosen regression model. For example, if Sparse Representation (SR) or Collaborative Representation (CR) is selected for hypergraph construction, the constructed RH should be more discriminative and robust, since these two regression models are better at discovering the relevances among samples than the conventional neighborhood-based hypergraph construction fashions. Compared to the regression approaches, RH constructs a hypergraph to sufficiently exploit the correlation of each pair of samples, instead of just utilizing the correlations between the target sample and the other samples as the regression approaches do. Compared to the regression-based graph approaches, such as the sparse graph, RH is a hypergraph model which has a better capability and flexibility to depict complex high-order data relations.
We employ six popular visual databases to validate our work. The experimental results demonstrate the superiority of the RH model over the conventional hypergraph models. We summarize the three main contributions of our work as follows:

We provide a general framework that utilizes regression models to construct high-quality hypergraphs. To the best of our knowledge, this paper is the first to formally and systematically build a bridge between the hypergraph model and the regression model.

We present two novel hypergraph learning frameworks based on the RH model to tackle the clustering and classification tasks respectively.

We adopt two influential recent regression models, namely Sparse Representation (SR) and Collaborative Representation (CR), to instantiate two RH instances called $\ell_1$-Hypergraph ($\ell_1$-H) and $\ell_2$-Hypergraph ($\ell_2$-H), which are experimentally proved to be more discriminative and robust than the conventional hypergraphs.
II Previous Works
II-A Regression Models
Regression models are a common technique for data analysis and have been successfully applied to almost all areas of computer vision, machine learning and image processing [27, 14, 22]. Sparse Representation (SR) may be the most influential regression approach of the recent decade. SR is mainly inspired by the idea of compressed sensing [28]. In SR, an $\ell_1$ or $\ell_0$ norm constraint is introduced into the common regression model to compulsively select only a few relevant measurements and ignore the irrelevant ones by assigning their corresponding regression coefficients to zero. This endows SR with a strong discriminating power and a good robustness. However, Zhang et al. [22, 24]
argued that the collaboration of samples, rather than the sparsity, is the essential factor that leads to such good discriminating ability and robustness. They proposed a linear regression model named Collaborative Representation (CR) via employing a relatively mild
$\ell_2$ norm constraint to replace the $\ell_1$ norm constraint, so as to achieve the collaboration property. Many works have shown that CR is more efficient and can achieve similar or even better performance. Due to their desirable properties, SR and CR have achieved remarkable successes in many areas and have promoted the presentation of many impressive regression approaches for addressing different computer vision, machine learning and image processing issues. For example, Gao et al. kernelized SR for face recognition and image classification
[15]. Yuan et al. presented a multi-task joint sparse representation model to combine the strength of multiple features and/or instances for visual classification [29]. Huang et al. presented an SR-based classifier named Class Specific Sparse Representation (CSSR) which incorporated the properties of both SR and CR
[25] via defining the homogeneous samples as a group and making the groups compete to represent the test sample. Yang et al. proposed Relaxed Collaborative Representation (RCR) to effectively exploit the similarity and distinctiveness of samples [23]. Although these regression approaches have obtained promising performances in different fields, they all share an obvious drawback: they can only utilize the correlations between the testing sample and the training samples. On the contrary, the proposed Regression-based Hypergraph (RH) model can sufficiently exploit the correlations among all samples. Another merit of RH is that there exist extensive regression approaches, which brings more flexibility to the hypergraph model.

II-B Sparse Graph
Since Sparse Representation (SR) is good at selecting the relevant samples for a test sample even under noisy conditions, some researchers have attempted to use SR to construct high-quality graphs for addressing different issues. In these works, the constructed graphs are often called $\ell_1$-graphs or sparse graphs and have achieved very promising performances. More specifically, Qiao et al. and Timofte et al. successively used SR to construct a sparse graph for dimensionality reduction [19, 20]. Huang et al. leveraged SR to measure the correlations between each two samples and then constructed a sparse graph for transduction [30]. The Sparse Subspace Clustering (SSC) algorithms [21, 31, 32] learn a sparse graph for clustering via considering the data self-representation problem as an SR issue. Similar to [19, 20], Cheng et al. utilized SR to construct the $\ell_1$-graph (sparse graph) for spectral clustering, subspace learning and semi-supervised learning [33]. Although the applications and the learning (or construction) procedures of these works are very different, the obtained sparse graphs are very similar and all demonstrate better discriminative ability and robustness than the conventional graph models. The main drawback of the sparse graph models is that they cannot intuitively describe high-order complex data relations, because these sparse graph models are essentially graph models whose edges can only depict simple pairwise data relations. Since the Regression-based Hypergraph (RH) model can be deemed a generalization of the sparse graph from the perspectives of both regression and hypergraph, it does not suffer from this issue.
II-C Hypergraph Models
As a generalization of graph, hypergraph represents the structure of data via measuring the similarity between groups of points [13, 12, 4, 34, 1, 35]
. The main difference between the graph and the hypergraph is that an edge of a hypergraph can contain more than two vertices, which endows the hypergraph with a high flexibility for depicting high-order relations. Benefiting from this desirable property, hypergraph models have been successfully applied to dozens of computer vision, machine learning and pattern recognition areas. In the past, researchers were keener to develop different hypergraph frameworks which define different theories to depict the hypergraph structure. The representative approaches include Clique Expansion
[2], Star Expansion [2], Zhou’s Normalized Laplacian [1], Clique Averaging [36], Bolla’s Laplacian [37] and so on. However, as shown in [38], all of the previous approaches, despite their very different formulations, can be proved to be equivalent to each other under specific conditions. Currently, researchers pay more attention to developing algorithms for hypergraph construction under the aforementioned hypergraph frameworks. In a hypergraph, the hyperedge defines the relation of the data; therefore, hyperedge generation is crucial to the quality of the constructed hypergraph. Conventionally, most hypergraph models adopt the neighbourhood-based fashion to generate the hyperedges. For example, Huang et al. proposed a hypergraph learning framework for image retrieval, in which each image and its
$k$ nearest neighbors form a hyperedge [12]. Zhou et al. also adopted such a neighbourhood-based fashion to generate the hyperedges for unsupervised and semi-supervised hypergraph learning [1]. The main problems of these approaches are that they often cannot well reveal the real relations of the data and are sensitive to noise. Some researchers have also employed clustering techniques to generate the hyperedges and then construct the hypergraph. As the representative approach of this category, Gao et al. proposed a hypergraph-based 3D object retrieval approach which utilizes k-means to cluster the views of the 3D objects and considers each cluster as a hyperedge [13]. Since the hyperedges formed by the clusters cannot share intersection vertices, these hypergraphs cannot capture the correlations of the data well. Another popular method is to adaptively construct the hypergraph via imposing some meaningful constraints. As an instance of this category, Yu et al. introduced a norm constraint on the hyperedge weight matrix to present a hypergraph transduction approach for image classification [4]. This method generates the hyperedges via adaptively assigning weights to the hyperedges. However, it cannot guarantee the absence of isolated vertices. Similar to the work [4], Wang et al. imposed a Laplacian cost constraint and a norm constraint on the hyperedge weights for adaptively learning the hyperedge weights in a hypergraph model [5]. In its hyperedge generation procedure, the traditional neighbourhood-based fashion is employed to define a candidate hyperedge vertex set, and then SR is applied to prune noisy vertices from this set to form the final hyperedge. This idea is similar to, but also different from, ours.
It still considers the neighbours of a sample as its relevant samples, and SR there only plays the role of a noise remover. On the contrary, the Regression-based Hypergraph (RH) model thoroughly utilizes the regression model (including SR) to generate the hyperedges. Moreover, RH introduces, in a more formal and systematic manner, how to use regression models to construct high-quality hypergraphs.

III Methodology
III-A Regression-based Hypergraph
In order to incorporate some desirable properties of the regression algorithms, we introduce a new hypergraph learning framework named Regression-based Hypergraph (RH), which leverages different regression models to construct hypergraphs. Let the $d \times n$ matrix $X = [x_1, \dots, x_n]$ be the sample matrix, where $d$ is the dimension of the samples and $n$ is the number of samples. The
$d$-dimensional column vector $x_i$
is a sample, which is also the $i$-th column of the sample matrix. We apply the general formulation of the regression model to estimate the correlations between each sample and the rest of the samples,

$$\hat{\alpha}_i = \arg\min_{\alpha_i} \; \mathcal{L}(x_i, X_{\hat{i}} \alpha_i) + \lambda \Omega(\alpha_i), \tag{1}$$

where $\mathcal{L}(\cdot)$ and $\Omega(\cdot)$ are the regression error and the regularization term respectively, and $\lambda$ is a trade-off parameter. $X_{\hat{i}}$ is the sample matrix which excludes the $i$-th sample $x_i$. The $(n-1)$-dimensional column vector $\alpha_i$ is the regression coefficient vector with respect to the sample $x_i$. Each element $\alpha_{ij}$ of the regression coefficient vector encodes the correlation between the target sample $x_i$ and the sample $x_j$. According to the aforementioned correlation computation fashion, each pair of samples gets two correlations, i.e., the samples $x_i$ and $x_j$ have two correlations $\alpha_{ij}$ and $\alpha_{ji}$. Extensive literature [18, 19, 39, 30, 40] shows that such a correlation between two samples is a high-quality similarity measure. Therefore, following the sample similarity computation fashion in [18, 19], we define the sample similarity as the mean of the absolute values of the two correlations of each pair of samples, which guarantees the non-negativity and symmetry of the similarity. More specifically, the similarity between the sample $x_i$ and the sample $x_j$ can be mathematically denoted as follows,
$$S_{ij} = \frac{|\alpha_{ij}| + |\alpha_{ji}|}{2}. \tag{2}$$
The regression model cannot compute the self-correlations of samples; in other words, the self-similarity of a sample is still not defined. In this case, we define the self-similarity of a sample as the sum of the similarities between this sample and the rest of the samples,
$S_{ii} = \sum_{j \neq i} S_{ij}$. From the obtained similarities among all the samples, it is not hard to construct a similarity matrix (or affinity matrix) $S$,
where $S_{ij}$ is the $(i, j)$-th element of $S$. Then, the sample similarities are normalized as follows,

$$\bar{S} = D^{-1} S, \tag{3}$$

where the $n \times n$ matrix $D$ is a diagonal matrix whose $i$-th diagonal element is the sum of the elements in the $i$-th row of $S$, $D_{ii} = \sum_{j} S_{ij}$.
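To make the construction concrete, the correlation and similarity computations of Eqs. (1)-(3) can be sketched as follows. This is our own minimal NumPy sketch, not the authors' code; it instantiates the regularizer with a closed-form $\ell_2$ (ridge, CR-style) solver, and any other regression solver could be substituted. The function and parameter names (`cr_similarity`, `lam`) are ours.

```python
import numpy as np

def cr_similarity(X, lam=0.1):
    """Sketch of Eqs. (1)-(3): regression coefficients as correlations,
    symmetrized into a row-normalized similarity matrix.
    X is the d x n sample matrix."""
    d, n = X.shape
    A = np.zeros((n, n))               # A[i, j]: correlation of x_j w.r.t. x_i
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        Xi = X[:, idx]                 # sample matrix excluding x_i
        # ridge (l2) solution: alpha_i = (Xi^T Xi + lam I)^{-1} Xi^T x_i
        alpha = np.linalg.solve(Xi.T @ Xi + lam * np.eye(n - 1),
                                Xi.T @ X[:, i])
        A[i, idx] = alpha
    S = (np.abs(A) + np.abs(A).T) / 2      # Eq. (2): symmetric, non-negative
    np.fill_diagonal(S, S.sum(axis=1))     # self-similarity: sum over the rest
    D_inv = 1.0 / S.sum(axis=1)
    return D_inv[:, None] * S              # Eq. (3): row normalization D^{-1} S
```

An $\ell_1$ solver would simply replace the `np.linalg.solve` line; everything downstream of the coefficient matrix `A` is unchanged.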
After obtaining the similarities, we define a hypergraph $G = (V, E)$ for depicting the relations among samples, where $V$ and $E$ are the collections of its vertices and hyperedges respectively. In this hypergraph, each sample is deemed a vertex, i.e., the sample $x_i$ corresponds to the vertex $v_i$. As in the conventional hypergraph construction fashion, a sample and its top $k$ most similar samples are employed to define a hyperedge $e_i$. Therefore, for a data collection of $n$ samples, we obtain $n$ hyperedges. The weight of a hyperedge is defined as the mean similarity of the samples in this hyperedge,
$$w(e_i) = \frac{1}{m} \sum_{v_j, v_k \in e_i,\; j < k} \bar{S}_{jk}, \tag{4}$$
where $m$ is the number of pairs of vertices in the hyperedge $e_i$.
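Given the normalized similarity matrix, the hyperedge generation and weighting step above amounts to a few lines. This is an illustrative sketch of ours (function and parameter names assumed, not from the paper):

```python
def build_hyperedges(S, k=3):
    """Generate one hyperedge per sample: the sample plus its top-k most
    similar samples; the weight is the mean pairwise similarity inside
    the hyperedge (Eq. (4)). S is assumed symmetric and non-negative."""
    n = len(S)
    edges, weights = [], []
    for i in range(n):
        # top-k neighbours of vertex i by similarity (excluding i itself)
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: S[i][j], reverse=True)[:k]
        e = [i] + nbrs
        # all unordered vertex pairs inside the hyperedge
        pairs = [(u, v) for a, u in enumerate(e) for v in e[a + 1:]]
        edges.append(e)
        weights.append(sum(S[u][v] for u, v in pairs) / len(pairs))
    return edges, weights
```

Note that, unlike clustering-based hyperedges, the $n$ hyperedges produced this way can freely share vertices.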
III-B Learning with Regression-based Hypergraph
Spectral clustering and hypergraph transduction are the most common unsupervised and semi-supervised hypergraph learning techniques respectively. In this subsection, we apply our Regression-based Hypergraph (RH) model to these two techniques to validate its effectiveness. We develop a novel spectral clustering framework and a novel hypergraph transduction framework, named Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) respectively. We begin by introducing some common definitions of hypergraph learning [1]. The degree of a vertex is the sum of the weights of the hyperedges incident to it, and the degree of a hyperedge is the number of vertices it contains. Mathematically, the degrees of a vertex and a hyperedge are respectively formulated as $d(v) = \sum_{e \in E,\, v \in e} w(e)$ and $\delta(e) = |e|$. The vertex-edge incident matrix is a common tool for depicting the structure of a hypergraph: each of its rows corresponds to a vertex and each of its columns to a hyperedge. More specifically, for a hypergraph consisting of $n$ vertices and $|E|$ hyperedges, the vertex-edge incident matrix $H$ is an $n \times |E|$ binary matrix. If the vertex $v_i$ is on the hyperedge $e_j$, the $(i, j)$-th element of $H$ is 1; otherwise, it is 0. Due to the hyperedge generation fashion of RH, the dimension of its vertex-edge incident matrix is $n \times n$.
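The incidence matrix and the two degree definitions above can be sketched as follows (an illustrative helper of ours; names are assumptions, not from the paper):

```python
def hypergraph_stats(edges, weights, n):
    """Vertex-edge incidence matrix H (n x |E|), vertex degrees
    d(v) = sum of weights of incident hyperedges, and hyperedge degrees
    delta(e) = number of vertices in e."""
    H = [[0] * len(edges) for _ in range(n)]
    for j, e in enumerate(edges):
        for v in e:
            H[v][j] = 1
    d = [sum(weights[j] for j in range(len(edges)) if H[v][j])
         for v in range(n)]
    delta = [len(e) for e in edges]
    return H, d, delta
```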
From the perspective of graph learning, spectral clustering is actually a graph (or hypergraph) partition issue [1, 41, 42]. We can therefore consider the regression-hypergraph-based spectral clustering problem as a normalized hypergraph cut issue. According to Zhou’s work [1], such an issue can be solved by the following optimization model,

$$\min_{F} \; \frac{1}{2} \sum_{e \in E} \sum_{u, v \in e} \frac{w(e)}{\delta(e)} \left\| \frac{F_u}{\sqrt{d(u)}} - \frac{F_v}{\sqrt{d(v)}} \right\|^2 \quad \text{s.t.} \;\; F^\top F = I, \tag{5}$$

where the $n \times c$ matrix $F = [f_1, \dots, f_c]$ is the collection of the hypergraph cuts of the given regression hypergraph $G$. The $n$-dimensional column vector $f_i$ is a hypergraph cut which introduces a binary partition to the given hypergraph, and its elements indicate the confidences with which the corresponding vertices belong to a subgraph after partition. $F_u$ is the row of the matrix $F$ which encodes the elements of the hypergraph cuts corresponding to the vertex $u$.
According to the normalized cut criterion [42], the optimal hypergraph cuts should simultaneously maximize the compactness of the partitioned subgraphs and minimize the compactness of the boundaries between the subgraphs. The compactness of subgraphs and boundaries is measured by the normalized summation of the hyperedge weights of a vertex set. With several reductions, Equation (5) can be further translated into the following matrix expression,

$$\min_{F} \; \mathrm{tr}(F^\top \Delta F) \quad \text{s.t.} \;\; F^\top F = I, \tag{6}$$

where the function $\mathrm{tr}(\cdot)$ returns the trace of a matrix. $D_v$, $W$ and $D_e$ are the diagonal matrix forms of $d(v)$, $w(e)$ and $\delta(e)$ respectively, $I$ is the identity matrix, and
$\Delta = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$
is the derived normalized hypergraph Laplacian matrix which encodes the structure of the regression hypergraph $G$. The detailed deductions of this equation can be found in the works [1, 11].

The problem in Equation (6)
is a typical eigenvalue problem and can be easily solved by the eigenvalue decomposition technique. The top $c$
optimal hypergraph cuts are exactly the eigenvectors corresponding to the $c$ minimal nonzero eigenvalues. Finally, the learned hypergraph cut collection is deemed the new representation of the data for clustering.

Graph-based transduction is a semi-supervised learning technique which is often leveraged to address labeling and classification issues. In the semi-supervised case, the labels of some of the data are available. Therefore, an optimal hypergraph cut should not only minimize the loss of the geometric structure of the data (the loss of the data relation) but also minimize the labeling error. Conventionally, the labeling error is measured by the Euclidean distance between the labels and the hypergraph cuts, since the hypergraph cuts can be deemed as the collection of the label indicators of the vertices. Thus, the original hypergraph partition model in Equation (5) can be further improved into the following regularized hypergraph partition model which considers the labeling error of the data,
$$\min_{F} \; \mathrm{tr}(F^\top \Delta F) + \mu \|F - Y\|_F^2, \tag{7}$$

where $\mu$ is a positive parameter that reconciles the two losses, and the $n \times c$ matrix $Y = [y_1, \dots, y_c]$ is the collection of labels; $y_j$ is the label vector of the $j$-th class. Let $Y_{ij}$ denote the label of vertex $v_i$ with respect to the $j$-th class. Then $Y_{ij} = 1$ or $-1$ if the vertex $v_i$ belongs to the $j$-th class or to another class respectively, and $Y_{ij} = 0$ if the vertex is unlabeled. Note that here the collection of hypergraph cuts has the same size as $Y$, i.e., $F \in \mathbb{R}^{n \times c}$.
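The two learning routines above can be sketched compactly: spectral clustering takes the bottom non-trivial eigenvectors of the normalized hypergraph Laplacian of Eq. (6), and the regularized transduction model of Eq. (7), being an unconstrained quadratic in $F$, has the closed-form solution $F = \mu(\Delta + \mu I)^{-1} Y$. The following is a minimal NumPy sketch of ours, not the authors' implementation (function names are assumptions):

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian of Eq. (6):
    Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    dv = H @ w                                 # vertex degrees d(v)
    de = H.sum(axis=0)                         # hyperedge degrees delta(e)
    Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_isqrt @ H @ np.diag(w / de) @ H.T @ Dv_isqrt
    return np.eye(H.shape[0]) - Theta

def rhsc_embedding(Delta, c):
    """RHSC: eigenvectors of Delta with the c smallest non-trivial
    eigenvalues become the new representation, then fed to k-means."""
    _, vecs = np.linalg.eigh(Delta)            # ascending eigenvalues
    return vecs[:, 1:c + 1]                    # skip the trivial eigenvector

def rht_predict(Delta, Y, mu=1.0):
    """RHT: setting the gradient of tr(F^T Delta F) + mu ||F - Y||_F^2
    to zero gives F = mu (Delta + mu I)^{-1} Y; the row-wise argmax
    labels each vertex."""
    n = Delta.shape[0]
    F = mu * np.linalg.solve(Delta + mu * np.eye(n), Y)
    return F.argmax(axis=1)
```

The `rhsc_embedding` output would typically be handed to any off-the-shelf k-means routine to produce the final clusters.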
III-C Two RH Instances: $\ell_1$-Hypergraph and $\ell_2$-Hypergraph
Sparse Representation (SR) and Collaborative Representation (CR) are two of the most influential recent regression models in computer vision and machine learning. We employ them to instantiate two Regression-based Hypergraph (RH) instances. Since SR and CR are actually the $\ell_1$ norm and $\ell_2$ norm regularized regression models, we name these two RH instances $\ell_1$-Hypergraph ($\ell_1$-H) and $\ell_2$-Hypergraph ($\ell_2$-H) respectively. The detailed information of $\ell_1$-H and $\ell_2$-H is presented in Table I. We have also plugged $\ell_1$-H and $\ell_2$-H into the Regression-based Hypergraph Transduction (RHT) and Regression-based Hypergraph Spectral Clustering (RHSC) frameworks to produce four new semi-supervised or unsupervised hypergraph learning approaches, respectively named $\ell_1$-Hypergraph Transduction ($\ell_1$-HT), $\ell_2$-Hypergraph Transduction ($\ell_2$-HT), $\ell_1$-Hypergraph Spectral Clustering ($\ell_1$-HSC), and $\ell_2$-Hypergraph Spectral Clustering ($\ell_2$-HSC). The relations of these approaches are shown in Table II. In the next section, we apply these four RH instance algorithms to demonstrate the superiority of our models over the conventional hypergraph models and to verify the assumption that RH inherits some desirable properties from the chosen regression models.
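Since the only difference between $\ell_1$-H and $\ell_2$-H is the coefficient solver plugged into Eq. (1), the two instances can be sketched by swapping solvers. The ridge ($\ell_2$) solution below is exact and closed-form; the $\ell_1$ solver is a simple ISTA (proximal gradient) iteration of our own, standing in for whatever SR solver one prefers (names and default parameters are assumptions):

```python
import numpy as np

def coeffs_l2(A, y, lam=0.1):
    """l2 (CR-style) coefficients: closed-form ridge solution."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def coeffs_l1(A, y, lam=0.1, iters=500):
    """l1 (SR-style) coefficients via ISTA: gradient step on the
    least-squares term, then soft-thresholding (a simple sketch)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x
```

Feeding either function into the per-sample correlation step of Eqs. (1)-(2) yields $\ell_2$-H or $\ell_1$-H respectively; everything downstream (similarity, hyperedges, Laplacian) is shared.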
IV Experiments
In this section, we conduct experiments employing the aforementioned four RHSC and RHT instances, namely $\ell_1$-HSC, $\ell_2$-HSC, $\ell_1$-HT and $\ell_2$-HT, to tackle the image clustering and classification tasks respectively. Since these four algorithms are all generated from Sparse Representation (SR) or Collaborative Representation (CR), which enjoy robustness to noise and occlusion, we also conduct experiments to examine whether these four algorithms have inherited this desirable property.
IV-A Datasets and Compared Methods
Six image datasets, namely AR [43], ORL [44], COIL20 [45], ETH80 [46], Scene15 [47] and Caltech256 [48], are leveraged for validating our work. The AR face database consists of more than 4,000 color images of 126 subjects [43]. Following [27, 8], a subset containing 2,600 images of 100 subjects is constructed in our experiments; each subject has 26 images. The first 14 images of each subject involve no occlusion while the remaining 12 images are occluded: the faces in the first six of them are occluded by sunglasses and the faces in the other six are occluded by scarves. In the general image classification and clustering experiments, only the images without any occlusion are utilized; the whole dataset is leveraged to analyze the robustness of the proposed work to disguise. The size of the face images in the AR database is 60×43 pixels. The ORL database is a face image database which contains 400 images from 40 subjects [44]. Each subject has ten images acquired at different times. The size of the face images in the ORL database is 32×32 pixels. The COIL20 database has 20 objects and each object has 72 images obtained by rotating the object through 360° in 5° steps (1,440 images in total) [45]. The size of each image in the COIL20 database is 32×32 pixels. The ETH80 object database [46] contains 80 objects from 8 categories. Each object is represented by 41 views spaced evenly over the upper viewing hemisphere (3,280 images in total). The original size of each image in this dataset is 128×128 pixels; we resize them to 32×32 pixels. The Scene15 database [47] is a scene database which has 15 classes with 100 samples per category. Following [1], a subset of the Caltech256 database [48], which has 20 classes with 100 samples per category, is used in our experiments. We directly use the grayscale pixel values as the features on the ORL, AR, COIL20 and ETH80 databases.
PiCoDes [49] is adopted to represent the images in the Scene15 and Caltech256 databases, since they are more challenging. The dimension of the PiCoDes feature is 2048. Figure 2 shows some samples from these six image databases.
Nonnegative Matrix Factorization (NMF) [50], Graph regularized Nonnegative Matrix Factorization (GNMF) [51], Normalized Cut (NCut) [42], Normalized Hypergraph Spectral Clustering (NHSC) [1], Large-scale Spectral Clustering (LSC) [52], Clique Expansion-based Hypergraph Spectral Clustering using Matrix Trace Weights (CEHSC+Trace) [3], Normalized Hypergraph Spectral Clustering using the mean of the distances between the centroid and the vertices in a hyperedge (NHSC+Cent) [3], and Graph Spectral Clustering (GSC) [18] are chosen as the compared approaches in the image clustering experiments. NCut can be deemed a regular graph spectral clustering algorithm, so there are three graph spectral clustering algorithms: NCut, LSC and GSC. NHSC, CEHSC+Trace and NHSC+Cent are three hypergraph spectral clustering methods. The only difference between NHSC and NHSC+Cent is the weighting of hyperedges: NHSC uses the mean of the distances between each two samples in a hyperedge while NHSC+Cent uses the mean of the distances between the hyperedge centroid and the samples in the hyperedge. NHSC+Cent and CEHSC+Trace are taken from our recent work [3], which empirically studied the effect of hyperedge weighting schemes on hypergraph learning and selected optimal weighting schemes for different hypergraph frameworks. NHSC+Cent and CEHSC+Trace are the best combinations of hyperedge weighting scheme and hypergraph spectral clustering for addressing the clustering issue reported in [3].
We employ the Sparse Representation-based Classifier (SRC) [14], the Collaborative Representation-based Classifier (CRC) [22], the LIB-Support Vector Machine (LIBSVM) [53], Normalized Hypergraph Transduction (NHT) [1], Graph Transduction (GT) [54], the Adaptive Hypergraph-based Classifier (AHC) [4], Normalized Hypergraph Transduction using Matrix Trace Weights (NHT+Trace) [3], Normalized Hypergraph Transduction using the Volume of Simplex Weights (NHT+Volume) [3], and the Sparse Graph-based Classifier (SGC) [17] as the compared approaches for image classification. GT and SGC are graph transduction algorithms while AHC, NHT, NHT+Trace and NHT+Volume are hypergraph transduction algorithms. NHT+Trace and NHT+Volume are the best combinations of hyperedge weighting scheme and hypergraph transduction for addressing the classification issue reported in [3]. In the experiments, all the compared methods are well tuned.

TABLE III: Clustering Accuracy (%, left six columns) and Normalized Mutual Information (NMI, %, right six columns) of the compared methods on the six databases.

Methods  |  AR  ORL  COIL20  ETH80  Scene15  Caltech256  |  AR  ORL  COIL20  ETH80  Scene15  Caltech256
NMF [50]  24.29  51.25  60.59  46.86  59.33  38.25  56.12  70.31  70.89  41.13  58.41  38.89 
GNMF [51]  28.86  65.75  82.22  52.16  62.87  39.80  60.52  82.19  89.99  46.93  61.32  38.45 
NCut [42]  61.79  67.75  69.60  45.61  56.87  38.00  80.71  82.01  77.00  38.02  60.21  38.69 
NHSC [1]  36.71  66.75  76.60  50.85  58.87  36.40  64.34  81.57  85.26  47.76  58.46  37.32 
LSC [52]  35.86  66.00  76.04  55.55  66.33  43.95  65.43  82.39  86.13  56.22  64.01  41.32 
CEHSC+Trace [3]  35.50  70.50  82.29  51.89  67.47  36.30  64.00  82.75  89.12  46.83  62.03  34.01 
NHSC+Cent [3]  36.71  69.50  76.60  46.86  63.33  44.15  64.36  81.62  85.26  46.72  59.53  43.29 
GSC [18]  59.93  69.50  68.13  51.77  56.27  34.30  78.77  82.50  77.76  50.00  56.74  37.86 
$\ell_1$-HSC  70.21  78.25  82.85  56.34  67.60  42.15  86.71  87.21  90.99  58.72  64.40  41.88 
$\ell_2$-HSC  73.43  79.50  82.15  53.38  69.20  44.95  86.73  87.22  89.26  53.97  65.80  43.60 

IV-B Image Clustering
We conduct the image clustering experiments on all six image databases. For each database, the cluster number is fixed to its category number. Following [51, 55]
, Clustering Accuracy and Normalized Mutual Information (NMI) are leveraged as two evaluation metrics.
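For reference, the two metrics can be computed as below. This is a plain-Python sketch of ours, not the evaluation code used in the paper; the brute-force cluster-to-class matching is only practical for a small number of clusters:

```python
import math
from itertools import permutations
from collections import Counter

def clustering_accuracy(true, pred):
    """Best accuracy over all cluster-to-class label assignments
    (brute force over permutations; fine for a handful of clusters)."""
    labels = sorted(set(pred))
    best = 0.0
    for perm in permutations(sorted(set(true)), len(labels)):
        mapping = dict(zip(labels, perm))
        hits = sum(t == mapping[p] for t, p in zip(true, pred))
        best = max(best, hits / len(true))
    return best

def nmi(true, pred):
    """Normalized mutual information: I(T;P) / sqrt(H(T) H(P))."""
    n = len(true)
    ct, cp, cj = Counter(true), Counter(pred), Counter(zip(true, pred))
    I = sum(c / n * math.log(c * n / (ct[t] * cp[p]))
            for (t, p), c in cj.items())
    Ht = -sum(c / n * math.log(c / n) for c in ct.values())
    Hp = -sum(c / n * math.log(c / n) for c in cp.values())
    return I / math.sqrt(Ht * Hp)
```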
Table III reports the clustering performances of the different clustering methods; the bold number indicates the best accuracy on a dataset under the same metric. We can find that $\ell_1$-HSC and $\ell_2$-HSC outperform almost all compared approaches in all experiments and achieve remarkable improvements over the conventional hypergraph spectral clustering algorithms. For example, the clustering accuracy gains of $\ell_1$-HSC over NHSC are 33.50%, 11.50%, 6.25%, 5.49%, 9.73% and 5.75% on the AR, ORL, COIL20, ETH80, Scene15 and Caltech256 datasets respectively. Similarly, such gains of $\ell_2$-HSC are 36.72%, 11.75%, 5.55%, 10.33% and 8.55%. The experimental results also show that, as generalizations of sparse graph spectral clustering, $\ell_1$-HSC and $\ell_2$-HSC obtain better performances than GSC, which is known as a sparse graph spectral clustering algorithm. More specifically, the NMI gains of $\ell_1$-HSC over GSC on the AR, ORL, COIL20, ETH80, Scene15 and Caltech256 datasets are 7.94%, 4.71%, 13.23%, 8.72%, 7.66% and 4.02% respectively. Similarly, the NMI improvements of $\ell_2$-HSC over GSC are 7.96%, 4.72%, 11.50%, 3.97%, 9.06% and 5.74% on these six datasets. Another interesting phenomenon observed from the experimental results is that $\ell_2$-HSC comprehensively performs better than $\ell_1$-HSC. We attribute this to the fact that Collaborative Representation (CR) is often more discriminative than Sparse Representation (SR) [22, 23].
Methods  Classification Errors (Mean±Std, %)
   AR  COIL20  ETH80  Scene15
CRC [22]  25.07±1.31  11.19±1.10  20.34±6.34  26.73±1.43
SRC [14]  19.57±2.22  11.81±0.98  29.70±8.80  26.80±2.83
LIBSVM [53]  26.50±0.10  12.50±1.18  30.82±5.99  25.40±2.17
NHT [1]  32.00±1.41  4.86±0.00  27.59  26.80±1.89
GT [54]  29.57±0.81  43.13±0.49  27.32±0.95  32.40±2.07
AHC [4]  33.14±1.21  10.13±1.41  27.13±2.85  25.54±1.59
NHT+Trace [3]  32.21±1.31  3.68±0.69  26.10±4.31  25.47±1.89
NHT+Volume [3]  32.29±1.21  4.79±0.29  25.82±4.61  25.40±1.98
SGC [17]  16.79±1.91  9.44±1.18  28.38±6.25  26.13±0.42
HT  10.50±1.91  2.15±0.49  17.44±1.98  26.13±3.96
HT  4.79±0.10  4.51±4.42  17.32±0.43  25.27±2.92
IV-C Image Classification
We employ the AR, COIL20, ETH80 and Scene15 databases to evaluate the image classification performances of different classifiers. The two-fold cross-validation scheme is applied in the image classification experiments.
Table IV reports the classification errors of different classifiers on different databases. Similar to the image clustering results, the proposed approaches, HT and HT, achieve very promising classification performance and consistently outperform the other hypergraph transduction approaches. Particularly, it is worthwhile to point out that HT ranks first on three of the four databases. More specifically, the classification accuracy gains of HT over the other hypergraph transduction approaches, namely NHT, AHC, NHT+Trace and NHT+Volume, on the AR database are 27.21%, 28.35%, 27.42% and 27.50% respectively. The corresponding gains of HT on the ETH80 database are 10.15%, 9.69%, 8.66% and 8.50% respectively. From these observations, it is not hard to find that HT often performs much better than SGC, which can be deemed as the pairwise version of HT. We believe that this phenomenon well verifies the stronger representational power of hypergraphs over graphs.
IV-D Robustness Analysis
It is well known that Sparse Representation (SR) and Collaborative Representation (CR) are robust to disguise and noise. Since Hypergraph (H) and Hypergraph (H) are generated by SR and CR respectively, here we conduct several experiments to examine whether H and H also enjoy such desirable properties.
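To make the construction concrete, the following is a minimal, hypothetical sketch of how a regression model can induce a hyperedge: each sample is regressed on all the others and connected with the samples carrying the largest coefficients. We use a closed-form ridge (CR-style) solver here; the function name, parameter values and top-k selection rule are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def cr_hyperedge(X, i, lam=0.1, k=3):
    """Build one hyperedge for sample i: regress X[i] on all other
    samples with ridge regression, then connect i with the k samples
    that carry the largest absolute coefficients."""
    others = np.delete(np.arange(X.shape[0]), i)
    A = X[others].T                                   # columns = other samples
    y = X[i]
    # closed-form ridge solution: (A^T A + lam I)^{-1} A^T y
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    top = others[np.argsort(-np.abs(coef))[:k]]
    return np.sort(np.append(top, i))                 # centroid + top-k peers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (6, 8)), rng.normal(5, 1, (6, 8))])
edge = cr_hyperedge(X, 0, k=3)
print(edge)  # a 4-vertex hyperedge containing sample 0
```

An SR-style variant would replace the ridge solve with an l1-regularized regression; the point is that the regression coefficients, rather than raw distances, decide which samples join the hyperedge.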
IV-D1 Robustness to Noise
In order to evaluate the robustness of our works to noise, we construct five noisy versions of the COIL20 dataset by randomly setting a certain proportion of pixels in each image to 0 or 255. For these noisy COIL20 databases, we use the Noise Level to measure the noise degree of the data, where the Noise Level is defined as the ratio of the number of noisy dimensions to the total number of dimensions. The Noise Levels of the five noisy versions of the COIL20 database are 0.1, 0.2, 0.3, 0.4 and 0.5 respectively (see the examples in Figure 3). We follow the same protocols as in the previous sections to conduct the image classification and clustering experiments on these noisy image databases.
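The corruption process described above can be sketched as follows; this is a hypothetical helper assuming 8-bit grayscale images, since the paper does not specify its implementation:

```python
import numpy as np

def add_salt_pepper(img, noise_level, seed=None):
    """Return a copy of img in which a `noise_level` fraction of the
    pixels is randomly forced to 0 or 255 (salt-and-pepper noise)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    n_noisy = int(round(noise_level * out.size))
    idx = rng.choice(out.size, size=n_noisy, replace=False)  # distinct pixels
    out.flat[idx] = rng.choice([0, 255], size=n_noisy)
    return out

img = np.full((32, 32), 128, dtype=np.uint8)   # a flat gray test image
noisy = add_salt_pepper(img, 0.3, seed=0)
corrupted = np.count_nonzero(noisy != 128)
print(corrupted / img.size)  # ≈ 0.3
```

Sampling indices without replacement guarantees the corrupted fraction matches the target Noise Level exactly.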
We plot the experimental results of different approaches under different Noise Levels in Figure 4. Figures 4(a) and 4(b) report the clustering accuracy and NMI respectively. From these observations, we can see that the clustering performances of all approaches decrease as the Noise Level increases. However, compared with the conventional hypergraph methods such as NHSC and CEHSC, GSC, HSC and HSC perform much better and their clustering performances degrade noticeably more slowly with increasing Noise Level. Clearly, this shows that HSC and HSC are more robust to noise than the conventional hypergraph spectral clustering models. Figure 4(c) shows the classification errors of different approaches under different Noise Levels. The observations in the image classification experiments are quite different from those in the aforementioned image clustering experiments. The performances of all approaches are very similar, and only HT slightly outperforms the others. We mainly attribute this to the fact that the training samples all contain noise, and the supervision from category labels compulsively introduces the noisy samples into the hypergraph learning models. Therefore, the contributions of the robust sample selection procedures from Collaborative Representation (CR) and Sparse Representation (SR) are weakened. Comprehensively speaking, H and H enjoy a certain amount of robustness to noise, particularly in the unsupervised case.
Metric  NHSC  NHSC+Cent  CEHSC+Trace  HSC  HSC
AC  20.32  20.38  19.06  48.00  68.19
NMI  46.11  45.99  46.60  67.81  80.42
IV-D2 Robustness to Disguise
We employ the AR database to evaluate the image clustering and classification performances under the disguise case, since the AR database already provides face images with natural disguise (sunglasses and scarves) for each subject. In the image clustering experiments, all the face images are used. In the image classification case, we use the occlusion-free face images for training and the occluded ones for testing.
Table V lists the clustering performances of our works and three conventional hypergraph approaches. In these experiments, HSC and HSC show significant advantages. For example, the clustering accuracy gains of HSC over NHSC, NHSC+Cent and CEHSC+Trace are 27.68%, 27.62% and 29.00% respectively. HSC performs even better: its clustering accuracy improvements over these conventional hypergraph approaches are 47.87%, 47.81% and 49.13% respectively. Table VI tabulates the classification errors of different approaches in the disguise case. From the observations, HT and HT obtain very promising performance while all the other traditional hypergraph transduction approaches fail completely. The classification error of HT is 23 times lower than those of NHT, AHC, NHT+Trace and NHT+Volume. Clearly, the experimental results demonstrate that HT and HT are robust to disguise.
Database  Candidates  The optimal (HSC)  The optimal (HSC)
ORL  {2:11}  
AR  {2:14}  
COIL20  {3,5,10,20,36,54,72}  
ETH80  {3,5,10,41:41:410}  
Scene15  {5:5:50,60:10:100}  
Caltech256  {5:5:50,60:10:100}  
Database  Candidates  The optimal (HT)  The optimal (HT)
AR  {2:14}  
COIL20  {3,5,10,20,36,54,72}  
ETH80  {3,5,10,41:41:410}  
Scene15  {5:5:50,60:10:100}  
IV-E Parameter Settings
In this section, we discuss the influence of different parameters on our works. The RHSC methods, HSC and HSC, mainly involve two parameters: the regularization parameter of the regression model and the hyperedge length. The RHT methods, HT and HT, have one more parameter, which balances the labeling error of the data and the loss of the hypergraph partition. Although the hyperedge length is an important parameter that can deeply influence the quality of the hypergraph, it has numerous possible values and it is impossible to evaluate each of them. So, in our experiments, we define a candidate collection of hyperedge lengths and then conduct experiments to empirically find the optimal value. Tables VII and VIII respectively list the hyperedge length candidate collections of the proposed RH works and report their optimal hyperedge lengths. Figure 5 demonstrates the influence of the regularization parameter on the clustering performances of HSC and HSC. We choose the mean of clustering accuracy and NMI as the comprehensive clustering performance metric. From Figure 5, we can find that the performance of HSC is quite insensitive to this parameter; its optimal values on the ORL, AR, COIL20, ETH80, Scene15 and Caltech256 databases can be read from the figure. With regard to HSC, a higher value is much better on the Scene15, Caltech256, COIL20 and ETH80 databases while a medium value is more suitable for the AR and ORL databases. Figure 6 shows the impacts of the regularization and balancing parameters on the classification performances of HT and HT. Phenomena similar to those in Figure 5 are observed. HT is not very sensitive to the regularization parameter, and its best values for the AR, COIL20, ETH80 and Scene15 databases can likewise be read from the figure. A medium value works well for HT on the AR and COIL20 databases while HT with a higher value shows better performances on the Scene15 and ETH80 databases.
Moreover, from the observations in Figures 6(c) and 6(d), both HT and HT achieve their best classification performances when the balancing parameter is larger than 1.
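The empirical search over hyperedge-length candidates can be sketched as a simple grid search; `build_hypergraph`, `cluster` and `score` are hypothetical stand-ins for the RHSC pipeline, and the toy score is only for illustration:

```python
def best_hyperedge_length(candidates, build_hypergraph, cluster, score):
    """Evaluate every candidate hyperedge length and keep the best one."""
    results = {k: score(cluster(build_hypergraph(k))) for k in candidates}
    return max(results, key=results.get), results

# The candidate set mirrors the MATLAB-style range for COIL20 in Table VII.
candidates = [3, 5, 10, 20, 36, 54, 72]
best, results = best_hyperedge_length(
    candidates,
    build_hypergraph=lambda k: k,   # stand-in: a real call would build the RH
    cluster=lambda h: h,            # stand-in: a real call would run RHSC
    score=lambda c: -abs(c - 20),   # toy score peaking at length 20
)
print(best)  # 20
```

In practice each candidate requires a full hypergraph construction and clustering run, which is why the candidate collections in Tables VII and VIII are kept small.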
V Conclusion
In this paper, we presented a new solution for hypergraph construction in which regression models are used to measure the closeness among samples. We named this new hypergraph framework Regression-based Hypergraph (RH). Based on two conventional hypergraph learning models, namely Hypergraph Spectral Clustering (HSC) and Hypergraph Transduction (HT), we also developed the Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing clustering and classification issues. As two influential regression approaches for visual learning, Sparse Representation (SR) and Collaborative Representation (CR) are employed to instantiate two instances of RH and their corresponding RHSC and RHT algorithms. Six popular image databases are leveraged to validate the effectiveness of our works. It can be concluded from the experimental observations that RH inherits the desirable properties of both the hypergraph and the regression model.
There still exist many interesting future works based on our models, since RH is a general framework for hypergraph construction in which researchers can flexibly choose appropriate regression models to construct RHs for tackling different tasks. For example, RH can be further applied to hypergraph-based subspace learning [8, 56], multi-label learning [6, 57] or attribute learning [11]. Moreover, investigating adaptive and efficient RH construction schemes is also a very meaningful direction for improving utility and efficiency [4, 58].
References
 [1] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in neural information processing systems (NIPS), 2006, pp. 1601–1608.
 [2] J. Y. Zien, M. D. Schlag, and P. K. Chan, “Multilevel spectral hypergraph partitioning with arbitrary vertex sizes,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 9, pp. 1389–1399, 1999.
 [3] S. Huang, A. Elgammal, and D. Yang, “On the effect of hyperedge weights on hypergraph learning,” arXiv preprint arXiv:1410.6736, 2014.
 [4] J. Yu, D. Tao, and M. Wang, “Adaptive hypergraph learning and its application in image classification,” IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3262–3272, 2012.
 [5] M. Wang, X. Wu et al., “Visual classification by l1-hypergraph modeling,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2564–2574, 2015.
 [6] G. Chen, J. Zhang, F. Wang, C. Zhang, and Y. Gao, “Efficient multilabel classification with hypergraph regularization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1658–1665.
 [7] P. Ochs and T. Brox, “Higher order motion models and spectral clustering,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 614–621.
 [8] S. Huang, D. Yang, Y. Ge, D. Zhao, and X. Feng, “Discriminant hyper-Laplacian projections with its applications to face recognition,” in IEEE conference on Multimedia and Expo Workshop on HIM (ICMEW), 2014, pp. 1–6.
 [9] S. Huang, D. Yang, J. Zhou, and X. Zhang, “Graph regularized linear discriminant analysis and its generalization,” Pattern Analysis and Applications, vol. 18, no. 3, pp. 639–650, 2015.
 [10] L. Sun, S. Ji, and J. Ye, “Hypergraph spectral learning for multi-label classification,” in ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.
 [11] S. Huang, M. Elhoseiny, A. Elgammal, and D. Yang, “Learning hypergraphregularized attribute predictors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 409–417.
 [12] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas, “Image retrieval via probabilistic hypergraph ranking,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3376–3383.
 [13] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, “3D object retrieval and recognition with hypergraph analysis,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4290–4303, 2012.
 [14] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
 [15] S. Gao, I. W.-H. Tsang, and L.-T. Chia, “Kernel sparse representation for image classification and face recognition,” in European Conference on Computer Vision (ECCV), 2010, pp. 1–14.
 [16] C.-Y. Lu, H. Min, J. Gui, L. Zhu, and Y.-K. Lei, “Face recognition via weighted sparse representation,” Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 111–116, 2013.
 [17] S. Huang, D. Yang, J. Zhou, and L. Huangfu, “Sparse graphbased transduction for image classification,” Journal of Electronic Imaging, vol. 24, no. 2, p. 023007, 2015.
 [18] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with L1-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.
 [19] R. Timofte and L. Van Gool, “Sparse representation based projections,” in British machine vision conference (BMVC), 2011, pp. 61–1.
 [20] L. Qiao, S. Chen, and X. Tan, “Sparsity preserving projections with applications to face recognition,” Pattern Recognition, vol. 43, no. 1, pp. 331–341, 2010.
 [21] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 2790–2797.
 [22] L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: Which helps face recognition?” in International Conference on Computer Vision (ICCV), 2011, pp. 471–478.
 [23] M. Yang, L. Zhang, D. Zhang, and S. Wang, “Relaxed collaborative representation for pattern classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2224–2231.
 [24] R. Timofte and L. Van Gool, “Weighted collaborative representation and classification of images,” in International Conference on Pattern Recognition (ICPR), 2012, pp. 1606–1610.
 [25] S. Huang, Y. Yang, D. Yang, L. Huangfu, and X. Zhang, “Class specific sparse representation for classification,” Signal Processing, vol. 116, pp. 38–42, 2015.
 [26] P. J. Phillips, H. Wechsler, J. Huang, and P. J. Rauss, “The FERET database and evaluation procedure for face-recognition algorithms,” Image and Vision Computing, vol. 16, no. 5, pp. 295–306, 1998.
 [27] I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
 [28] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [29] X.T. Yuan, X. Liu, and S. Yan, “Visual classification with multitask joint sparse representation,” IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4349–4360, 2012.
 [30] S. Huang, D. Yang, J. Zhou, L. Huangfu, and X. Zhang, “Sparse graphbased transduction for image classification,” Journal of Electronic Imaging, vol. 24, no. 2, p. 023007, 2014.
 [31] X. Peng, L. Zhang, and Z. Yi, “Scalable sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 430–437.
 [32] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by lowrank representation,” in International Conference on Machine Learning (ICML), 2010, pp. 663–670.
 [33] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with L1-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.
 [34] L. Pu and B. Faltings, “Hypergraph learning with hyperedge expansion,” in Machine Learning and Knowledge Discovery in Databases, 2012, pp. 410–425.
 [35] A. Panagopoulos, C. Wang, D. Samaras, and N. Paragios, “Simultaneous cast shadows, illumination and geometry inference using hypergraphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 437–449, 2013.
 [36] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie, “Beyond pairwise clustering,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 838–845.
 [37] M. Bolla, “Spectra, Euclidean representations and clusterings of hypergraphs,” Discrete Mathematics, vol. 117, 1993.
 [38] S. Agarwal, K. Branson, and S. Belongie, “Higher order learning with graphs,” in International Conference on Machine Learning (ICML), 2006, pp. 17–24.
 [39] S. Yan and H. Wang, “Semi-supervised learning by sparse representation,” in SIAM International Conference on Data Mining (SDM). SIAM, 2009, pp. 792–801.
 [40] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Nonnegative low rank and sparse graph for semisupervised learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 2328–2335.

 [41] A. Y. Ng, M. I. Jordan, Y. Weiss et al., “On spectral clustering: Analysis and an algorithm,” Advances in neural information processing systems (NIPS), vol. 2, pp. 849–856, 2002.
 [42] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
 [43] A. Martínez and R. Benavente, “The ar face database,” Bellatera, Jun 1998.
 [44] F. S. Samaria and A. C. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of the Second IEEE Workshop on Applications of Computer Vision, 1994, pp. 138–142.
 [45] S. A. Nene, S. K. Nayar, H. Murase et al., “Columbia object image library (COIL-20),” Technical Report CUCS-005-96, Tech. Rep., 1996.
 [46] B. Leibe and B. Schiele, “Analyzing appearance and contour based methods for object categorization,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2003, pp. II–409.
 [47] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in IEEE conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2006, pp. 2169–2178.
 [48] G. Griffin, A. Holub, and P. Perona, “Caltech256 object category dataset,” 2007.
 [49] A. Bergamo, L. Torresani, and A. W. Fitzgibbon, “PiCoDes: Learning a compact code for novel-category recognition,” in Advances in Neural Information Processing Systems (NIPS), 2011, pp. 2088–2096.
 [50] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
 [51] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548–1560, 2011.

 [52] X. Chen and D. Cai, “Large scale spectral clustering with landmark-based representation,” in AAAI Conference on Artificial Intelligence (AAAI), 2011.
 [53] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.
 [54] O. Duchenne, J.Y. Audibert, R. Keriven, J. Ponce, and F. Ségonne, “Segmentation by transduction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
 [55] D. Cai, X. He, and J. Han, “Document clustering using locality preserving indexing,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 12, pp. 1624–1637, 2005.
 [56] Z. Zhang, P. Ren, and E. R. Hancock, “Unsupervised feature selection via hypergraph embedding,” in British Machine Vision Conference (BMVC), 2012, pp. 1–11.
 [57] L. Sun, S. Ji, and J. Ye, “Hypergraph spectral learning for multi-label classification,” in ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.
 [58] W. Liu, J. He, and S.F. Chang, “Large graph construction for scalable semisupervised learning,” in International conference on machine learning (ICML), 2010, pp. 679–686.