Introduction
Learning effective representations for high-dimensional data plays an essential role in machine learning tasks across various types of data, such as text, images, speech, and video. Among these techniques, sparse coding has attracted increasing attention due to its comprehensive theoretical foundations and its excellent performance in many signal processing applications, especially computer vision tasks, including supervised image classification
[Wright et al.2009], semi-supervised image classification [Lu et al.2015], image clustering [Yang et al.2014, Feng, Wu, and Zhou2017], image restoration [Dong et al.2015], etc. In these areas, the signals studied are image features or instances, and sparse coding is a powerful tool for feature/image reconstruction [Yang et al.2009, Feng, Wu, and Zhou2017] and similarity measurement [Yan and Wang2009, Yang et al.2014]. It has also been successfully applied in text mining [Li et al.2015], speech signal processing [Zhen et al.2016], recommendation [Feng et al.2016], etc.

Basically, given a data matrix $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ consisting of $n$ data points measured on $d$ dimensions, $x_i \in \mathbb{R}^d$ is a vector representing the $i$-th data point. Sparse Coding (SC) [Lee et al.2006] aims at extracting a dictionary (also called a base) $B = [b_1, \dots, b_m] \in \mathbb{R}^{d \times m}$ that consists of a set of atom items (called a basis), and converts data instances into sparse coordinates with respect to the dictionary. In other words, it decomposes the data matrix into a dictionary matrix $B$ and a sparse code matrix $S = [s_1, \dots, s_n] \in \mathbb{R}^{m \times n}$, so that each data sample can be represented as a sparse linear combination of atom items (i.e., $x_i \approx B s_i$) in the dictionary, with corresponding sparse coefficient vector $s_i$. Formally, the sparse coding problem can be described as the following optimization problem:

$$\min_{B, S} \; \|X - BS\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1 \quad \text{s.t. } \|b_j\|_2^2 \le 1, \; j = 1, \dots, m, \tag{1}$$

where $\|\cdot\|_F$ denotes the matrix Frobenius norm (i.e., $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$), $\|\cdot\|_1$ denotes the $\ell_1$ norm of a given vector (i.e., $\|s\|_1 = \sum_i |s_i|$), and $\lambda$ is the sparsity regularization parameter that represents the trade-off between reconstruction error and sparsity.
Naturally, the optimization of sparse coding in (1) is not jointly convex in $B$ and $S$, but it is convex in $S$ (with $B$ fixed) and convex in $B$ (with $S$ fixed). Thus, it is usually solved iteratively by alternately minimizing over $B$ and $S$ while holding the other fixed. When $B$ is fixed, (1) becomes an $\ell_1$-regularized nonsmooth convex optimization problem, and much effort has been devoted to solving this type of problem [Yang et al.2010]. When $S$ is fixed, it is a well-defined constrained least squares problem, which can be solved efficiently by existing optimization methods [Lee et al.2006].
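This alternating scheme can be sketched in a few lines of NumPy. The sketch below is illustrative only: it uses ISTA for the code step and projected least squares for the dictionary step, which are standard but assumed solver choices, not necessarily the ones used in the paper.

```python
import numpy as np

def soft_threshold(Z, t):
    # proximal operator of the l1 norm
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_coding(X, m, lam=0.1, n_outer=10, n_ista=50, seed=0):
    """Toy alternating minimization for problem (1):
    min_{B,S} ||X - B S||_F^2 + lam * sum_i ||s_i||_1,  s.t. ||b_j||_2^2 <= 1."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d, m))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    S = np.zeros((m, n))

    def s_step(B, S):
        # ISTA on the l1-regularized least squares (B fixed)
        L = 2.0 * np.linalg.norm(B, 2) ** 2 + 1e-12   # Lipschitz constant of the gradient
        for _ in range(n_ista):
            S = soft_threshold(S - 2.0 * B.T @ (B @ S - X) / L, lam / L)
        return S

    for _ in range(n_outer):
        S = s_step(B, S)
        # B-step: unconstrained least squares, then project columns into the unit ball
        B = X @ S.T @ np.linalg.pinv(S @ S.T + 1e-8 * np.eye(m))
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)
    return B, s_step(B, S)   # final S-step so codes match the returned dictionary
```

Each sub-problem is convex, so each step cannot increase the objective; the joint problem, however, remains nonconvex, which is exactly the difficulty discussed next.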
Though we can optimize SC by such an alternating method, a major limitation is caused by the nonconvexity of the objective in (1). This deficiency can drive the current SC optimization approach to get stuck in bad local minima, especially when many noises and outliers exist in the datasets. To relieve the effect of nonconvexity, a heuristic approach is to run the algorithm multiple times with different initializations or learning rates, and pick the best results based on cross-validation. However, this strategy is ad hoc and generally inconvenient in an unsupervised setting, since there is no straightforward criterion for choosing a proper solution. Thus, for robust SC optimization, our goal is to alleviate the influence of noisy and confusing data: the confusing data usually correspond to highly nonlinear local patterns that are hardly learnable within the model space, while the noisy ones are outliers that should not be learned at all, such as noisy pixels in images or noisy words in texts. Typically, this learning robustness can be achieved by a careful sample selection that distinguishes the reliable samples from the confusing ones. Fortunately, a recently studied learning regime named self-paced learning (SPL) [Kumar, Packer, and Koller2010] addresses exactly this issue. The core idea of SPL is to learn the model with easier samples first and then gradually process more complex ones, simulating the learning principle of humans and animals. This learning mechanism has been empirically demonstrated to be beneficial for several learning tasks [Xu, Tao, and Xu2015, Zhao et al.2015, Li et al.2017b], where it helps avoid bad local minima and achieves better generalization. Therefore, incorporating it into SC is expected to alleviate the local minimum issue of alternating learning.

To enhance the learning robustness, in this paper we propose a unified framework named Self-Paced Sparse Coding (SPSC). With an adaptive easy-to-hard pace in optimization, SPSC dynamically introduces the data values into learning from easy ones to complex ones. Specifically, for different levels of dynamic selection, we extend SPSC into three variants, namely sample-level SPSC, element-level SPSC, and feature-level SPSC. We also present a simple yet effective algorithmic framework for solving the proposed SPSC problems. Experimental results on real-world clean datasets and on corrupted versions of them demonstrate the effectiveness of the proposed approach compared to the state-of-the-art.
Related Work
In this section, we review related work on sparse coding and self-paced learning, as our work lies at the crossroads of these two fields.
Sparse Coding
Sparse coding consists of two components: the optimization of the sparse codes (when the dictionary is known) and the learning of the optimal dictionary (when the sparse codes are given). For sparse code optimization, when the sparsity of the codes is measured by the $\ell_0$ norm, the problem can be solved by methods such as matching pursuit, orthogonal matching pursuit, and basis pursuit; multiple approaches have also been presented for $\ell_1$-norm minimization, such as coordinate descent [Wu and Lange2008], the interior-point method [Kim et al.2007], the active set method [Lee et al.2007], and the augmented Lagrange multiplier method [Lin, Chen, and Ma2010].
The dictionary is usually learned automatically from data rather than predetermined. Typically, sparse coding is well suited to reconstructing data, and consequently variants of traditional sparse coding have been presented for different purposes. (i) Structure-regularized sparse coding. Structured sparse coding methods exploit different structure priors among the data. [Sun et al.2014] incorporate additional structured sparsity into the sparse coding objective function, which leads to promising performance in applications. Graph/hypergraph regularized sparse coding methods [Zheng et al.2011, Gao, Tsang, and Chia2013, Feng, Wu, and Zhou2017] are proposed to preserve the intrinsic manifold structure among high-dimensional data objects, imposing Laplacian regularization as a manifold regularizer on the sparse coefficients. (ii) Supervised/semi-supervised sparse coding. In this group, label information of a training dataset is leveraged to learn a discriminative dictionary for better classification performance. Approaches in [M. Yang and Feng2011] attempt to learn multiple dictionaries or category-specific dictionaries to promote discrimination between classes. Others [Z. Jiang and Davis2012] learn a dictionary shared by all classes by merging or selecting atom items from an initially large dictionary, or incorporate discriminative terms into the objective function during training [Zhang and Li2010, Z. Jiang and Davis2013, Zheng and Tao2015]. (iii) Multimodal sparse coding. In this group, traditional sparse coding approaches are extended to benefit from the information encoded in multiple sources for cross-domain sparse representation [Jing et al.2014, Li et al.2017a].
Most current SC models are built on nonconvex optimization problems and thus suffer from the local minimum problem, especially when noisy and outlier data are present. We aim to alleviate this issue by casting SC into the SPL framework.
Self-Paced Learning
Inspired by the intrinsic cognitive principle of humans and animals, Bengio et al. [Bengio et al.2009] initiated a new learning paradigm called curriculum learning (CL), the core idea of which is to learn a model by gradually including samples into training from easy to complex when training nonconvex objectives. This learning process helps alleviate bad local solutions in nonconvex optimization, as supported by empirical evaluation [Basu and Christensen2013]. However, it requires identifying the easy and the hard samples in advance, which is difficult in real-world applications. To overcome this limitation of the heuristic sample easiness ranking in CL, Kumar et al. [Kumar, Packer, and Koller2010] proposed a concise model, called self-paced learning (SPL), which introduces a regularizer into the learning objective. SPL can automatically optimize an appropriate curriculum by the model itself, with a weighted loss term on all samples and a general SPL regularizer imposed on the sample weights. Jiang et al. [Jiang et al.2015] presented a more comprehensive model underlying SPL/CL, as the general optimization problem

$$\min_{w, v \in [0,1]^n} \; \sum_{i=1}^{n} v_i L(y_i, g(x_i; w)) + f(v; \lambda) \quad \text{s.t. } v \in \Psi, \tag{2}$$

where, given a training set $\{(x_i, y_i)\}_{i=1}^n$, $L(y_i, g(x_i; w))$ denotes the loss function which calculates the cost between the true label $y_i$ and the estimated label $g(x_i; w)$ under a model $g$ with model parameter $w$ to learn; $v = [v_1, \dots, v_n]^T$ denotes the weight variables reflecting the samples' confidence; $f(v; \lambda)$ corresponds to a self-paced regularizer function governing the sample selection scheme; $\Psi$ is a feasible region that encodes the information of a predetermined curriculum; and $\lambda$ is a parameter controlling the learning pace, also referred to as the "pace age". For the self-paced regularizer function $f$, Jiang et al. [Jiang et al.2015] abstracted three necessary conditions that it should satisfy.

By sequentially optimizing the model with a gradually increasing pace parameter in the SPL regularizer, more samples are automatically included in a purely self-paced way. The number of samples selected at each iteration is determined by the pace parameter, which is gradually annealed so that later iterations introduce more samples. The algorithm converges when all samples have been considered and the objective function cannot be improved further. By virtue of its generality, SPL has been applied to a broad spectrum of latent variable models, such as matrix factorization [Zhao et al.2015], clustering [Xu, Tao, and Xu2015], multi-task learning [Li et al.2017b], and dictionary learning [Tang, Yang, and Gao2012, Xu et al.2016]. In contrast to existing self-paced dictionary learning, in this paper we utilize the generalized SPL schema and implement a soft sample selection accordingly, rather than the heuristic hard sample selection strategy of [Tang, Yang, and Gao2012]. Moreover, the work in [Xu et al.2016] addresses the bad-local-minimum issue of nonconvex sparse coding optimization only by assigning weights to different samples of the training data. However, due to the complexity and noisy nature of real data, the confidence levels of the features of a dataset may also vary (e.g., features extracted with different approaches for a set of images should have different confidence levels), and the confidence levels of specific values within the same sample vary as well (e.g., in an image denoising task, noisy pixels should have a lower confidence level than the uncorrupted ones in the same image). Thus, a unified self-paced learning framework for sparse coding is a natural further direction.
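To make the alternation in Eq. (2) concrete, the following toy sketch instantiates SPL with least-squares regression as the model and the hard regularizer $f(v; \lambda) = -\lambda \sum_i v_i$. All names and constants here are assumptions for illustration, not from any cited work.

```python
import numpy as np

def self_paced_least_squares(X, y, lam0=0.5, mu=1.3, n_iter=25):
    """Toy instance of the general SPL problem in Eq. (2): alternate the
    closed-form v-step (v_i = 1 iff loss_i < lam, from the hard regularizer)
    with a v-weighted least-squares w-step, then grow the pace lam by mu."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(n)
    lam = lam0
    for _ in range(n_iter):
        losses = (y - X @ w) ** 2
        v = (losses < lam).astype(float)      # v-step: select currently "easy" samples
        if v.sum() >= 1:                      # w-step: weighted least squares
            Xv = X * v[:, None]               # X^T V X and X^T V y via row scaling
            w = np.linalg.solve(Xv.T @ X + 1e-10 * np.eye(d), Xv.T @ y)
        lam *= mu                             # anneal the pace parameter
    return w, v
```

On clean data the model first fits the small-residual samples and, as the pace grows, eventually includes the whole training set, which is the behavior SPSC imports into sparse coding below.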
Self-Paced Sparse Coding
The Formulation of SPSC
Generally, the objective of SC is to learn a dictionary $B$ and representation $S$ from $X$ such that they can well reconstruct the data and follow some prior knowledge, as in (1). Inspired by the fact that humans often learn concepts from the easiest to the hardest, as in CL and SPL, we incorporate the easy-to-hard strategy operated on the samples into the learning process of SC, yielding Self-Paced Sparse Coding (SPSC). Following the SPL framework in (2), the new learning objective function of SPSC can be formulated as:
$$\min_{B, S, v} \; \|(X - BS)\,\mathrm{diag}(\sqrt{v})\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1 + f(v; \gamma) \quad \text{s.t. } \|b_j\|_2^2 \le 1, \; v \in [0,1]^n, \tag{3}$$

where $v = [v_1, \dots, v_n]^T$ denotes the weight vector imposed on the samples in $X$; $\sqrt{\cdot}$ denotes the element-wise square root of a matrix or vector, and $\mathrm{diag}(v)$ denotes the diagonal matrix with diagonal entries taken from the vector $v$.

It is noted that, different from the original SPL learning framework in (2), since sparse coding is originally an unsupervised method, the loss function corresponds to the reconstruction error $\ell_i = \|x_i - B s_i\|_2^2$, which measures the discrepancy between $x_i$ and its estimate $B s_i$ after learning.

SPL Regularizer
For the regularizer $f$, the original SPL [Kumar, Packer, and Koller2010] adopted the negative $\ell_1$ norm, $f(v; \gamma) = -\gamma \sum_{i=1}^{n} v_i$. Under this regularizer, with the losses fixed, the globally optimal $v$ is calculated by the optimization

$$v^* = \arg\min_{v \in [0,1]^n} \; \sum_{i=1}^{n} v_i \ell_i - \gamma \sum_{i=1}^{n} v_i. \tag{4}$$

The solution can be written as a hard threshold scheme:

$$v_i^* = \begin{cases} 1, & \ell_i < \gamma, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
Instead of hard weighting, in our experiments we utilize the linear soft weighting regularizer [Jiang et al.2015], due to its relatively easy implementation and good adaptability to complex scenarios. This regularizer penalizes the sample weights linearly in terms of the loss. Specifically, we have

$$f(v; \gamma) = \gamma \left( \frac{1}{2}\|v\|_2^2 - \sum_{i=1}^{n} v_i \right). \tag{6}$$

(6) is convex with respect to $v$, and thus the global minimum can be obtained analytically for the linear soft weighting:

$$v_i^* = \begin{cases} 1 - \ell_i/\gamma, & \ell_i < \gamma, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$
It is seen that if a sample can be well reconstructed ($\ell_i < \gamma$), it is selected as an easy sample ($v_i^* > 0$), with a weight that decreases with the reconstruction error; otherwise it is unselected ($v_i^* = 0$). The parameter $\gamma$ controls the pace at which the model learns new samples; physically, it can be interpreted as the "age" of the model. When $\gamma$ is small, more samples are unselected, as their losses are more likely to exceed $\gamma$. As $\gamma$ becomes larger, more samples with larger losses are gradually considered.
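Both closed-form weight updates are one-liners; the sketch below implements the hard scheme of Eq. (5) and the linear soft scheme of Eq. (7) side by side (function and argument names are our own).

```python
import numpy as np

def spl_weights(losses, gamma, scheme="soft"):
    """Closed-form SPL weight updates.
    'hard' implements Eq. (5): v_i = 1 if loss_i < gamma else 0.
    'soft' implements Eq. (7): v_i = 1 - loss_i/gamma if loss_i < gamma else 0."""
    losses = np.asarray(losses, dtype=float)
    if scheme == "soft":
        return np.where(losses < gamma, 1.0 - losses / gamma, 0.0)
    return (losses < gamma).astype(float)
```

For example, with losses (0, 0.5, 2) and pace 1, the soft scheme yields weights (1, 0.5, 0), whereas the hard scheme yields (1, 1, 0): the soft scheme keeps the same support but grades confidence within it.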
Regularization on Sparse Codes
Prior relational knowledge among the samples is very useful when learning new representations. To embed this knowledge into sparse coding, we adopt the recently proposed manifold regularizer called hypergraph consistency regularization (HC) [Feng, Wu, and Zhou2017], which has been empirically shown to perform well in sparse coding. Specifically, in addition to the Laplacian regularization $\mathrm{tr}(S L S^T)$ on the codes $S$, it tries to reconstruct the hyperedge incidence matrix from the sparse codes, giving a regularization term of the form

$$\Omega(S) = \alpha\, \mathrm{tr}(S L S^T) + \beta\, \Omega_{HC}(H, S), \tag{8}$$

where $\Omega_{HC}(H, S)$ measures the reconstruction error of the incidence matrix $H$ from the sparse codes, and $\alpha$, $\beta$ are trade-off parameters. Here $H$ is the incidence matrix of a hypergraph consisting of a set of vertices $V$ and a set of hyperedges $E$. The hypergraph can be constructed from the original data in $X$ to record the locally geometrical structure of the data: $h(v, e) = 1$ if $v \in e$, and $h(v, e) = 0$ otherwise. More generally, $H$ can also be defined in a probabilistic way [Huang et al.2010], such that $h(v, e) \in [0, 1]$. $L$ denotes the normalized hypergraph Laplacian matrix, defined as $L = I - \Theta$ ($I$ is the identity matrix), where $\Theta = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$ measures how close each pair of vertices is in the manifold space; $D_e$ and $D_v$ respectively denote the diagonal matrices of hyperedge degrees and vertex degrees, and $W$ denotes the diagonal matrix of hyperedge weights. By minimizing $\Omega(S)$, the incidence matrix can be well reconstructed and the consistency of the hypergraph structure in the sparse code space is guaranteed.

SPL Selection on Different Levels
It is seen that SPSC in (3) implements sample-level selection during the learning process, where a sample can be an image instance, a text, or a user, depending on the task. Accordingly, we denote the SPSC in (3) as sample-level SPSC. Usually, a sample $x_i$ is represented by a multi-dimensional vector, with each element a partial description of $x_i$. In (3), each element is treated the same, that is, with equal confidence for model learning. However, due to the complexity and noisy nature of real data, the confidence levels of specific elements within the same sample also vary (e.g., in an image denoising task, noisy pixels should have a lower confidence level than the uncorrupted ones in the same image). Likewise, the confidence levels of features can differ (e.g., features extracted with different approaches for a set of images should have different confidence levels). We now show how to incorporate the easy-to-hard strategy operated on elements and features into the learning process of SC.
Element-level SPSC
When operating SPSC at the element level, the loss for each element $x_{ij}$ can be defined as $\ell_{ij} = (x_{ij} - b^{(i)} s_j)^2$, where $b^{(i)}$ denotes the $i$-th row vector of the dictionary $B$. The learning objective of element-level SPSC then becomes:

$$\min_{B, S, V} \; \|\sqrt{V} \odot (X - BS)\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1 + f(V; \gamma) \quad \text{s.t. } \|b_j\|_2^2 \le 1, \; V \in [0,1]^{d \times n}, \tag{9}$$

where $V = [v_{ij}]$ is the matrix of weights imposed on the observed elements of $X$, and '$\odot$' denotes element-wise multiplication. Here $f(V; \gamma)$ is the self-paced regularizer determining the elements to be selected in learning. To define $f(V; \gamma)$, the SPL regularizer in (6) is easily generalized as

$$f(V; \gamma) = \gamma \sum_{i=1}^{d} \sum_{j=1}^{n} \left( \frac{1}{2} v_{ij}^2 - v_{ij} \right). \tag{10}$$

Accordingly, the optimal solution for $V$ is obtained by

$$v_{ij}^* = \begin{cases} 1 - \ell_{ij}/\gamma, & \ell_{ij} < \gamma, \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$
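The element-level update of Eq. (11) acts on the whole residual matrix at once; a minimal sketch (the per-entry squared residual as the loss follows the definition above, the function name is ours):

```python
import numpy as np

def element_spl_weights(X, B, S, gamma):
    """Element-level soft weight update of Eq. (11): the per-entry loss is the
    squared reconstruction residual l_ij = (X - B S)_ij^2, and each weight is
    V_ij = 1 - l_ij/gamma when l_ij < gamma, and 0 otherwise."""
    L = (X - B @ S) ** 2
    return np.where(L < gamma, 1.0 - L / gamma, 0.0)
```

Entries that are reconstructed exactly get weight 1, while entries whose residual exceeds the pace $\gamma$ are excluded from the current round of learning.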
Feature-level SPSC
When operating SPSC at the feature level, the learning objective of feature-level SPSC is the following optimization:

$$\min_{B, S, u} \; \|\mathrm{diag}(\sqrt{u})\,(X - BS)\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1 + f(u; \gamma) \quad \text{s.t. } \|b_j\|_2^2 \le 1, \; u \in [0,1]^d, \tag{12}$$

where $u = [u_1, \dots, u_d]^T$ denotes the weight vector imposed on the different features of $X$; the loss of each feature corresponds to $\ell_i = \|x^{(i)} - b^{(i)} S\|_2^2$, which measures the reconstruction error of the $i$-th feature, where $x^{(i)}$ denotes the $i$-th row vector of $X$, that is, one feature value across all samples. The SPL regularizer and optimal weight calculation in (6) and (7) can be directly adopted for feature-level SPSC, as the formulations in (12) and (3) are mathematically equivalent to each other.
Optimization of SPSC
Optimization Algorithm
Comparing the proposed three formulations of SPSC in (3), (9) and (12), the element-level SPSC in (9) can be seen as a general form of all three. Thus, in the following we show how to optimize (9). The optimization problem is not jointly convex, so we use the alternative search strategy (ASS) suggested in [Kumar, Packer, and Koller2010]. Following ASS, we alternately optimize one set of variables while keeping the others fixed.
(i) Solve for $V$: With $B$ and $S$ fixed, the optimal selection weight matrix $V$ is given by (11).
(ii) Solve for $B$: With the codes $S$ and weight matrix $V$ fixed, the optimization problem for the dictionary writes:

$$\min_{B} \; \|\sqrt{V} \odot (X - BS)\|_F^2 \quad \text{s.t. } \|b_j\|_2^2 \le 1, \; j = 1, \dots, m. \tag{13}$$

This problem is a quadratically constrained quadratic program (QCQP), which can be solved using the Lagrangian dual [Lee et al.2006].
(iii) Solve for $S$: With the dictionary $B$ and weight matrix $V$ fixed, combining (9) and (8), the optimization problem for the codes corresponds to a weighted SC problem:

$$\min_{S} \; \|\sqrt{V} \odot (X - BS)\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1 + \Omega(S). \tag{14}$$

This problem is an $\ell_1$-regularized convex optimization, and off-the-shelf algorithms can be employed to solve it, such as the feature-sign algorithm in [Gao, Tsang, and Chia2013]. The whole process is summarized in Algorithm 1.
Algorithm 1: Self-Paced Sparse Coding (SPSC)
Input: Data matrix $X$ and a hypergraph with $H$ and $W$ as its corresponding incidence matrix and weight matrix; nonnegative trade-off parameters and step size $\mu$.
1: Initialize: solve the SC problem with all elements selected ($V = \mathbf{1}$) to obtain optimal $B$ and $S$; calculate the loss of all elements; initialize the self-paced pace parameter $\gamma$;
2: repeat
3:   Update $V$ via (11).
4:   Update $B$ by solving (13).
5:   Update $S$ by solving (14).
6:   Compute the current loss of all elements.
7:   $\gamma \leftarrow \mu \gamma$.  /* update the learning pace */
8: until convergence.
9: return $B$, $S$
The algorithm converges when all elements have been considered and the objective function cannot be improved further. It is seen from the algorithm that the number of elements selected at each iteration is determined by the pace parameter, which is gradually annealed so that later iterations introduce more elements. To optimize sample-level or feature-level SPSC instead, Step 6 simply computes the sample-level or feature-level losses defined above, and in the weight update of Step 3 all element weights are set equal to the weight of the corresponding sample or feature (i.e., the weights in a column, respectively a row, of $V$ are tied). During each iteration, a convex minimization problem is solved in each of the V-step, B-step, and S-step, and the globally optimal solution is obtained for each sub-step. Thus the overall objective is non-increasing over the iterations, and Algorithm 1 is guaranteed to converge.
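The pace annealing in Step 7 is what drives the easy-to-hard schedule: as the pace grows geometrically, the fraction of selected elements can only grow. A small sketch of just this mechanism (the losses are held fixed here purely for illustration; in the real algorithm they change at every iteration):

```python
import numpy as np

def pace_schedule(losses, gamma0, mu, n_iter):
    """Sketch of Step 7 of Algorithm 1: the pace gamma is multiplied by the
    step size mu each iteration, so the fraction of selected elements (those
    with loss < gamma) grows monotonically. Returns that fraction per iteration."""
    losses = np.asarray(losses, dtype=float)
    gamma, fractions = gamma0, []
    for _ in range(n_iter):
        fractions.append(float((losses < gamma).mean()))
        gamma *= mu
    return fractions
```

Choosing the initial pace so that only a small fraction of elements is selected, then growing it by a constant factor, reproduces the "start easy, end with everything" behavior described above.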
Experiment Results
In this section, we extensively evaluate the proposed approach on the image clustering task with two real datasets. Experimental results demonstrate the correctness and effectiveness of the proposed model.
Experiment Setup
Datasets
Two widely used real-world image dataset benchmarks are considered in the experiments. (We downloaded these datasets from http://www.cad.zju.edu.cn/home/dengcai/Data.)

COIL20. The COIL20 image library contains 1,440 images of 20 objects, with 72 images per object. The size of each image is 32×32 pixels, with 256 grey levels per pixel. Images of the objects were taken at pose intervals of 5 degrees. Each image is represented by a 1,024-dimensional vector.

CMU-PIE. The CMU-PIE face database contains facial images of 68 subjects in various poses, facial expressions, and lighting conditions. The size of each image is 32×32 pixels, with 256 grey levels per pixel; thus, each image is represented by a 1,024-dimensional vector. In our experiments, we fix the pose and the facial expression and select 21 images under different lighting conditions for each subject, totaling 1,428 images.
Baselines
We compared the proposed approach with several state-of-the-art methods.

Original and SC. The "original" method clusters image objects in the original data space (denoted as 'OS'). Sparse coding (SC) [Lee et al.2006] is the basic sparse coding method without any regularization beyond the sparsity constraint.

LSC, HLSC and CSC. Graph Laplacian sparse coding (LSC) [Zheng et al.2011] and hypergraph Laplacian sparse coding (HLSC) [Gao, Tsang, and Chia2013] add a graph/hypergraph Laplacian regularization term to the original sparse coding framework, while CSC [Feng, Wu, and Zhou2017] (hypergraph consistent sparse coding) integrates a hypergraph incidence consistency regularization term.

Sample-, element- and feature-level SPSC. These are the proposed methods formulated in (3), (9) and (12), respectively.
Experimental Setting
We use k-means clustering to group the image datasets and compare the clustering results with two commonly used metrics, clustering accuracy (ACC) and normalized mutual information (NMI). Both ACC and NMI range from 0 to 1: ACC reveals the clustering accuracy (also called purity), and NMI indicates whether two clusterings are identical (NMI = 1) or independent (NMI = 0). To ensure stability of the results, k-means clustering on the sparse code space is repeated 30 times, each time with a random set of initial cluster centers. The average ACC and NMI over these 30 repetitions are reported for each method on each dataset.
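NMI is available off-the-shelf (e.g., scikit-learn's `normalized_mutual_info_score`); ACC additionally requires the best one-to-one matching between predicted clusters and true classes. A self-contained sketch of ACC (the brute-force matching is an implementation choice of this sketch, fine for small label sets; the Hungarian algorithm is the usual choice at scale):

```python
import numpy as np
from itertools import permutations

def clustering_acc(y_true, y_pred):
    """Clustering accuracy (ACC): purity under the best one-to-one matching
    between predicted cluster ids and ground-truth class ids."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                    # co-occurrence of cluster p and class t
    best = max(sum(counts[i, perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return best / y_true.size
```

Because cluster ids are arbitrary, a perfect clustering with permuted labels still scores ACC = 1, which is exactly the invariance the metric is meant to have.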
Noise ratio  OS  SC  HLSC  HIC  SPSC (sample)  SPSC (element)  SPSC (feature)
  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI
0  0.6120  0.7353  0.6022  0.7291  0.6967  0.8035  0.7320  0.8729  0.7828  0.8999  0.7889  0.9043  0.7863  0.8973 
0.2  0.6081  0.7323  0.5847  0.7151  0.6742  0.8025  0.7319  0.8668  0.7878  0.8953  0.7580  0.8989  0.7726  0.8895 
0.4  0.6043  0.7303  0.5807  0.7172  0.6888  0.8068  0.7258  0.8820  0.7737  0.8920  0.7576  0.8886  0.7511  0.8826 
0.6  0.5968  0.7294  0.5688  0.7187  0.6806  0.8068  0.7237  0.8726  0.7655  0.8963  0.7545  0.8843  0.7563  0.8826 
0.8  0.5819  0.7132  0.5868  0.7115  0.6705  0.7927  0.7122  0.8540  0.7636  0.8851  0.7474  0.8665  0.7484  0.8740 
1  0.5879  0.7103  0.5794  0.7151  0.6790  0.7903  0.7042  0.8493  0.7624  0.8807  0.7433  0.8737  0.7449  0.8752 
Noise ratio  OS  SC  HLSC  HIC  SPSC (sample)  SPSC (element)  SPSC (feature)
  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI
0  0.5656  0.5252  0.5076  0.5074  0.6643  0.6411  0.6678  0.6679  0.6814  0.6871  0.6816  0.6791  0.6887  0.6792 
0.2  0.5649  0.5234  0.5252  0.5056  0.6551  0.6452  0.6670  0.6611  0.6898  0.6823  0.6765  0.6675  0.6785  0.6820 
0.4  0.5555  0.5192  0.5298  0.5054  0.6554  0.6393  0.6613  0.6673  0.6762  0.6825  0.6728  0.6527  0.6689  0.6767 
0.6  0.5426  0.5084  0.5189  0.4932  0.6420  0.6212  0.6423  0.6363  0.6659  0.6717  0.6550  0.6538  0.6546  0.6615 
0.8  0.5442  0.5021  0.5175  0.4940  0.6381  0.6171  0.6440  0.6314  0.6682  0.6604  0.6512  0.6471  0.6557  0.6539 
1  0.5362  0.4939  0.5128  0.4888  0.6239  0.6130  0.6362  0.6326  0.6509  0.6580  0.6382  0.6477  0.6503  0.6511 
For each dataset, we normalize each image vector to unit $\ell_2$ norm as the input of the compared sparse coding algorithms. The dictionary size of all these models is set to an overcomplete value, since several recent works on sparse coding have advocated the use of overcomplete representations for images [Olshausen and others1996, Zheng et al.2011]. After sparse coding, each image is represented by its sparse code vector, which is used as the input of k-means clustering. The graph in LSC and the hypergraphs in CSC and SPSC are constructed in the original feature space after unit-norm normalization. Specifically, a nearest-neighbor-based graph (or hypergraph) under Euclidean distance is used to characterize the intrinsic manifold, with a binary weighting scheme as suggested in [Zheng et al.2011, Gao, Tsang, and Chia2013].
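A plausible sketch of this nearest-neighbor hypergraph construction with binary weighting (Zhou-style normalization and unit hyperedge weights are assumptions of the sketch, and the exact construction used in the experiments may differ in detail):

```python
import numpy as np

def knn_hypergraph(X, k=3):
    """k-NN hypergraph on the columns of X (one sample per column), binary
    weighting: each vertex spawns one hyperedge containing itself and its k
    nearest neighbours under Euclidean distance. Returns the incidence matrix
    H and the normalized Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    with unit hyperedge weights (W = I)."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    H = np.zeros((n, n))
    for e in range(n):
        nbrs = np.argsort(D2[:, e])[: k + 1]         # the vertex and its k neighbours
        H[nbrs, e] = 1.0
    dv = H.sum(axis=1)                               # vertex degrees (W = I)
    de = H.sum(axis=0)                               # hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_is @ H @ np.diag(1.0 / de) @ H.T @ Dv_is
    return H, np.eye(n) - Theta
```

Each column of `H` is one hyperedge with exactly k+1 incident vertices, and the resulting Laplacian is symmetric, as required by the trace regularizer on the sparse codes.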
The parameters are tuned as follows. For SC, the sparsity weight $\lambda$ is tuned over a range of values to get the best performance (i.e., the highest average ACC over 30 runs), and the best value is used for all methods based on the sparse coding framework. For LSC, the Laplacian regularization weight is tuned in the same manner. For CSC, we fix the best values of the other parameters and tune its consistency weight. For SPSC, we use the best parameter values of CSC; we initialize the pace parameter $\gamma$ to a value such that a fixed fraction of the elements (samples, features) is selected in the first iteration, and set the step size $\mu$ accordingly.
To test the robustness of the proposed SPSC to corruption, we use the COIL20 dataset. For each image with pixel-value representation $x$, we add white Gaussian noise according to $\tilde{x} = x + \rho\, n$, where $n$ is noise following a normal distribution with zero mean whose standard deviation is set to the standard deviation over the entries of the data matrix $X$, and $\rho$ is a corruption ratio which increases from 0 to 1 in intervals of 0.2. Clearly, white Gaussian noise is additive. Figure 2 shows some corrupted samples from the COIL20 dataset.

Results and Discussion
Tables 1 and 2 report the performance of the compared methods on clustering. They show the following points. 1) All three variants of the proposed SPSC perform better than the other baselines on both ACC and NMI, demonstrating the power of self-paced learning. 2) Among the three variants, element-level SPSC is more robust than sample-level and feature-level SPSC, especially when the noise level is larger; we believe this is because element-level SPSC implements the dynamic selection at the smallest granularity. 3) When the noise is at a low level, the clustering performance is not severely affected; however, when the noise level is high, the clustering performance decreases sharply (we also ran experiments with even larger corruption ratios but do not show them due to the low accuracy).
Discussion on the Weight Learning
In the experiment, we record the weight matrix $V$ at each iteration of Algorithm 1. To show the process of dynamic sample selection, we plot the correlation between the noise level (the mean absolute value of the noise over each sample) and the corresponding weight at each iteration, as in Figure 2.

We can see that the values in the weight matrix become larger after each iteration and stabilize at high values after about 15 iterations, following the mechanism of self-paced learning whereby more elements are selected at each iteration of the optimization process. Also, there is a negative correlation between the noise level and the corresponding weight at the beginning of the iterations, which means that SPSC learns samples with lower noise corruption (easier samples) first.
Conclusion
Under the alternating optimization framework of sparse coding, nonconvexity usually drives the optimization into bad local minima, especially when many noises and outliers exist in the datasets. To relieve the effect of nonconvexity, we propose a unified framework named Self-Paced Sparse Coding (SPSC), incorporating the self-paced learning methodology into sparse coding together with manifold regularization. For different levels of dynamic selection, we extend SPSC into three variants, namely sample-level SPSC, element-level SPSC, and feature-level SPSC. The effectiveness of the proposed SPSC was demonstrated by experiments on real image datasets, both clean and noise-corrupted. The proposed method shows its advantage over current SC methods in more accurately approximating the ground-truth clustering, and we will evaluate it on other tasks, such as classification and recommendation, in future work.
References
 [Basu and Christensen2013] Basu, S., and Christensen, J. 2013. Teaching classification boundaries to humans. In AAAI, 109–115.
 [Bengio et al.2009] Bengio, Y.; Louradour, J.; Collobert, R.; and Weston, J. 2009. Curriculum learning. In ICML, 41–48. ACM.
 [Dong et al.2015] Dong, W.; Li, X.; Ma, Y.; and Shi, G. 2015. Image restoration via Bayesian structured sparse coding. In ICIP, 4018–4022.
 [Feng et al.2016] Feng, X.; Sharma, A.; Srivastava, J.; Wu, S.; and Tang, Z. 2016. Social network regularized sparse linear model for top-n recommendation. EAAI 51(C):5–15.
 [Feng, Wu, and Zhou2017] Feng, X.; Wu, S.; and Zhou, W. 2017. Multi-hypergraph consistent sparse coding. ACM TIST 8(6):75.
 [Gao, Tsang, and Chia2013] Gao, S.; Tsang, I.; and Chia, L. 2013. Laplacian sparse coding, hypergraph Laplacian sparse coding, and applications. IEEE TPAMI 35:92–104.
 [Huang et al.2010] Huang, Y.; Liu, Q.; Zhang, S.; and Metaxas, D. N. 2010. Image retrieval via probabilistic hypergraph ranking. In CVPR, 3376–3383.
 [Jiang et al.2015] Jiang, L.; Meng, D.; Zhao, Q.; Shan, S.; and Hauptmann, A. G. 2015. Self-paced curriculum learning. In AAAI, 2694–2700.
 [Jing et al.2014] Jing, X. Y.; Hu, R. M.; Wu, F.; Chen, X. L.; Liu, Q.; and Yao, Y. F. 2014. Uncorrelated multi-view discrimination dictionary learning for recognition. In AAAI, 2787–2795.
 [Kim et al.2007] Kim, S. J.; Koh, K.; Lustig, M.; Boyd, S.; and Gorinevsky, D. 2007. An interior-point method for large-scale l1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing 1(4):606–617.
 [Kumar, Packer, and Koller2010] Kumar, M. P.; Packer, B.; and Koller, D. 2010. Self-paced learning for latent variable models. In NIPS, 1189–1197.
 [Lee et al.2006] Lee, H.; Battle, A.; Raina, R.; and Ng, A. Y. 2006. Efficient sparse coding algorithms. In NIPS, 801–808.
 [Lee et al.2007] Lee, H.; Battle, A.; Raina, R.; and Ng, A. Y. 2007. Efficient sparse coding algorithms. In NIPS, 801–808.

 [Li et al.2015] Li, P.; Bing, L.; Lam, W.; Li, H.; and Liao, Y. 2015. Reader-aware multi-document summarization via sparse coding. In IJCAI, 1270–1276.
 [Li et al.2017a] Li, B.; Yuan, C.; Xiong, W.; Hu, W.; Peng, H.; Ding, X.; and Maybank, S. 2017a. Multi-view multi-instance learning based on joint sparse representation and multi-view dictionary learning. IEEE TPAMI PP(99):1–1.
 [Li et al.2017b] Li, C.; Yan, J.; Wei, F.; Dong, W.; Liu, Q.; and Zha, H. 2017b. Self-paced multi-task learning. In AAAI, 2175–2181.
 [Lin, Chen, and Ma2010] Lin, Z.; Chen, M.; and Ma, Y. 2010. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055.

 [Lu et al.2015] Lu, Z.; Gao, X.; Wang, L.; Wen, J.; and Huang, S. 2015. Noise-robust semi-supervised learning by large-scale sparse coding. In AAAI, 2828–2834.
 [M. Yang and Feng2011] M. Yang, D. Z., and Feng, X. 2011. Fisher discrimination dictionary learning for sparse representation. In ICCV, 543–550.
 [Olshausen and others1996] Olshausen, B. A., et al. 1996. Emergence of simplecell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609.
 [Sun et al.2014] Sun, X.; Qu, Q.; Nasrabadi, N. M.; and Tran, T. D. 2014. Structured priors for sparserepresentationbased hyperspectral image classification. IEEE Geoscience & Remote Sensing Letters 11(7):1235–1239.
 [Tang, Yang, and Gao2012] Tang, Y.; Yang, Y.-B.; and Gao, Y. 2012. Self-paced dictionary learning for image classification. In ACM MM, 833–836. ACM.

 [Wright et al.2009] Wright, J.; Yang, A. Y.; Ganesh, A.; Sastry, S. S.; and Ma, Y. 2009. Robust face recognition via sparse representation. IEEE TPAMI 31:210–227.
 [Wu and Lange2008] Wu, T. T., and Lange, K. 2008. Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics 224–244.
 [Xu et al.2016] Xu, D.; Song, J.; Alameda-Pineda, X.; Ricci, E.; and Sebe, N. 2016. Multi-paced dictionary learning for cross-domain retrieval and recognition. In ICPR, 3228–3233.
 [Xu, Tao, and Xu2015] Xu, C.; Tao, D.; and Xu, C. 2015. Multi-view self-paced learning for clustering. In IJCAI, 3974–3980.
 [Yan and Wang2009] Yan, S., and Wang, H. 2009. Semi-supervised learning by sparse representation. In SDM, 792–801. SIAM.
 [Yang et al.2009] Yang, J.; Yu, K.; Gong, Y.; and Huang, T. 2009. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 1794–1801.
 [Yang et al.2010] Yang, A. Y.; Sastry, S. S.; Ganesh, A.; and Ma, Y. 2010. Fast minimization algorithms and an application in robust face recognition: A review. In ICIP, 1849–1852.
 [Yang et al.2014] Yang, Y.; Yang, J.; Yang, J.; Wang, J.; Chang, S.; and Huang, T. S. 2014. Data clustering by laplacian regularized graph. In AAAI, 3148–3149.
 [Z. Jiang and Davis2012] Z. Jiang, G. Z., and Davis, L. S. 2012. Submodular dictionary learning for sparse coding. In CVPR, 3418–3425.
 [Z. Jiang and Davis2013] Z. Jiang, Z. L., and Davis, L. S. 2013. Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE TPAMI 35:2651–2664.
 [Zhang and Li2010] Zhang, Q., and Li, B. 2010. Discriminative K-SVD for dictionary learning in face recognition. In CVPR, 2691–2698.
 [Zhao et al.2015] Zhao, Q.; Meng, D.; Jiang, L.; Xie, Q.; Xu, Z.; and Hauptmann, A. G. 2015. Self-paced learning for matrix factorization. In AAAI, 3196–3202.
 [Zhen et al.2016] Zhen, L.; Peng, D.; Yi, Z.; Xiang, Y.; and Chen, P. 2016. Underdetermined blind source separation using sparse coding. IEEE TNNLS PP(99):1–7.
 [Zheng and Tao2015] Zheng, H., and Tao, D. 2015. Discriminative dictionary learning via fisher discrimination KSVD algorithm. Neurocomputing 162:9–15.
 [Zheng et al.2011] Zheng, M.; Bu, J.; Chen, C.; Wang, C.; Zhang, L.; Qiu, G.; and Cai, D. 2011. Graph regularized sparse coding for image representation. IEEE TIP 20:1327–1336.