1 Introduction
Dictionary learning (DL), as a particular sparse signal model, aims to learn a set of atoms, also called visual words in the computer vision community, in which a few atoms can be linearly combined to well approximate a given signal. From the viewpoint of compressed sensing, DL was originally designed to learn an adaptive codebook to faithfully represent signals under a sparsity constraint. In recent years, researchers have applied the DL framework to other applications and achieved state-of-the-art performance, such as image denoising
[3] and inpainting [4], clustering [2, 9], classification [1, 6], etc.

It is well known that the conventional DL framework is not suited to classification, since the learned dictionary is merely used for signal reconstruction. To circumvent this problem, researchers have developed several approaches that learn a classification-oriented dictionary in a supervised fashion by exploiting the label information. In this note, we review some existing representative DL-based classification methods. Through comparison, we can roughly divide them into two categories: (1) directly forcing the dictionary to be discriminative, or (2) making the sparse coefficients discriminative (usually by simultaneously learning a classifier) to promote the discrimination of the dictionary. The first category, named Track I in this note, mainly uses the representation error for the final classification, whereas the second category (Track II) can utilize the sparse coefficients as a new feature representation for classification.
Track I includes Metaface learning [12] and DL with structured incoherence [8], and Track II contains supervised DL [6], discriminative K-SVD [13], label consistent K-SVD [5], and Fisher discrimination DL [11]. The abbreviations of these methods are listed in Table 1.
The organization of this note is as follows. At the end of this section, we review an important method called sparse representation-based classification (SRC) [10], and then introduce the general dictionary learning framework with the notation used in this note. Note that even though SRC does not learn a dictionary, it opened the door to classification based on the sparse coding technique. In Section 2, we introduce Metaface learning [12] and DLSI [8] as two specific examples of Track I, which use the reconstruction error for the final classification, as SRC does. Its counterpart, Track II, is presented in Section 3, including SupervisedDL [6], D-KSVD [13], LC-KSVD [5], and FisherDL [11]. In Section 4, we give a brief summary of DL-based classification methods and suggest some extensions for future work.
Table 1: Representative approaches of the two tracks.

Category  Representative Approaches
Track I   Metaface learning [12], DLSI [8]
Track II  SupervisedDL [6], D-KSVD [13], LC-KSVD [5], FisherDL [11]
1.1 Sparse Representation-Based Classification
Wright et al. [10] propose the sparse representation-based classification (SRC) method for robust face recognition, achieving very impressive results. Suppose there are K classes of individual faces. Let A = [A_1, A_2, \ldots, A_K] be the set of original training samples, where A_i is the subset of all the vector-represented training samples from class i. SRC treats the original data set A as an overall dictionary. Denote by y a query facial image; SRC then identifies y by the following two-stage procedure:
1. Sparsely code y over A via \ell_1-norm minimization:

\hat{x} = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon,   (1)

where \varepsilon is a scalar constant.

2. Identify y to the class with the smallest class-wise reconstruction error:

\text{identity}(y) = \arg\min_i \|y - A\,\delta_i(\hat{x})\|_2,   (2)

where \delta_i(\cdot) is a vector indicator function that extracts the elements corresponding to the i-th class.
SRC achieves very impressive performance in face recognition, and it is robust to nuisances such as occlusion and lighting. Even though SRC learns no dictionary for classification, it acted as a vanguard, opening the door to classification with the help of sparse coding. In this view, SRC naively uses all the training samples as one dictionary, wherein the class-specific training subsets are sub-dictionaries contributing to the discrimination.
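The two-stage procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it uses a plain ISTA loop as a stand-in for whichever \ell_1 solver one prefers, and all function names are ours.

```python
import numpy as np

def ista_l1(A, y, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1 by ISTA (a simple
    proximal-gradient stand-in for the l1 solver used in SRC)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(A, labels, y, lam=0.1):
    """SRC: sparse-code y over all training samples, then assign y to the
    class whose coefficients give the smallest reconstruction error."""
    x = ista_l1(A, y, lam)
    errs = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        errs[c] = np.linalg.norm(y - A @ delta)
    return min(errs, key=errs.get)
```

Here the columns of `A` are the raw training samples and `labels` assigns each column to a class, mirroring SRC's use of the training set itself as the dictionary.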
1.2 Dictionary Learning Framework
Learning an adaptive dictionary (possibly overcomplete) aims to provide a basis pool in which a few bases can be linearly combined to approximate a novel signal. Suppose there is a set of N signals, denoted by Y = [y_1, y_2, \ldots, y_N], where y_i is the i-th signal. The conventional dictionary learning framework then learns the dictionary D as below:
\min_{D, X} \|Y - DX\|_F^2 + \lambda \|X\|_1,   (3)

where X = [x_1, x_2, \ldots, x_N] is the coefficient matrix and \lambda is a scalar balancing the reconstruction error against the sparsity.
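Problems of the form of Eq. (3) are commonly attacked by alternating minimization: sparse-code with D fixed, then update D with X fixed. The sketch below is a toy version of that standard scheme, not tied to any particular paper; it uses ISTA for the coding step and a ridge-regularized least-squares (MOD-style) dictionary update, and the function names are ours.

```python
import numpy as np

def soft_threshold(Z, t):
    """Entry-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def dictionary_learning(Y, n_atoms, lam=0.1, n_outer=20, n_inner=50):
    """Alternate l1 sparse coding (ISTA) with a least-squares dictionary
    update, keeping every atom at unit l2 norm."""
    rng = np.random.default_rng(0)
    D = rng.normal(size=(Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_outer):
        # sparse coding step: min_X ||Y - D X||_F^2 + lam ||X||_1
        L = np.linalg.norm(D, 2) ** 2
        for _ in range(n_inner):
            X = soft_threshold(X - D.T @ (D @ X - Y) / L, lam / L)
        # dictionary update step (ridge-regularized least squares), then renormalize
        D = Y @ X.T @ np.linalg.inv(X @ X.T + 1e-6 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X
```

On signals drawn from a low-dimensional subspace, the learned atoms align with that subspace and the reconstruction error drops well below the signal energy.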
It is widely known that the classic dictionary learning framework is designed for reconstruction rather than classification, even though good classification results have been reported in the literature. It is believed that classification performance can be further improved if we carefully learn a classification-oriented dictionary. In the next section, we look at several DL-based classification methods belonging to Track I.
2 Track I: Directly Making the Dictionary Discriminative
The methods in Track I use the reconstruction error for the final classification, so the learned dictionary ought to be as discriminative as possible. Inspired by SRC, Yang et al. propose Metaface learning [12] to learn an adaptive dictionary for each class, and Ramirez et al. add an incoherence term to derive more delicate classification-oriented dictionaries [8]. We now present the two methods.
2.1 Metaface Learning
SRC directly adopts the original facial images as the dictionary; however, as discussed in [12], this predefined dictionary incorporates much redundancy, as well as noise and trivial information that can be detrimental to face recognition. Additionally, as the training data grow, the computation of sparse coding becomes a main bottleneck. To address this problem, Yang et al. [12] propose the Metaface learning method, which learns a class-specific dictionary for each class:
\min_{D_i, \Lambda_i} \|A_i - D_i \Lambda_i\|_F^2 + \lambda \|\Lambda_i\|_1 \quad \text{s.t.} \quad \|d_j\|_2 = 1, \; \forall j,   (4)

where the matrix A_i contains all the training images of the i-th class as its columns, d_j is the j-th column of the class-specific sub-dictionary D_i, and \|\Lambda_i\|_1 is defined as the summation of the \ell_1 norms of all the columns of \Lambda_i, i.e., \|\Lambda_i\|_1 = \sum_k \|\lambda_k\|_1. The Metaface learning method concatenates all the sub-dictionaries into an overall dictionary D = [D_1, \ldots, D_K] for classification, which proceeds as in the second stage of SRC.
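As a toy illustration of the class-wise pipeline (learn one compact sub-dictionary per class, then classify by sub-dictionary reconstruction residual), the sketch below substitutes a truncated SVD for the \ell_1-regularized metaface objective of Eq. (4); it is our own simplification, not the authors' algorithm, and all names are ours.

```python
import numpy as np

def learn_metafaces(A_list, n_atoms):
    """Per-class sub-dictionary via truncated SVD: the top left singular
    vectors stand in for the l1-regularized metafaces (atoms are unit-norm)."""
    return [np.linalg.svd(A, full_matrices=False)[0][:, :n_atoms]
            for A in A_list]

def classify_by_residual(D_list, y):
    """Assign y to the class whose sub-dictionary reconstructs it best,
    using the least-squares projection residual."""
    errs = [np.linalg.norm(y - D @ np.linalg.lstsq(D, y, rcond=None)[0])
            for D in D_list]
    return int(np.argmin(errs))
```

Each `A_list[i]` holds the training images of class i as columns, matching the A_i of Eq. (4); compacting each class into a few atoms is exactly what relieves the sparse-coding bottleneck mentioned above.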
2.2 Dictionary Learning with Structured Incoherence
Ramirez et al. note that the learned sub-dictionaries may share some common bases, i.e., some visual words from different sub-dictionaries can be very coherent [8]. Such coherent atoms can be used interchangeably when reconstructing a query image, so a classifier based on reconstruction error will fail to identify some queries. To circumvent this problem, they add an incoherence term that drives the dictionaries associated with different classes to be as independent as possible.
The incoherence term is defined as Q(D_i, D_j) = \|D_i^T D_j\|_F^2. It is easy to see that this term drives the atoms from different sub-dictionaries to be as independent/incoherent as possible. Ramirez et al. thus derive the final dictionary learning method with structured incoherence as below:
\min_{\{D_i, X_i\}} \sum_{i=1}^{K} \left\{ \|A_i - D_i X_i\|_F^2 + \lambda \|X_i\|_1 \right\} + \eta \sum_{i \ne j} \|D_i^T D_j\|_F^2,   (5)

where X_i is the coefficient matrix of class i; each of its columns is the sparse code of the corresponding signal in class i.
They empirically note that even though the incoherence term is imposed on the dictionaries, atoms representing features common to all classes tend to appear, repeated almost exactly, in the dictionaries of different classes [8]. Being so common, these atoms are used often, and their associated reconstruction coefficients have large absolute values in every class, thus making the reconstruction costs similar. They further propose to detect such atoms by inspecting the already available matrices D_i^T D_j, whose entries' absolute values are the inner products between atoms. By ignoring the coefficients associated with these common atoms when computing the reconstruction error, they improve the discriminative power of the system.
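Both the incoherence penalty and the common-atom detection described above are direct to express. The sketch below is our own illustration under the assumption of unit-norm atoms, so that inner products equal cosine similarities:

```python
import numpy as np

def incoherence(Di, Dj):
    """The structured-incoherence penalty ||Di^T Dj||_F^2 between two
    sub-dictionaries with unit-norm atoms as columns."""
    return np.linalg.norm(Di.T @ Dj, 'fro') ** 2

def common_atoms(Di, Dj, tau=0.95):
    """Pairs (p, q) of atoms whose inner-product magnitude exceeds tau,
    i.e. atoms repeated almost exactly across the two sub-dictionaries."""
    G = np.abs(Di.T @ Dj)                 # Gram matrix between sub-dictionaries
    return [(p, q) for p in range(G.shape[0])
                   for q in range(G.shape[1]) if G[p, q] > tau]
```

Coefficients on the atom pairs returned by `common_atoms` are the ones Ramirez et al. suggest ignoring when computing the class-wise reconstruction errors.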
3 Track II: Making the Coefficients Discriminative
Track II differs from Track I in how the discrimination is imposed. Contrary to Track I, it forces the sparse coefficients to be discriminative and thereby indirectly propagates the discrimination power to the overall dictionary. Track II only needs to learn one overall dictionary, instead of class-specific dictionaries. In this section, we review several recently proposed methods belonging to Track II.
3.1 Supervised Dictionary Learning
Before presenting this method, we must clarify that the Supervised DL (SupervisedDL) method reviewed here is the specific approach proposed in [6], not a generic name for arbitrary supervised DL frameworks.
Mairal et al. propose to combine logistic regression with the conventional dictionary learning framework as below:

\min_{D, \theta, \alpha} \sum_{i=1}^{N} C\!\left(y_i f(x_i, \alpha_i, \theta)\right) + \lambda_0 \|x_i - D\alpha_i\|_2^2 + \lambda_1 \|\alpha_i\|_1 + \lambda_2 \|\theta\|_2^2,   (6)
where C is the logistic loss function (C(x) = \log(1 + e^{-x})), which enjoys properties similar to those of the hinge loss from the SVM literature while being differentiable, and \lambda_2 is a regularization parameter which prevents overfitting; this is the approach chosen in [7]. Here f is a classification function, either linear in \alpha: f(x, \alpha, \theta) = w^T \alpha + b, wherein \theta = \{w, b\}, or bilinear in x and \alpha: f(x, \alpha, \theta) = x^T W \alpha + b, wherein \theta = \{W, b\}.

3.2 Discriminative K-SVD for Dictionary Learning
Zhang and Li propose discriminative K-SVD (D-KSVD) to achieve a dictionary that has good representation power while supporting optimal discrimination of the classes [13]. D-KSVD adds a simple linear regression as a penalty term to the conventional DL framework:
\min_{D, W, X} \|Y - DX\|_F^2 + \gamma \|H - WX\|_F^2 + \beta \|W\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T, \; \forall i,   (7)

where H = [h_1, \ldots, h_N] holds the labels of the training images, in which each h_i is a binary vector whose nonzero position indicates the class; W is the parameter matrix of the linear classifier; and \gamma and \beta are scalars controlling the relative contributions of the corresponding terms.
Note that the first two terms can be fused into one, and the \beta\|W\|_F^2 term can be dropped during computation owing to the protocol of the original K-SVD algorithm (see [13] for details). After the classifier parameter W and the dictionary D are obtained, the final classification of a query image can be very fast.
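Once D and W are learned, classification reduces to coding the query and reading off the arg-max of Wx. The sketch below is our own illustration of that test-time step; it substitutes a ridge-regularized least-squares coder for the \ell_0 sparse coding used in [13], and the function name is ours.

```python
import numpy as np

def dksvd_classify(D, W, y, lam=0.01):
    """Classify with a jointly learned dictionary D and linear classifier W:
    code y over D (ridge least squares here, standing in for the sparse
    coder), then take the arg-max entry of the label vector W x."""
    k = D.shape[1]
    x = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ y)  # code the query
    return int(np.argmax(W @ x))                             # predicted class
```

Because only one small linear solve and one matrix-vector product are needed per query, this reflects why the D-KSVD classification stage is described as very fast.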
3.3 Label Consistent KSVD
Jiang et al. propose a label consistent K-SVD (LC-KSVD) method to learn a discriminative dictionary for sparse coding [5]. They introduce a label consistency constraint called the "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function:
\min_{D, A, W, X} \|Y - DX\|_F^2 + \alpha \|Q - AX\|_F^2 + \beta \|H - WX\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T, \; \forall i,   (8)

where H and W are the same as in D-KSVD described in the previous subsection, and \alpha\|Q - AX\|_F^2 is the label consistence term, with A a linear transformation matrix. Here each column q_i of Q is an indicator corresponding to the input signal y_i: the nonzero values of q_i occur at those indices where the input signal and the dictionary codeword share the same label.
The term \|Q - AX\|_F^2 represents the discriminative sparse-code error, which enforces that the sparse codes X approximate the discriminative sparse codes Q. It forces signals from the same class to have very similar sparse representations, i.e., it encourages label consistency in the resulting sparse codes. At the same time, the linear regression term \|H - WX\|_F^2 is added, the same as in D-KSVD [13]. Intuitively, the final classification mechanism is very fast owing to the classifier parameter matrix W.
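For intuition, the discriminative sparse-code matrix Q can be built directly from the atom labels and the signal labels. This small helper is our own illustration, not code from [5]:

```python
import numpy as np

def discriminative_sparse_codes(atom_labels, signal_labels):
    """Build the 'discriminative sparse code' matrix Q: Q[k, i] = 1 exactly
    when dictionary atom k and training signal i share the same class label,
    and 0 otherwise."""
    atom_labels = np.asarray(atom_labels)[:, None]     # shape (n_atoms, 1)
    signal_labels = np.asarray(signal_labels)[None, :] # shape (1, n_signals)
    return (atom_labels == signal_labels).astype(float)
```

Each column of the resulting Q is the ideal code q_i described above: nonzero only on the atoms whose label matches that of signal i.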
3.4 Fisher Discriminant Dictionary Learning
Yang et al. propose the Fisher discrimination dictionary learning (FisherDL) method, based on the Fisher criterion, to learn a structured dictionary [11] whose atoms have correspondence to the class labels. The structured dictionary is denoted as D = [D_1, D_2, \ldots, D_K], where D_i is the class-specific sub-dictionary associated with the i-th class. Denote the data set by Y = [Y_1, Y_2, \ldots, Y_K], where Y_i is the subset of the training samples from the i-th class. They then solve the following formulation over the dictionary and the coefficients to derive the desired discriminative dictionary:
\min_{D, X} \; r(Y, D, X) + \lambda_1 \|X\|_1 + \lambda_2 f(X),   (9)

where r(Y, D, X) is the discriminative fidelity term (discussed below); \|X\|_1 is the sparsity constraint; and f(X) is a discrimination constraint (also discussed below) imposed on the coefficient matrix X.
The discriminative fidelity term. We can write X_i, the representation of Y_i over D, as X_i = [X_i^1; X_i^2; \ldots; X_i^K], where X_i^j is the coding coefficient of Y_i over the sub-dictionary D_j. First of all, the dictionary D should be able to well represent Y_i, so Y_i \approx DX_i = D_1 X_i^1 + \cdots + D_K X_i^K. Second, since D_i is associated with the i-th class, Y_i should be well represented by D_i but not by D_j, j \ne i. This implies that X_i^i should have some significant coefficients such that \|Y_i - D_i X_i^i\|_F^2 is small, while X_i^j (j \ne i) should have nearly zero coefficients such that \|D_j X_i^j\|_F^2 is small. Thus the discriminative fidelity term is defined as:
r(Y_i, D, X_i) = \|Y_i - DX_i\|_F^2 + \|Y_i - D_i X_i^i\|_F^2 + \sum_{j=1, j \ne i}^{K} \|D_j X_i^j\|_F^2   (10)
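Equation (10) is straightforward to evaluate term by term. The helper below is a minimal sketch under our own naming, taking the per-class coefficient blocks X_i^j as a list:

```python
import numpy as np

def fidelity_term(Yi, D_list, Xi_parts, i):
    """Discriminative fidelity r(Y_i, D, X_i) =
       ||Y_i - D X_i||_F^2 + ||Y_i - D_i X_i^i||_F^2
       + sum_{j != i} ||D_j X_i^j||_F^2,
    with D_list = [D_1, ..., D_K] and Xi_parts = [X_i^1, ..., X_i^K]."""
    full = sum(D @ X for D, X in zip(D_list, Xi_parts))   # D X_i, D = [D_1 ... D_K]
    r = np.linalg.norm(Yi - full) ** 2                    # whole-dictionary fit
    r += np.linalg.norm(Yi - D_list[i] @ Xi_parts[i]) ** 2  # own-class fit
    r += sum(np.linalg.norm(D_list[j] @ Xi_parts[j]) ** 2   # leakage to other classes
             for j in range(len(D_list)) if j != i)
    return r
```

The term is zero exactly when Y_i is perfectly represented by its own sub-dictionary alone, and it grows as coefficients leak onto the other classes' atoms.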
The discriminative coefficient term. To make the dictionary D discriminative for the samples in Y, we can make the coding coefficients of Y over D, i.e., X, discriminative. Based on the Fisher criterion, this can be achieved by minimizing the within-class scatter of X, denoted by S_W(X), and maximizing the between-class scatter of X, denoted by S_B(X). S_W(X) and S_B(X) are defined as:

S_W(X) = \sum_{i=1}^{K} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T, \qquad S_B(X) = \sum_{i=1}^{K} n_i (m_i - m)(m_i - m)^T,

where m_i and m are the mean coefficient vectors of class i and of the whole data set, respectively, and n_i is the number of samples in class i.
Intuitively, we can define f(X) as \mathrm{tr}(S_W(X)) - \mathrm{tr}(S_B(X)). However, such an f(X) is non-convex and unstable. To solve this problem, they propose to add an elastic term \eta\|X\|_F^2 to f(X):
f(X) = \mathrm{tr}(S_W(X)) - \mathrm{tr}(S_B(X)) + \eta \|X\|_F^2   (11)
Incorporating all the terms, we have the following FDDL model:

\min_{D, X} \; \sum_{i=1}^{K} r(Y_i, D, X_i) + \lambda_1 \|X\|_1 + \lambda_2 \left( \mathrm{tr}(S_W(X)) - \mathrm{tr}(S_B(X)) + \eta \|X\|_F^2 \right)   (12)
There are some crucial issues related to this model, such as the convexity of f(X) and the sparse coding step, and they discuss these issues in depth in [11]. As for classification, FisherDL still utilizes the reconstruction error, as in Track I.
4 Summary
In the previous two sections, we reviewed representative DL-based classification approaches from both Track I and Track II. Evidently, adding a suitable discrimination term to the conventional DL framework is an intuitive yet effective way to derive a well-learned dictionary for classification.
Examining these methods, we can anticipate a general framework:
\min_{D, W, X} \; R(Y, D, X, H) + \lambda_1 C(X, W, H) + \lambda_2 P(X) + \lambda_3 P(W),   (13)

where R(Y, D, X, H) is the conventional DL framework (optionally enriched with the label matrix H), C(X, W, H) is the discrimination term on the sparse coefficients, P(X) and P(W) are the Lagrange constraints on the sparse coefficient matrix X and the projector W, and the \lambda's are scalars balancing their weights. Note that W does not necessarily mean only one projector; it may represent several. From Eq. (13), we can see that, by employing the label matrix H, a discriminative dictionary can be learned directly through the term R; at the same time, the term C can propagate the discrimination power of the coefficients to the dictionary, making the dictionary even more discriminative and reliable for classification. Obviously, if we set \lambda_1 = 0, Eq. (13) degrades to Track I; if we omit the label information H in the term R, Eq. (13) degenerates to Track II. Note that FisherDL [11] can also be cast as a specific example of Eq. (13), which drives the dictionary to be as discriminative as possible from two directions (a direct push, and an indirect push through the coefficients).
Besides, a main concern seems to be the trade-off between classification accuracy and the complexity of the formulation. Furthermore, on large-scale databases these methods are time-consuming in learning the dictionary. Therefore, extending these methods to online versions is an interesting and significant research direction.
References
 [1] D. M. Bradley and J. A. Bagnell. Differentiable sparse coding. NIPS, 2008.
 [2] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang. Learning with \ell_1-graph for image analysis. IEEE Trans. Img. Proc., 19(4):858–866, Apr. 2010.
 [3] M. Elad and M. Aharon. Image denoising via learned dictionaries and sparse representation. CVPR, 2006.
 [4] M. Elad, M. Figueiredo, and Y. Ma. On the role of sparse and redundant representations in image processing. Proceedings of the IEEE, 98(6):972–982, 2010.
 [5] Z. Jiang, Z. Lin, and L. S. Davis. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. CVPR, 2011.
 [6] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. NIPS, 2008.
 [7] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: transfer learning from unlabeled data. ICML, 2007.
 [8] I. Ramirez, P. Sprechmann, and G. Sapiro. Classification and clustering via dictionary learning with structured incoherence and shared features. CVPR, 2010.
 [9] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6):1031–1044, 2010.
 [10] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. PAMI, 2009.
 [11] M. Yang, L. Zhang, X. Feng, and D. Zhang. Fisher discrimination dictionary learning for sparse representation. ICCV, 2011.
 [12] M. Yang, L. Zhang, J. Yang, and D. Zhang. Metaface learning for sparse representation based face recognition. ICIP, 2010.
 [13] Q. Zhang and B. Li. Discriminative K-SVD for dictionary learning in face recognition. CVPR, 2010.