1 Introduction
Pattern classification is a major problem in machine learning research
[32, 5, 6, 13]. The two most important topics in pattern classification are data representation and classifier learning. Zhang et al. proposed an efficient multi-model classifier for large-scale biosequence localization prediction [36], and developed and optimized association rule mining algorithms implemented on parallel microarchitectural platforms [39, 38]. Most data representation and classification methods are based on a single data point: when one data point is considered for representation and classification, all other data points are ignored. However, the data points other than the one under consideration, which we call contextual data points, may play important roles in its representation and classification. It is therefore necessary to explore the contexts of data points when they are represented and/or classified. In this paper, we investigate the problem of learning an effective representation of a data point from its context, guided by its class label, and propose a novel supervised context learning method based on sparse regularization and a linear classifier learning formulation. We use the nearest neighbors of a data point as its context, and reconstruct the point from the data points in this context. The reconstruction coefficients are imposed to be sparse, and the reconstruction result is used as the new representation of the data point. We then apply a linear function to predict the class label from the sparse reconstruction of the context. The motivation of this contribution is that, for each data point, only a few data points in its context belong to the same class as the point itself. To find these critical contextual data points, we propose to learn the classifier together with the sparse context. We model this problem as a minimization problem.
In this problem, the context reconstruction error, the reconstruction sparsity, the classification error, and the classifier complexity are minimized simultaneously. We also propose a novel iterative algorithm to solve this minimization problem: we first reformulate it as its Lagrange formulation, and then use an alternating optimization method to solve it.
2 Proposed method
We consider a binary classification problem with a training set of $n$ data points $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$ is the $d$-dimensional feature vector of the $i$th data point, and $y_i \in \{+1, -1\}$ is the class label of the $i$th point. To learn from the context of the $i$th data point, we find its $K$ nearest neighbors and denote them as $\mathcal{N}_i = \{\mathbf{x}_{i_1}, \ldots, \mathbf{x}_{i_K}\}$, where $\mathbf{x}_{i_k}$ is the $k$th nearest neighbor of the $i$th point. They are further organized as a matrix $X_i = [\mathbf{x}_{i_1}, \ldots, \mathbf{x}_{i_K}] \in \mathbb{R}^{d \times K}$, where the $k$th column is $\mathbf{x}_{i_k}$. We represent $\mathbf{x}_i$ by linearly reconstructing it from its contextual points as

$\tilde{\mathbf{x}}_i = X_i \mathbf{s}_i = \sum_{k=1}^{K} s_{ik} \mathbf{x}_{i_k},$  (1)

where $\tilde{\mathbf{x}}_i$ is its reconstruction, and $s_{ik}$ is the reconstruction coefficient of the $k$th nearest neighbor. $\mathbf{s}_i = [s_{i1}, \ldots, s_{iK}]^\top$ is the reconstruction coefficient vector of the $i$th data point. The reconstruction coefficient vectors of all the training points are organized in a reconstruction coefficient matrix $S = [\mathbf{s}_1, \ldots, \mathbf{s}_n]$, with its $i$th column as $\mathbf{s}_i$. To solve for the reconstruction coefficient vectors, we propose the following minimization problem,

$\min_{S} \sum_{i=1}^{n} \left( \alpha \|\mathbf{x}_i - X_i \mathbf{s}_i\|_2^2 + \beta \|\mathbf{s}_i\|_1 \right),$  (2)

where $\alpha$ and $\beta$ are tradeoff parameters. In the objective of this problem, the first term minimizes the reconstruction error, measured by a squared $\ell_2$-norm penalty between $\mathbf{x}_i$ and $X_i \mathbf{s}_i$, and the second term is an $\ell_1$-norm penalty that encourages the contextual reconstruction coefficient vector $\mathbf{s}_i$ to be sparse.
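As a concrete illustration, the context reconstruction step of problem (2) can be sketched in Python. This is a minimal sketch, not the paper's implementation: it uses scikit-learn's `Lasso` as a stand-in solver (whose objective rescales the squared-error term by the number of samples, here $d$), and the function name `sparse_context_codes` and the values of `K` and `beta` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

def sparse_context_codes(X, K=5, beta=0.1):
    """For each row x_i of X, reconstruct it from its K nearest
    neighbors with an l1-penalized least-squares fit, as in (2)."""
    n, d = X.shape
    # K+1 neighbors because each point is its own nearest neighbor;
    # column 0 of the index matrix is dropped below.
    nbrs = NearestNeighbors(n_neighbors=K + 1).fit(X)
    _, idx = nbrs.kneighbors(X)
    contexts = idx[:, 1:]
    S = np.zeros((n, K))
    for i in range(n):
        Xi = X[contexts[i]].T  # d x K context matrix of the i-th point
        lasso = Lasso(alpha=beta, fit_intercept=False, max_iter=5000)
        lasso.fit(Xi, X[i])
        S[i] = lasso.coef_     # sparse reconstruction coefficients s_i
    return S, contexts
```

A larger `beta` drives more coefficients to exactly zero, leaving only the few contextual points that matter for the reconstruction.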
We design a classifier to classify the $i$th data point from its context reconstruction,

$f(\mathbf{x}_i) = \mathbf{w}^\top X_i \mathbf{s}_i,$  (3)

where $\mathbf{w} \in \mathbb{R}^d$ is the classifier parameter vector. The following optimization problem is proposed to learn $\mathbf{w}$,

$\min_{\mathbf{w}, \boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|_2^2 + C \sum_{i=1}^{n} \xi_i, \quad \text{s.t.}\ y_i \mathbf{w}^\top X_i \mathbf{s}_i \geq 1 - \xi_i,\ \xi_i \geq 0,\ i = 1, \ldots, n,$  (4)

where $\frac{1}{2}\|\mathbf{w}\|_2^2$ is the squared $\ell_2$-norm regularization term that reduces the complexity of the classifier, $\xi_i$ is the slack variable for the hinge loss of the $i$th training point, and $C$ is a tradeoff parameter.
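For intuition, problem (4) has the shape of a standard linear support vector machine trained on the reconstructed representations $X_i \mathbf{s}_i$. A minimal sketch, assuming the reconstructions are already available as the rows of a matrix `Z` (filled here with synthetic stand-in data, not the paper's data):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-in: rows of Z play the role of the reconstructions
# X_i s_i, and y holds the class labels in {-1, +1}.
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(-1.0, 1.0, size=(50, 4)),
               rng.normal(+1.0, 1.0, size=(50, 4))])
y = np.array([-1] * 50 + [+1] * 50)

# LinearSVC minimizes (1/2)||w||^2 + C * sum of hinge losses,
# matching the shape of (4).
clf = LinearSVC(C=1.0, loss="hinge", dual=True,
                fit_intercept=False).fit(Z, y)
w = clf.coef_.ravel()  # the learned parameter vector
```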
To learn the sparse context and the classifier jointly, we combine (2) and (4) into one minimization problem,

$\min_{S, \mathbf{w}, \boldsymbol{\xi}} \sum_{i=1}^{n} \left( \alpha \|\mathbf{x}_i - X_i \mathbf{s}_i\|_2^2 + \beta \|\mathbf{s}_i\|_1 \right) + \frac{1}{2}\|\mathbf{w}\|_2^2 + C \sum_{i=1}^{n} \xi_i, \quad \text{s.t.}\ y_i \mathbf{w}^\top X_i \mathbf{s}_i \geq 1 - \xi_i,\ \xi_i \geq 0,\ i = 1, \ldots, n.$  (5)
According to the dual theory of optimization, the following dual optimization problem is obtained,

$\max_{\boldsymbol{\mu}, \boldsymbol{\nu}} \min_{S, \mathbf{w}, \boldsymbol{\xi}} \mathcal{L} = \sum_{i=1}^{n} \left( \alpha \|\mathbf{x}_i - X_i \mathbf{s}_i\|_2^2 + \beta \|\mathbf{s}_i\|_1 \right) + \frac{1}{2}\|\mathbf{w}\|_2^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \mu_i \left( y_i \mathbf{w}^\top X_i \mathbf{s}_i - 1 + \xi_i \right) - \sum_{i=1}^{n} \nu_i \xi_i,$  (6)

where $\mu_i \geq 0$ and $\nu_i \geq 0$ are Lagrange multipliers. By setting the partial derivatives of $\mathcal{L}$ with regard to $\mathbf{w}$ and $\xi_i$ to zero, we have

$\mathbf{w} = \sum_{i=1}^{n} \mu_i y_i X_i \mathbf{s}_i, \quad C - \mu_i - \nu_i = 0,$  (7)

and substituting (7) back into (6) gives

$\max_{\boldsymbol{\mu}} \min_{S} \sum_{i=1}^{n} \left( \alpha \|\mathbf{x}_i - X_i \mathbf{s}_i\|_2^2 + \beta \|\mathbf{s}_i\|_1 \right) - \frac{1}{2} \left\| \sum_{i=1}^{n} \mu_i y_i X_i \mathbf{s}_i \right\|_2^2 + \boldsymbol{\mu}^\top \mathbf{1}, \quad \text{s.t.}\ 0 \leq \mu_i \leq C,\ i = 1, \ldots, n,$  (8)

where $\mathbf{1}$ is an $n$-dimensional vector of all-one elements. We solve this problem with the alternating optimization strategy: in each iteration of an iterative algorithm, we first fix $\boldsymbol{\mu}$ to solve $S$, and then fix $S$ to solve $\boldsymbol{\mu}$.
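The alternating strategy can be sketched generically: each block update is passed in as a function, so the same loop applies whether the subproblems are the two updates of this section or simpler stand-ins. The function and argument names below are illustrative choices, not the paper's notation:

```python
import numpy as np

def alternate_minimize(step_s, step_mu, s0, mu0, n_iter=20, tol=1e-6):
    """Generic alternating scheme: fix mu and update s, then fix s and
    update mu, until neither block changes by more than tol."""
    s, mu = s0, mu0
    for _ in range(n_iter):
        s_new = step_s(mu)        # solve the S-subproblem with mu fixed
        mu_new = step_mu(s_new)   # solve the mu-subproblem with S fixed
        converged = max(np.abs(s_new - s).max(),
                        np.abs(mu_new - mu).max()) < tol
        s, mu = s_new, mu_new
        if converged:
            break
    return s, mu
```

Each subproblem decreases (respectively increases) its own objective with the other block held fixed, which is what makes the loop a sensible heuristic for the joint problem.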
2.1 Solving $S$

When $\boldsymbol{\mu}$ is fixed and only $S$ is considered, we solve the vectors $\mathbf{s}_i$ one by one, and (8) is reduced to

$\min_{\mathbf{s}_i} \alpha \|\mathbf{x}_i - X_i \mathbf{s}_i\|_2^2 + \beta \|\mathbf{s}_i\|_1 - \frac{1}{2} \left\| \mu_i y_i X_i \mathbf{s}_i + \sum_{j \neq i} \mu_j y_j X_j \mathbf{s}_j \right\|_2^2.$  (9)

This problem can be solved efficiently by the modified feature-sign search algorithm proposed by Gao et al. [2].
2.2 Solving $\boldsymbol{\mu}$

When $S$ is fixed and only $\boldsymbol{\mu}$ is considered, the problem in (8) is reduced to

$\max_{\boldsymbol{\mu}} -\frac{1}{2} \left\| \sum_{i=1}^{n} \mu_i y_i X_i \mathbf{s}_i \right\|_2^2 + \boldsymbol{\mu}^\top \mathbf{1}, \quad \text{s.t.}\ 0 \leq \mu_i \leq C,\ i = 1, \ldots, n.$  (10)

This is a typical constrained quadratic programming (QP) problem, and it can be solved efficiently by the active set algorithm.
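As a sketch of this step: problem (10) is, after negation, a box-constrained QP of the form $\min_{\boldsymbol{\mu}} \frac{1}{2}\boldsymbol{\mu}^\top Q \boldsymbol{\mu} - \boldsymbol{\mu}^\top \mathbf{1}$ with $0 \leq \mu_i \leq C$, where the matrix $Q$ collects the inner products of the terms $y_i X_i \mathbf{s}_i$. The code below uses SciPy's L-BFGS-B as a generic stand-in for a dedicated active-set solver; the function name and the toy $Q$ in the usage are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def solve_box_qp(Q, C):
    """Minimize (1/2) mu^T Q mu - 1^T mu subject to 0 <= mu_i <= C,
    the box-constrained QP shape of this subproblem. L-BFGS-B with
    bounds stands in for an active-set method."""
    n = Q.shape[0]
    fun = lambda m: 0.5 * m @ Q @ m - m.sum()
    jac = lambda m: Q @ m - np.ones(n)          # exact gradient
    res = minimize(fun, np.zeros(n), jac=jac,
                   method="L-BFGS-B", bounds=[(0.0, C)] * n)
    return res.x
```

When the box constraint is active, the solver clips the corresponding multipliers to the boundary value $C$, exactly as an active-set method would.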
3 Experiments
In this section, we evaluate the proposed supervised sparse context learning (SSCL) algorithm on several benchmark data sets.
3.1 Experiment setup
In the experiments, we used three data sets, which are introduced as follows:

MANET loss data set: The packet losses at the receiver in mobile ad hoc networks (MANET) can be classified into three types: losses caused by wireless random errors, losses induced by route changes due to node mobility, and losses due to network congestion. We collected 381 data points for the congestion loss, 458 data points for the route change loss, and 516 data points for the wireless error loss, so the data set contains 1,355 data points in total. To extract the feature vector of each data point, we calculate 12 features as in [1] and concatenate them to form a vector.

Twitter data set: The second data set is a Twitter data set. The goal is to predict the gender of a Twitter user, male or female, given one of his/her Twitter messages. We collected 53,971 Twitter messages in total; among them, 28,012 messages were sent by male users and 25,959 messages by female users. To extract features from each Twitter message, we extract term features, linguistic features, and medium diversity features as gender-specific features, as in [8].

Arrhythmia data set: The third data set is publicly available at http://archive.ics.uci.edu/ml/datasets/Arrhythmia. It contains 452 data points, which belong to 16 different classes. Each data point has a feature vector of 279 features.
To conduct the experiments, we used the 10-fold cross validation protocol.
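For reproducibility, a 10-fold protocol of this kind can be sketched with scikit-learn's `StratifiedKFold`. The data below are synthetic placeholders, and `LinearSVC` stands in for the paper's SSCL model:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic two-class stand-in data (120 points, 5 features).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(60, 5)),
               rng.normal(+1.0, 1.0, size=(60, 5))])
y = np.array([0] * 60 + [1] * 60)

# 10-fold cross validation: train on 9 folds, score on the held-out fold.
accs = []
for tr, te in StratifiedKFold(n_splits=10, shuffle=True,
                              random_state=0).split(X, y):
    clf = LinearSVC(C=1.0).fit(X[tr], y[tr])
    accs.append(clf.score(X[te], y[te]))
mean_acc = float(np.mean(accs))
```

Stratified folds keep the class proportions of each held-out fold close to those of the whole data set, which matters for the imbalanced MANET classes.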
3.2 Experimental Results
Since the proposed algorithm is a context-based classification and sparse representation method, we compared it to three popular context-based classifiers and one context-based sparse representation method. The three context-based classifiers are the traditional $K$-nearest neighbor classifier (KNN), sparse representation based classification (SRBC) [26], and the Laplacian support vector machine (LSVM) [11]. The context-based sparse representation method is Gao et al.'s Laplacian sparse coding (LSC) [3]. The boxplots of the 10-fold cross validation of the compared algorithms are given in figure 1. From the figures, we can see that the proposed method SSCL outperforms all the other methods on all three data sets. The second best method is SRBC, which also uses a sparse context to represent the data point. This is strong evidence that learning a supervised sparse context is critical for classification problems.
3.2.1 Sensitivity to parameters
In the proposed formulation, there are three tradeoff parameters, $\alpha$, $\beta$, and $C$. We plot the curves of mean prediction accuracy against different values of these parameters in figure 2. From figures 2(a) and 2(b), we can see that the accuracy is stable with respect to $\alpha$ and $\beta$. From figure 2(c), we can see that a larger $C$ leads to better classification performance.
4 Conclusion and future works
In this paper, we study the problem of using context to represent and classify data points. We propose to use a sparse linear combination of the data points in the context of a data point to represent the point itself. Moreover, to increase the discriminative ability of the new representation, we develop a supervised method that learns the sparse context and a classifier together in a unified optimization framework. Experiments on three benchmark data sets show its advantage over state-of-the-art context-based data representation and classification methods. In the future, we will extend the proposed method to applications in information security [33, 27, 30, 29, 28, 31, 34], bioinformatics [25, 24, 23, 12, 15, 14, 7, 37, 16, 17], and big data analysis using high performance computing [43, 18, 9, 35, 4, 41, 40, 39, 38, 10, 42, 21, 20, 19, 22].
References
[1] Deng, Q., Cai, A.: SVM-based loss differentiation mechanism in mobile ad hoc networks. In: 2009 Global Mobile Congress, GMC 2009 (2009). DOI 10.1109/GMC.2009.5295834
[2] Gao, S., Tsang, I.W., Chia, L.T.: Laplacian sparse coding, hypergraph Laplacian sparse coding, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1), 92–104 (2013)

[3]
Gao, S., Tsang, I.W., Chia, L.T., Zhao, P.: Local features are not
lonely–laplacian sparse coding for image classification.
In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3555–3561. IEEE (2010)
[4] Gao, Y., Zhang, F., Bakos, J.D.: Sparse matrix-vector multiply on the Keystone II digital signal processor. In: High Performance Extreme Computing Conference (HPEC), 2014 IEEE, pp. 1–6 (2014)
 [5] Guo, Z., Li, Q., You, J., Zhang, D., Liu, W.: Local directional derivative pattern for rotation invariant texture classification. Neural Computing and Applications 21(8), 1893–1904 (2012)
[6] He, Y., Sang, N.: Multi-ring local binary patterns for rotation invariant texture classification. Neural Computing and Applications 22(3–4), 793–802 (2013)
[7] Hu, J., Zhang, F.: Improving protein localization prediction using amino acid group based physicochemical encoding. In: Bioinformatics and Computational Biology, pp. 248–258 (2009)

[8] Huang, F., Li, C., Lin, L.: Identifying gender of microblog users based on message mining. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8485 LNCS, 488–493 (2014)
[9] Li, T., Zhou, X., Brandstatter, K., Raicu, I.: Distributed key-value store on HPC and cloud systems. In: 2nd Greater Chicago Area System Research Workshop (GCASR). Citeseer (2013)
[10] Li, T., Zhou, X., Brandstatter, K., Zhao, D., Wang, K., Rajendran, A., Zhang, Z., Raicu, I.: ZHT: A light-weight reliable persistent dynamic scalable zero-hop distributed hash table. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 775–787 (2013)
 [11] Melacci, S., Belkin, M.: Laplacian support vector machines trained in the primal. The Journal of Machine Learning Research 12, 1149–1184 (2011)
 [12] Peng, B., Liu, Y., Zhou, Y., Yang, L., Zhang, G., Liu, Y.: Modeling nanoparticle targeting to a vascular surface in shear flow through diffusive particle dynamics. Nanoscale Research Letters 10(1), 235 (2015)
[13] Tian, Y., Zhang, Q., Liu, D.: ν-nonparallel support vector machine for pattern classification. Neural Computing and Applications 25(5), 1007–1020 (2014)
 [14] Wang, J., Li, Y., Wang, Q., You, X., Man, J., Wang, C., Gao, X.: Proclusensem: predicting membrane protein types by fusing different modes of pseudo amino acid composition. Computers in biology and medicine 42(5), 564–574 (2012)
 [15] Wang, J.J.Y., Bensmail, H., Gao, X.: Multiple graph regularized protein domain ranking. BMC bioinformatics 13(1), 307 (2012)
[16] Wang, J.J.Y., Bensmail, H., Gao, X.: Joint learning and weighting of visual vocabulary for bag-of-feature based tissue classification. Pattern Recognition 46(12), 3249–3255 (2013)

[17] Wang, J.Y., Almasri, I., Gao, X.: Adaptive graph regularized nonnegative matrix factorization via feature selection. In: Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 963–966 (2012)
[18] Wang, K., Kulkarni, A., Zhou, X., Lang, M., Raicu, I.: Using simulation to explore distributed key-value stores for exascale system services. In: 2nd Greater Chicago Area System Research Workshop (GCASR) (2013)
 [19] Wang, K., Liu, N., Sadooghi, I., Yang, X., Zhou, X., Lang, M., Sun, X.H., Raicu, I.: Overcoming hadoop scaling limitations through distributed task execution. In: Proc. of the IEEE International Conference on Cluster Computing 2015 (Cluster 15) (2015)
[20] Wang, K., Zhou, X., Chen, H., Lang, M., Raicu, I.: Next generation job management systems for extreme-scale ensemble computing. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, pp. 111–114 (2014)
[21] Wang, K., Zhou, X., Li, T., Zhao, D., Lang, M., Raicu, I.: Optimizing load balancing and data-locality with data-aware scheduling. In: Big Data (Big Data), 2014 IEEE International Conference on, pp. 119–128 (2014)
[22] Wang, K., Zhou, X., Qiao, K., Lang, M., McClelland, B., Raicu, I.: Towards scalable distributed workload manager with monitoring-based weakly consistent resource stealing. In: Proceedings of the 24th International Symposium on High-performance Parallel and Distributed Computing, pp. 219–222. ACM (2015)
 [23] Wang, S., Zhou, Y., Tan, J., Xu, J., Yang, J., Liu, Y.: Computational modeling of magnetic nanoparticle targeting to stent surface under high gradient field. Computational mechanics 53(3), 403–412 (2014)
[24] Wang, Y., Han, H.C., Yang, J.Y., Lindsey, M.L., Jin, Y.: A conceptual cellular interaction model of left ventricular remodelling post-MI: dynamic network with exit-entry competition strategy. BMC systems biology 4(Suppl 1), S5 (2010)
[25] Wang, Y., Yang, T., Ma, Y., Halade, G.V., Zhang, J., Lindsey, M.L., Jin, Y.F.: Mathematical modeling and stability analysis of macrophage activation in left ventricular remodeling post-myocardial infarction. BMC genomics 13(Suppl 6), S21 (2012)

[26] Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 31(2), 210–227 (2009)
[27] Xu, L., Zhan, Z., Xu, S., Ye, K.: Cross-layer detection of malicious websites. In: Proceedings of the third ACM conference on Data and application security and privacy, pp. 141–152. ACM (2013)
[28] Xu, L., Zhan, Z., Xu, S., Ye, K.: An evasion and counter-evasion study in malicious websites detection. In: Communications and Network Security (CNS), 2014 IEEE Conference on, pp. 265–273. IEEE (2014)
 [29] Xu, S., Lu, W., Xu, L., Zhan, Z.: Adaptive epidemic dynamics in networks: Thresholds and control. ACM Transactions on Autonomous and Adaptive Systems (TAAS) 8(4), 19 (2014)
[30] Xu, S., Lu, W., Zhan, Z.: A stochastic model of multi-virus dynamics. Dependable and Secure Computing, IEEE Transactions on 9(1), 30–45 (2012)
[31] Xu, S., Qian, H., Wang, F., Zhan, Z., Bertino, E., Sandhu, R.: Trustworthy information: concepts and mechanisms. In: Web-Age Information Management, pp. 398–404. Springer (2010)
 [32] Xu, Y., Shen, F., Zhao, J.: An incremental learning vector quantization algorithm for pattern classification. Neural Computing and Applications 21(6), 1205–1215 (2012)
[33] Zhan, Z., Xu, M., Xu, S.: Characterizing honeypot-captured cyber attacks: Statistical framework and case study. Information Forensics and Security, IEEE Transactions on 8(11), 1775–1789 (2013)
 [34] Zhan, Z., Xu, M., Xu, S.: A characterization of cybersecurity posture from network telescope data. In: Proceedings of the 6th international conference on trustworthy systems, Intrust, vol. 14 (2014)

[35] Zhang, F., Gao, Y., Bakos, J.D.: Lucas-Kanade optical flow estimation on the TI C66x digital signal processor. In: High Performance Extreme Computing Conference (HPEC), 2014 IEEE, pp. 1–6 (2014)
[36] Zhang, F., Hu, J.: Bayesian classifier for anchored protein sorting discovery. In: Bioinformatics and Biomedicine, 2009. BIBM'09. IEEE International Conference on, pp. 424–428 (2009)
 [37] Zhang, F., Hu, J.: Bioinformatics analysis of physicochemical properties of protein sorting signals (2010)
[38] Zhang, F., Zhang, Y., Bakos, J.: GPApriori: GPU-accelerated frequent itemset mining. In: Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pp. 590–594 (2011)
 [39] Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. The Journal of Supercomputing 66(1), 94–117 (2013)
[40] Zhang, Y., Zhang, F., Bakos, J.: Frequent itemset mining on large-scale shared memory machines. In: Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pp. 585–589 (2011)
[41] Zhang, Y., Zhang, F., Jin, Z., Bakos, J.D.: An FPGA-based accelerator for frequent itemset mining. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 6(1), 2 (2013)
[42] Zhao, D., Zhang, Z., Zhou, X., Li, T., Wang, K., Kimpe, D., Carns, P., Ross, R., Raicu, I.: FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems. In: Big Data (Big Data), 2014 IEEE International Conference on, pp. 61–70 (2014)
 [43] Zhou, X., Chen, H., Wang, K., Lang, M., Raicu, I.: Exploring distributed resource allocation techniques in the slurm job management system. Illinois Institute of Technology, Department of Computer Science, Technical Report (2013)