Representing data by sparse combination of contextual data points for classification

06/30/2015 ∙ by Jingyan Wang, et al. ∙ Microsoft

In this paper, we study the problem of using the contextual data points of a data point for its classification. We propose to represent a data point as the sparse linear reconstruction of its context, and to learn the sparse context together with a linear classifier in a supervised way to increase its discriminative ability. We propose a novel formulation for context learning that models the learning of the context reconstruction coefficients and the classifier in a unified objective. In this objective, the reconstruction error is minimized and coefficient sparsity is encouraged. Moreover, the hinge loss of the classifier is minimized and the complexity of the classifier is reduced. This objective is optimized by an alternating strategy in an iterative algorithm. Experiments on three benchmark data sets show its advantage over state-of-the-art context-based data representation and classification methods.

1 Introduction

Pattern classification is a major problem in machine learning research [32, 5, 6, 13]. The two most important topics of pattern classification are data representation and classifier learning. Zhang et al. proposed an efficient multi-model classifier for large-scale bio-sequence localization prediction [36]. Zhang et al. also developed and optimized association rule mining algorithms and implemented them on parallel micro-architectural platforms [39, 38]. Most data representation and classification methods are based on a single data point: when one data point is considered for representation and classification, all other data points are ignored. However, the data points other than the one under consideration, which are called contextual data points, may play important roles in its representation and classification. It is therefore necessary to explore the contexts of data points when they are represented and/or classified. In this paper, we investigate the problem of learning an effective representation of a data point from its context, guided by its class label, and propose a novel supervised context learning method using sparse regularization and a linear classifier learning formulation.

We propose a novel method to explore the context of a data point and use it to represent that point. We use its nearest neighbors as its context, and try to reconstruct it from the data points in its context. The reconstruction coefficients are constrained to be sparse. Moreover, the reconstruction result is used as the new representation of this data point. We apply a linear function to predict its class label from the sparse reconstruction of its context. The motivation of this contribution is that, for each data point, only a few data points in its context are of the same class as itself. To find the critical contextual data points, we propose to learn the classifier together with the sparse context. We model this problem as a minimization problem in which the context reconstruction error, reconstruction sparsity, classification error, and classifier complexity are minimized simultaneously. We also propose a novel iterative algorithm to solve this minimization problem. We first reformulate it as its Lagrangian, and then use an alternating optimization method to solve it.

This paper is organized as follows. In section 2, we introduce the proposed method. In section 3, we evaluate the proposed method experimentally. In section 4, we conclude the paper and discuss future work.

2 Proposed method

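We consider a binary classification problem. A training set of $n$ data points is given as $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ is the $d$-dimensional feature vector of the $i$-th data point, and $y_i \in \{+1, -1\}$ is the class label of the $i$-th point. To learn from the context of the $i$-th data point, we find its $k$ nearest neighbors and denote them as $\{x_{ij}\}_{j=1}^{k}$, where $x_{ij}$ is the $j$-th nearest neighbor of the $i$-th point. They are further organized as a matrix $X_i = [x_{i1}, \cdots, x_{ik}] \in \mathbb{R}^{d \times k}$, where the $j$-th column is $x_{ij}$. We represent $x_i$ by linearly reconstructing it from its contextual points as

$$x_i \approx \hat{x}_i = \sum_{j=1}^{k} v_{ij} x_{ij} = X_i v_i, \tag{1}$$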

where $\hat{x}_i$ is its reconstruction, and $v_{ij}$ is the reconstruction coefficient of the $j$-th nearest neighbor. $v_i = [v_{i1}, \cdots, v_{ik}]^\top \in \mathbb{R}^{k}$ is the reconstruction coefficient vector of the $i$-th data point. The reconstruction coefficient vectors of all the training points are organized in the reconstruction coefficient matrix $V = [v_1, \cdots, v_n] \in \mathbb{R}^{k \times n}$, with its $i$-th column as $v_i$. To solve for the reconstruction coefficient vectors, we propose the following minimization problem,

$$\min_{V} \; \sum_{i=1}^{n} \left( \beta \left\| x_i - X_i v_i \right\|_2^2 + \gamma \left\| v_i \right\|_1 \right), \tag{2}$$

where $\beta$ and $\gamma$ are trade-off parameters. In the objective of this problem, the first term minimizes the reconstruction error, measured by a squared $\ell_2$ norm penalty between $x_i$ and $X_i v_i$, and the second term is an $\ell_1$ norm penalty on the contextual reconstruction coefficient vector $v_i$.
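As a rough illustration of the reconstruction step, the following Python sketch builds the context of a point from its nearest neighbors and minimizes a squared reconstruction error plus an ℓ1 penalty on the coefficients. It uses a plain iterative soft-thresholding (ISTA) loop, which is swapped in here for simplicity and is not the feature-sign search solver cited later in this section. The function names, the column-wise data layout, the exclusion of the point itself from its own context, and the default parameter values are all assumptions made for this example.

```python
import numpy as np

def knn_context(X, i, k):
    """Return the d x k matrix whose columns are the k nearest neighbors of X[:, i].

    X is assumed to be d x n with one data point per column (an assumption of
    this sketch); the point itself is excluded from its own context.
    """
    dists = np.linalg.norm(X - X[:, [i]], axis=0)
    dists[i] = np.inf                      # never pick the point itself
    idx = np.argsort(dists)[:k]
    return X[:, idx]

def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_reconstruct(x, Xi, beta=1.0, gamma=0.1, n_iter=500):
    """ISTA for min_v beta * ||x - Xi v||_2^2 + gamma * ||v||_1."""
    # Lipschitz constant of the gradient of the smooth term.
    lip = 2.0 * beta * np.linalg.norm(Xi, 2) ** 2 + 1e-12
    v = np.zeros(Xi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * beta * Xi.T @ (Xi @ v - x)
        v = soft_threshold(v - grad / lip, gamma / lip)
    return v

# Hypothetical usage on random data: reconstruct point 0 from 4 neighbors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 30))
Xi_demo = knn_context(X_demo, 0, 4)
v_demo = sparse_reconstruct(X_demo[:, 0], Xi_demo)
x_hat_demo = Xi_demo @ v_demo              # new representation of point 0
```

A larger sparsity weight drives more coefficients to exactly zero, so fewer contextual points contribute to the reconstruction.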

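We design a classifier to classify the $i$-th data point,

$$f(\hat{x}_i) = w^\top \hat{x}_i = w^\top X_i v_i, \tag{3}$$

where $w \in \mathbb{R}^d$ is the classifier parameter vector. The following optimization problem is proposed to learn $w$,

$$\min_{w, \xi} \; \frac{1}{2} \|w\|_2^2 + \alpha \sum_{i=1}^{n} \xi_i, \quad \text{s.t.} \;\; 1 - y_i w^\top X_i v_i \le \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \cdots, n, \tag{4}$$

where $\frac{1}{2}\|w\|_2^2$ is the squared $\ell_2$ norm regularization term to reduce the complexity of the classifier, $\xi_i$ is the slack variable for the hinge loss of the $i$-th training point, and $\alpha$ is a tradeoff parameter.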
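To make the classifier objective concrete, here is a minimal Python sketch that evaluates a squared-norm regularizer plus a weighted sum of hinge losses for a linear classifier applied to the reconstructed points. The function name and the convention of stacking the reconstructions as columns of a matrix `Z` are assumptions of this example, not notation from the paper.

```python
import numpy as np

def svm_primal_objective(w, Z, y, alpha):
    """Evaluate 0.5 * ||w||^2 + alpha * sum_i hinge_i for a linear classifier.

    Z is d x n and holds one reconstructed point per column (assumed layout);
    y holds labels in {+1, -1}. Returns the objective and the slack values.
    """
    margins = y * (w @ Z)
    xi = np.maximum(0.0, 1.0 - margins)    # optimal slack for a fixed w
    return 0.5 * w @ w + alpha * xi.sum(), xi

# Hypothetical usage with two 2-dimensional reconstructed points.
w_demo = np.array([1.0, 0.0])
Z_demo = np.array([[2.0, 0.5],
                   [0.0, 0.0]])
y_demo = np.array([1.0, -1.0])
obj_demo, xi_demo = svm_primal_objective(w_demo, Z_demo, y_demo, alpha=2.0)
```

In this toy input the first point lies beyond the margin and has zero slack, while the second is misclassified and contributes a positive slack to the objective.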

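The overall optimization problem is obtained by combining the problems in (2) and (4) as

$$\min_{w, V, \xi} \; \frac{1}{2} \|w\|_2^2 + \alpha \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \left( \beta \left\| x_i - X_i v_i \right\|_2^2 + \gamma \left\| v_i \right\|_1 \right), \quad \text{s.t.} \;\; 1 - y_i w^\top X_i v_i \le \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \cdots, n. \tag{5}$$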

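According to the duality theory of optimization, the following dual optimization problem is obtained,

$$\max_{\delta, \epsilon} \min_{w, V, \xi} \; L = \frac{1}{2} \|w\|_2^2 + \alpha \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \left( \beta \left\| x_i - X_i v_i \right\|_2^2 + \gamma \left\| v_i \right\|_1 \right) + \sum_{i=1}^{n} \delta_i \left( 1 - y_i w^\top X_i v_i - \xi_i \right) - \sum_{i=1}^{n} \epsilon_i \xi_i, \quad \text{s.t.} \;\; \delta_i \ge 0, \;\; \epsilon_i \ge 0, \tag{6}$$

where $\delta_i$ and $\epsilon_i$ are Lagrange multipliers. By setting the partial derivatives of $L$ with regard to $w$ and $\xi_i$ to zero, we have

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \delta_i y_i X_i v_i, \qquad \frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha - \delta_i - \epsilon_i = 0 \;\Rightarrow\; \delta_i \le \alpha. \tag{7}$$

We substitute (7) into (6) to eliminate $w$ and $\xi$,

$$\max_{\delta} \min_{V} \; -\frac{1}{2} \sum_{i,j=1}^{n} \delta_i \delta_j y_i y_j v_i^\top X_i^\top X_j v_j + \sum_{i=1}^{n} \delta_i + \sum_{i=1}^{n} \left( \beta \left\| x_i - X_i v_i \right\|_2^2 + \gamma \left\| v_i \right\|_1 \right), \quad \text{s.t.} \;\; 0 \le \delta \le \alpha \mathbf{1}, \tag{8}$$

where $\delta = [\delta_1, \cdots, \delta_n]^\top$ and $\mathbf{1}$ is an $n$-dimensional vector of all ones. We solve this problem with an alternating optimization strategy. In each iteration of an iterative algorithm, we first fix $\delta$ to solve $V$, and then fix $V$ to solve $\delta$.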

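Solving $V$

When $\delta$ is fixed and only $V$ is considered, we solve the $v_i$ one by one, and (8) is further reduced to

$$\min_{v_i} \; -\frac{1}{2} \delta_i^2 v_i^\top X_i^\top X_i v_i - \delta_i y_i v_i^\top X_i^\top \sum_{j \ne i} \delta_j y_j X_j v_j + \beta \left\| x_i - X_i v_i \right\|_2^2 + \gamma \left\| v_i \right\|_1. \tag{9}$$

This problem can be solved efficiently by the modified feature-sign search algorithm proposed by Gao et al. [2].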

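Solving $\delta$

When $V$ is fixed and only $\delta$ is considered, the problem in (8) is reduced to

$$\max_{\delta} \; -\frac{1}{2} \sum_{i,j=1}^{n} \delta_i \delta_j y_i y_j v_i^\top X_i^\top X_j v_j + \sum_{i=1}^{n} \delta_i, \quad \text{s.t.} \;\; 0 \le \delta \le \alpha \mathbf{1}. \tag{10}$$

This problem is a typical constrained quadratic programming (QP) problem, and it can be solved efficiently by the active set algorithm.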
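For readers who want to experiment with the multiplier update, the sketch below solves the standard box-constrained dual of a linear soft-margin SVM on the reconstructed points. It uses projected gradient ascent, which is substituted here for the active set algorithm mentioned above because it fits in a few lines. The function name, the matrix `Z` of reconstructed points stored column-wise, the fixed step size, and the iteration count are assumptions of this example.

```python
import numpy as np

def solve_dual_box_qp(Z, y, alpha, n_iter=2000):
    """Maximize sum(delta) - 0.5 * delta^T Q delta subject to 0 <= delta <= alpha.

    Q_ij = y_i * y_j * z_i^T z_j, where z_i is the i-th column of Z (assumed to
    hold the reconstructed points). Returns the multipliers and the classifier
    vector w = sum_i delta_i * y_i * z_i.
    """
    G = Z * y                               # scale column i by its label y_i
    Q = G.T @ G                             # positive semidefinite by construction
    step = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)
    delta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = 1.0 - Q @ delta              # gradient of the concave objective
        delta = np.clip(delta + step * grad, 0.0, alpha)  # project onto the box
    w = G @ delta
    return delta, w

# Hypothetical usage: two 1-dimensional points at +1 and -1 with opposite labels.
Z_toy = np.array([[1.0, -1.0]])
y_toy = np.array([1.0, -1.0])
delta_toy, w_toy = solve_dual_box_qp(Z_toy, y_toy, alpha=10.0)
```

The upper bound on the multipliers is the hinge loss tradeoff parameter, so shrinking it caps how much any single training point can influence the classifier.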

3 Experiments

In this section, we evaluate the proposed supervised sparse context learning (SSCL) algorithm on several benchmark data sets.

3.1 Experiment setup

In the experiments, we used three data sets, which are introduced as follows:

  • MANET loss data set: The packet losses at the receiver in mobile ad hoc networks (MANET) can be classified into three types: losses caused by wireless random errors, route change losses induced by node mobility, and network congestion losses. We collected 381 data points for the congestion loss, 458 for the route change loss, and 516 for the wireless error loss. Thus there are 1355 data points in total in this data set. To extract the feature vector of each data point, we calculate 12 features from it as in [1], and concatenate them to form a vector.

  • Twitter data set: The second data set is a Twitter data set. The target of this data set is to predict the gender of a Twitter user, male or female, given one of his/her Twitter messages. We collected 53,971 Twitter messages in total, among which 28,012 were sent by male users and 25,959 by female users. To extract features from each Twitter message, we extract term features, linguistic features, and medium diversity features as gender-specific features as in [8].

  • Arrhythmia data set: The third data set is publicly available at http://archive.ics.uci.edu/ml/datasets/Arrhythmia. In this data set, there are 452 data points, and they belong to 16 different classes. Each data point has a feature vector of 279 features.

To conduct the experiments, we used 10-fold cross-validation.
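A minimal sketch of how such a 10-fold split could be generated is given below. The function name, the random shuffling, and the seed are assumptions of this example; the paper does not describe how the folds were formed.

```python
import numpy as np

def k_fold_indices(n, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split of n points."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, n_folds)   # fold sizes differ by at most one
    for f in range(n_folds):
        test_idx = folds[f]
        train_idx = np.concatenate(
            [folds[g] for g in range(n_folds) if g != f]
        )
        yield train_idx, test_idx

# Hypothetical usage with the size of the MANET loss data set.
for train_idx, test_idx in k_fold_indices(1355):
    pass  # train on train_idx, measure accuracy on test_idx
```

Each point appears in exactly one test fold, so the ten accuracies together cover the whole data set once.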

3.2 Experimental Results

Figure 1: Boxplots of prediction accuracy of different context-based algorithms on (a) the MANET loss data set, (b) the Twitter data set, and (c) the Arrhythmia data set.

Since the proposed algorithm is a context-based classification and sparse representation method, we compared it to three popular context-based classifiers and one context-based sparse representation method. The three context-based classifiers are the traditional $k$-nearest neighbor classifier (KNN), sparse representation based classification (SRBC) [26], and the Laplacian support vector machine (LSVM) [11]. The context-based sparse representation method is Gao et al.’s Laplacian sparse coding (LSC) [3]. The boxplots of the 10-fold cross-validation accuracies of the compared algorithms are given in figure 1. From the figures, we can see that the proposed method, SSCL, outperforms all the other methods on all three data sets. The second best method is SRBC, which also uses sparse context to represent the data point. This is strong evidence that learning a supervised sparse context is critical for the classification problem.

3.2.1 Sensitivity to parameters

Figure 2: Parameter sensitivity curves, shown in panels (a), (b), and (c).

In the proposed formulation, there are three tradeoff parameters. We plot the curves of mean prediction accuracy against different values of each parameter, and show them in figure 2. From figures 2(a) and 2(b), we can see that the accuracy is stable with respect to the two parameters shown there. From figure 2(c), we can see that a larger value of the third parameter leads to better classification performance.

4 Conclusion and future works

In this paper, we study the problem of using context to represent and classify data points. We propose to use a sparse linear combination of the data points in the context of a data point to represent it. Moreover, to increase the discriminative ability of the new representation, we develop a supervised method to learn the sparse context by learning it and a classifier together in a unified optimization framework. Experiments on three benchmark data sets show its advantage over state-of-the-art context-based data representation and classification methods. In the future, we will extend the proposed method to applications of information security [33, 27, 30, 29, 28, 31, 34], bioinformatics [25, 24, 23, 12, 15, 14, 7, 37], computer vision [16, 17], and big data analysis using high performance computing [43, 18, 9, 35, 4, 41, 40, 39, 38, 10, 42, 21, 20, 19, 22].

References

  • [1] Deng, Q., Cai, A.: Svm-based loss differentiation mechanism in mobile ad hoc networks. In: 2009 Global Mobile Congress, GMC 2009 (2009). DOI 10.1109/GMC.2009.5295834
  • [2] Gao, S., Tsang, I.H., Chia, L.T.: Laplacian sparse coding, hypergraph laplacian sparse coding, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1), 92–104 (2013)
  • [3] Gao, S., Tsang, I.W., Chia, L.T., Zhao, P.: Local features are not lonely–laplacian sparse coding for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3555–3561. IEEE (2010)

  • [4] Gao, Y., Zhang, F., Bakos, J.D.: Sparse matrix-vector multiply on the keystone ii digital signal processor. In: High Performance Extreme Computing Conference (HPEC), 2014 IEEE, pp. 1–6 (2014)
  • [5] Guo, Z., Li, Q., You, J., Zhang, D., Liu, W.: Local directional derivative pattern for rotation invariant texture classification. Neural Computing and Applications 21(8), 1893–1904 (2012)
  • [6] He, Y., Sang, N.: Multi-ring local binary patterns for rotation invariant texture classification. Neural Computing and Applications 22(3-4), 793–802 (2013)
  • [7] Hu, J., Zhang, F.: Improving protein localization prediction using amino acid group based physichemical encoding. In: Bioinformatics and Computational Biology, pp. 248–258 (2009)
  • [8] Huang, F., Li, C., Lin, L.: Identifying gender of microblog users based on message mining. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8485 LNCS, 488–493 (2014)
  • [9] Li, T., Zhou, X., Brandstatter, K., Raicu, I.: Distributed key-value store on hpc and cloud systems. In: 2nd Greater Chicago Area System Research Workshop (GCASR). Citeseer (2013)
  • [10] Li, T., Zhou, X., Brandstatter, K., Zhao, D., Wang, K., Rajendran, A., Zhang, Z., Raicu, I.: Zht: A light-weight reliable persistent dynamic scalable zero-hop distributed hash table. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 775–787 (2013)
  • [11] Melacci, S., Belkin, M.: Laplacian support vector machines trained in the primal. The Journal of Machine Learning Research 12, 1149–1184 (2011)
  • [12] Peng, B., Liu, Y., Zhou, Y., Yang, L., Zhang, G., Liu, Y.: Modeling nanoparticle targeting to a vascular surface in shear flow through diffusive particle dynamics. Nanoscale Research Letters 10(1), 235 (2015)
  • [13] Tian, Y., Zhang, Q., Liu, D.: v-nonparallel support vector machine for pattern classification. Neural Computing and Applications 25(5), 1007–1020 (2014)
  • [14] Wang, J., Li, Y., Wang, Q., You, X., Man, J., Wang, C., Gao, X.: Proclusensem: predicting membrane protein types by fusing different modes of pseudo amino acid composition. Computers in biology and medicine 42(5), 564–574 (2012)
  • [15] Wang, J.J.Y., Bensmail, H., Gao, X.: Multiple graph regularized protein domain ranking. BMC bioinformatics 13(1), 307 (2012)
  • [16] Wang, J.J.Y., Bensmail, H., Gao, X.: Joint learning and weighting of visual vocabulary for bag-of-feature based tissue classification. Pattern Recognition 46(12), 3249–3255 (2013)
  • [17] Wang, J.Y., Almasri, I., Gao, X.: Adaptive graph regularized nonnegative matrix factorization via feature selection. In: Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 963–966 (2012)
  • [18] Wang, K., Kulkarni, A., Zhou, X., Lang, M., Raicu, I.: Using simulation to explore distributed key-value stores for exascale system services. In: 2nd Greater Chicago Area System Research Workshop (GCASR) (2013)
  • [19] Wang, K., Liu, N., Sadooghi, I., Yang, X., Zhou, X., Lang, M., Sun, X.H., Raicu, I.: Overcoming hadoop scaling limitations through distributed task execution. In: Proc. of the IEEE International Conference on Cluster Computing 2015 (Cluster 15) (2015)
  • [20] Wang, K., Zhou, X., Chen, H., Lang, M., Raicu, I.: Next generation job management systems for extreme-scale ensemble computing. In: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, pp. 111–114 (2014)
  • [21] Wang, K., Zhou, X., Li, T., Zhao, D., Lang, M., Raicu, I.: Optimizing load balancing and data-locality with data-aware scheduling. In: Big Data (Big Data), 2014 IEEE International Conference on, pp. 119–128 (2014)
  • [22] Wang, K., Zhou, X., Qiao, K., Lang, M., McClelland, B., Raicu, I.: Towards scalable distributed workload manager with monitoring-based weakly consistent resource stealing. In: Proceedings of the 24th international symposium on High-performance parallel and distributed computing, pp. 219–222. ACM (2015)
  • [23] Wang, S., Zhou, Y., Tan, J., Xu, J., Yang, J., Liu, Y.: Computational modeling of magnetic nanoparticle targeting to stent surface under high gradient field. Computational mechanics 53(3), 403–412 (2014)
  • [24] Wang, Y., Han, H.C., Yang, J.Y., Lindsey, M.L., Jin, Y.: A conceptual cellular interaction model of left ventricular remodelling post-mi: dynamic network with exit-entry competition strategy. BMC systems biology 4(Suppl 1), S5 (2010)
  • [25] Wang, Y., Yang, T., Ma, Y., Halade, G.V., Zhang, J., Lindsey, M.L., Jin, Y.F.: Mathematical modeling and stability analysis of macrophage activation in left ventricular remodeling post-myocardial infarction. BMC genomics 13(Suppl 6), S21 (2012)
  • [26] Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 31(2), 210–227 (2009)
  • [27] Xu, L., Zhan, Z., Xu, S., Ye, K.: Cross-layer detection of malicious websites. In: Proceedings of the third ACM conference on Data and application security and privacy, pp. 141–152. ACM (2013)
  • [28] Xu, L., Zhan, Z., Xu, S., Ye, K.: An evasion and counter-evasion study in malicious websites detection. In: Communications and Network Security (CNS), 2014 IEEE Conference on, pp. 265–273. IEEE (2014)
  • [29] Xu, S., Lu, W., Xu, L., Zhan, Z.: Adaptive epidemic dynamics in networks: Thresholds and control. ACM Transactions on Autonomous and Adaptive Systems (TAAS) 8(4), 19 (2014)
  • [30] Xu, S., Lu, W., Zhan, Z.: A stochastic model of multivirus dynamics. Dependable and Secure Computing, IEEE Transactions on 9(1), 30–45 (2012)
  • [31] Xu, S., Qian, H., Wang, F., Zhan, Z., Bertino, E., Sandhu, R.: Trustworthy information: concepts and mechanisms. In: Web-Age Information Management, pp. 398–404. Springer (2010)
  • [32] Xu, Y., Shen, F., Zhao, J.: An incremental learning vector quantization algorithm for pattern classification. Neural Computing and Applications 21(6), 1205–1215 (2012)
  • [33] Zhan, Z., Xu, M., Xu, S.: Characterizing honeypot-captured cyber attacks: Statistical framework and case study. Information Forensics and Security, IEEE Transactions on 8(11), 1775–1789 (2013)
  • [34] Zhan, Z., Xu, M., Xu, S.: A characterization of cybersecurity posture from network telescope data. In: Proceedings of the 6th international conference on trustworthy systems, Intrust, vol. 14 (2014)
  • [35] Zhang, F., Gao, Y., Bakos, J.D.: Lucas-kanade optical flow estimation on the ti c66x digital signal processor. In: High Performance Extreme Computing Conference (HPEC), 2014 IEEE, pp. 1–6 (2014)
  • [36] Zhang, F., Hu, J.: Bayesian classifier for anchored protein sorting discovery. In: Bioinformatics and Biomedicine, 2009. BIBM’09. IEEE International Conference on, pp. 424–428 (2009)
  • [37] Zhang, F., Hu, J.: Bioinformatics analysis of physicochemical properties of protein sorting signals (2010)
  • [38] Zhang, F., Zhang, Y., Bakos, J.: Gpapriori: Gpu-accelerated frequent itemset mining. In: Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pp. 590–594 (2011)
  • [39] Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. The Journal of Supercomputing 66(1), 94–117 (2013)
  • [40] Zhang, Y., Zhang, F., Bakos, J.: Frequent itemset mining on large-scale shared memory machines. In: Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pp. 585–589 (2011)
  • [41] Zhang, Y., Zhang, F., Jin, Z., Bakos, J.D.: An fpga-based accelerator for frequent itemset mining. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 6(1), 2 (2013)
  • [42] Zhao, D., Zhang, Z., Zhou, X., Li, T., Wang, K., Kimpe, D., Carns, P., Ross, R., Raicu, I.: Fusionfs: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems. In: Big Data (Big Data), 2014 IEEE International Conference on, pp. 61–70 (2014)
  • [43] Zhou, X., Chen, H., Wang, K., Lang, M., Raicu, I.: Exploring distributed resource allocation techniques in the slurm job management system. Illinois Institute of Technology, Department of Computer Science, Technical Report (2013)