1 Introduction
Object categorization is a challenging problem, especially when images contain cluttered backgrounds, occlusions or varying lighting conditions. Many descriptors have been proposed to aid object categorization even under such adverse conditions, each with its own merits and demerits: some descriptors are invariant to transformations while others are more discriminative. For example, the Scale-Invariant Feature Transform (SIFT [1]) is invariant to affine transformations, the geometric blur descriptor [2] is robust to shape deformation, and the pyramid histogram of gradients [10] is invariant to geometric and photometric transformations. Past research has shown that employing multiple descriptors rather than any single descriptor leads to better recognition [3, 4]. This project focuses on the problem of learning the optimal combination of the available descriptors for a particular classification task.
An AdaBoost scheme for combining the descriptors has also been developed, inspired by the MKL setting in which each kernel is formed from a different descriptor. The difference between AdaBoost over descriptor-specific SVMs and MKL is that AdaBoost places a weight on each SVM classifier (each SVM built on a kernel from a different descriptor), whereas MKL places a weight on each kernel within a single SVM.
In [3, 11, 12, 4], the authors employ the Multiple Kernel Learning (MKL) framework [5] to find the optimal combination of descriptors (kernels). The goal of MKL is to simultaneously optimize the combination of kernels and the usual classification objective. Most existing MKL formulations perform an ℓ1 regularization [8, 3] over the kernels. This is equivalent to selecting the best kernel from the given set; which, as discussed earlier, might be suboptimal for object categorization tasks. One way to circumvent this problem of the optimal weights being zero for many of the kernels was introduced in [3], where an additional constraint employing prior information is included.
A new formulation for the MKL problem based on block ℓ∞ and mixed-norm (ℓ∞ and ℓ1) regularization has been developed. It is well known that such a regularization induces "equal weightage" of all the kernels rather than sparsity. Hence it is ideal for applications such as object categorization, in which a combination of the descriptors is known to perform better than any single descriptor.
These new MKL formulations are Second Order Cone Programs (SOCPs) which can be solved using solvers such as Mosek and SeDuMi. More efficient alternative algorithms are also developed, which alternate between optimizing the SVM parameters and the kernel weights. Empirical results on the Caltech4, Caltech101 and Oxford flower datasets show significant improvement using these new MKL formulations.
The outline of the report is as follows: section 2 briefly reviews work on object recognition. Existing MKL formulations are given in section 3 and the new MKL formulations are presented in sections 4 and 5. In the subsequent section, efficient algorithms for solving the proposed MKL formulations are discussed. Section 6 presents experimental results on synthetic and real-world datasets which illustrate the merits of the new MKL formulations. The results show that the new formulations achieve better recognition compared to the state-of-the-art, which is an ℓ1 regularization based formulation. The video change detection problem is also briefly presented. The report concludes in section 7 by summarizing the work.
2 Related Work
This section reviews some of the machine learning work involved in object categorization. SVM-KNN [13] draws its motivation from local learning: it uses K-nearest neighbors to select local training points and runs an SVM on those points to classify the object. Its main drawback is the time taken for classification.
Multiple kernel learning considers the scenario where several descriptors (kernels) for a particular classification task are available. It aims to simultaneously learn the optimal combination of the given kernels and the optimal classifier parameters that maximize the generalization ability. Most of the work on MKL, since it was first introduced in [5], concentrates on employing a block ℓ1 regularization. Its main features are: a) the ℓ1 regularization leads to a sparse combination of the kernels, and hence automatically performs feature selection; b) very efficient algorithms exist for solving the formulation [6, 7, 8, 9].
There has been a lot of work on combining descriptors for the object categorization task [3, 11, 10, 4, 12, 14, 15, 16, 17, 18, 19, 20]. In [10], the authors introduce the spatial pyramid kernel and combine shape (pyramid histogram of gradients) and appearance descriptors for object classification. In [4], the Support Kernel Machine [6], which is again based on ℓ1 regularization, is employed for combining descriptors for object categorization. In [12], a sample-dependent local ensemble kernel machine is learned for object categorization. In [3], the authors use six descriptors for object categorization and employ an MKL formulation for learning the optimal combination of descriptors. However, as observed by the authors, most of the (important) kernels get eliminated in the optimal combination. This, as discussed above, is a consequence of employing the ℓ1 regularization. In order to circumvent the problem of the optimal weights being zero for most of the kernels, the authors introduce additional constraints and parameters that utilize prior information regarding the kernels. This MKL formulation [3] is known to achieve state-of-the-art performance on many object recognition tasks. In [11], four descriptors for the flower classification task were combined using the multiple kernel learning formulation of [3], and this is shown to achieve state-of-the-art performance on such tasks.
In summary, most of the existing methodologies for object categorization employ the ℓ1 regularization based MKL formulation and its variants. As discussed earlier, such a regularization leads to kernel selection rather than kernel combination and is hence suboptimal for object categorization tasks. This report presents an MKL formulation with block ℓ∞ regularization, as well as CKL, which are better suited to combining kernels as opposed to selecting them.
3 Multiple kernel learning
This section gives a brief introduction to MKL. Let $\phi_j(x_i)$ denote the feature vector of the $i$-th training datapoint in the $j$-th feature space (the $j$-th kernel), and let $y_i \in \{-1,1\}$ denote its label. Let $\Phi_j$ represent the matrix whose columns are the training datapoints in the $j$-th feature space, and let $Q_j = \Phi_j^\top \Phi_j$ be the corresponding gram matrix. Note that the $\phi_j$ need not be explicitly known; only the gram matrices $Q_j$ are assumed to be known. Let $y$ and $Y$ denote the column vector and diagonal matrix, respectively, whose entries are the labels of the training datapoints. Let the discriminating hyperplane be $\sum_{j=1}^{n} w_j^\top \phi_j(x) - b = 0$, where $\phi_j(x)$ is the $j$-th feature space representation of a datapoint $x$ and $n$ is the number of kernels given. The usual soft-margin Support Vector Machine (SVM) formulation [21, 17] with this notation is:
$$\min_{w_j, b, \xi}\ \ \frac{1}{2} \sum_{j=1}^{n} \|w_j\|_2^2 + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.}\quad y_i \Big( \sum_{j=1}^{n} w_j^\top \phi_j(x_i) - b \Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i$$
(L2MKL)
This is evident by considering $w = [w_1^\top \cdots w_n^\top]^\top$ and $\phi(x) = [\phi_1(x)^\top \cdots \phi_n(x)^\top]^\top$ in the usual SVM formulation. It is easy to see that in this case the $m \times m$ gram matrix of the datapoints ($m$ is the number of training datapoints) is nothing but $\sum_{j=1}^{n} Q_j$. Hence, in the context of MKL, the optimal kernel with this formulation is simply the sum of the given kernels.
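As a quick numerical check of the last observation, the gram matrix of the concatenated feature map equals the sum of the individual gram matrices. The feature maps below are random stand-ins, not the report's descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative feature maps for 5 datapoints.
Phi1 = rng.standard_normal((5, 3))   # datapoints in feature space 1
Phi2 = rng.standard_normal((5, 4))   # datapoints in feature space 2

K1 = Phi1 @ Phi1.T                   # gram matrix of kernel 1
K2 = Phi2 @ Phi2.T                   # gram matrix of kernel 2

# Concatenating the feature maps (w = [w_1; w_2] in the formulation)
# yields a gram matrix that is exactly the sum of the individual ones.
Phi = np.hstack([Phi1, Phi2])
K_sum = Phi @ Phi.T

assert np.allclose(K_sum, K1 + K2)
```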
Another alternative, which has been extensively explored in the past [5, 8], is to employ a block ℓ1 regularization in order to perform kernel selection. This formulation can be written as:
$$\min_{w_j, b, \xi}\ \ \frac{1}{2} \Big( \sum_{j=1}^{n} \|w_j\|_2 \Big)^2 + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.}\quad y_i \Big( \sum_{j=1}^{n} w_j^\top \phi_j(x_i) - b \Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i$$
(L1MKL)
Note that this formulation performs an ℓ1 regularization over the block norms $\|w_j\|_2$. Hence it automatically performs kernel selection and is, in the extreme, equivalent to selecting one (the best) of the given kernels. Since the formulation promotes sparsity in the usage of the given kernels, it is best suited for feature selection applications rather than for applications like object categorization, where each kernel is believed to provide important information regarding the classification problem at hand.
4 ℓ∞ regularization MKL Formulation
The alternative to ℓ1 regularization is to perform block ℓ∞ regularization. Such a regularization promotes the use of all the kernels, treating them as equally preferable. The proposed MKL formulation can be written as:
$$\min_{w_j, b, \xi}\ \ \frac{1}{2} \big( \max_{j} \|w_j\|_2 \big)^2 + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.}\quad y_i \Big( \sum_{j=1}^{n} w_j^\top \phi_j(x_i) - b \Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i$$
(LiMKL)
In the remainder of this section, the ranges of the indices ($i = 1, \ldots, m$; $j = 1, \ldots, n$) are omitted for convenience. The (LiMKL) formulation is the same as:
$$\min_{w_j, b, \xi, t}\ \ \frac{1}{2} t^2 + C \sum_i \xi_i$$
$$\text{s.t.}\quad y_i \Big( \sum_j w_j^\top \phi_j(x_i) - b \Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ \|w_j\|_2 \le t\ \ \forall j$$
In the following, the dual of the proposed MKL formulation is derived. Writing the norm constraints as $\frac{1}{2}\|w_j\|_2^2 \le \frac{1}{2} t^2$, the Lagrangian turns out to be:
$$L = \frac{1}{2} t^2 + C \sum_i \xi_i - \sum_i \alpha_i \Big[ y_i \Big( \sum_j w_j^\top \phi_j(x_i) - b \Big) - 1 + \xi_i \Big] - \sum_i \beta_i \xi_i + \sum_j \frac{\gamma_j}{2} \big( \|w_j\|_2^2 - t^2 \big)$$
where $\alpha_i \ge 0$, $\beta_i \ge 0$, $\gamma_j \ge 0$ are the Lagrange multipliers. From the KKT conditions:
$$\gamma_j w_j = \Phi_j Y \alpha\ \ \forall j \qquad (1)$$
$$y^\top \alpha = 0 \qquad (2)$$
$$C - \alpha_i - \beta_i = 0\ \ \Rightarrow\ \ 0 \le \alpha_i \le C\ \ \forall i \qquad (3)$$
$$t \Big( 1 - \sum_j \gamma_j \Big) = 0 \qquad (4)$$
Now, suppose that all the gram matrices are positive definite (add a small ridge if singular; see also [8]). Then, (1) implies that if $\gamma_j = 0$ for some $j$, then $\alpha = 0$. Clearly, in this case the rest of the $w_j$ must also be zero, which is not possible at optimality. Hence $\gamma_j > 0$ for all $j$, and from (4), $\sum_j \gamma_j = 1$.
Eliminating the primal variables, the dual can be written as:
$$\max_{\alpha, \gamma}\ \ \mathbf{1}^\top \alpha - \frac{1}{2} \sum_j \frac{\alpha^\top Y Q_j Y \alpha}{\gamma_j}$$
$$\text{s.t.}\quad y^\top \alpha = 0,\ \ 0 \le \alpha \le C \mathbf{1},\ \ \gamma_j \ge 0,\ \ \sum_j \gamma_j = 1 \qquad (5)$$
where $\mathbf{1}$ and $y$ denote the column vectors with entries as ones and the labels respectively.
Though the dual in (5) has more variables, it gives more insight into the structure of the solution. Consider rewriting the dual (5) in the following way:
$$\max_{\gamma}\ \ \omega(\gamma) \quad \text{s.t.}\quad \gamma_j \ge 0,\ \ \sum_j \gamma_j = 1 \qquad (6)$$
where $\omega(\gamma)$ is the optimal value of the following convex QP:
$$\omega(\gamma) = \max_{\alpha}\ \ \mathbf{1}^\top \alpha - \frac{1}{2} \alpha^\top Y \Big( \sum_j \frac{Q_j}{\gamma_j} \Big) Y \alpha \quad \text{s.t.}\quad y^\top \alpha = 0,\ \ 0 \le \alpha \le C \mathbf{1} \qquad (7)$$
Note that (7) is nothing but a usual SVM problem and hence the optimal $\alpha$ is very sparse. In fact, algorithms exist which exploit this sparsity in the solution and outperform standard QP solvers [22].
4.1 Algorithms for solving the LiMKL Formulation
The LiMKL formulation can be solved using standard Second Order Cone Program (SOCP) solvers (e.g., SeDuMi (http://sedumi.ie.lehigh.edu/), Mosek (www.mosek.com)). However, the optimization problem involves $n$ conic quadratic constraints ($n$, the number of kernels, can be large). Also, the size of the optimization problem grows with $m$, the number of training datapoints, which can also be large. Hence generic cone solvers fail for large $n$ or $m$. Interestingly, there are more efficient ways to solve the dual formulation (5). The following sections briefly explain the possible methodologies.
4.2 Alternating Minimization Algorithm
The dual (5) can be solved efficiently using an alternating minimization algorithm in the variables $\alpha$ and $\gamma$. Note that for a fixed value of $\gamma$, (5) is nothing but the SVM dual (which has very efficient, scalable solvers). Also, for a fixed value of $\alpha$, the optimization wrt. $\gamma$ is the following simple problem:
$$\min_{\gamma}\ \ \sum_j \frac{a_j}{\gamma_j} \quad \text{s.t.}\quad \gamma_j \ge 0,\ \ \sum_j \gamma_j = 1 \qquad (8)$$
where $a_j = \alpha^\top Y Q_j Y \alpha$. It is easy to show (e.g., via the Cauchy-Schwarz inequality) that the optimal values of $\gamma_j$ for the problem (8) are:
$$\gamma_j = \frac{\sqrt{a_j}}{\sum_l \sqrt{a_l}} \qquad (9)$$
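The closed form (9) can be sanity-checked numerically. The values $a_j$ below are random positive stand-ins for $\alpha^\top Y Q_j Y \alpha$, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_closed_form(a):
    """Closed-form minimizer of sum_j a_j / gamma_j over the simplex."""
    s = np.sqrt(a)
    return s / s.sum()

# a_j plays the role of alpha^T Y Q_j Y alpha for each of 6 kernels.
a = rng.uniform(0.5, 5.0, size=6)
g_star = gamma_closed_form(a)
obj_star = np.sum(a / g_star)   # equals (sum_j sqrt(a_j))^2 by Cauchy-Schwarz

# Sanity check: no random point on the simplex does better.
for _ in range(1000):
    g = rng.dirichlet(np.ones(6))
    assert np.sum(a / g) >= obj_star - 1e-9
```

The check also confirms that the optimal objective value is $(\sum_j \sqrt{a_j})^2$, which is what the Cauchy-Schwarz argument predicts.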
5 Composite MKL Formulation
This section explains Composite MKL (CKL). Suppose $n$ descriptors are available, and for each descriptor several kernels (linear, polynomial, Gaussian) are defined. Let the number of kernels of the $j$-th descriptor be denoted by $n_j$, and let $\phi_{jk}$ denote the mapping induced by the $k$-th kernel of the $j$-th descriptor. The hyperplane classifier to be learnt has the form $\sum_j \sum_k w_{jk}^\top \phi_{jk}(x) - b = 0$. The objective is to choose the "best" combination of these kernels in order to maximize generalization. The idea is to combine the kernels in such a way that: a) all descriptors are given equal priority (weightage); b) the best of the kernels within each descriptor are selected. In other words, perform an ℓ∞ regularization over the descriptors such that each descriptor is given equal priority, and an ℓ1 regularization within each descriptor such that sparsity in the selection of its kernels is encouraged. Mathematically, the formulation can be written as:
$$\min_{w_{jk}, b, \xi, t}\ \ \frac{1}{2} t^2 + C \sum_i \xi_i$$
$$\text{s.t.}\quad y_i \Big( \sum_j \sum_k w_{jk}^\top \phi_{jk}(x_i) - b \Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i \qquad (10)$$
$$\Big( \sum_k \|w_{jk}\|_2 \Big)^2 \le t^2\ \ \forall j \qquad (11)$$
(CKL)
where $\{(x_i, y_i)\}_{i=1}^{m}$ is the training dataset and $C$ is the regularization parameter.
Let $y$ denote the vector with entries as the labels. Let $S_C$ be the set $\{\alpha : y^\top \alpha = 0,\ 0 \le \alpha \le C \mathbf{1}\}$, and denote the simplex $\{\gamma : \gamma_j \ge 0,\ \sum_j \gamma_j = 1\}$ by $\Delta$. The dual of the above formulation can be written as:
$$\max_{\alpha \in S_C}\ \ \mathbf{1}^\top \alpha - \frac{1}{2} \sum_j \frac{1}{\gamma_j} \max_k\ \alpha^\top Y Q_{jk} Y \alpha, \quad \gamma \in \Delta \qquad (12)$$
where $Q_{jk}$ is the gram matrix of the training datapoints with the $k$-th kernel of the $j$-th descriptor and $Y$ is the diagonal matrix with entries as the labels. The gram matrices are assumed to be positive definite.
One can solve the above dual (12) using a simple alternating algorithm, described below. Writing each inner maximum variationally, $\max_k a_{jk} = \max_{\eta_j \in \Delta_{n_j}} \sum_k \eta_{jk} a_{jk}$ with $a_{jk} = \alpha^\top Y Q_{jk} Y \alpha$, makes the objective linear in the weights $(\gamma, \eta)$. Due to the compactness of the feasibility sets and the convexity of the objective, the order of the min. and max. can be rearranged. Also, since the variables $\eta_j$ are not interlinked across descriptors $j$, instead of optimizing the sum over the index $j$ one can sum the individual optima.
Now it is easy to see that for fixed values of $(\gamma, \eta)$, the problem wrt. $\alpha$ is the same as the SVM problem with kernel $\sum_j \frac{1}{\gamma_j} \sum_k \eta_{jk} Q_{jk}$. Also, for fixed values of $\alpha$, the problem wrt. the weights is the following simple problem:
$$\min_{\gamma \in \Delta}\ \ \sum_j \frac{1}{\gamma_j} \max_k a_{jk} \qquad (13)$$
which has a closed-form solution, described below. Consider solving $\max_k a_{jk}$ for a particular $j$: this amounts to just picking the maximum among $a_{j1}, \ldots, a_{jn_j}$. Let these maxima be denoted by $m_j$. Hence (13) is equivalent to the following problem:
$$\min_{\gamma \in \Delta}\ \ \sum_j \frac{m_j}{\gamma_j}$$
The optimal solution of this problem is $\gamma_j = \frac{\sqrt{m_j}}{\sum_l \sqrt{m_l}}$.
The overall algorithm is as follows:

1. Initialize the weights $\gamma$ and $\eta$ (e.g., uniformly).

2. At each iteration, solve an SVM taking the kernel as $\sum_j \frac{1}{\gamma_j} \sum_k \eta_{jk} Q_{jk}$. Update $\alpha$ as the solution of this SVM.

3. Using the updated value of $\alpha$, compute the closed-form solution of (13) using the methodology described above.

4. Repeat until convergence.
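The weight-update half of the iteration (steps involving (13)) can be sketched numerically. The quantities $a_{jk}$ below are random stand-ins for $\alpha^\top Y Q_{jk} Y \alpha$, and the descriptor/kernel counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# a[j][k] stands in for alpha^T Y Q_jk Y alpha: descriptor j, kernel k.
# Three descriptors with 4, 2 and 3 kernels respectively.
a = [rng.uniform(0.1, 4.0, size=s) for s in (4, 2, 3)]

# Step 1: within each descriptor pick the maximum (the l1 part selects
# the single best kernel of that descriptor for this iteration).
m = np.array([aj.max() for aj in a])

# Step 2: the l_infty part spreads weight across descriptors via the
# closed form gamma_j = sqrt(m_j) / sum_l sqrt(m_l).
gamma = np.sqrt(m) / np.sqrt(m).sum()

assert np.isclose(gamma.sum(), 1.0)
```

In the full algorithm this update would alternate with an SVM solve that recomputes $\alpha$, which is omitted here to keep the sketch dependency-free.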
6 Numerical Experiments
This section presents the experimental results on standard object categorization datasets. Various experiments were also conducted using AdaBoost for combining descriptors on a standard object categorization dataset. The key idea is to show that the proposed ℓ∞ regularization and composite regularization based MKL formulations lead to better generalization than the ℓ1 regularization based MKL formulations, which represent state-of-the-art methodologies for object recognition. The results on synthetic and real-world data are summarized in sections 6.2.1 and 6.2.3 respectively. In all cases, the parameters for the respective methods were tuned on a validation set. Also, the accuracies reported are on unseen test sets and hence represent a true estimate of the generalization performance of the respective classifiers. All multiclass problems were handled using the one-vs-one scheme.
6.1 Results using Adaboost
The AdaBoost results reported below were obtained with the following setup:

- The set of base classifiers for AdaBoost consists of SVMs.

- Each SVM in that set is built on a different base kernel, as mentioned in the previous section.

- Each kernel is built from a different descriptor, such as pyramid histograms of gradients or scale-invariant feature descriptors.

- AdaBoost places a weight on each SVM classifier built on such a kernel.

This use of AdaBoost is inspired by the multiple kernel learning setting, where each kernel is formed from a different descriptor. The difference between AdaBoost over SVMs (with different descriptors) and multiple kernel learning is that AdaBoost weights the SVM classifiers (each SVM using a different descriptor in its kernel), whereas multiple kernel learning weights the kernels within a single SVM problem.
The following table shows the results.
Classifier        No. of objects   Accuracy
NNC               10               58%
kNNC              10               62%
SVM               20               59%
Local Learning    10               76%
AdaBoost + SVM    10               78%
AdaBoost + SVM    20               63%
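The boosting scheme above can be sketched as follows. To keep the sketch dependency-free, decision stumps stand in for the per-descriptor SVMs; the data, learner pool and round count are illustrative assumptions, not the report's setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-class data (stand-in for descriptor-based features).
X = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(1, 1, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

def stump_predict(X, dim, thr, sign):
    return sign * np.where(X[:, dim] > thr, 1.0, -1.0)

def fit_stump(X, y, w):
    """Pick the weighted-error-minimizing base learner, mirroring the
    'best classifier' step AdaBoost performs over the pool of SVMs."""
    best = None
    for dim in range(X.shape[1]):
        for thr in np.unique(X[:, dim]):
            for sign in (1.0, -1.0):
                err = w[stump_predict(X, dim, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, dim, thr, sign)
    return best

w = np.full(len(y), 1.0 / len(y))
ensemble = []
for _ in range(10):
    err, dim, thr, sign = fit_stump(X, y, w)
    err = max(err, 1e-12)
    alpha = 0.5 * np.log((1 - err) / err)   # weight on this classifier
    pred = stump_predict(X, dim, thr, sign)
    w *= np.exp(-alpha * y * pred)          # re-weight hard examples
    w /= w.sum()
    ensemble.append((alpha, dim, thr, sign))

# Final decision: sign of the alpha-weighted vote of the base learners.
F = sum(a * stump_predict(X, d, t, s) for a, d, t, s in ensemble)
accuracy = np.mean(np.sign(F) == y)
```

Replacing `fit_stump` with per-kernel SVM training recovers the scheme used in the experiments: the `alpha` values are exactly the weights AdaBoost places on the individual SVM classifiers.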
6.2 Results of  MKL Experiments
This section provides experimental results for the LiMKL.
6.2.1 Synthetic Data
This section presents results on synthetic datasets showing the benefit of the proposed methodology. The key result to establish is that the LiMKL formulation achieves better generalization, especially when the redundancy in the given kernels is low, as in applications like object categorization. For this, we follow the experimental strategy given in [23], and repeat the description of the experimental setup here for the sake of completeness.
We wish to create kernels whose degree of redundancy is controlled by a single parameter $\theta \in [0, 1]$. First, datapoints are sampled from two independent Normal distributions with the identity matrix as covariance; datapoints sampled from different Normals are assumed to belong to different classes. The features are then grouped into disjoint sets, and copies of these sets are sampled by randomly picking from them one by one with replacement. For each of these sets a linear transformation matrix is randomly generated, and the gram matrices are computed on the transformed features. Clearly, by varying $\theta$ the redundancy in the kernels can be varied. More specifically, $\theta = 1$ represents the extreme case where the redundancy in the kernels is zero, and hence the best-suited scenario for the proposed methodology. The other extreme case is $\theta = 0$, where the redundancy is maximal, and hence an ideal scenario for employing the ℓ1 regularization based MKL.
Figure 1 shows the plot of the ratio of test-set accuracies achieved by LiMKL and L1MKL vs. the redundancy in the given kernels (vertical bars represent the variance in accuracy). As a baseline for comparison, a similar graph is plotted for the ratio of test-set accuracies achieved by LiMKL and L2MKL. Note that as the degree of redundancy decreases, the ratio increases in both graphs, showing that LiMKL is well suited for applications like object categorization. In fact, we observed a large improvement in generalization over L1MKL when $\theta$ is near 1 (and similarly over L2MKL). Also, in cases where the redundancy is high ($\theta$ near 0), LiMKL achieves generalization comparable to the other MKL formulations.
6.2.2 Results on Caltech4 dataset
This section presents results on Caltech4 (http://www.robots.ox.ac.uk/~vgg/data/datacats.html). The Caltech4 dataset contains images of airplanes, cars, faces and bikes. We have taken 80 images per class, of which 40 are randomly taken as the training/validation data and the remaining as test data. We have used Pyramid Histogram Of Gradients (PHOG) features (code available at http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html) generated at various levels (1, 2, 3) and angles (180, 360). We have generated kernels on these six PHOG features using different parameters for the polynomial and Gaussian kernels (9 for each feature, 54 kernels in total). This experimental procedure was repeated 20 times with different training-test splits. The mean test-set accuracies obtained with L1MKL and LiMKL were 92.00±2.44% and 93.50±2.14% respectively, showing that LiMKL achieves better generalization. Figures 3, 5, 6, 7, 8, 9 and 10 show the ratio of accuracies of LiMKL to L1MKL as a function of the number of kernels on the Caltech4 dataset. Figure 4 shows the confusion matrix for the Caltech4 dataset.
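A kernel bank of this shape (9 kernels per feature set, 54 in total) can be sketched as below. The feature matrices are random stand-ins for the PHOG features, and the exact parameter grids are illustrative assumptions, since the report does not list them:

```python
import numpy as np

rng = np.random.default_rng(4)

def gaussian_kernel(X, gamma):
    # K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def polynomial_kernel(X, degree, c=1.0):
    # K_ij = (x_i . x_j + c)^degree
    return (X @ X.T + c) ** degree

# Six stand-in feature sets (one per PHOG level/angle combination).
features = [rng.standard_normal((20, 8)) for _ in range(6)]

# Nine kernels per feature set: here 5 Gaussian widths + 4 poly degrees.
kernels = []
for X in features:
    for gamma in (0.01, 0.1, 0.5, 1.0, 2.0):
        kernels.append(gaussian_kernel(X, gamma))
    for degree in (1, 2, 3, 4):
        kernels.append(polynomial_kernel(X, degree))

assert len(kernels) == 54   # 6 feature sets x 9 kernels each
```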
6.2.3 Results on Oxford dataset
The task in the Oxford flower dataset is to categorize images of 17 varieties of flowers; the dataset contains 80 examples per class. In [11], the authors introduced four different features: color, SIFT on the foreground region, SIFT on the foreground boundary, and histogram of gradients. We have used the distances given in [11, 24] (http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html) for our experimentation on this dataset, and the same training, validation and test splits as used in [11]. The mean test-set accuracies achieved by L1MKL and LiMKL are 85.88±1.83% and 87.35±1.72% respectively. Again, the results confirm that the proposed methodology achieves better generalization than the state-of-the-art. The accuracy achieved by the proposed formulation is comparable to the best accuracy reported in [11], which is 88.33±0.3%. Note that this state-of-the-art accuracy was achieved after tuning the parameters of the various descriptors [11] and incorporating prior information following the strategy of [3]. As mentioned earlier, incorporating such prior information may further improve the test-set accuracies of the proposed formulation. Figures 11, 13, 14 and 15 show the ratio of accuracies of LiMKL to L1MKL as a function of the number of kernels on the Oxford flower dataset. Figure 12 shows the confusion matrix for the Oxford flower dataset.
The subsequent set of experiments compares the generalization performance of LiMKL and L1MKL as a function of the number of base kernels. The plots are shown in figures 11 and 3 for the two benchmark datasets. The figures show that in most cases LiMKL achieves better generalization than L1MKL; in some cases the improvement is as high as 7.5%. Note that the base kernels were derived from fixed sets of descriptors and hence have some degree of redundancy. These results show that the proposed formulation achieves a good improvement in generalization even in these cases. The next set of experiments compares the performance of the methodologies at various values of the regularization parameter C (see figure LABEL:fig:c_ox). Note that the performance of L1MKL drastically decreases for low values of C; in some cases the difference in accuracy between LiMKL and L1MKL is as high as 9%. Hence, the proposed formulation is less sensitive to variation in the regularization parameter.
6.2.4 Results on Caltech101 dataset
This section presents results on Caltech101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/). The Caltech101 dataset contains 101 object categories. We have taken 30 images per class, of which 15 are randomly taken as the training/validation data and the remaining as test data. We have used Pyramid Histogram Of Gradients (PHOG) features (code available at http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html) generated at various levels (1, 2, 3) and angles (180, 360). We have generated kernels on these six PHOG features using different parameters for the polynomial and Gaussian kernels (9 for each feature, 54 kernels in total). This experimental procedure was repeated 3 times with different training-test splits. The mean test-set accuracies obtained with L1MKL and LiMKL were 31.45% and 27.12% respectively; on this dataset, L1MKL achieves the better generalization.
6.3 Results of CKL Experiments
All the experiments in this section are carried out using descriptors from the ColorDescriptor software (http://staff.science.uva.nl/~ksande/research/colordescriptors/). The general procedure for the experiments is given in figure 6.3. All the experiments in this section follow these steps:

- Generate descriptors for all training images.

- Generate all 14 descriptors provided by the software (RGB histogram, Opponent histogram, Hue histogram, rg histogram, Transformed Color histogram, Color moments, Color moment invariants, SIFT, HueSIFT, HSV-SIFT, OpponentSIFT, rgSIFT, C-SIFT, Transformed Color SIFT).

- Use no spatial pyramids.

- Cluster all points from all training images to form a codebook for each descriptor.

- Generate codebook histograms for both training and testing images.

- Train the classifier using these histograms as features.
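The codebook steps above can be sketched with a minimal k-means and histogram quantizer; all data here are random stand-ins for the real local descriptors:

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans(points, k, iters=20):
    """Minimal k-means for building the visual codebook."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize one image's descriptors against the codebook and
    return a normalized codeword histogram (the classifier feature)."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Stand-in local descriptors pooled from all training images.
train_desc = rng.standard_normal((500, 16))
codebook = kmeans(train_desc, k=10)

# Histogram for one hypothetical image with 60 local descriptors.
h = bow_histogram(rng.standard_normal((60, 16)), codebook)
assert np.isclose(h.sum(), 1.0) and len(h) == 10
```

The resulting histograms play the role of the features fed to the MKL classifiers in the experiments below.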
Results obtained using the above procedure are given in the following sections.
6.4 Results on Caltech5 dataset
This section presents results on Caltech5 (http://www.robots.ox.ac.uk/~vgg/data/datacats.html) using the new MKL formulation and the descriptors (C-SIFT, OpponentSIFT, rgSIFT, SIFT, Transformed Color SIFT) provided by the ColorDescriptor software. The Caltech5 dataset contains images of airplanes, cars, faces, leopards and bikes. This section follows the same procedure discussed in the previous section. We have used a cluster size of 100 to form the codebook; clusters are found using the k-means algorithm. Note that we have not run k-means multiple times to find the best clustering or codebook. We have taken 100 images per class, of which 15 are randomly taken to form the codebook. We have generated kernels on the 5 descriptors using different parameters for the Gaussian kernel (10 for each descriptor, 50 kernels in total). This experimental procedure was repeated 5 times with different training-test splits. Figure 16 reports the mean accuracy as the training size increases. Note that we have used the same codebook, generated on 15 training points, for all training sizes.
6.5 Results on Oxford dataset
The task in the Oxford flower dataset is to categorize images of 17 varieties of flowers; the dataset contains 80 examples per class. In [11], the authors introduced four different features: color, SIFT on the foreground region, SIFT on the foreground boundary, and histogram of gradients. We have used the distances given in [11, 24] (http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html) for our experimentation on this dataset, and the same training, validation and test splits as used in [11]. The mean test-set accuracies achieved by L1MKL, LiMKL and CKL are 85.3922%, 86.6667% and 86.6667% respectively.
6.6 Results on Caltech101 dataset
This section presents results on Caltech101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) using the new MKL formulations and all 14 descriptors provided by the ColorDescriptor software. We have taken 30 images per class, of which 15 are randomly taken as the training/validation data and the remaining as test data. We have generated kernels on the 14 descriptors using different parameters for the Gaussian kernel (2 for each descriptor, 28 kernels in total). With a cluster size of 600 the accuracy obtained is around 24.1%, and with a cluster size of 300 it is 23.21%. The main difficulty on Caltech101 is the clustering involved in forming the codebook: because of the huge size and dimensionality of the data, we followed a 2-level k-means. Our guess is that the resulting codebook is poor because the clustering is poor.
7 Conclusions and Future Work
This project addressed the issue of combining various descriptors for a given object categorization problem in order to achieve better generalization. The project also briefly addressed the problem of video change detection.
An AdaBoost scheme has been designed for combining descriptors. State-of-the-art methodologies for object categorization employ an ℓ1 regularization based MKL formulation, which is more suitable for selecting descriptors than for combining them. The key idea here is to employ an ℓ∞ regularization, and a mixed ℓ∞ and ℓ1 regularization, for combining the descriptors in an MKL framework. The new MKL formulations are better suited for object categorization, and highly efficient algorithms which solve the corresponding convex optimization problems were derived.
Empirical results on synthetic and real-world benchmark datasets clearly establish the efficacy of the proposed MKL formulations. In some cases, the increase in accuracy compared to the standard ℓ1 regularization was substantial. The results also show a consistent improvement in accuracy in almost all cases; however, the improvement is largest when the redundancy in the base kernels is low. Another advantage of the proposed formulation is that it is less sensitive to variation in the regularization parameter, C.
Work is ongoing to evaluate the new MKL formulations on the Caltech101 dataset using the codebook models described, and to form codebooks of different sizes, since the codebook size largely affects classification accuracy. The value of the new MKL formulations will become clearer once experimentation on bigger datasets, namely Pascal and Caltech256, has been done; future work includes evaluating the new MKL formulations on these bigger datasets.
References

[1] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
 [2] A. Berg and J. Malik. Geometric blur for template matching. In CVPR, volume 1, pages 607–614, 2001.
 [3] M. Varma and D. Ray. Learning the discriminative power-invariance tradeoff. In ICCV, pages 1–8, 2007.
 [4] A. Kumar and C. Sminchisescu. Support kernel machines for object recognition. In ICCV, 2007.
 [5] Gert Lanckriet, Nello Cristianini, Peter Bartlett, and Laurent El Ghaoui. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.
 [6] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the smo algorithm. In ICML, 2004.
 [7] Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer, and Bernhard Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531–1565, 2006.
 [8] Alain Rakotomamonjy, Francis R. Bach, Stephane Canu, and Yves Grandvalet. Simple MKL. Journal of Machine Learning Research, 9:2491–2521, 2008.
 [9] Z. Xu, R. Jin, I. King, and M. Lyu. An Extended Level Method for Efficient Multiple Kernel Learning. In Advances in Neural Information Processing Systems, 2008.
 [10] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In CIVR, 2007.
 [11] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
 [12] Y. Y. Lin, T. Y. Liu, and C. S. Fuh. Local ensemble kernel learning for object category recognition. In CVPR, 2007.

[13] Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2126–2136, Washington, DC, USA, 2006. IEEE Computer Society.
 [14] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV, pages 213–238, 2007.
 [15] Saketha N. Jagarlapudi, Dinesh Govindaraj, S. Raman, Chiranjib Bhattacharyya, Aharon Ben-Tal, and K. R. Ramakrishnan. On the algorithmics and applications of a mixed-norm based kernel learning formulation. In Advances in Neural Information Processing Systems, pages 844–852, 2009.
 [16] Dinesh Govindaraj, Sankaran Raman, Sreedal Menon, and Chiranjib Bhattacharyya. Controlled sparsity kernel learning. CoRR, abs/1401.0116, 2014.
 [17] Prashanth Ravipally and Dinesh Govindaraj. Sparse classifier design based on the shapley value. In Proceedings of the World Congress on Engineering, volume 1, 2010.
 [18] Dinesh Govindaraj, K.V.M. Naidu, Animesh Nandi, Girija Narlikar, and Viswanath Poosala. MoneyBee: Towards enabling a ubiquitous, efficient, and easy-to-use mobile crowdsourcing service in the emerging market. Bell Labs Technical Journal, 15(4):79–92, 2011.
 [19] Dinesh Govindaraj, Tao Wang, and S. V. N. Vishwanathan. Modeling attractiveness and multiple clicks in sponsored search results. CoRR, abs/1401.0255, 2014.
 [20] Dinesh Govindaraj. Application of active appearance model to automatic face replacement. Journal of Applied Statistics, 2011.
 [21] Vladimir N. Vapnik. Statistical Learning Theory. WileyInterscience, 1998.
 [22] J. Platt. Fast Training of Support Vector Machines using Sequential Minimal Optimization. In Advances in Kernel Methods—Support Vector Learning, pages 185–208, 1999.
 [23] Marius Kloft, Ulf Brefeld, Pavel Laskov, and Soren Sonnenburg. Nonsparse multiple kernel learning. In Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, 2008.
 [24] M.-E. Nilsback and A. Zisserman. A visual vocabulary for flower classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006.