1 Introduction
Multi-instance learning (MIL), originally proposed for drug activity prediction [6], has since been applied to diverse visual recognition tasks such as image retrieval, image classification, object detection, and visual tracking. In MIL, a typical form of weakly-supervised learning, the training data are given as labeled bags, each composed of a diverse set of instances described by input features. The aim of MIL, in a binary task, is to train a classifier to predict the labels of testing bags, based on the assumption that a positive bag contains at least one positive instance, while a bag is labeled negative only if it consists entirely of negative instances. Thus, the crux of MIL is handling the ambiguity of instance labels, especially in positive bags, whose compositions can vary widely.
In a way, this weakly labeled framework caters to many existing vision tasks, such as object recognition, because the intrinsic structure of MIL handles certain problems naturally and thus facilitates their solution. Take image classification for instance: an image is defined as a bag, and patches in the image are regarded as instances. Then, according to the purpose of the task, specific objects or key features can be defined as positive. By means of this MIL representation, crucial information can be captured.
Up to now, many algorithms have been designed to solve MIL problems. Previous methodologies mainly fall into three categories: (1) selecting key/discriminative instances and classifying bags based on the selected instances using generative or discriminative models, e.g., EMDD [23], miSVM [1], and the key instance detection method [12]; (2) mapping a bag into a high-dimensional feature space to obtain a vector representation of the bag and then training a bag classifier, e.g., miFV [19]; (3) constructing a bag representation based on the internal structure of a bag, i.e., the relations between instances within the same bag, e.g., miGraph [24].
Different from these strategies, we aim to build a bag-level representation based on the relative distances between bags. For a given bag, the other bags are regarded as reference bags that serve as the basis of the feature space. The derived bag representation is called the bag reference vector (BRV). The MIL task is thereby transformed into the problem of classifying BRVs. Accordingly, our method is named miBRV (multi-instance learning via bag reference vectors).
Our motivation for proposing the BRV is that it captures an essential character of MIL. In supervised learning, intra-class similarity should be higher than inter-class similarity; in the same manner, the similarities between bags are distinctive features. To measure bag similarity, we use a set-to-set distance that considers all pairwise distances between the instances of the two compared bags. In this paper, we extend the Hausdorff distance [11] as the set-to-set distance measure. Furthermore, considering the ambiguity of instances in bags, we adopt a range of operators within the Hausdorff distance to represent these relations.
The pipeline for generating the bag feature is illustrated in Fig. 1. For every pair of reference bag and operator, the input bag has one distance value to that reference bag under that operator. The total length of the bag reference vector is therefore the product of the number of reference bags and the number of operators.
2 Related Work
Multi-instance learning (MIL) has received a lot of attention since it helps to solve a range of real applications. To date, many MIL methods have been proposed, either to develop effective MIL solvers or to apply MIL to application problems. We first briefly review a few popular MIL solvers. The EMDD method [23] uses EM to infer an instance model from many instances in different positive bags and a few instances in negative bags. Instead of adopting a simple instance model, the miRPCA method [17] utilizes a robust PCA model to build an instance model that is robust to outliers. Besides generative instance models, discriminative models are even more popular as instance models. For example, both MILBoost [16] and miSVM [1] use discriminative methods, Boosting and SVM respectively, as instance models, and iteratively select positive instances to train the models. Furthermore, miGraph [24] represents a bag as a graph and explicitly models the relationships between instances within a bag, while [5] models the relationships between different bags using a conditional random field. Recent work [2] studies the setting in which a bag contains an infinite number of instances.

MIL is useful for many computer vision applications. Originally, MIL was widely applied to image classification [13, 21], since it is able to exploit the salient regions of an image that are critical for classification. In [9], a variant of miSVM called latent SVM is shown to be effective at finding the parts of an object for accurate object detection. Online MIL algorithms [3, 22] are popular for visual tracking. Recently, MIL has also been widely used for weakly-supervised object detection [18, 4].

3 Multi-instance Bag Reference Vector
In this section we illustrate our bag-reference-based method for the MIL problem. miBRV constructs a vector representation for each bag by computing its similarity (a distance, in our method) to all other bags, which are taken as references, thereby transferring the original features (with complex structure) into new bag-reference features that contain rich information in a simple linear structure. Our intuition is to use the distances to the reference bags to describe each bag's intrinsic constitution, and then to train on this affinity matrix to obtain a mapping function via a linear SVM.
3.1 Multi-instance Learning
We first introduce the formal formulation of multi-instance learning. Given a data set $\{(X_1, y_1), \dots, (X_N, y_N)\}$, where $N$ is the number of bags, each bag $X_i = \{x_{i1}, x_{i2}, \dots, x_{in_i}\}$ consists of grouped instances and is labeled with $y_i \in \{+1, -1\}$, while the instances' labels are unknown. A positive bag contains at least one positive instance, while a negative bag contains only negative instances. The task of multi-instance learning is then to induce a classifier (or a mapping function) to predict the labels of input bags.
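This bag/label layout can be sketched concretely as follows (a minimal sketch; the array shapes and variable names are illustrative, not from the paper):

```python
import numpy as np

# Each bag is a 2-D array whose rows are instance feature vectors;
# labels are attached per bag only, never per instance.
rng = np.random.default_rng(0)

bags = [rng.normal(size=(n, 4)) for n in (3, 5, 2)]  # 3 bags, 4-D instances
labels = np.array([+1, -1, +1])                      # bag-level labels only

assert len(bags) == len(labels)
for bag in bags:
    # instances within and across bags share one feature space
    assert bag.ndim == 2 and bag.shape[1] == 4
```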
3.2 Bag Reference Vector
As mentioned in Section 1, we intend to measure the relations between bags by means of set-to-set distances, applying an operator to represent these distances. The Hausdorff distance is a suitable technique for determining the extent to which one bag differs from another.
3.2.1 The Hausdorff Distance
Given two point sets $A = \{a_1, \dots, a_m\}$ and $B = \{b_1, \dots, b_n\}$, the Hausdorff distance is defined as
$$H(A, B) = \max\big(h(A, B),\; h(B, A)\big),$$
where
$$h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|.$$
Here the function $h(A, B)$, the directed Hausdorff distance, is also called the forward Hausdorff distance from set $A$ to set $B$, and $\|a - b\|$ represents the Euclidean distance between $a$ and $b$, i.e., a point-to-point distance. For each $a \in A$, the algorithm computes the point-to-point distance from $a$ to every $b \in B$ and keeps the distance to the nearest point in $B$, which is regarded as a point-to-set distance $\min_{b \in B} \|a - b\|$. Hence, from set $A$ to set $B$ we obtain a distance vector with $m$ entries, and likewise from $B$ to $A$, since the directed distance is asymmetric. According to the definition of $h$, the largest among these shortest distances is selected as the value of $h(A, B)$, representing the distance from $A$ to $B$. In this way we obtain a set-to-set distance, in other words, a measure of the similarity between sets $A$ and $B$.
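The directed and symmetric Hausdorff distances can be sketched directly from their definitions (the function names here are our own):

```python
import numpy as np

def directed_hausdorff(A, B):
    """Forward Hausdorff distance h(A, B) = max over a in A of min over b in B of ||a - b||."""
    # Pairwise Euclidean distances: D[i, j] = ||A[i] - B[j]||
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Nearest point in B for each a (point-to-set), then the worst case over A
    return D.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

For example, with A = {(0,0), (1,0)} and B = {(0,0), (3,0)}, h(A, B) = 1 while h(B, A) = 2, so H(A, B) = 2, illustrating the asymmetry of the directed distance.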
In terms of MIL, a similar definition applies. Naturally, we treat each bag as a set such as $A$ or $B$ and its instances as the points. Thus, for bags $X_i$ and $X_j$, we apply the forward Hausdorff distance
$$d(X_i, X_j) = h(X_i, X_j) = \max_{x \in X_i} \min_{x' \in X_j} \|x - x'\|.$$
For bag $X_i$, we can then obtain a bag-reference vector
$$v_i = \big[d(X_i, X_1), d(X_i, X_2), \dots, d(X_i, X_N)\big],$$
where $i \in \{1, \dots, N\}$. In addition, $v_i$ is normalized to reduce the influence of instance magnitude variation. With each bag's vector computed, an affinity matrix is obtained from the Hausdorff distances; each bag is thereby delineated by comparison with the reference bags. The feature matrix is fed into a bag classifier along with the bag labels for training and validation.
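A minimal sketch of this BRV construction, assuming the forward Hausdorff distance and row-wise L2 normalization (the exact normalization scheme is our assumption; the paper only states that the vector is normalized):

```python
import numpy as np

def directed_hausdorff(A, B):
    """Forward Hausdorff distance between two instance matrices."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return D.min(axis=1).max()

def bag_reference_vectors(bags):
    """Represent each bag by its forward Hausdorff distance to every bag."""
    n = len(bags)
    V = np.array([[directed_hausdorff(bags[i], bags[j]) for j in range(n)]
                  for i in range(n)])
    # L2-normalize each row to reduce the influence of instance magnitude
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(norms == 0, 1.0, norms)
```

Each row of the returned matrix is one bag's BRV; the diagonal is zero because a bag's forward distance to itself vanishes.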
Pseudocode of miBRV is shown in Algorithm 1.
3.2.2 Extensions of Hausdorff Distance
The Hausdorff distance defines a point-to-set distance by finding the nearest point in the set, i.e., the one with the least Euclidean distance, and then chooses the maximum among these point-to-set distances as the final set-to-set distance. Thus, the operator obtains the maximal value among the minima. This operation is suitable for most cases and yields correct descriptions of positive and negative bags.
Furthermore, considering the characteristic of MIL that some positive bags may well include negative instances, this algorithm has flaws in certain cases. For instance, if $X_i$ is a positive bag containing one negative instance while $X_j$ is composed of positive instances only, then the Hausdorff bag-to-bag distance equals the maximal instance-to-bag distance, which corresponds to the negative instance, since it has the largest distance to all instances in $X_j$. In other words, the Hausdorff distance between two positive bags becomes the distance from a negative instance to a positive bag, which gives misleading information. Consequently, modifications to the Hausdorff distance can be adopted to obtain several new affinity matrices as complements. Specifically, maximum, average, and minimum operators are added to enrich the distance definition and ameliorate incorrect representations. The following illustrates the Hausdorff distance as well as five other distance measurement operators parallel to it.
Apart from extending the Hausdorff distance with more operators, many incorrectly measured cases can be avoided by taking the average distance of the $k$ nearest or farthest neighbors rather than adopting a single extreme value. To put this into practice, we first define two functions, $f^{k}_{\max}$ and $f^{k}_{\min}$, which compute the average of the $k$ largest and the $k$ smallest distances, respectively. To measure the distance between bags $X_i$ and $X_j$, the directed distance function is modified by replacing its inner $\min$ (or $\max$) with $f^{k}_{\min}$ (or $f^{k}_{\max}$).
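A sketch of how such operator-extended directed distances might be implemented; the max/mean/min aggregators and the $k$-term averages follow the description above, but the exact six operator combinations used by the paper are not reproduced here:

```python
import numpy as np

def directed_distance(A, B, outer=np.max, inner="min", k=1):
    """Aggregate pairwise distances between two instance matrices.

    Per instance of A, average its k nearest (inner="min") or k farthest
    (inner="max") distances to B, then reduce over A's instances with `outer`.
    """
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    D_sorted = np.sort(D, axis=1)
    if inner == "min":
        per_instance = D_sorted[:, :k].mean(axis=1)   # avg of k smallest
    else:
        per_instance = D_sorted[:, -k:].mean(axis=1)  # avg of k largest
    return outer(per_instance)
```

With `outer=np.max`, `inner="min"`, and `k=1` this reduces to the forward Hausdorff distance; a larger `k` softens the effect of a single outlying (e.g., negative) instance.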
Table 1: Classification accuracy (mean ± standard deviation) on the five benchmark data sets.

Algorithm  Musk1  Musk2  Elephant  Fox  Tiger  Average
miBRV  0.895 ± 0.078  0.930 ± 0.088  0.877 ± 0.102  0.670 ± 0.075  0.877 ± 0.102  0.851
miFV [19]  0.909 ± 0.089  0.884 ± 0.094  0.852 ± 0.081  0.621 ± 0.109  0.813 ± 0.083  0.816
miGraph [24]  0.889 ± 0.073  0.903 ± 0.086  0.869 ± 0.078  0.616 ± 0.079  0.801 ± 0.083  0.816
MIBoosting [20]  0.837 ± 0.120  0.790 ± 0.088  0.827 ± 0.073  0.638 ± 0.102  0.784 ± 0.089  0.775
miSVM [1]  0.874 ± 0.120  0.836 ± 0.088  0.822 ± 0.073  0.582 ± 0.102  0.789 ± 0.089  0.781
EMDD [23]  0.849 ± 0.098  0.869 ± 0.108  0.771 ± 0.098  0.609 ± 0.101  0.730 ± 0.096  0.766
MIWrapper [7]  0.849 ± 0.106  0.796 ± 0.106  0.827 ± 0.088  0.582 ± 0.102  0.770 ± 0.092  0.765
The final representation of bag $X_i$, the bag reference vector, is computed by combining these six distance operators, i.e., by concatenating the six operator-specific reference vectors. Combining all, or a subset, of these six distance operators extracts more distinctive features, so we gain more comprehensive information about each bag, reduce training error, and improve the accuracy of the classifier.
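The concatenation of per-operator reference vectors could be sketched as follows; the specific list of operator combinations is an assumption on our part, since the paper only names max, average, and min aggregators:

```python
import numpy as np

# Assumed operator set: (outer, inner) aggregator pairs over pairwise
# distances; the paper combines six such operators, listed here speculatively.
OPERATORS = [(np.max, np.min), (np.min, np.min), (np.max, np.max),
             (np.min, np.max), (np.mean, np.min), (np.mean, np.max)]

def pairwise(A, B):
    """Matrix of Euclidean distances between rows of A and rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def combined_brv(bag, references):
    """Concatenate one reference vector per operator into the final BRV."""
    parts = []
    for outer, inner in OPERATORS:
        v = np.array([outer(inner(pairwise(bag, R), axis=1))
                      for R in references])
        norm = np.linalg.norm(v)
        parts.append(v / norm if norm > 0 else v)
    # length = number of references * number of operators
    return np.concatenate(parts)
```

Each operator contributes its own normalized block, so the final vector length is the number of reference bags times the number of operators, matching the pipeline described in Section 1.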
3.3 Bag Classification using Linear SVM
Since we obtain a vector representation for each bag, many existing classifiers can be used for bag classification, such as SVM, Boosting, or Random Forests. For efficiency, we use an SVM with a linear kernel for bag training and bag label prediction. The whole pipeline is illustrated in Algorithm 1. It consists of two steps, training and testing. For both training and testing, we use the LibLinear [8] toolbox.

4 Experiments
4.1 Benchmark Data sets
To evaluate our method, we perform experiments on five benchmark data sets widely used for MIL, including the two Musk data sets [6] on molecule activity and three image data sets (elephant, fox, tiger) [1]. In detail, Musk1 contains 47 positive and 45 negative bags, while Musk2 is composed of 39 positive and 63 negative bags; instances are molecular conformations described by 166-dimensional feature vectors. Each of the three image data sets is composed of 100 positive and 100 negative bags. We perform training and testing ten times using 10-fold cross-validation, and report the average classification accuracy and standard deviation for each data set.
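Under the assumptions already sketched (forward Hausdorff features, a liblinear-backed linear SVM), an end-to-end toy run of the pipeline with 10-fold cross-validation might look like this; the data here are synthetic stand-ins, not the benchmark sets:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def forward_hausdorff(A, B):
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return D.min(axis=1).max()

def brv_features(bags, references):
    """Rows are bags' normalized reference vectors against `references`."""
    V = np.array([[forward_hausdorff(X, R) for R in references] for X in bags])
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(norms == 0, 1.0, norms)

# Synthetic stand-in: positive bags contain one instance near (5, 5);
# negative bags contain background instances only.
rng = np.random.default_rng(0)
pos = [np.vstack([rng.normal(size=(3, 2)), [[5.0, 5.0]]]) for _ in range(20)]
neg = [rng.normal(size=(4, 2)) for _ in range(20)]
bags, y = pos + neg, np.array([1] * 20 + [-1] * 20)

X = brv_features(bags, references=bags)        # every bag doubles as a reference
scores = cross_val_score(LinearSVC(), X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())
```

In a stricter protocol the reference set would be restricted to the training bags within each fold; all bags serve as references here for brevity.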
Several popular MIL algorithms, including the state of the art: miFV [19], miGraph [24], MIBoosting [20], miSVM [1], EMDD [23], and MIWrapper [7], are used for comparison. As shown in Table 1, miBRV is highly competitive: it achieves the best performance on every data set except Musk1. The average accuracy of miBRV over the five data sets improves on the recent miFV method by 3.5%, a large margin. These results demonstrate that miBRV is robust and extracts an effective representation for bags in MIL problems.
4.2 Text Categorization
Besides the benchmark tasks, text categorization is another common application of MIL. For a fair comparison, we use the same twenty data sets derived from the 20 Newsgroups corpus as in [24]. In each category there are 100 bags, half positive and half negative. Each instance is a post represented by the top 200 TF-IDF features.
In the same way, we carry out experiments on these data sets using 10-fold cross-validation and report the average accuracy in Table 2, comparing miBRV with MIKernel [10] and miGraph [24] on the text categorization tasks. On 13 of the 20 data sets, miBRV achieves the best performance, and its best average accuracy over all data sets indicates that miBRV outperforms the two competing algorithms.
Table 2: Average classification accuracy on the twenty text categorization data sets.

Data set  MIKernel  miGraph  miBRV
alt.atheism  60.2  65.5  77.0
comp.graphics  47.0  77.8  72.1
comp.os.ms-windows.misc  51.0  63.1  64.1
comp.sys.ibm.pc.hardware  46.9  59.5  69.0
comp.sys.mac.hardware  44.5  61.7  70.7
comp.windows.x  50.8  69.8  80.7
misc.forsale  51.8  55.2  61.2
rec.autos  52.9  72.0  64.1
rec.motorcycles  50.6  64.0  54.4
rec.sport.baseball  51.7  64.7  77.8
rec.sport.hockey  51.3  85.0  85.0
sci.crypt  56.3  69.6  70.3
sci.electronics  50.6  87.1  90.7
sci.med  50.6  62.1  74.8
sci.space  54.7  75.7  67.8
soc.religion.christian  49.2  59.0  68.6
talk.politics.guns  47.7  58.5  66.2
talk.politics.mideast  55.9  73.6  65.1
talk.politics.misc  51.5  70.4  63.8
talk.religion.misc  55.4  63.3  60.8
Average  51.5  67.8  70.1
Table 3: Accuracy with the number of averaged neighbors $k$ varying from 1 to 4.

Parameter  Musk1  Musk2  Elephant  Fox  Tiger
k = 1  0.882 ± 0.088  0.907 ± 0.095  0.849 ± 0.076  0.623 ± 0.098  0.829 ± 0.076
k = 2  0.870 ± 0.100  0.901 ± 0.098  0.850 ± 0.067  0.669 ± 0.101  0.841 ± 0.080
k = 3  0.862 ± 0.075  0.893 ± 0.102  0.838 ± 0.075  0.670 ± 0.098  0.815 ± 0.085
k = 4  0.860 ± 0.111  0.895 ± 0.093  0.831 ± 0.070  0.662 ± 0.091  0.816 ± 0.083
Table 4: Accuracy with different combinations of distance operators ($k = 2$).

Parameters  Musk1  Musk2  Elephant  Fox  Tiger
0.872 ± 0.094  0.909 ± 0.083  0.797 ± 0.075  0.670 ± 0.111  0.803 ± 0.085
0.886 ± 0.092  0.930 ± 0.088  0.843 ± 0.076  0.611 ± 0.113  0.847 ± 0.085
0.880 ± 0.102  0.903 ± 0.100  0.877 ± 0.071  0.640 ± 0.102  0.877 ± 0.067
0.870 ± 0.100  0.901 ± 0.098  0.850 ± 0.067  0.669 ± 0.101  0.841 ± 0.080
4.3 Parameters Discussion
To investigate miBRV more deeply, we discuss its two main parameters in this subsection. As illustrated in Section 3, we generate the final vector by combining affinity matrices produced by different distance operators, either all of them or a selected subset. In addition, the value of $k$, the number of neighbors averaged in the extended distances, is a significant parameter as well.
First, we keep the distance operator fixed while varying $k$ from 1 to 4. Part of the results is shown in Table 3. They illustrate that increasing $k$ improves the accuracy on Elephant and Fox but causes a decline on the Musk data sets. Overall, the average accuracy peaks at $k = 2$.
We then fix $k$ to 2 and test different combinations of distance functions to extract a more informative feature vector for diverse cases. Generally, a higher-dimensional feature vector improves classifier performance, and the more distance operators we use, the more robust miBRV is as a feature representation. Table 4 details the results for different parameter settings; although the results vary considerably across settings, most average accuracies are competitive with the state-of-the-art algorithms.
5 Conclusions
In this paper, we propose a novel technique for multi-instance learning. We focus on the inherent information in each bag, delineating it by computing its similarity to the other bags. Our diverse distance definitions suit this goal well, given the crux of MIL that the proportion of positive instances in positive bags is ambiguous. To our knowledge, no previous work adopts this straightforward but efficacious feature representation. The performance of our algorithm on the data sets popularly used for evaluating MIL algorithms is superior to the state-of-the-art algorithms. Moreover, the proposed method produces a very simple vector representation for a bag, which works well with a linear SVM. Both the methodology and the experimental results show that the proposed approach is robust and effective.
In the future, on one hand, we may extend our method by changing the choice of reference bags. For instance, we could generate a large number of reference bags whose instances are randomly selected from the original bags; in this way we may describe bags more accurately with more references, provided the additional computational expense can be managed. On the other hand, adhering to the intrinsic characteristics of MIL, we could extract features that describe the relationships of the instances within each bag, for instance by adding statistics such as the standard deviation of the instances in a bag, allowing us to distinguish different bags more clearly and thus address the core problem of MIL tasks.
Acknowledgments
This work was primarily supported by National Natural Science Foundation of China (NSFC) (No. 61503145).
References
[1] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In NIPS, 2002.
[2] B. Babenko, N. Verma, P. Dollár, and S. Belongie. Multiple instance learning with manifold bags. In ICML, 2011.
[3] B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619–1632, 2011.
[4] R. G. Cinbis, J. Verbeek, and C. Schmid. Multi-fold MIL training for weakly supervised object localization. In CVPR, 2014.
[5] T. Deselaers and V. Ferrari. A conditional random field for multiple-instance learning. In ICML, 2010.
[6] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1):31–71, 1997.
[7] E. Frank and X. Xu. Applying propositional learning algorithms to multi-instance data. Technical report, Department of Computer Science, University of Waikato, Hamilton, NZ, 2003.
[8] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008.
[9] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
[10] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In ICML, 2002.
[11] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850–863, 1993.
[12] G. Liu, J. Wu, and Z.-H. Zhou. Key instance detection in multi-instance learning. 2012.
[13] O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In ICML, 1998.
[14] D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011.
[15] W. Shen, B. Wang, Y. Wang, X. Bai, and L. J. Latecki. Face identification using reference-based features with message passing model. Neurocomputing, 99:339–346, 2013.
[16] P. Viola, J. C. Platt, and C. Zhang. Multiple instance boosting for object detection. In NIPS, 2006.
[17] X. Wang, Z. Zhang, Y. Ma, X. Bai, W. Liu, and Z. Tu. One-class multiple instance learning via robust PCA for common object discovery. In ACCV, 2013.
[18] X. Wang, Z. Zhang, Y. Ma, X. Bai, W. Liu, and Z. Tu. Robust subspace discovery via relaxed rank minimization. Neural Computation, 26(3):611–635, 2014.
[19] X.-S. Wei, J. Wu, and Z.-H. Zhou. Scalable multi-instance learning. In ICDM, 2014.
[20] X. Xu and E. Frank. Logistic regression and boosting for labeled bags of instances. In Proc. 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 272–281, 2004.
[21] Z.-J. Zha, X.-S. Hua, T. Mei, J. Wang, G.-J. Qi, and Z. Wang. Joint multi-label multi-instance learning for image classification. In CVPR, 2008.
[22] K. Zhang and H. Song. Real-time visual tracking via online weighted multiple instance learning. Pattern Recognition, 46(1):397–411, 2013.
[23] Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. In NIPS, 2001.
[24] Z.-H. Zhou, Y.-Y. Sun, and Y.-F. Li. Multi-instance learning by treating instances as non-i.i.d. samples. In ICML, 2009.