I Introduction
Multiple instance learning (MIL) was originally proposed for drug activity prediction [1]. It has since been widely applied in many domains and has become an important problem in machine learning. Many types of multimedia data have a multiple instance (MI) structure: a text article contains multiple paragraphs, an image can be divided into multiple local regions, and a gene expression sample contains multiple genes. MIL is effective for processing and understanding MI data.
MIL is a kind of weakly supervised learning (WSL). Each sample is a labeled bag composed of a diverse set of instances, each associated with input features. The aim of MIL, in a binary task, is to train a classifier that predicts the labels of testing bags, based on the assumption that a positive bag contains at least one positive instance, while a bag is negative only if it consists entirely of negative instances. Thus, the crux of MIL is dealing with the ambiguity of instance labels, especially in positive bags, which can have many different compositions.
Many algorithms have been proposed to solve the MIL problem. According to the survey by Amores [2], MIL algorithms can be divided into three paradigms: the instance-space paradigm, the bag-space paradigm, and the embedded-space paradigm. The instance-space paradigm learns an instance classifier and performs bag classification by aggregating the responses of the instance-level classifier. The bag-space paradigm exploits bag relations and treats each bag as a whole; in particular, a bag-to-bag distance or similarity is calculated, after which a nearest-neighbor or Bayesian classifier can perform bag classification. The embedded-space paradigm embeds a bag into a vocabulary-based feature space to obtain a compact representation of the bag, e.g., a vector representation; classical classifiers can then be applied to solve the bag classification problem.
Deep neural networks have been applied to solve many machine learning problems. For supervised learning, there are several kinds of neural networks: Deep Belief Networks (DBNs) [3] use unsupervised pretraining and take a fixed-length vector as input for feature learning and classification; deep Convolutional Neural Networks (CNNs) [4, 5] take 2D images as input and have dominated image recognition; deep Recurrent Neural Networks (RNNs) [6] and Long Short-Term Memory (LSTM) networks [7] take sequential data as input, such as text and speech, and are good at sequential prediction. Training these deep networks usually requires a large amount of fully labeled data, i.e., each instance requires a label. In MIL, however, only bag labels are available. Moreover, MI data have a more complex structure, as each sample is a set of instances and the number of instances differs from bag to bag. These issues make it hard to handle the MIL problem with conventional neural networks.

Before the rise of deep learning, some works tried to solve the MIL problem using neural networks. Ramon and De Raedt [8]
first proposed a multiple instance neural network (MINN). The network estimates instance probabilities before the last layer and calculates the bag probability using a convex approximation of the max operator (i.e., log-sum-exp); it can be trained by backpropagation. Zhang and Zhou [9] also proposed a multiple instance network, which calculates the bag probability by directly taking the max of instance probabilities.

A MINN takes a variable number of instances as input. For each instance, a representation is gradually learned layer by layer under multiple instance supervision. There are two different network architectures for injecting multiple instance supervision. Following the naming style of a classical MIL work [10], we name the two networks miNet and MINet, which correspond to the instance-space and embedded-space paradigms [2], respectively. In miNet, there are instance classifiers in each layer, so we can obtain instance labels for both training and testing bags, which is an appealing property in some applications. In MINet, there is no instance classifier: it directly builds a fixed-length vector as the bag representation and then learns a bag classifier. Compared with miNet, MINet achieves better bag classification accuracy. Previous works fall into the miNet category; MINet is newly proposed in this paper.
A key component of a MINN is the MIL pooling layer (MPL), which aggregates either instance probability vectors or instance feature vectors into a bag-level vector. It bridges MI data and conventional neural networks. Since it must be differentiable, there are only a few choices, such as max pooling, mean pooling, and log-sum-exp pooling. These pooling methods are compared and discussed in the experiments. Besides the MIL pooling layer, we use fully connected layers with nonlinear activations for instance feature learning. In the MIL benchmarks, instance features are hand-crafted and no raw instance data are given. Even so, it is beneficial to perform feature transformation guided by the supervision of bag labels. At the end of a MINet, a fully connected layer with a single neuron matches the predicted bag label to the ground truth during training.
Training neural networks on complex MI data is challenging. To learn good instance features, we have adopted various recent advances of deep learning in MINNs, such as dropout [11], ReLU [12], deeply supervised nets (DSN) [13], and residual connections [14]. We find DSN to be the most effective, because it can better exploit the hierarchical features in the network. Residual connections also work well in our networks.

To summarize, we revisit the problem of solving multiple instance learning with neural networks. This branch of MIL algorithms has been largely ignored by the current MIL research community, yet it is highly effective and efficient. Unlike most MIL algorithms, it can learn instance features in an end-to-end manner. This paper focuses on neural networks for end-to-end MIL, with comprehensive studies on MIL benchmarks. The main contributions of this paper are two extremely fast and scalable methods for MIL, i.e., miNet and MINet, and the introduction of deep supervision and residual connections for MIL.
We organize the rest of this paper as follows. Section II briefly reviews previous work on MIL. In Section III, we propose end-to-end MIL networks. Experimental results on several MIL benchmarks are presented in Section IV. Discussions of the experimental setups are given in Section V. Finally, Section VI concludes the paper and outlines future work.
II Related Work
Previous works on solving MIL with neural networks include [8, 9, 15, 16]. [8] introduced log-sum-exp as a convex approximation of max to calculate bag probabilities from instance probabilities. [9] changed the loss function and directly applied the max function. [15] improved multiple instance neural networks with feature selection based on Diverse Density and PCA. [16] showed that ensemble methods can be integrated with multiple instance neural networks. Since then, solving MIL with neural networks has been largely ignored in machine learning research. This paper revisits the problem, proposes new network structures, and investigates recent neural network techniques.

Multiple instance learning (MIL) has received a lot of attention since it helps to solve a range of real applications. Many MIL methods have been proposed, either to develop effective MIL solvers or to apply MIL to application problems. A comprehensive survey of MIL algorithms and applications can be found in [2]. Here, we give a brief review of the most recent MIL algorithms, especially those related to deep neural networks and feature learning.
From the view of the embedded-space paradigm, the most recent method is the scalable MIL algorithm miFV [27], which solves MIL using Fisher Vector (FV) coding [17]. miFV transforms instance features into a high-dimensional space using a pretrained Gaussian mixture model and FV coding. In contrast, the proposed MINet learns instance features using deep multiple instance supervision, achieves better bag classification accuracy, and is much faster than miFV.
The idea of using neural networks to solve the MIL problem has also been studied in computer vision, e.g., [18, 19]. Wu et al. [18] proposed deep MIL, which uses max pooling to find positive instances/patches for image classification and annotation. Pinheiro and Collobert [19] used log-sum-exp pooling in a deep CNN for weakly supervised semantic segmentation. The proposed miNet follows the path of these two works; unlike them, miNet utilizes deep supervision and targets more general MIL problems. Besides integrating MIL into deep neural networks, Wang et al. [20] proposed a method that combines MIL with a support vector machine using a relaxed MIL constraint and applied it to object discovery. However, these works pay more attention to vision applications (e.g., image classification, image annotation, and semantic segmentation), which are based on convolutional image features. They usually fine-tune neural network models pretrained on much larger datasets such as ImageNet [21], and they focus mainly on instance-space MIL. In contrast, we focus on applying the MINN structure to more general MIL problems. Note that for general MIL problems there are no large datasets available for pretraining, as there are in computer vision, which makes it more difficult to train a MINN efficiently. We show how to train our networks from scratch on MIL benchmarks with limited training data and achieve inspiring results. We investigate both miNet and MINet, and experiments show that MINet outperforms miNet in most cases.

III Multiple Instance Neural Networks
In this section, we first introduce the formulation of MIL, then present various networks for MIL, and finally study the MIL pooling methods and the training loss.
III-A Notations
Here we first review the definition of MIL. Given a set of bags $\{X_1, \dots, X_N\}$, bag $X_i$ consists of instance features $\{x_{i1}, \dots, x_{im_i}\}$, where $N$ and $m_i$ denote the number of bags and the number of instances in bag $X_i$, respectively. Let $Y_i$ and $y_{ij}$ be the labels of bag $X_i$ and instance $x_{ij}$, respectively, where $1$ means positive and $0$ means negative. In MIL, only bag labels are given during training, and there are two MIL constraints:

If bag $X_i$ is negative, then all instances in $X_i$ are negative, i.e., if $Y_i = 0$, then all $y_{ij} = 0$;

If bag $X_i$ is positive, then at least one instance in $X_i$ is positive, i.e., if $Y_i = 1$, then $\sum_j y_{ij} \geq 1$.
Since instance labels are not given during training, solving the MIL problem is challenging. In MINNs, there are two strategies: the first is to infer instance labels inside the network, i.e., to place instance probabilities of being positive in a hidden layer; the second is to learn a bag representation in the network and directly carry out bag classification without computing instance probabilities. The first strategy has been studied in [8, 9, 18]; the second is newly proposed in this paper. In the following subsections, we describe the MINNs in detail.
Let us consider a single bag $X_i$ with multiple instances passed through a MINN. A MINN is composed of $L$ layers, each of which computes a nonlinear transformation $f^{(\ell)}(\cdot)$, where $\ell$ indexes the layer. $f^{(\ell)}$ can be a composite of operations such as an inner product (fully connected layer) or rectified linear units (ReLU) [22]. We denote the output of the $\ell$-th layer for instance $x_{ij}$ as $h_{ij}^{(\ell)}$, with $h_{ij}^{(0)} = x_{ij}$.

III-B miNet: Instance-Space MIL Algorithm
First, we review traditional multiple instance neural networks [8, 9, 18], which we name miNet. As shown in Fig. 1, each instance in a bag is first fed into several fully connected (fc) layers with an activation function (in this paper, we use four fc layers and ReLU activations). These first fc layers learn increasingly semantic instance features compared with the original ones (higher layers correspond to higher-level semantics); we denote the instance feature in the $\ell$-th layer as $h_{ij}^{(\ell)}$. After these features are learned, an fc layer with a single neuron and sigmoid activation predicts the positiveness of each instance, i.e., the instance probability $p_{ij}$, a scalar in the range $[0, 1]$. In the last layer, a MIL pooling layer (described in Section III-F) takes the instance probabilities as input and outputs the bag probability, denoted as $P_i$.
Unlike traditional neural networks, in miNet we only have bag labels for training; instance labels are not available. To address this, we treat instance labels as latent variables and infer them during network training. We design a MIL pooling layer that aggregates the instance scores into the final positiveness of the bag.
The MIL pooling method must satisfy the MIL constraints: if a bag is positive, at least one instance should have high positiveness; otherwise, all instances in the bag should have low positiveness. Since the pooling layer is integrated into the neural network, the pooling function must be differentiable. Three typical MIL pooling methods are introduced in Section III-F.
In summary, the miNet can be formulated as:
$$h_{ij}^{(\ell)} = f^{(\ell)}\big(h_{ij}^{(\ell-1)}\big), \quad \ell = 1, \dots, L-1,$$
$$p_{ij} = \sigma\big(w^\top h_{ij}^{(L-1)} + b\big),$$
$$P_i = \mathcal{M}\big(p_{i1}, \dots, p_{im_i}\big), \qquad (1)$$

where $\sigma$ is the sigmoid function and $\mathcal{M}$ denotes the MIL pooling operator.
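The forward pass of miNet can be sketched in plain numpy (a minimal illustration, not the implementation used in the experiments; the layer sizes and weights here are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minet_forward(bag, weights, pool=np.max):
    """Forward pass of a miNet on one bag.
    bag: (m, d) array of m instance feature vectors.
    weights: list of (W, b) pairs; hidden layers use ReLU, and the
    last layer has a single neuron with sigmoid activation, giving
    one probability per instance. The instance probabilities are
    then aggregated into the bag probability by MIL pooling."""
    h = bag
    for W, b in weights[:-1]:
        h = relu(h @ W + b)
    W, b = weights[-1]
    p = sigmoid(h @ W + b)      # (m, 1) instance probabilities
    return float(pool(p))       # bag probability
```

With max pooling, the bag probability is simply the largest instance probability, which directly mirrors the MIL assumption that one positive instance makes the bag positive.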
III-C MINet: A New Embedded-Space MIL Algorithm
We propose a series of new multiple instance neural networks that do not rely on inferring instance probabilities. These networks directly learn a bag representation and produce better bag classification accuracy. They belong to the embedded-space paradigm defined in the survey [2]. Following the naming style of [10], we name these networks MINet.
In Figure 2, we show a plain MINet with three fully connected layers and one MIL pooling layer. This change of network structure leads the network to focus on learning a bag representation rather than predicting instance probabilities. No matter how many input instances there are, the MIL pooling layer aggregates them into one feature vector as the bag representation. Finally, an fc layer with a single neuron and sigmoid activation takes the bag representation as input and predicts the bag probability. The plain MINet is formulated as:
$$h_{ij}^{(\ell)} = f^{(\ell)}\big(h_{ij}^{(\ell-1)}\big), \quad \ell = 1, \dots, L-1,$$
$$z_i = \mathcal{M}\big(h_{i1}^{(L-1)}, \dots, h_{im_i}^{(L-1)}\big),$$
$$P_i = \sigma\big(w^\top z_i + b\big), \qquad (2)$$

where $z_i$ is the bag representation of bag $X_i$.
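A matching numpy sketch for the plain MINet (again with hypothetical layer sizes): the only structural change from miNet is that MIL pooling is applied to instance feature vectors before the single-neuron classifier, rather than to instance probabilities after it:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mi_net_forward(bag, feat_weights, clf_weights, pool=np.max):
    """Plain MINet on one bag: fc+ReLU layers learn instance
    features, MIL pooling aggregates them into one fixed-length
    bag representation, and a single sigmoid neuron predicts the
    bag probability."""
    h = bag                       # (m, d) instance features
    for W, b in feat_weights:
        h = relu(h @ W + b)
    z = pool(h, axis=0)           # fixed-length bag representation
    W, b = clf_weights
    return float(sigmoid(z @ W + b))
```

Because the pooling reduces over the instance axis, bags of any size map to a bag representation of the same length, which is what lets a standard classifier follow.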
III-D MINet with Deep Supervision
Inspired by Deeply-Supervised Nets (DSN) [13], we add deep supervision to MINet, as shown in Figure 3. That is, each middle fc layer that learns instance features is followed by a MIL pooling layer and an fc layer that predict a bag score. During training, supervision is added at each level; during testing, we average the scores over all levels. The MINet with deep supervision is formulated as:
$$z_i^{(\ell)} = \mathcal{M}\big(h_{i1}^{(\ell)}, \dots, h_{im_i}^{(\ell)}\big),$$
$$P_i^{(\ell)} = \sigma\big(w^{(\ell)\top} z_i^{(\ell)} + b^{(\ell)}\big),$$
$$P_i = \frac{1}{|S|} \sum_{\ell \in S} P_i^{(\ell)}, \qquad (3)$$
where the superscript $\ell$ indicates that multiple bag features are learned from the different levels of instance features by MIL pooling, and $S$ is the set of supervised levels. MINet with deep supervision can utilize multiple feature hierarchies to obtain better bag classification accuracy. This can be interpreted in two ways: (1) during training, instance features in the bottom layers receive better supervision; and (2) during testing, averaging multiple bag probabilities yields a more robust bag label. In this paper, we weight all levels equally.
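The equal-weight, test-time averaging over levels is a one-liner (a trivial sketch; the number of supervised levels depends on the network depth):

```python
def ds_bag_probability(level_probs):
    """Test-time bag probability of MINet with deep supervision:
    the bag probabilities predicted at the supervised levels are
    averaged with equal weights."""
    return sum(level_probs) / len(level_probs)
```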
III-E MINet with Residual Connections
Recently, deep residual learning [14] showed impressive improvements in image recognition by utilizing very deep neural networks. We study residual connections in MINet, as shown in Figure 4. MINet with residual connections is formulated as:
$$z_i^{(\ell)} = z_i^{(\ell-1)} + \mathcal{M}\big(h_{i1}^{(\ell)}, \dots, h_{im_i}^{(\ell)}\big). \qquad (4)$$
Different from the original residual learning [14], which learns representation residuals using convolution, batch normalization, and ReLU, we learn bag representation residuals via fully connected layers, ReLU, and MIL pooling. At the end of the network, the final bag representation is connected to the bag label via an fc layer with a single neuron and sigmoid activation.
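One residual update on the bag representation can be sketched as follows (an illustrative numpy fragment; `pool` stands for the MIL pooling applied to the current level's instance features):

```python
import numpy as np

def residual_bag_update(z_prev, h, pool=np.max):
    """One residual block on the bag representation: the MIL-pooled
    instance features h of the current level, shape (m, d), are
    added to the previous bag representation z_prev, shape (d,)."""
    return z_prev + pool(h, axis=0)
```

Because every residual term is added to the same vector, all intermediate bag representations must share the same dimensionality, which is why the fc layers in this variant share one width.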
III-F MIL Pooling Methods
As mentioned before, we use a MIL pooling layer to aggregate instance scores or instance representations. In this paper, we use three popular MIL pooling methods: max pooling, mean pooling, and log-sum-exp (LSE) pooling, as shown in Eq. (5), where $x_k$ is the $k$-th input, $y$ is the output, $m$ is the number of inputs, and $r$ is a hyperparameter:

$$\text{max:} \quad y = \max_{k} x_k,$$
$$\text{mean:} \quad y = \frac{1}{m} \sum_{k=1}^{m} x_k,$$
$$\text{LSE:} \quad y = \frac{1}{r} \log\Big(\frac{1}{m} \sum_{k=1}^{m} \exp(r\, x_k)\Big). \qquad (5)$$

All of these methods satisfy the constraints stated in Section III-B. The LSE [23] is a smooth and convex approximation of the max function, and the hyperparameter $r$ controls the smoothness of the approximation: LSE approaches max when $r$ is large and approaches mean when $r$ is small.
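The three pooling operators can be sketched in a few lines of numpy; the tests below also illustrate the limiting behavior of LSE (note that for very large $r$, a numerically stable implementation would subtract the max before exponentiating — omitted here for clarity):

```python
import numpy as np

def max_pool(x):
    """Max pooling over the instance axis (axis 0)."""
    return x.max(axis=0)

def mean_pool(x):
    """Mean pooling over the instance axis."""
    return x.mean(axis=0)

def lse_pool(x, r=10.0):
    """Log-sum-exp pooling: a smooth, convex approximation of max.
    Large r -> close to max; small r -> close to mean."""
    return (1.0 / r) * np.log(np.mean(np.exp(r * x), axis=0))
```

For instance probabilities 0.1, 0.9, and 0.5, LSE with a large `r` returns nearly 0.9 (the max), while a tiny `r` returns nearly 0.5 (the mean).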
III-G Training Loss
For both miNet and MINet, we obtain bag scores. Here we define the loss function used during training. Since we aim to predict bag labels, it is natural to choose the cross-entropy loss, as in Eq. (6), where $P_i$ is the bag score of bag $X_i$ and $Y_i$ is its ground-truth label. For deep supervision, this loss is applied to the bag scores at every supervised level.
$$L = -\sum_{i=1}^{N} \big( Y_i \log P_i + (1 - Y_i) \log(1 - P_i) \big). \qquad (6)$$
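A minimal sketch of the per-bag cross-entropy term (plain Python; the small epsilon for numerical stability is an implementation detail not stated in the text):

```python
import math

def bag_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy loss for one bag: y is the 0/1 bag label and
    p the predicted bag probability. p is clamped away from 0 and 1
    to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

The total loss of Eq. (6) is simply the sum of this term over all training bags.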
Since all parts of our multiple instance networks are differentiable, we can train them by standard backpropagation with Stochastic Gradient Descent (SGD).
TABLE I: Detailed characteristics of the datasets. For each dataset, the table lists the number of attributes; the numbers of positive, negative, and total bags; and the minimum, maximum, and total numbers of instances. The datasets are MUSK1, MUSK2, Elephant, Fox, Tiger, and the twenty Text(Zhou) datasets (alt.atheism, comp.graphics, comp.os.ms-windows.misc, etc.).
IV Experiments
In this section, we perform experiments to test miNet, MINet, and their variants on different MIL benchmarks, covering drug activity prediction, image annotation, and text categorization.
IV-A Datasets
We test these methods on three widely used MIL benchmarks from different applications: drug activity prediction, automatic image annotation, and text categorization. For evaluation, we run 10-fold cross-validation five times and report the average results.
Drug Activity Prediction
The MUSK datasets [1] are used to predict whether a drug molecule binds well to a target protein. Each molecule can adopt multiple shapes, each described by a 166-dimensional feature vector. In the MIL setting, a molecule is regarded as a bag, and the different shapes of the same molecule are the instances of that bag. MUSK1 contains 47 positive and 45 negative bags with 476 instances in total, while MUSK2 contains 39 positive and 63 negative bags with 6,598 instances in total.
Automatic Image Annotation
The Elephant, Fox, and Tiger datasets [10] are each composed of positive bags drawn from images of the target animal class and negative bags randomly chosen from images of other animal classes. Here, an image is represented as a bag containing a set of regions, which are the instances of the MIL problem. When searching for a target object, the network can be used to obtain keywords for images. Each image is represented by several instances, which are 230-dimensional feature vectors describing the color, texture, and shape of regions of the image.
Text Categorization
Besides the above datasets, text categorization is another widely used application of MIL. We take twenty datasets derived from the 20 Newsgroups corpus [26]. Each dataset contains an equal number of positive and negative bags. Each positive bag contains a small fraction of posts from the target class, with the rest drawn from other categories, while the instances of negative bags are all randomly drawn from other categories. Each instance is represented by its top TF-IDF features.
Detailed characteristics of these datasets are summarized in Table I.
IV-B Experimental Setup
Our networks contain four fully connected (fc) layers, and each of the first three fc layers is followed by a dropout layer. As described in Section III, we evaluate the following multiple instance learning approaches: (1) miNet: instance scores are learned from the four fc layers and aggregated into bag scores via the MIL pooling layer to predict the bag label. (2) MINet: input instances are transformed by the first three fc layers and aggregated into a bag representation by the MIL pooling layer; the last fc layer then predicts the bag probability. (3) MINet with Deep Supervision (MINet with DS): unlike plain MINet, each middle fc layer is followed by a MIL pooling layer and an fc layer that compute bag scores. The loss of MINet with DS sums the cross-entropy losses of all levels for backpropagation with SGD during training, and the average of the per-level bag scores is used for testing. (4) MINet with Residual Connections (MINet with RC): residual connections are built between the intermediate bag representations, followed by an fc layer that outputs the bag score.
The numbers of neurons in the fc layers are fixed across miNet, MINet, and MINet with DS, while in MINet with RC all fc layers share the same width so that residual connections can be built. Weights of the fc layers are initialized from the Glorot uniform distribution [28], and biases are initialized to 0. For each dataset, the learning rate, weight decay, and momentum are set to suitable values, which can be found in the configuration files of our code. All networks are trained with SGD, with one bag input as a batch for both training and testing. Training and testing are fast: for example, predicting a bag of the MUSK1 dataset takes only a fraction of a second on a moderate CPU. Our code is written in Python based on Keras [29], and all experiments are run on a PC with an Intel(R) i7-4790K CPU (4.00 GHz) and 32 GB RAM. The code for reproducing the results will be made available upon acceptance.

Dataset  MIKernel [25]  miGraph [26]  miFV [27]  miNet  MINet  MINet with DS  MINet with RC

alt.atheism  
comp.graphics  
comp.os.ms-windows.misc  
comp.sys.ibm.pc.hardware  
comp.sys.mac.hardware  
comp.windows.x  
misc.forsale  
rec.autos  
rec.motorcycles  
rec.sport.baseball  
rec.sport.hockey  
sci.crypt  
sci.electronics  
sci.med  
sci.space  
soc.religion.christian  
talk.politics.guns  
talk.politics.mideast  
talk.politics.misc  
talk.religion.misc  
average 
IV-C Experimental Results
Experimental results are shown in Table II and Table III, with the best performance on each dataset in bold. Note that different pooling methods produce different results on each dataset; here we report the best one as the final result (for text categorization, max pooling consistently achieves the best performance). We discuss the influence of pooling methods later. Our networks achieve state-of-the-art performance on Elephant, Fox, and the text categorization datasets, and nearly the best accuracies on the other datasets. These results demonstrate the effectiveness of multiple instance networks, which achieve highly competitive results overall.
We can see that the embedded-space network MINet is more competitive than the instance-space network miNet, which is consistent with other MIL algorithms. On the five benchmark datasets, MINet with DS achieves almost all of the best results, which verifies that deep supervision makes bag label prediction more robust. MINet with RC also obtains good results on these five datasets. On the text categorization datasets, MINet with DS achieves superior performance, while the results of MINet with RC are slightly worse than those of MINet. The average accuracy over all 20 datasets indicates that MINet and its two variants outperform the competing algorithms, including MIKernel [25], miGraph [26], miFV [27], and miNet.
V Discussion
In this section, we discuss the influence of different pooling methods, deep supervision, and residual connections on the networks. The width and depth of the networks, which may also affect performance, are considered as well.
Pooling Method  MUSK1  MUSK2  Elephant  Fox  Tiger 

max  
mean  
LSE 
Method  MUSK1  MUSK2  Elephant  Fox  Tiger 

MINet with DS  
MINet without DS 
Method  MUSK1  MUSK2  Elephant  Fox  Tiger 

MINet with RC  
MINet without RC 
V-A The Influence of Different Pooling Methods
Three pooling methods are applied to these networks: max pooling, mean pooling, and LSE pooling. As described in Section III, in the embedded space, instance features of the same bag are aggregated into the bag representation through pooling; in the instance space, instance scores of the same bag are aggregated into the bag score. We test the influence of the different pooling methods on MINet with DS. From Table IV, we observe that max pooling is preferable to the other methods.
V-B The Influence of Deep Supervision
To illustrate the effectiveness of deep supervision, we compare MINet with deep supervision to the same network without deep supervision, which only performs MIL pooling and bag score prediction on the third fc layer. The comparison on five MIL benchmark datasets is shown in Table V. We observe that deep supervision boosts performance on all datasets; it is essential for learning good instance features in multiple instance networks.
V-C The Influence of Residual Connections
To show the effect of residual connections, MINet with residual connections, which learns bag representation residuals, is compared to MINet without them on five MIL benchmark datasets. As shown in Table VI, the results with residual connections are better except on Elephant and Tiger. Residual connections may thus have a positive impact on learning good bag representations in multiple instance networks.
V-D The Influence of Network Depth and Width
TABLE VII: Bag classification accuracy of MINet with DS on the five MIL benchmarks (MUSK1, MUSK2, Elephant, Fox, Tiger) under different network structures; each row lists the number of neurons in each fc layer, for networks with four or with six fc layers.
As mentioned above, for miNet, MINet, and the MINet variants, the number of layers and the number of neurons per layer are fixed during training and testing; in Tables II and III, the proposed networks all have four fc layers. However, in deep learning, deeper and wider neural networks may achieve better performance. In this section, we report the results of MINet with DS and MINet with RC with different numbers of layers and neurons on the five MIL benchmarks.
The depth and width analysis of MINet with DS on the five MIL benchmarks is presented in Table VII. Note that the number of neurons in the last fc layer is fixed to 1 in order to output bag scores. As shown in Table VII, MINet with DS achieves the best performance in most cases with the default four-layer structure. Although the deeper and wider network is superior on some datasets, its advantage is not significant enough to consistently boost performance.
As described in Section III-E, the fc layers must have the same number of neurons to build residual connections, except for the last fc layer. Fixing the width of MINet with RC, we first vary only the depth of the network. In Figure 5, the results on the five MIL benchmarks remain similar as the network gets deeper, so we fix the depth of MINet with RC when discussing the influence of width. Figure 6 shows that a wider network does not necessarily boost performance; in addition, MINet with RC may perform worse when it is too thin.
This observation is inconsistent with the behavior of deeper and wider neural networks on other problems, which may be related to the limited training data and the simple MIL pooling methods.
VI Conclusion
In this work, we propose a series of novel neural network frameworks for MIL. Different from previous MIL networks, our method focuses on bag-level representation learning instead of instance-level label estimation. Experiments show that our bag-level networks achieve superior results on several MIL benchmarks compared with the instance-level network. Moreover, we integrate popular deep learning techniques (deep supervision and residual connections) into our networks, which boost performance further. In addition, our method takes only a fraction of a second per bag for testing (forward) and training (backward), which is very efficient. Given these inspiring results, we believe that deep learning can also solve the traditional MIL problem well. In the future, we would like to develop more effective MIL pooling methods and study how to train deeper and wider networks for MIL with limited training data.
References
 [1] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, “Solving the multiple instance problem with axis-parallel rectangles,” Artificial Intelligence, vol. 89, no. 1, pp. 31–71, 1997.
 [2] J. Amores, “Multiple instance classification: Review, taxonomy and comparative study,” Artificial Intelligence, vol. 201, pp. 81–105, 2013.
 [3] G. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.
 [4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012, pp. 1097–1105.
 [6] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural computation, vol. 1, no. 2, pp. 270–280, 1989.
 [7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [8] J. Ramon and L. De Raedt, “Multi instance neural networks,” in Proceedings of the ICML2000 workshop on attributevalue and relational learning, 2000, pp. 53–60.
 [9] Z.-H. Zhou and M.-L. Zhang, “Neural networks for multi-instance learning,” in Proceedings of the International Conference on Intelligent Information Technology, Beijing, China, 2002, pp. 455–459.
 [10] S. Andrews, I. Tsochantaridis, and T. Hofmann, “Support vector machines for multiple-instance learning,” in NIPS, 2002, pp. 561–568.
 [11] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” JMLR, vol. 15, no. 1, pp. 1929–1958, 2014.
 [12] V. Nair and G. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in ICML, 2010, pp. 807–814.
 [13] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in AISTATS, 2015, pp. 562–570.
 [14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
 [15] M.-L. Zhang and Z.-H. Zhou, “Improve multi-instance neural networks through feature selection,” Neural Processing Letters, vol. 19, no. 1, pp. 1–10, 2004.
 [16] M. Zhang and Z. Zhou, “Ensembles of multi-instance neural networks,” in International Conference on Intelligent Information Processing. Springer, 2004, pp. 471–474.
 [17] J. Sánchez, F. Perronnin, T. Mensink, and J. J. Verbeek, “Image classification with the Fisher Vector: Theory and practice,” IJCV, vol. 105, no. 3, pp. 222–245, 2013.
 [18] J. Wu, Y. Yu, C. Huang, and K. Yu, “Deep multiple instance learning for image classification and autoannotation,” in CVPR, 2015, pp. 3460–3469.
 [19] P. O. Pinheiro and R. Collobert, “From imagelevel to pixellevel labeling with convolutional networks,” in CVPR, 2015, pp. 1713–1721.
 [20] X. Wang, Z. Zhu, C. Yao, and X. Bai, “Relaxed multiple-instance SVM with application to object discovery,” in ICCV, 2015, pp. 1224–1232.
 [21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009, pp. 248–255.
 [22] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in AISTATS, vol. 15, no. 106, 2011, p. 275.
 [23] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
 [24] Q. Zhang and S. A. Goldman, “EM-DD: An improved multiple-instance learning technique,” in NIPS, 2001, pp. 1073–1080.
 [25] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola, “Multiinstance kernels,” in ICML, vol. 2, 2002, pp. 179–186.
 [26] Z.-H. Zhou, Y.-Y. Sun, and Y.-F. Li, “Multi-instance learning by treating instances as non-i.i.d. samples,” in ICML, 2009, pp. 1249–1256.
 [27] X.-S. Wei, J. Wu, and Z.-H. Zhou, “Scalable algorithms for multi-instance learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–13, 2016.
 [28] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in AISTATS, 2010, pp. 249–256.
 [29] F. Chollet, “Keras,” https://github.com/fchollet/keras, 2015.