1 Introduction
Human beings master skills by working on and thinking through the same problem over and over again. When a difficult problem is given to us, multiple attempts go through our minds to simulate different possibilities. During this period, our understanding of the problem deepens, which in turn allows us to propose a better solution in the end. The deeper understanding comes from a piece of consolidated knowledge within our memory, which records how we build up the problem context by processing and predicting during the "rethinking" attempts. The human-rethinking model above inspires us to design a novel deep learning model for machine-rethinking, which is equipped with a memory structure to better solve the multi-label classification (MLC) problem.
The MLC problem aims to attach multiple relevant labels to an input instance simultaneously, and matches various application scenarios, such as tagging songs with a subset of emotions (Trohidis et al., 2008) or labeling images with objects (Wang et al., 2016). Such MLC applications typically come with an important property called label correlation (Cheng et al., 2010; Huang and Zhou, 2012). For instance, when tagging songs with emotions, "angry" is negatively correlated with "happy"; when labeling images, the existence of a desktop computer probably indicates the co-existence of a keyboard and a mouse. Many existing MLC works implicitly or explicitly take label correlation into account to better solve MLC problems (Cheng et al., 2010).

Label correlation is also known to be important for humans when solving MLC problems (Bar, 2004). For instance, when solving an image labeling task upon entering a new room, we might notice some of the more obvious objects, such as a sofa, a dining table and a wooden floor, at first glance. Such a combination of objects hints at a living room, which in turn helps us recognize the "geese" on the sofa as stuffed animals rather than real ones. The recognition route from the sofa to the living room to the stuffed animals requires rethinking the correlation of the predictions step by step. Our proposed machine-rethinking model mimics this human-rethinking process to digest label correlation and solve MLC problems more accurately.
Next, we introduce some representative MLC algorithms before connecting them to our proposed machine-rethinking model. Binary relevance (BR) (Tsoumakas et al., 2009) is a baseline MLC algorithm that does not consider label correlation. For each label, BR learns a binary classifier to predict the label's relevance independently. Classifier chain (CC) (Read et al., 2009) extends BR by taking some label correlation into account. CC links the binary classifiers into a chain and feeds the predictions of the earlier classifiers as features to the later classifiers. The later classifiers can thus utilize (the correlation to) the earlier predictions to form better predictions.

The design of CC can be viewed as a memory mechanism that stores the label predictions of the earlier classifiers. The CNN-RNN (Wang et al., 2016) and Order-Free RNN with Visual Attention (Att-RNN) (Chen et al., 2017) algorithms extend CC by replacing this mechanism with a more sophisticated memory-based model: the recurrent neural network (RNN). By adopting different variations of the RNN (Hochreiter and Schmidhuber, 1997; Cho et al., 2014), the memory can store more sophisticated concepts beyond earlier predictions. In addition, adopting an RNN allows the algorithms to solve tasks like image labeling more effectively via end-to-end training with other deep learning architectures (e.g., the convolutional neural network in CNN-RNN).
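As a concrete illustration of the CC mechanism described above, the following sketch builds a chain of binary classifiers where each classifier sees the predictions of the earlier ones as extra features. The use of scikit-learn's logistic regression as the base learner is our own illustrative assumption, not the base learner of the cited works:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_classifier_chain(X, Y):
    """Train one binary classifier per label; classifier k is trained on the
    original features augmented with the true values of labels 0..k-1."""
    K = Y.shape[1]
    chain = []
    for k in range(K):
        Xk = np.hstack([X, Y[:, :k]])  # features + earlier labels
        clf = LogisticRegression().fit(Xk, Y[:, k])
        chain.append(clf)
    return chain

def predict_classifier_chain(chain, X):
    """Predict labels sequentially, feeding earlier predictions forward."""
    preds = np.zeros((X.shape[0], 0))
    for clf in chain:
        Xk = np.hstack([X, preds])
        yk = clf.predict(Xk).reshape(-1, 1)
        preds = np.hstack([preds, yk])
    return preds.astype(int)
```

Note how the first classifier in the chain receives no information from the others, while the last one receives all of it; this asymmetry is exactly the label ordering issue discussed next.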
The CC-family algorithms above, which utilize label correlation, are reported to achieve better performance than BR (Read et al., 2009; Wang et al., 2016). Nevertheless, given that the predictions happen sequentially within a chain, those algorithms generally suffer from the issue of label ordering. In particular, classifiers in different positions of the chain receive different levels of information: the last classifier predicts with all the information from the other classifiers, while the first classifier predicts with no information from the others. Att-RNN addresses this issue with beam search to approximate the optimal ordering of the labels, and the dynamic-programming-based classifier chain (CC-DP) (Liu and Tsang, 2015) searches for the optimal ordering with dynamic programming. Both Att-RNN and CC-DP can be time-consuming when searching for the optimal ordering, and even after identifying a good ordering, the label correlation information is still not shared equally during the prediction process.
Our proposed deep learning model, called RethinkNet, tackles the label ordering issue by viewing CC differently. By considering CC-family algorithms as rethinking models based on the partial predictions from earlier classifiers, we propose to fully memorize the temporary predictions from all classifiers during the rethinking process. That is, instead of forming a chain of binary classifiers, we form a chain of multi-label classifiers as a sequence of rethinking. RethinkNet learns to form preliminary guesses in the earlier classifiers of the chain, store those guesses in memory, and then correct them in later classifiers with label correlation. Similar to CNN-RNN and Att-RNN, RethinkNet adopts an RNN for making memory-based sequential predictions. We design a global memory for RethinkNet to store the information about label correlation, and the global memory allows all classifiers to share the same information without suffering from the label ordering issue.
Another advantage of RethinkNet is that it tackles an important real-world need: cost-sensitive multi-label classification (CSMLC) (Li and Lin, 2014). In particular, different MLC applications often require different evaluation criteria. To be seamlessly useful for a broad spectrum of applications, it is thus important to design CSMLC algorithms, which take the criterion (cost) into account during learning. State-of-the-art CSMLC algorithms include the condensed filter tree (CFT) (Li and Lin, 2014) and the probabilistic classifier chain (PCC) (Cheng et al., 2010). PCC extends CC to CSMLC by making Bayes-optimal predictions according to the criterion. CFT also extends CC, but achieves cost-sensitivity by converting the criterion into importance weights when training each binary classifier within CC. The conversion step in CFT generally requires knowing the predictions of all classifiers, which are readily stored within the memory of RethinkNet. Thus, RethinkNet can be seamlessly combined with the importance-weighting idea of CFT to achieve cost-sensitivity. Extensive experiments across real-world data sets validate that RethinkNet indeed improves MLC performance across different evaluation criteria and is superior to state-of-the-art MLC and CSMLC algorithms. Furthermore, for image labeling, experimental results demonstrate that RethinkNet outperforms both CNN-RNN and Att-RNN. The results justify the usefulness of RethinkNet.
2 Preliminary
In the MLC problem, the goal is to map the feature vector x ∈ X ⊆ R^d to a label vector y ∈ Y ⊆ {0, 1}^K, where y[k] = 1 if and only if the k-th label is relevant and K is the number of labels. During training, MLC algorithms use the training data set D = {(x_n, y_n)}_{n=1}^{N} to learn a classifier f: X → Y. During testing, a test example (x, y) is drawn from the same distribution that generated D. The prediction is produced as ŷ = f(x). The goal of an MLC algorithm is to make the prediction ŷ close to y.

The existence of diverse criteria for evaluating the closeness of y and ŷ calls for a more general setup called cost-sensitive multi-label classification (CSMLC) (Li and Lin, 2014). In this paper, we consider instance-wise evaluation criteria. These criteria can be generalized by a cost function C: Y × Y → R, where C(y, ŷ) represents the penalty of predicting y as ŷ. In the CSMLC problem, the criterion used for evaluation is assumed to be known before training. That is, CSMLC algorithms learn a classifier f from both the training data set D and the cost function C, and should be able to adapt to different C easily. CSMLC algorithms aim at minimizing the expected cost E[C(y, f(x))].
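As a sketch of such instance-wise cost functions, the Hamming loss and the (instance-wise) F1 score for one (y, ŷ) pair can be computed as follows; the edge-case convention for an all-empty pair is our own choice for illustration:

```python
import numpy as np

def hamming_cost(y, y_hat):
    """Fraction of the K labels predicted incorrectly (lower is better)."""
    return np.mean(y != y_hat)

def f1_score_instance(y, y_hat):
    """Instance-wise F1 score (higher is better); defined as 1.0 when both
    the true and predicted label sets are empty."""
    tp = np.sum(y * y_hat)                  # labels relevant in both vectors
    denom = np.sum(y) + np.sum(y_hat)
    return 1.0 if denom == 0 else 2.0 * tp / denom

y     = np.array([1, 0, 1, 1])
y_hat = np.array([1, 1, 1, 0])
# hamming_cost: 2 of 4 labels wrong -> 0.5 ; F1: 2*2 / (3+3) = 2/3
```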
2.1 Recurrent Neural Network (RNN)
An RNN is a class of neural network models designed to solve sequence prediction problems. An RNN uses memory to pass information from one element in the sequence to the next. It learns two transformations: the memory transformation U takes in the output of the previous element and passes it to the next element, and the feature transformation W takes in the feature vector and projects it to the output space. For 1 ≤ t ≤ B, where B is the length of the sequence, we use x^(t) to represent the feature vector of the t-th element in the sequence, and o^(t) to represent its output vector. The RNN model can be written as o^(1) = σ(W(x^(1))) and o^(t) = σ(W(x^(t)) + U(o^(t−1))) for 2 ≤ t ≤ B, where σ is the activation function.
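The recurrence above can be sketched in a few lines of NumPy; the sigmoid activation and the zero initial memory are illustrative choices consistent with the formulation (a zero previous output makes the first step reduce to o^(1) = σ(W x^(1))):

```python
import numpy as np

def srn_forward(xs, W, U, b):
    """Simple RNN forward pass: o_t = sigmoid(W x_t + U o_{t-1} + b).
    The memory starts at zero, so the first step uses no prior output."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    outputs, prev = [], np.zeros(U.shape[0])
    for x in xs:
        prev = sigmoid(W @ x + U @ prev + b)  # W: feature, U: memory transform
        outputs.append(prev)
    return outputs
```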
RNNs come in different forms. The basic form is the simple RNN (SRN) (Elman, 1990; Jordan, 1997), which assumes W and U to be linear transformations. The SRN is able to link information from one element to later elements, but it can be hard to train due to the decay of the gradient (Hochreiter et al., 2001). Several other forms of RNN are designed to solve this problem, including the long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997), the gated recurrent unit (GRU) (Cho et al., 2014) and the iterative RNN (IRNN) (Le et al., 2015).

3 Proposed Model
The idea of improving a prediction by iteratively polishing it is the "rethinking" process. This process can be cast as a sequence prediction problem, and RethinkNet adopts a recurrent neural network (RNN) to model it.
Figure 1 illustrates the design of RethinkNet. RethinkNet is composed of an RNN layer and a dense (fully-connected) layer. The dense layer learns a label embedding to transform the output of the RNN layer into a label vector. The RNN layer is used to model the "rethinking" process. All steps in the RNN share the same feature vector x, since they are solving the same MLC problem. The output o^(t) of the RNN layer represents the embedding of the label vector ŷ^(t), and each o^(t) is passed down to the (t+1)-th element in the RNN layer.
In the first step, RethinkNet makes a prediction ŷ^(1) based on the feature vector alone, which targets the labels that are easier to identify. This first prediction is similar to BR, which predicts each label independently without information about the other labels. From the second step on, RethinkNet uses the result from the previous step to make a better prediction ŷ^(t), and ŷ^(B) is taken as the final prediction ŷ. As RethinkNet polishes the prediction, difficult labels are eventually labeled more correctly.
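The rethinking loop above can be sketched as follows, in the simplified linear setting; note how, unlike an ordinary RNN over a sequence, every step re-reads the same feature vector x, and only the label estimate is carried between steps:

```python
import numpy as np

def rethink_forward(x, W, U, b, B=3):
    """RethinkNet-style rethinking sketch: every step re-reads the SAME
    feature vector x plus the previous step's label estimate (zero at t=1,
    so the first step behaves like BR)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    y_hat = np.zeros(U.shape[0])   # no prior prediction before step 1
    history = []
    for _ in range(B):
        y_hat = sigmoid(W @ x + U @ y_hat + b)
        history.append(y_hat)
    return history                 # history[-1] is the final prediction
```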
3.1 Modeling Label Correlation
RethinkNet models label correlation in the memory of the RNN layer. To simplify the illustration, we assume that the activation function σ is the sigmoid function, that the dense layer is an identity transformation, and that an SRN is used in the RNN layer; other forms of RNN share a similar property since they originate from the SRN. In the SRN, the memory and feature transformations are represented as matrices U and W respectively, and the RNN layer output is a label vector of length K.

Under this setting, the predicted label vector is ŷ^(t) = σ(W x + U ŷ^(t−1)). This equation can be separated into two parts: the feature term W x, which makes the prediction like BR, and the memory term U ŷ^(t−1), which transforms the previous prediction into the current label vector space. This memory transformation serves as the model for label correlation. The entry U[i, j] in the i-th row and j-th column of U represents the correlation between the i-th and j-th labels. The prediction of the i-th label is the combination of (W x)[i] and Σ_j U[i, j] ŷ^(t−1)[j]. If the j-th label is predicted as relevant at step t−1 and U[i, j] is high, it indicates that the i-th label is more likely to be relevant; if U[i, j] is negative, the i-th and j-th labels may be negatively correlated.
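A tiny numeric example illustrates this interpretation of the memory term; the two-label setup and the entry U[0, 1] = 3 are contrived values chosen only to make the effect visible:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W_x = np.array([0.0, 0.0])     # feature term W x: neutral for both labels
U = np.array([[0.0, 3.0],      # U[0, 1] = 3: label 1 strongly hints label 0
              [0.0, 0.0]])

prev_a = np.array([0.0, 1.0])  # previous step predicted label 1 as relevant
prev_b = np.array([0.0, 0.0])  # previous step predicted nothing as relevant

p_a = sigmoid(W_x + U @ prev_a)  # label 0 is pulled toward relevant
p_b = sigmoid(W_x + U @ prev_b)  # label 0 stays at the neutral 0.5
assert p_a[0] > p_b[0]
```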
Figure 2 plots a learned memory transformation matrix U alongside the correlation coefficients of the labels. We can clearly see that RethinkNet is able to capture the label correlation information, although we also found that the result can be noisy on some data sets. This finding suggests that U may carry not only label correlation but also other data-set-specific factors. For example, the RNN model may learn that the prediction of a certain label is not highly accurate; even if another label is highly correlated with it, the model will not give it a high weight.
3.2 Cost-Sensitive Reweighted Loss Function
Cost information is another important piece of information that should be considered when solving an MLC problem. Different cost functions value each label differently, so we should set the importance of each label accordingly. One way to encode this property is to weight each label in the loss function according to its importance; the problem then becomes how to estimate the label importance.
The difference in cost between predicting a label correctly and incorrectly can be used to estimate the importance of that label. To evaluate the importance of a single label, most costs require filling out all the other labels. We leverage the sequential nature of RethinkNet, where temporary predictions are made between iterations: using the temporary prediction to fill out all the other labels, we can estimate the importance of each label.
The weight of each label is designed as in equation (1). For t = 1, where no prior prediction exists, the labels are set to equal importance. For t ≥ 2, we use ŷ^(t−1)_{[k=1]} and ŷ^(t−1)_{[k=0]} to represent the previous label vector with its k-th component set to 1 and 0 respectively. The weight of the k-th label is then the cost difference between ŷ^(t−1)_{[k=1]} and ŷ^(t−1)_{[k=0]}. This weighting approach estimates the effect of each label under the current prediction and the given cost function, and echoes the design of CFT (Li and Lin, 2014).
w^(t)[k] = 1 for t = 1,    w^(t)[k] = | C(y, ŷ^(t−1)_{[k=1]}) − C(y, ŷ^(t−1)_{[k=0]}) | for t ≥ 2.    (1)
To incorporate the weights into the loss function, we formulate the weighted binary cross-entropy as in equation (2). For t = 1, the weights of all labels are set to 1, since there is no prediction to reference. For t ≥ 2, the weights are updated using the previous prediction. Note that when the given cost function is the Hamming loss, the labels in each iteration are weighted equally, and the weighting reduces to the same as in BR.
L^(t) = − Σ_{k=1}^{K} w^(t)[k] ( y[k] log ŷ^(t)[k] + (1 − y[k]) log(1 − ŷ^(t)[k]) )    (2)
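The weight computation of equation (1) can be sketched as follows, with the cost function passed in as a callable; the function names are our own:

```python
import numpy as np

def label_weights(y_true, y_prev, cost):
    """w[k] = |cost with label k forced to 1 - cost with label k forced to 0|,
    where all other labels are filled in from the previous prediction y_prev."""
    K = len(y_true)
    w = np.zeros(K)
    for k in range(K):
        y1, y0 = y_prev.copy(), y_prev.copy()
        y1[k], y0[k] = 1, 0
        w[k] = abs(cost(y_true, y1) - cost(y_true, y0))
    return w
```

Under the Hamming loss, flipping any single label changes the cost by exactly 1/K, so every weight comes out equal, matching the observation in the text that the weighting then reduces to BR's uniform weighting.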
algorithm    memory content                  cost-sensitivity   feature extraction
BR           -                               -                  -
CC           former predictions              -                  -
CC-DP        optimally ordered predictions   -                  -
PCC          former predictions              v                  -
CFT          former predictions              v                  -
CNN-RNN      former predictions in RNN       -                  CNN
Att-RNN      former predictions in RNN       -                  CNN + attention
RethinkNet   full predictions in RNN         v                  general NN
Table 1 compares RethinkNet with other MLC algorithms. RethinkNet is able to consider both the label correlation and the cost information. Its structure also allows it to be easily extended with other neural networks for advanced feature extraction, so it is easy to adopt for image labeling problems. In Section 4, we demonstrate that these advantages translate into better results.
4 Experiments
The experiments are conducted on 11 real-world data sets (Tsoumakas et al., 2011). Each data set is randomly split into 75% training and 25% testing. All experiments are repeated 10 times, with the mean and standard error (ste) of the testing loss/score recorded. The results are evaluated with the Hamming loss, Rank loss, F1 score and Accuracy score (Li and Lin, 2014). We use (↓) to indicate that a lower value of the criterion is better and (↑) to indicate that a higher value is better.

RethinkNet is implemented using Keras (Chollet, 2015) with TensorFlow (Abadi et al., 2015). The RNN layer can be interchanged with different variations of RNN, including the SRN, LSTM, GRU and IRNN. A 25% dropout on the memory matrix of the RNN is applied. A single fully-connected layer is used for the dense layer, and Nesterov Adam (Nadam) (Dozat, 2016) is used to optimize the model. The model is trained until convergence or until a maximum number of epochs is reached, with a fixed batch size. We add an L2 regularizer to the training parameters, with the regularization strength selected by three-fold cross-validation.
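A minimal Keras sketch of this architecture is shown below. This is our reconstruction under stated assumptions, not the authors' released code: the step count, RNN dimensionality, and the use of RepeatVector to feed the same features to every rethink step are illustrative choices, and only the 25% recurrent dropout and the Nadam optimizer come from the text above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_rethinknet(n_features, n_labels, n_steps=3, rnn_dim=128):
    """RethinkNet sketch: the same feature vector is repeated for every
    rethink step, an RNN layer carries the memory across steps, and a shared
    dense layer maps each step's RNN output to a label vector."""
    inp = layers.Input(shape=(n_features,))
    x = layers.RepeatVector(n_steps)(inp)           # same features each step
    h = layers.SimpleRNN(rnn_dim, return_sequences=True,
                         recurrent_dropout=0.25)(x)  # dropout on the memory
    out = layers.TimeDistributed(
        layers.Dense(n_labels, activation="sigmoid"))(h)
    model = keras.Model(inp, out)
    model.compile(optimizer="nadam", loss="binary_crossentropy")
    return model
```

The output has shape (batch, n_steps, n_labels); the last step is taken as the final prediction.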
4.1 Rethinking
In Section 3, we claimed that RethinkNet is able to improve through iterations of rethinking; this experiment justifies that claim. We use the simplest form of RNN, the SRN, in the RNN layer of RethinkNet, with a fixed RNN layer dimensionality, and plot the training and testing loss/score over rethink iterations in Figure 3.
From the figure, we observe that cost functions such as the Rank loss, F1 score and Accuracy score, which rely more on label correlation, show significant improvement as the number of rethink iterations grows. The Hamming loss is a criterion that evaluates each label independently, and algorithms that do not consider label correlation, like BR, perform well on it (Read et al., 2009). Since the first step of RethinkNet is essentially BR, more iterations may not bring much benefit there. The results demonstrate that the performance generally converges at around the third iteration; for efficiency, the rest of the experiments fix the number of rethink iterations accordingly.
[Table 2: RethinkNet with and without cost-sensitive reweighting ("none" vs. "reweighted"), evaluated with the Rank loss, F1 score and Accuracy score on the 11 data sets (emotions, scene, yeast, birds, tmc2007-500, Arts1, medical, enron, Corel5k, CAL500, bibtex); the numeric entries were not recovered.]
4.2 Effect of Reweighting
This experiment verifies that cost-sensitive reweighting can exploit the cost information to reach better performance. We compare RethinkNet with and without reweighting under the Rank loss, F1 score and Accuracy score. Table 2 lists the experimental results, which show that on almost all data sets, reweighting the loss function yields better results for RethinkNet.
[Table 3: training and testing performance of RethinkNet with the SRN, GRU, LSTM and IRNN under the Rank loss and F1 score on the 11 data sets; the numeric entries were not recovered.]
4.3 Comparison with Other MLC Algorithms
We compare RethinkNet with other state-of-the-art MLC and CSMLC algorithms. The competing algorithms include binary relevance (BR), the probabilistic classifier chain (PCC), the classifier chain (CC), the dynamic-programming-based classifier chain (CC-DP) and the condensed filter tree (CFT). To compare with the RNN structure used in CNN-RNN, we implement a classifier chain using an RNN (CC-RNN) as a competitor; CC-RNN is essentially CNN-RNN without the CNN layer, since we are dealing with general data sets. BR is implemented as a feed-forward neural network with one hidden layer. We couple both CC-RNN and RethinkNet with an LSTM layer. CC-RNN and BR are trained and tuned with the same approach as RethinkNet, and these models are optimized using Nadam with default parameters. Training an independent feed-forward neural network per label is too computationally heavy, so we couple CFT, PCC and CC with L2-regularized logistic regression. CC-DP is coupled with a linear support vector machine (SVM), since it is derived on such a model. The regularization strength of these models is selected by three-fold cross-validation. PCC does not have an inference rule derived for the Accuracy score, so we use the F1-score inference rule as an alternative in view of the similarity of the formulas. The experimental results are shown in Table 6
and the t-test results in Table 4. Note that we could not obtain results for CC-DP within two weeks on the data sets Corel5k, CAL500 and bibtex, so they are not listed. In terms of average ranking and t-test results, RethinkNet yields superior performance. On the Hamming loss, all algorithms are generally competitive. On the Rank loss, F1 score and Accuracy score, the CSMLC algorithms (RethinkNet, PCC, CFT) take the lead. Even though the parameters of the cost-insensitive algorithms are tuned on the target evaluation criteria, they cannot compete with the cost-sensitive algorithms, which demonstrates the importance of developing cost-sensitive algorithms. All three CSMLC algorithms have similar performance on the Rank loss, and RethinkNet performs slightly better on the F1 score. PCC cannot directly utilize the cost information of the Accuracy score, which makes it perform slightly worse there.

When comparing the deep structures (RethinkNet, CC-RNN, BR), only BR is competitive with RethinkNet, and only under the Hamming loss; in all other settings, RethinkNet outperforms the other two. CC-RNN learns an RNN whose sequence length equals the number of labels K; when K gets large, CC-RNN becomes very deep, making it hard to train with a fixed learning rate in our setting, and it fails to perform well on those data sets. This demonstrates that RethinkNet is a better-designed deep structure for solving CSMLC problems.
Table 4: t-test results comparing RethinkNet against each competitor (win/tie/loss counts over the data sets).

criterion    PCC        CFT        CC-DP     CC        CC-RNN    BR
Hamming      6/1/4      3/4/4      5/2/1     6/1/4     8/3/0     3/6/2
Rank loss    5/1/5      5/2/4      7/1/0     10/1/0    10/1/0    10/1/0
F1           6/2/3      5/4/2      5/2/1     8/3/0     10/1/0    9/2/0
Accuracy     7/1/3      5/4/2      5/1/2     7/4/0     9/2/0     9/2/0
total        24/5/15    18/14/12   22/6/4    31/9/4    37/7/0    31/11/2
[Table 5: testing performance of the baseline (logistic regression), CNN-RNN, Att-RNN and RethinkNet on MS-COCO under the Hamming loss, Rank loss, F1 score and Accuracy score; the numeric entries were not recovered.]
[Table 6: performance of RethinkNet, PCC, CFT, CC-DP, CC, CC-RNN and BR under the Hamming loss, Rank loss, F1 score and Accuracy score on the 11 data sets, with average ranks; the numeric entries were not recovered.]
4.4 Comparison on Image Data Set
The CNN-RNN and Att-RNN algorithms are designed for image labeling problems. The purpose of this experiment is to understand how RethinkNet performs on such a task compared with CNN-RNN and Att-RNN. We use the MS-COCO data set (Lin et al., 2014) with the provided training/testing split, and adopt a pre-trained ResNet-50 (He et al., 2015) for feature extraction. The competing models are logistic regression as a baseline, CNN-RNN, Att-RNN and RethinkNet. We use the authors' implementation of Att-RNN; the other models are implemented in Keras. All models are fine-tuned together with the pre-trained ResNet-50. The results on the testing data are shown in Table 5, and they justify that RethinkNet is able to outperform state-of-the-art deep learning models designed for image labeling.
4.5 Effect of Using Different RNN
In this experiment, we compare the performance of different forms of RNN in the RNN layer of RethinkNet: the SRN, LSTM, GRU and IRNN. We tune the label embedding dimensionality so that the total number of trainable parameters is roughly the same for each form of RNN. The results, evaluated on the two more commonly seen cost functions, the Rank loss and F1 score, are shown in Table 3.

Different variations of RNN differ in how they manipulate the memory. In terms of testing results, the SRN and LSTM are the two better choices; the GRU and IRNN tend to overfit, causing their testing performance to drop. Between the SRN and LSTM, the SRN tends to have a slightly larger discrepancy between training and testing performance. We also observe that many data sets perform best with the same variation of RNN across cost functions, which indicates that different data sets may require different forms of memory manipulation.
5 Conclusion
Classic multi-label classification (MLC) algorithms predict labels as a sequence in order to model label correlation. However, these approaches face the problem of ordering the labels in the sequence. In this paper, we reformulate the sequence prediction problem to avoid this issue. By mimicking the human rethinking process, we propose a novel cost-sensitive multi-label classification (CSMLC) algorithm called RethinkNet. RethinkNet takes the process of gradually polishing its prediction as the sequence to predict. We adopt the recurrent neural network (RNN) to predict the sequence, and the memory in the RNN stores the label correlation information. In addition, we modify the loss function to take in the cost information and thus make RethinkNet cost-sensitive. Extensive experiments demonstrate that RethinkNet outperforms other MLC and CSMLC algorithms on general data sets. On an image data set, RethinkNet also exceeds state-of-the-art image labeling algorithms. The results suggest that RethinkNet is a promising neural-network approach to solving CSMLC.
References

Abadi et al. [2015] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

Bar [2004] Moshe Bar. Visual objects in context. Nature Reviews Neuroscience, 5(8):617, 2004.

Chen et al. [2017] Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, and Yu-Chiang Frank Wang. Order-free RNN with visual attention for multi-label classification. arXiv preprint arXiv:1707.05495, 2017.

Cheng et al. [2010] Weiwei Cheng, Eyke Hüllermeier, and Krzysztof J. Dembczynski. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2010.

Cho et al. [2014] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

Chollet [2015] François Chollet. Keras. https://github.com/fchollet/keras, 2015.

Dozat [2016] Timothy Dozat. Incorporating Nesterov momentum into Adam. 2016.

Elman [1990] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.

He et al. [2015] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

Hochreiter and Schmidhuber [1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Hochreiter et al. [2001] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. 2001.

Huang and Zhou [2012] Sheng-Jun Huang and Zhi-Hua Zhou. Multi-label learning by exploiting label correlations locally. In AAAI, 2012.

Jordan [1997] Michael I. Jordan. Serial order: A parallel distributed processing approach. Advances in Psychology, 121:471–495, 1997.

Le et al. [2015] Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941, 2015.

Li and Lin [2014] Chun-Liang Li and Hsuan-Tien Lin. Condensed filter tree for cost-sensitive multi-label classification. In ICML, 2014.

Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

Liu and Tsang [2015] Weiwei Liu and Ivor Tsang. On the optimality of classifier chain for multi-label classification. In NIPS, 2015.

Read et al. [2009] Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multi-label classification. Machine Learning and Knowledge Discovery in Databases, pages 254–269, 2009.

Trohidis et al. [2008] Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P. Vlahavas. Multi-label classification of music into emotions. In ISMIR, 2008.

Tsoumakas et al. [2009] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, pages 667–685. 2009.

Tsoumakas et al. [2011] Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Jozef Vilcek, and Ioannis Vlahavas. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 12:2411–2414, 2011.