Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations

08/23/2018 ∙ by Guangxiang Zhao, et al. ∙ Peking University

This paper explores a new natural language processing task, review-driven multi-label music style classification. This task requires the system to identify multiple styles of music based on its reviews on websites. The biggest challenge lies in the complicated relations of music styles, which cause many multi-label classification methods to fail. To tackle this problem, we propose a novel deep learning approach to automatically learn and exploit style correlations. The proposed method consists of two parts: a label-graph based neural network, and a soft training mechanism with correlation-based continuous label representation. Experimental results show that our approach achieves large improvements over the baselines on the proposed dataset. In particular, the micro F1 is improved from 53.9 to 64.5, and the one-error is reduced from 30.5 to 22.6. Furthermore, the visualized analysis shows that our approach performs well in capturing style correlations.

1 Introduction

As music style (e.g., Jazz, Pop, and Rock) is one of the most frequently used labels for music, music style classification is an important task for applications of music recommendation, music information retrieval, etc. There are several criteria related to the instrumentation and rhythmic structure of music that characterize a particular style. In real life, many pieces of music usually map to more than one style.

Several methods have been proposed for automatic music style classification Qin and Ma (2005); Zhou et al. (2006); Wang et al. (2009); Choi et al. (2017). Although these methods make some progress, they are limited in two aspects. First, their generalization ability partly suffers from the small quantity of available audio data. Due to the limitation of music copyright, it is difficult to obtain all necessary audio materials to classify music styles. Second, for simplification, most of the previous studies make a strong assumption that a piece of music has only one single style, which does not meet the practical needs.

Different from the existing methods, this work focuses on review-driven multi-label music style classification. The motivation for using reviews is that many user reviews are accessible on relevant websites. First, such reviews provide enough information for effectively identifying the style of music, as shown in Table 1. Second, compared with audio materials, reviews can be obtained much more easily. Taking practical needs into account, we do not follow the traditional single-label assumption. Instead, we categorize music items into fine-grained styles and formulate this task as a multi-label classification problem. For this task, we build a new dataset which contains over 7,000 samples. Each sample includes a music title, a set of human annotated styles, and associated reviews. An example is shown in Table 1.

Music Title | Mozart: The Great Piano Concertos, Vol.1
Styles | Classical Music, Piano Music
Reviews | (1) I’ve been listening to classical music all the time. (2) Mozart is always good. There is a reason he is ranked in the top 3 of lists of greatest classical composers. (3) The sound of piano brings me peace and relaxation. (4) This volume of Mozart concertos is superb.
Table 1: An illustration of review-driven multi-label music style classification. For easy interpretation, we select a simple and clear example where styles can be easily inferred from reviews. In practice, the correlation between styles and associated reviews is relatively complicated.

The major challenge of this task lies in the complicated correlations of music styles. For example, Soul Music contains elements of R&B and Jazz. (Footnote 1: Soul Music is a popular music genre that originated in the United States in the late 1950s and early 1960s. It contains elements of African-American Gospel Music, R&B and Jazz.) These three labels can be used alone or in combination. Many multi-label classification methods fail to capture this correlation, and may mistake the true label [Soul Music, R&B, Jazz] for the false label [R&B, Jazz]. If well learned, such relations are useful knowledge for improving the performance, e.g., increasing the probability of Soul Music if we find that it is heavily linked with two high probability labels: R&B and Jazz. Therefore, to better exploit style correlations, we propose a novel deep learning approach with two parts: a label-graph based neural network, and a soft training mechanism with correlation-based continuous label representation.

First, the label-graph based neural network is responsible for classifying music styles based on reviews and style correlations. A hierarchical attention layer collects style-related information from reviews based on a two-level attention mechanism, and a label graph explicitly models the relations of styles. Two information flows are combined together to output the final label probability distribution.

Second, we propose a soft training mechanism by introducing a new loss function with continuous label representation that reflects style correlations. Without style relation information, the traditional discrete label representation sometimes over-distinguishes correlated styles, which does not encourage the model to learn style correlations and limits the performance. Suppose a sample has a true label set [Soul Music], and currently the output probability for Soul Music is 0.8, and the probability for R&B is 0.3. This is good enough to make a correct prediction of [Soul Music]. However, the discrete label representation keeps pushing the parameters further, until the probability of Soul Music becomes 1 and the probability of R&B becomes 0. Because Soul Music and R&B are related as mentioned above, such over-distinguishing hinders the model from learning the relation between Soul Music and R&B. To avoid this problem, we introduce the continuous label representation as the supervisory signal by taking style correlations into account. Therefore, the model is no longer required to distinguish styles completely because a soft classification boundary is allowed.

Our contributions are as follows:

  • To the best of our knowledge, this work is the first to explore review-driven multi-label music style classification. (Footnote 2: The dataset is in the supplementary material and we will release it if this paper is accepted.)

  • To learn the relations among music styles, we propose a novel deep learning approach with two parts: a label-graph based neural network, and a soft training mechanism with correlation-based continuous label representation.

  • Experimental results on the proposed dataset show that our approach achieves significant improvements over the baselines in terms of all evaluation metrics.

2 Related works

2.1 Music Style Classification

Previous works mainly focus on using audio information to identify music styles. Traditional machine learning algorithms are adopted in this task, such as Support Vector Machine (SVM) Xu et al. (2003), Hidden Markov Model (HMM) Chai and Vercoe (2001); Pikrakis et al. (2006), and Decision Tree (DT) Zhou et al. (2006). Furthermore, several studies explore different hand-crafted feature templates Tzanetakis and Cook (2002); Qin and Ma (2005); Oramas et al. (2016). Recently, neural networks have freed researchers from cumbersome feature engineering. For example, Choi et al. (2017) introduced a convolutional recurrent neural network for music classification. Medhat et al. (2017) designed a masked conditional neural network for multidimensional music classification.

Motivated by the fact that many pieces of music usually have different styles, several studies aim at multi-label musical style classification. For example, Wang et al. (2009) proposed to solve multi-label music genre classification with a hyper-graph based SVM. Oramas et al. (2017) explored how representation learning approaches for multi-label audio classification outperformed traditional handcrafted feature based approaches.

The previous studies have two limitations. First, they lack sufficient audio data, which limits their generalization ability. Second, most of them rest on the strong assumption that a piece of music should be assigned only one style. Different from these studies, we focus on using easily obtained reviews in conjunction with multi-label music style classification.

2.2 Multi-Label Classification

In contrast to traditional supervised learning, in multi-label learning, each music item is associated with a set of labels. Multi-label learning has gradually attracted attention, and has been widely applied to diverse problems, including image classification Qi et al. (2007); Wang et al. (2008), audio classification Boutell et al. (2004); Sanden and Zhang (2011), web mining Kazawa et al. (2004), information retrieval Zhu et al. (2005); Gopal and Yang (2010), etc. Compared to the existing multi-label learning methods Wei et al. (2018); Li et al. (2018b, a); Yang et al. (2018), our method has two novelties: a label graph that explicitly models the relations of styles, and a soft training mechanism that introduces correlation-based continuous label representation. To our knowledge, most of the existing studies of learning label representation only focus on single-label classification Hinton et al. (2015); Sun et al. (2017), and there is little research on multi-label learning.

3 Review-Driven Multi-Label Music Style Classification

3.1 Task Definition

Given several reviews of a piece of music, the task requires the model to predict a set of music styles. Assume that $X = \{x_1, \ldots, x_i, \ldots, x_K\}$ denotes the $K$ input reviews, and $x_i = (x_{i,1}, \ldots, x_{i,J})$ represents the $i$-th review with $J$ words. The term $Y = \{y_1, y_2, \ldots, y_M\}$ denotes the gold set with $M$ labels, and $M$ varies in different samples. The target of review-driven multi-label music style classification is to learn the mapping from input reviews to style labels.

3.2 Dataset

We construct a dataset consisting of 7172 samples. The dataset is collected from a popular Chinese music review website (Footnote 3: https://music.douban.com), where registered users are allowed to comment on all released music albums.

The dataset contains 5020, 646, and 1506 samples for training, validation, and testing respectively. We define an album as a data sample. In total, the dataset contains over 287K reviews and over 3.6M words. 22 styles are found in the dataset. (Footnote 4: The styles include: Alternative Music, Britpop, Classical Music, Country Music, Dark Wave, Electronic Music, Folk Music, Heavy Metal Music, Hip-Hop, Independent Music, Jazz, J-Pop, New-Age Music, OST, Piano Music, Pop, Post-Punk, Post-Rock, Punk, R&B, Rock, and Soul Music.) Each sample is labeled with 2 to 5 style types. Each sample includes the title of an album, a set of human annotated styles, and associated user reviews sorted by time. An example is shown in Table 1. On average, each sample contains 2.2 styles and 40 reviews, and each review has 12.4 words.

Figure 1: An illustration of the proposed approach. Left: The label-graph based neural network. Right: The soft training method. The label graph defines the relations of labels. $e'$ is the output label probability distribution. Soft training means that we combine the continuous label representation and the discrete label representation together to train the model. The hierarchical attention layer is responsible for extracting style-related information. The label graph layer and soft training are used for exploiting label correlations.

4 Proposed Approach

In this section, we introduce our proposed approach in detail. An overview is presented in Section 4.1. The details are explained in Section 4.2 and Section 4.3.

4.1 Overview

The proposed approach contains two parts: a label-graph based neural network and a soft training mechanism with continuous label representation. An illustration of the proposed method is shown in Figure 1.

The label-graph based neural network outputs a label probability distribution based on two kinds of information: reviews and label correlations. First, a hierarchical attention layer produces a music representation $z$ by using a two-level attention mechanism to extract style-related information from reviews. Second, we transform $z$ into a “raw” label probability distribution $e$ via a sigmoid function. Third, a label graph layer outputs the final label probability distribution $e'$ by multiplying the “raw” label representation with a label graph that explicitly models the relations of labels. Due to noisy reviews, the model sometimes cannot extract all necessary information needed for a correct prediction. The label correlations can be viewed as supplementary information to refine the label probability distribution. For example, the low probability of a true label will be increased if the label is heavily linked with other high probability labels. With the label correlation information, the model can better handle multi-label music style classification, where there are complicated correlations among music styles.

Typically, the model is trained with the cross entropy between the discrete label representation $y$ and the predicted label probability distribution $e'$. However, we find it hard for the model to learn style correlations because the discrete label representation does not explicitly contain style relations. For example, for a true label set [Soul Music], the discrete label representation assigns Soul Music the value of 1 while its related styles, R&B and Jazz, get the value of 0. Such a discrete distribution does not encourage the model to learn the relation between Soul Music and its related styles. To better learn label correlations, a continuous label representation that involves label relations is desired as the training target. Therefore, we propose a soft training method that combines the traditional discrete label representation $y$ and the continuous label representation $y'$.

We first propose to use the learned label graph $G$ to transform the discrete representation $y$ into a continuous form. The motivation is that, in a well-trained label graph, the values should reflect label relations to a certain extent. Two highly related labels should get a high relation value, and two independent labels should get a low relation value. However, in practice, we find that for each label, the relation value with itself is too large and the relation value with other labels is too small, e.g., [0.95, 0.017, 0.003]. As a result, the generated label representation lacks sufficient label correlation information. Therefore, to enlarge the label correlation information in the generated label representation, we propose a smoothing method that punishes the high relation values and rewards the low relation values in $G$. The method applies a softmax function with a temperature $\tau$ on $G$ to get a softer label graph $G'$, and uses $G'$ to transform $y$ into a softer label representation.

For ease of understanding, we introduce our approach from the following two aspects: one for extracting music representation from reviews, the other for exploiting label correlations.

4.2 Hierarchical Attention Layer for Extracting Music Representation

This layer takes a set of reviews from the same sample as input, and outputs a music representation $z$. Considering that the dataset is built upon a hierarchical structure where each sample has multiple reviews and each review contains multiple words, we propose a hierarchical network to collect style-related information from reviews.

We first build review representations via a Bi-directional Long Short-Term Memory network (Bi-LSTM) and then aggregate these review representations into the music representation. The aggregation process also adopts a Bi-LSTM structure that takes the sequence of review representations as input. Second, it is observed that different words and reviews are differently informative. Motivated by this fact, we introduce a two-level attention mechanism Bahdanau et al. (2014): one at the word level and the other at the review level. It lets the model pay more or less attention to individual words and sentences when constructing the music representation $z$.
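The two-level pooling can be illustrated with a small plain-Python sketch. It covers only the attention-weighted aggregation and leaves out the Bi-LSTM encoders. The dot-product scoring, the context vectors `word_context` and `review_context`, and the toy inputs are assumptions made for illustration; they are not details given in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def attention_pool(vectors, context):
    """Sum the vectors, weighted by softmax(dot(vector, context))."""
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def music_representation(reviews, word_context, review_context):
    """reviews: list of reviews, each a list of (already encoded) word vectors."""
    # Word level: one vector per review.
    review_vectors = [attention_pool(words, word_context)[0] for words in reviews]
    # Review level: one vector per music item.
    return attention_pool(review_vectors, review_context)


# Toy input: two reviews with 2-dimensional word vectors (hypothetical values).
reviews = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
z, review_weights = music_representation(reviews, [1.0, 0.0], [0.0, 1.0])
```

In a full model the word vectors and review vectors would come from the two Bi-LSTM encoders, and the context vectors would be learned parameters.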

4.3 Label Correlation Mechanism

4.3.1 Label Graph Layer

To explicitly take advantage of the label correlations when classifying music styles, we add a label graph layer to the network. This layer takes a music representation $z$ as input and generates a label probability distribution $e'$.

First, given an input $z$, we use a sigmoid function to produce a “raw” label probability distribution $e$ as

$$e = \mathrm{sigmoid}(f(z)) \qquad (1)$$

where $f$ is a feed-forward network.

Formally, we denote $G \in \mathbb{R}^{m \times m}$ as the label graph, where $m$ is the number of labels in the dataset. $G$ is initialized by an identity matrix. An element $G_{ij}$ is a real-valued score indicating how likely the label $i$ and the label $j$ are related in the training data. The graph $G$ is a part of the parameters and can be learned by back-propagation.

Then, given the “raw” label probability distribution $e$ and the label graph $G$, the output of this layer is:

$$e' = e \cdot G \qquad (2)$$

Therefore, the probability of each label is determined not only by the current reviews, but also by its relations with all other labels. The label correlations can be viewed as supplementary information to refine the label probability distribution.
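A minimal plain-Python sketch of this layer follows. It treats the raw distribution as a row vector multiplied by the label graph. The three labels, the logits, and the graph values are hypothetical. The source does not state how the refined scores are kept within [0, 1], so this sketch applies no normalization.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def label_graph_layer(logits, graph):
    """Return (e, e_prime): raw probabilities and graph-refined scores.

    e_prime[j] = sum_i e[i] * graph[i][j], i.e. a row vector times the graph.
    """
    e = [sigmoid(v) for v in logits]
    m = len(e)
    e_prime = [sum(e[i] * graph[i][j] for i in range(m)) for j in range(m)]
    return e, e_prime


# Labels (hypothetical order): 0 = Soul Music, 1 = R&B, 2 = Jazz.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical learned graph: R&B and Jazz both lend support to Soul Music.
graph = [
    [1.0, 0.0, 0.0],
    [0.3, 1.0, 0.0],
    [0.3, 0.0, 1.0],
]
logits = [-0.5, 2.0, 1.5]  # assumed outputs of the feed-forward network
e, e_prime = label_graph_layer(logits, graph)
```

With the identity graph used at initialization, the layer leaves the raw distribution unchanged. With the hypothetical graph, the low raw score of Soul Music is raised because R&B and Jazz both score highly.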

4.3.2 Soft Training

Given a predicted label probability distribution $e'$ and a target discrete label representation $y$, the typical loss function is computed as

$$\mathrm{Loss}(\theta) = H(e', y) = \sum_{i=1}^{m} H(e'_i, y_i) \qquad (3)$$

where $\theta$ denotes all parameters, and $m$ is the number of the labels. The function $H$ denotes the cross entropy between two distributions.

However, the widely used discrete label representation does not apply to the task of music style classification, because the music styles are not mutually exclusive and highly related to each other. The discrete distribution without label relations makes the model over-distinguish the related labels. Therefore, it is hard for the model to learn the label correlations that are useful knowledge.

Instead, we propose a soft training method by combining a discrete label representation $y$ with a correlation-based continuous label representation $y'$. The probability values of $y'$ should be able to tell which labels are correct, and the probability gap between two similar labels in $y'$ should not be large. With the combination of $y$ and $y'$ as the training target, the classification model is no longer required to distinguish styles completely and can have a soft classification boundary.

A straightforward approach to produce the continuous label representation is to use the label graph matrix $G$ to transform the discrete representation $y$ into a continuous form:

$$y_c = y \cdot G \qquad (4)$$

We expect that the values in a well-learned label graph should reflect the degree of label correlations. However, in practice, we find that for each label, the relation value with itself is too large and the relation value with other labels is too small. As a result, the generated label representation lacks sufficient label correlation information. Therefore, to enlarge the label correlation information in $y_c$, we propose a smoothing method that punishes the high relation values and rewards the low relation values in $G$. We apply a softmax function with a temperature $\tau$ on $G$ to get a softer $G'$ as

$$G'_{ij} = \frac{\exp(G_{ij} / \tau)}{\sum_{k=1}^{N} \exp(G_{ik} / \tau)} \qquad (5)$$

where $N$ is the dimension of each column in $G$. This transformation keeps the relative ordering of relation values unchanged, but with a much smaller range. A higher temperature makes the steep distribution softer. Then, the desired continuous representation $y'$ is defined as

$$y' = y \cdot G' \qquad (6)$$
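The smoothing step can be sketched in plain Python as follows. The graph values are hypothetical and reuse the example row [0.95, 0.017, 0.003] from the overview. Normalizing over each row is an assumption of this sketch; the temperature of 3 is the value reported in the training details.

```python
import math


def soften_graph(graph, tau):
    """Apply a softmax with temperature tau to each row of the label graph."""
    softened = []
    for row in graph:
        peak = max(row)  # subtracting the max keeps exp() numerically stable
        exps = [math.exp((v - peak) / tau) for v in row]
        total = sum(exps)
        softened.append([v / total for v in exps])
    return softened


def continuous_labels(y, graph):
    """Row vector y times the (softened) graph."""
    m = len(y)
    return [sum(y[i] * graph[i][j] for i in range(m)) for j in range(m)]


# Hypothetical learned graph with a dominant diagonal.
graph = [
    [0.95, 0.017, 0.003],
    [0.017, 0.95, 0.003],
    [0.003, 0.003, 0.95],
]
soft_graph = soften_graph(graph, tau=3.0)
y_soft = continuous_labels([1.0, 0.0, 0.0], soft_graph)
```

Each softened row keeps the ordering of the original relation values while the gap between the largest and the smaller entries shrinks, which is the effect the temperature is meant to have.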

Finally, we define the loss function as

$$\mathrm{Loss}(\theta) = H(e', y) + H(e', y') \qquad (7)$$

where the loss $H(e', y)$ aims to correctly classify labels, and the loss $H(e', y')$ aims to avoid the over-distinguishing problem and to better learn label correlations.

With the new objective, the model understands not only which labels are correct, but also the correlations of labels. With such soft training, the model is no longer required to distinguish the labels completely because a soft classification boundary is allowed.
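The effect of the combined objective can be illustrated with a small sketch. It assumes that the cross entropy is computed independently per label (binary cross entropy), which the text does not spell out. The soft target `y_soft` is a hypothetical value, and the predictions reuse the Soul Music / R&B example from the introduction.

```python
import math


def cross_entropy(pred, target, eps=1e-7):
    """Per-label binary cross entropy, summed over labels (an assumed form of H)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def soft_training_loss(pred, y, y_soft):
    """Discrete-target term plus continuous-target term."""
    return cross_entropy(pred, y) + cross_entropy(pred, y_soft)


pred = [0.8, 0.3]    # predicted probabilities for Soul Music and R&B
y = [1.0, 0.0]       # discrete target: only Soul Music is a gold label
y_soft = [0.7, 0.3]  # hypothetical continuous target that keeps some mass on R&B

hard_loss = cross_entropy(pred, y)
combined_loss = soft_training_loss(pred, y, y_soft)
```

Under the discrete target alone, the loss keeps decreasing as the prediction moves toward [1, 0]. The continuous-target term is smallest when the prediction matches `y_soft`, so it pulls against fully separating the two related labels.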

5 Experiment

In this section, we evaluate our approach on the proposed dataset. We first introduce the baselines, the training details, and the evaluation metrics. Then, we show the experimental results and provide the detailed analysis.

5.1 Baselines

We first implement the following widely-used multi-label classification methods for comparison. Their inputs are the music representations which are produced by averaging word embeddings and review representations at the word level and review level respectively.

  • ML-KNN Zhang and Zhou (2007): It is a multi-label learning approach derived from the traditional K-Nearest Neighbor (KNN) algorithm.

  • Binary Relevance Tsoumakas et al. (2010): It decomposes a multi-label learning task into a number of independent binary learning tasks (one per class label). It learns several single binary models without considering the dependences among labels.

  • Classifier Chains Read et al. (2011): It takes label dependencies into account and keeps the computational efficiency of the binary relevance method.

  • Label Powerset Tsoumakas and Vlahavas (2007): All classes assigned to an example are combined into a new and unique class in this method.

  • MLP: It feeds the music representations into a multilayer perceptron and generates the probability of music styles through a sigmoid layer.
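The difference between two of the problem-transformation baselines above can be sketched as follows: Binary Relevance builds one independent binary target per label, while Label Powerset treats every distinct label set as a single class. This is a plain-Python illustration with hypothetical samples, not the implementation used in the experiments.

```python
def binary_relevance_targets(label_sets, labels):
    """One independent 0/1 target list per label."""
    return {label: [1 if label in s else 0 for s in label_sets] for label in labels}


def label_powerset_targets(label_sets):
    """Map each distinct label set to one class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


# Hypothetical gold label sets for three samples.
label_sets = [{"Rock", "Britpop"}, {"Pop"}, {"Britpop", "Rock"}]
labels = ["Britpop", "Pop", "Rock"]

br_targets = binary_relevance_targets(label_sets, labels)
lp_targets, lp_classes = label_powerset_targets(label_sets)
```

Binary Relevance discards co-occurrence information because each label is learned separately, whereas Label Powerset keeps it at the cost of one class per observed label combination.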

Different from the above baselines, the following two directly process word embeddings. Similar to MLP, they produce label probability distribution by a feed-forward network and a sigmoid function.

  • CNN: It consists of two CNN layers with multiple convolution kernels, which take the word embeddings as input and produce the music representations.

  • LSTM: It consists of two layers of LSTM, which processes words and sentences separately to get the music representations.

5.2 Training Details

The features we use for the baselines and the proposed method are the pre-trained word embeddings of reviews. For evaluation, we introduce a threshold hyper-parameter $p$, and a label will be considered a music style of the song if its probability is greater than $p$. We tune hyper-parameters based on the performance on the validation set. We set the temperature $\tau$ in soft training to 3, $p$ to 0.2, hidden size to 128, embedding size to 128, vocabulary size to 135K, learning rate to 0.001, and batch size to 128. The optimizer is Adam Kingma and Ba (2014), and the maximum number of training epochs is set to 100. We choose parameters with the best performance on the validation set and then use the selected parameters to predict results on the test set.
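The decoding rule used at evaluation time can be written as a short sketch. The function name `decode_styles` and the example probabilities are hypothetical; the threshold of 0.2 is the value reported above.

```python
def decode_styles(probabilities, labels, threshold=0.2):
    """Keep every label whose probability is strictly greater than the threshold."""
    return [label for label, p in zip(labels, probabilities) if p > threshold]


labels = ["Soul Music", "R&B", "Jazz"]
predicted = decode_styles([0.8, 0.3, 0.1], labels)
```

Because each label is thresholded independently, the number of predicted styles can differ from sample to sample, which matches the multi-label setting.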

5.3 Evaluation Metrics

Multi-label classification requires different evaluation metrics from traditional single-label classification. In this paper, we use the following widely-used evaluation metrics.

  • F1-score: We calculate the micro F1 and macro F1, respectively. Macro F1 computes the metric independently for each label and then takes the average, whereas micro F1 aggregates the contributions of all labels to compute the average metric.

  • One-Error: One-error evaluates the fraction of examples whose top-ranked label is not in the gold label set.

  • Hamming Loss: Hamming loss counts the fraction of the wrong labels to the total number of labels.
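The metrics above can be sketched in plain Python. Gold and predicted label sets are represented as Python sets, and scores as dictionaries from label to probability. The toy data is hypothetical. The functions return fractions, whereas the result tables report one-error and F1 as percentages.

```python
def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, labels):
    """Return (micro F1, macro F1) over lists of gold and predicted label sets."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    micro = f1(*[sum(c[k] for c in counts.values()) for k in range(3)])
    return micro, macro


def one_error(gold, scores):
    """Fraction of samples whose top-scored label is not a gold label."""
    misses = sum(1 for g, s in zip(gold, scores) if max(s, key=s.get) not in g)
    return misses / len(gold)


def hamming_loss(gold, pred, labels):
    """Fraction of label decisions that are wrong, over all samples and labels."""
    wrong = sum(len(g ^ p) for g, p in zip(gold, pred))
    return wrong / (len(gold) * len(labels))


labels = ["Pop", "Rock", "Jazz"]
gold = [{"Pop", "Rock"}, {"Jazz"}]
pred = [{"Pop"}, {"Jazz", "Rock"}]
scores = [
    {"Pop": 0.9, "Rock": 0.1, "Jazz": 0.0},
    {"Pop": 0.1, "Rock": 0.6, "Jazz": 0.5},
]

micro, macro = micro_macro_f1(gold, pred, labels)
oe = one_error(gold, scores)
hl = hamming_loss(gold, pred, labels)
```

In the toy data, the rare mistakes on Rock lower macro F1 and micro F1 by the same amount only because every label has similar support; on imbalanced data the two averages can diverge widely.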

5.4 Experimental Results

Models | OE(-) | HL(-) | Macro F1(+) | Micro F1(+)
--- | --- | --- | --- | ---
ML-KNN | 77.3 | 0.094 | 23.6 | 38.1
Binary Relevance | 74.4 | 0.083 | 24.7 | 41.8
Classifier Chains | 67.5 | 0.107 | 29.9 | 44.3
Label Powerset | 56.2 | 0.096 | 37.7 | 50.3
MLP | 71.5 | 0.081 | 29.8 | 45.8
CNN | 37.9 | 0.099 | 32.5 | 49.3
LSTM | 30.5 | 0.089 | 33.0 | 53.9
HAN (Proposal) | 25.9 | 0.079 | 52.1 | 61.0
+LCM (Proposal) | 22.6 | 0.074 | 54.4 | 64.5
Table 2: The comparisons between our approach and the baselines on the test set. OE and HL denote one-error and hamming loss respectively; HAN and LCM denote the hierarchical attention network and the label correlation mechanism respectively. “+” represents that higher scores are better and “-” represents that lower scores are better. It can be seen that the proposed approach significantly outperforms the baselines.

We evaluate our approach and the baselines on the test set. The results are summarized in Table 2. The proposed approach outperforms all baselines, with a micro F1 of 64.5, a macro F1 of 54.4, and a one-error of 22.6. Relative to the LSTM baseline, these are improvements of 10.6, 21.4, and 7.9 points respectively. The improvement is attributed to two parts, a hierarchical attention network and a label correlation mechanism. Using only the hierarchical attention network already outperforms the baselines, which shows the effectiveness of hierarchically paying attention to different words and sentences. A higher F1-score is achieved by adding the proposed label correlation mechanism, which shows the contribution of exploiting label correlations. In particular, the micro F1 is improved from 61.0 to 64.5, and the macro F1 is improved from 52.1 to 54.4.
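The reported gains can be checked against the rows of Table 2 with a few lines of arithmetic. The dictionaries below simply copy the LSTM, Label Powerset, and +LCM values from the table.

```python
# Values copied from Table 2 (percentages).
lstm = {"micro_f1": 53.9, "macro_f1": 33.0, "one_error": 30.5}
full = {"micro_f1": 64.5, "macro_f1": 54.4, "one_error": 22.6}
label_powerset_macro_f1 = 37.7

gains_over_lstm = {
    "micro_f1": round(full["micro_f1"] - lstm["micro_f1"], 1),
    "macro_f1": round(full["macro_f1"] - lstm["macro_f1"], 1),
    # One-error is "lower is better", so the gain is the reduction.
    "one_error": round(lstm["one_error"] - full["one_error"], 1),
}
macro_gain_over_label_powerset = round(full["macro_f1"] - label_powerset_macro_f1, 1)
```

The three gains match the figures quoted in the text when LSTM is taken as the reference. Label Powerset has the highest macro F1 among the baselines, so the macro F1 gain is smaller when measured against it.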

The results of baselines also reveal the usefulness of label correlations for improving the performance. ML-KNN and Binary Relevance, which over-simplify multi-label classification and neglect the label correlations, achieve the worst results. In contrast, Classifier Chains and Label Powerset, which take label correlations into account, get much better results. Though without explicitly taking advantage of label correlations, the neural baselines, MLP, CNN, and LSTM, still achieve better results, due to the strong learning ability of neural networks.

5.5 Incremental Analysis

Models | OE(-) | HL(-) | Macro F1(+) | Micro F1(+)
--- | --- | --- | --- | ---
HAN | 25.9 | 0.079 | 52.1 | 61.0
+LG | 23.4 | 0.077 | 54.2 | 62.8
+ST | 22.6 | 0.074 | 54.4 | 64.5
Table 3: Performance of key components in the proposed approach. LG and ST denote the label graph layer and the soft training.

In this section, we conduct a series of experiments to evaluate the contributions of our key components. The results are shown in Table 3.

The method with the label graph does not achieve the expected improvements. This indicates that, despite explicitly modeling the label correlations, the label graph alone does not play the expected role. It supports our assumption that the traditional training method with discrete label representation makes the model over-distinguish the related labels, and thus does not learn label correlations well. To solve this problem, we propose a soft training method with a continuous label representation $y'$ that takes label correlations into account. It can be clearly seen that with the help of soft training, the proposed method achieves the best performance. In particular, the micro F1 is improved from 62.8 to 64.5, and the one-error is reduced from 23.4 to 22.6. With the new loss function, the model not only knows how to distinguish the right labels from the wrong ones, but can also learn the label correlations that are useful knowledge, especially when the input data contains too many style-unrelated words for the model to extract all necessary information.

Ground Truth | Without LCM | With LCM
--- | --- | ---
Britpop, Rock | Britpop | Britpop, Rock
Hip-Hop, Pop, R&B | Electronic Music, Pop | Pop, R&B
Pop, R&B | Pop, Rock, Britpop | Pop, R&B
Country Music, Folk, Pop | Country Music, Pop | Country Music, Pop, Folk
Classical Music, New-Age Music, Piano Music | Piano Music, Classical Music | Piano Music, New-Age Music, Classical Music

Footnote 5: Britpop is a style of British Rock.
Footnote 6: Hip-Hop is a mainstream Pop style.
Footnote 7: Rhythm and Blues, often abbreviated as R&B, is a genre of popular music.
Footnote 8: New-Age Music is a genre of music intended to create artistic inspiration, relaxation, and optimism. It is used by listeners for yoga, massage, and meditation.
Table 4: Examples generated by the methods with and without the label correlation mechanism. The labels correctly predicted by two methods are shown in blue. The labels correctly predicted by the method with the label correlation mechanism are shown in orange. We can see that the method with the label correlation mechanism classifies music styles more precisely.

For clearer understanding, we compare several examples generated with and without the label correlation mechanism in Table 4. By comparing gold labels and predicted labels generated by different methods, we find that the proposed label correlation mechanism identifies the related styles more precisely. This is mainly attributed to the learned label correlations. For example, the correct prediction in the first example shows that the label correlation mechanism captures the close relation between “Britpop” and “Rock”, which helps the model to generate a more appropriate prediction.

5.6 Visualization Analysis

Since we do not have enough space to show the whole heatmap of all 22 labels, we randomly select part of the heatmap to visualize the learned label graph. Figure 2 shows that some obvious music style relations are well captured. For “Country Music”, the most related label is “Folk Music”. In reality, these two music styles are highly similar and the boundary between them is not well-defined. For three kinds of rock music, “Heavy Metal Music”, “Britpop Music”, and “Alternative Music”, the label graph correctly captures that the most related label for them is “Rock”. For a more complicated relation where “Soul Music” is highly linked with two different labels, “R&B” and “Jazz”, the label graph also correctly captures this relation. These examples demonstrate that the proposed approach performs well in capturing relations among music styles.

Figure 2: The heatmap generated by the learned label graph. The deeper color represents the closer relation. For space, we abbreviate some music style names. We can see that some obvious relations are well captured by the model, e.g., “Heavy Metal Music (Metal)” and “Rock”, “Country Music (Country)” and “Folk”.

5.7 Error Analysis

Although the proposed method has achieved significant improvements, we also notice that there are some failure cases. In this section, we give the detailed error analysis.

First, the proposed method performs worse on the styles with low frequency in the training set. Table 5 compares the performance on the 5 most frequent and the 5 least frequent music styles. As we can see, the 5 least frequent styles get much worse results than the 5 most frequent ones. This is because the label distribution is highly imbalanced, and unpopular music styles have too little training data. For future work, we plan to explore various methods to handle this problem, for example, re-sampling the original data to provide balanced labels.

Second, we find that some music items are wrongly classified into styles that are similar to the gold styles. For example, a sample with a gold set [Country Music] is wrongly classified into [Folk] by the model. The reason is that some music styles share many common elements and only subtly differ from each other, which makes them very hard for the model to distinguish. For future work, we would like to research how to effectively address this problem.

Most Styles | % of Samples | F1
--- | --- | ---
Rock | 30.4 | 75.8
Independent Music | 30.0 | 64.8
Pop | 26.2 | 67.1
Folk Music | 21.9 | 73.7
Electronic Music | 13.9 | 61.8

Least Styles | % of Samples | F1
--- | --- | ---
Jazz | 4.3 | 37.5
Heavy Metal Music | 3.9 | 55.6
Hip-Hop | 3.1 | 7.5
Post-punk | 2.5 | 17.1
Dark Wave | 1.3 | 17.4
Table 5: The performance of the proposed method on most and fewest styles.

6 Conclusions

In this paper, we focus on classifying multi-label music styles with user reviews. To meet the challenge of complicated style relations, we propose a label-graph based neural network and a soft training mechanism. Experimental results show that our proposed approach significantly outperforms the baselines. In particular, the micro F1 is improved from 53.9 to 64.5, and the one-error is reduced from 30.5 to 22.6. Furthermore, the visualization of the label graph also shows that our method performs well in capturing label correlations.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
  • Boutell et al. (2004) Matthew R. Boutell, Jiebo Luo, Xipeng Shen, and Christopher M. Brown. 2004. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771.
  • Chai and Vercoe (2001) Wei Chai and Barry Vercoe. 2001. Folk music classification using hidden markov models. In Proceedings of International Conference on Artificial Intelligence, volume 6.
  • Choi et al. (2017) Keunwoo Choi, György Fazekas, Mark Sandler, and Kyunghyun Cho. 2017. Convolutional recurrent neural networks for music classification. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 2392–2396.
  • Gopal and Yang (2010) Siddharth Gopal and Yiming Yang. 2010. Multilabel classification with meta-level features. In Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland, July 19-23, 2010, pages 315–322.
  • Hinton et al. (2015) Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the knowledge in a neural network. CoRR, abs/1503.02531.
  • Kazawa et al. (2004) Hideto Kazawa, Tomonori Izumitani, Hirotoshi Taira, and Eisaku Maeda. 2004. Maximal margin labeling for multi-topic text categorization. In Advances in Neural Information Processing Systems 17 [Neural Information Processing Systems, NIPS 2004, December 13-18, 2004, Vancouver, British Columbia, Canada], pages 649–656.
  • Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
  • Li et al. (2018a) Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang, and Xu Sun. 2018a. Sememe prediction: Learning semantic knowledge from unstructured textual wiki descriptions. CoRR, abs/1808.05437.
  • Li et al. (2018b) Wei Li, Zheng Yang, and Xu Sun. 2018b. Exploration on generating traditional chinese medicine prescription from symptoms with an end-to-end method. CoRR, abs/1801.09030.
  • Medhat et al. (2017) Fady Medhat, David Chesmore, and John Robinson. 2017. Automatic classification of music genre using masked conditional neural networks. In Data Mining (ICDM), 2017 IEEE International Conference on, pages 979–984. IEEE.
  • Oramas et al. (2016) Sergio Oramas, Luis Espinosa Anke, Aonghus Lawlor, Xavier Serra, and Horacio Saggion. 2016. Exploring customer reviews for music genre classification and evolutionary studies. In Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016, New York City, United States, August 7-11, 2016, pages 150–156.
  • Oramas et al. (2017) Sergio Oramas, Oriol Nieto, Francesco Barbieri, and Xavier Serra. 2017. Multi-label music genre classification from audio, text, and images using deep features. arXiv preprint arXiv:1707.04916.
  • Pikrakis et al. (2006) Aggelos Pikrakis, Sergios Theodoridis, and Dimitris Kamarotos. 2006. Classification of musical patterns using variable duration hidden markov models. IEEE Trans. Audio, Speech & Language Processing, 14(5):1795–1807.
  • Qi et al. (2007) Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei, and Hong-Jiang Zhang. 2007. Correlative multi-label video annotation. In Proceedings of the 15th International Conference on Multimedia 2007, Augsburg, Germany, September 24-29, 2007, pages 17–26.
  • Qin and Ma (2005) Dan Qin and GZ Ma. 2005. Music style identification system based on mining technology. Computer Engineering and Design, 26:3094–3096.
  • Read et al. (2011) Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. 2011. Classifier chains for multi-label classification. Machine learning, 85(3):333.
  • Sanden and Zhang (2011) Chris Sanden and John Z. Zhang. 2011. Enhancing multi-label music genre classification through ensemble techniques. In Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011, pages 705–714.
  • Sun et al. (2017) Xu Sun, Bingzhen Wei, Xuancheng Ren, and Shuming Ma. 2017. Label embedding network: Learning label representation for soft training of deep networks. CoRR, abs/1710.10393.
  • Tsoumakas et al. (2010) Grigorios Tsoumakas, Ioannis Katakis, and Ioannis P. Vlahavas. 2010. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, 2nd ed., pages 667–685.
  • Tsoumakas and Vlahavas (2007) Grigorios Tsoumakas and Ioannis P. Vlahavas. 2007. Random k-labelsets: An ensemble method for multilabel classification. In Machine Learning: ECML 2007, 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007, Proceedings, pages 406–417.
  • Tzanetakis and Cook (2002) George Tzanetakis and Perry Cook. 2002. Musical genre classification of audio signals. IEEE Transactions on speech and audio processing, 10(5):293–302.
  • Wang et al. (2009) Fei Wang, Xin Wang, Bo Shao, Tao Li, and Mitsunori Ogihara. 2009. Tag integrated multi-label music style classification with hypergraph. In Proceedings of the 10th International Society for Music Information Retrieval Conference, ISMIR 2009, Kobe International Conference Center, Kobe, Japan, October 26-30, 2009, pages 363–368.
  • Wang et al. (2008) Mei Wang, Xiangdong Zhou, and Tat-Seng Chua. 2008. Automatic image annotation via local multi-label classification. In Proceedings of the 7th ACM International Conference on Image and Video Retrieval, CIVR 2008, Niagara Falls, Canada, July 7-9, 2008, pages 17–26.
  • Wei et al. (2018) Bingzhen Wei, Xuancheng Ren, Xu Sun, Yi Zhang, Xiaoyan Cai, and Qi Su. 2018. Regularizing output distribution of abstractive chinese social media text summarization for improved semantic consistency. CoRR, abs/1805.04033.
  • Xu et al. (2003) Changsheng Xu, Namunu C Maddage, Xi Shao, Fang Cao, and Qi Tian. 2003. Musical genre classification using support vector machines. In Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP’03). 2003 IEEE International Conference on, volume 5, pages V–429.
  • Yang et al. (2018) Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: sequence generation model for multi-label classification. In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018, pages 3915–3926.
  • Zhang and Zhou (2007) Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048.
  • Zhou et al. (2006) Yatong Zhou, Taiyi Zhang, and Jiancheng Sun. 2006. Music style classification with a novel bayesian model. In Advanced Data Mining and Applications, Second International Conference, ADMA 2006, Xi’an, China, August 14-16, 2006, Proceedings, pages 150–156.
  • Zhu et al. (2005) Shenghuo Zhu, Xiang Ji, Wei Xu, and Yihong Gong. 2005. Multi-labelled classification using maximum entropy method. In SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15-19, 2005, pages 274–281.