Music Plagiarism Detection via Bipartite Graph Matching

by   Tianyao He, et al.
Shanghai Jiao Tong University

Nowadays, with the prevalence of social media and music creation tools, musical pieces are spreading much quickly, and music creation is getting much easier. The increasing number of musical pieces have made the problem of music plagiarism prominent. There is an urgent need for a tool that can detect music plagiarism automatically. Researchers have proposed various methods to extract low-level and high-level features of music and compute their similarities. However, low-level features such as cepstrum coefficients have weak relation with the copyright protection of musical pieces. Existing algorithms considering high-level features fail to detect the case in which two musical pieces are not quite similar overall, but have some highly similar regions. This paper proposes a new method named MESMF, which innovatively converts the music plagiarism detection problem into the bipartite graph matching task. It can be solved via the maximum weight matching and edit distances model. We design several kinds of melody representations and the similarity computation methods according to the music theory. The proposed method can deal with the shift, swapping, transposition, and tempo variance problems in music plagiarism. It can also effectively pick out the local similar regions from two musical pieces with relatively low global similarity. We collect a new music plagiarism dataset from real legally-judged music plagiarism cases and conduct detailed ablation studies. Experimental results prove the excellent performance of the proposed algorithm. The source code and our dataset are available at



page 1

page 2

page 3

page 4


A dataset and classification model for Malay, Hindi, Tamil and Chinese music

In this paper we present a new dataset, with musical excepts from the th...

On Musical Onset Detection via the S-Transform

Musical onset detection is a key component in any beat tracking system. ...

Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

High-level musical qualities (such as emotion) are often abstract, subje...

A data-driven approach to mid-level perceptual musical feature modeling

Musical features and descriptors could be coarsely divided into three le...

Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders

On an artist's profile page, music streaming services frequently recomme...

Computing Information Quantity as Similarity Measure for Music Classification Task

This paper proposes a novel method that can replace compression-based di...

CorpusVis: Visual Analysis of Digital Sheet Music Collections

Manually investigating sheet music collections is challenging for music ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction and Related Work

Music plagiarism, which is the use of another work and trying to pass it off as one’s original work, has always been a controversial topic making headlines now and then. Today, the

Figure 1. The graph perspective of melody similarity. We segment the musical scores. Each segment represents a node in the bipartite graph. The edge between two nodes has a weight equals to the function of the edit distance. The similarity of two pieces can be calculated by computing the maximum weight matching.

the number of music documents available on the Internet is increasing rapidly. Each year, over 10,000 new albums of recorded music are released and over 100,000 new musical pieces are registered for copyright (Uitdenbogerd and Zobel, 1999). It is easy for common users to reach musical content anywhere and anytime on the Internet. While there also exist more opportunities for unintentional and intentional plagiarism.

Given the huge amount of money that music can generate, the number of lawsuits and revenue loss due to plagiarism and pirate copies has been escalating exponentially (De Prisco et al., 2016). However, there are no general rules that set a minimum number of similar notes or beats for music copyright infringement  (De Prisco et al., 2017a). When music plagiarism cases are brought to court, independent music experts will analyze the similarities between two songs and judge relying on their subjective opinion (Dittmar et al., 2012). A tool that can automatically detect the similarities between songs will assist the music expert in evaluating plagiarism quickly and effectively. Besides, with the development of automatic music generation technology, machines can compose music if trained with enormous existing music data (Briot and Pachet, 2020). It is also important for the machines to avoid plagiarism in generated music. Considering a large number of generated music and existing music data, there is an eager need for the automatic music plagiarism detection tool.

Though there are many kinds of plagiarism in music, such as sample plagiarism, melody plagiarism, rhythm plagiarism and so on (Dittmar et al., 2012), melody plagiarism is prominent in the accusation of plagiarism. Recognizing the similarities between melodic fragments is the basic ability of a music plagiarism detection system.

Figure 2. The flow chart of MESMF algorithm. Given two musical pieces in the midi form, we can extract their notes’ information and generate sequence A and sequence B, where each element in the sequence contains pitch, duration, and downbeat information. Then, we transform the sequences to relative form by subtraction of pitch and division of duration between neighboring elements. Next, we cut the sequence into several segments and treat them as nodes in the bipartite graph. We can perform the maximum weight matching algorithm to get the final similarity and matching results.

Over the past decades, there have been some research works focusing on melody plagiarism detection (De Prisco et al., 2017c, b). Some giant companies like Sony have attached their attention to this area (Pachet et al., 2006). Generally, the existing plagiarism detection methods can be categorized as audio-based methods and sheet-based methods according to the objects they are dealing with.

The audio-based methods inspect the music similarity by comparing the time-frequency representation of the music audio excerpts (Downie et al., 2008; Dittmar et al., 2012). These audio-based methods use features like cepstrum coefficients which have weak relation with the copyright protection of musical pieces. And the shift and swapping problem also exists. The whole song may be a copy of another one, but the notes’ order is shifted, and these algorithms cannot detect this situation.

The sheet-based method is to compare the similarity between music note sequences (Sie et al., ). This method aims to measure symbolic melodic similarity, which plays a crucial role in Music Information Retrieval (MIR) (Velardo et al., 2016). Symbolic melodic similarity evaluates the degree of similarity given several musical sequences as human listeners can do. The sheet-based method is intuitive and compact. However, computing of symbolic melodic similarity is non-trivial. It is much different from comparing text similarity. We need to consider the importance of downbeats, consonance, etc. We also should notice that two songs may appear to be completely different on the sheets, but this is achieved simply by changing the pitch and duration of the notes, or maybe one song is just a change in the structure of another, and so on. Some works like (Robine et al., 2007) directly formulate the similarity computation to a sequences comparison problem. They use edit-distance to obtain the minimum cost of transforming sequence to sequence . Their work shows good performance in short musical pieces comparison but shows really poor performance when considering cases like shift or swapping some periods of music. An example of this case is shown in Figure 3.

Figure 3. An example for melody shifts in music plagiarism. In this example, the first piece is “ABC”, while the second piece is “BCA”. This is a kind of shift commonly seen in music plagiarism. It is hard for general string alignment algorithms to detect, which motivates this paper.

The experiments in  (Downie et al., 2005)

demonstrate the limited results of these methods. There also exist algorithms relying on music representations based on n-grams techniques, which were adapted from string matching domain 

(Doraisamy and Rüger, 2003; Bainbridge et al., 2005). These methods are simple and involve counting the different items that the query and the potential result have in common. However, this technique only considers two results: matching and mismatching, in the counting process of subsequences, apparently ignoring that when music fragments are similar considering music theory and perception, the two do not need to match completely. Therefore, this method fails to show good results in experiments, which is shown in our experiment part. Besides, work (Müllensiefen and Pendzich, 2009) combines n-gram features with other similarity computation methods like Ukkonen measure, Sum Common measure, TF-IDF correlation, Tversky’s measure, and so on, which have shown good performance. Nevertheless, these measures are all based on the assumption that the number and frequency of common or different n-gram features are related to the overall similarity perception when comparing two musical pieces. When confronted with the situations that plagiarism exists in a relatively small part, this method does not make sense.

In this paper, we propose a novel algorithm for melody plagiarism detection. We innovatively treat the melodies as graphs with consecutive nodes, and the two melodies in comparison are converted into a bipartite graph. Then the maximum weight matching and edit distances algorithm are used to compute the maximum weight matching of this bipartite graph, which successfully deals with the problems of shift or swapping some periods of music. To better parse the melody into sequences, we explore several kinds of pitch and duration representations. We also propose some tricks based on music theory to improve the similarity computation. The system gives more attention to downbeats, consonance, being transposition invariant and tempo invariant, and so on.

Some works about music plagiarism have established datasets with different sources. Daniel  (Müllensiefen and Pendzich, 2009) selects 20 US-copyright cases ranging from 1970 to 2005 and obtain most of their MIDI files from website of copyright infringement project 111 Roberto’s work  (De Prisco et al., 2017c) uses the dataset of plagiarism cases in  (Cronin, 2020) which collects lawsuits before 2016 and is constructed by Columbia Law School. Other works on music similarity computation also use datasets from MIREX 222

which includes pairs of musical pieces with similar symbolic melodies. These datasets are not open sources and the songs they contain may are not relatively new. Having noticed their shortcomings and the small size of the public dataset of melody plagiarism detection, we collect more melody plagiarism data from the real lawsuits. These data will be made public for research use. Extensive experiments prove the superiority of our proposed method. The contribution of this paper is summarized as follows:

  • We propose to perform melody plagiarism detection via bipartite graph matching and we present an algorithm that is easy to implement. The proposed algorithm can cope with the tune transposition, notes shift and swapping problem in plagiarism. It also can pick out the local similar regions between two musical pieces with low global similarity. The computational complexity of the proposed algorithm is given in this paper.

  • We design the melody representations and the similarity computation method according to the music theory. Several kinds of pitch and duration representations are explored. The pitch variation, duration variation, note downbeats, and consonance are taken into consideration for the similarity computation.

  • We publish a new dataset. We extend the existing public datasets of melody plagiarism detection by collecting more data from real lawsuits. Extensive experiments are conducted on the extended dataset and the results prove the effectiveness of our proposed algorithm.

The rest of this paper is organized as follows. Section 2 presents the preliminary knowledge and demonstrates the proposed algorithm. In Section 3, we show some improvement tricks. The algorithm complexity is presented in Section 4. Section 5 shows the experiments and analysis. Finally, the paper is concluded in Section 6.

2. Proposed Algorithm

2.1. Preliminary

2.1.1. String Alignment Problems

Given two strings , , an alignment is an assignment of gaps to positions in , and in , so as to line up each letter in one sequence with either a letter or a gap in the other sequence. Of all the distance measures that compare string similarity, the most widely used one is the edit distance (Kruskall and Liberman, 1983). The edit operations most commonly considered are the deletion of a symbol, the insertion of a symbol, and the substitution of one symbol for another (Hall and Dowling, 1980; Hunt and Szymanski, 1977). The string alignment problem is a dual problem of the edit distance.

2.1.2. Maximum Weight Matching

The maximum weight matching problem is to find, in a weighted graph , a matching in which the sum of weights is maximized (Duan and Pettie, 2014). The traditional solutions to the maximum weight matching problem are the Hungarian Algorithm and Kuhn–Munkres algorithm (KM-Algorithm) (Kuhn, 1955). KM-Algorithm’s time complexity is to calculate the maximum weight of the matching where denotes the cardinality of the matching.

2.1.3. Melody Representation

According to Mongeau’s work (Mongeau and Sankoff, 1990)

, each monophonic musical piece can be transformed into a sequence of ordered pairs by representing every note as a pair. In their model, a pair is composed of two parts: the pitch and the duration of the corresponding note. We made some variations based on their model, and the melody shown in Figure

4 can be transformed to the sequence:

Here, the capital letters represent pitch information, and the number in the subscript indicates the length of the note in sixteenth notes.

Figure 4. Illustration of a musical piece.

At the same time, the work (Downie, 2003) proposes different alphabets of characters and sets of numbers to represent both the duration features and the pitch features of notes. In this section, we show some representation which we think are suitable for detecting music plagiarism.

  1. Pitch Representation

    The pitch feature of a musical piece can be expressed mainly in three ways: pitch contour, absolute pitch, and relative pitch. All of them have their advantages and we will give brief introductions to them separately.

    1. Pitch Contour
      The pitch contour describes the trend and variation between successive notes. For all the notes’ pitch features, they are simplified into only three values: Up, Down, and Same which depends on the relationship with the former note. The melody shown in Figure 4 can be transformed to:

      The benefit of this method is that the range of value is really small, which can largely reduce the time complexity and space complexity when computing similarity scores.

    2. Absolute Pitch
      The absolute pitch is the MIDI number of each note. The melody shown in Figure 4 can be represented as:

      To make the range of absolute pitch smaller and simpler, we convert the exact pitches to modulo-12 values. Also, we can consider the successive pitches’ variation by assigning the note positive value when melody moves up and negative value otherwise. In this way, the melody of Figure 4 can be represented as

      The advantage of this method is that the sequence describes the pitch character of the music without information loss.

    3. Relative Pitch
      Compared with the previous two methods, relative pitch computes the difference of successive notes (number of semitones). Then the melody in Figure 4 can be represented as:

      This method is robust to transposition, which means that increasing pitches of all the notes to the same degree will not influence the pitch sequence we extracted. This property has proved to be meaningful when detecting music plagiarism in our experiment.

  2. Duration Representation

    Similar to pitch representation, there are three main methods to represent the duration feature: duration contour, absolute duration, and relative duration.

    1. Duration Contour
      Similar to Pitch Contour, we use three values (Shorter (S), Same (s), Longer (L)) to describe the duration trend between successive notes. Then the duration sequence of example melody Figure 4 is:

    2. Absolute Duration
      In terms of the absolute duration, we also use the duration defined by MIDI notation as standard which indicates the length of the note in sixteenth notes:

    3. Relative Duration
      To make the sequence tempo invariant, we use the duration ratio to extract the successive notes’ duration relationship:

      The tempo invariance means that speeding up or slowing down the musical pieces will not influence the duration sequences we extract from them.

2.2. Problem Formulation

Given two songs and , we extract their melodic features and transform them into two sequences and where denotes the -th note of song , denotes the -th note of song and denote the number of notes of two songs. Every note is expressed as:

which is established as a tuple with three elements where denotes its pitch feature, denotes its duration feature and denotes whether this note is a downbeat. Based on the melodic sequences, we are expected to compute the plagiarism degree of the two songs and further find all the pairs of pieces from two songs and which show great plagiarism degree.

The algorithm based on the melodic similarity between two different songs consists of two parts. The first part is to extract the melodic features and generate the corresponding sequence, which we name as the Sequence of the Melodic Features (SMF). The second part is to calculate the Similarity score between two melodic representations which reflect the plagiarism degree. We treat the melody similarity computation as a graph matching problem. Then the calculation of similarity score is formulated as a maximum weight matching problem and can be solved based on edit distances. For short, we propose the MESMF (Maximum weight matching and Edit distances model applied on the Sequences of Melodic Features) method. This is a novel perspective and method to calculate the similarity between two melody sequences. These two parts will be introduced in detail in the next two sections. The procedure of our proposed method can be understood clearly through Figure 2.

2.3. Melody Similarity Computation

In the previous section, we transform two songs and into two sequences and by extracting the symbolic and melodic features. In this section, we will compute the similarity score.

We think of formulating the similarity calculation between two melodies as a maximum weight matching problem. According to the melodic sequences extracted, we establish a bipartite graph first and regard the value of the maximum weight matching as their similarity score. We will introduce the bipartite graph here. Figure 1 helps to understand this process.

2.3.1. Nodes Formalization

Firstly, we divide sequences and into pieces with the same length and overlapping rate which are two hyper-parameters and obtain two piece lists and . The hyper-parameter overlapping rate is used to avoid the situation that we cut an integral part into different pieces and reduce our detection performance.

Next, we formulate each piece in as a node in the left node-set and each piece in as a node in the right node-set .

2.3.2. Edges Formalization

For each node in and each node in , we construct an edge with value equal to the similarity score between the two corresponding pieces and which mean the -th piece of and the -th piece of .

The similarity score is computed based on the edit distance of these two sequences. Let us consider the three elementary operations that are usually used to compare musical sequences: substitution, insert and delete. Let be an edit operation, a cost is assigned to each edit operation as follows:

  • If substitutes (-th character of ) into (-th character of ), then

  • If deletes then

  • If inserts then

Then the edit-distance of sequences and can be calculated through dynamic programming:


Initially, we assign each operation with the same constant cost. In the next section, we will use many tools to optimize the cost by considering music theory.

We only obtain the distance of two pieces and it is necessary to transform the distance into similarity score through transformation function :


In terms of choosing function

, we have tried several different types of transformation functions like linear transformation

, inverse transformation and log-exponential transformation . Figure 5 illustrates the characteristics of the three transformation functions.

Figure 5. Piece similarity of various transformations.

Through experimenting and comparing the results, we find that the log-exponential function has the best performance. The log-exponential function’s optimality is reasonable because its slope decreases with the increase of distance, which means that when two pieces are different in a large degree, a little more difference will not strongly influence the similarity score. Although the inverse transformation function also has this property, it decreases too sharply when the distance is small.

2.3.3. Solve the Problem based on KM-Algorithm

To solve the maximum weighting problem formulated in previous parts, we use the Kuhn–Munkres algorithm to compute the maximum weight of matching and regard it as the similarity score of music and .

3. Further Improvement based on Music Theory

Substitution is the main edit operation that largely affects the performance of our music plagiarism algorithm. In our initial assumption, we assign every elementary operation the same constant cost. However, this assumption contradicts the real situation and ignores the music theory. For example, substituting a note with a note affects the melody much more than with a note . To improve the accuracy, we consider the music theory and come up with several optimization methods: considering pitch variation, duration variation, note downbeat as well as a consonance of the music.

3.1. Pitch Variation

In most cases, the larger we change the pitch of a note, the more we will affect the original music. Therefore, we consider the variation of the pitch by directly using the difference of two pitches as the pitch cost of the operation:


3.2. Duration Variation

When calculating the substitution cost between two notes, we can consider the variation of duration. The insertion or deletion of a half note may disturb more significantly a melody than the insertion or deletion of a sixteenth note. Therefore, we use a hyper-parameter to model the relative importance of note duration:


3.3. Note Downbeat

According to music theory, there are strong beat positions and weak beat positions in a musical piece. The notes in the strong beat positions are of more significance to the music compared with the notes in the weak beat positions. We consider this situation by giving the note in the strong beat position more substitution weight which is also a hyper-parameter:


3.4. Consonance

According to (Mongeau and Sankoff, 1990) and (Typke et al., 2004) work, the substitution cost may be correlated to the consonance interval. The substitution cost of note and based on the absolute pitch difference fits the consonance: the fifth and the third major or minor is the most consonant intervals in Western Music. Table 1 shows the associated scores.

Difference 0 1 2 3 4 5 6 rest
Cost 0 5.7 5.325 3.675 3.675 2.85 4.65 3.35
Table 1. Substitution cost based on absolute pitch difference according to consonance.

4. Algorithm Complexity Analysis

We suppose that has notes, has notes and we divide them into pieces with the same length and overlapping rate . Music is divided into pieces and music is divided into pieces. They need space

to store these pieces.

To calculate the edit distance between two pieces with the same length , we use dynamic programming whose time complexity is and space complexity is .

To construct the bipartite graph, we need to calculate the similarity score between every piece from and every piece from which needs time

and space

There are and nodes in the left and right set, respectively.

Finally, we use the KM algorithm to compute the maximum weight of matching, which needs time and space . To sum up, our algorithm needs time and space .

5. Experiments

For the lack of widely accepted evaluation in the field of music plagiarism detection, we propose a new dataset and experiment protocol, to rationally reflect the ability of our detection algorithm.

5.1. Dataset

We use two datasets for the experiment. The first dataset consists of 9 pairs of songs from Ping An Tech’s work333 and some well-known music plagiarism cases. In this paper, we propose the second dataset, which is composed of 20 pairs of songs, where each pair is legally judged as plagiarism by the court (Cronin, 2020). We construct this dataset through several steps: searching for music plagiarism lawsuits, finding the target songs, getting their corresponding midi files, extracting the melody, and transforming them to sequences (see Figure 6).

Figure 6. The establishment of our new dataset. The procedures include finding music, getting midi files, extracting melody, and transforming to sequences.

5.2. Experiment Setting

Our goal is to evaluate our algorithm’s ability to find out the music plagiarism and detect similar sub-sequences. In the experiment, we mix up the musical pieces in the dataset. Each time we pick out one of the songs and calculate its similarity with other songs. If the song in the same pair (the plagiarized one) ranks first among all the other songs, we regard our algorithm as correct in this evaluation.

Finally, we can calculate the complete accuracy of the algorithm. The procedures are shown in Figure 7.

Figure 7. The protocol of our experiment. We compare the similarity scores of the selected musical piece and the other pieces in the dataset. If the plagiarized one ranks first, we consider our algorithm gives an accurate result.

In the experiment, we focus on two factors: the average ranking index of the correct songs and the accuracy. We denote the accuracy and average index of dataset as Average Index and ACC, .

5.3. Evaluation Result

In the evaluation, we tune four hyper-parameters: piece length, overlap, duration weight and downbeat weight .

We first try different piece lengths (see Table 2), which means the number of notes. The average index and accuracy curve with different piece lengths are shown in Figure 8. We can find that the optimal piece length for dataset 1 is 7 and for dataset 2 is 5. A short piece cannot well represent the melody of music, while a long piece is not sensitive to similarity. A piece length of five or seven is about two to three bars in music, which explains why they are optimal.

Piece Length Average Index1 ACC1 Average Index2 ACC2
4 3.39 0.3889 4.39 0.7317
5 3.56 0.5 4.41 0.7560
6 2.16 0.6111 4.92 0.6341
7 1.11 0.9445 4.80 0.6585
8 1.83 0.7222 4.80 0.6098
9 3.0 0.4445 5.0 0.5610
10 3.28 0.5 5.19 0.5366
11 2.89 0.4444 6.19 0.4878
12 4.72 0.4444 6.41 0.4878
Table 2. Mean index and accuracy by various piece lengths.
Figure 8. Mean index and accuracy over piece lengths.

Then we tune the overlap rate (see Table 3). The average index and accuracy curve with different overlap rates are shown in Figure 9. We find that 0.3 and 0.8 are the best overlap rates for two datasets since they can best detect similarity and avoid meaningless repetition.

Overlap Rate Average Index1 ACC1 Average Index2 ACC2
0.0 2.50 0.6667 3.80 0.5122
0.1 2.50 0.6667 3.80 0.5122
0.2 1.61 0.7778 4.61 0.5854
0.3 1.11 0.9444 4.61 0.5854
0.4 1.11 0.9444 5.05 0.6829
0.5 2.50 0.6667 5.05 0.6829
0.6 1.44 0.7222 4.68 0.6341
0.7 1.44 0.7222 4.68 0.6341
0.8 1.5 0.8333 4.41 0.7561
0.9 1.17 0.8333 4.41 0.7561
Table 3. Mean index and accuracy over overlap rates.
Figure 9. Mean index and accuracy over overlap rates.
Average Index1 ACC1 Average Index2 ACC2
0.0 1.11 0.9444 4.41 0.7561
0.1 1.61 0.7222 4.39 0.7317
0.2 2.16 0.5556 4.41 0.7073
0.3 2.67 0.50 4.29 0.7073
0.4 2.77 0.50 4.24 0.7073
0.5 3.89 0.50 4.21 0.7073
0.6 3.56 0.50 4.29 0.7073
0.7 3.83 0.50 4.24 0.7317
0.8 3.94 0.50 4.29 0.7317
Table 4. Mean index and accuracy with different .

Next, we tune the duration weight (see Table 4). Table 4 shows that is optimal on accuracy but not optimal on the average index, which means consideration of duration harm the accuracy of our algorithm but improve the general situation like the worst cases. The reasons are that using midi content to represent duration may not be accurate and plagiarism often has little change in the note duration. In this case, the relative duration information may make our algorithm confused.

Average Index1 ACC1 Average Index2 ACC2
1.2 1.83 0.6667 4.65 0.7561
1.3 1.78 0.6111 4.68 0.7561
1.4 1.72 0.6667 4.58 0.7561
1.5 1.72 0.6667 4.48 0.7561
1.6 1.72 0.6667 4.56 0.7317
1.7 1.67 0.6667 4.48 0.7317
1.8 1.67 0.6667 4.46 0.7317
2 1.67 0.6667 4.43 0.7317
2.5 1.72 0.6667 4.48 0.7317
3 1.77 0.6667 4.44 0.7317
Table 5. Mean index and accuracy over downbeat weights.

Finally, we test different weights of downbeats in Table 5. We get the optimal downbeat 1.7 for accuracy, which adds proper importance to the downbeats since it affects the rhythm and emotion of music.

EditDistance DirectPitch RelativePitch MaxMatch Downbeat NoteDistance Index1 ACC1 Index2 ACC2
4.17 0.3889 5.87 0.6829
2.11 0.6111 4.85 0.7553
1.83 0.6667 4.65 0.7561
1.11 0.9445 4.41 0.7561
1.67 0.8889 4.49 0.7317
2.00 0.6667 8.34 0.6829
1.94 0.6111 7.97 0.6829
Table 6. Mean index and accuracy using different compositions of optimization methods for similarity measure.

Also, we test the effect of our optimization methods (see Table 6) where Direct Pitch means considering consonance weight when computing the edit distance, Relative Pitch means using relative pitch sequence to represent, MaxMatch means segmenting the sequence and computing the maximum weight matching, downbeat means considering whether the note is downbeat and note distance means considering the variation of the pitch when computing the edit distance. With MaxMatch, our algorithm becomes robust to shift and swapping. The accuracy achieves 61.1% and 75.53% which is a lot higher than not using MaxMatch. After we consider more music features such as relative pitch and note distance, the accuracy soars to 94.45% and 75.61% and the average index jumps to 1.11 and 4.41. This shows the importance of major key and pitch differences in music.

Finally, we compare our algorithm with other existing solutions. The result is shown in Table 7. Four baselines are based on the n-gram features. The Ukkonen and Sum Common methods (equations in 6) consider the difference of all n-gram features occurring in either one musical piece (Müllensiefen and Frieler, ).


TF-IDF correlation method is widely used as a similarity measure for retrieving text documents (25). In our experiment, we use the n-gram features weighted by their frequency in both two musical pieces and prevalence in our dataset which is measured by inverted document frequency (Manning and Schütze, 1999): where is the size of the dataset and means the number of pieces including . Tversky’s ratio model is originated from (Tversky, 1988) which is adapted by inserting :

For every baseline, we tune the hyperparameter

and record the best performance. From the table, we can see that all of their performances are worse than our method in this dataset. This is because methods based on n-gram features all rely on the assumption that the number and frequency of common or different n-gram features are related to the overall similarity perception, but this does not work in the plagiarism dataset.

Method Index1 ACC1 Index2 ACC2
Sum Common (Müllensiefen and Frieler, ) 1.44 0.7222 6.75 0.6585
Ukkonen (Müllensiefen and Frieler, ) 1.39 0.7222 6.68 0.7073
TF-IDF correlation (25) 1.33 0.7778 6.39 0.7073
Tversky-equal (Tversky, 1988) 1.28 0.7778 6.26 0.6585
MESMF (ours) 1.11 0.9445 4.41 0.7561
Table 7. Average index and accuracy of different methods.

Our algorithm is also capable of finding the similarity pieces of music, which is our one-to-many detection. We here show a demo where we compare one famous Chinese song with other songs with potential plagiarism. The outcome is visualized in Figure 10.

Figure 10. An example of one-to-many similar pieces detection. The song on the left is compared with songs on the right. Musical pieces in the same color have similar melodic features and denotes the corresponding similarity score.

The same color means a similar pair of pieces. By listening to the piece pairs, we find they have high auditory similarity. This result also shows the power of our algorithm.

6. Conclusion

In this paper, we have proposed a new MESMF method for music plagiarism detection. It represents music as a sequence based on melodic features including pitch, duration, consonance, and downbeats. It also effectively computes the music similarity. In this model, we combine the advantages of both edit distances and maximum weight matching to operate the music sequence. Our model is robust to sequential shifts and changes in pitch and duration.

In the evaluation part, we design a new evaluation dataset and experiment. The dataset consists of many pairs of music judged as plagiarism. Our experiment requires the algorithm to find plagiarism in the dataset. Our algorithm outperforms all the other ones in the experiment.

In the future, we plan to extend our algorithm to music with multiple tracks and consider more auditory features to increase our accuracy. We also plan to apply our algorithm in other fields, such as music search and music recommendation. Other ideas such as collaborative filtering and the neural network may also give us ideas to improve our algorithm. Finally, we want to improve our experiment and dataset to facilitate more researchers in this field.


  • D. Bainbridge, M. Dewsnip, and I. H. Witten (2005) Searching digital music libraries. Information processing & management 41 (1), pp. 41–56. Cited by: §1.
  • J. Briot and F. Pachet (2020) Deep learning for music generation: challenges and directions. Neural Computing and Applications 32 (4), pp. 981–993. Cited by: §1.
  • C. Cronin (2020) External Links: Link Cited by: §1, §5.1.
  • R. De Prisco, A. Esposito, N. Lettieri, D. Malandrino, D. Pirozzi, G. Zaccagnino, and R. Zaccagnino (2017a) Music plagiarism at a glance: metrics of similarity and visualizations. In 2017 21st International Conference Information Visualisation (IV), pp. 410–415. Cited by: §1.
  • R. De Prisco, N. Lettieri, D. Malandrino, D. Pirozzi, G. Zaccagnino, and R. Zaccagnino (2016) Visualization of music plagiarism: analysis and evaluation. In 2016 20th International Conference Information Visualisation (IV), pp. 177–182. Cited by: §1.
  • R. De Prisco, D. Malandrino, G. Zaccagnino, and R. Zaccagnino (2017b) A computational intelligence text-based detection system of music plagiarism. In 2017 4th International Conference on Systems and Informatics (ICSAI), pp. 519–524. Cited by: §1.
  • R. De Prisco, D. Malandrino, G. Zaccagnino, and R. Zaccagnino (2017c) Fuzzy vectorial-based similarity detection of music plagiarism. In 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–6. Cited by: §1, §1.
  • C. Dittmar, K. F. Hildebrand, D. Gärtner, M. Winges, F. Müller, and P. Aichroth (2012) Audio forensics meets music information retrieval—a toolbox for inspection of music plagiarism. In 2012 Proceedings of the 20th European signal processing conference (EUSIPCO), pp. 1249–1253. Cited by: §1, §1, §1.
  • S. Doraisamy and S. Rüger (2003) Robust polyphonic music retrieval with n-grams. Journal of Intelligent Information Systems 21 (1), pp. 53–70. Cited by: §1.
  • J. S. Downie, M. Bay, A. F. Ehmann, and M. C. Jones (2008) Audio cover song identification: mirex 2006-2007 results and analyses.. In ISMIR, pp. 468–474. Cited by: §1.
  • J. S. Downie (2003) Music information retrieval. Annual review of information science and technology 37 (1), pp. 295–340. Cited by: §2.1.3.
  • J. Downie, K. West, A. Ehmann, and E. Vincent (2005) The 2005 music information retrieval evaluation exchange (mirex 2005): preliminary overview. In 6th int. conf. on music information retrieval (ismir), pp. 320–323. Cited by: §1.
  • R. Duan and S. Pettie (2014) Linear-time approximation for maximum weight matching. Journal of the ACM (JACM) 61 (1), pp. 1–23. Cited by: §2.1.2.
  • P. A. Hall and G. R. Dowling (1980) Approximate string matching. ACM computing surveys (CSUR) 12 (4), pp. 381–402. Cited by: §2.1.1.
  • J. W. Hunt and T. G. Szymanski (1977) A fast algorithm for computing longest common subsequences. Communications of the ACM 20 (5), pp. 350–353. Cited by: §2.1.1.
  • J. Kruskall and M. Liberman (1983) The theory and practice of sequence comparison. In Time warps, string edits, and macromolecules, pp. 1–44. Cited by: §2.1.1.
  • H. W. Kuhn (1955) The hungarian method for the assignment problem. Naval research logistics quarterly 2 (1-2), pp. 83–97. Cited by: §2.1.2.
  • C. D. Manning and H. Schütze (1999)

    Foundations of statistical natural language processing

    Foundations of Statistical Natural Language Processing. Cited by: §5.3.
  • M. Mongeau and D. Sankoff (1990) Comparison of musical sequences. Computers and the Humanities 24 (3), pp. 161–175. Cited by: §2.1.3, §3.4.
  • [20] D. Müllensiefen and K. Frieler Cognitive adequacy in the measurement of melodic similarity: algorithmic vs. human judgments. computing in musicology. Cited by: §5.3, Table 7.
  • D. Müllensiefen and M. Pendzich (2009) Court decisions on music plagiarism and the predictive value of similarity algorithms. Musicae Scientiae 13 (1_suppl), pp. 257–295. External Links: Document, Link, Cited by: §1, §1.
  • F. Pachet, J. Aucouturier, A. La Burthe, A. Zils, and A. Beurive (2006) The cuidado music browser: an end-to-end electronic music distribution system. Multimedia Tools and Applications 30 (3), pp. 331–349. Cited by: §1.
  • M. Robine, P. Hanna, P. Ferraro, and J. Allali (2007) Adaptation of string matching algorithms for identification of near-duplicate music documents. Cited by: §1.
  • [24] M. Sie, C. Chiang, H. Yang, and Y. Liu DETECTING and locating plagiarism of music melodies by path exploration over abinary mask. Cited by: §1.
  • [25] (2010) Speech and language processing:an introduction to natural language processing, computational linguistics, and speech recognition. Speech and language processing:an introduction to natural language processing, computational linguistics, and speech recognition. Cited by: §5.3, Table 7.
  • A. Tversky (1988) Features of similarity.. Readings in Cognitive Science 84 (4), pp. 290–302. Cited by: §5.3, Table 7.
  • R. Typke, R. C. Veltkamp, and F. Wiering (2004) Searching notated polyphonic music using transportation distances. In Proceedings of the 12th annual ACM international conference on Multimedia, pp. 128–135. Cited by: §3.4.
  • A. Uitdenbogerd and J. Zobel (1999) Melodic matching techniques for large music databases. pp. 57–66. External Links: Document Cited by: §1.
  • V. Velardo, M. Vallati, and S. Jan (2016) Symbolic melodic similarity: state of the art and future challenges. Computer Music Journal 40 (2), pp. 70–83. Cited by: §1.