Evaluating Recommender System Algorithms for Generating Local Music Playlists

by Daniel Akimchuk, et al.
Ithaca College

We explore the task of local music recommendation: provide listeners with personalized playlists of relevant tracks by artists who play most of their live events within a small geographic area. Most local artists tend to be obscure, long-tail artists and generally have little or no available user preference data associated with them. This creates a cold-start problem for collaborative filtering-based recommendation algorithms that depend on large amounts of such information to make accurate recommendations. In this paper, we compare the performance of three standard recommender system algorithms (Item-Item Neighborhood (IIN), Alternating Least Squares for Implicit Feedback (ALS), and Bayesian Personalized Ranking (BPR)) on the task of local music recommendation using the Million Playlist Dataset. To do this, we modify the standard evaluation procedure such that the algorithms only rank tracks by local artists for each of eight different cities. Although techniques based on matrix factorization (ALS, BPR) typically perform best on large recommendation tasks, we find that the neighborhood-based approach (IIN) performs best for long-tail local music recommendation.




1 Introduction

If you were to move to a new city and wanted to check out the local music scene, how would you get started? You might ask an expert, such as an employee at a local music store or a barista at a local coffee shop, but they are likely to give you incomplete or biased recommendations based on their own personal experiences and interests. You might also pick up the arts section of the local newspaper or go online to find a community events notice board. Either way, you would be faced with a long list of music events, each of which would only provide a small amount of contextual information such as artist names and perhaps a few genre labels.

Music recommender systems [12] have the potential to offer an alternative to these more traditional methods of exploring the local music scene. However, the most popular music streaming services (e.g. Spotify, Pandora, Apple Music, Deezer) offer little, if any, support for music discovery based on geographic region. For example, if a user wants to find music from a specific location on Spotify, they would have to use the generic text-based search functionality and then dig through playlists with that location’s name in the playlist title or description. Even when such playlists exist, they are often outdated, not personalized to match the user’s interests, or irrelevant due to a variety of factors (e.g. cities with common names, playlists with non-local music, etc.).

By contrast, music event recommendation services like BandsInTown (https://www.bandsintown.com) and SongKick (https://www.songkick.com/) help users follow artists so that the user can be notified when a favorite artist will be playing nearby. They also recommend upcoming events with artists who are similar to one or more of the artists that the user has selected to follow. These services have been successful in growing both the number of users and the number of artists and events covered by their service. For example, BandsInTown claims to have 38 million users and lists events for over 430,000 artists (according to https://en.wikipedia.org/wiki/Bandsintown on March 28, 2018). Event listings are added by aggregating information from ticket sellers (e.g. Ticketmaster (https://www.ticketmaster.com/), TicketFly (https://www.ticketfly.com/)) and by managers and booking agents who have the ability to directly upload tour dates for their touring artists to these services.

While this coverage is impressive, a large percentage of the events found in local newspapers are not listed on these commercial music event recommendation services. Many talented artists play at small venues like neighborhood pubs, coffee shops, and DIY shows, and are often not represented by (pro-active, tech-savvy) managers. Yet many music fans enjoy the intimacy of a small venue and a personal connection with local artists, and they may have a hard time discovering these events.

Our long-term goal is to create a locally-focused music recommender system that (1) helps users create personalized playlists that feature relevant music by local artists, and (2) provides users with personalized music event recommendations. A core step toward building this system is to explore how existing recommender system algorithms perform on the task of local music recommendation. Here we consider a local artist to be an artist or band who resides in and/or plays the majority of their live music events in a small geographic region such as a city (e.g. Liverpool, Seattle) or a neighborhood within a larger city (e.g. Haight-Ashbury in San Francisco, Gràcia in Barcelona).

Figure 1: Long-Tail Distribution for Music Consumption

1.1 Long-tail Recommendation & Popularity Bias

Local music recommendation can be considered a special case of the long-tail music recommendation problem [8, 3, 1] since most local artists are relatively obscure outside of their home cities. The long-tail metaphor [1] comes from the idea that if we order the artists by popularity and plot how many times their music is consumed (i.e. purchased/downloaded/streamed), we would see a rapid drop-off (i.e. a power-law distribution) such that a very small fraction of the artists (in the short-head) receive the majority of the consumption while the overwhelming majority of artists (in the long-tail) receive little or no attention (see Figure 1).

Recommender systems are known to suffer from popularity bias [2, 4]: popular artists are recommended often while obscure, long-tail artists are rarely recommended, if at all. This creates a feedback loop in which “the rich get richer” and prevents local artists from being discovered by potential fans. Popularity bias arises in (commercial) recommender systems due to a combination of conceptual and technical reasons. First, listeners tend to prefer familiar music [9, 6], so it is safer for a music streaming service to recommend popular songs or artists that are more likely to be known to the user. Second, recommender systems that use aggregated user preference data, known as collaborative filtering (CF) systems, suffer from the cold-start problem [12, 13]: little or no historical user preference data exists for new or obscure artists. As a result, a CF-based recommender system cannot recommend these artists with sufficient confidence.

In this paper, we explore how existing recommender system algorithms perform on the task of local music recommendation. We formulate this problem as a modification of the automatic playlist continuation task [12] that was the focus of the 2018 ACM RecSys Challenge (https://recsys-challenge.spotify.com/) [5]. Specifically, we evaluate how accurately different recommender systems predict additional tracks for existing playlists, but we limit the additional tracks to be those by artists who are associated with a given city or neighborhood. We consider this formulation to be a case study in how different recommender system algorithms perform at the task of long-tail recommendation.

2 Recommender System Algorithms

Figure 2: Playlist-Track Matrix: We train each recommender system model with the training playlists $R_{train}$ and evaluate it using the evaluation playlists $R_{eval}$. We partition each evaluation playlist $p$ into its non-local tracks $p_{NL}$ and its local tracks $p_L$, and use $p_{NL}$ as input to the model. The model then scores each local track to produce $\hat{p}_L$ (not shown), and we evaluate performance by comparing $\hat{p}_L$ to the ground truth $p_L$.

In this section we describe three common recommendation algorithms: Item-Item Neighborhood (IIN) Recommendation, Alternating Least Squares (ALS) for Implicit Feedback, and Bayesian Personalized Ranking (BPR). Our main data structure is a Playlist-Track matrix which is akin to a User-Item matrix in standard CF research.

Each of the algorithms described takes as input a matrix $R$ like the one shown in Figure 2. The element of this matrix in the $p$-th row and $t$-th column, denoted $r_{pt}$, reflects the rating of track $t$ in playlist $p$. We consider our data to represent implicit feedback, where the value of $r_{pt}$ is 1 if track $t$ is found in playlist $p$ and 0 otherwise.
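As a minimal sketch of this data structure, the snippet below builds a small binary playlist-track matrix with NumPy. The toy playlists, the track names, and the helper `build_playlist_track_matrix` are illustrative assumptions and are not taken from the study's code.

```python
import numpy as np

def build_playlist_track_matrix(playlists):
    """Return (R, track_index) where R[p, t] = 1 if track t is in playlist p."""
    # Give each distinct track a column; sorting keeps the layout reproducible.
    tracks = sorted({t for pl in playlists for t in pl})
    track_index = {t: j for j, t in enumerate(tracks)}
    R = np.zeros((len(playlists), len(tracks)), dtype=np.int8)
    for p, pl in enumerate(playlists):
        for t in pl:
            # Implicit feedback: only presence matters, repeats are ignored.
            R[p, track_index[t]] = 1
    return R, track_index

# Hypothetical toy data: three playlists over four tracks.
toy_playlists = [["a", "b"], ["b", "c", "d"], ["a", "d", "d"]]
R, track_index = build_playlist_track_matrix(toy_playlists)
```

A dense array is used here only for readability; at the scale of the Million Playlist Dataset a sparse representation would be needed.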

2.1 Item-Item Neighborhood Model (IIN)

Neighborhood models are traditional collaborative filtering methods that make recommendations based on the similarities of playlists and/or tracks [15, 11]. Item-Item Neighborhood (IIN) models focus specifically on the similarity of different tracks. They function under the assumption that if a track is similar to the tracks already associated with a playlist, then it is likely to be a successful recommendation.

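Given playlist $p$ and track $t$, the Item-Item similarity score, $s_{pt}$, can be calculated via

$$s_{pt} = \sum_{i \in T_p} \frac{R_i \cdot R_t}{\lVert R_i \rVert \, \lVert R_t \rVert}$$

where $T_p$ is the set of nonzero tracks in playlist $p$, and $R_t$ is the $t$-th column of the playlist-track matrix $R$, containing the ratings from each playlist for track $t$. To recommend tracks for playlist $p$, $s_{pt}$ is calculated for every track $t$ and the tracks are sorted by score from greatest to least.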
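The scoring step can be sketched in a few lines of NumPy. This is an illustration under the assumption that track similarity is the cosine between columns of the playlist-track matrix; the function name `item_item_scores` and the toy matrix are hypothetical.

```python
import numpy as np

def item_item_scores(R, p):
    """Score every track for playlist p by summed cosine similarity."""
    R = R.astype(float)
    norms = np.linalg.norm(R, axis=0)         # one norm per track column
    norms[norms == 0] = 1.0                   # guard against empty columns
    sim = (R.T @ R) / np.outer(norms, norms)  # sim[i, t] = cosine(R_i, R_t)
    in_playlist = np.flatnonzero(R[p])        # tracks already in playlist p
    return sim[in_playlist].sum(axis=0)       # one score per track t

# Toy playlist-track matrix: 3 playlists (rows) by 4 tracks (columns).
R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]])
scores = item_item_scores(R, p=0)
# Rank only tracks that are not already in the playlist, best first.
candidates = [int(t) for t in np.argsort(-scores) if R[0, t] == 0]
```

In the local-music setting, the candidate list would additionally be restricted to tracks by local artists.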

2.2 Alternating Least Squares for Implicit Feedback (ALS)

Weighted Regularized Matrix Factorization (WRMF) optimized by Alternating Least Squares (ALS) [14] is one of the most highly-cited and successful recommender system models. For example, matrix factorization was central to the approach that won the Netflix Prize [7] in 2009, and WRMF was an integral component of the system that won the 2018 ACM RecSys Challenge on music playlist continuation [15]. The goal of this algorithm is to map playlists and tracks into a common latent factor space in which they can be compared.

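To address the case where $r_{pt}$ can be a value other than 0 or 1, we define the binary preference

$$b_{pt} = \begin{cases} 1 & r_{pt} > 0 \\ 0 & r_{pt} = 0 \end{cases}$$

In this case, it also proves helpful to define a confidence value $c_{pt}$ for $b_{pt}$. While there are many options, Hu et al. [14] suggest using

$$c_{pt} = 1 + \alpha r_{pt}$$

with hyperparameter $\alpha$.

The latent factors $x_p$ for each playlist $p$ and $y_t$ for each track $t$, both elements of $\mathbb{R}^f$, are to be computed with the goal that $b_{pt} \approx x_p^T y_t$. This will be done by minimizing the cost function

$$\sum_{p,t} c_{pt} \left( b_{pt} - x_p^T y_t \right)^2 + \lambda \left( \sum_p \lVert x_p \rVert^2 + \sum_t \lVert y_t \rVert^2 \right)$$

Note the term $\lambda \left( \sum_p \lVert x_p \rVert^2 + \sum_t \lVert y_t \rVert^2 \right)$ is used for regularization.

This sum has $m \cdot n$ terms, where $m$ is the number of playlists and $n$ is the number of tracks, which makes it computationally impractical to use traditional cost minimization, so instead we repeatedly recompute the playlist factors $x_p$ and the track factors $y_t$ in alternation. First, to recompute the playlist factors, we define an $n \times f$ matrix $Y$. Each row of this matrix is the track factor for a given track. We also define an $n \times n$ diagonal matrix $C^p$ for each playlist $p$ such that

$$C^p_{tt} = c_{pt}$$

With $b(p)$ being the $n$-dimensional vector of all $b_{pt}$ for playlist $p$, we minimize the cost function with

$$x_p = \left( Y^T C^p Y + \lambda I \right)^{-1} Y^T C^p \, b(p)$$

In a similar manner, we then recompute the track factors by defining an $m \times f$ matrix $X$ with each row being the playlist factor for a given playlist. We also define a similar diagonal matrix $C^t$ for each track $t$, this time of size $m \times m$, with

$$C^t_{pp} = c_{pt}$$

With $b(t)$ being the $m$-dimensional vector of all $b_{pt}$ for track $t$, we minimize the cost function with

$$y_t = \left( X^T C^t X + \lambda I \right)^{-1} X^T C^t \, b(t)$$

This is repeated until $x_p$ and $y_t$ stabilize.

We predict the preference of playlist $p$ for track $t$ via

$$\hat{b}_{pt} = x_p^T y_t$$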
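The alternating updates can be illustrated with a dense NumPy sketch. It is written for a toy matrix only and is not the implementation used in the experiments; the function names and the hyperparameter defaults (`f=2`, `alpha=40.0`, `lam=0.1`, `iters=10`) are assumptions chosen for the example.

```python
import numpy as np

def als_implicit(R, f=2, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Alternate closed-form updates of playlist and track factors."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    B = (R > 0).astype(float)               # binary preferences
    C = 1.0 + alpha * R                     # confidence values
    X = rng.normal(scale=0.1, size=(m, f))  # playlist factors, one per row
    Y = rng.normal(scale=0.1, size=(n, f))  # track factors, one per row
    I = np.eye(f)
    for _ in range(iters):
        for p in range(m):
            # Solve (Y^T C^p Y + lam I) x_p = Y^T C^p b(p)
            Cp = np.diag(C[p])
            X[p] = np.linalg.solve(Y.T @ Cp @ Y + lam * I, Y.T @ Cp @ B[p])
        for t in range(n):
            # Solve (X^T C^t X + lam I) y_t = X^T C^t b(t)
            Ct = np.diag(C[:, t])
            Y[t] = np.linalg.solve(X.T @ Ct @ X + lam * I, X.T @ Ct @ B[:, t])
    return X, Y

def cost(R, X, Y, alpha=40.0, lam=0.1):
    """Weighted squared error plus L2 regularization."""
    B = (R > 0).astype(float)
    C = 1.0 + alpha * R
    fit = (C * (B - X @ Y.T) ** 2).sum()
    return float(fit + lam * ((X ** 2).sum() + (Y ** 2).sum()))

R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]])
X, Y = als_implicit(R)
predictions = X @ Y.T  # predicted preference for every playlist-track pair
```

Because each update solves its least-squares subproblem exactly, the cost cannot increase from one sweep to the next, which is why the loop can simply run until the factors stabilize.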

2.3 Bayesian Personalized Ranking (BPR)

Traditional methods of training recommendation algorithms treat every track that does not appear in a playlist as having a rating of 0. This implies that the “perfect” algorithm would give all of these tracks a score of 0. However, these zero-valued tracks are exactly the ones we want to rank, so this is not our desired output; in practice, such models can rank these tracks only because regularization prevents this overfitting. As described by Rendle et al. [10], Bayesian Personalized Ranking (BPR) attempts to address this issue directly, without relying on regularization to do so. It defines a new optimization criterion on which to train a model.

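BPR functions under the assumption that any track that is in a playlist (any track that has a nonzero rating in $R$) is preferred by that playlist over a track that is not in the playlist. To formalize this, we define two sets: the set of playlists $P$ and the set of tracks $T$, and also the set

$$D_S = \{ (p, i, j) \mid i \in T_p \wedge j \in T \setminus T_p \}$$

where $T_p$ is the set of tracks in playlist $p$. If $(p, i, j) \in D_S$, then $i >_p j$, where $>_p$ is the preference structure for playlist $p$.

Let $\Theta$ be the parameters of the underlying learning algorithm (the implementation used in this experiment utilizes matrix factorization). By Bayes’ Law, we know the probability of $\Theta$ being the correct parameter vector given playlist $p$’s preference structure satisfies

$$\Pr(\Theta \mid >_p) \propto \Pr(>_p \mid \Theta) \, \Pr(\Theta)$$

It also follows that

$$\prod_{p \in P} \Pr(>_p \mid \Theta) = \prod_{(p, i, j) \in D_S} \Pr(i >_p j \mid \Theta)$$

Using the underlying learning algorithm, the predicted relationship between tracks $i$ and $j$ for playlist $p$ using the parameters $\Theta$, referred to as $\hat{x}_{pij}(\Theta)$, is to be calculated. We assign

$$\Pr(i >_p j \mid \Theta) = \sigma(\hat{x}_{pij}(\Theta))$$

where $\sigma$ is the sigmoid function.

The prior probability of $\Theta$, $\Pr(\Theta)$, is a normal distribution with zero mean and variance-covariance matrix $\Sigma_\Theta$. As suggested in [10], we use $\Sigma_\Theta = \lambda_\Theta I$, where $\lambda_\Theta$ is the vector of regularization parameters for the underlying learning algorithm and $I$ is the identity matrix.

Using these identities, the optimization criterion to maximize can be written as the calculable

$$\text{BPR-Opt} = \sum_{(p, i, j) \in D_S} \ln \sigma(\hat{x}_{pij}) - \lambda_\Theta \lVert \Theta \rVert^2$$

For a more detailed derivation of this optimization criterion, see [10]. This is maximized using any gradient-based optimization algorithm, such as stochastic gradient ascent, and the resulting parameter vector is used with the underlying learning algorithm.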
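To make the criterion concrete, the following sketch trains a small matrix factorization model by stochastic gradient ascent on sampled (playlist, in-playlist track, out-of-playlist track) triples. It assumes the pairwise score is the difference of two dot products between playlist and track factors; the function names, learning rate, regularization weight, and step count are hypothetical choices for a toy example, not the settings used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_opt(R, X, Y, lam):
    """Sum of ln sigmoid(pairwise score) over all triples, minus L2 penalty."""
    total = 0.0
    for p in range(R.shape[0]):
        pos, neg = np.flatnonzero(R[p]), np.flatnonzero(R[p] == 0)
        for i in pos:
            for j in neg:
                total += np.log(sigmoid(X[p] @ (Y[i] - Y[j])))
    return float(total - lam * ((X ** 2).sum() + (Y ** 2).sum()))

def train_bpr(R, f=2, lr=0.1, lam=0.01, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.normal(scale=0.1, size=(m, f))  # playlist factors
    Y = rng.normal(scale=0.1, size=(n, f))  # track factors
    # All triples (p, i, j): track i is in playlist p, track j is not.
    triples = [(p, i, j) for p in range(m)
               for i in np.flatnonzero(R[p])
               for j in np.flatnonzero(R[p] == 0)]
    for _ in range(steps):
        p, i, j = triples[rng.integers(len(triples))]
        xp, yi, yj = X[p].copy(), Y[i].copy(), Y[j].copy()
        g = 1.0 - sigmoid(xp @ (yi - yj))   # derivative of ln sigmoid
        # Ascent step on the sampled triple, with L2 shrinkage.
        X[p] += lr * (g * (yi - yj) - lam * xp)
        Y[i] += lr * (g * xp - lam * yi)
        Y[j] += lr * (-g * xp - lam * yj)
    return X, Y

R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]])
X, Y = train_bpr(R)
scores = X @ Y.T  # higher score means the playlist prefers that track
```

Sampling one triple per step avoids summing over the whole triple set, which grows with the product of in-playlist and out-of-playlist tracks for each playlist.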

3 Local Music Data

Our first task is to identify a set of local artists for a given city. For this paper, we consider a local artist to be an artist that performs the large majority of their live events close to or within a single city. We collected artist event information from both Ticketfly (http://www.ticketfly.com/) and Facebook (https://www.facebook.com/). Ticketfly provides information about large and mid-sized events while Facebook provides information about smaller niche events that were not listed on Ticketfly. We were able to collect 22,246 unique events at over 3,500 different venues for over 145,000 artists over a span of 3 months (all event data was collected in February 2019). Of these events, 17,976 come from Ticketfly and 8,447 from Facebook, with an overlap of 4,177 events between the two sites. We consider an artist to be local to a city if at least 80% of their events were within a 10-mile radius of the city center and they were associated with at least 2 events.
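The locality rule can be sketched as follows. The text does not say how distances were measured, so the haversine great-circle distance used here is an assumption, as are the function names, the Earth-radius constant, and the example coordinates.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, assumed for this sketch

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in miles."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def is_local(event_coords, city_center, radius=10.0, min_share=0.8, min_events=2):
    """At least 2 events, and at least 80% of them within 10 miles."""
    if len(event_coords) < min_events:
        return False
    near = sum(haversine_miles(lat, lon, *city_center) <= radius
               for lat, lon in event_coords)
    return near / len(event_coords) >= min_share

# Hypothetical coordinates: a city center, a nearby venue, a distant venue.
chicago = (41.8781, -87.6298)
nearby, far_away = (41.90, -87.65), (40.71, -74.00)
mostly_local = is_local([nearby] * 4 + [far_away], chicago)
touring = is_local([nearby] * 3 + [far_away] * 2, chicago)
```

With these thresholds an artist with four of five events near the city center qualifies, while one with three of five does not.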

For this study, we selected a geographically diverse set of eight cities within the United States. For each city, we create the list of local artists from our music event data and collect the set of tracks by these artists. Finally, we identify all of the playlists from the Million Playlist Dataset [5] that contain one or more of these tracks. A list of the cities as well as summary statistics about each city can be found in Table 1.

We note that most of the local artists in our study are obscure long-tail artists and tend to have between a few hundred and a few thousand monthly listeners on Spotify. This is also reflected in the fact that the columns of the playlist-track matrix associated with the tracks by the local artists are extremely sparse (99.9990% zeros on average). This makes the task of local music recommendation particularly challenging when we consider that the overall sparsity (i.e. percent of zeros) of the full matrix $R$ is 99.9971%. Put another way, the overall density (percent of non-zero ratings) is about 3 times the density for local (long-tail) artists.
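The factor of about 3 follows directly from the two sparsity figures, as this short calculation shows. The variable names are illustrative; the two percentages are the ones quoted in the text.

```python
overall_sparsity = 99.9971  # percent of zeros in the full matrix
local_sparsity = 99.9990    # average percent of zeros in local-track columns

# Density is the complement of sparsity, in percent.
overall_density = 100.0 - overall_sparsity
local_density = 100.0 - local_sparsity

ratio = overall_density / local_density
```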

| | Atlanta | Berkeley | Boulder | Brooklyn | Chicago | Los Angeles | Nashville | Philadelphia | Average |
|---|---|---|---|---|---|---|---|---|---|
| Local Playlists | 221 | 1022 | 110 | 886 | 84 | 375 | 98 | 276 | 384.0 |
| Local Artists | 15 | 41 | 6 | 123 | 12 | 36 | 16 | 39 | 36.0 |
| Median Monthly Listeners | 4,311 | 1,677 | 39,609 | 2,017 | 84 | 2,370 | 1,244 | 72 | 6,423 |
| Local Tracks | 388 | 2023 | 237 | 1468 | 260 | 556 | 140 | 519 | 698.9 |
| Sparsity | 99.9991% | 99.9990% | 99.9983% | 99.9989% | 99.9997% | 99.9978% | 99.9991% | 99.9997% | 99.9990% |

Table 1: Summary statistics for each city evaluated.
| Metric | Algorithm | Atlanta | Berkeley | Boulder | Brooklyn | Chicago | Los Angeles | Nashville | Philadelphia | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| | Local Tracks | 388 | 2023 | 237 | 1468 | 260 | 556 | 140 | 519 | 698.875 |
| NDCG | Item-Item | 0.335 (0.041) | 0.295 (0.033) | 0.268 (0.052) | 0.324 (0.056) | 0.199 (0.044) | 0.339 (0.074) | 0.447 (0.103) | 0.355 (0.049) | 0.324 |
| NDCG | ALS | 0.065 (0.065) | 0.046 (0.046) | 0.066 (0.066) | 0.042 (0.042) | 0.057 (0.057) | 0.043 (0.043) | 0.086 (0.086) | 0.036 (0.036) | 0.055 |
| NDCG | BPR | 0.036 (0.036) | 0.025 (0.025) | 0.036 (0.036) | 0.026 (0.026) | 0.048 (0.048) | 0.030 (0.030) | 0.038 (0.038) | 0.031 (0.031) | 0.034 |
| NDCG | Random | 0.177 (0.005) | 0.133 (0.002) | 0.207 (0.008) | 0.131 (0.001) | 0.211 (0.013) | 0.161 (0.002) | 0.208 (0.013) | 0.166 (0.005) | 0.174 |
| NDCG | Popular | 0.225 (0.010) | 0.161 (0.002) | 0.248 (0.026) | 0.159 (0.003) | 0.262 (0.019) | 0.179 (0.004) | 0.255 (0.011) | 0.182 (0.008) | 0.209 |
| R-Precision | Item-Item | 0.100 (0.017) | 0.077 (0.010) | 0.099 (0.028) | 0.114 (0.017) | 0.046 (0.020) | 0.091 (0.020) | 0.148 (0.054) | 0.152 (0.028) | 0.103 |
| R-Precision | ALS | 0.026 (0.026) | 0.012 (0.012) | 0.023 (0.023) | 0.011 (0.011) | 0.005 (0.005) | 0.008 (0.008) | 0.022 (0.022) | 0.001 (0.001) | 0.014 |
| R-Precision | BPR | 0.004 (0.004) | 0.000 (0.000) | 0.000 (0.000) | 0.001 (0.001) | 0.005 (0.005) | 0.000 (0.000) | 0.004 (0.004) | 0.000 (0.000) | 0.002 |
| R-Precision | Random | 0.008 (0.003) | 0.002 (0.000) | 0.016 (0.005) | 0.001 (0.000) | 0.015 (0.007) | 0.009 (0.003) | 0.015 (0.014) | 0.006 (0.002) | 0.009 |
| R-Precision | Popular | 0.035 (0.012) | 0.018 (0.001) | 0.046 (0.020) | 0.022 (0.003) | 0.058 (0.018) | 0.013 (0.004) | 0.042 (0.012) | 0.014 (0.007) | 0.031 |
| Prec@1 | Item-Item | 0.117 (0.039) | 0.095 (0.032) | 0.094 (0.028) | 0.128 (0.010) | 0.032 (0.020) | 0.090 (0.004) | 0.120 (0.046) | 0.187 (0.014) | 0.108 |
| Prec@1 | ALS | 0.036 (0.036) | 0.020 (0.020) | 0.027 (0.027) | 0.015 (0.015) | 0.000 (0.000) | 0.016 (0.016) | 0.032 (0.032) | 0.004 (0.004) | 0.019 |
| Prec@1 | BPR | 0.005 (0.005) | 0.000 (0.000) | 0.000 (0.000) | 0.001 (0.001) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.001 |
| Prec@1 | Random | 0.000 (0.000) | 0.001 (0.001) | 0.027 (0.018) | 0.005 (0.003) | 0.012 (0.012) | 0.011 (0.008) | 0.011 (0.011) | 0.007 (0.004) | 0.009 |
| Prec@1 | Popular | 0.023 (0.012) | 0.015 (0.003) | 0.073 (0.031) | 0.034 (0.006) | 0.035 (0.024) | 0.019 (0.007) | 0.010 (0.010) | 0.018 (0.010) | 0.028 |

Table 2: Evaluation metrics at the track-level for each algorithm.
| Metric | Algorithm | Atlanta | Berkeley | Boulder | Brooklyn | Chicago | Los Angeles | Nashville | Philadelphia | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| | Local Artists | 15 | 41 | 6 | 123 | 12 | 36 | 16 | 39 | 36 |
| NDCG | Item-Item | 0.751 (0.076) | 0.725 (0.059) | 0.833 (0.086) | 0.604 (0.088) | 0.652 (0.057) | 0.732 (0.128) | 0.713 (0.108) | 0.686 (0.065) | 0.712 |
| NDCG | ALS | 0.793 (0.020) | 0.691 (0.005) | 0.928 (0.019) | 0.453 (0.009) | 0.824 (0.048) | 0.617 (0.025) | 0.732 (0.025) | 0.601 (0.019) | 0.705 |
| NDCG | BPR | 0.503 (0.006) | 0.359 (0.017) | 0.699 (0.032) | 0.240 (0.006) | 0.732 (0.028) | 0.407 (0.012) | 0.570 (0.030) | 0.328 (0.016) | 0.480 |
| NDCG | Random | 0.580 (0.011) | 0.415 (0.009) | 0.690 (0.021) | 0.303 (0.007) | 0.744 (0.022) | 0.443 (0.007) | 0.547 (0.032) | 0.443 (0.012) | 0.521 |
| NDCG | Popular | 0.561 (0.018) | 0.421 (0.007) | 0.811 (0.042) | 0.386 (0.005) | 0.759 (0.028) | 0.470 (0.011) | 0.577 (0.015) | 0.458 (0.014) | 0.555 |
| R-Precision | Item-Item | 0.436 (0.037) | 0.377 (0.032) | 0.543 (0.048) | 0.319 (0.050) | 0.340 (0.041) | 0.431 (0.079) | 0.543 (0.074) | 0.427 (0.058) | 0.427 |
| R-Precision | ALS | 0.284 (0.010) | 0.247 (0.007) | 0.405 (0.043) | 0.127 (0.010) | 0.323 (0.067) | 0.164 (0.022) | 0.364 (0.032) | 0.264 (0.022) | 0.272 |
| R-Precision | BPR | 0.089 (0.015) | 0.043 (0.009) | 0.186 (0.050) | 0.017 (0.005) | 0.263 (0.038) | 0.050 (0.007) | 0.191 (0.042) | 0.028 (0.008) | 0.108 |
| R-Precision | Random | 0.090 (0.017) | 0.065 (0.005) | 0.188 (0.032) | 0.038 (0.007) | 0.193 (0.009) | 0.071 (0.012) | 0.090 (0.008) | 0.068 (0.007) | 0.100 |
| R-Precision | Popular | 0.076 (0.017) | 0.069 (0.008) | 0.268 (0.051) | 0.101 (0.009) | 0.268 (0.045) | 0.080 (0.010) | 0.082 (0.025) | 0.086 (0.008) | 0.129 |
| Prec@1 | Item-Item | 0.941 (0.021) | 0.803 (0.034) | 0.917 (0.043) | 0.604 (0.056) | 0.985 (0.015) | 0.765 (0.086) | 1.000 (0.000) | 0.769 (0.060) | 0.848 |
| Prec@1 | ALS | 0.389 (0.020) | 0.374 (0.012) | 0.518 (0.073) | 0.177 (0.010) | 0.571 (0.061) | 0.213 (0.030) | 0.451 (0.035) | 0.297 (0.035) | 0.374 |
| Prec@1 | BPR | 0.077 (0.031) | 0.037 (0.012) | 0.273 (0.100) | 0.017 (0.007) | 0.487 (0.059) | 0.013 (0.004) | 0.212 (0.045) | 0.018 (0.011) | 0.142 |
| Prec@1 | Random | 0.108 (0.013) | 0.067 (0.008) | 0.227 (0.052) | 0.041 (0.006) | 0.274 (0.023) | 0.067 (0.011) | 0.082 (0.020) | 0.087 (0.010) | 0.119 |
| Prec@1 | Popular | 0.054 (0.020) | 0.053 (0.008) | 0.373 (0.075) | 0.147 (0.006) | 0.487 (0.059) | 0.112 (0.016) | 0.041 (0.019) | 0.073 (0.006) | 0.1675 |

Table 3: Evaluation metrics at the artist-level for each algorithm.

4 Experiments

For each of these cities, we use the following evaluation procedure:

foreach city do
    foreach fold do
        construct $R_{train}$ and $R_{eval}$
        foreach algorithm do
            train model with $R_{train}$
            foreach playlist $p \in R_{eval}$ do
                split $p$ into non-local tracks $p_{NL}$ and local tracks $p_L$
                use $p_{NL}$ with model to predict $\hat{p}_L$
                calculate NDCG, R-Precision, Prec@1 by comparing $p_L$ and $\hat{p}_L$ at track- and artist-levels
            end for
        end for
    end for
end for
Algorithm 1 Evaluation Procedure

For each city, we partition the local playlists into five equally sized groups and perform five-fold cross-validation. That is, we use each group as the evaluation set once and the other four as part of the training set each time.
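A minimal sketch of this partitioning is shown below. It covers only the split of a city's local playlists; the helper name `five_fold_splits`, the shuffling seed, and the use of integer playlist ids are assumptions made for the example.

```python
import random

def five_fold_splits(playlist_ids, k=5, seed=0):
    """Yield (train_ids, eval_ids) pairs; each group is evaluated once."""
    ids = list(playlist_ids)
    random.Random(seed).shuffle(ids)
    # Striding gives k groups whose sizes differ by at most one.
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        eval_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_ids, eval_ids

# Example with 221 playlist ids, the number of local playlists for Atlanta.
splits = list(five_fold_splits(range(221)))
```

Each playlist lands in exactly one evaluation group, so every local playlist contributes to the reported metrics exactly once.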

Using the Implicit Python library (https://github.com/benfred/implicit), we calculate Item-Item similarity scores (see Section 2.1) and train both a WRMF model optimized with ALS (see Section 2.2) and a matrix factorization model optimized with BPR (see Section 2.3) using the training set that includes both local and non-local tracks.

For each of the playlists in the evaluation set, we use the non-local tracks $p_{NL}$ to generate a ranked list of recommendations based on the score from each of our three algorithms (IIN, ALS, BPR). We have also implemented two baselines: a random baseline, which randomly shuffles the local tracks, and a popularity baseline, which ranks all of the local tracks by their respective popularities. The popularity of a track is estimated as the percentage of playlists in the training set $R_{train}$ that the track appears in. We evaluate each of these five ranked lists on their ability to recommend local music at both the track-level and the artist-level, the latter only looking at the first occurrence of a given artist in the list of track recommendations.
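The two baselines and the artist-level reduction can be sketched as below, assuming both baselines order the same candidate set of local tracks. All function names, the tie-breaking rule, and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_baseline(local_tracks, seed=0):
    """Return the local tracks in a random order."""
    ranked = list(local_tracks)
    random.Random(seed).shuffle(ranked)
    return ranked

def popularity_baseline(local_tracks, training_playlists):
    """Rank local tracks by the share of training playlists containing them."""
    counts = Counter(t for pl in training_playlists for t in set(pl))
    share = {t: counts[t] / len(training_playlists) for t in local_tracks}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(local_tracks, key=lambda t: (-share[t], t))

def artist_level(ranked_tracks, artist_of):
    """Keep only the first occurrence of each artist in a ranked track list."""
    seen, artists = set(), []
    for t in ranked_tracks:
        a = artist_of[t]
        if a not in seen:
            seen.add(a)
            artists.append(a)
    return artists

# Toy data: "l*" are local tracks, "x" and "y" are non-local tracks.
training = [["x", "l1"], ["l1", "l2"], ["l1", "y"], ["l3", "l2"]]
local = ["l4", "l3", "l2", "l1"]
artist_of = {"l1": "A", "l2": "B", "l3": "A", "l4": "C"}

by_popularity = popularity_baseline(local, training)
ranked_artists = artist_level(by_popularity, artist_of)
shuffled = random_baseline(local)
```

The same `artist_level` reduction can be applied to the ranked list from any of the algorithms before computing artist-level metrics.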

For evaluation metrics, we use two of the three metrics that were used in the ACM RecSys Challenge 2018 for playlist continuation [5]: Normalized Discounted Cumulative Gain (NDCG) and R-Precision (RPrec). NDCG evaluates the entire ranking of all local tracks, weighted such that the top ranked tracks have the greatest importance. The R-Precision for a playlist with $k$ relevant local tracks is the percentage of the $k$ highest scoring local tracks in the predicted ranking $\hat{p}_L$ that are present in the ground truth playlist $p_L$. The RecSys Challenge also used a third metric, Clicks, which counted the number of sets of 10 recommended tracks that would be needed before finding the first relevant track. This metric is not appropriate in our setting since we are ranking a much smaller set of tracks (hundreds vs. millions). Instead, we use Precision-at-1 (Prec@1), which measures the accuracy of our top-ranked (i.e. highest scoring) track for each evaluation playlist. By comparison, NDCG reflects the quality of the entire ranking, RPrec measures the quality of the first few local track recommendations, and Prec@1 is the accuracy of only the top recommendation.
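The three metrics can be written compactly for binary relevance. This sketch uses the common logarithmic discount of one over log2(rank + 1) for NDCG, which is an assumption since the exact variant is not stated here; the function names are illustrative.

```python
from math import log2

def ndcg(ranked, relevant):
    """Binary-relevance NDCG over the full ranking."""
    # Position i (0-based) has rank i + 1 and discount 1 / log2(rank + 1).
    dcg = sum(1.0 / log2(i + 2) for i, t in enumerate(ranked) if t in relevant)
    ideal = sum(1.0 / log2(i + 2) for i in range(len(relevant)))
    return dcg / ideal if ideal else 0.0

def r_precision(ranked, relevant):
    """Share of the top-k ranked items that are relevant, k = len(relevant)."""
    k = len(relevant)
    return sum(t in relevant for t in ranked[:k]) / k if k else 0.0

def precision_at_1(ranked, relevant):
    """1.0 if the top-ranked item is relevant, otherwise 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0

# Toy example: four ranked local tracks, two of which are in the ground truth.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
example = (ndcg(ranked, relevant),
           r_precision(ranked, relevant),
           precision_at_1(ranked, relevant))
```

Here the top-ranked track is not relevant, so Prec@1 is 0, while R-Precision credits the one relevant track found in the top two positions.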

Our final reported evaluation scores in Tables 2 and 3 reflect the averaging of these metrics first over the evaluation playlists in each fold, and then over the five folds. We also report the standard error (in parentheses) of each metric over the five folds.
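The aggregation can be sketched as follows, assuming the standard error is the sample standard deviation of the five fold means divided by the square root of the number of folds. The helper name `summarize_folds` and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_folds(per_fold_playlist_scores):
    """Average within each fold, then report mean and standard error over folds."""
    fold_means = [mean(scores) for scores in per_fold_playlist_scores]
    std_err = stdev(fold_means) / sqrt(len(fold_means))
    return mean(fold_means), std_err

# Made-up per-playlist metric values for five folds.
folds = [[0.2, 0.4], [0.3, 0.3], [0.1, 0.5], [0.4, 0.4], [0.2, 0.2]]
score, std_err = summarize_folds(folds)
```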

5 Results

As shown in Table 2, the Item-Item Neighborhood model outperforms both baselines (Random, Popularity) and both matrix factorization models (ALS, BPR) in nearly every scenario. The notable exception is Chicago, where the popularity baseline outperformed all other models on all three metrics. This can be explained by the extremely high sparsity of Chicago's local tracks: while 260 local tracks were found in 84 playlists, the vast majority of these playlists contain the same few tracks, preventing the neighborhood model from providing meaningful similarities. Besides this exception, the Item-Item Neighborhood model is consistently the best model, achieving average R-Precision and Precision-at-1 scores several times higher than those of the other models.

At the artist-level, shown in Table 3, the neighborhood model once again performed better than the other models and baselines in every city on RPrec and Prec@1. For many of the cities, Prec@1 was near perfect, and in the case of Nashville it achieved a perfect score of 1. On NDCG, the WRMF with ALS model outperforms the Item-Item model for half of the cities. Specifically, these are the four cities with the fewest playlists, artists, and tracks. In most of these cases, the performance of the Item-Item model is comparable to that of ALS.

In both cases, BPR performed substantially worse than expected, frequently scoring even worse than the random baseline. One potential failure point of this algorithm could be the sparsity of the data. Both of the datasets used to evaluate BPR in [10] are much less sparse (less than 99% sparse), which corresponds to a vastly denser training and recommendation space than the data used in this experiment.

In terms of computational overhead, the Item-Item Neighborhood model takes the least amount of time to initialize. We conducted our experiment on a 2017 iMac with 16GB of RAM and an Intel Core i5 processor, and calculating the Item-Item similarity scores for every playlist and track took about 2 minutes. Training the ALS model took about 20 minutes and training the BPR model took about 2 hours. After training, recommendation time was comparable between all three models.

6 Conclusions

We have presented a novel approach for evaluating local (long-tail) music recommendation. That is, by partitioning a large playlist-track matrix into non-local and local (mostly long-tail) tracks, and considering playlists with one or more of these local tracks, we can evaluate how different recommender systems perform on this task.

Surprisingly, the Item-Item Neighborhood model performs better than the models based on matrix factorization (ALS, BPR) on the task of local music recommendation. This contrasts with the results of [14, 10], which show that in general recommendation, ALS and BPR significantly outperform neighborhood models. This may be related to the fact that local (long-tail) music recommendation involves modeling highly sparse data. That is, matrix factorization approaches attempt to optimize parameters to minimize the loss over all of the ratings. Since the vast majority of the ratings are associated with the popular (non-local) tracks in the short-head, these models might not generalize well to local tracks. As a result, a simple Item-Item Neighborhood model, which is not susceptible to popularity bias, performs better based on our experiments.

In future work, we plan to explore modifications of matrix factorization models that attempt to mitigate popularity bias. For example, we could re-weight the cost function for the WRMF model (Section 2.2) so that the weights are per track rather than per rating. We also plan to develop and deploy a local music recommendation system to compare the performance of recommender system algorithms from the perspective of the end user experience. This requires us to evaluate not only recommendation accuracy but also scalability and robustness in a real-time setting.

This project is supported by NSF grant IIS-1615679.


  • [1] Chris Anderson. The long tail: Why the future of business is selling less of more. Hachette Books, 2006.
  • [2] Alejandro Bellogín, Pablo Castells, and Iván Cantador. Statistical biases in information retrieval metrics for recommender systems. Information Retrieval Journal, 20(6):606–634, 2017.
  • [3] Òscar Celma. Music recommendation. In Music recommendation and discovery, pages 43–85. Springer, 2010.
  • [4] Òscar Celma and Pedro Cano. From hits to niches?: or how popular artists can bias music recommendation and discovery. In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, page 5. ACM, 2008.
  • [5] Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. Recsys challenge 2018: Automatic music playlist continuation. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 527–528. ACM, 2018.
  • [6] Patrick G Hunter and E Glenn Schellenberg. Interactive effects of personality and frequency of exposure on liking for music. Personality and Individual Differences, 50(2):175–179, 2011.
  • [7] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
  • [8] Mark Levy and Klaas Bosteels. Music recommendation and the long tail. In 1st Workshop On Music Recommendation And Discovery (WOMRAD), ACM RecSys, 2010, Barcelona, Spain. Citeseer, 2010.
  • [9] Adrian C North and David J Hargreaves. Subjective complexity, familiarity, and liking for popular music. Psychomusicology: A Journal of Research in Music Cognition, 14(1-2):77, 1995.
  • [10] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 452–461, Arlington, Virginia, United States, 2009. AUAI Press.
  • [11] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, pages 285–295, New York, NY, USA, 2001. ACM.
  • [12] Markus Schedl, Hamed Zamani, Ching-Wei Chen, Yashar Deldjoo, and Mehdi Elahi. Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval, 7(2):95–116, 2018.
  • [13] Douglas Turnbull, Luke Barrington, and Gert RG Lanckriet. Five approaches to collecting tags for music. In ISMIR, volume 8, pages 225–230, 2008.
  • [14] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Eighth IEEE International Conference on Data Mining (ICDM 2008), pages 263–272, Los Alamitos, CA, USA, 2008. IEEE Computer Society.
  • [15] Maksims Volkovs, Himanshu Rai, Zhaoyue Cheng, Ga Wu, Yichao Lu, and Scott Sanner. Two-stage model for automatic playlist continuation at scale. In Proceedings of the ACM Recommender Systems Challenge 2018, page 9. ACM, 2018.