Utilizing Human Memory Processes to Model Genre Preferences for Personalized Music Recommendations

03/24/2020 ∙ by Dominik Kowald, et al. ∙ Know-Center GmbH, Johannes Kepler University Linz, TU Graz

In this paper, we introduce a psychology-inspired approach to model and predict the music genre preferences of different groups of users by utilizing human memory processes. These processes describe how humans access information units in their memory by considering the factors of (i) past usage frequency, (ii) past usage recency, and (iii) the current context. Using a publicly available dataset of more than a billion music listening records shared on the music streaming platform Last.fm, we find that our approach provides significantly better prediction accuracy results than various baseline algorithms for all evaluated user groups, i.e., (i) low-mainstream music listeners, (ii) medium-mainstream music listeners, and (iii) high-mainstream music listeners. Furthermore, our approach is based on a simple psychological model, which contributes to the transparency and explainability of the calculated predictions.


1. Introduction

Computational models of user preferences are crucial elements of music recommender systems (Schedl et al., 2015) to tailor recommendations to the preferences of the user. Such user models are typically derived from the listening behavior of the users, i.e., their interactions with music artifacts, content features of music (Zangerle and Pichl, 2018), or hybrid combinations of both. Research in music psychology (North and Hargreaves, 2008) has shown that a wide range of factors impact music preferences (Schedl et al., 2015), such as emotional state (Cantor and Zillmann, 1973; Juslin and Sloboda, 2001), a user’s current context (Rentfrow and Gosling, 2003), or a user’s personality (Rentfrow and Gosling, 2003; Schedl et al., 2018a). Several aspects make the modeling of music preferences challenging, e.g., music consumption is context-dependent and serves various purposes for listeners (Schedl et al., 2018b). Also, recent research (Dominik et al., 2020) has verified that classic music recommendation approaches suffer from popularity bias, i.e., they are biased toward the mainstream that is prevalent in a music community. As a result, listeners of non-mainstream music receive less relevant recommendations than listeners of popular, mainstream music (Bauer and Schedl, 2019; Schedl and Bauer, 2018, 2017; Oord et al., 2013).

| User group | Users | Artists | Genres | LEs | Genre assignments | Genre assignments per LE | Genres per user | Avg. mainstreaminess | Avg. age | Male/female |
|---|---|---|---|---|---|---|---|---|---|---|
| LowMS | 1,000 | 82,417 | 931 | 6,915,352 | 14,573,028 | 2.107 | 85.771 | .125 | 24.582 | 74%/26% |
| MedMS | 1,000 | 86,249 | 933 | 7,900,726 | 20,264,870 | 2.565 | 126.439 | .379 | 25.352 | 68%/32% |
| HighMS | 1,000 | 92,690 | 973 | 8,251,022 | 22,498,370 | 2.727 | 186.010 | .688 | 21.486 | 65%/35% |

Table 1. Dataset statistics for the LowMS, MedMS, and HighMS Last.fm user groups. The columns give the number of distinct users, the number of distinct artists, the number of distinct genres, the number of listening events (LEs), the number of genre assignments, the average number of genre assignments per LE, the average number of genres a user has listened to, the average mainstreaminess value, the average age of users in the group, and the users’ male/female ratio.

In this paper, we introduce a psychology-inspired approach to model and predict the music genre preferences of users. We base our approach on research in music psychology that found that music liking is positively influenced by prior exposure to the music (Pereira et al., 2011; Schubert, 2007). This has been attributed to the mere exposure effect or familiarity principle (Zajonc, 1968), i.e., users tend to establish positive preferences for items to which they are frequently and consistently exposed. Our idea is to computationally model prior exposure to music genres using the activation equation of human memory from the cognitive architecture Adaptive Control of Thought–Rational (ACT-R) (Anderson and Schooler, 1991; Anderson et al., 2004). The activation equation determines the usefulness of a memory unit (i.e., its activation) for a user in the current context, based on how frequently and recently a user accessed it in the past as well as how important this unit is in the current context. In our previous work, we have employed a specific part of the activation equation, namely the Base-Level-Learning (BLL) equation, to recommend music artists (Kowald et al., 2019). The BLL equation computes the base-level activation of a memory unit based on how frequently and recently a user has accessed it in the past, following a time-dependent decay in the form of a power-law distribution. A high base-level activation means that the memory unit is vital for the user and, thus, can be more easily retrieved from memory. However, in that work (Kowald et al., 2019), we did not implement the full activation equation, as we left out the associative activation that tunes the base-level activation of the memory unit to the current context.

In the present paper, we extend our previous model and utilize the associative activation for music genre predictions. This helps us tune the predictions to the current context of the user. As the current context, we utilize the set of genres that are assigned to the most recently listened artist of a user. On a publicly available dataset of Last.fm music listening histories, we model the genre preferences of users from three different groups, which we extract using behavioral data in the form of music listening events: (i) LowMS, i.e., listeners of niche music (low mainstreaminess), (ii) HighMS, i.e., listeners of mainstream music (high mainstreaminess), and (iii) MedMS, i.e., listeners of music that lies in-between (medium mainstreaminess). We introduce the $ACT_{u,c}$ approach, which employs the full activation equation to take into account the current context of the user, which we define as the user’s current genre preference. We compare the efficacy of $ACT_{u,c}$ to a variant, i.e., $BLL_u$, that uses only the BLL equation to model the past usage frequency (i.e., popularity) and recency (i.e., time). Furthermore, we compare both approaches to five baselines, including two collaborative filtering variants, mainstream-aware genre modeling, popularity-aware genre modeling, as well as time-based genre modeling.

The contributions of our work are two-fold. Firstly, we propose $ACT_{u,c}$, as an extension to $BLL_u$, to model and predict the genre preferences of users. Secondly, we evaluate the efficacy of both $BLL_u$ and $ACT_{u,c}$ on three different groups of Last.fm users, which we separate based on the distance of their listening behavior to the mainstream: (i) LowMS, (ii) MedMS, and (iii) HighMS. We find that both $BLL_u$ and $ACT_{u,c}$ outperform the five baseline methods in all three groups, with $ACT_{u,c}$ achieving significantly higher performance than all other approaches. Our results also show that with both $BLL_u$ and $ACT_{u,c}$, we can specifically improve the prediction performance for the users in the LowMS group. In other words, we can better serve those music consumers whose prediction quality suffers the most from popularity bias. Also, both $BLL_u$ and $ACT_{u,c}$ are based on a psychological theory whose computational model is transparent and explainable rather than a black box.

2. Data and Approach

In this section, we describe the Last.fm dataset as well as our music genre modeling and prediction approaches.

2.1. Dataset

In this paper, we use the publicly available LFM-1b dataset (http://www.cp.jku.at/datasets/LFM-1b/) of music listening information shared by users of the online music platform Last.fm. LFM-1b contains listening histories of more than 120,000 users, which sums up to over 1.1 billion listening events (LEs) collected between January 2005 and August 2014. Each LE contains a user identifier, the artist, the album, the track name, and a timestamp (Schedl, 2016). Furthermore, the LFM-1b dataset contains demographic data of the users such as country, age, gender, and a mainstreaminess score, which is defined as the overlap between a user’s personal listening history and the aggregated listening history of all Last.fm users in the dataset. Thus, the mainstreaminess score reflects a user’s inclination to music listened to by the Last.fm mainstream listeners (i.e., the “average” Last.fm listener) (Schedl and Hauger, 2015).

User groups. In order to study different types of users, we use this mainstreaminess score to split the LFM-1b dataset into three equally sized user groups based on their mainstreaminess (i.e., low, medium, and high). Specifically, we sort all users based on their mainstreaminess score and assign the 1,000 users with the lowest scores to the low-mainstream group (i.e., LowMS), the 1,000 users with scores around the median mainstreaminess (= .379) to the medium-mainstream group (i.e., MedMS), and the 1,000 users with the highest scores to the high-mainstream group (i.e., HighMS).

In our study, we consider only users with a minimum of 6,000 and a maximum of 12,000 LEs. We choose these thresholds based on the average number of LEs per user in the dataset, which is 9,043, as well as the kernel density distribution of the data. With this method, on the one hand, we exclude users with too little data available for training our algorithms (i.e., users with fewer than 6,000 LEs), and on the other hand, we exclude so-called power listeners (i.e., users with more than 12,000 LEs) that might distort our results. Table 1 summarizes the statistics and characteristics of our three user groups. We see that, even if we only consider 1,000 users per group, we have a sufficient amount of LEs, i.e., between 6.9 and 8.3 million, to train and test our music genre modeling and prediction approaches. Further characteristics of our user groups are as follows:

(i) LowMS. The LowMS group represents the 1,000 users with the smallest mainstreaminess scores. These users have an average mainstreaminess value of .125. LowMS contains 82,417 distinct artists, 6,915,352 listening events, 931 genres, and 14,573,028 genre assignments. Interestingly, the male/female ratio is the least evenly distributed one in this group (i.e., 74%/26%).

(ii) MedMS. The MedMS group consists of the 1,000 users with mainstreaminess scores around the median and thus, lying between the ones of the LowMS and HighMS groups. This group has an average mainstreaminess value of .379. The majority of dataset statistics of this group lies between the ones of the LowMS and HighMS users, except for the average age, which is the highest for the MedMS users (i.e., 25.352 years).

(iii) HighMS. The HighMS group represents the 1,000 users in the LFM-1b dataset with the highest mainstreaminess scores (average mainstreaminess of .688). These users are not only the youngest ones (i.e., 21.486 years on average) but also listen to the highest number of distinct genres on average (i.e., 186.010), indicating that music which is considered mainstream is quite diverse on Last.fm. Also, this user group exhibits the largest share of female listeners (i.e., a male/female ratio of 65%/35%) and the highest number of distinct genres (973).
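The group construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout, applying the LE filter before ranking, and reading "around the median" as "closest to the median score" are our choices and are not specified in the text.

```python
from statistics import median

def split_user_groups(users, group_size=1000, min_les=6000, max_les=12000):
    """Split users into low/medium/high mainstreaminess groups.

    users: list of (user_id, mainstreaminess, n_listening_events) tuples.
    Assumption: the LE filter is applied before ranking the users.
    """
    eligible = [u for u in users if min_les <= u[2] <= max_les]
    ranked = sorted(eligible, key=lambda u: u[1])
    low = ranked[:group_size]
    high = ranked[-group_size:]
    # Assumption: "around the median" means closest to the median score,
    # chosen among users not already placed in the low or high group.
    med_score = median(u[1] for u in ranked)
    remaining = ranked[group_size:-group_size]
    med = sorted(remaining, key=lambda u: abs(u[1] - med_score))[:group_size]
    return low, med, high
```

With toy data and `group_size=1`, users outside the 6,000 to 12,000 LE window are dropped first, and the three groups are then taken from the remaining ranking.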

Additionally, we investigate the most frequent countries of the users. Here, for all three groups, the United States (US) is the dominating country. The share of US users increases with the mainstreaminess, i.e., while this share is only 14% for LowMS and 18% for MedMS, it is already 22% for HighMS. Interestingly, Russia (RU, 13%), Poland (PL, 9%), and Japan (JP, 8%) are frequent in the LowMS group, while the United Kingdom (UK) contributes a substantial share in the other two groups (9% for MedMS and 14% for HighMS). Germany (DE) is among the most popular countries in all three groups (10% for LowMS and HighMS, 8% for MedMS); Brazil (BR) can only be found among the most popular countries in the MedMS group (8%); and the Netherlands (NL, 5%) as well as Spain (ES, 4%) can only be found in the HighMS group.

Genre mapping. For mapping music genres to artists, we use an extension of the LFM-1b dataset, namely the LFM-1b UGP dataset (Schedl and Ferwerda, 2017), which describes the genres of an artist by leveraging social tags assigned by Last.fm users. Specifically, LFM-1b UGP contains a weighted mapping of 1,998 music genres available in the online database Freebase (https://developers.google.com/freebase/, no longer maintained) to Last.fm artists. This database includes a fine-grained representation of musical styles, including genres such as “Progressive Psytrance” or “Pagan Black Metal”.

The genre weightings for any given artist correspond to the relative frequency of tags assigned to that artist in Last.fm. For example, for the artist “Metallica”, the top tags and their corresponding relative frequencies are “thrash metal” (1.0), “metal” (.91), “heavy metal” (.74), “hard rock” (.41), “rock” (.34), and “seen live” (.3). From this list, we remove all tags that are not part of the 1,998 Freebase genres (i.e., “seen live” in our example) as well as all tags with a relative frequency smaller than .5 (i.e., “hard rock” and “rock” in our example). Thus, for “Metallica”, we end up with three genres, i.e., “thrash metal”, “metal” and “heavy metal”.
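The tag filtering step can be illustrated with the "Metallica" example from the text. The function name is hypothetical, and the small genre set below merely stands in for the 1,998 Freebase genres; only the tag weights and the .5 threshold come from the text.

```python
def filter_artist_genres(tag_weights, known_genres, min_weight=0.5):
    """Keep tags that are known genres and have a weight of at least min_weight."""
    return [tag for tag, weight in tag_weights.items()
            if tag in known_genres and weight >= min_weight]

# Tag weights for "Metallica" as listed in the text.
metallica_tags = {
    "thrash metal": 1.0, "metal": 0.91, "heavy metal": 0.74,
    "hard rock": 0.41, "rock": 0.34, "seen live": 0.3,
}
# Stand-in for the Freebase genre list (assumption: "seen live" is not in it).
known_genres = {"thrash metal", "metal", "heavy metal", "hard rock", "rock"}

genres = filter_artist_genres(metallica_tags, known_genres)
# genres == ["thrash metal", "metal", "heavy metal"]
```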

2.2. Approach

In this section, we describe our music genre modeling and prediction approach based on the declarative memory module of ACT-R.

2.2.1. The Cognitive Architecture ACT-R

ACT-R, which is short for “Adaptive Control of Thought – Rational”, is a cognitive architecture developed by John Robert Anderson (Anderson et al., 2004). ACT-R defines and formalizes the basic cognitive operations of the human mind (e.g., access to information in human memory).

Figure 1 schematically illustrates the main architecture of ACT-R. In general, ACT-R distinguishes between short-term memory modules, such as the working memory module, and long-term memory modules, such as the declarative and procedural memory modules. Using a sensory register (i.e., the ultra-short-term memory), the encoded information is passed to the short-term working memory module, which interacts with the long-term memory modules. In the case of the declarative memory, the encoded information can be stored, and already stored information can be retrieved. In the case of the procedural memory, the information can be matched against stored rules that can lead to actions (Wheeler, 2014).

Figure 1. Schematic illustration of ACT-R. In our work, we focus on the activation equation of the declarative memory module.

Figure 2. Calculation of the BLL equation’s parameter $d$, shown in panels (a) to (c). On a log-log scale, we plot the relistening count of the genres over the time since their last LEs. We set $d$ to the slopes of the corresponding linear regression lines.

Thus, declarative memory holds factual knowledge (e.g., what something is), and procedural memory consists of sequences of actions (e.g., how to do something). In our work, we focus on the declarative part, which contains the activation equation of human memory. The activation equation determines the usefulness, i.e., the activation level $A_i$, of a memory unit $i$ (e.g., a music genre in our case) for a user in the current context. It is given by:

$$A_i = B_i + \sum_{j} W_j \cdot S_{j,i} \tag{1}$$

Here, the component $B_i$ represents the base-level activation and quantifies the general usefulness of the unit $i$ by considering how frequently and recently it has been used in the past. It is given by the base-level learning (BLL) equation:

$$B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right) \tag{2}$$

where $n$ is the frequency of $i$’s occurrences and $t_j$ is the time since the $j$-th occurrence of $i$. The exponent $d$ accounts for the power-law of forgetting, which means that each unit’s activation level caused by the $j$-th occurrence decreases in time according to a power function (Anderson et al., 2004).

The second component of Equation 1 represents the associative activation that tunes the base-level activation of the unit $i$ to the current context. The context is given by any contextual element $j$ that is relevant for the current situation. In the case of a music recommender system, that could be a music genre that the user prefers in the current situation. Through learned associations, the contextual elements are connected with $i$ and can increase $i$’s activation depending on the weight $W_j$ and the strength of association $S_{j,i}$.
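To make the two components concrete, the following sketch computes a base-level activation from a list of past usage times and then adds an associative term. It follows the standard ACT-R form (logarithm of summed power-law decays, plus weighted association strengths); the function and parameter names are illustrative assumptions.

```python
import math

def base_level_activation(times_since_use, d):
    """BLL: ln of the sum of t^(-d) over all past uses of a memory unit."""
    return math.log(sum(t ** -d for t in times_since_use))

def activation(times_since_use, d, context):
    """Full activation: base level plus sum of weight * association strength.

    context: iterable of (weight, association_strength) pairs, one per
    contextual element (hypothetical input layout).
    """
    associative = sum(w * s for w, s in context)
    return base_level_activation(times_since_use, d) + associative
```

Two properties follow directly from the formula: a unit used more recently (smaller time values) or more often (more terms in the sum) receives a higher base-level activation, and each contextual element can only add to it when weights and association strengths are non-negative.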

2.2.2. Modeling and Predicting Music Genre Preferences

For modeling and predicting music genre preferences, we investigate two approaches: (i) $BLL_u$, based on the BLL equation to model the past usage frequency (i.e., popularity) and recency (i.e., time), and (ii) $ACT_{u,c}$, based on the full activation equation to also take the current context into account.

We start with $BLL_u$ and thus, with defining the base-level activation $B(g,u)$ for genre $g$ and user $u$ by utilizing the previously defined BLL equation:

$$B(g,u) = \ln\left(\sum_{j=1}^{n} t_{u,g,j}^{-d}\right) \tag{3}$$

Here, $g$ is a genre user $u$ has listened to in the past, and $n$ is the number of times $u$ has listened to $g$. Further, $t_{u,g,j}$ is the time in seconds since the $j$-th LE of $g$ by $u$, and $d$ is the power-law decay factor, which we identify using a similar method as used in (Kowald et al., 2017b). Thus, in Figure 2, for all LEs and genres in our dataset, we plot the relistening count of a genre $g$ over the time since the last LE of $g$. Then, we set $d$ to the slope of the linear regression lines of this data, which leads to 1.480 for LowMS, 1.574 for MedMS, and 1.587 for HighMS.
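The decay factor estimation can be sketched as an ordinary least-squares fit on log-transformed data. The data below is synthetic and made up for illustration. The text only states that the decay factor is set to the slope of the regression line; taking the negated slope, so that a falling relistening count yields a positive decay factor, is our assumption.

```python
import math

def fit_loglog_slope(times, counts):
    """Least-squares slope of log(count) regressed on log(time)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic relistening counts that follow an exact power law with exponent 1.5.
times = [60.0, 600.0, 6000.0, 60000.0]
counts = [1e6 * t ** -1.5 for t in times]

slope = fit_loglog_slope(times, counts)
d = -slope  # assumption: decay factor taken as the magnitude of the slope
```

On real listening data the points scatter around the line, so the fitted value is an estimate rather than an exact recovery as in this synthetic case.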

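The resulting base-level activation values are then normalized using a simple softmax function in order to map them onto a range of [0, 1] that sums up to 1 (Kowald et al., 2017b; Kowald and Lex, 2016):

$$B'(g,u) = \frac{\exp(B(g,u))}{\sum_{g' \in G_u} \exp(B(g',u))} \tag{4}$$

Here, $G_u$ is the set of distinct genres listened to by $u$. Finally, $BLL_u$ predicts the top-$k$ genres $\widetilde{G}_u^k$ with the highest $B'(g,u)$ values for $u$:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left(B'(g,u)\right) \tag{5}$$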
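A minimal sketch of the normalization and top-k selection steps follows. The function names and the toy scores are illustrative assumptions; subtracting the maximum before exponentiating is a standard numerical-stability measure that does not change the result and is not mentioned in the text.

```python
import math

def softmax(scores):
    """Map raw activation values to positive values that sum to 1."""
    highest = max(scores.values())
    exps = {g: math.exp(s - highest) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}

def top_k(scores, k):
    """Return the k genres with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Made-up base-level activation values for one user.
raw = {"metal": 2.0, "rock": 1.0, "jazz": -1.0}
normalized = softmax(raw)
predicted = top_k(normalized, 2)
```

Because softmax is monotonic, it leaves the ranking untouched; its role is to put activation values on a common scale.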

To investigate not only the factors of frequency and time but also the current context by means of an associative activation, we implement the full activation equation (see Equation 1) in the form of:

$$A(g,u) = B(g,u) + \sum_{c \in G_a} W_c \cdot S_{c,g} \tag{6}$$

where the first part represents the base-level activation by means of the BLL equation and the second part represents the associative activation over the contextual genres $c \in G_a$.

Figure 3. Example illustrating the difference between $BLL_u$ (left panel) and $ACT_{u,c}$ (right panel). Here, unfilled nodes represent target genres $g_1$ and $g_2$, and black nodes represent genres of the last artist listened to by the target user (i.e., contextual genres). For $g_1$ and $g_2$, the node sizes represent the activation levels, and for the contextual genres, the node sizes represent the attentional weights $W_c$. The association strength $S_{c,g}$ is represented by the edge lengths. While $BLL_u$ determines a higher activation level for $g_1$ than for $g_2$, $ACT_{u,c}$ gives a higher activation level to $g_2$ than to $g_1$ by also considering the associative activation based on the current context.

To calculate the associative activation and thus, to model a user’s current context, we incorporate the set of genres $G_a$ assigned to the most recently listened to artist $a$ by user $u$. When applying this equation in the context of recommender systems, related literature (Van Maanen and Marewski, 2009) suggests using a measure of normalized co-occurrence to represent the strength of an association $S_{c,g}$. Accordingly, we define the co-occurrence between two genres as the number of artists to which both genres are assigned. We normalize this co-occurrence value according to the Jaccard coefficient:

$$S_{c,g} = \frac{|A_c \cap A_g|}{|A_c \cup A_g|} \tag{7}$$

where $A_c$ is the set of artists to which context-genre $c$ is assigned, and $A_g$ is the set of artists to which genre $g$ is assigned. Thus, we set the number of times two genres co-occur into relation with the number of times in which at least one of the two genres appears. In this work, we set the attentional weight $W_c$ of context-genre $c$ to 1. By doing so, we give equal weights to all genres assigned to an artist, which avoids down-ranking of less popular, but perhaps more specific, and hence more valuable, genres.
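The Jaccard-normalized co-occurrence can be sketched with Python sets. The function name and the artist identifiers are made up for illustration; returning 0 for two empty artist sets is our assumption, since the text does not cover that case.

```python
def association_strength(artists_of_context_genre, artists_of_genre):
    """Jaccard coefficient of the two artist sets."""
    union = artists_of_context_genre | artists_of_genre
    if not union:
        return 0.0  # assumption: no artists at all means no association
    return len(artists_of_context_genre & artists_of_genre) / len(union)

# Two genres share artists "b" and "c" out of four artists in total.
strength = association_strength({"a", "b", "c"}, {"b", "c", "d"})
# strength == 0.5
```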

Finally, we normalize the $A(g,u)$ values using the aforementioned softmax function and predict the top-$k$ genres with the highest normalized values $A'(g,u)$ for a given user $u$ and the genres $G_a$ of the user’s most recently listened artist (i.e., the current context):

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left(A'(g,u)\right) \tag{8}$$

We further illustrate the difference between $BLL_u$ and $ACT_{u,c}$ in the example of Figure 3 by showing the additional impact of the associative activation defined by the second component of the activation equation. As defined, this associative activation is evoked by the current context (i.e., the genres of the last artist the target user has listened to).

The left panel of Figure 3 shows two genres, $g_1$ and $g_2$, with different base-level activation levels (illustrated by the circle size). Thus, according to $BLL_u$, $g_1$ reaches a higher base-level activation, which means a better rank, than $g_2$. This relationship changes in the right panel of Figure 3, where we consider the influence of the genres in the current context (illustrated by the black nodes). Specifically, depending on the weights $W_c$ (represented by the size of the black nodes) and strength of association $S_{c,g}$ (represented by the length of the edges), the genres in the current context spread additional associative activation to the genres $g_1$ and $g_2$. Now, according to $ACT_{u,c}$, $g_2$ receives stronger associative activation than $g_1$, which also leads to a better rank.
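The rank change described for Figure 3 can be reproduced with a small numeric example. All values here are invented for illustration (listening times, artist sets, the decay factor of 1.5), and the helper names are hypothetical. The sketch uses the Jaccard coefficient as association strength and an attentional weight of 1, as stated in the text, and skips the softmax step because it does not change the ranking.

```python
import math

def bll(times_since_listen, d):
    """Base-level activation from seconds since each past listening event."""
    return math.log(sum(t ** -d for t in times_since_listen))

def jaccard(a, b):
    """Jaccard coefficient of two sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def act_scores(history, context_genres, genre_artists, d, weight=1.0):
    """Base-level plus associative activation for every genre in the history."""
    scores = {}
    for genre, times in history.items():
        associative = sum(weight * jaccard(genre_artists[c], genre_artists[genre])
                          for c in context_genres)
        scores[genre] = bll(times, d) + associative
    return scores

# g1 was heard slightly more recently than g2, so its base level is higher.
history = {"g1": [3600.0], "g2": [4000.0]}
# The context genre "c" shares most of its artists with g2, few with g1.
genre_artists = {
    "g1": {"a1", "a5"},
    "g2": {"a1", "a2", "a3", "a4"},
    "c": {"a1", "a2", "a3"},
}
base = {g: bll(t, 1.5) for g, t in history.items()}
full = act_scores(history, ["c"], genre_artists, d=1.5)
```

Here the base-level values favor g1, while the full activation favors g2 because the context genre is associated more strongly with g2 (Jaccard .75) than with g1 (Jaccard .25).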

3. Experiments and Results

In this section, we describe our experimental setup, i.e., the baseline algorithms, the evaluation protocol and metrics, as well as the results of our experiments.

3.1. Baseline Algorithms

We compare the $BLL_u$ and $ACT_{u,c}$ approaches to five baseline algorithms:

Mainstream-based baseline: $TOP$. The $TOP$ approach models a user $u$’s music genre preferences using the overall top-$k$ genres of all users (i.e., the mainstream) in $u$’s user group (i.e., LowMS, MedMS, HighMS). This is given by:

$$\widetilde{G}^k = \underset{g \in G}{\arg\max^k} \left(|GA_g|\right) \tag{9}$$

Here, $\widetilde{G}^k$ denotes the set of $k$ predicted genres, $G$ the set of all genres, and $|GA_g|$ corresponds to the number of times $g$ occurs in all genre assignments of $u$’s user group.

User-based collaborative filtering baseline: $CF_u$. User-based collaborative filtering-based approaches aim to find similar users for target user $u$ (i.e., the set of neighbors $N_u$) and predict the genres these similar users have listened to in the past (Shi et al., 2014). $CF_u$ is given by:

$$\widetilde{G}_u^k = \underset{g \in G_{N_u}}{\arg\max^k} \left(\sum_{v \in N_u} sim(G_u, G_v) \cdot |GA_{v,g}|\right) \tag{10}$$

where $\widetilde{G}_u^k$ denotes the set of $k$ predicted genres for user $u$, $G_{N_u}$ are the genres listened to by the set of neighbors $N_u$ (we set the neighborhood size for $CF_u$ and $CF_i$ to 20), and $sim(G_u, G_v)$ is the cosine similarity between the genre distributions of user $u$ and neighbor $v$. Finally, $|GA_{v,g}|$ indicates how often $v$ has listened to genre $g$ in the past.
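The user-based collaborative filtering baseline can be sketched as follows: find the most similar users by cosine similarity of their genre count vectors, then score each genre by the similarity-weighted listening counts of those neighbors. The function names, the profile layout, and the toy data are assumptions for illustration; only the cosine similarity, the weighting by listening counts, and the neighborhood size of 20 come from the text.

```python
import math

def cosine(p, q):
    """Cosine similarity of two sparse genre-count vectors (dicts)."""
    dot = sum(count * q.get(genre, 0) for genre, count in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def cf_user_predict(target, profiles, k, n_neighbors=20):
    """Predict k genres from the similarity-weighted counts of the neighbors."""
    sims = {v: cosine(profiles[target], profile)
            for v, profile in profiles.items() if v != target}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:n_neighbors]
    scores = {}
    for v in neighbors:
        for genre, count in profiles[v].items():
            scores[genre] = scores.get(genre, 0.0) + sims[v] * count
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy genre listening counts per user.
profiles = {
    "u": {"metal": 10, "rock": 5},
    "v": {"metal": 8, "rock": 4, "punk": 6},
    "w": {"jazz": 9, "soul": 3},
}
predicted = cf_user_predict("u", profiles, k=3)
```

In this toy case only user "v" overlaps with the target, so the prediction follows the listening counts of "v"; user "w" has a similarity of 0 and contributes nothing.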

Item-based collaborative filtering baseline: $CF_i$. Similar to $CF_u$, $CF_i$ is a collaborative filtering-based approach, but instead of finding similar users for the target user $u$, it aims to find similar items, i.e., music artists $S_a$, for the artists $A_u$ that $u$ has listened to in the past. Then, it predicts the genres that are assigned to these similar artists as given by:

(11)

where $G_{S_a}$ are the genres assigned to the similar artists $S_a$, $S_a$ is the set of similar artists for an artist $a \in A_u$ (for $A_u$, we consider the set of the 20 artists that $u$ has listened to most frequently), and $sim(G_a, G_s)$ is the cosine similarity between the genre distributions assigned to $a$ and the genres assigned to a similar artist $s$.

Popularity-based baseline: $POP_u$. $POP_u$ is a personalized music genre modeling technique, which predicts the $k$ most frequently listened genres in the listening history of user $u$. $POP_u$ is given by the following equation:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left(|GA_{u,g}|\right) \tag{12}$$

Here, $G_u$ is the set of genres $u$ has listened to in the past and $|GA_{u,g}|$ denotes the number of times $u$ has listened to $g$. Thus, it ranks the genres $u$ has listened to in the past by popularity.

Time-based baseline: $TIME_u$. The time-based baseline predicts the $k$ genres that user $u$ has most recently listened to. It is given by:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\min^k} \left(t_{u,g,n}\right) \tag{13}$$

where $t_{u,g,n}$ is the time since the last (i.e., the $n$-th) LE of $g$ by $u$.
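The counting-based baselines are simple enough to sketch directly. The same frequency ranking serves the mainstream baseline (fed with the genre assignments of a whole user group) and the popularity baseline (fed with one user's history); the time-based baseline ranks genres by their most recent listening event. Function names and the event layout are illustrative assumptions.

```python
from collections import Counter

def most_frequent_genres(genre_assignments, k):
    """Rank genres by how often they occur; usable per group or per user."""
    return [genre for genre, _ in Counter(genre_assignments).most_common(k)]

def most_recent_genres(events, k):
    """Rank genres by the timestamp of their latest listening event.

    events: iterable of (timestamp, genre) pairs in any order.
    """
    last_seen = {}
    for timestamp, genre in events:
        last_seen[genre] = max(timestamp, last_seen.get(genre, timestamp))
    return sorted(last_seen, key=last_seen.get, reverse=True)[:k]

# Toy listening history of one user: (timestamp, genre).
events = [(1, "rock"), (2, "metal"), (3, "rock"), (4, "jazz")]
by_popularity = most_frequent_genres([g for _, g in events], 1)
by_recency = most_recent_genres(events, 2)
```

Ranking by the largest last timestamp is equivalent to ranking by the smallest time since the last listening event.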

| User group | Evaluation metric | $TOP$ | $CF_u$ | $CF_i$ | $POP_u$ | $TIME_u$ | $BLL_u$ | $ACT_{u,c}$ |
|---|---|---|---|---|---|---|---|---|
| LowMS | F1 | .108 | .311 | .341 | .356 | .368 | .397 | .485 |
| LowMS | MRR | .101 | .389 | .425 | .443 | .445 | .492 | .626 |
| LowMS | MAP | .112 | .461 | .505 | .533 | .550 | .601 | .785 |
| LowMS | nDCG | .180 | .541 | .590 | .618 | .625 | .679 | .824 |
| MedMS | F1 | .196 | .271 | .284 | .292 | .293 | .338 | .502 |
| MedMS | MRR | .146 | .248 | .264 | .274 | .272 | .320 | .511 |
| MedMS | MAP | .187 | .319 | .336 | .351 | .365 | .419 | .705 |
| MedMS | nDCG | .277 | .419 | .441 | .460 | .452 | .523 | .753 |
| HighMS | F1 | .247 | .273 | .266 | .282 | .228 | .304 | .427 |
| HighMS | MRR | .188 | .232 | .229 | .242 | .201 | .266 | .412 |
| HighMS | MAP | .246 | .304 | .298 | .314 | .267 | .348 | .569 |
| HighMS | nDCG | .354 | .413 | .402 | .429 | .357 | .462 | .642 |

Table 2. Genre prediction accuracy results comparing our $BLL_u$ and $ACT_{u,c}$ approaches with a mainstream-based baseline ($TOP$), a user-based collaborative filtering baseline ($CF_u$), an item-based collaborative filtering baseline ($CF_i$), a popularity-based baseline ($POP_u$) and a time-based baseline ($TIME_u$). For all three user groups (i.e., LowMS, MedMS, and HighMS), $ACT_{u,c}$ outperforms all other approaches. According to a t-test with $\alpha$ = .001, the differences between $ACT_{u,c}$ and all other approaches are statistically significant.

Figure 4. Recall/precision plots for the predicted genres of the baselines and our $BLL_u$ and $ACT_{u,c}$ approaches for the three user groups: (a) LowMS, (b) MedMS, and (c) HighMS. $ACT_{u,c}$ achieves the best results in all settings.

3.2. Evaluation Protocol and Metrics

We split the datasets into train and test sets (Cremonesi et al., 2008). In doing so, we ensure that our evaluation protocol preserves the temporal order of the LEs, which simulates a real-world scenario in which we predict genres of future LEs based on past ones and not the other way round (Kowald et al., 2017b). This also means that a classic k-fold cross-validation evaluation protocol is not useful in our setting.

Specifically, we put the most recent 1% of the LEs of each user into the test set and keep the remaining LEs for the train set. We do not use a classic 80/20 split as the number of LEs per user is large (i.e., on average, 7,689 LEs per user). Although we only use the most recent 1% of listening events per user, this process leads to three large test sets with 69,153 listening events for LowMS, 79,007 listening events for MedMS, and 82,510 listening events for HighMS. To finally measure the prediction quality of the approaches, we use the following six well-established performance metrics (Baeza-Yates et al., 2011):
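The time-preserving split can be sketched per user as follows. The function name and event layout are assumptions, as is the rounding rule (rounding down, with at least one test event); the text does not state how fractional test-set sizes are handled.

```python
def temporal_split(events, test_share=0.01):
    """Split one user's listening events into (train, test) by time.

    events: list of (timestamp, item) pairs in any order.
    Assumption: the test size is rounded down, with a minimum of one event.
    """
    ordered = sorted(events, key=lambda event: event[0])
    n_test = max(1, int(len(ordered) * test_share))
    return ordered[:-n_test], ordered[-n_test:]

# 200 toy events, given in reverse chronological order.
events = [(t, "genre%d" % t) for t in reversed(range(200))]
train, test = temporal_split(events)
```

Every training event precedes every test event, which is the property the protocol relies on and which a shuffled k-fold split would violate.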

Recall: R@k. Recall is calculated as the number of correctly predicted genres divided by the number of relevant genres taken from the LEs in the test set. It is a measure for the completeness of the predictions and is formally given by:

$$R@k = \frac{1}{|U|} \sum_{u \in U} \frac{|\widetilde{G}_u^k \cap G_u^{test}|}{|G_u^{test}|} \tag{14}$$

where $\widetilde{G}_u^k$ denotes the $k$ predicted genres and $G_u^{test}$ the set of relevant genres of an artist in user $u$’s LEs in the test set.

Precision: P@k. Precision is calculated as the number of correctly predicted genres divided by the number of predictions and is a measure for the accuracy of the predictions. It is given by:

$$P@k = \frac{1}{|U|} \sum_{u \in U} \frac{|\widetilde{G}_u^k \cap G_u^{test}|}{|\widetilde{G}_u^k|} \tag{15}$$

We report recall and precision for the predicted genres in the form of recall/precision plots.

F1-score: F1@k. F1-score is the harmonic mean of recall and precision:

$$F1@k = 2 \cdot \frac{P@k \cdot R@k}{P@k + R@k} \tag{16}$$

We report the F1-score for $k$ = 5, where it typically reaches its highest value if 10 genres are predicted.
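For a single prediction, the three set-based metrics can be computed as in the sketch below. The function name and the genre lists are illustrative; how values are aggregated over users or test events is not covered here.

```python
def recall_precision_f1(predicted, relevant):
    """Set-based recall, precision, and F1 for one list of predicted genres."""
    hits = len(set(predicted) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Five predicted genres, three relevant ones, two of which were predicted.
predicted = ["metal", "rock", "jazz", "pop", "punk"]
relevant = {"metal", "punk", "folk"}
recall, precision, f1 = recall_precision_f1(predicted, relevant)
```

Here two of the three relevant genres are found (recall 2/3) using five predictions (precision 2/5), which gives a harmonic mean of 0.5.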

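Mean Reciprocal Rank: MRR@k. MRR is the average of reciprocal ranks of all relevant genres in the list of predicted genres:

$$MRR@k = \frac{1}{|U|} \sum_{u \in U} \left( \frac{1}{|G_u^{test}|} \sum_{g \in G_u^{test}} \frac{1}{rank(g, \widetilde{G}_u^k)} \right) \tag{17}$$

Here, $G_u^{test}$ is the set of relevant genres of user $u$ in the test set and $rank(g, \widetilde{G}_u^k)$ is the position of $g$ in the list of $k$ predicted genres. This means that a high MRR is achieved if relevant genres occur at the beginning of the predicted genre list.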
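The verbal definition (average of the reciprocal ranks of all relevant genres) can be sketched for a single prediction list. Treating a relevant genre that is missing from the list as contributing 0 is our assumption; the function name is hypothetical.

```python
def mean_reciprocal_rank(predicted, relevant):
    """Average reciprocal rank of the relevant genres in the predicted list.

    Assumption: relevant genres that were not predicted contribute 0.
    """
    if not relevant:
        return 0.0
    total = sum(1.0 / (predicted.index(genre) + 1)
                for genre in relevant if genre in predicted)
    return total / len(relevant)

# "metal" is at rank 1 and "jazz" at rank 3, so the value is (1 + 1/3) / 2.
mrr = mean_reciprocal_rank(["metal", "rock", "jazz", "pop"], {"metal", "jazz"})
```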

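Mean Average Precision: MAP@k. MAP is an extension of the precision metric by also taking the ranking of the correctly predicted genres into account and is given by:

$$MAP@k = \frac{1}{|U|} \sum_{u \in U} \left( \frac{1}{|G_u^{test}|} \sum_{i=1}^{k} B_i \cdot P@i \right) \tag{18}$$

Here, $G_u^{test}$ is the set of relevant genres of user $u$ in the test set, $B_i$ is 1 if the predicted genre at position $i$ is among the relevant genres (0 otherwise), and $P@i$ is the precision calculated at position $i$ according to Equation 15.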
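For one prediction list, the ranking-aware precision can be sketched as below: precision at each position is counted only where a relevant genre appears, and the sum is divided by the number of relevant genres. This normalization is a common convention and an assumption here, as is the function name.

```python
def average_precision(predicted, relevant):
    """Sum of precision-at-i over the hit positions, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, genre in enumerate(predicted, start=1):
        if genre in relevant:
            hits += 1
            total += hits / position  # precision at this position
    return total / len(relevant)

# Hits at positions 1 and 3: (1/1 + 2/3) / 2.
ap = average_precision(["metal", "rock", "jazz", "pop"], {"metal", "jazz"})
```

Moving a relevant genre toward the front of the list raises the value, which is what distinguishes this metric from plain precision.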

Normalized Discounted Cumulative Gain: nDCG@k. nDCG is another ranking-dependent metric. It is based on the Discounted Cumulative Gain (DCG@k) measure (Järvelin et al., 2008), which is defined as:

$$DCG@k = \sum_{i=1}^{k} \frac{2^{B_i} - 1}{\log_2(1 + i)} \tag{19}$$

where $B_i$ is 1 if the genre predicted at position $i$ is relevant (0 otherwise). nDCG@k is given as DCG@k divided by iDCG@k, which is the highest possible DCG value that can be achieved if all relevant genres are predicted in the correct order:

$$nDCG@k = \frac{1}{|U|} \sum_{u \in U} \frac{DCG@k}{iDCG@k} \tag{20}$$

We report MRR, MAP, and nDCG for predicted music genres, where these metrics reach their highest values.
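A sketch of nDCG for a single prediction list follows, assuming binary relevance and the common logarithmic discount of 1 / log2(1 + position). Both the discount form and the function names are assumptions rather than details confirmed by the text.

```python
import math

def dcg(predicted, relevant):
    """Discounted cumulative gain with binary relevance."""
    return sum(1.0 / math.log2(1 + position)
               for position, genre in enumerate(predicted, start=1)
               if genre in relevant)

def ndcg(predicted, relevant):
    """DCG divided by the best DCG achievable with this list length."""
    ideal_hits = min(len(relevant), len(predicted))
    ideal = sum(1.0 / math.log2(1 + position)
                for position in range(1, ideal_hits + 1))
    return dcg(predicted, relevant) / ideal if ideal else 0.0

# Hits at positions 1 and 3 versus the ideal of hits at positions 1 and 2.
score = ndcg(["metal", "rock", "jazz"], {"metal", "jazz"})
perfect = ndcg(["metal", "jazz", "rock"], {"metal", "jazz"})
```

The value is 1 only when all relevant genres occupy the top positions, and it drops as relevant genres move further down the list.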

3.3. Results and Discussion

In this section, we present and discuss our evaluation results. The accuracy results according to F1, MRR, MAP, and nDCG are shown in Table 2 for the five baseline approaches as well as the proposed $BLL_u$ and $ACT_{u,c}$ algorithms. Furthermore, we provide recall/precision plots for the predicted genres.

Accuracy of baseline approaches. When analyzing the performance of the baseline approaches $TOP$, $CF_u$, $CF_i$, $POP_u$, and $TIME_u$, we see a clear difference between the non-personalized and the personalized algorithms. While the non-personalized $TOP$ approach, which predicts the top-$k$ genres of the mainstream, provides better accuracy results in the HighMS setting than in the LowMS setting, the personalized $CF_u$, $CF_i$, $POP_u$, and $TIME_u$ algorithms provide better results in the LowMS setting than in the HighMS setting. Hence, personalized genre modeling approaches provide better results, the lower the mainstreaminess of the users. Non-personalized genre modeling approaches, however, have higher performance, the higher the mainstreaminess of the users.

Next, we compare the accuracy of the two collaborative filtering-based methods, $CF_u$ and $CF_i$. Here, the item-based CF variant $CF_i$ reaches higher accuracy estimates in the LowMS and MedMS settings, while the user-based CF variant $CF_u$ provides better performance in the HighMS setting. To better understand this pattern of results, we provide the average pairwise user similarity in the form of boxplots in Figure 5. Here, for all three user groups, we calculate the pairwise similarity between the users via the cosine similarity metric based on the users’ genre distribution vectors. We see that users in the HighMS setting are very similar to each other, which explains the good performance of an algorithm that is based on user similarities, such as $CF_u$.

$POP_u$ and $TIME_u$ reach the highest accuracy estimates among the five baseline approaches. Interestingly, the popularity-based algorithm provides the best results for the HighMS user group, while the time-based algorithm provides the best results in the LowMS user group. For the MedMS user group, however, both algorithms reach a comparable accuracy performance, which shows the importance of both factors, frequency (i.e., popularity) and recency (i.e., time).

Figure 5. Average pairwise user similarity for LowMS, MedMS, and HighMS. We calculate the user similarity using the cosine similarity metric based on the users’ genre distributions. While users in the LowMS group show a very individual listening behavior, users in the HighMS group tend to listen to similar music genres.

Accuracy of $BLL_u$ and $ACT_{u,c}$. We discuss the results of the $BLL_u$ and $ACT_{u,c}$ approaches, which utilize human memory processes as defined by the cognitive architecture ACT-R in order to model and predict music genre preferences. Specifically, $BLL_u$ combines the factors of past usage frequency and recency via the BLL equation (see Equation 3) and $ACT_{u,c}$ extends $BLL_u$ by also considering the current context via the activation equation (see Equation 6). In this work, we define the current context by the genres assigned to the artist that the target user has listened to most recently.

Figure 6. Recall/precision plot of our $ACT_{u,c}$ approach for the predicted genres for the three user groups LowMS, MedMS, and HighMS. We observe good prediction accuracy results for $ACT_{u,c}$ in all settings but especially for LowMS. This shows that our approach based on human memory processes is especially useful for predicting the music genre preferences of users with low interest in mainstream music.

As expected, when combining the factors of past usage frequency and recency in the form of , we can outperform the best performing baseline approaches and in all three settings (i.e., LowMS, MedMS, and HighMS). We can further improve the accuracy performance when we additionally consider the current context in the form of . Here, we reach a statistically significant improvement555According to a t-test with = .001. over all other approaches across all evaluation metrics and user groups. Furthermore, in Figure 6, we present a recall/precision plot showing the accuracy of for predicted genres for LowMS, MedMS, and HighMS. We observe good results for all three user groups but especially in the LowMS setting, in which we are faced with users with a low interest in mainstream music.

This shows that the proposed algorithm can provide accurate predictions of music genres listened to in the future for all user groups and, thus, treats all users in our experiment in a fair manner. Moreover, since our approach utilizes human memory processes, it is based on psychological principles of human intelligence rather than artificial intelligence. We believe that this theoretical underpinning contributes to the explanation effectiveness of our approach, as we can fully understand why a specific genre was predicted for a target user in a given context. To illustrate this with an example, we refer back to Figure 3. In this figure, we have shown the differences between the BLL-based and the activation-based approach for two predicted genres. Let us assume that these are the top predicted genres for a target user. According to the BLL equation, we know that these genres received the highest activation levels because the user has listened to them very frequently and recently. When looking at the activation levels calculated by the activation equation, we also take the current context into account and, thus, get an indication of how similar the two genres are to the genres assigned to the user's most recently listened artist. In our example, one genre is strongly related to the current context, while the other has only a weak relation to it. Taken together, with our approach, we can easily explain genre prediction results according to three simple factors that are relevant for human memory processes according to the cognitive architecture ACT-R: (i) past usage frequency, (ii) past usage recency, and (iii) similarity to the current context.
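The three explanatory factors can be made concrete with the standard ACT-R forms of the base-level learning and activation equations. The sketch below is an illustration under stated assumptions, not the authors' implementation: the decay value d = 0.5 is the conventional ACT-R default, and the listening timestamps, context weights, and association strengths are hypothetical toy values. The paper's Equations 3 and 6 may differ in normalization and in how the weights are derived.

```python
import math


def base_level(listen_times, t_ref, d=0.5):
    """Standard ACT-R base-level form: ln of the sum of (t_ref - t_j)^(-d).

    More listens (frequency) and smaller time gaps (recency) both raise the value.
    """
    total = sum((t_ref - t) ** (-d) for t in listen_times if t < t_ref)
    return math.log(total) if total > 0 else float("-inf")


def activation(base, context_weights, association):
    """Standard ACT-R activation form: base level plus weighted context associations."""
    spread = sum(w * association.get(c, 0.0) for c, w in context_weights.items())
    return base + spread


# Hypothetical listening history of one user (timestamps in hours).
history = {
    "rock": [10.0, 60.0, 95.0],
    "jazz": [12.0, 58.0, 96.0],
    "folk": [5.0],
}
t_ref = 100.0

# Assumed context: the most recently listened artist is tagged "indie".
context_weights = {"indie": 1.0}
# Assumed association strengths between each candidate genre and the context genre.
associations = {
    "rock": {"indie": 0.8},
    "jazz": {"indie": 0.1},
    "folk": {"indie": 0.0},
}

bll_scores = {g: base_level(times, t_ref) for g, times in history.items()}
act_scores = {
    g: activation(bll_scores[g], context_weights, associations[g]) for g in history
}

bll_ranking = sorted(bll_scores, key=bll_scores.get, reverse=True)
act_ranking = sorted(act_scores, key=act_scores.get, reverse=True)
print(bll_ranking, act_ranking)
```

In this toy setting, the two frequently and recently played genres have similar base levels, and the context term decides which of them is ranked first. Each score decomposes into a base-level part and a context part, which is what makes the prediction explainable.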

4. Conclusion and Future Work

In this paper, we presented two approaches for modeling and predicting music genre preferences, both based on the human memory module of the cognitive architecture ACT-R. While the first uses the BLL equation of ACT-R to model the factors of past usage frequency (i.e., popularity) and recency (i.e., time), the second integrates the activation equation of ACT-R to also incorporate the current context. We defined this context as the genres assigned to the most recently listened artist of the target user.

Using a dataset gathered from the music platform Last.fm, we evaluated both approaches against five baselines: a mainstream-based approach, a user-based CF approach, an item-based CF approach, a popularity-based approach, and a time-based approach. We used six evaluation metrics (i.e., recall, precision, F1-score, MRR, MAP, and nDCG) in three evaluation settings in which the evaluated users differed in terms of their inclination to mainstream music (i.e., LowMS, MedMS, and HighMS user groups). Our evaluation results show that both approaches outperform the five baseline methods in all three settings; the context-aware approach even does so in a statistically significant manner. Furthermore, we find that the current context is especially important when aiming for accurate genre predictions.
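For readers unfamiliar with the six metrics, the following sketch computes them for a single user with common binary-relevance definitions. The exact variants used in the evaluation may differ, and the function names and toy genre lists are assumptions made for illustration. MRR and MAP are obtained by averaging the per-user reciprocal rank and average precision over all users.

```python
import math


def precision_recall_f1(predicted, relevant, k):
    """Precision, recall, and F1 for the top-k predicted genres of one user."""
    hits = sum(1 for g in predicted[:k] if g in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(predicted, relevant, k):
    """1 / rank of the first relevant genre in the top k, or 0 if there is none."""
    for rank, g in enumerate(predicted[:k], start=1):
        if g in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(predicted, relevant, k):
    """Mean of precision values at each rank where a relevant genre appears."""
    hits, total = 0, 0.0
    for rank, g in enumerate(predicted[:k], start=1):
        if g in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def ndcg(predicted, relevant, k):
    """Binary-relevance nDCG with a log2 rank discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, g in enumerate(predicted[:k], start=1)
        if g in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical prediction list and held-out genres for one user.
predicted = ["rock", "pop", "jazz", "folk"]
relevant = {"rock", "jazz", "metal"}
k = 3

p, r, f1 = precision_recall_f1(predicted, relevant, k)
rr = reciprocal_rank(predicted, relevant, k)
ap = average_precision(predicted, relevant, k)
gain = ndcg(predicted, relevant, k)
print(p, r, f1, rr, ap, gain)
```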

In summary, we have shown that human memory processes in the form of ACT-R’s activation equation can be effectively utilized for modeling and predicting music genres. By following such a psychology-inspired approach, we also believe that we can model a user’s preferences transparently, in contrast to, e.g., deep learning-based approaches that rely on latent user representations. Therefore, our approach could be useful for realizing more transparent and explainable music recommender systems.

Limitations and future work. In the present work, we only considered the genres assigned to the most recently listened artist of the target user as contextual information. However, related work on music preference modeling has shown that music listening habits depend on the time of day, the current activity of a user, or the mood a user is currently experiencing (see, e.g., Knees et al., 2019).

For future work, we also plan to utilize the procedural memory processes of ACT-R in addition to the activation equation. As, for instance, done in the SNIF-ACT model (Pirolli and Fu, 2003; Fu and Pirolli, 2007), we could define so-called production rules in order to transfer the user’s preferences into actual music recommendation strategies. By making these rules transparent to the user, we aim to contribute to research on transparent recommender systems that create explainable recommendations.

Reproducibility. To foster the reproducibility of our research, we use the publicly available LFM-1b dataset (see Section 2). Furthermore, we provide the source code of our approach as part of our TagRec framework (Kowald et al., 2017a).

References

  • Anderson et al. (2004) John R Anderson, Daniel Bothell, Michael D Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. 2004. An integrated theory of the mind. Psychological review 111, 4 (2004), 25 pages.
  • Anderson and Schooler (1991) John R Anderson and Lael J Schooler. 1991. Reflections of the environment in memory. Psychological science 2, 6 (1991), 396–408.
  • Baeza-Yates et al. (2011) Ricardo Baeza-Yates, Berthier de Araújo Neto Ribeiro, et al. 2011. Modern Information Retrieval. New York: ACM Press; Harlow, England: Addison-Wesley.
  • Bauer and Schedl (2019) Christine Bauer and Markus Schedl. 2019. Global and country-specific mainstreaminess measures: Definitions, analysis, and usage for improving personalized music recommendation systems. PloS one 14, 6 (2019), e0217389.
  • Cantor and Zillmann (1973) Joanne R Cantor and Dolf Zillmann. 1973. The effect of affective state and emotional arousal on music appreciation. The Journal of General Psychology 89, 1 (1973), 97–108.
  • Cremonesi et al. (2008) Paolo Cremonesi, Roberto Turrin, Eugenio Lentini, and Matteo Matteucci. 2008. An Evaluation Methodology for Collaborative Recommender Systems. In Proceedings of AXMEDIS’2008. IEEE Computer Society, Washington, DC, USA, 224–231. https://doi.org/10.1109/AXMEDIS.2008.13
  • Dominik et al. (2020) Dominik Kowald, Markus Schedl, and Elisabeth Lex. 2020. The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility Study. In Proceedings of the 42nd European Conference on Information Retrieval.
  • Fu and Pirolli (2007) Wai-Tat Fu and Peter Pirolli. 2007. SNIF-ACT: A cognitive model of user navigation on the World Wide Web. Human–Computer Interaction 22, 4 (2007), 355–412.
  • Järvelin et al. (2008) Kalervo Järvelin, Susan L Price, Lois ML Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings of ECIR’2008. Springer, 4–15.
  • Juslin and Sloboda (2001) Patrik N Juslin and John A Sloboda. 2001. Music and emotion: Theory and research. Oxford University Press.
  • Knees et al. (2019) P. Knees, M. Schedl, B. Ferwerda, and A. Laplante. 2019. User Awareness in Music Recommender Systems. In Mirjam Augstein, Eelco Herder, Wolfgang Wörndl (eds.), Personalized Human-Computer Interaction. DeGruyter.
  • Kowald et al. (2017a) Dominik Kowald, Simone Kopeinik, and Elisabeth Lex. 2017a. The tagrec framework as a toolkit for the development of tag-based recommender systems. In Adjunct Publication of UMAP’2017. ACM, 23–28.
  • Kowald and Lex (2016) Dominik Kowald and Elisabeth Lex. 2016. The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems. In Proceedings of Hypertext’2016 (Halifax, Nova Scotia, Canada). ACM, New York, NY, USA, 237–242.
  • Kowald et al. (2019) Dominik Kowald, Elisabeth Lex, and Markus Schedl. 2019. Modeling Artist Preferences for Personalized Music Recommendations. In Proc. of ISMIR ’19.
  • Kowald et al. (2017b) Dominik Kowald, Subhash Chandra Pujari, and Elisabeth Lex. 2017b. Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach. In Proceedings of WWW’2017. ACM, 10 pages.
  • North and Hargreaves (2008) Adrian North and David Hargreaves. 2008. The social and applied psychology of music. OUP Oxford.
  • Oord et al. (2013) Aäron van den Oord, Sander Dieleman, and Benjamin Schrauwen. 2013. Deep Content-based Music Recommendation. In Proceedings of NIPS’2013 (Lake Tahoe, Nevada). Curran Associates Inc., USA, 2643–2651.
  • Pereira et al. (2011) Carlos Silva Pereira, João Teixeira, Patrícia Figueiredo, João Xavier, São Luís Castro, and Elvira Brattico. 2011. Music and emotions in the brain: familiarity matters. PloS one 6, 11 (2011), e27241.
  • Pirolli and Fu (2003) Peter Pirolli and Wai-Tat Fu. 2003. SNIF-ACT: A model of information foraging on the World Wide Web. In International Conference on User Modeling. Springer, 45–54.
  • Rentfrow and Gosling (2003) Peter J Rentfrow and Samuel D Gosling. 2003. The do re mi’s of everyday life: the structure and personality correlates of music preferences. Journal of personality and social psychology 84, 6 (2003), 21 pages.
  • Schedl (2016) Markus Schedl. 2016. The LFM-1b Dataset for Music Retrieval and Recommendation. In Proceedings of the 2016 Conference on Multimedia Retrieval. ACM, 103–110.
  • Schedl and Bauer (2017) Markus Schedl and Christine Bauer. 2017. Distance-and Rank-based Music Mainstreaminess Measurement. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 364–367.
  • Schedl and Bauer (2018) Markus Schedl and Christine Bauer. 2018. An Analysis of Global and Regional Mainstreaminess for Personalized Music Recommender Systems. Journal of Mobile Multimedia 14 (2018), 95–112.
  • Schedl and Ferwerda (2017) Markus Schedl and Bruce Ferwerda. 2017. Large-scale Analysis of Group-specific Music Genre Taste From Collaborative Tags. In Proceedings of ISM’2017. IEEE, 479–482.
  • Schedl et al. (2018a) Markus Schedl, Emilia Gómez, Erika Trent, Marko Tkalčič, Hamid Eghbal-Zadeh, and Agustín Martorell. 2018a. On the Interrelation between Listener Characteristics and the Perception of Emotions in Classical Orchestra Music. IEEE Transactions on Affective Computing 9 (2018), 507–525. Issue 4.
  • Schedl and Hauger (2015) Markus Schedl and David Hauger. 2015. Tailoring music recommendations to users by considering diversity, mainstreaminess, and novelty. In Proceedings of SIGIR’2015. ACM, 947–950.
  • Schedl et al. (2015) Markus Schedl, Peter Knees, Brian McFee, Dmitry Bogdanov, and Marius Kaminskas. 2015. Music recommender systems. In Recommender systems handbook. Springer, 453–492.
  • Schedl et al. (2018b) Markus Schedl, Hamed Zamani, Ching-Wei Chen, Yashar Deldjoo, and Mehdi Elahi. 2018b. Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval 7, 2 (01 Jun 2018), 95–116.
  • Schubert (2007) Emery Schubert. 2007. The influence of emotion, locus of emotion and familiarity upon preference in music. Psychology of Music 35, 3 (2007), 499–515.
  • Shi et al. (2014) Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative Filtering Beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges. Comput. Surveys 47, 1, Article 3 (May 2014), 45 pages.
  • Van Maanen and Marewski (2009) Leendert Van Maanen and Julian N Marewski. 2009. Recommender systems for literature selection: A competition between decision making and memory models. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. 2914–2919.
  • Wheeler (2014) Steve Wheeler. 2014. Learning Theories: Adaptive Control of Thought. [Online under http://www.teachthought.com/learning/theory-cognitive-architecture/; accessed 19-December-2019].
  • Zajonc (1968) Robert B Zajonc. 1968. Attitudinal effects of mere exposure. Journal of personality and social psychology 9, 2p2 (1968), 1.
  • Zangerle and Pichl (2018) Eva Zangerle and Martin Pichl. 2018. Content-based User Models: Modeling the Many Faces of Musical Preference. In 19th International Society for Music Information Retrieval Conference.