Attention-based Group Recommendation

Recommender systems are widely used in large information-driven companies such as Google, Twitter, LinkedIn, and Netflix. A recommender system deals with the problem of information overload by filtering important information fragments according to users' preferences. However, most traditional recommendation techniques have limitations. In light of the increasing success of deep learning, recent studies have demonstrated the benefits of using deep learning in various recommendation tasks, and recommendation architectures have adopted deep learning to overcome the limitations of traditional techniques. We propose an extension of deep learning to solve the group recommendation problem. On the one hand, as different individual preferences in a group necessitate preference trade-offs in making group recommendations, it is essential that the recommendation model can discover substitutes among user behaviors. On the other hand, it has been observed that a user behaves differently as an individual and as a group member. To tackle these problems, we propose using an attention mechanism to capture the impact of each user in a group. Specifically, our model automatically learns the influence weight of each user in a group and recommends items to the group based on its members' weighted preferences. We conduct extensive experiments on four datasets. Our model significantly outperforms baseline methods and shows promising results in applying deep learning to the group recommendation problem.




1. Introduction

The rapid growth of social networking services such as event-based social network services (Meetup), dining and restaurant services (Yelp), and media services (Netflix) has made it increasingly easy for people to organize and participate in group activities. Such group activities include, for example, having dinner with colleagues, going to the cinema with a spouse, and going on a picnic with friends. Owning a tremendous amount of data from people sharing social relationships and activities online, these social networking services face the challenge of targeting not only individuals but also user groups. As a result, recommendation for groups of users becomes an important task, which benefits both users and social network services.

Traditional recommendation methods targeting individuals, however, cannot be applied directly to group recommendation. Many recent studies have therefore focused on developing recommender systems for groups (Amer-Yahia et al., 2009; Baltrunas et al., 2010; Carvalho and Macedo, 2013; Gorla et al., 2013; Liu et al., 2012b; Ye et al., 2012; Yuan et al., 2014; Pham et al., 2016, 2017). Group recommendation is challenging because the preference differences among users in a group require trade-offs in order to balance the different preferences and recommend the most favorable items to the group. Existing group recommendation methods can be categorized into memory-based and model-based approaches (Su and Khoshgoftaar, 2009; Koren et al., 2009). The memory-based approach can be further divided into two classes: preference aggregation (McCarthy and Anagnost, 1998; Koren et al., 2009) and score aggregation (Baltrunas et al., 2010; Crossen et al., 2002; McCarthy, 2002; O’Connor et al., 2001; Pizzutilo et al., 2005). The preference aggregation strategy combines all user preferences to create a group profile and then makes recommendations to the group, whereas the score aggregation strategy computes a list of recommendations for each member and then combines the individual lists to generate recommendations, using strategies such as average (Baltrunas et al., 2010; Berkovsky and Freyne, 2010), least misery (Amer-Yahia et al., 2009), and maximum satisfaction (Boratto and Carta, 2011). However, neither strategy models the interactions among group members when aggregating the members' preferences. In contrast, the model-based approach models the decision-making process of the group to recommend items. Nevertheless, existing models are insufficient to replicate the complicated decision-making process of a group (more details in Section 2.1).

In this work, we propose a new approach to the group recommendation problem. Our solution results from the key intuition that members of a group tend to follow the opinions of the most important members (leaders or experts) rather than consider the opinions of all group members equally. This is because members of a group have different levels of expertise on the group's topic, and a group usually has only a few users who are experts in that topic. Based on this intuition, we aim to model the interactions among users in a group, which helps us explore the roles of different group members and hence make more accurate recommendations for the group.

To achieve our goal, we introduce the Attentive Group Recommendation (AGR) model, which uses attention mechanisms in collaborative filtering to address the two challenges mentioned above. Employing an attention mechanism, AGR is able to learn the influence of other group members' expertise on one member's decision in the same group, and therefore better models the group decision-making process. The attention mechanism not only helps our model explore the impact of each user on the group but also captures the varying impact of one user across different groups, which none of the existing methods is able to achieve. The key contributions of this paper are:

  • The proposed Attentive Group Recommendation model takes a novel deep learning approach to tackle the group recommendation problem. AGR is the first to exploit the attention mechanism technique for group recommendation.

  • AGR is able to dynamically adjust the weight of each user across groups, providing a clearer picture of the roles of a user in different groups and a better explanation of the group's final decision.

  • We conduct extensive experiments on four datasets and show that AGR consistently achieves better results than state-of-the-art methods.

The rest of the paper is organized as follows: Section 2 overviews existing literature; Section 3 describes the preliminaries, including Attention Mechanism and Bayesian Personalized Ranking (BPR); Section 4 proposes the Attentive Group Recommendation model; Section 5 shows the comparison experiments on four datasets; and Section 6 concludes our work.

2. Related Work

2.1. Group Recommendation

Group recommendation is a relevant problem in many industries such as social media (Pizzutilo et al., 2005; Salehi-Abari and Boutilier, 2015), tourism (McCarthy et al., 2006), and entertainment (Crossen et al., 2002; Yu et al., 2006; O’Connor et al., 2001). While recommendation techniques targeting individuals are extensively studied, research into group recommendation has been limited (Boratto, 2016).

Group recommendation methods can be categorized into memory-based and model-based approaches, where the memory-based approach can be further divided into the preference aggregation approach and the score aggregation approach (Amer-Yahia et al., 2009). The preference aggregation approach makes recommendations based on a group profile that combines all user preferences (McCarthy and Anagnost, 1998; Yu et al., 2006), while the score aggregation approach computes a recommendation score of an item for each user, and then aggregates the scores across users to derive a group recommendation score of the item (Baltrunas et al., 2010; Crossen et al., 2002; McCarthy, 2002; O’Connor et al., 2001; Pizzutilo et al., 2005). The two most popular strategies for score aggregation are the average (AVG) and the least misery (LM) strategies. The AVG strategy takes the average score across individuals in the group as the final recommendation score, thereby maximizing overall group satisfaction (McCarthy and Anagnost, 1998; Yu et al., 2006). Alternatively, the LM strategy tries to please everyone by choosing the lowest among all individuals' scores as the final score (Baltrunas et al., 2010). Both score aggregation methods have major drawbacks. The AVG strategy may return items that are favorable to some members but not to others, while the LM strategy may end up recommending mediocre items that no one either loves or hates. Baltrunas et al. (Baltrunas et al., 2010) point out that the performance of either strategy depends on group size and inner-group similarity. Amer-Yahia et al. (Amer-Yahia et al., 2009) propose the concepts of relevance and disagreement. Arguing that preference disagreements on each item among group members are inevitable, the authors experimentally show that taking disagreement into account significantly improves the recommendation quality of the AVG and LM strategies.
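To make the contrast between the two score aggregation strategies concrete, the following is a minimal Python sketch. The function names, the item identifiers, and the per-member scores are hypothetical and chosen only for illustration; how the individual scores are predicted is outside the scope of the sketch.

```python
from statistics import mean


def aggregate_avg(member_scores):
    """AVG: the group score is the mean of the members' individual scores."""
    return mean(member_scores)


def aggregate_lm(member_scores):
    """LM: the group score is the lowest individual score."""
    return min(member_scores)


def rank_items(score_table, aggregate):
    """Rank items by their aggregated group score, highest first."""
    group_scores = {item: aggregate(s) for item, s in score_table.items()}
    return sorted(group_scores, key=group_scores.get, reverse=True)


# Hypothetical predicted scores of three group members for three items.
scores = {
    "item_a": [5.0, 5.0, 1.0],  # loved by two members, disliked by one
    "item_b": [3.0, 3.0, 3.0],  # mediocre for everyone
    "item_c": [4.0, 2.0, 2.0],
}

avg_ranking = rank_items(scores, aggregate_avg)
lm_ranking = rank_items(scores, aggregate_lm)
```

In this toy example AVG places `item_a` first even though one member dislikes it, whereas LM places the uniformly mediocre `item_b` first, which mirrors the two drawbacks described above.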

We next review the model-based approaches (Koren et al., 2009; Agarwal and Chen, 2010; Wang and Blei, 2011; Su and Khoshgoftaar, 2009) for group recommendation. For example, Seko et al. (Seko et al., 2011) propose a model that incorporates item categories into recommendation, arguing that item categories influence the group's decision and that items of different categories are not strictly comparable. The method, however, only applies to pre-defined groups such as couples, which can be treated as pseudo-users to which single-user recommendation techniques are applied, while real-life groups are often ad-hoc and formed just for one-off or a few activities (Liu et al., 2012b; Quintarelli et al., 2016). Applying game theory to group recommendation, Carvalho et al. (Carvalho and Macedo, 2013) consider each group event as a non-cooperative game, or a game with competition among members in the group, and suggest that the recommendation goal should be the game's Nash equilibrium. However, since a Nash equilibrium can be a set of items, the game theory approach may fail to recommend one specific item.

Probabilistic models have also been widely applied to group recommendation. Liu et al. (Liu et al., 2012b) propose a personal impact topic (PIT) model for group recommendation, assuming that the most influential user should represent the group and have a big impact on the group's decisions. However, such an assumption does not reflect the reality that a user's influence only contributes to the group's final decision if she is an expert in the field. For example, a movie expert may determine which movie the group should watch when she goes to the cinema with her friends, but she may not be the one to choose the restaurant where they dine afterwards. Yuan et al. (Yuan et al., 2014) propose a consensus model (COM) for group recommendation. The model assumes (i) that a user's influence depends on the topic of the decision, and (ii) that the group decision-making process is subject to both the topic of the group's preferences and each user's personal preferences. Despite such assumptions, COM suffers from a drawback similar to that of PIT: COM assumes that a user has the same probability of following the group's decisions across different groups. Alternatively, Gorla et al. (Gorla et al., 2013) assume that the score of a candidate item depends not only on its relevance to each member of a group but also on its relevance to the whole group. They develop an information-matching based model for group recommendation, but the model suffers seriously from a high time complexity in the number of users and the number of items. According to the experimental study in (Yuan et al., 2014), the method of (Gorla et al., 2013) cannot finish within days on datasets such as those used in our experiments, and thus we do not compare with it. Recently, Hu et al. (Hu et al., 2014) develop a deep-architecture model called DLGR that learns high-level comprehensive features of group preferences to avoid the vulnerability of the data. DLGR only focuses on pre-defined groups instead of ad-hoc groups. Therefore, we do not compare DLGR with our proposed model in this paper.

2.2. Deep Learning based Recommender Systems

Deep learning techniques have been extensively applied in recommender systems thanks to their high-quality recommendations (Cheng et al., 2016; Covington et al., 2016; Okura et al., 2017; He et al., 2017; Chen et al., 2017a; Tay et al., 2018, 2017; Hidasi et al., 2015; Chen et al., 2017b). Deep learning is able to capture non-linear and non-trivial relationships between users and items, which provides an in-depth understanding of user demands and item characteristics, as well as the interactions between them. A large body of literature has focused on integrating deep learning into recommender systems to perform various recommendation tasks; comprehensive reviews can be found in (Zhang et al., 2017; Karatzoglou and Hidasi, 2017). However, very little work exploits deep learning techniques for group recommendation.

Our proposed AGR model makes recommendations using deep learning techniques based on user rating history. Specifically, AGR leverages an attention mechanism to adapt the representation of the group; more details are given in Section 3 and Section 4.

3. Preliminaries

3.1. Problem Formulation

Let $U$ and $I$ be the sets of users and items, respectively. We denote a history log as $H = \{(g, j)\}$, where $g \subseteq U$ denotes an ad-hoc group and $j \in I$ denotes the item selected by the group.

Given a target group $g_t$, we aim to generate a recommendation list of items that the members of the group may be interested in. Note that the target group can be an ad-hoc group. The group recommendation problem can be defined as follows:

Input: Users $U$, items $I$, historical log $H$, and a target group $g_t$.

Output: A function that maps an item to a real value representing the item's score for the target group $g_t$.

3.2. Bayesian Personalized Ranking (BPR)

Based on the matrix factorization method, Bayesian personalized ranking (BPR) addresses the challenge of recommendation from implicit feedback (Rendle et al., 2009). Although the method was originally developed for the personalized ranking task, one can also leverage BPR to optimize the group recommendation task. Specifically, we model a triplet of one group and two items: the positive item, which is observed, and the negative item, which is not. Our triplet model assumes that if group $g$ has viewed an item $j$ (positive item), the group prefers this item over every unobserved item $k$ (negative item). The model thus ranks positive items higher than negative ones.

Specifically, our objective function is formulated as follows:

$$\arg\min_{\Theta} \sum_{(g, j, k) \in D_S} -\ln \sigma(\hat{r}_{gj} - \hat{r}_{gk}) + \lambda \lVert \Theta \rVert^2 \qquad (1)$$

in which the set $D_S$ contains all pairs of positive and negative items for each group; $\Theta$ represents the model parameters; $\hat{r}_{gj}$ is the predicted score for group $g$ and item $j$; $\sigma$ is the logistic sigmoid function; and $\lambda$ is the regularization parameter.

Since real-life group data are usually implicit (Liu et al., 2012b; Yuan et al., 2014), BPR is a suitable learning method for our group recommendation model.
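As a rough illustration of the pair-wise objective, the sketch below evaluates a BPR-style loss in plain Python for a list of (positive score, negative score) pairs. It assumes the standard BPR form, that is, the negative log-sigmoid of the score difference plus an L2 penalty on the parameters; the function names and the toy numbers are hypothetical.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -ln(sigmoid(x))."""
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(score_pairs, params, reg):
    """BPR-style loss.

    score_pairs: list of (score of positive item, score of negative item),
                 one entry per (group, positive, negative) triplet.
    params:      flat list of model parameters (for the L2 penalty).
    reg:         regularization weight.
    """
    ranking_term = sum(neg_log_sigmoid(pos - neg) for pos, neg in score_pairs)
    l2_term = reg * sum(p * p for p in params)
    return ranking_term + l2_term


# Toy check: ranking the positive item above the negative one gives a
# smaller loss than the reverse ordering.
loss_correct_order = bpr_loss([(2.0, -1.0)], params=[0.1, -0.2], reg=0.01)
loss_wrong_order = bpr_loss([(-1.0, 2.0)], params=[0.1, -0.2], reg=0.01)
```

The loss depends only on score differences, so the model is trained to order items correctly for a group rather than to reproduce absolute ratings.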

3.3. Attention Mechanism

The attention mechanism is one of the most exciting recent advancements in deep learning (Vinyals et al., 2014; Bahdanau et al., 2014; Chorowski et al., 2015). It has been successfully applied in various machine learning tasks such as machine translation (Bahdanau et al., 2014), image and video captioning (Vinyals et al., 2014), and speech recognition (Chorowski et al., 2015). The concept of attention is that when people visually access an object, they tend to focus on (pay attention to) certain important parts of the object instead of the whole object in order to come up with a response. Figure 1 represents an example of the attention mechanism. The attention model takes $n$ arguments $y_1, \dots, y_n$ and a context $c$. It then returns a vector $z$, which is the combination of the $y_i$, given the information linked to the context $c$. Specifically, given a context $c$, the model returns a weighted arithmetic mean of the $y_i$, where each weight is chosen according to the relevance of $y_i$ to the context $c$.

Our model adopts the attention mechanism to learn the attentive weights that a user assigns to the other users in a group: a higher weight indicates that the corresponding user is more important and thus contributes more to the group's final decision. By extending the attention mechanism to group recommendation, the proposed AGR is, to the best of our knowledge, the first attention-based group recommendation model.

Figure 1. Attention Mechanism

4. Attentive Group Recommendation

This section introduces our Attentive Group Recommendation (AGR) model. We first present the key intuition behind our model and then describe the general framework.

4.1. Model Intuitions

Figure 2. Attentive Group Recommendation

The AGR model aims to simulate group decision making based on the key intuition that users in a group usually come from diverse backgrounds with different expertise; therefore, when it comes to making the group's decision, each person's opinion has a different influence (Intuition 1). Moreover, users usually appreciate the opinions of users who are experts in the group’s topic more than those of other users (Intuition 2). For example, in a group planning an upcoming vacation, the opinions of those who have traveled to Canada should have a higher impact if the group is considering Canada as the destination for the trip. In addition, a user contributes differently in different groups depending on her relevant expertise (Intuition 3). For example, a user who has solid knowledge of movies may dominate the decision when a group chooses a movie to watch, but may not contribute at all when the group chooses a camping site.

Overall, we observe that interactions between members are important to the group's decision making: users usually discuss their opinions with one another before making a group decision. The importance of member interactions has not been examined in existing methods of group recommendation. A major contribution of the proposed AGR is that the model explores how a user affects other users in group decision making and how the group decision changes accordingly.

Intuitively, we argue that during group decision making, (i) each user nominates certain users as the main decision makers of the group, and (ii) the most voted users then choose an item for the group. Such a voting scheme implies that the expert users in the fields relevant to the group's preferences usually receive many votes. With that in mind, for a group of $n$ users, we can consider the voting scheme (or the group decision-making process) as $n$ sub-processes that happen simultaneously. That is, the first user votes for the other users, the second user votes for the other users, …, and the $n$-th user votes for the other users. Therefore, we create $n$ sub-groups from a group of $n$ users to explore the influence of each user as perceived by the remaining members (more details in Section 4.2).

4.2. General Framework

AGR models the group's preference score with respect to a candidate item using a neural network. We next simulate the two steps of the group decision-making process described in Section 4.1.

To simulate step (i) of group decision making, we consider an influence weight $\alpha_{im}$, which represents the preference vote of user $i$ for user $m$ in group $g$. It is expected that if user $m$ is an important user in the group, she will receive a high weight from user $i$, which supports our Intuition 2. One can estimate the influence weight $\alpha_{im}$ for every $(i, m)$-pair of users assuming that $\alpha_{im}$ is constant across all groups in which user $i$ and user $m$ participate. However, such an assumption contradicts our Intuition 3, because user $i$ can perceive the influence of user $m$ differently in different groups. According to Intuition 3, user $m$ should have a larger influence in groups with topics relevant to her expertise than in groups with topics she is unfamiliar with. Thus, when it comes to voting, user $i$ tends to nominate user $m$ as the main decision maker in a group where user $m$ is an expert, but not in other groups whose topics user $m$ is not familiar with. The influence weight $\alpha_{im}$ should therefore be calculated dynamically according to changing group preferences. By using the attention mechanism to learn $\alpha_{im}$ dynamically, we are able to detect the most important users in the group, whose weights are higher than the others'; thus, step (ii) of the group decision-making process follows easily.

We propose estimating the influence weights of users in a group using an attention-based model. In particular, given the members of a group $g$, when a user $i$ looks at the other users in the group, she tends to focus on only a few expert users and to ignore the others, which is similar to the idea of the attention mechanism introduced in Section 3.3. This motivates us to use an attention network to compute $\alpha_{im}$, which measures how important user $m$ is in group $g$ in user $i$'s opinion. Consequently, combining the outputs from all sub-groups, we can estimate the importance weights of all users in group $g$ (Intuition 1) and make the recommendation for the group accordingly.

Formally, AGR models each user $i$ by two factor vectors: the user-latent vector $\mathbf{u}_i$ and the user-context vector $\mathbf{c}_i$. The attention model uses the context vector $\mathbf{c}_i$ to estimate the impact of the remaining users ($m \in g$, $m \neq i$) on user $i$. To be more specific, we first divide a group $g$ of $n$ users into $n$ sub-groups, where the $i$-th sub-group includes all users in the group except user $i$. We then apply an attention sub-network to each sub-group, whose output is the representation of that sub-group. This representation serves as the outcome of the voting sub-process from the view of user $i$. The representation (or the output) of the $i$-th attention sub-network is calculated as $\sum_{m \in g, m \neq i} \alpha_{im} \mathbf{u}_m$. Therefore, the representation of a group is $\mathbf{g} = \sum_{i \in g} \beta_i \sum_{m \in g, m \neq i} \alpha_{im} \mathbf{u}_m$, where $\beta_i$ is the weight of user $i$'s vote. We consider $\beta_i = 1$, that is, the voting scheme counts the votes of all users equally.

Figure 2 illustrates the architecture of AGR. For a given group $g$ of $n$ users, we create $n$ attention sub-networks. The $i$-th attention sub-network takes the user-context vector $\mathbf{c}_i$ (filled orange circle) and the set of member user-latent vectors $\mathbf{u}_m$ (solid blue circles), and then returns the attention weight $\alpha_{im}$ of each user $m$ ($m \neq i$) (dashed blue circles). Finally, the output of each attention sub-network is calculated as the weighted sum $\sum_{m \neq i} \alpha_{im} \mathbf{u}_m$ (solid green square), which represents the group given the user-context $\mathbf{c}_i$. The final representation of the group is then computed as the summation of the sub-network outputs (filled green square). Lastly, we employ BPR to optimize the group recommendation task, as mentioned in Section 3.2.

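Attention Sub-networks. Each attention sub-network models the interactions between one member and the rest of the group to learn the preference votes of user $i$ for the other members of the group. Given a user-context vector $\mathbf{c}_i$ and a set of user-latent vectors $\{\mathbf{u}_m\}_{m \neq i}$, we use a two-layer network to compute the attention score $a(i, m)$ as:

$$a(i, m) = \mathbf{w}^\top \phi(\mathbf{W}_c \mathbf{c}_i + \mathbf{W}_u \mathbf{u}_m + \mathbf{b}) + c$$

in which the matrices $\mathbf{W}_c$, $\mathbf{W}_u$ and the bias $\mathbf{b}$ are the first-layer parameters, and the vector $\mathbf{w}$ and the bias $c$ are the second-layer parameters. We simply use a linear activation $\phi(x) = x$, but one can also use a ReLU function $\phi(x) = \max(0, x)$.

We normalize $a(i, m)$ using the Softmax function to obtain the final attention weights:

$$\alpha_{im} = \frac{\exp(a(i, m))}{\sum_{m' \in g, m' \neq i} \exp(a(i, m'))}$$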
Predicted Score. After we obtain the representation $\mathbf{g}$ of the group $g$, the predicted score for group $g$ and item $j$ is computed as follows:

$$\hat{r}_{gj} = \mathbf{g}^\top \mathbf{v}_j$$

in which $\mathbf{v}_j$ is the item latent vector for item $j$.
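The forward pass described in this section can be sketched in plain Python as follows. This is an illustrative reading of the text, not the authors' implementation: it assumes a linear activation in the first layer, equal vote weights for all sub-groups, and an inner product between the group representation and the item latent vector as the predicted score. All names and toy values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def softmax(xs):
    top = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(context_i, latents_others, W_c, W_u, b, w, c):
    """Votes of one user (given by her context vector) for the other members."""
    proj_c = matvec(W_c, context_i)
    scores = []
    for u_m in latents_others:
        proj_u = matvec(W_u, u_m)
        # First layer with a linear activation (assumption: phi(x) = x).
        hidden = [pc + pu + bk for pc, pu, bk in zip(proj_c, proj_u, b)]
        # Second layer: scalar attention score.
        scores.append(dot(w, hidden) + c)
    return softmax(scores)


def group_representation(contexts, latents, params):
    """Sum of all sub-group representations, every vote counted equally."""
    dim = len(latents[0])
    g = [0.0] * dim
    for i, c_i in enumerate(contexts):
        # Sub-group i: every member except user i.
        others = [u for m, u in enumerate(latents) if m != i]
        alphas = attention_weights(c_i, others, *params)
        for alpha, u_m in zip(alphas, others):
            for d in range(dim):
                g[d] += alpha * u_m[d]
    return g


def predicted_score(g, item_vec):
    return dot(g, item_vec)


# Toy group of three users with 2-dimensional vectors (hypothetical values).
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]
params = (
    [[0.1, 0.2], [0.0, -0.1]],  # W_c
    [[0.3, -0.2], [0.1, 0.4]],  # W_u
    [0.0, 0.0],                 # b
    [1.0, -1.0],                # w
    0.0,                        # c
)
g = group_representation(contexts, latents, params)
score = predicted_score(g, [0.5, 0.5])
```

With all parameters set to zero, every attention weight within a sub-group is uniform, so the sketch reduces to an unweighted combination of the members' latent vectors; learned parameters shift the weights toward the members each user considers most relevant.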

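Objective Function. AGR leverages the BPR pair-wise learning objective to optimize the pair-wise ranking between positive and negative items. The objective function in Eq. (1) can be rewritten as follows:

$$\arg\min_{\Theta} \sum_{(g, j, k) \in D_S} -\ln \sigma\Big( \sum_{i \in g} \sum_{m \in g, m \neq i} \alpha_{im} \mathbf{u}_m^\top \mathbf{v}_j - \sum_{i \in g} \sum_{m \in g, m \neq i} \alpha_{im} \mathbf{u}_m^\top \mathbf{v}_k \Big) + \lambda \lVert \Theta \rVert^2$$

in which $\Theta$ represents the model parameters, and $\alpha_{im}$ is the weight of user $i$'s vote for user $m$ in group $g$. We then use Adaptive Moment Estimation (Adam) to optimize our objective function.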

5. Experiments

In this section, we report experimental results comparing AGR with six baseline techniques on four datasets. We also report the learned attention weights to identify the dominant users in group decision making. Such results offer an explanation for the group recommendation results, which is an additional advantage of AGR. In general, our experiments aim to answer the following research questions (RQ):

  • RQ 1: How does AGR perform as compared to existing state-of-the-art methods?

  • RQ 2: Are the dynamic weights learned by AGR preferable to the fixed weights learned by existing methods? How effective is our attention model?

  • RQ 3: How does AGR perform with different group sizes?

5.1. Experimental Settings

5.1.1. Datasets

We conduct our experiments on four real-world datasets. The first dataset is from an event-based social network (EBSN), Plancast, and was used in (Liu et al., 2012a). Plancast allows users to directly follow the event calendars of other users. An event in Plancast consists of a user group and a venue. We therefore consider an event a group, and each user in the event a group member. Members of the group select a venue (the candidate item) to host the event. Our goal is to recommend a venue for the group event.

Our second dataset is crawled from the EBSN Meetup and comes from the work of (Pham et al., 2016). We select the NYC data, which contains events held in New York City, as the dataset for our experiments. Similar to Plancast, we aim to recommend a venue for a given group to host an event.

We derive the last two datasets from the MovieLens 1M Data. The MovieLens 1M Data contains one million movie ratings from over 6,000 users on approximately 4,000 movies. Following the approach in (Baltrunas et al., 2010), we extract two datasets from the MovieLens 1M Data: MovieLens-Simi and MovieLens-Rand. Users in the MovieLens-Simi data are assigned to the same group when they have high inner-group similarity, while users in the MovieLens-Rand data are grouped randomly. MovieLens-Simi and MovieLens-Rand groups thereby resemble two typical real-life situations: groups can either include people with similar preferences or form between unrelated people. For example, a group of close friends has high inner-group similarity, whereas people on the same bus can be considered a random group.

Figure 3. Performance of Group Recommendation Methods in terms of prec@K

Figure 4. Performance of Group Recommendation Methods in terms of rec@K

Figure 5. Performance of Group Recommendation Methods in terms of ndcg@K

| Dataset | Plancast | Meetup | MovieLens-Simi | MovieLens-Rand |
|---|---|---|---|---|
| Total Users | 41,065 | 42,747 | 5,759 | 5,802 |
| Total Groups | 25,447 | 13,390 | 29,975 | 54,969 |
| Total Items | 13,514 | 2,705 | 2,667 | 3,413 |
| Avg. Group Size | 12.01 | 16.66 | 5.00 | 5.00 |
| Avg. Record for a User | 7.44 | 5.22 | 26.03 | 47.37 |
| Avg. Record for an Item | 1.88 | 4.95 | 11.24 | 16.11 |

Table 1. Dataset Statistics

Table 1 reports descriptive statistics of the four datasets. We randomly split each dataset into training, tuning, and testing data.

5.1.2. Evaluation Metrics

Following the literature (Baltrunas et al., 2010; Liu et al., 2012b; Pham et al., 2016; Yuan et al., 2014; Gorla et al., 2013; Cremonesi et al., 2010), we evaluate model performance using three metrics: precision (prec@K), recall (rec@K), and normalized discounted cumulative gain (ndcg@K). Here $K$ is the number of recommendations. We evaluate recommendation accuracy for several values of $K$.

precision@K is the fraction of the top-$K$ recommendations that are selected by the group, while recall@K is the fraction of relevant items (true items) that are retrieved in the top-$K$ recommendations. We average the precision@K and recall@K values across all testing groups to calculate prec@K and rec@K. We also use the NDCG metric to evaluate the rankings of the true items in the recommendation list. We average the NDCG values across all testing groups to obtain the ndcg@K metric.
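The three metrics can be sketched for a single test group as follows. The text does not spell out the NDCG formula, so the sketch assumes the common binary-relevance variant with a logarithmic (base 2) position discount; the function names and the toy ranking are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are true items."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the true items that appear in the top-k recommendations."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG with a log2 position discount (assumed variant)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def average_over_groups(metric, test_cases, k):
    """Average a metric over (ranked list, relevant set) pairs, one per group."""
    return sum(metric(r, rel, k) for r, rel in test_cases) / len(test_cases)


# Toy example: one test group whose true venue is ranked second.
ranked = ["v3", "v1", "v7", "v2", "v9"]
relevant = {"v1"}
prec = precision_at_k(ranked, relevant, 5)
rec = recall_at_k(ranked, relevant, 5)
ndcg = ndcg_at_k(ranked, relevant, 5)
```

Because NDCG discounts hits by position, it distinguishes a true item ranked second from one ranked fifth, which precision and recall at the same $K$ do not.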

For all of the three metrics, a larger metric value indicates better recommendations.

5.1.3. Recommendation Methods

This section experimentally compares seven recommendation methods: six baselines, namely CF-AVG, CF-LM, CF-RD (Amer-Yahia et al., 2009), PIT (Liu et al., 2012b), COM (Yuan et al., 2014), and MF-AVG, and the proposed AGR. Among the baseline models, CF-AVG, CF-LM, and CF-RD are score-aggregation approaches; PIT and COM are state-of-the-art probabilistic models; and MF-AVG is a matrix factorization model.

User-based CF with averaging strategy (CF-AVG): CF-AVG applies user-based CF to calculate a preference score for each user $i$ with respect to a candidate item $j$, and then averages the preference scores across all users to obtain the group recommendation score of item $j$.

User-based CF with least-misery strategy (CF-LM): Similar to CF-AVG, CF-LM first applies user-based CF to calculate a score for each user $i$ with respect to a candidate item $j$. However, the recommendation score of item $j$ is taken as the item's lowest preference score across all users.

User-based CF with relevance and disagreement strategy (CF-RD) (Amer-Yahia et al., 2009): CF-RD also performs user-based CF to calculate a score for each user $i$ with respect to a candidate item $j$. The group recommendation score is calculated as the sum of the relevance and the disagreement scores of the item. The relevance score is obtained using CF-AVG or CF-LM, whereas the disagreement score is either the average pair-wise relevance difference for the item across group members (the average pair-wise disagreement method), or the mathematical variance of the relevance of the item across group members (the disagreement variance method).
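The two disagreement measures named above can be sketched as follows. The sketch covers only the disagreement part; how disagreement is weighted against relevance is left out, since the description above does not fix it. The use of the population variance is an assumption, and the function names and toy scores are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def average_pairwise_disagreement(scores):
    """Mean absolute difference of an item's relevance over all member pairs."""
    if len(scores) < 2:
        return 0.0
    pairs = list(combinations(scores, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def disagreement_variance(scores):
    """Variance of an item's relevance across members (population variance assumed)."""
    if len(scores) < 2:
        return 0.0
    return pvariance(scores)


# Hypothetical relevance scores of one item for three group members.
member_scores = [5.0, 3.0, 1.0]
pairwise = average_pairwise_disagreement(member_scores)
variance = disagreement_variance(member_scores)
```

Both measures are zero when all members agree on an item and grow as individual scores spread apart.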

Personal impact topic model (PIT) (Liu et al., 2012b): PIT is an author-topic model. Assuming that each user has an impact weight that represents the influence of the user on the final decision of the group, PIT chooses a user with a relatively large impact score as the group's representative. The selected user then chooses a topic based on her preference, and the topic then generates a recommended item for the group.

Consensus model (COM) (Yuan et al., 2014): COM relies on two assumptions: (i) the personal impacts are topic-dependent, and (ii) both the group's topic preferences and individuals' preferences influence the final group decision. The first assumption allows COM to derive topic-dependent personal impacts from group-topic and topic-user distributions. The second assumption allows COM to aggregate the group's topic preferences and individuals' preferences subject to weights derived from users' personal variables.

Average Matrix Factorization (MF-AVG): MF-AVG is a matrix factorization model. It takes the average score of an item across all group members as the group recommendation score of the item. MF-AVG thereby considers all personal impact weights equal, assuming that all members contribute equally to the group's decision making. In other words, MF-AVG can be considered as our model with a uniform weight setting. We represent a group $g$ as $\mathbf{g} = \sum_{i \in g} \beta_i \mathbf{u}_i$, where $\mathbf{u}_i$ is the latent vector of user $i$ and $\beta_i = 1/|g|$, and we optimize the BPR pair-wise ranking objective function to predict group recommendation scores.
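The uniform-weight view of MF-AVG can be checked with a small sketch. It assumes that each member's score for an item is the inner product of the user and item latent vectors, as is usual in matrix factorization; under that assumption, averaging the members' scores gives the same value as scoring the item against the mean of the members' latent vectors. The names and toy vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mf_avg_score(user_vecs, item_vec):
    """Average of the members' individual inner-product scores."""
    return sum(dot(u, item_vec) for u in user_vecs) / len(user_vecs)


def mean_vector(vecs):
    """Uniformly weighted group representation (weight 1/|g| per member)."""
    n = len(vecs)
    return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]


# Hypothetical latent vectors for three group members and one item.
members = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
item = [0.5, 2.0]

score_from_scores = mf_avg_score(members, item)
score_from_group_vec = dot(mean_vector(members), item)
```

Replacing the uniform weights with learned, group-dependent weights is what distinguishes the attentive model from this baseline.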

Attentive Group Recommendation (AGR): Our proposed model combines attention sub-networks to create an attentive model that learns the dynamic personal impact weights of each user.
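The difference between fixed and learned impact weights can be illustrated with a small sketch. It assumes that raw attention logits are normalized with a softmax and then used to weight the members' scores; in AGR the logits would come from the attention sub-networks, which are not reproduced here, so the logits below are hypothetical inputs.

```python
import math


def softmax(logits):
    """Normalize raw attention logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_group_score(member_scores, attention_logits):
    """Aggregate member scores with softmax-normalized impact weights."""
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, member_scores))


member_scores = [0.9, 0.3, 0.6]  # hypothetical per-member scores for one item

# Equal logits give uniform weights, i.e. the plain average used by MF-AVG.
uniform = weighted_group_score(member_scores, [0.0, 0.0, 0.0])

# A dominant first member pulls the group score towards that member's score.
dominated = weighted_group_score(member_scores, [5.0, 0.0, 0.0])
```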

Parameter Settings.

For PIT and COM, we tuned the number of topics and kept the other hyper-parameters at their default values. For the MF-AVG and AGR models, we first randomly initialize the parameters from a Gaussian distribution with a mean of 0 and a standard deviation of 0.05, and then use Adaptive Moment Estimation (Adam) to optimize our objective functions. We tested batch sizes of [128, 256, 512], learning rates of [0.001, 0.005, 0.01, 0.05, 0.1], and regularizers of [0.001, 0.01, 0.1, 0]. We empirically set the embedding size of MF-AVG and AGR to 50. We obtain the optimal setting with a batch size of 256, a learning rate of 0.001, and a regularizer of 0.01.
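The search space above is small enough to enumerate directly. The sketch below lists the candidate settings and draws one embedding vector from the stated Gaussian; whether the full Cartesian product was actually searched is an assumption, and the function names are hypothetical.

```python
import random
from itertools import product

BATCH_SIZES = [128, 256, 512]
LEARNING_RATES = [0.001, 0.005, 0.01, 0.05, 0.1]
REGULARIZERS = [0.001, 0.01, 0.1, 0]
EMBEDDING_SIZE = 50


def init_embedding(rng, size=EMBEDDING_SIZE, mean=0.0, std=0.05):
    """Draw one embedding vector from a Gaussian with the given mean and std."""
    return [rng.gauss(mean, std) for _ in range(size)]


def hyperparameter_grid():
    """Enumerate every (batch size, learning rate, regularizer) combination."""
    return [
        {"batch_size": b, "learning_rate": lr, "regularizer": reg}
        for b, lr, reg in product(BATCH_SIZES, LEARNING_RATES, REGULARIZERS)
    ]


grid = hyperparameter_grid()
reported_best = {"batch_size": 256, "learning_rate": 0.001, "regularizer": 0.01}
embedding = init_embedding(random.Random(0))
```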

Figure 6. Attention Weights Learned by PIT and AGR

5.2. Overall Performance Comparison (RQ 1)

This section compares the recommendation results of AGR with those of the baseline models. Figure 3, Figure 4 and Figure 5 report the prec@K, rec@K and ndcg@K values for the four datasets. We observe from the three figures that:

  • AGR consistently achieves the best performance across all methods, including score-aggregation approaches (CF-AVG, CF-LM, CF-RD) and probabilistic model approaches (PIT, COM).

  • Although the groups in the two MovieLens datasets are constructed manually rather than observed, as they are in Meetup and Plancast, our model still shows flexibility in adapting to such constructed datasets.

  • The AGR and MF-AVG models produce good results in comparison to the previous state-of-the-art probabilistic models. MF-AVG performs better than PIT and COM on the Meetup and Plancast datasets, but not on the MovieLens-Simi and MovieLens-Rand datasets. One explanation is that the simplistic setup of MF-AVG cannot model the complexity of real-life group interactions.

The prec@5 metric values show that AGR outperforms the best baseline method by , , , on Meetup, Plancast, MovieLens-Simi and MovieLens-Rand, respectively. We observe the same improvements for rec@5. In general, AGR's recommendations are consistently better than those of the baseline methods, with a p-value of less than 0.0001 for all the results.
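For reference, the three top-K metrics can be computed as in the sketch below. It assumes binary relevance (an item is either in the ground-truth set or not) and the common log2 discount for NDCG; the exact variants used in the experiments are not specified here, so treat this as one standard formulation.

```python
import math


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranked list for one group and its ground-truth items.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "x"}

prec5 = precision_at_k(ranked, relevant, 5)
rec5 = recall_at_k(ranked, relevant, 5)
ndcg5 = ndcg_at_k(ranked, relevant, 5)
```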

It is noticeable that PIT's performance is not comparable to that of the other baseline models. One possible reason is that, as a Meetup or Plancast group usually has a large number of participants, many of whom only join a few groups and thus have very limited historical data, the user impacts learned by PIT for such participants are not reliable. Another possible reason for PIT's poor performance is that the assumptions underlying PIT do not hold in the context of MovieLens data: since MovieLens users select movies independently from one another, there is no representative user in a MovieLens group.

We also observe that AGR does not outperform the baseline models on the Meetup dataset as significantly as it does on the other three datasets. One explanation is that since a Meetup group often has few venue options, group members tend to choose the place they are most familiar with, making it relatively easy to recommend the venue to the group (Pham et al., 2016). While Meetup users form a big group before hosting an event and choosing the venue, services such as Plancast allow users to follow other users' event calendars and choose to participate in existing events. Plancast groups therefore tend to be more diverse than Meetup groups, and Plancast event venues are not as easily predicted as Meetup event venues.

In general, AGR achieves strong recommendation results on a variety of datasets. Our experiments show the flexibility of AGR in making group recommendations for different types of data.

Model Meetup Plancast
MF-AVG 0.851962 0.632753 0.467176 0.211513 0.321079 0.191142 0.530135 0.274007
AGR 0.888513 0.689149 0.502620 0.270515 0.360627 0.230099 0.578485 0.338959
Table 2. Performance Comparison between MF-AVG and AGR

5.3. The Role of Attention Mechanism (RQ 2)

We further conduct paired t-tests on the performances of the AGR and MF-AVG models to verify that AGR's improvements over MF-AVG are statistically significant at the five percent significance level. While AGR employs an attention mechanism to calculate the weights, MF-AVG assigns a normalized constant weight to each group member. The paired t-tests are conducted on the performance metrics of the top-K recommended lists of the two models.

Table 2 compares the performances of AGR and MF-AVG. We observe that the mean pooling strategy of MF-AVG always performs worse than the attention mechanism of AGR, except in the MovieLens-Rand case. The good performance of MF-AVG on MovieLens-Rand data is expected because MovieLens-Rand groups satisfy the MF-AVG assumptions: randomly grouped users in MovieLens-Rand data tend to contribute equally to the groups' final decisions. The reported p-values are nominal for all tests, indicating that AGR's improvements over MF-AVG are statistically significant.
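A paired t-test of this kind reduces to a one-sample test on the per-pair differences. The sketch below computes only the t statistic and its degrees of freedom; what constitutes a pair (for example, the metric values of the two models on the same group or the same run) is an assumption here, and the toy numbers are illustrative rather than experimental results.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    std_error = stdev(diffs) / math.sqrt(n)  # sample std (n - 1 denominator)
    if std_error == 0:
        raise ValueError("all paired differences are identical")
    return fmean(diffs) / std_error, n - 1


# Toy paired measurements (illustrative values only).
model_a = [2, 4, 6]
model_b = [1, 2, 3]
t_value, dof = paired_t_statistic(model_a, model_b)
```

The p-value is then read from a Student t distribution with `dof` degrees of freedom, for instance with a statistics library such as SciPy (`scipy.stats.ttest_rel` performs the whole test).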

Attention Weight Visualization. As noted previously, an advantage of AGR is that the method allows us to calculate the attention weight values for further analyses. Figure 6 visualizes the attention weights learned by AGR and PIT for two randomly chosen groups from our experiments. We look at the weights learned by PIT because, similar to AGR, PIT learns the personal impact weight for each user (Liu et al., 2012b). Since COM does not learn the personal impact weight for each user (Yuan et al., 2014), we do not compare AGR with COM in this section.

Figure 6(a) and Figure 6(b) show a few user attention weights learned by PIT and AGR for two randomly chosen groups that share one user, no. 4122. Figure 6(a) reports the personal impact weights of three users in the first group (group A). According to both models, user no. 4122 has the highest weight in group A and therefore dominates group A's decision making. Figure 6(b), however, shows that PIT continues to assume that user no. 4122 is the most influential user in group B, whereas AGR is able to detect other users who play more important roles in group B's decision making than user no. 4122 does. While PIT's personal impact parameter cannot differentiate the roles of one user in different groups and thus may fail to recognize influential members in a group, the attention mechanism of AGR can capture the dynamic user impacts in group decision making.

Figure 7. Performance on Different Group Sizes

5.4. Model Performances for Different Levels of Group Sizes (RQ 3)

To study the performance of each recommendation method on different group sizes, we run the experiments for four levels of group size (1-5 members, 6-10 members, 11-15 members, and 16-20 members) using Meetup and Plancast groups. We keep the same setting as described in Section 5.1 for all the models, and classify the groups into bins based on group size. Since the number of groups with more than 20 members is small, we exclude these groups from this experiment. Figure 7 plots the resulting rec@5 and ndcg@5 curves. Note that since the group size of the MovieLens-Simi and MovieLens-Rand datasets is fixed, we do not study the different levels of group sizes on these two datasets.
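The binning step can be sketched as follows. The bin boundaries are the four levels named above; the data layout (a mapping from group id to member list) and the function names are assumptions made for illustration.

```python
from collections import defaultdict

SIZE_BINS = [(1, 5), (6, 10), (11, 15), (16, 20)]


def size_bin(group_size):
    """Return the 'low-high' label of the matching bin, or None if outside all bins."""
    for low, high in SIZE_BINS:
        if low <= group_size <= high:
            return f"{low}-{high}"
    return None


def bin_groups(group_members):
    """Map bin label -> list of group ids; groups with more than 20 members are skipped."""
    bins = defaultdict(list)
    for group_id, members in group_members.items():
        label = size_bin(len(members))
        if label is not None:
            bins[label].append(group_id)
    return dict(bins)


# Hypothetical groups with 3, 12, 25 and 20 members.
groups = {
    "g1": list(range(3)),
    "g2": list(range(12)),
    "g3": list(range(25)),
    "g4": list(range(20)),
}
binned = bin_groups(groups)
```

Each metric can then be averaged over the groups of one bin to obtain one point per bin on a curve.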

Figure 7 shows that AGR achieves better performance than the baseline methods across different group sizes. We have the following observations: 1) AGR shows clear improvements in recommendations for groups of larger sizes. AGR improves and on the Meetup dataset over the best baseline for groups of 11-15 members and 16-20 members respectively, while these numbers are and for the Plancast dataset. This indicates the effectiveness of AGR for larger groups. 2) CF approaches often deliver good performance when the group size is small, as low diversity within a group facilitates smooth aggregations. As the number of members in the group increases, we need relatively complex methods such as probabilistic models or neural network models to make adequate recommendations.

6. Conclusions

Aiming to solve the group recommendation problem through a deep learning approach, the Attentive Group Recommendation model (AGR) utilizes an attention mechanism to learn the influence weight of each user in a group in order to make group recommendations. The major contributions of AGR are that the model not only assumes and dynamically learns different impact weights of each given user for different groups, but also considers the interactions between users in the group. Thus, AGR is able to better model the group decision making process. To the best of our knowledge, AGR is the first model to exploit an attention mechanism in group recommendation. Conducting extensive experiments on four real-world datasets, we show that AGR consistently performs better than existing group recommendation methods.

A promising development direction for AGR is to incorporate side information such as social connections, text information (for example, event descriptions) or time information into the context used to learn the attention model.


  • Agarwal and Chen (2010) Deepak Agarwal and Bee-Chung Chen. 2010. fLDA: matrix factorization through latent dirichlet allocation. In Proceedings of the Third International Conference on Web Search and Web Data Mining, WSDM 2010, New York, NY, USA, February 4-6, 2010. 91–100.
  • Amer-Yahia et al. (2009) Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, and Cong Yu. 2009. Group Recommendation: Semantics and Efficiency. PVLDB 2, 1 (2009), 754–765.
  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR abs/1409.0473 (2014).
  • Baltrunas et al. (2010) Linas Baltrunas, Tadas Makcinskas, and Francesco Ricci. 2010. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, RecSys 2010, Barcelona, Spain, September 26-30, 2010. 119–126.
  • Berkovsky and Freyne (2010) Shlomo Berkovsky and Jill Freyne. 2010. Group-based Recipe Recommendations: Analysis of Data Aggregation Strategies. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys ’10). 111–118.
  • Boratto (2016) Ludovico Boratto. 2016. Group Recommender Systems. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16). 427–428.
  • Boratto and Carta (2011) Ludovico Boratto and Salvatore Carta. 2011. State-of-the-Art in Group Recommendation and New Approaches for Automatic Identification of Groups.
  • Carvalho and Macedo (2013) Lucas Augusto Montalvão Costa Carvalho and Hendrik Teixeira Macedo. 2013. Users’ satisfaction in recommendation systems for groups: an approach based on noncooperative games. In 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, Companion Volume. 951–958.
  • Chen et al. (2017a) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017a. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017. 335–344.
  • Chen et al. (2017b) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017b. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. CoRR abs/1606.07792 (2016).
  • Chorowski et al. (2015) Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, KyungHyun Cho, and Yoshua Bengio. 2015. Attention-Based Models for Speech Recognition. CoRR abs/1506.07503 (2015).
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016. 191–198.
  • Cremonesi et al. (2010) Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys ’10).
  • Crossen et al. (2002) Andrew Crossen, Jay Budzik, and Kristian J. Hammond. 2002. Flytrap: intelligent group music recommendation. In Proceedings of the 7th International Conference on Intelligent User Interfaces, IUI 2002, San Francisco, California, USA, January 13-16, 2002. 184–185.
  • Gorla et al. (2013) Jagadeesh Gorla, Neal Lathia, Stephen Robertson, and Jun Wang. 2013. Probabilistic group recommendation via information matching. In 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013. 495–504.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017. 173–182.
  • Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based Recommendations with Recurrent Neural Networks. CoRR abs/1511.06939 (2015).
  • Hu et al. (2014) Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Wei Cao. 2014. Deep Modeling of Group Preferences for Group-Based Recommendation. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada.

  • Karatzoglou and Hidasi (2017) Alexandros Karatzoglou and Balázs Hidasi. 2017. Deep Learning for Recommender Systems. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys ’17).
  • Koren et al. (2009) Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
  • Liu et al. (2012a) Xingjie Liu, Qi He, Yuanyuan Tian, Wang-Chien Lee, John McPherson, and Jiawei Han. 2012a. Event-based social networks: linking the online and offline social worlds. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, Beijing, China, August 12-16, 2012. 1032–1040.
  • Liu et al. (2012b) Xingjie Liu, Yuan Tian, Mao Ye, and Wang-Chien Lee. 2012b. Exploring personal impact for group recommendation. In 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, October 29 - November 02, 2012. 674–683.
  • McCarthy (2002) Joseph F. McCarthy. 2002. Pocket Restaurant Finder: A situated recommender system for groups. In Proceeding of Workshop on Mobile Ad-Hoc Communication at the 2002 ACM Conference on Human Factors in Computer Systems.
  • McCarthy and Anagnost (1998) Joseph F. McCarthy and Theodore D. Anagnost. 1998. MusicFX: An Arbiter of Group Preferences for Computer Supported Collaborative Workouts. In CSCW ’98, Proceedings of the ACM 1998 Conference on Computer Supported Cooperative Work, Seattle, WA, USA, November 14-18, 1998. 363–372.
  • McCarthy et al. (2006) Kevin McCarthy, Maria Salamó, Lorcan Coyle, Lorraine McGinty, Barry Smyth, and Paddy Nixon. 2006. CATS: A Synchronous Approach to Collaborative Group Recommendation. In Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, Melbourne Beach, Florida, USA, May 11-13, 2006. 86–91.
  • O’Connor et al. (2001) Mark O’Connor, Dan Cosley, Joseph A. Konstan, and John Riedl. 2001. PolyLens: A recommender system for groups of users. In Proceedings of the Seventh European Conference on Computer Supported Cooperative Work, 16-20 September 2001, Bonn, Germany. 199–218.
  • Okura et al. (2017) Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based News Recommendation for Millions of Users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017. 1933–1942.
  • Pham et al. (2017) Tuan-Anh Nguyen Pham, Xutao Li, and Gao Cong. 2017. A General Model for Out-of-town Region Recommendation. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). 401–410.
  • Pham et al. (2016) Tuan-Anh Nguyen Pham, Xutao Li, Gao Cong, and Zhenjie Zhang. 2016. A General Recommendation Model for Heterogeneous Networks. IEEE Trans. on Knowl. and Data Eng. 28, 12 (Dec. 2016), 3140–3153.
  • Pizzutilo et al. (2005) Sebastiano Pizzutilo, Berardina De Carolis, Giovanni Cozzolongo, and Francesco Ambruoso. 2005. Group Modeling in a Public Space: Methods, Techniques, Experiences. In Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications (AIC’05). 175–180.
  • Quintarelli et al. (2016) Elisa Quintarelli, Emanuele Rabosio, and Letizia Tanca. 2016. Recommending New Items to Ephemeral Groups Using Contextual User Influence. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16).
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009. 452–461.
  • Salehi-Abari and Boutilier (2015) Amirali Salehi-Abari and Craig Boutilier. 2015. Preference-oriented Social Networks: Group Recommendation and Inference. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys ’15).
  • Seko et al. (2011) Shunichi Seko, Takashi Yagi, Manabu Motegi, and Shin-yo Muto. 2011. Group recommendation using feature space representing behavioral tendency and power balance among members. In Proceedings of the 2011 ACM Conference on Recommender Systems, RecSys 2011, Chicago, IL, USA, October 23-27, 2011. 101–108.
  • Su and Khoshgoftaar (2009) Xiaoyuan Su and Taghi M. Khoshgoftaar. 2009. A Survey of Collaborative Filtering Techniques. Adv. Artificial Intelligence 2009 (2009), 421425:1–421425:19.
  • Tay et al. (2017) Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2017. Translational Recommender Networks. CoRR abs/1707.05176 (2017).
  • Tay et al. (2018) Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. Multi-Pointer Co-Attention Networks for Recommendation. CoRR abs/1801.09251 (2018).
  • Vinyals et al. (2014) Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and Tell: A Neural Image Caption Generator. CoRR abs/1411.4555 (2014).
  • Wang and Blei (2011) Chong Wang and David M. Blei. 2011. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011. 448–456.
  • Ye et al. (2012) Mao Ye, Xingjie Liu, and Wang-Chien Lee. 2012. Exploring social influence for recommendation: a generative model approach. In The 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012. 671–680.
  • Yu et al. (2006) Zhiwen Yu, Xingshe Zhou, Yanbin Hao, and Jianhua Gu. 2006. TV Program Recommendation for Multiple Viewers Based on user Profile Merging. User Model. User-Adapt. Interact. 16, 1 (2006), 63–82.
  • Yuan et al. (2014) Quan Yuan, Gao Cong, and Chin-Yew Lin. 2014. COM: a generative model for group recommendation. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - August 24 - 27, 2014. 163–172.
  • Zhang et al. (2017) Shuai Zhang, Lina Yao, and Aixin Sun. 2017. Deep Learning based Recommender System: A Survey and New Perspectives. CoRR abs/1707.07435 (2017).