Attentional Graph Convolutional Networks for Knowledge Concept Recommendation in MOOCs in a Heterogeneous View

06/23/2020 ∙ by Shen Wang, et al. ∙ Beihang University ∙ Tsinghua University ∙ University of Illinois at Chicago ∙ Yanshan University

Massive open online courses (MOOCs) are becoming a popular mode of education, providing large-scale, open-access learning opportunities for students. To attract students' interest, MOOC providers apply recommendation systems to recommend courses to students. However, because a course usually consists of a number of video lectures, each covering specific knowledge concepts, directly recommending courses overlooks students' interest in specific knowledge concepts. To fill this gap, we study the problem of knowledge concept recommendation. We propose an end-to-end graph neural network-based approach called Attentional Heterogeneous Graph Convolutional Deep Knowledge Recommender (ACKRec) for knowledge concept recommendation in MOOCs. Like other recommendation problems, it suffers from sparsity issues. To address this issue, we leverage both content information and context information to learn the representations of entities via graph convolutional networks. In addition to students and knowledge concepts, we consider other types of entities (e.g., courses, videos, teachers) and construct a heterogeneous information network (HIN) to capture the rich semantic relationships among different types of entities and incorporate them into the representation learning process. Specifically, we use meta-paths on the HIN to guide the propagation of students' preferences. With the help of these meta-paths, a student's preference distribution with respect to a candidate knowledge concept can be captured. Furthermore, we propose an attention mechanism to adaptively fuse the context information from different meta-paths, in order to capture the different interests of different students. The experimental results show that the proposed ACKRec is able to effectively recommend knowledge concepts to students pursuing online learning in MOOCs.




1. Introduction

In recent years, massive open online courses (MOOCs) have gradually become an alternative mode of education worldwide. For example, Coursera, edX, and Udacity, the three pioneering MOOC platforms, give millions of users access to numerous courses from internationally renowned universities. In China, millions of users study on XuetangX, one of the largest MOOC platforms (Qiu et al., 2016), where thousands of courses are offered on various subjects. Although the number of students in MOOCs is continuously growing, MOOCs still face difficulties. A challenging problem is how to attract students to study continuously and efficiently on the platforms, where the overall course completion rate is lower than 5% (Zhang et al., 2017). Addressing it requires better understanding and capturing of student interests.

To understand and capture student interests on MOOC platforms, multiple efforts have been made, including course recommendation (Jing and Tang, 2017; Zhang et al., 2019), behavior prediction (Qiu et al., 2016), and user intention understanding (Zhang et al., 2017). Among these efforts, recommendation systems are applied by MOOC providers to recommend courses to students. However, a course usually consists of a number of video lectures, each covering specific knowledge concepts. Direct course recommendation overlooks students’ interest in specific knowledge concepts. For example, computer vision courses taught by different instructors may be quite different from a microscopic view (i.e., they cover different sets of knowledge concepts): one instructor may cover only geometry-based methods while another may cover only deep learning-based methods. Recommending a computer vision course that covers only geometry-based methods to a student interested in deep learning-based methods is therefore not a good match. Consequently, it is necessary to study students’ online learning interests from a microscopic view and conduct knowledge concept recommendation.

Figure 1. A comprehensive view of the dataset collected from MOOCs. The left is the system structure of MOOCs. The right is the structure rebuilt from users’ online learning behaviors. Different types of lines indicate different relationships between pairs of entities.

Traditional recommendation strategies, such as collaborative filtering (CF), which considers users' (students') historical interactions and makes recommendations based on potential common preferences of users with similar interests, have achieved great success. However, CF-based methods suffer from the sparsity of user-item (student-knowledge concept) relationships, which limits recommendation performance. To overcome this problem, many efforts have leveraged side information, such as social networks (Jamali and Ester, 2010), user/item attributes (Wang et al., 2018), images (Zhang et al., 2016), and contexts (Sun et al., 2017). On a MOOC platform, we observe that in addition to users and knowledge concepts, there exist multiple types of entities (video, course, teacher) and multiple types of relationships between pairs of different entities. Table 1 shows the statistics of the real-world XuetangX data collected between January 1st, 2018 and March 31st, 2018. The data consist of 9,986 users, 43,405 videos, 7,020 courses, 5,038 teachers, 1,029 knowledge concepts, and the corresponding multiple types of relationships. As shown in Figure 1, the “course: V_9e77179” includes the “knowledge concept: c++”, the “student: 207256” is taking the “course: CaltechX”, the “video: V_1a9aa686” is related to the “knowledge concept: binary tree”, and the “course: CaltechX” is taught by the “teacher: Smith”. Furthermore, taking users’ behavior history into consideration, we can discover additional relationships. For example, the “user: 207256” clicked the “knowledge concept: c++”, “knowledge concept: binary tree”, and “knowledge concept: depth-first search”. Accounting for the above multiple types of relationships, we can obtain much richer facts and interactions between users and knowledge concepts.
If we depend merely on the basic structures, it is difficult to find the significant interaction between “knowledge concept: depth-first search” and “knowledge concept: time complexity”, which belong to different courses but are clicked by the same user. As shown in Figure 1, different knowledge concepts have different contexts. Utilizing only a single type of interaction may overlook significant relations between users and knowledge concepts. For example, “knowledge concept: c++” and “knowledge concept: binary tree” have dissimilar semantics even though they are included in the same video. These heterogeneous relationships provide rich side information and can benefit the recommendation system in three ways: (1) semantic relatedness among knowledge concepts can be introduced to help identify latent interactions; (2) a user’s interests can be reasonably extended and the diversity of recommended knowledge concepts can be increased; and (3) a user’s interests can be interpreted by tracking the user’s historical records along these relationships. Thus, these heterogeneous relationships need to be incorporated into the representation learning of the entities.

Entities             Statistic    Relations                    Statistic
user                 9,986        user-course                  14,326
video                43,405       course-video                 87,129
course               7,020        teacher-course               13,274
teacher              5,038        video-knowledge concept      11,732
knowledge concept    1,029        course-knowledge concept     21,507
Table 1. Statistics of entities and relations in the dataset.

Based on the above observations, we propose the Attentional Heterogeneous Graph Convolutional Deep Knowledge Recommender (ACKRec), an end-to-end framework for knowledge concept recommendation on MOOC platforms. To capture heterogeneous complex relationships, we model the MOOC platform data as a heterogeneous information network (HIN) (Shi et al., 2019). Then, we propose attention-based graph convolutional networks (GCNs) to learn the representations of different entities. Traditional GCNs can only capture homogeneous relationships among homogeneous entities, which overlooks the rich information in heterogeneous relationships. To address this issue, we use meta-paths (Sun et al., 2017) as guidance to capture the heterogeneous context information in a HIN via GCN. In this way, the heterogeneous relationships are utilized in a more natural and intuitive way. Moreover, considering that different students may have different interests, we further propose an attention mechanism to adaptively leverage the context from multiple meta-paths. Finally, we optimize the parameters of the proposed model via an extended matrix factorization and obtain the final recommendation list.

The key contributions of this paper can be summarized as follows:

  • We identify the important problem of knowledge concept recommendation, which is often overlooked by existing MOOC recommendation systems. Knowledge concept recommendation fills this gap and provides recommendation at a more microscopic level.

  • We propose ACKRec, a novel end-to-end framework that utilizes rich heterogeneous context as side information to assist knowledge concept recommendation.

  • We develop a heterogeneous information network model to capture various complex interactions among different types of entities on the MOOC platform.

  • We design an attention-based graph convolutional network that incorporates both content and heterogeneous context into the representation learning of different entities. The proposed model can automatically discover users' potential interests by propagating users’ preferences under the guidance of meta-paths in an attentional way.

  • We conduct extensive experiments using real-world data collected from XuetangX to evaluate the performance of the proposed model. We study its parameters, including the meta-path combination, the representation dimension, the number of latent factors, and the number of GCN layers. We comprehensively demonstrate the effectiveness of the proposed model compared with a series of strong baselines.


2.1. Problem Statement

Given a target user with corresponding interaction data in MOOCs, the goal is to calculate the interest scores between the user and a series of knowledge concepts and to output the recommendation result: a top-ranked list of knowledge concepts. More formally, given the interaction data of a user $u$, a prediction function $f$ is learned and used to generate a recommendation list of knowledge concepts (e.g., “c++”, “binary tree”, “linked list”, etc.), such that $f$ maps $u$ to a ranked list drawn from the set of knowledge concepts.

Figure 2. System architecture of ACKRec .

2.2. System Architecture

The architecture of our proposed knowledge concept recommendation system, ACKRec, is shown in Figure 2. It consists of the following components:

  • Feature Extraction. Using the data collected from the MOOCs, we first extract content information from the knowledge concepts’ names as the content feature, and then analyze various relationships among different types of entities (e.g., knowledge concept, video, course) to describe the knowledge concept. Similarly, we also generate the content feature and context feature for the user. (See Section 3.1 for details regarding feature extraction.)

  • Meta-path Selection. Based on the features extracted from the data, in this module we construct a structural HIN to model the relationships among different types of entities, and then select different meta-paths from the HIN to depict the relatedness over knowledge concepts (i.e., with different meanings). For example, if two different users are enrolled in the same course, we add an edge between the two users. (See Section 3.2 for details regarding the meta-path builder on the HIN.)

  • Representation Learning of Heterogeneous Entities. Based on the meta-paths constructed in the previous step, a representation learning model is proposed to learn low-dimensional representations of the entities in a heterogeneous view. The model is capable of capturing the structural correlations between heterogeneous entities. Specifically, we leverage the selected meta-paths to guide the entity representation learning via graph convolutional networks. We then use the attention mechanism to adaptively fuse the entity representations learned from different meta-paths. (See Section 3.3 for details regarding our proposed model ACKRec.)

  • Rating Prediction. After generating the low-dimensional representations of users and knowledge concepts, the dense vectors of entities are fed to an extended matrix factorization to learn the parameters of the model. We then predict users’ interests in unclicked knowledge concepts based on the user-item (student-knowledge concept) rating matrix.

3. Proposed Method

In this section, we introduce the details of how we learn the representation of knowledge concepts and users based on the generated content feature and context feature, and how we perform knowledge concept recommendation based on the learned representations.

3.1. Feature Extraction

3.1.1. Content Feature.

In general, the name of a knowledge concept is essentially a summary of the concept (e.g., “c++,” “binary tree,” “linked list”) and contains rich semantic information. Hence, we generate the word embedding of the name of each knowledge concept and use it as the content feature of that concept. Specifically, we use Word2vec (Mikolov et al., 2013) to generate the word embeddings. For users, we generate the content feature in a similar way.

3.1.2. Context Feature.

Content features, such as the word embeddings of knowledge concept names, can be used to represent the information of a knowledge concept. In addition, there exists rich context information, such as the relationships between different entities in the network structure (e.g., user: 207256 watched video: v_9e77179 and video: v_1a9aa686; this behavior implies a relation between the two videos). To include these complex relationships among different types of entities, we further model the context information as a feature. Specifically, we consider the following relationships in a user's learning activities.

  • user-click-knowledge concept: Based on the data of users’ online learning behaviors, we build the user-click-knowledge concept matrix, where each element indicates whether a user clicked a knowledge concept during their learning activities.

  • user-learn-course: To describe the relation between a user and a course, we generate the user-learn-course matrix, where each element indicates whether a user is taking a course.

  • user-watch-video: Similarly, we generate the user-watch-video matrix, where each element denotes whether a user has watched a video.

  • user-learn-course-taught by-teacher: To describe the behavior that a user is taking a course that is taught by a teacher, we generate the user-learn-course-taught by-teacher matrix, where each element denotes whether a user is taking a course taught by a teacher.

We generate these relationships to describe the user-related interactions in the heterogeneous information network. For knowledge concepts, we also discover a number of related relationships. For example, the relation knowledge concept-included by-video denotes that a knowledge concept is included in a video, and the relation knowledge concept-involved-course indicates that a knowledge concept is covered in a course.
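The relation matrices described above can be illustrated with a minimal sketch. It assumes toy interaction logs given as index pairs; the names `relation_matrix`, `clicks`, `enrollments`, and `course_teacher` are hypothetical and not taken from the paper, and binarizing the composed user-teacher relation is an assumption.

```python
import numpy as np

# Hypothetical toy logs as (row index, column index) pairs.
clicks = [(0, 0), (0, 1), (1, 1), (2, 2)]   # user clicked knowledge concept
enrollments = [(0, 0), (1, 0), (2, 1)]      # user is taking course
course_teacher = [(0, 0), (1, 0)]           # course is taught by teacher


def relation_matrix(pairs, n_rows, n_cols):
    """Binary relation matrix: entry (i, j) is 1 if the pair (i, j) was observed."""
    matrix = np.zeros((n_rows, n_cols), dtype=np.int64)
    for i, j in pairs:
        matrix[i, j] = 1
    return matrix


user_concept = relation_matrix(clicks, n_rows=3, n_cols=3)
user_course = relation_matrix(enrollments, n_rows=3, n_cols=2)
course_teacher_m = relation_matrix(course_teacher, n_rows=2, n_cols=1)

# user-learn-course-taught by-teacher: compose the two relations, then binarize
# (binarization is an assumption; path counts could be kept instead).
user_teacher = (user_course @ course_teacher_m > 0).astype(np.int64)
```

Here every user ends up linked to the single teacher, because both toy courses are taught by that teacher.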

3.2. Meta-path Based Relationship

To model different types of entities and their complex relationships in a proper manner, we first describe how to utilize a HIN to depict users, knowledge concepts, and corresponding heterogeneous relations among them. Before proceeding to our approach, we first introduce some related concepts.

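Definition 1. Heterogeneous information network (HIN) (Shi et al., 2019). A HIN is denoted as $G = (\mathcal{V}, \mathcal{E})$, consisting of an object set $\mathcal{V}$ and a link set $\mathcal{E}$. A HIN is also associated with an object type mapping function $\phi: \mathcal{V} \rightarrow \mathcal{A}$ and a link type mapping function $\psi: \mathcal{E} \rightarrow \mathcal{R}$. $\mathcal{A}$ and $\mathcal{R}$ denote the sets of predefined object and link types, where $|\mathcal{A}| + |\mathcal{R}| > 2$.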

In this study, we model the MOOCs data as a HIN. Specifically, the constructed HIN includes five types of entities (i.e., user (U), course (C), video (V), teacher (T), and knowledge concept (K), as shown in Figure 2) and a series of relationships among them (e.g., user-click-knowledge concept, user-learn-course, user-watch-video, and user-learn-course-taught by-teacher). Based on the constructed HIN, we can obtain the network schema, which is defined as follows.

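Definition 2. Network schema (Wang et al., 2019a). The network schema is denoted as $T_G = (\mathcal{A}, \mathcal{R})$. It is a meta-template for an information network $G = (\mathcal{V}, \mathcal{E})$ with the object type mapping $\phi: \mathcal{V} \rightarrow \mathcal{A}$ and the link type mapping $\psi: \mathcal{E} \rightarrow \mathcal{R}$, which is a directed graph defined over object types $\mathcal{A}$ with edges as relations from $\mathcal{R}$.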

We define our network schema in Figure 3, which comprehensively represents the semantic and relation information in the MOOCs dataset. Based on the network schema, we can discover the semantic paths between a pair of entities, which are called meta-paths.

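Definition 3. Meta-path (Gori et al., 2005). A meta-path is defined on a network schema $T_G = (\mathcal{A}, \mathcal{R})$ and is denoted as a path in the form of $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$ (abbreviated as $A_1 A_2 \cdots A_{l+1}$), which describes a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between objects $A_1$ and $A_{l+1}$, where $\circ$ denotes the composition operator on relations.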

Typical meta-paths between two users can be defined as follows: $U \xrightarrow{\text{click}} K \xrightarrow{\text{click}^{-1}} U$, which means that two different users are related because they click the same knowledge concept; and $U \xrightarrow{\text{learn}} C \xrightarrow{\text{taught by}} T \xrightarrow{\text{taught by}^{-1}} C \xrightarrow{\text{learn}^{-1}} U$, which denotes that two users are related through paths containing different courses taught by the same teacher. Note that the number of potential meta-paths induced from the HIN can be infinite, but not every one of them is relevant and useful for the specific task of interest. Fortunately, some recently proposed algorithms (Chen and Sun, 2017) automatically select the meta-paths for particular tasks. Given all the concepts about the HIN, we now proceed to our problem of heterogeneous information network representation learning. The notations used throughout the article are summarized in Table 2.
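To make the two example meta-paths concrete, the following sketch derives a user-user adjacency matrix by chaining relation matrices along a meta-path. It is an illustration under assumptions: the helper `metapath_adjacency` and the toy matrices are hypothetical, the meta-path is assumed to be symmetric (so the result is square), and reducing path counts to a binary adjacency without self-loops is one common choice rather than the paper's stated procedure.

```python
import numpy as np


def metapath_adjacency(*relations):
    """Chain relation matrices along a symmetric meta-path.

    Entry (i, j) of the result is 1 if at least one path instance connects
    entity i to a different entity j, and 0 otherwise.
    """
    product = relations[0]
    for relation in relations[1:]:
        product = product @ relation
    adjacency = (product > 0).astype(np.int64)
    np.fill_diagonal(adjacency, 0)  # drop trivial paths from an entity to itself
    return adjacency


# Toy relation matrices (hypothetical data).
user_concept = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])  # user clicks concept
user_course = np.array([[1, 0], [1, 0], [0, 1]])            # user learns course
course_teacher = np.array([[1], [1]])                       # course taught by teacher

# Users related because they clicked the same knowledge concept.
uku = metapath_adjacency(user_concept, user_concept.T)

# Users related through courses taught by the same teacher.
uctcu = metapath_adjacency(user_course, course_teacher,
                           course_teacher.T, user_course.T)
```

In this toy data only users 0 and 1 share a clicked concept, while all three users are connected through the shared teacher.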

Notation Explanation
heterogeneous information network
set of entities
set of relations
network schema
set of meta-paths
adjacency matrix and degree matrix
base different meta-paths
the features matrix of entities
layer of entity representation
weights of GCN layer
the representation of entities
the weights of meta-paths
the latent factors of user
the latent factors of knowledge concept
the number of users and knowledge concepts in MOOCs dataset
the representation dimension
the number of latent factors
the true rating of user to knowledge concept
predicted rating of user to knowledge concept
the matrix that maps the representations of users and knowledge concepts into the same space
the tuning parameter
Table 2. Notations and explanations.
Figure 3. Network schema for HINs in MOOCs. The MOOCs include user, course, video, teacher, and knowledge concept entities. Different line types indicate different types of relationships among different types of entities.

3.3. Attention-based Graph Convolutional Networks for HIN Representation Learning

After the content features and context features are obtained, we feed the entity content features to the graph convolutional networks to learn the latent entity representations. Consider the heterogeneous information network $G$ associated with a set of meta-paths $\{MP_1, MP_2, \ldots, MP_{|MP|}\}$ and the corresponding adjacency matrices $\{A_1, A_2, \ldots, A_{|MP|}\}$, where $|MP|$ denotes the number of meta-paths. We adopt a multi-layer graph convolutional network (GCN) with the following layer-wise propagation rule:

$$h^{l+1} = \sigma\left(P h^{l} W^{l}\right) \qquad (1)$$

Here we remove the subscripts of the meta-path indicator, user indicator, and knowledge concept indicator from all graph-related symbols for simplicity. $h^{l+1}$ denotes the new representation of an entity. In particular, $h^{0}$ is the content feature extracted in the first step. $P = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $\tilde{A} = A + I$ is the adjacency matrix corresponding to a specific meta-path with self-connections, $I$ is the identity matrix, $\tilde{D} = \mathrm{diag}(\tilde{A}\mathbf{1})$, and $\mathbf{1}$ is the all-ones vector. Here $\sigma(\cdot)$ is defined as $\mathrm{ReLU}(\cdot)$, an entry-wise rectified linear activation function. $l$ is the layer number indicator. $W^{l}$ is the shared trainable weight matrix for all the entities at layer $l$. Weight sharing is beneficial since it is statistically and computationally more efficient than traditional embedding methods. With the help of weight sharing, the model can be well regularized and the number of parameters is significantly reduced.

The information propagation process of content or context can be regarded as a Markov process that converges to a stationary distribution, each row of which indicates the likelihood of spreading from the corresponding knowledge concept. This stationary distribution of the diffusion process is proven to have a closed-form solution. When considering the 1-step truncation of the diffusion process, the propagation layer computes the weighted sum of the contexts’ current representations. The three propagation layers are defined as follows:

$$h^{1} = \sigma\left(P h^{0} W^{0}\right) \qquad (2)$$

$$h^{2} = \sigma\left(P h^{1} W^{1}\right) \qquad (3)$$

$$h^{3} = \sigma\left(P h^{2} W^{2}\right) \qquad (4)$$

where $h^{3}$ is the final representation of an entity under the given meta-path.
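The propagation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the common symmetric normalization of the adjacency matrix with self-connections, uses random toy features and weights in place of learned ones, and the function names `normalize_adjacency` and `gcn_forward` are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Add self-connections, then symmetrically normalize by node degree."""
    a_tilde = adj + np.eye(adj.shape[0])
    degree = a_tilde.sum(axis=1)          # every degree is >= 1 after self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, weights):
    """Apply one ReLU propagation layer per weight matrix in `weights`."""
    p = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(p @ h @ w, 0.0)    # entry-wise ReLU
    return h


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)    # toy meta-path adjacency
features = rng.normal(size=(3, 8))                                 # toy content features
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]   # three layers
representations = gcn_forward(adj, features, weights)
```

The same `weights` are applied to every entity in a layer, which mirrors the weight sharing discussed in the text.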

Going through the three propagation layers, we learn the representations for each meta-path. However, different meta-paths should not be treated equally. To address this problem, we use an attention mechanism to fuse the entity representations learned under the guidance of different meta-paths and generate the attentional joint representation:

$$e = att\left(e^{MP_1}, e^{MP_2}, \ldots, e^{MP_{|MP|}}\right) \qquad (5)$$

Here, $att$ indicates the attention function, and $e$ indicates the final representation of an entity, which integrates the attention weights of the different meta-paths. Since we mainly focus on users and knowledge concepts in this problem, the target entity is a user or a knowledge concept. Formally, given the corresponding representation $e^{MP_i}$ for each meta-path $MP_i$, the attention weight $\alpha_{MP_i}$ of a target meta-path is computed from the representation of an entity based on the target meta-path and the representations based on the other meta-paths, using a trainable attention vector and a nonlinear gating function. We formulate a feed-forward neural network to compute the correlation between one meta-path and the other meta-paths. This correlation is normalized by a softmax function. The attentional joint representation can be represented as follows:

$$e = \sum_{i=1}^{|MP|} \alpha_{MP_i} \cdot e^{MP_i} \qquad (8)$$

where $\sum_{i=1}^{|MP|} \alpha_{MP_i} = 1$, and $e$ denotes the final representation of knowledge concepts. The meta-path attention allows us to better infer the importance of different meta-paths by leveraging their correlations when learning the entity representations. The algorithm framework is shown in Algorithm 1.
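The fusion step can be sketched as a softmax-weighted sum over meta-path-specific representations. The scoring function below is a simplified assumption: each meta-path is scored by a dot product with a single attention vector passed through `tanh`, which is not necessarily the exact form used by the authors. The name `fuse_metapaths` and all shapes are hypothetical.

```python
import numpy as np


def fuse_metapaths(reps, att_vector):
    """Fuse per-meta-path representations with softmax attention.

    reps:       array of shape (num_metapaths, num_entities, dim)
    att_vector: array of shape (dim,), standing in for a trainable vector
    Returns the fused representations (num_entities, dim) and the attention
    weights (num_metapaths, num_entities).
    """
    scores = np.tanh(reps @ att_vector)                     # one score per meta-path and entity
    scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over meta-paths
    fused = (weights[:, :, None] * reps).sum(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
reps = rng.normal(size=(4, 5, 8))     # 4 meta-paths, 5 entities, 8 dimensions
att_vector = rng.normal(size=8)
fused, weights = fuse_metapaths(reps, att_vector)
```

Because the weights are computed per entity, two users can place different emphasis on the same meta-path, which is the behavior the attention mechanism is meant to provide.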

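Input: the given set of meta-paths; the corresponding set of adjacency matrices; the feature matrix of the target entities; the dimension of the representations.
Output: the representations of the target entities.
1:  Initialize the model parameters;
2:  Initialize an empty set of meta-path-specific representations;
3:  for each meta-path do
4:     Calculate the adjacency matrix with self-connections, the degree matrix, and the normalized adjacency matrix according to Eq.1;
5:     Set the input of the first layer to the feature matrix;
6:     Calculate the first-layer representation by Eq.2;
7:     Calculate the second-layer representation by Eq.3;
8:     Calculate the third-layer representation by Eq.4;
9:     Add the third-layer representation to the set of meta-path-specific representations;
10:  end for
11:  Generate the attentional joint representation by Eq.8;
12:  return the attentional joint representation.
Algorithm 1 Generating the representations of entities.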

3.4. Matrix Factorization for Knowledge Concept Recommendation

So far, we have described how to extract the content features and context features of users and knowledge concepts. Using attention-based GCNs for representation learning, we obtain the representations of knowledge concepts $e_k$ and the representations of users $e_u$. In this part, we use an extended matrix factorization (MF) based method to perform knowledge concept recommendation for the users. We treat the number of times users click on knowledge concepts as a rating matrix. The rating of a user on a knowledge concept can be defined as follows:

$$\hat{r}_{u,k} = x_u^{\top} y_k \qquad (9)$$

where $x_u \in \mathbb{R}^{D}$ indicates the latent factors of the user and $y_k \in \mathbb{R}^{D}$ denotes the latent factors of the knowledge concept. $D$ is the number of latent factors, and $m$ and $n$ are the numbers of user entities and knowledge concept entities, respectively. Because we have also obtained the representations of user $u$ and knowledge concept $k$, we further feed them into the rating predictor as follows:

$$\hat{r}_{u,k} = x_u^{\top} y_k + \beta_u \cdot e_u^{\top} t_k + \beta_k \cdot t_u^{\top} e_k \qquad (10)$$

where $e_u$ and $e_k$ are the representations of users and knowledge concepts. The trainable parameters $t_k$ and $t_u$ are introduced to make sure that $e_u$ and $e_k$ are in the same space. $\beta_u$ and $\beta_k$ are the tuning parameters. The purpose is to achieve accurate rating prediction, so the objective function of MF is defined as follows:

$$\min_{U,K} \frac{1}{m \times n} \sum_{u=1}^{m} \sum_{k=1}^{n} \left(r_{u,k} - \hat{r}_{u,k}\right)^2 \qquad (11)$$

We further add regularization terms to the function. Therefore, the final objective function is formulated as follows:

$$\min_{U,K} \frac{1}{m \times n} \sum_{u=1}^{m} \sum_{k=1}^{n} \left(r_{u,k} - \hat{r}_{u,k}\right)^2 + \lambda \sum_{u,k} \left(\lVert x_u \rVert_2 + \lVert y_k \rVert_2 + \lVert t_u \rVert_2 + \lVert t_k \rVert_2\right) \qquad (12)$$

where $r_{u,k}$ is the true rating of user $u$ on knowledge concept $k$ and $\lambda$ is the regularization parameter. We then use the stochastic gradient descent algorithm to find a local minimum of the final objective function.
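The extended rating predictor and one stochastic gradient descent update can be sketched as follows. This is an illustration under assumptions: the predictor combines a latent-factor term with two terms that project the learned representations through trainable vectors, the loss is squared error with squared L2 regularization, and all names (`predict`, `sgd_step`, `beta_u`, `beta_k`, `lam`) and numeric values are hypothetical.

```python
import numpy as np


def predict(x_u, y_k, e_u, e_k, t_u, t_k, beta_u=0.5, beta_k=0.5):
    """Latent-factor term plus two terms that use the learned representations."""
    return x_u @ y_k + beta_u * (e_u @ t_k) + beta_k * (t_u @ e_k)


def sgd_step(r, x_u, y_k, e_u, e_k, t_u, t_k,
             beta_u=0.5, beta_k=0.5, lr=0.05, lam=0.01):
    """One SGD update on 0.5 * (prediction - r)^2 with squared L2 regularization.

    The representations e_u and e_k are treated as fixed inputs here.
    """
    err = predict(x_u, y_k, e_u, e_k, t_u, t_k, beta_u, beta_k) - r
    new_x = x_u - lr * (err * y_k + lam * x_u)
    new_y = y_k - lr * (err * x_u + lam * y_k)
    new_tu = t_u - lr * (err * beta_k * e_k + lam * t_u)
    new_tk = t_k - lr * (err * beta_u * e_u + lam * t_k)
    return new_x, new_y, new_tu, new_tk


rng = np.random.default_rng(0)
x_u, y_k = rng.normal(scale=0.1, size=4), rng.normal(scale=0.1, size=4)   # latent factors
e_u, e_k = rng.normal(size=6), rng.normal(size=6)                          # fixed representations
t_u, t_k = rng.normal(scale=0.1, size=6), rng.normal(scale=0.1, size=6)   # trainable projections
rating = 3.0                                                               # toy click count

initial_error = abs(predict(x_u, y_k, e_u, e_k, t_u, t_k) - rating)
for _ in range(200):
    x_u, y_k, t_u, t_k = sgd_step(rating, x_u, y_k, e_u, e_k, t_u, t_k)
final_error = abs(predict(x_u, y_k, e_u, e_k, t_u, t_k) - rating)
```

In a full training loop the update would be applied over all observed user-concept pairs; a single pair is used here only to keep the sketch short.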

4. Experiments

4.1. Datasets

We collected real-world data from the XuetangX MOOC platform. We select enrollment behaviors occurring between October 1st, 2016 and December 30th, 2017 as the training set, and those occurring between January 1st, 2018 and March 31st, 2018 as the test set. Each instance in the training set or test set is a sequence representing a user’s history of click behaviors. In the training process, for each sequence in the training data, we treat the last clicked knowledge concept as the target and the remaining ones as the past behaviors. Moreover, for each positive instance, we randomly generate one negative instance to replace the target knowledge concept. In the testing process, we treat each enrolled knowledge concept in the test set as the target knowledge concept; the corresponding knowledge concepts of the same user in the training set are treated as the sequence representing the history of clicked knowledge concepts. To evaluate the recommendation performance, each positive instance in the test set is paired with 99 randomly sampled negative instances, and the model outputs prediction scores for the 100 instances (1 positive and 99 negatives) (He et al., 2018).
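The evaluation protocol of pairing one positive with 99 sampled negatives can be sketched as below. The function name `build_eval_candidates` is hypothetical, and excluding the user's previously clicked concepts from the negative pool is an assumption; the text only states that negatives are sampled at random.

```python
import random


def build_eval_candidates(positive, interacted, all_items, num_negatives=99, seed=0):
    """Return the positive item followed by randomly sampled negative items.

    Negatives are drawn from items the user has not interacted with
    (an assumption) and never include the positive item itself.
    """
    rng = random.Random(seed)
    excluded = set(interacted) | {positive}
    pool = [item for item in all_items if item not in excluded]
    negatives = rng.sample(pool, num_negatives)
    return [positive] + negatives


all_concepts = list(range(1029))   # 1,029 knowledge concepts, as in Table 1
history = {3, 17, 256}             # toy click history of one user
candidates = build_eval_candidates(positive=42, interacted=history,
                                   all_items=all_concepts)
```

The model would then score these 100 candidates, and the rank of the first entry determines the metrics for that test instance.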

4.2. Evaluation Metrics

We evaluate all the methods in terms of widely used metrics, including the Hit Ratio of top-K items (HR@K) and the Normalized Discounted Cumulative Gain of top-K items (NDCG@K) (Järvelin and Kekäläinen, 2000). HR@K is a recall-based metric that measures the percentage of ground truth instances that are successfully recommended in the top K, and NDCG@K is a precision-based metric that accounts for the predicted position of the ground truth instance. We set K to 5, 10, and 20, and calculate all metrics for every 100 instances (1 positive plus 99 negatives). The final recommendation list for user $u$ is $R_u = \{r_u^1, r_u^2, \ldots, r_u^K\}$. $r_u^i$ denotes the item ranked at position $i$ in $R_u$ based on the predicted score. $T_u$ is the set of interacted items of user $u$ in the test data, and $N$ is the total number of users in our test data.

$$HR@K = \frac{1}{N} \sum_{u} I\left(|R_u \cap T_u|\right) \qquad (13)$$

where $I(x)$ is an indicator function whose value is 1 when $x > 0$ and 0 otherwise. The larger the value of HR@K, the better the performance of the model.

$$NDCG@K = \frac{1}{Z} DCG@K = \frac{1}{Z} \cdot \frac{1}{N} \sum_{u} \sum_{i=1}^{K} \frac{2^{I\left(|\{r_u^i\} \cap T_u|\right)} - 1}{\log_2(i + 1)} \qquad (14)$$

where $Z$ is a normalization constant, which is the maximum possible value of DCG@K. We also use the mean reciprocal rank (MRR). From the definition, we can see that a larger MRR value indicates better performance of the model (He et al., 2018).

$$MRR = \frac{1}{N} \sum_{u} \frac{1}{rank_u} \qquad (15)$$

where $rank_u$ refers to the rank position of the one positive instance for user $u$ among the 100 instances. In addition, we also use the area under the ROC curve (AUC) as a metric.
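For a single test instance with one positive among the scored candidates, the three ranking metrics reduce to simple functions of the positive's rank. The sketch below assumes that setting (so the ideal DCG is 1) and that the positive sits at index 0 of the score list; the function names are hypothetical. Averaging the per-instance values over all test instances gives the reported numbers.

```python
import math


def rank_of_positive(scores, positive_index=0):
    """1-based rank of the positive item; higher scores rank first."""
    positive_score = scores[positive_index]
    higher = sum(1 for i, s in enumerate(scores)
                 if i != positive_index and s > positive_score)
    return 1 + higher


def hit_ratio_at_k(rank, k):
    """1 if the positive item appears in the top k, else 0."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k):
    """With a single relevant item the ideal DCG is 1, so NDCG equals DCG."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def reciprocal_rank(rank):
    return 1.0 / rank


scores = [0.5, 0.9, 0.1, 0.7]      # toy scores; index 0 is the positive
rank = rank_of_positive(scores)    # two candidates score higher, so rank is 3
```

With these toy scores the positive is a hit at K = 5 but a miss at K = 2.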

4.3. Detailed Analysis of the Proposed Approach

Meta-path HR@5 NDCG@5 MRR AUC
MP1 0.5393 0.3817 0.3621 0.8645
MP2 0.4508 0.3136 0.3059 0.8487
MP3 0.5870 0.4215 0.3972 0.8796
MP4 0.4302 0.3016 0.2967 0.8314
MP1 & MP2 0.5669 0.3962 0.3749 0.8824
MP1 & MP3 0.6114 0.4295 0.4042 0.9091
MP1 & MP4 0.5936 0.4157 0.3891 0.8899
MP2 & MP3 0.6062 0.4273 0.3998 0.8927
MP2 & MP4 0.4541 0.3210 0.3151 0.8469
MP3 & MP4 0.6011 0.4233 0.3950 0.8871
MP1 & MP2 & MP3 0.6404 0.4543 0.4240 0.9115
MP1 & MP2 & MP4 0.6029 0.4279 0.4022 0.8955
MP1 & MP3 & MP4 0.6212 0.4389 0.4102 0.9021
MP2 & MP3 & MP4 0.6025 0.4318 0.4097 0.8932
MP1&MP2&MP3&MP4 0.6470 0.4635 0.4352 0.9232
Table 3. Different results from different combinations of meta-paths.

4.3.1. Evaluation of Different Meta-paths Combination

In this part of the experiments, we analyze how the selection of meta-path combinations affects the performance of ACKRec, since a small number of high-quality meta-paths can lead to considerable performance (Shi et al., 2019). We consider both single meta-paths and their combinations. Specifically, we select four types of meta-paths to characterize the relatedness between pairs of users, denoted MP1, MP2, MP3, and MP4. We also select three types of meta-paths to characterize the relatedness between pairs of knowledge concepts. To analyze the impact of different combinations of a small number of meta-paths, we use all three meta-paths to model the knowledge concepts and study the performance with single user-related meta-paths and their combinations. The experimental results are shown in Table 3. From Table 3, we can see that each single meta-path (i.e., MP1, MP2, MP3, MP4) exhibits different performance, where the performance ranking is MP3 > MP1 > MP2 > MP4, and the combinations of single meta-paths follow the same tendency (e.g., MP1 & MP3 > MP1 & MP2, and MP1 & MP2 & MP3 > MP1 & MP2 & MP4). This illustrates that different meta-paths capture different relations. Further, although the performance gain is not large, we observe that combinations including more meta-paths tend to exhibit better performance, and the best performance is achieved by combining all four meta-paths.
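The experimental grid in Table 3 covers every non-empty subset of the four user-related meta-paths. A short sketch of how such a grid could be enumerated is given below; the variable names are hypothetical and the training call is left out.

```python
from itertools import combinations

user_metapaths = ["MP1", "MP2", "MP3", "MP4"]

# All non-empty subsets, ordered by size as in Table 3.
metapath_settings = [
    combo
    for size in range(1, len(user_metapaths) + 1)
    for combo in combinations(user_metapaths, size)
]
```

This yields 4 single meta-paths, 6 pairs, 4 triples, and the full set, for 15 settings in total, matching the 15 rows of Table 3.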

4.3.2. Evaluation of Model Parameters.

In a matrix factorization-based method, the number of latent factors is an important parameter. Therefore, we compare the performance obtained with different numbers of latent factors. As shown in Figure 4, we use four metrics to show how the performance of ACKRec changes with the number of latent factors. We tune the number from 10 to 40 in increments of 10. The performance gain flattens as the number of latent factors increases, and we find that 30 latent factors produce the best performance.

After setting the number of latent factors to 30, we study the dimension of the entity representations. The experimental results are shown in Figure 5. We conducted the experiments using different numbers of dimensions (i.e., 20, 50, 100, 150, 200) and found that the best performance was achieved with 100. Therefore, both users and knowledge concepts are represented as 100-dimensional vectors. The results also show that the representation of users and knowledge concepts in the heterogeneous information network is an important factor in improving the performance of the recommendation task.

We also examine how the number of GCN layers influences the performance of the model. As shown in Figure 6, the performance of the proposed model changes with the number of layers (i.e., 1, 2, 3, and 4). The results indicate that the best number of GCN layers is around 3.

Figure 4. Performance with different numbers of latent factors.
Figure 5. Performance with different dimensions of representations.
Figure 6. Performance with different numbers of layers.

4.4. Baseline Methods

Method                     HR@1    HR@5    HR@10   HR@20   NDCG@5  NDCG@10  NDCG@20  MRR     AUC
BPR                        0.1699  0.4633  0.6246  0.7966  0.3217  0.3736   0.4151   0.3156  0.8610
MLP                        0.0660  0.3680  0.5899  0.7237  0.2231  0.2926   0.3441   0.2146  0.8595
FM                         0.2272  0.4057  0.5867  0.7644  0.3655  0.3968   0.3930   0.3067  0.8574
FISM                       0.1410  0.5849  0.7489  0.7610  0.3760  0.4203   0.4279   0.3293  0.8532
NAIS                       0.078   0.4112  0.6624  0.8649  0.2392  0.3201   0.3793   0.2392  0.8863
NASR                       0.1382  0.4437  0.6215  0.7475  0.2364  0.3172   0.3821   0.2117  0.8215
metapath2vec               0.2476  0.5983  0.7598  0.8689  0.4194  0.4422   0.4602   0.3873  0.8909
ACKRec (no heterogeneity)  0.2092  0.5388  0.7139  0.8665  0.3783  0.4348   0.4738   0.3634  0.8927
ACKRec (no attention)      0.2457  0.5917  0.7542  0.8778  0.4216  0.4763   0.5079   0.4026  0.8974
ACKRec (content only)      0.2195  0.5917  0.7476  0.8553  0.4154  0.4659   0.4933   0.3891  0.8848
ACKRec (context only)      0.2588  0.6427  0.7911  0.8909  0.4591  0.5074   0.5329   0.4285  0.9035
ACKRec (full)              0.2645  0.6470  0.8122  0.9255  0.4635  0.5170   0.5459   0.4352  0.9232
Table 4. Results obtained with different models using the MOOC dataset.
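The columns of Table 4 are standard ranking metrics. As a rough sketch of how they can be computed for a single test instance, the code below assumes a protocol in which one held-out item (the real next click) is ranked among a set of candidate items; this protocol and all function names are assumptions for illustration, and the exact evaluation setup is the one defined in the paper's evaluation section. Averaging each value over all test instances would give numbers of the kind reported in the table.

```python
import math


def rank_of_positive(scores, positive_index):
    """1-based rank of the held-out item (ties are resolved in its favour)."""
    target = scores[positive_index]
    return 1 + sum(1 for s in scores if s > target)


def hit_ratio_at_k(rank, k):
    """HR@K: 1 if the held-out item appears in the top K, else 0."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k):
    """NDCG@K with a single relevant item, so the ideal DCG equals 1."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def reciprocal_rank(rank):
    """Per-instance contribution to MRR."""
    return 1.0 / rank


def auc_single_positive(scores, positive_index):
    """Fraction of negatives scored below the held-out item (ties count 0.5)."""
    target = scores[positive_index]
    wins = 0.0
    count = 0
    for i, s in enumerate(scores):
        if i == positive_index:
            continue
        count += 1
        if target > s:
            wins += 1.0
        elif target == s:
            wins += 0.5
    return wins / count


# Hypothetical scores for five candidate concepts; index 3 is the held-out click.
scores = [0.2, 0.9, 0.5, 0.7, 0.1]
rank = rank_of_positive(scores, 3)
print(rank, hit_ratio_at_k(rank, 1), hit_ratio_at_k(rank, 5))
print(ndcg_at_k(rank, 5), reciprocal_rank(rank), auc_single_positive(scores, 3))
```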

To evaluate the performance of the proposed approach, we consider the following baseline methods:

  • BPR (Rendle et al., 2012): It optimizes a pairwise ranking loss for the recommendation task in a Bayesian manner.

  • MLP (He et al., 2017): It applies a multi-layer perceptron (MLP) to a pair of user representations and corresponding knowledge concept representations to learn the probability of recommending a knowledge concept to the user.

  • FM (Rendle, 2012): This is a principled approach that can easily incorporate any heuristic features. However, to ensure a fair comparison with the other methods, we only use the representations of users and knowledge concepts.

  • FISM (Kabbur et al., 2013): This is an item-to-item collaborative filtering algorithm that makes recommendations based on the average embedding of all behavior histories and the embedding of the target knowledge concept.

  • NAIS (He et al., 2018): This is also an item-to-item collaborative filtering algorithm, but it distinguishes the weights of different online learning behaviors using an attention mechanism.

  • NASR (Li et al., 2017): This is an improved GRU (Hidasi et al., 2015) model that estimates an attention coefficient for each behavior in the history based on the corresponding hidden vector output by the GRU. The GRU is a gated recurrent unit model that receives a list of historical behaviors as input.

  • metapath2vec (Dong et al., 2017): This is a meta-path-based representation learning method for heterogeneous information networks that uses random walks and skip-gram. Using the same user and knowledge concept meta-paths as in the ACKRec model, we use metapath2vec to generate representations of users and knowledge concepts with the same dimension as in ACKRec.

  • A variant of ACKRec that ignores the heterogeneity of entities in the heterogeneous information network. We regenerate the adjacency matrix to represent the relationships between different entities.

  • A variant of ACKRec without the attention mechanism, which instead concatenates the representations from different meta-paths.

  • Content feature-based ACKRec. The input of this model is only the content features of entities.

  • Context feature-based ACKRec. Analogously to the content feature-based model, only the context features of entities in MOOCs are fed into this model.

  • The proposed method (full ACKRec), which combines the heterogeneous context features and content features of entities to depict the entities in the HIN as fully as possible.
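As an illustration of the pairwise ranking objective used by the first baseline in the list above, here is a minimal sketch of the BPR loss, -ln σ(x_ui - x_uj), for (user, positive item, negative item) triples. Scoring by dot product of embeddings, the L2 penalty form, and the function name `bpr_loss` are assumptions made for this example.

```python
import numpy as np


def bpr_loss(user_vecs, pos_vecs, neg_vecs, reg=0.0):
    """Mean BPR loss over a batch of (user, positive, negative) triples.

    All inputs have shape (batch, dim); scores are dot products.
    """
    pos_scores = np.sum(user_vecs * pos_vecs, axis=1)
    neg_scores = np.sum(user_vecs * neg_vecs, axis=1)
    diff = pos_scores - neg_scores
    # -log(sigmoid(diff)) == log(1 + exp(-diff)), computed stably.
    loss = np.mean(np.logaddexp(0.0, -diff))
    penalty = reg * (np.sum(user_vecs ** 2)
                     + np.sum(pos_vecs ** 2)
                     + np.sum(neg_vecs ** 2))
    return loss + penalty


# Hypothetical two-dimensional embeddings for one triple.
u = np.array([[1.0, 0.0]])
p = np.array([[1.0, 0.0]])   # item the user interacted with
n = np.array([[0.0, 1.0]])   # sampled item without interaction

good = bpr_loss(u, p, n)     # positive ranked above negative
bad = bpr_loss(u, n, p)      # roles swapped
print(good, bad)
```

The loss is smaller when the observed item scores higher than the sampled one, which is the ordering the optimizer is pushed toward.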

For the MOOCs dataset, we split the user history behavior data into a training set and a test set. The training times for our method and the baseline methods are as follows: BPR 2 hrs, MLP 2 hrs, FM 2.5 hrs, FISM 4 hrs, NAIS 3.5 hrs, NASR 4.5 hrs, metapath2vec 5.5 hrs, and ACKRec (our method) 4.5 hrs. Table 4 compares ACKRec with other machine learning methods. For the HIN-based methods, we select the meta-path combination that gave the best performance in Section 3.1.2. As shown in Table 4, the HIN-based methods outperform all the other methods, which indicates the importance of the heterogeneity in MOOCs data. Furthermore, the results show that the full ACKRec model, which integrates content features and context features, gives the best performance. Unlike metapath2vec (Dong et al., 2017), which generates node representations using a random walk strategy and the skip-gram method, our model uses graph convolutional networks to learn representations and an adaptive attention mechanism to learn the weights of the different meta-paths, and can therefore better capture the heterogeneity within the data.
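For contrast with the graph convolutional approach, the random walk strategy behind metapath2vec can be sketched as follows: each step of the walk is restricted to neighbors whose type is the next one prescribed by the meta-path. The toy graph, the node names, and the function `metapath_walk` are hypothetical; in metapath2vec the resulting walks would then be passed to a skip-gram model, which is not shown here.

```python
import random


def metapath_walk(neighbors, node_types, start, metapath, walk_length, rng):
    """Random walk whose node types follow a cyclic meta-path.

    metapath is a list of types whose first and last entries coincide,
    e.g. ["user", "concept", "user"].
    """
    assert node_types[start] == metapath[0]
    period = len(metapath) - 1
    walk = [start]
    step = 0
    while len(walk) < walk_length:
        next_type = metapath[(step % period) + 1]
        candidates = [n for n in neighbors[walk[-1]]
                      if node_types[n] == next_type]
        if not candidates:          # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
        step += 1
    return walk


# Hypothetical toy HIN with two users and two knowledge concepts.
neighbors = {
    "u1": ["k1", "k2"],
    "u2": ["k1"],
    "k1": ["u1", "u2"],
    "k2": ["u1"],
}
node_types = {"u1": "user", "u2": "user", "k1": "concept", "k2": "concept"}

rng = random.Random(0)
walk = metapath_walk(neighbors, node_types, "u1",
                     ["user", "concept", "user"], 7, rng)
print(walk)
```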

In addition, the comparison with the variant that ignores heterogeneity demonstrates that ACKRec, by using the meta-path-based method on the HIN, captures the heterogeneous relationships more effectively. The comparison with the variant without the attention mechanism shows that adaptively fusing the representations learned from different meta-paths is better than simple concatenation, since different meta-paths have different importance for the task. Moreover, the comparisons with the content feature-based and context feature-based variants demonstrate that methods using either content features or context features alone lose information needed for the representation of entities; thus, they cannot comprehensively depict the features of users and knowledge concepts in MOOCs. The full ACKRec model, which uses adaptive weights and the meta-path-based approach over the HIN, integrates the rich content features of entities and the structural relations between different types of entities in a more comprehensive and effective manner, and achieves the best performance.
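The difference between simple concatenation and adaptive fusion can be illustrated with a small sketch. It assumes a generic dot-product attention with a hypothetical learned query vector; the scoring function actually used by ACKRec is defined in the model section and may differ. The names `concat_fusion` and `attention_fusion` are illustrative.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def concat_fusion(reps):
    """Concatenate P meta-path representations of size d into one of size P*d."""
    return np.concatenate(reps)


def attention_fusion(reps, query):
    """Weighted sum of meta-path representations.

    Weights are a softmax over dot products with `query`, a hypothetical
    learned parameter standing in for the model's attention parameters.
    """
    stacked = np.stack(reps)               # shape (P, d)
    weights = softmax(stacked @ query)     # shape (P,)
    return weights @ stacked, weights


# Hypothetical representations of one entity under three meta-paths.
reps = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
query = np.array([2.0, 0.0])

fused, weights = attention_fusion(reps, query)
print(weights, fused, concat_fusion(reps).shape)
```

Concatenation treats every meta-path equally and grows the representation size with the number of meta-paths, whereas the weighted sum keeps the dimension fixed and lets the weights differ per entity.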

4.5. Case Study

In this part, we present one case to demonstrate the effectiveness of the proposed method ACKRec. We randomly select a student (student:2481307) and obtain two top-10 recommendation lists, one based on a single meta-path and one based on combined meta-paths. As shown in Figure 7, the proposed model generates different results under different conditions. With more meaningful relationships between the user and knowledge concepts, the recommendation list contains more related knowledge concepts for the corresponding user. For example, from the click history of this student, we know that the student is especially interested in the field of computer networks. The result obtained with the combined meta-paths shows that ACKRec can capture the student’s interest in knowledge concepts, matching the student’s real next click, asynchronous transfer mode, shown in blue. The other recommendations, such as protocol data unit, switched telephone network, switch, and IPv4, are also highly related. In particular, the recommended results differ significantly across meta-paths.

Figure 7. Case study of ACKRec based on different meta-paths. The blue labels denote the real next click; the green labels are knowledge concepts related to the field that user student:2481307 is interested in. The left set is the recommendation list based on the combined meta-paths; the middle set is the learning history of student:2481307; the right set is the recommendation list based on a single meta-path.

5. Related Work

5.1. Graph Neural Network in Heterogeneous Information Network

Graphs play a crucial role in modern machine learning (Hamilton et al., 2017; Gori et al., 2005). Recently, graph neural networks (Zhou et al., 2018; Kipf and Welling, 2016; Wu et al., 2018; Ying et al., 2018; Huang, 2018; Chen et al., 2018; Gama et al., 2019) have become a recurring topic in machine learning and have broad applicability. However, real-world graphs are usually heterogeneous, and there have been a few attempts in the heterogeneous information network setting. Wang et al. (Wang et al., 2019a) proposed DeepHGNN, an attentional heterogeneous graph neural network model that learns from the heterogeneous program behavior graph to guide the reidentification process. Wang et al. (Wang et al., 2019b) presented HAGNN, a Hierarchical Attentional Graph Neural Encoder, and used it for program behavior graph analysis. Additionally, GEM (Liu et al., 2018), a heterogeneous graph neural network approach for detecting malicious accounts at Alipay, has been presented. Unlike these approaches, our proposed model uses attentional graph convolutional networks to learn the representations of users and knowledge concepts in heterogeneous information networks.

5.2. Recommendation System in Heterogeneous Information Network

Some recommendation models are based on heterogeneous information networks. Pham et al. (Pham et al., 2016) proposed Heaters, a graph-based model, to solve the general recommendation problem in heterogeneous networks. Yu et al. (Yu et al., 2014) proposed using meta-path-based latent features to represent the connectivity between users and items along different types of paths. Following this previous work, Shi et al. (Shi et al., 2015; Hu et al., 2018; Shi et al., 2019) proposed using the meta-path concept to model the heterogeneous information in HINs. Different from previous methods, this study focuses on capturing the representations of different types of entities in the heterogeneous information network, and fuses the content features of these entities with their structural features in MOOCs data for the task of knowledge concept recommendation.

6. Conclusion

In this work, we investigate the problem of knowledge concept recommendation in MOOCs, which is often overlooked by MOOC recommendation systems. We propose ACKRec, an end-to-end graph neural network-based approach that naturally incorporates rich heterogeneous context side information into knowledge concept recommendation. To make use of rich context information in a more natural and intuitive way, we model the MOOCs data as a heterogeneous information network. We design an attention-based graph convolutional network that learns the representations of different entities by propagating context information under the guidance of meta-paths in an attentional way. With the help of the proposed attention-based graph convolutional network, the users’ potential interests can be effectively explored and aggregated. We conduct a comprehensive experimental study on real data collected from XuetangX, in which the proposed approach outperforms strong baselines. The promising experimental results illustrate the effectiveness of the proposed method.

7. Acknowledgments

This work is supported by NSF under grants III-1526499, III-1763325, III-1909323, and CNS-1930941, by the Science and Technology Project of the Headquarters of State Grid Co., Ltd. under Grant No. 5700-202055267A-0-0-0, and by NKPs under grant 2018YFC0830804. The authors would also like to thank XuetangX for data collection and support.


References

  • J. Chen, T. Ma, and C. Xiao (2018) FastGCN: fast learning with graph convolutional networks via importance sampling. CoRR abs/1801.10247.
  • T. Chen and Y. Sun (2017) Task-guided and path-augmented heterogeneous network embedding for author identification. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 295–304.
  • Y. Dong, N. V. Chawla, and A. Swami (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144.
  • F. Gama, A. G. Marques, G. Leus, and A. Ribeiro (2019) Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing 67 (4), pp. 1034–1049. ISSN 1053-587X.
  • M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734. ISSN 2161-4393.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Representation learning on graphs: methods and applications. CoRR abs/1709.05584.
  • X. He, Z. He, J. Song, Z. Liu, Y. Jiang, and T. Chua (2018) NAIS: neural attentive item similarity model for recommendation. IEEE Transactions on Knowledge and Data Engineering 30, pp. 2354–2366.
  • X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. S. Chua (2017) Neural collaborative filtering.
  • B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk (2015) Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939.
  • B. Hu, C. Shi, W. X. Zhao, and P. S. Yu (2018) Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1531–1540.
  • J. Huang (2018) Adaptive graph convolutional neural networks.
  • M. Jamali and M. Ester (2010) A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 135–142.
  • K. Järvelin and J. Kekäläinen (2000) IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 41–48.
  • X. Jing and J. Tang (2017) Guess you like: course recommendation in MOOCs. In Proceedings of the International Conference on Web Intelligence, WI ’17, New York, NY, USA, pp. 783–789. ISBN 978-1-4503-4951-2.
  • S. Kabbur, X. Ning, and G. Karypis (2013) FISM: factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, New York, NY, USA, pp. 659–667. ISBN 978-1-4503-2174-7.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma (2017) Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017, pp. 1419–1428.
  • Z. Liu, C. Chen, X. Yang, J. Zhou, X. Li, and L. Song (2018) Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM ’18, New York, NY, USA, pp. 2077–2085. ISBN 978-1-4503-6014-2.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • T. N. Pham, X. Li, G. Cong, and Z. Zhang (2016) A general recommendation model for heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering 28 (12), pp. 3140–3153. ISSN 1041-4347.
  • J. Qiu, J. Tang, T. X. Liu, J. Gong, C. Zhang, Q. Zhang, and Y. Xue (2016) Modeling and predicting learning behavior in MOOCs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, WSDM ’16, New York, NY, USA, pp. 93–102. ISBN 978-1-4503-3716-8.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2012) BPR: Bayesian personalized ranking from implicit feedback. arXiv e-prints, arXiv:1205.2618.
  • S. Rendle (2012) Factorization machines with libFM. ACM TIST 3 (3), pp. 57:1–57:22.
  • C. Shi, B. Hu, W. X. Zhao, and P. S. Yu (2019) Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering 31 (2), pp. 357–370. ISSN 1041-4347.
  • C. Shi, Z. Zhang, P. Luo, P. S. Yu, Y. Yue, and B. Wu (2015) Semantic path based personalized recommendation on weighted heterogeneous information networks. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, New York, NY, USA, pp. 453–462. ISBN 978-1-4503-3794-6.
  • Y. Sun, N. J. Yuan, X. Xie, K. McDonald, and R. Zhang (2017) Collaborative intent prediction with real-time contextual data. ACM Transactions on Information Systems (TOIS) 35 (4), pp. 30.
  • H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, and Q. Liu (2018) SHINE: signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 592–600.
  • S. Wang, Z. Chen, D. Li, Z. Li, L. Tang, J. Ni, J. Rhee, H. Chen, and P. S. Yu (2019a) Attentional heterogeneous graph neural network: application to program reidentification. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 693–701.
  • S. Wang, Z. Chen, X. Yu, D. Li, J. Ni, L. Tang, J. Gui, Z. Li, H. Chen, and P. S. Yu (2019b) Heterogeneous graph matching networks for unknown malware detection. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3762–3770.
  • S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2018) Session-based recommendation with graph neural networks. CoRR abs/1811.00855.
  • R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, New York, NY, USA, pp. 974–983. ISBN 978-1-4503-5552-0.
  • X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han (2014) Personalized entity recommendation: a heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM ’14, New York, NY, USA, pp. 283–292. ISBN 978-1-4503-2351-2.
  • F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma (2016) Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353–362.
  • H. Zhang, M. Sun, X. Wang, Z. Song, J. Tang, and J. Sun (2017) Smart jump: automated navigation suggestion for videos in MOOCs. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3-7, 2017, pp. 331–339.
  • J. Zhang, B. Hao, B. Chen, C. Li, H. Chen, and J. Sun (2019) Hierarchical reinforcement learning for course recommendation in MOOCs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 435–442.
  • J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. CoRR abs/1812.08434.