1 Introduction
With the increasing amount of information available online, there is a rising need to structure how that information should be processed so that knowledge can be learned efficiently and in a reasonable order. As a result, recent work has tried to learn prerequisite relations among concepts, i.e., which concept is needed in order to learn another concept, within a concept graph [22, 12, 2]. Figure 1 shows an illustration of prerequisite chains as a directed graph. In such a graph, each node is a concept, and the direction of each edge indicates a prerequisite relation. Given two concepts A and B, we define (A, B) to mean that A is a prerequisite concept of B. For example, the concept Variational Autoencoders is a prerequisite of the concept Variational Graph Autoencoders. If someone wants to learn about Variational Graph Autoencoders, the prerequisite concept Variational Autoencoders should appear in the prerequisite concept graph in order to create a proper study plan.
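As a minimal illustration of the data structure described above (the concept names and the helper function here are our own toy example, not part of the dataset), a prerequisite graph can be stored as a map from each prerequisite to the concepts that depend on it:

```python
# Toy prerequisite graph: edges point prerequisite -> dependent concept.
prereq_edges = {
    "variational autoencoders": ["variational graph autoencoders"],
    "autoencoders": ["variational autoencoders"],
}

def prerequisites_of(target, edges):
    """Return all direct prerequisites of `target`."""
    return sorted(src for src, dsts in edges.items() if target in dsts)

# A study plan for a concept starts from its prerequisites.
direct = prerequisites_of("variational graph autoencoders", prereq_edges)
```

Walking such edges backwards from a target concept yields the study order a learner should follow.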
Recent work has attempted to extract such prerequisite relations from various types of materials, including Wikipedia articles, university course dependencies, and MOOCs (Massive Open Online Courses) [25, 12, 22]. However, these materials either require additional preprocessing and cleaning steps, or contain too much noisy free text, making prerequisite relation learning and extraction more challenging. Recently, Li et al. [20] presented a collection of university lecture slide files, mainly from NLP courses, with related prerequisite concept annotations. We expand this dataset, as we believe these lecture slides offer a concise yet comprehensive description of advanced topics.
Deep models such as word embeddings [24] and, more recently, contextualized word embeddings [9] have achieved great success on NLP tasks, as they demonstrate a stronger ability to represent the semantics of words than traditional models. However, recent prerequisite learning approaches fail to make use of distributional semantics and advances in deep learning representations [18, 25]. In this paper, we investigate deep node embeddings within a graph structure to better capture the semantics of concepts and resources, in order to learn prerequisite relations accurately.
In addition to learning node representations, there has been growing research in geometric deep learning [5] and graph neural networks [13], which apply the representational power of neural networks to graph-structured data. Notably, Kipf and Welling [17] proposed Graph Convolutional Networks (GCNs) to perform deep learning on graphs, yielding competitive results in semi-supervised learning settings. TextGCN was proposed by [30] to model a corpus as a heterogeneous graph in order to jointly learn word and document embeddings for text classification. We build upon these ideas to construct a resource-concept graph (we use the term resource instead of document for generalization). Additionally, most of the methods mentioned above require a subset of labels for training, a setting which is often infeasible in the real world. Limited research has investigated learning prerequisite relations without using human-annotated relations during training [2]. In practice, it is very challenging to obtain annotated concept-concept relations, as the annotation complexity is O(n^2) for n given concepts. To tackle this issue, we propose a method to learn prerequisite chains without any annotated concept-concept relations, which is more applicable in the real world.

Our contributions are twofold: 1) we expand upon previous annotations to increase coverage for prerequisite chain learning in five categories: AI (artificial intelligence), ML (machine learning), NLP, DL (deep learning), and IR (information retrieval). We also expand a previous corpus of lecture files by an additional 5,000 lecture slides, totaling 1,717 files. More importantly, we add additional concepts, totaling 322 concepts, as well as the corresponding annotations for each concept pair, totaling 103,362 relations. 2) we present a novel graph neural model for learning prerequisite relations in an unsupervised way using deep representations as input. We model all concepts and resources in the corpus as nodes in a single heterogeneous graph and define a propagation rule that considers multiple edge types while eliminating concept-concept relations during training, making unsupervised learning possible. Our model leads to improved performance over a number of baseline models.
Notably, it is the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. Resources, annotations, and code are publicly available online: https://github.com/Yale-LILY/LectureBank/tree/master/LectureBank2.

2 Related Work
2.1 Deep Models for Graph-structured Data
There has been much research on graph-structured data such as social networks and citation networks [28, 1, 8], and many deep models have achieved satisfying results. DeepWalk [26] was a breakthrough model which learns node representations using random walks. Node2vec [14] improved on it with a scalable framework, achieving promising results on multi-label classification and link prediction. In addition, work on graph convolutional networks (GCNs) has targeted deep propagation rules within graphs. A recent work applied GCNs to text classification [30] by constructing a single text graph for a corpus based on word co-occurrence and document-word relations. The experimental results showed that the proposed model outperformed state-of-the-art methods on many benchmark datasets. We are inspired by this work in that we also construct a single graph for a corpus; however, we have different types of nodes and edges.

2.2 Prerequisite Chain Learning
Learning prerequisite relations between concepts has attracted much recent work in the machine learning and NLP fields. Existing research focuses on machine learning methods (i.e., classifiers) to measure the prerequisite relations among concepts [21, 23, 22]. Some research applies feature engineering to represent a concept, inputting these features to a classic classifier to predict the relationship of a given concept pair [22, 21]. The resources used to learn those concept features include university course descriptions and materials as well as online educational data [23, 22]. Recently, Li et al. [20] introduced a dataset containing 1,352 English lecture files collected from university-level lectures as well as 208 manually-labeled prerequisite relation topics, initially introduced in [10]. To avoid feature engineering, they applied graph-based methods including GAE and VGAE [17], which treat each concept as a node, thus building a concept graph. They pretrained a Doc2vec model [19] to infer each concept as a dense vector, and then trained the concept graph in a semi-supervised way. Finally, the model was able to recover unseen edges of the concept graph. Different from their work, we wish to perform prerequisite chain learning in an unsupervised manner: during training, no concept relations are provided to the model.
Domain  #courses  #files  #tokens  #pages  #tokens/page 

NLP  45  953  1,521,505  37,213  40.886 
ML  15  312  722,438  12,556  57.537 
DL  7  259  450,879  7,420  60.765 
AI  5  98  139,778  3,732  37.454 
IR  5  95  205,359  4,107  50.002 
Overall  77  1,717  3,039,959  65,028  46.748 
3 Dataset
3.1 Resources
We manually collected English lecture slides, mainly from recent NLP-related courses at well-known universities. We treated each as an individual slide file in PDF or PowerPoint format. Our new collection has 529 additional files from 17 courses, which we combined with the data provided by [20]. We ended up with a total of 77 courses with 1,717 English lecture slide files, covering five domains. We show the final statistics in Table 1. For our experiments, we converted the files into TXT format, which allowed us to load the free text directly.
3.2 Concepts
We manually expanded the concept list proposed by [20] from 208 to 322 concepts. We included concepts which were not found in their version, such as restricted boltzmann machine and neural parsing. We also revisited their topic list and corrected a small number of topics. For example, we combined certain topics (e.g., BLEU and ROUGE) into a single topic (machine translation evaluation). We asked two NLP PhD students to reevaluate the existing annotations from the old corpus and to provide labels for each added concept pair in the new corpus. A Cohen's kappa score [7] of 0.6283 was achieved between our annotators, which can be considered substantial agreement. We then took the union of the annotations: if at least one judge stated that a given concept pair (A, B) had A as a prerequisite of B, then we define it as a positive relation. We believe that the union of annotations makes more sense for our downstream application, where we want users to be able to mark which concepts they already know, so displaying all potential concepts is essential. We have 1,551 positive relations on the 322 concept nodes.
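The agreement statistic and the union rule above can be sketched as follows (a hand-rolled Cohen's kappa on toy binary labels, not the real annotation data):

```python
# Cohen's kappa for two annotators' binary labels, plus the union rule:
# a pair is positive if at least one annotator marked it positive.
def cohens_kappa(a, b):
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)            # chance both label 1
    p_no = (1 - sum(a) / n) * (1 - sum(b) / n)     # chance both label 0
    pe = p_yes + p_no                              # expected (chance) agreement
    return (po - pe) / (1 - pe)

ann1 = [1, 0, 1, 1, 0, 0, 1, 0]                    # toy labels, annotator 1
ann2 = [1, 0, 0, 1, 0, 1, 1, 0]                    # toy labels, annotator 2
kappa = cohens_kappa(ann1, ann2)
union = [max(x, y) for x, y in zip(ann1, ann2)]    # positive if either says so
```

A kappa around 0.6, as reported above, falls in the range conventionally described as substantial agreement.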
4 Method
4.1 Problem Definition
In our corpus, every concept is a single word or a phrase; every resource is the free text extracted from a lecture file. We then wish to determine, for a given concept pair (c_i, c_j), whether c_i is a prerequisite concept of c_j. We define the concept-resource graph as G = (X, A), where X denotes the node features or representations and A denotes the adjacency matrix. In our case, the adjacency matrix holds the relations between each node pair, i.e., the edges between the nodes. As illustrated in Figure 2, we build a single, large graph consisting of concepts (oval nodes) and resources (rectangular nodes) as nodes, and the corresponding relations as edges. So there are three types of edges in G: edges between two resource nodes, E_rr (blue lines); edges between a concept node and a resource node, E_cr (black solid lines); and edges between two concept nodes, E_cc (black dashed lines). Our goal is to learn the relations between concepts only (E_cc), so prerequisite chain learning can be formulated as a link prediction problem. In our unsupervised setting, we exclude all direct concept relations (E_cc) during training, and we wish to predict these edges indirectly through message passing via the resource nodes.
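The unsupervised setup above amounts to masking one block of the adjacency matrix. A small sketch with toy sizes (the node counts and random edges are illustrative only): concepts occupy the first rows/columns, resources the rest, and the concept-concept block is zeroed before training.

```python
import numpy as np

# Toy heterogeneous adjacency over concept and resource nodes.
n_c, n_r = 4, 3                        # number of concepts, resources
n = n_c + n_r
rng = np.random.default_rng(0)
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.maximum(A, A.T)                 # symmetrize for illustration

# Unsupervised setting: remove the E_cc block so no concept-concept
# edge is ever observed during training.
A_train = A.copy()
A_train[:n_c, :n_c] = 0.0
```

At test time, the model is asked to recover exactly the entries of the block that was masked here.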
4.2 Preliminaries
Graph Convolutional Networks (GCNs) [17] are a semi-supervised learning approach for node classification on graphs. The aim is to learn the node representations in the hidden layers, given the initial node representations X and the adjacency matrix A. The model incorporates local graph neighborhoods to represent the current node. In a simple GCN model, a layer-wise propagation rule can be defined as follows:
H^{(l+1)} = \sigma( \hat{A} H^{(l)} W^{(l)} )    (1)

where l is the current layer number, \hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} is the symmetrically normalized adjacency matrix with self-loops (\tilde{A} = A + I, with \tilde{D} the degree matrix of \tilde{A}), \sigma(\cdot) is a nonlinear activation function, and W^{(l)} is a parameter matrix that can be learned during training. We eliminate the activation for the last layer's output. For the task of node classification, the loss function is the cross-entropy loss. Typically, a two-layer GCN (by plugging Equation 1 in) is defined as:

Z = \mathrm{softmax}( \hat{A} \, \mathrm{ReLU}( \hat{A} X W^{(0)} ) W^{(1)} )    (2)

where \hat{A} is the same normalized adjacency matrix, applied again at the second graph layer.
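The two-layer propagation of Equation 2 can be sketched in a few lines of numpy (random toy inputs and weights, not the paper's trained model; we omit the final softmax since only the shape of the computation matters here):

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_two_layer(X, A, W0, W1):
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0)      # ReLU hidden layer (Eq. 1)
    return A_hat @ H @ W1                  # no activation at the output

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
X = rng.standard_normal((3, 5))                               # node features
Z = gcn_two_layer(X, A, rng.standard_normal((5, 4)), rng.standard_normal((4, 2)))
```

Each layer mixes every node's features with those of its immediate neighbors, so two layers propagate information over two-hop neighborhoods.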
Relational Graph Convolutional Networks (RGCNs) [27] extend the types of graph nodes and edges of the GCN model, allowing operations on large-scale relational data. In this model, an edge between a node pair v_i and v_j is denoted as (v_i, r, v_j), where r \in \mathcal{R} is a relation type, while in a GCN there is only one type. Similarly, to obtain the hidden representation of node v_i, we consider its local neighbors and the node itself; when multiple types of edges exist, different sets of weights are used. So the layer-wise propagation rule is defined as:

h_i^{(l+1)} = \sigma\Big( \sum_{r \in \mathcal{R}} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \Big)    (3)

where \mathcal{R} is the set of relations or edge types in the graph, N_i^r denotes the neighbors of node v_i under relation r, c_{i,r} is a normalization constant (e.g., |N_i^r|), W_r^{(l)} is the weight matrix at layer l for nodes in N_i^r, and W_0^{(l)} is the shared self-connection weight matrix at layer l; |\mathcal{R}| + 1 is thus the number of weight matrices in each layer.
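One RGCN layer (Equation 3) can be sketched with per-relation adjacency matrices (toy random inputs; c_{i,r} is taken as the neighbor count |N_i^r|):

```python
import numpy as np

def rgcn_layer(H, A_by_rel, W_by_rel, W_self):
    """One RGCN layer: relation-specific neighbor aggregation + self-loop."""
    out = H @ W_self                               # self-connection term W_0 h_i
    for A_r, W_r in zip(A_by_rel, W_by_rel):
        deg = A_r.sum(axis=1, keepdims=True)       # |N_i^r| per node
        deg[deg == 0] = 1.0                        # guard isolated nodes
        out += (A_r / deg) @ H @ W_r               # (1/c_{i,r}) sum over N_i^r
    return np.maximum(out, 0)                      # ReLU activation

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 4, 3
H = rng.standard_normal((n, d_in))
A_by_rel = [(rng.random((n, n)) > 0.6).astype(float) for _ in range(2)]  # 2 relations
W_by_rel = [rng.standard_normal((d_in, d_out)) for _ in range(2)]
H_next = rgcn_layer(H, A_by_rel, W_by_rel, rng.standard_normal((d_in, d_out)))
```

In our setting the relations would correspond to the edge types E_rr and E_cr (with E_cc withheld in the unsupervised case).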
Variational Graph Auto-Encoders (VGAE) [16] is a framework for unsupervised learning on graph-structured data based on variational autoencoders [15]. It takes the adjacency matrix A and node features X as input and tries to recover the graph adjacency matrix through the hidden-layer embeddings Z. Specifically, the non-probabilistic graph autoencoder (GAE) variant calculates embeddings via a two-layer GCN encoder, Z = GCN(X, A), which is given by Equation 2.
Then, in the variational graph autoencoder, the goal is to sample the latent parameters Z from a normal distribution:

q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A), \quad q(z_i \mid X, A) = \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2))    (4)

where \mu = GCN_\mu(X, A) is the matrix of mean vectors, and \log \sigma = GCN_\sigma(X, A). The training loss is then given as the KL-divergence between the prior normal distribution p(Z) and the sampled parameters q(Z \mid X, A):

\mathcal{L}_{KL} = \mathrm{KL}[\, q(Z \mid X, A) \,\|\, p(Z) \,]    (5)

In the inference stage, the reconstructed adjacency matrix \hat{A} is the inner product of the latent parameters Z: \hat{A} = \sigma(Z Z^\top).
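The sampling, decoding, and KL steps above can be sketched as follows (mu and log_sigma are random stand-ins for the encoder outputs, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, latent_dim = 4, 2
mu = rng.standard_normal((n_nodes, latent_dim))         # encoder means (Eq. 4)
log_sigma = rng.standard_normal((n_nodes, latent_dim))  # encoder log std-devs

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
eps = rng.standard_normal((n_nodes, latent_dim))
Z = mu + np.exp(log_sigma) * eps

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

A_rec = sigmoid(Z @ Z.T)                                # inner-product decoder

# Closed-form KL between q(z|X,A) and the standard normal prior (Eq. 5)
kl = -0.5 * np.sum(1 + 2 * log_sigma - mu**2 - np.exp(2 * log_sigma))
```

Every entry of A_rec is a probability of an edge, which is what allows the recovered matrix to be thresholded into predicted links at inference time.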
4.3 Proposed Model
To take multiple relations into consideration and make unsupervised learning of concept relations possible, we propose our RVGAE model. Our model builds upon RGCN and VGAE, taking advantage of both: RGCN is a supervised model that handles multiple relations, while VGAE is an unsupervised graph neural network. We thereby make it possible to train directly on a heterogeneous graph in an unsupervised way for link prediction, in order to learn the prerequisite relations between concept pairs.
Our model first applies the RGCN in Equation 3 as the encoder to obtain the latent parameters Z, given the initial node features X and adjacency matrix A: Z = RGCN(X, A). In the variational version, as opposed to standard VGAEs, we parameterize the distribution by the RGCN model: \mu = RGCN_\mu(X, A), and \log \sigma = RGCN_\sigma(X, A).
To predict the link between a concept pair (c_i, c_j), we follow the DistMult [29] method: we take the last-layer output node features Z and define the following score function to recover the adjacency matrix by learning a trainable diagonal weight matrix R:

\hat{A}_{ij} = \sigma( z_i^\top R \, z_j )    (6)

The loss consists of the cross-entropy reconstruction loss of the adjacency matrix (\mathcal{L}_{recon}) and the loss from the latent parameters defined in Equation 5:

\mathcal{L} = \mathcal{L}_{recon} + \mathcal{L}_{KL}    (7)
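The DistMult scoring and the combined loss can be sketched with toy values (Z, R, and the target adjacency are random; the KL term is a placeholder scalar standing in for Equation 5):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_nodes, dim = 4, 3
Z = rng.standard_normal((n_nodes, dim))        # last-layer node features
R = np.diag(rng.standard_normal(dim))          # trainable diagonal relation matrix

# Eq. 6: A_hat_ij = sigma(z_i^T R z_j), computed for all pairs at once
A_scores = sigmoid(Z @ R @ Z.T)

# Eq. 7: cross-entropy reconstruction loss + KL term
A_true = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)
eps = 1e-9
recon = -np.mean(A_true * np.log(A_scores + eps)
                 + (1 - A_true) * np.log(1 - A_scores + eps))
kl = 0.1                                       # placeholder for the Eq. 5 KL term
loss = recon + kl
```

The diagonal R keeps the decoder cheap: scoring all node pairs is a single matrix product rather than a per-pair computation.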
We compare two variations of our RVGAE model. Unsupervised: only the concept-resource edges E_cr and resource-resource edges E_rr are provided during training. This is an unsupervised model because no concept-concept edges are used. Semi-supervised: the model has access to the concept-resource edges E_cr and resource-resource edges E_rr, as well as a percentage of the available concept-concept edges E_cc, described later.
4.4 Node Features
Sparse Embeddings We used TF-IDF (term frequency-inverse document frequency) to obtain sparse embeddings for all nodes. We restricted the global vocabulary to the 322 concept terms only, which means that the dimension of the node features is 322, as we aim to model keywords.
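Restricting TF-IDF to a fixed concept vocabulary can be done with scikit-learn's `vocabulary` parameter; a small sketch with a toy concept list and toy documents (not the actual 322-term list or our corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy concept vocabulary; multi-word concepts require ngram_range to
# cover bigrams so they can match as features.
concepts = ["parsing", "syntax", "word embeddings"]
docs = [
    "syntax and parsing lecture: dependency parsing basics",
    "word embeddings with skip-gram models",
]
vectorizer = TfidfVectorizer(vocabulary=concepts, ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)   # shape: (n_docs, n_concepts)
```

Each resulting row is a sparse node feature vector with exactly one dimension per concept term.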
Dense Embeddings As the concepts in our corpus often consist of phrases such as dynamic programming, we made use of Phrase2vec [3]. Phrase2vec (P2V) is a generalization of skip-gram models [24] which learns n-gram embeddings during training; here we aim to infer the embeddings of the concepts in our corpus. We trained the P2V model using only our corpus, treating each slide file as a short document, i.e., a sequence of tokens. For each resource node, we take an element-wise average of the P2V embeddings of every single token and phrase that the resource covers. Similarly, for each concept node, we take an element-wise average of the embeddings of each individual token and the concept phrase. In addition, we utilize the BERT model [9] as another type of dense embedding, fine-tuning BERT's masked language modeling objective on our corpus.

4.5 Adjacency Matrix
To construct the adjacency matrix A, for each node pair (i, j) we use the cosine similarity of enriched TF-IDF features (i.e., TF-IDF features calculated on an extended vocabulary that includes all tokens appearing in the corpus) as the value A_{ij}. Previous work has applied cosine similarity to vector space models [11, 31, 4], so we believe it is a suitable method in our case. In this way we generate the concept-resource edge values (E_cr) and the resource-resource edge values (E_rr). Note that the concept-concept edge values E_cc are 1 if c_i is a prerequisite of c_j and 0 otherwise; these values are not computed in the unsupervised setting.

5 Evaluation
We compare our proposed model with two groups of baseline models. We report accuracy, F1 score, macro-averaged Mean Average Precision (MAP), and Area Under the ROC Curve (AUC) in Table 2, as done in previous research [6, 25, 20]. We split the positive relations 9:1 (train/test) and randomly select negative relations as negative training samples; we then run over five random seeds and report the average scores, following the same setting as Kipf and Welling [16] and Li et al. [20].
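The evaluation protocol above can be sketched as follows: score held-out positive pairs plus an equal number of sampled negatives, then compute the four reported metrics (the scores here are random toy values, not model outputs; MAP would macro-average the per-concept average precision):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, average_precision_score)

rng = np.random.default_rng(0)
# 10 held-out positive pairs and 10 sampled negative pairs.
y_true = np.array([1] * 10 + [0] * 10)
# Toy link scores: positives shifted upward so the split is separable.
y_score = np.clip(y_true * 0.6 + rng.random(20) * 0.5, 0, 1)
y_pred = (y_score >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)   # one query's AP; macro-average for MAP
```

Accuracy and F1 depend on the 0.5 threshold, while AUC and MAP are threshold-free, which is why all four are reported.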
Method  Acc  F1  MAP  AUC 

Concept embedding + classifier  
P2V (lb1)  0.5927  0.5650  0.5623  0.5929 
P2V (lb2)  0.6369  0.5961  0.6282  0.6370 
BERT (lb1)  0.6540  0.6099  0.6475  0.6540 
BERT (lb2)  0.6558  0.6032  0.6553  0.6558 
BERT (original)  0.7088  0.6963  0.6779  0.7090 
Graph-based methods  
DeepWalk [26]  0.6292  0.5860  0.6270  0.6281 
Node2vec [14]  0.6209  0.6181  0.5757  0.6259 
VGAE [20]  0.6055  0.6030  0.5628  0.6055 
GAE [20]  0.6307  0.6275  0.5797  0.6307 
RGCN [27]  0.5387  0.4784  0.5203  0.5387 
RVGAE (Our proposed model)  
US+BERT (finetuned)  0.5704  0.5704  0.5579  0.5955 
US+BERT (original)  0.5669  0.5668  0.5658  0.6164 
US+TFIDF  0.6495  0.6458  0.7069  0.5507 
US+P2V  0.7694*  0.7638*  0.8919*  0.9126* 
SS+BERT (finetuned)  0.6942  0.6942  0.6613  0.7412 
SS+BERT (original)  0.6839  0.6839  0.6556  0.7372 
SS+TFIDF  0.7252  0.7082  0.8181  0.7625 
SS+P2V  0.8065  0.8010  0.9380  0.9454 
Concept embedding + classifier The first group combines concept embeddings with traditional classifiers, including Support Vector Machines, Logistic Regression, Naïve Bayes, and Random Forest. For a given concept pair, we concatenate the dense embeddings of both concepts as input to train the classifiers, and report the best result. We compare Phrase2vec (P2V) and BERT embeddings. We have two corpora: the old version (lb1) provided by [20], and our version (lb2). For the BERT model, we applied both the original version from Google (original) (https://github.com/google-research/bert) and versions with the language model fine-tuned on our corpora (lb1, lb2) using the bert-as-service toolkit, performing inference on the concepts. The P2V embeddings have 150 dimensions, and the BERT embeddings have 768 dimensions. The underscored values show improvements on the BERT and P2V baselines from using our additional data. This indicates that concept relations can be predicted more accurately when the corpus used to train the embeddings is enriched. In the following experiments, unless specified otherwise, we use lb2 as the training corpus.

Graph-based methods We apply the classic graph-based embedding methods DeepWalk [26] and Node2vec [14], considering the concept nodes only. The positive concept relations in the training set serve as the known sequences, allowing both models to be trained to infer node features. Similarly, in the testing phase, we concatenate the node embeddings of a given concept pair, use the aforementioned classifiers to predict the relation, and report the performance of the best one. We then include the VGAE and GAE methods for prerequisite chain learning following Li et al. [20]. Both methods construct the concept graph in a semi-supervised way. We apply P2V embeddings to replicate their methods; while it is possible to try additional embeddings, this is not our main focus. Finally, we compare with the original RGCN model for link prediction proposed by Schlichtkrull et al. [27], applying the same embeddings as for the VGAE and GAE methods. Other semi-supervised graph methods such as GCNs require node labels and are thus not applicable to our setting. We can see that the GAE method achieves the best results among the graph-based baselines. Compared with the first group, BERT (original) still performs better due to its ability to represent phrases.
RVGAE Our model can be trained in both an unsupervised (US+*) and a semi-supervised (SS+*) way. We also utilize various types of embeddings, including P2V, TF-IDF, BERT (finetuned), and BERT (original). The best-performing model in the unsupervised setting uses P2V embeddings (marked with asterisks), and it outperforms all the baseline methods by a large margin. In addition, our semi-supervised models boost the overall performance further. The SS+P2V model performs the best among all the mentioned methods, with a significant improvement of 9.77% in accuracy and 10.47% in F1 score over the best baseline model, BERT (original). This indicates that the RVGAE model does better on link prediction by bringing extra resource nodes into the graph, so that the concept relations can be improved and enhanced indirectly via the connected resource nodes. We also observe that with BERT embeddings, the performance lags behind the other embedding methods for our approach. One reason might be that the dimensionality of the BERT embeddings is relatively large compared to P2V and may cause overfitting, especially when the edges are sparse; BERT may also be less suitable for representing resources, which are essentially lists of keywords, when fine-tuning the language model. The P2V embeddings outperform TF-IDF for both the unsupervised and semi-supervised models. This shows that, compared with sparse embeddings, dense embeddings better preserve semantic features when integrated within the RVGAE model, thus boosting performance. Moreover, as a variation of RGCN and GAE, our model surpasses both by taking advantage of each, as can be seen by comparing with the RGCN and GAE results reported in the second group.
6 Analysis
Concept  Gold Prerequisite Concepts  Model Output Concept 

dependency parsing  syntax  syntax 
classic parsing methods  classic parsing methods  
linguistics basics  linguistics basics  
parsing  parsing  
nlp introduction  nlp introduction  
chomsky hierarchy  chomsky hierarchy  
linear algebra  linear algebra  
conditional probability 

tree adjoining grammar  classic parsing methods  classic parsing methods 
linguistics basics  linguistics basics  
parsing  parsing  
nlp introduction  nlp introduction  
context free grammar  context free grammar  
probabilistic context free grammars  probabilistic context free grammars  
chomsky hierarchy  chomsky hierarchy  
context sensitive grammar  context sensitive grammar 
We then take the recovered concept relations from our best-performing model, RVGAE (SS+P2V in Table 2), and compare them with the gold annotated relations. Note that here we only look at concept nodes. The average degree of the gold graph's concept nodes is 9.79, while our recovered graph has an average degree of 6.10, meaning our model predicts fewer edges. We also examine the most popular concepts, i.e., those with the highest degrees. We select dependency parsing and tree adjoining grammar as examples. In Table 3, we show a comparison of the prerequisites from the annotations and our model's output. The upper group illustrates results for dependency parsing, where the predicted concepts all appear in the gold results, missing only a single concept. This shows that even though our model predicts fewer relations, the relations it predicts are correct. The lower group shows the comparison for the concept tree adjoining grammar, where our model gives precise prerequisite concepts, recovering all eight concepts from the gold set. When a concept has a sufficient number of prerequisite concepts, our model is able to provide a comprehensive, high-quality concept set. In the real world, especially in a learner's scenario, one wants to learn a new concept with enough prerequisite knowledge, which our model tends to provide.
7 Conclusion and Future Work
In this paper we introduced an expanded dataset for prerequisite chain learning with an additional 5,000 lecture slides, totaling 1,717 files. We also provided prerequisite relation annotations for each concept pair among 322 concepts. Additionally, we proposed an unsupervised learning method which makes use of advances in graph-based deep learning. Our method avoids any feature engineering to learn concept representations. Experimental results demonstrate that our model performs well in an unsupervised setting and benefits further when labeled data is available. In future work, we would like to perform a more comprehensive model comparison and evaluation by bringing in other possible variations of graph-based models to learn a concept graph. Another interesting direction is to apply multi-task learning to the proposed model by adding a node classification task where node labels are available. Part of the future work would also include developing educational applications that help learners find a study path for certain concepts.
References

[1] (2015) Graph-based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29(3), pp. 626–688.
[2] (2018) Mining MOOC Lecture Transcripts to Construct Concept Dependency Graphs. International Educational Data Mining Society.
[3] (2018) Unsupervised Statistical Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3632–3642.
[4] (2016) Automatic labelling of topics with neural embeddings. arXiv preprint arXiv:1612.05340.
[5] (2017) Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Processing Magazine 34(4), pp. 18–42.
[6] (2016) Data-driven Automated Induction of Prerequisite Structure Graphs. International Educational Data Mining Society.
[7] (1960) A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), pp. 37–46.
[8] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
[9] (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[10] (2018) TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 611–620.
[11] (2018) W2VLDA: Almost unsupervised system for aspect based sentiment analysis. Expert Systems with Applications 91, pp. 127–137.
[12] (2016) Modeling Concept Dependencies in a Scientific Corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 866–875.
[13] (2005) A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734.
[14] (2016) Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[15] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[16] (2016) Variational Graph Auto-Encoders. Bayesian Deep Learning Workshop (NIPS 2016).
[17] (2017) Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations (ICLR 2017), Toulon, France.
[18] (2017) Semi-Supervised Techniques for Mining Learning Outcomes and Prerequisites. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–915.
[19] (2014) Distributed representations of sentences and documents. In International Conference on Machine Learning, pp. 1188–1196.
[20] (2019) What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19).
[21] (2018) Investigating active learning for concept prerequisite learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[22] (2017) Recovering Concept Prerequisite Relations from University Course Dependencies. In Thirty-First AAAI Conference on Artificial Intelligence.
[23] (2016) Learning concept graphs from online educational data. Journal of Artificial Intelligence Research 55, pp. 1059–1090.
[24] (2013) Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
[25] (2017) Prerequisite Relation Learning for Concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1447–1456.
[26] (2014) DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), New York, NY, USA, pp. 701–710.
[27] (2018) Modeling Relational Data with Graph Convolutional Networks. In European Semantic Web Conference, pp. 593–607.
[28] (2008) Collective classification in network data. AI Magazine 29(3), pp. 93–93.
[29] (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
[30] (2018) Graph Convolutional Networks for Text Classification. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19).
[31] (2018) Learning transferable features for open-domain question answering. In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.