Inferring Concept Prerequisite Relations from Online Educational Resources

11/30/2018 ∙ by Sudeshna Roy, et al. ∙ National University of Singapore ∙ I-MACX Studios

The Internet has rich and rapidly growing sources of high-quality educational content. Inferring prerequisite relations between educational concepts is required for modern large-scale online educational technology applications such as personalized recommendations and automatic curriculum creation. We present PREREQ, a new supervised learning method for inferring concept prerequisite relations. PREREQ is built from latent representations of concepts obtained from the Pairwise Latent Dirichlet Allocation model and a neural network based on the Siamese network architecture. PREREQ can learn unknown concept prerequisites from course prerequisites and labeled concept prerequisite data. It outperforms state-of-the-art approaches on benchmark datasets and can learn effectively from very little training data. PREREQ can also use unlabeled video playlists, a steadily growing source of training data, to learn concept prerequisites, thus obviating the need for manual annotation of course prerequisites.






Inferring Concept Prerequisite Relations from Online Educational Resources. AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19)



A concept A is generally called a prerequisite of another concept B if knowledge of A is necessary to understand B. Such dependencies are natural in cognitive processes when we learn, organize, and apply knowledge [laurence1999concepts]. Prerequisite relations at a different level – between courses – are commonly found in university curricula. Course-level prerequisites have been manually created by experts over decades and often form a guide to prerequisites between the more granular concepts within the courses. For instance, the course Linear Algebra is usually a prerequisite to the course Machine Learning, and several concepts in a course on Linear Algebra are prerequisites to concepts in a course on Machine Learning, e.g., Eigen Analysis is a prerequisite to Principal Components Analysis.

Figure 1: PREREQ learns unknown prerequisite edges in the concept space, using (1) known concept prerequisite edges and (2) either course prerequisites or video playlists.

While textual information about courses has been increasing steadily on the Internet over the years, recently there has been a tremendous growth in online educational data through Massive Open Online Courses (MOOCs) as well as freely accessible videos and blogs from experts. This, in turn, has spurred the development of new applications for personalized online education such as automatic reading list generation [jardine2014automatically], automatic curriculum planning [liu2016learning], and automated evaluation of curricula [rouly2015we]. Concept prerequisite relations play a fundamental role in all these applications.

The value of concept prerequisite maps has been recognized and studied in educational psychology [novak1990concept] and these relations were manually obtained by domain experts. Such a manual process is not scalable in modern online applications that aim to (a) serve students from varying educational backgrounds and (b) generalize to any domain. Hence there is a need to develop methods that can automatically infer pair-wise concept prerequisite relations.

Inference of prerequisite relations has been studied in other contexts, e.g. from Wikipedia [Talukdar:2012], in databases [yosef2011aida] and from text books [liang2018investigating]. These tools can be leveraged but cannot be directly used to detect prerequisite relations from online resources like MOOCs, due to the complexity and scale of courses and educational concepts involved [acl2017prerequisite]. There has been growing interest in designing algorithms specifically to infer educational concept prerequisites [liu2016learning, eaai17, acl2017prerequisite].

In this paper, we develop PREREQ, a new supervised learning approach to inferring concept prerequisites. Similar to the problem setting assumed in previous studies, we assume that prerequisites between courses are known and different courses share an underlying concept space. In addition, we assume that some concept prerequisites are also available to train a supervised model. Manual annotation of course prerequisites, although available, may be hard to scale. We show that PREREQ can also effectively learn from unlabeled video playlists, available through MOOCs.

Figure 1 shows a schematic of the underlying concept space shared across different courses from different universities and over different video playlists. We use known course prerequisites or the temporal ordering of videos, along with labeled training data of concept prerequisites, to predict unknown concept prerequisites. Our method uses latent representations of concepts obtained through a Pairwise-Link Latent Dirichlet Allocation (LDA) model [nallapati2008joint], a model for citations in document corpora. These representations are then used to train a neural network based on the Siamese architecture [siamese] to obtain a binary classifier that can predict, for a given ordered pair of concepts, whether or not a prerequisite relation exists between them.

To summarize, our contributions are:

  • We develop PREREQ, a method to predict unknown concept prerequisites from (1) labeled concept prerequisites and (2) course prerequisite data or video playlists. PREREQ uses the pairwise-link LDA model to obtain vector representations of concepts and a Siamese network to predict unknown concept prerequisites.

  • Our extensive experiments demonstrate the superiority of PREREQ over state-of-the-art methods for inferring prerequisite relations between educational concepts. We also empirically demonstrate that PREREQ can learn effectively from very little training data.

Problem Statement

Let C be the concept space, the set of all concepts of interest, which is assumed to be fixed and known in advance. A concept may be a single word (e.g., “vector”) or a phrase (e.g., “machine learning”). Let G = (C, E) be a directed acyclic graph, called the concept graph, whose nodes represent concepts and whose edges represent prerequisite dependencies, i.e., E contains the directed edge (a, b) if and only if concept a is a prerequisite of concept b.

We infer the edges of the concept graph from known prerequisite relations between documents, where documents are text sources containing the concepts of interest. Examples of such documents include course web pages with known course prerequisites. We assume as input a set of text documents D and a document graph, a directed acyclic graph G_D = (D, E_D), whose nodes represent documents and whose edges represent document prerequisite dependencies, i.e., E_D contains the directed edge (d_i, d_j) if and only if document d_i is a prerequisite of document d_j. Each document is represented by the concepts it contains, i.e., by the set of its n-grams that belong to the concept space C.

For a given set of concepts C, we want to infer concept prerequisites from the known document graph G_D and the set of text documents D. In the supervised setting, some concept prerequisites, the training set E_train ⊂ E, are known, and the remaining edges E_test = E \ E_train are unknown. The problem can be stated as: given a set of concepts C, documents D, a document graph G_D, and known concept prerequisites E_train, predict the unknown concept prerequisites E_test.
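As a concrete illustration of this setup, the inputs and the prediction target can be sketched in a few lines of Python; all names, documents, and edges below are invented for the example:

```python
# Hypothetical sketch of the PREREQ problem inputs; all data is invented.
concepts = {"vector", "eigen analysis", "principal components analysis"}

# Document graph: directed course prerequisite edges.
doc_edges = {("linear algebra", "machine learning")}

# Each document is represented by the concepts (n-grams) it contains.
documents = {
    "linear algebra": {"vector", "eigen analysis"},
    "machine learning": {"vector", "principal components analysis"},
}

# Known concept prerequisites (training set); the rest are to be predicted.
train_edges = {("eigen analysis", "principal components analysis")}

def candidate_pairs(concepts, known):
    """Ordered concept pairs whose prerequisite status is still unknown."""
    return {(a, b) for a in concepts for b in concepts
            if a != b and (a, b) not in known}

unknown = candidate_pairs(concepts, train_edges)
```

The task is then to decide, for each ordered pair in `unknown`, whether a prerequisite edge exists.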

Our Approach

Figure 2: PREREQ Algorithm: latent representations of concepts are obtained using the Pairwise-link LDA model; known concept prerequisite relations are used to train a Siamese network to identify prerequisites.

Latent topic models of text with citation links between documents have been studied extensively, and we study the applicability of one such well-known model, the pairwise-link LDA model [nallapati2008joint], to the problem of concept prerequisite inference. Our experiments reveal that the latent representations obtained from this model do not, by themselves, have sufficient discriminatory signal: in particular, they are a good measure of concept relatedness but not of prerequisite directionality. However, learning unsupervised latent representations through a generative probabilistic model helps disentangle causal factors by discovering underlying structure such as explanatory factors, natural clustering, sparsity, and simplicity [bengio2013representation]. So, we use these latent topic representations of concepts to train a neural network, based on the Siamese architecture [siamese], that can identify prerequisite relations.

A schematic view of our method, called PREREQ, is shown in Figure 2. The input documents and the document graph are used to learn the pairwise-link LDA model. Latent representations of the concepts are obtained from this model and used along with known prerequisites to train a Siamese Network. The following sections describe the details of PREREQ.

Concept Representations from Pairwise-link LDA

Figure 3: Graphical representation of the Pairwise Link-LDA model. The topicality of a document is explicitly made dependent on its prerequisite documents through the observed variable indicating the prerequisite relation between document pairs.

The Pairwise-Link-LDA model [nallapati2008joint] combines the ideas of Latent Dirichlet Allocation (LDA) [blei2003latent] and Mixed Membership Stochastic Blockmodels [airoldi2008mixed] to jointly model text and links between documents in the topic modeling framework. The document graph G_D and the set of text documents D are the inputs to this model. A mixed membership model is a natural choice for modeling the documents, each of which includes many key concepts from different underlying topics. Figure 3 shows the graphical representation of the generative model. Explicitly modeling the directional links (i.e., prerequisite edges) between ordered pairs of documents captures the topicality of documents and the word distribution over topics better, in the sense of capturing the prerequisite relationships between the words themselves.

Generative Process and Inference.

Figure 3 shows that the generation process for each document is the same as in LDA. Each text unit, in our case an n-gram, is generated from a topic z sampled from the document-topic distribution θ, and the topic-word distribution β describes the word distribution of each topic. For each ordered pair of documents (d, d'), an observed Bernoulli random variable denotes the presence or absence of a prerequisite link from d to d' (i.e., an edge in E_D). For each document d, z_{d,n} is the index of the topic that generates the n-th unit in document d. For each pair of documents (d, d'), the latent topic sampled from θ_d for the prerequisite interaction is z and, similarly, the latent topic sampled from θ_{d'} is z'. These topics are sampled from the same document-topic distributions that generate the documents themselves, thus modeling the dependence of the prerequisites on the underlying topics.

It is important to note that the asymmetric prerequisite relation between a pair of documents is modeled by a Bernoulli random variable whose parameter depends on the underlying topics of the document pair, and this parameter enables asymmetric directionality in the prerequisite link. For example, let d be a prerequisite of d', and let z and z' be the latent topics sampled from θ_d and θ_{d'}, respectively, for this interaction. Then the parameter of the Bernoulli random variable is η_{z,z'}, which is in general different from η_{z',z}, thus modeling the directionality of the relationship.
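A toy sketch of this directionality (illustrative numbers only, not the inference code of [nallapati2008joint]): the Bernoulli parameter for a directed link is read off the ordered topic pair, so swapping the direction generally changes the link probability:

```python
# Illustrative only: eta[z][z2] is the Bernoulli parameter for a link from a
# document whose sampled topic is z to one whose sampled topic is z2.
eta = [
    [0.1, 0.8],  # from topic 0 to topics 0, 1
    [0.2, 0.1],  # from topic 1 to topics 0, 1
]

def link_prob(z_src, z_dst, eta):
    """P(prerequisite link present | ordered topic pair (z_src, z_dst))."""
    return eta[z_src][z_dst]

forward = link_prob(0, 1, eta)   # topic-0 material precedes topic-1 material
backward = link_prob(1, 0, eta)  # the reverse direction has its own parameter
```

Because eta is not constrained to be symmetric, forward and backward link probabilities differ, which is exactly what encodes prerequisite direction.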

We refer the reader to [nallapati2008joint] for more details of the model and the mean-field variational approximation used to infer the model parameters. Using this approach, we train the model on our input document corpus and the corresponding prerequisite links, with a fixed value of the Dirichlet hyperparameter α. We learn β, the K × V word distribution over topics, and η, the K × K matrix of asymmetric relationships between pairs of topics, where V is the size of our vocabulary of n-grams and K is the chosen number of topics.

The inferred matrix η encodes the asymmetric pairwise relations between the underlying topics of the document corpus. A natural approach to learning prerequisite relations is to use the inferred topic distribution of each concept (from β, after suitable normalization) together with the topic prerequisite relation η to score concept prerequisiteness. However, our experiments show that this approach does not work well.

Predicting Relations using Siamese Network






Figure 4: Siamese architecture. Each branch consists of two fully connected (FC) layers with a ReLU nonlinearity between them, with tied weights. We train this network on the concept vectors learned from the pairwise link-LDA model (with K topics and vocabulary size V), using the cross-entropy loss over positive and negative concept pairs; y is the label of the corresponding pair.

A Siamese network generally comprises two identical sub-networks joined by one cost module [siamese]. The architecture is shown in Figure 4. Each input to our Siamese network is a pair of concept vectors (u, v) and a binary label y. The weights of the sub-networks are tied. The pair (u, v) is passed through the two sub-networks, each consisting of two fully connected (FC) neural network layers with a rectified linear unit (ReLU) between them, yielding two corresponding outputs. The loss function is optimized with respect to the shared parameters of the sub-networks by stochastic gradient descent using the Adam optimizer.

PREREQ: Concept Prerequisite Prediction

We use the exponentiated and normalized columns of β as vector representations of the concepts. Hence, each concept is represented as a K-dimensional vector, where K is the number of latent topics. Labeled pairs of such vectors from our training set are used as input to train the Siamese network. The label is set to 1 when the first concept is a prerequisite of the second, and 0 otherwise.
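A minimal sketch of how such concept vectors could be extracted from β; the exact normalization is our assumption (here each exponentiated column is normalized to sum to 1):

```python
import numpy as np

# Assumed: log_beta is the K x V topic-word matrix in log space; a concept's
# K-dimensional vector is its exponentiated column, normalized to sum to 1
# (the normalization choice is our assumption, not stated in the paper).
def concept_vectors(log_beta):
    b = np.exp(log_beta)        # back to probability space
    return b / b.sum(axis=0)    # normalize each column (one column per concept)

K, V = 3, 4
rng = np.random.default_rng(0)
log_beta = rng.normal(size=(K, V))
vecs = concept_vectors(log_beta)  # vecs[:, j] is the vector for concept j
```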

These vectors are passed through the sub-networks. We use the weighted element-wise difference between the twin feature vectors produced by the two sub-networks; W and b denote the weights and biases connecting this difference to the loss layer. Using the cross-entropy loss, we obtain the probability p of the first input vector u being a prerequisite of the second, v:

p = softmax(W (f(u) − f(v)) + b),     (1)

where f(·) denotes the shared sub-network, p_i is the i-th element of the 2-dimensional vector p, and p_1 + p_2 = 1, as we are solving a binary classification problem. The trained Siamese network can then be used for predicting prerequisite relations.
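The branch-and-difference computation can be sketched as a numpy forward pass (illustrative shapes and random initialization, not the authors' trained network):

```python
import numpy as np

# Minimal sketch of the Siamese branch: two FC layers with a ReLU between
# them, tied weights across the two branches, and a final linear layer that
# maps the element-wise difference of the branch outputs to 2 class logits.
# All dimensions and weights here are illustrative.
rng = np.random.default_rng(0)
K, H = 100, 64                                  # topic dim, hidden width
W1, b1 = rng.normal(0, 0.1, (H, K)), np.zeros(H)
W2, b2 = rng.normal(0, 0.1, (H, H)), np.zeros(H)
Wo, bo = rng.normal(0, 0.1, (2, H)), np.zeros(2)

def branch(x):
    """Shared sub-network f: FC -> ReLU -> FC."""
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def predict(u, v):
    """Return softmax probabilities over {not-prerequisite, prerequisite}."""
    d = branch(u) - branch(v)                   # tied weights, difference
    logits = Wo @ d + bo
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()

p = predict(rng.random(K), rng.random(K))
```

Training would minimize the cross-entropy between p and the binary label, updating W1, W2, Wo (shared across branches) with Adam.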

Related Work

Inferring concept prerequisites from course dependencies or from video-based course data is a relatively new area of study. To our knowledge, there have been three previous methods specifically designed to infer educational concept prerequisites, viz. CGL [liu2016learning], CPR-Recover [eaai17], and MOOC-RF [acl2017prerequisite]. CGL is a supervised learning approach that maps courses from different universities onto a universal space of concepts and predicts prerequisites between both courses and concepts [liu2016learning]; it represents courses in a vector space and uses a ranking-based and a classification-based approach. CPR-Recover solves the same problem by formulating a quadratic optimization problem [eaai17] and shows better performance than CGL, but the number of constraints in the optimization problem is proportional to the number of course prerequisite edges, which does not scale well with the size of the training data. Pan et al. recently proposed MOOC-RF for concept prerequisite recovery from Coursera data [acl2017prerequisite]; they define various features and train a classifier that can identify prerequisite relations among concepts from video transcripts, but do not use course/video prerequisite pairs to infer concept prerequisites. Both CGL and MOOC-RF use semantics- and context-based features. Our method, instead of using hand-tuned features, uses a pairwise generative model to automatically learn features from the latent representations in order to infer concept prerequisite edges.


Experiments

We first compare the performance of PREREQ (source code is publicly available) with that of other state-of-the-art algorithms for inferring prerequisite relations on benchmark datasets. We consider both course prerequisites and video playlists as input, with corresponding baseline methods. Additionally, we demonstrate how PREREQ can learn effectively even when little training data is available.


Datasets

We use a published benchmark dataset, the University Course Dataset, and in addition create a new dataset as described below. Dataset statistics are given in Table 1.

Dataset                     #Documents   #Prerequisite edges   #Concept pairs   #Concepts
University Course Dataset   654          861                   1008             365
NPTEL MOOC Dataset          382          1445                  1008             345
Table 1: Dataset statistics.

University Course Dataset. This dataset, from [eaai17], has 654 courses from various universities in the USA and 861 course prerequisite edges. Manual annotations of 1008 pairs of concepts with prerequisite relations are provided. There are 406 unique concepts (words or phrases), among which the 1008 prerequisite relationships are annotated.
Data Preparation. We create bag-of-words (BoW) representations from the unigrams, bigrams, and trigrams of each course text, removing standard English stopwords, to represent each course as a BoW vector. The BoW vectors of the 654 courses and the 861 course prerequisite edges are used by the pairwise link-LDA model to infer concept vectors. We lemmatize the given concepts to match them with the concepts in the BoW vocabulary, obtaining 365 of the 406 concepts. The concept vectors inferred by pairwise link-LDA and the 1008 concept prerequisite pairs are used by the Siamese network, over 5 train-test splits as described below.

MOOC Dataset. This dataset is based on video playlists from a MOOC corpus. We download the subtitles of videos from playlists of computer science departments on NPTEL. We use 382 videos from 38 different playlists. The same 1008 concept prerequisite pairs from the University Course Dataset are used as annotated concept pairs, since both datasets are based on computer science courses.
Data Preparation. We use the video subtitle text (i.e., speech transcripts) to create the BoW vectors of the videos; the words and phrases present in the concept vocabulary are used as the vocabulary for creating the BoW vectors. Using pre-processing similar to that for the previous dataset, we find 345 concepts from the BoW vocabulary. As no video-video (or course) prerequisite edges are available in this case, we use temporal order as a proxy for course prerequisite relationships: a video lecture in a playlist is taken to be a prerequisite of every video that appears after it in the same playlist, which may add some noise in the form of erroneous edges. This gives a total of 1455 prerequisite edges between video pairs.
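The temporal-order proxy can be sketched as follows (playlist and video names are invented):

```python
# Within a playlist, each video is taken as a prerequisite of every video
# that follows it. All names below are illustrative.
def playlist_edges(playlists):
    edges = set()
    for videos in playlists.values():
        for i, src in enumerate(videos):
            for dst in videos[i + 1:]:
                edges.add((src, dst))   # src appears earlier, so src -> dst
    return edges

playlists = {"algorithms": ["v1", "v2", "v3"], "ml": ["v4", "v5"]}
edges = playlist_edges(playlists)
```

Note the quadratic growth within each playlist: a playlist of m videos contributes m(m-1)/2 edges, some of which may be spurious.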

Performance Evaluation

To evaluate the performance of PREREQ on the datasets, we split the concept prerequisite edges into train and test sets: 60% of the given concept pairs are used for training and the remaining 40% for testing. Training a binary classifier requires both positive and negative instances, but the University Course Dataset has only positive samples: the manually annotated concept prerequisite pairs. We generate negative samples by sampling random unrelated pairs of phrases from the vocabulary, in addition to the reversed pairs of the original positive samples, to enable our model to learn prerequisite directionality. We oversample the negative instances to 1.5 times the number of positive examples in the training set to address the class imbalance. All presented results are averaged over 5 train-test splits.
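A sketch of this negative-sampling scheme (the vocabulary and pairs are invented; the 1.5x ratio follows the text):

```python
import random

# Illustrative negative sampling: reversed positive pairs (to teach
# directionality) plus random unrelated pairs, up to 1.5x the positives.
def make_negatives(positives, vocab, ratio=1.5, seed=0):
    rng = random.Random(seed)
    pos = set(positives)
    negs = {(b, a) for (a, b) in pos}          # reversed pairs
    target = int(ratio * len(pos))
    while len(negs) < target:
        a, b = rng.sample(vocab, 2)            # random ordered pair, a != b
        if (a, b) not in pos:
            negs.add((a, b))
    return negs

vocab = ["set", "graph", "tree", "heap", "hashing", "recursion"]
positives = [("set", "graph"), ("graph", "tree"), ("tree", "heap"),
             ("heap", "hashing")]
negatives = make_negatives(positives, vocab)
```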

PREREQ Parameter Settings.

For all our experiments, we choose the number of topics K = 100 and a fixed Dirichlet parameter α = 0.01, to encourage sparse topic distributions. The Siamese network is trained with a learning rate of 0.0001 and a batch size of 128 over 3500 iterations.

Evaluation Metric.

To compare the performance of PREREQ, we use the same evaluation metric as our main baseline CPR-Recover [eaai17], i.e., Precision@K = (1/K) Σ_{i=1}^{K} rel(i), where rel(i) is a binary indicator of the presence of the i-th ranked concept pair in the ground truth. We sort all concept pairs by their predicted probability (Eq. 1) and take the top K = 50 or K = 100 to calculate the precision. The x-axis in the performance graphs denotes the number of course prerequisite edges used to predict the concept prerequisite pairs. In addition, we also use precision, recall, and F-score to compare PREREQ with all the baselines.
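Precision@K as defined above amounts to the following (the scores and ground truth are invented for the example):

```python
# Precision@K: the fraction of the top-K ranked concept pairs that appear
# in the ground truth. Scores and pairs below are illustrative.
def precision_at_k(scored_pairs, ground_truth, k):
    ranked = sorted(scored_pairs, key=lambda x: x[1], reverse=True)[:k]
    hits = sum(1 for pair, _ in ranked if pair in ground_truth)
    return hits / k

scored = [(("a", "b"), 0.9), (("b", "c"), 0.8),
          (("a", "c"), 0.7), (("c", "a"), 0.4)]
truth = {("a", "b"), ("a", "c")}
p_at_2 = precision_at_k(scored, truth, 2)
```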

Baseline Methods.

CPR-Recover [eaai17] is, to our knowledge, the best-performing prior method (though unsupervised) for inferring concept prerequisites on the University Course Dataset, where it has been shown to outperform earlier methods including CGL [liu2016learning]. MOOC-RF [acl2017prerequisite] is a supervised method designed for online video-based courses. Note that it addresses a different problem setting and does not use course/video prerequisite pairs to infer concept prerequisites; hence, Precision@K over different sets of course prerequisite edges is not a computable measure for this method, and we instead compare with MOOC-RF using precision, recall, and F-score.

In addition to these baselines, we use a simple count-based method, Freq, that scores a concept pair by the number of times the pair ‘co-occurs’ in course prerequisite pairs, as described in [eaai17]. Also, based on the parameters β and η inferred from the pairwise link-LDA model, we predict the directed relationship between a pair of concepts a and b by computing the score u_a^T η u_b, where u_a and u_b are the topic-distribution vectors of the two concepts. Each column of η is exponentiated and normalized by dividing all elements of the column by the maximum element. For precision, recall, and F-score computation we use a threshold of 0.5 on the score to distinguish the classes. We call this method Pairwise LDA.
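The Pairwise LDA score can be sketched as follows (η and the topic vectors are invented; the bilinear form u_a^T η u_b is our reading of the score described above):

```python
import numpy as np

# Illustrative Pairwise LDA baseline score: directed relatedness of concepts
# a and b via the bilinear form u_a^T eta u_b. All values are invented.
def pairwise_lda_score(u_a, u_b, eta):
    return float(u_a @ eta @ u_b)

eta = np.array([[0.1, 0.9],
                [0.3, 0.1]])
u_a = np.array([1.0, 0.0])    # concept a concentrated on topic 0
u_b = np.array([0.0, 1.0])    # concept b concentrated on topic 1

forward = pairwise_lda_score(u_a, u_b, eta)   # score for a -> b
reverse = pairwise_lda_score(u_b, u_a, eta)   # score for b -> a
```

Because η is asymmetric, the forward and reverse scores differ, giving a (weak) directional signal even without the Siamese network.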

Performance on Benchmark Datasets

From the University Course Dataset, 100, 200, …, 800 course prerequisite edges are randomly sampled, and Precision@K values are computed and averaged over multiple iterations. Figure 5 shows the comparative results: PREREQ performs significantly better than CPR-Recover, consistently across different numbers of course prerequisite edges. Figure 6 shows the Precision@K scores for K = 50 and K = 100 on the MOOC dataset obtained from the NPTEL video playlists. The results demonstrate that on video playlists, where course/video prerequisite information is not available, PREREQ is able to infer concept prerequisites accurately.

Figure 5: PREREQ shows significant improvement in Precision@50 and Precision@100 on the University Course Dataset. The performance is consistently better even when fewer course prerequisites are used.
Figure 6: Results on the MOOC Dataset. PREREQ accurately retrieves the concept prerequisite edges with high probability, as measured by Precision@50 and Precision@100, even on video playlist data where course prerequisite links are unavailable.
(a) Performance of PREREQ (b) Representation comparison
Figure 7: Results on University Course dataset, (a) Effect of Training Data Size (b) Effect of Concept Representations.
                 University Course Dataset                      MOOC Dataset
Method       PREREQ   Pairwise LDA   CPR-Recover   MOOC-RF   PREREQ   Pairwise LDA   CPR-Recover   MOOC-RF
Precision    46.76    98.27          16.66         43.70     55.60    48.43          17.18         59.74
Recall       91.64    16.42          46.51         53.43     75.74    10.47          52.97         56.48
F-score      59.68    28.14          24.54         50.95     60.73    17.22          25.94         58.07
Table 2: Precision, recall, and F-score of PREREQ and the baselines on the benchmark University Course Dataset and the MOOC Dataset.

Table 2 shows the precision, recall, and F-score on both the University Course Dataset and the MOOC Dataset. The F-score of PREREQ is higher than that of CPR-Recover and MOOC-RF on both datasets.

Effect of Training Data Size on Performance.

Labeled concept prerequisites may be hard to obtain and may not be sufficient to train a supervised model. However, the simple Siamese architecture in PREREQ uses very few parameters and can be trained even with little training data, as seen in Figure 7(a), which compares the performance of PREREQ for different amounts of training data. We find that performance drops only marginally even when only 40% of the available labeled data is used for training.

Effect of Concept Representations.

Our experiments (not shown) indicate that the topics inferred by pairwise-link LDA can discriminate between related and unrelated documents (with respect to the prerequisite relation), but the topics do not carry sufficient signal to determine the directionality of a prerequisite edge. Nevertheless, the inferred topics are good concept representations. We compare the performance of three different concept representations: (1) topic distributions obtained from pairwise-link LDA (as used in PREREQ), (2) topic distributions obtained directly from LDA trained on the same document corpus, and (3) word2vec representations [w2v] trained on Wikipedia. Figure 7(b) shows that pairwise-link LDA based concept representations are superior to those based on LDA and word2vec.


Conclusion

We develop PREREQ, a supervised learning method that learns concept prerequisites from course prerequisite data and from unlabeled video playlists (increasingly available through MOOCs), which obviates the need for manual creation of labeled course prerequisite datasets. PREREQ obtains latent representations of concepts through the pairwise-link LDA model, which are then used to train a Siamese network that can identify prerequisite relations accurately. PREREQ outperforms state-of-the-art methods for inferring prerequisite relations between educational concepts on benchmark datasets. We also empirically show that PREREQ can learn effectively from very little training data and from unlabeled video playlists. PREREQ can thus exploit the large and growing amount of online educational material, in the form of text (course web pages) and video (MOOCs), to solve a fundamental problem underlying several online educational technology applications.