Inferring Concept Prerequisite Relations from Online Educational Resources. AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19)
The Internet has rich and rapidly increasing sources of high quality educational content. Inferring prerequisite relations between educational concepts is required for modern large-scale online educational technology applications such as personalized recommendations and automatic curriculum creation. We present PREREQ, a new supervised learning method for inferring concept prerequisite relations. PREREQ is designed using latent representations of concepts obtained from the Pairwise Latent Dirichlet Allocation model, and a neural network based on the Siamese network architecture. PREREQ can learn unknown concept prerequisites from course prerequisites and labeled concept prerequisite data. It outperforms state-of-the-art approaches on benchmark datasets and can effectively learn from very little training data. PREREQ can also use unlabeled video playlists, a steadily growing source of training data, to learn concept prerequisites, thus obviating the need for manual annotation of course prerequisites.
A concept a is generally called a prerequisite to another concept b if knowledge of a is necessary to understand b. Such dependencies are natural in cognitive processes when we learn, organize, and apply knowledge [laurence1999concepts]. Prerequisite relations at a different level, between courses, are commonly found in university curricula. Course-level prerequisites have been manually created by experts over decades and often serve as a guide to prerequisites between the more granular concepts within the courses. For instance, the course Linear Algebra is usually a prerequisite to the course Machine Learning, and several concepts in a course on Linear Algebra are prerequisites to concepts in a course on Machine Learning, e.g. Eigen Analysis is a prerequisite to Principal Components Analysis.
While textual information about courses has been increasing steadily on the Internet over the years, recently there has been a tremendous growth in online educational data through Massive Open Online Courses (MOOCs) as well as freely accessible videos and blogs from experts. This, in turn, has spurred the development of new applications for personalized online education such as automatic reading list generation [jardine2014automatically], automatic curriculum planning [liu2016learning], and automated evaluation of curricula [rouly2015we]. Concept prerequisite relations play a fundamental role in all these applications.
The value of concept prerequisite maps has been recognized and studied in educational psychology [novak1990concept] and these relations were manually obtained by domain experts. Such a manual process is not scalable in modern online applications that aim to (a) serve students from varying educational backgrounds and (b) generalize to any domain. Hence there is a need to develop methods that can automatically infer pair-wise concept prerequisite relations.
Inference of prerequisite relations has been studied in other contexts, e.g. from Wikipedia [Talukdar:2012], in databases [yosef2011aida] and from text books [liang2018investigating]. These tools can be leveraged but cannot be directly used to detect prerequisite relations from online resources like MOOCs, due to the complexity and scale of courses and educational concepts involved [acl2017prerequisite]. There has been growing interest in designing algorithms specifically to infer educational concept prerequisites [liu2016learning, eaai17, acl2017prerequisite].
In this paper, we develop PREREQ, a new supervised learning approach to inferring concept prerequisites. Similar to the problem setting assumed in previous studies, we assume that prerequisites between courses are known and different courses share an underlying concept space. In addition, we assume that some concept prerequisites are also available to train a supervised model. Manual annotation of course prerequisites, although available, may be hard to scale. We show that PREREQ can also effectively learn from unlabeled video playlists, available through MOOCs.
Figure 1 shows a schematic of the underlying concept space shared across different courses from different universities and over different video playlists. We use known course prerequisites or the temporal ordering of videos, along with labeled training data of concept prerequisites, to predict unknown concept prerequisites. Our method uses latent representations of concepts obtained through a Pairwise-Link Latent Dirichlet Allocation (LDA) model [nallapati2008joint], a model for citations in document corpora. These representations are then used to train a neural network based on the Siamese architecture [siamese].
To summarize, our contributions are:
We develop PREREQ, a method to predict unknown concept prerequisites from (1) labeled concept prerequisites and (2) course prerequisite data or video playlists. PREREQ uses the pairwise-link LDA model to obtain vector representations of concepts and a Siamese network to predict unknown concept prerequisites.
Our extensive experiments demonstrate the superiority of PREREQ over state-of-the-art methods for inferring prerequisite relations between educational concepts. We also empirically demonstrate that PREREQ can learn effectively even from very little training data.
Let C be the concept space, the set of all concepts of interest, which is assumed to be fixed and known in advance. A concept may be a single word (e.g. "vector") or a phrase (e.g. "machine learning"). Let G = (C, E) be a directed acyclic graph, called the concept graph, whose nodes represent concepts and whose edges represent prerequisite dependencies, i.e., E contains the directed edge (a, b) if and only if concept a is a prerequisite of concept b.
We infer the edges of the concept graph from known prerequisite relations between documents, where documents are text sources containing the concepts of interest. Examples of such documents include course web pages with known course prerequisites. We assume as input a set of text documents D and a document graph, a directed acyclic graph G_D = (D, E_D), whose nodes represent documents and whose edges represent document prerequisite dependencies, i.e., E_D contains the directed edge (p, q) if and only if document p is a prerequisite of document q. Each document d in D is represented by the concepts contained in it, i.e. by the elements of C that appear among the n-grams of d, for n = 1, 2, 3.
For a given set of concepts C, we want to infer concept prerequisites from the known document graph G_D and the set of text documents D. In the supervised setting, some concept prerequisites, forming the training set T, are known, and the remaining edges U = E \ T are unknown. The problem can be stated as: given the set of concepts C, documents D, document graph G_D, and known concept prerequisites T, predict the unknown concept prerequisites U.
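As a concrete, hypothetical illustration of this setup, the inputs can be represented with plain Python containers; all names and values below are our own examples, not taken from the datasets:

```python
# Concept space C: words or phrases of interest.
concepts = {"vector", "eigen analysis", "principal components analysis"}

# Document graph edges E_D: (prerequisite course, dependent course).
doc_edges = {("linear_algebra", "machine_learning")}

# Each document is represented by the concepts (n-grams) it contains.
documents = {
    "linear_algebra": {"vector", "eigen analysis"},
    "machine_learning": {"vector", "principal components analysis"},
}

# Known concept prerequisites (training set T); the task is to
# predict the remaining, unknown concept prerequisite edges.
train_edges = {("eigen analysis", "principal components analysis")}
```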
Latent topic models for text with citation links between documents have been studied extensively, and we study the applicability of one such well-known model, the pairwise-link LDA model [nallapati2008joint], to the problem of concept prerequisite inference. Our experiments reveal that the latent representations obtained from this model do not, by themselves, carry sufficient discriminatory signal: they are a good measure of concept relatedness but not of prerequisite directionality. However, learning unsupervised latent representations through a generative probabilistic model helps disentangle causal factors by discovering underlying structure such as explanatory factors, natural clustering, sparsity and simplicity [bengio2013representation]. So, we use these latent topic representations of concepts to train a neural network, based on the Siamese architecture [siamese], that can identify prerequisite relations.
A schematic view of our method, called PREREQ, is shown in Figure 2. The input documents and the document graph are used to learn the pairwise-link LDA model. Latent representations of the concepts are obtained from this model and used along with known prerequisites to train a Siamese Network. The following sections describe the details of PREREQ.
The Pairwise-Link-LDA model [nallapati2008joint] combines the ideas of Latent Dirichlet Allocation (LDA) [blei2003latent] and Mixed Membership Stochastic Block Models [airoldi2008mixed] to jointly model text and links between documents in the topic modeling framework. The document graph G_D and the set of text documents D are the inputs to this model. A mixed membership model is a natural choice for modeling the documents, each of which includes many key concepts from different underlying topics. Figure 3 shows the graphical representation of the generative model. Explicitly modeling the directional links (i.e., prerequisite edges) between ordered pairs of documents (p, q) better captures the topicality of documents and the word distribution over topics, in terms of capturing the prerequisite relationships between the words themselves.
Figure 3 shows that the generation process for each document is the same as in LDA. Each text unit w, in our case an n-gram, is generated from a topic z sampled from the document-topic distribution θ; the topic-word distributions β give, for each topic, a distribution over words. For each pair of documents (p, q), an observed Bernoulli random variable denotes the presence or absence of a prerequisite link from p to q (i.e. an edge in E_D). For each document d, z_{d,i} is the index of the topic that generates the i-th unit in document d. For each pair of documents (p, q), the latent topic sampled from θ_p for the prerequisite link is z_p and, similarly, the latent topic sampled from θ_q is z_q. These topics are sampled from the same document-topic distributions that are used to generate the documents, thus modeling the dependence of the prerequisites on the underlying topics.
It is important to note that the asymmetric prerequisite relation between pairs of documents is modeled by a Bernoulli random variable whose parameter depends on the underlying topics of the document pair. This parameterization enables asymmetric directionality in the prerequisite link. For example, let p be a prerequisite of q, and let z_1 and z_2 be the latent topics sampled from θ_p and θ_q respectively for this interaction. Then the parameter used to generate the Bernoulli random variable is η_{z_1, z_2}, which is in general different from η_{z_2, z_1}, thus modeling the directionality of the relationship.
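The link-generation step described above can be sketched as follows; the variable names, the number of topics, and the uniform initialization of η are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of latent topics (illustrative)

# eta[i, j]: Bernoulli parameter for a link whose source-side topic is i
# and target-side topic is j. In general eta[i, j] != eta[j, i], which is
# what gives the link its direction.
eta = rng.uniform(size=(K, K))

def sample_link(theta_p, theta_q, rng):
    """Sample the link indicator for an ordered document pair (p, q),
    given their document-topic distributions theta_p and theta_q."""
    z_p = rng.choice(K, p=theta_p)   # latent topic for p's side of the link
    z_q = rng.choice(K, p=theta_q)   # latent topic for q's side
    return bool(rng.random() < eta[z_p, z_q])  # Bernoulli(eta[z_p, z_q]) draw
```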
We refer the reader to [nallapati2008joint] for more details of the model and a mean-field variational approximation used to infer the model parameters. Using this approach, we train the model with our input document corpus and the corresponding prerequisite links, with a fixed value of the Dirichlet hyperparameter. We learn β, the K × V matrix of word distributions over topics, and η, the K × K matrix encoding the asymmetric relationship between each pair of topics, where V is the size of our vocabulary of n-grams and K is the chosen number of topics.
The inferred matrix η encodes the asymmetric pairwise relations between the underlying topics of the document corpus. A natural approach to learning prerequisite relations is to use the inferred topic distribution for each concept word (from β, after suitable normalization) together with the topic prerequisite relation η to score concept prerequisiteness. However, our experiments show that this approach does not work well.
A Siamese network generally comprises two identical sub-networks joined by one cost module [siamese]. The architecture is shown in Figure 4. Each input to our Siamese network is a pair of vectors (v_a, v_b) and a binary label y. The weights of the sub-networks are tied, and each sub-network is denoted G_W. The pair (v_a, v_b) is passed through the sub-networks, each consisting of two fully connected (FC) neural network layers and a rectified linear unit (ReLU), yielding two corresponding outputs (G_W(v_a), G_W(v_b)). The loss function is optimized with respect to the parameters W controlling both subnets, through stochastic gradient descent using the Adam optimizer.
We use the exponentiated and normalized columns of β as vector representations of the concepts. Hence, each concept is represented as a K-dimensional vector, where K is the number of latent topics. Labeled pairs of such vectors from our training set are used as input to train the Siamese network. The label is set to 1 when the first concept is a prerequisite of the second, and 0 otherwise.
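A minimal sketch of this construction, assuming β is given as log word probabilities per topic and that "normalized" means normalizing each concept's column to a distribution (the exact normalization is our assumption):

```python
import numpy as np

def concept_vectors(log_beta):
    """Map a K x V topic-word matrix of log probabilities to per-concept
    topic vectors: exponentiate, then normalize each column (one column
    per vocabulary entry) so it sums to 1."""
    b = np.exp(log_beta)
    return b / b.sum(axis=0, keepdims=True)  # K x V; each column sums to 1
```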
These vectors are passed through the sub-networks G_W. We use the weighted element-wise differences between the twin feature vectors G_W(v_a) and G_W(v_b): W_out and b denote the weights and biases that connect the difference of the outputs of the two sub-networks to the loss layer, yielding the 2-dimensional output o = W_out (G_W(v_a) − G_W(v_b)) + b. We use the cross-entropy loss function and obtain the probability p of the first input vector (v_a) being a prerequisite of the second (v_b):

p = exp(o_1) / (exp(o_0) + exp(o_1)),   (1)

where o_i is the i-th element of the 2-dimensional vector o, as we are solving a binary classification problem. The trained Siamese network can be used for predicting prerequisite relations.
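The forward pass just described can be sketched in NumPy; the layer sizes and exact weight placement are our assumptions, not the paper's released code:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def subnet(v, W1, W2):
    """Shared sub-network G_W: two fully connected layers with a ReLU."""
    return W2 @ relu(W1 @ v)

def prereq_prob(v_a, v_b, W1, W2, W_out, b_out):
    """Probability that the first concept is a prerequisite of the second:
    a softmax over the 2-d output computed from the difference of the twin
    sub-network outputs."""
    diff = subnet(v_a, W1, W2) - subnet(v_b, W1, W2)
    o = W_out @ diff + b_out          # 2-dimensional logits
    e = np.exp(o - o.max())           # numerically stable softmax
    return e[1] / e.sum()
```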
Inferring concept prerequisites from course dependencies or from video-based course data is a relatively new area of study. To our knowledge, there have been three previous methods specifically designed to infer educational concept prerequisites, viz. CGL [liu2016learning], CPR-Recover [eaai17], and MOOC-RF [acl2017prerequisite]. CGL is a supervised learning approach that maps courses from different universities onto a universal space of concepts and predicts prerequisites between both courses and concepts [liu2016learning]; it represents courses in a vector space and uses both ranking-based and classification-based approaches. CPR-Recover solves the same problem by formulating a quadratic optimization problem [eaai17] and shows better performance than CGL, but the number of constraints in the optimization problem is proportional to the number of course prerequisite edges, which does not scale well with the size of the training data. Pan et al. recently proposed MOOC-RF for concept prerequisite recovery from Coursera data [acl2017prerequisite]: they define various features and train a classifier that can identify prerequisite relations among concepts from video transcripts. They do not use course/video prerequisite pairs to infer concept prerequisites. Both CGL and MOOC-RF use semantic and context-based features. Our method, instead of using hand-tuned features, uses a pairwise generative model to automatically learn features from the hidden representation in order to infer concept prerequisite edges.
We first compare the performance of PREREQ (source code: https://github.com/suderoy/PREREQ-IAAI-19/) with that of other state-of-the-art algorithms for inferring prerequisite relations on benchmark datasets. We consider both course prerequisites and video playlists as input, with corresponding baseline methods. Additionally, we demonstrate that PREREQ can learn effectively even when little training data is available.
We use a published benchmark dataset, the University Course Dataset, and in addition create a new dataset as described below. Dataset statistics are detailed in Table 1.
| Dataset | #Documents | #Prerequisite edges | #Concept pairs | #Concepts matched |
| University Course Dataset | 654 | 861 | 1008 | 365 |
| NPTEL MOOC Dataset | 382 | 1445 | 1008 | 345 |
University Course Dataset.
This dataset, from [eaai17], has 654 courses from various universities in the USA, and 861 course prerequisite edges. Manual annotation of 1008 pairs of concepts with prerequisite relations is provided. There are 406 unique concepts (words or phrases), among which the 1008 prerequisite relationships are annotated.
Data Preparation. We create bags-of-words (BoW) of unigrams, bigrams and trigrams from each course text, removing standard English stopwords, to represent each course as a BoW vector. The BoW vectors of the 654 courses and the 861 course prerequisite edges are used by the pairwise-link LDA model to infer concept vectors. We lemmatize the given concepts to match them with the concepts in the BoW vocabulary, recovering 365 of the 406 concepts. The concept vectors inferred by pairwise-link LDA and the 1008 concept prerequisite pairs are used by the Siamese network, over 5 train-test splits as described below.
NPTEL MOOC Dataset. This dataset is based on video playlists from a MOOC corpus.
We download the subtitles of videos from playlists of computer science departments on NPTEL (http://nptel.ac.in/). We use 382 videos from 38 different playlists. The same 1008 concept prerequisite pairs from the University Course Dataset are used as annotated concept pairs, since both datasets are based on computer science courses.
Data Preparation. We use the video subtitle text (i.e. speech transcripts) to create the BoW vectors of the videos; the words and phrases present in the concept vocabulary are used as the vocabulary for creating the BoW. Using pre-processing similar to that of the previous dataset, we find 345 concepts from the BoW vocabulary. As no video-video (or course) prerequisite edges are available in this case, we use temporal ordering as a proxy for prerequisite relationships: a video lecture in a playlist is taken to be a prerequisite of all videos that appear after it in the same playlist, which may add some noise in the form of erroneous edges. This gives a total of 1455 prerequisite edges between video pairs.
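The temporal-order proxy can be sketched as:

```python
def playlist_edges(playlist):
    """Temporal-order proxy for prerequisites: each video in a playlist is
    treated as a prerequisite of every video that appears after it in the
    same playlist."""
    return [(playlist[i], playlist[j])
            for i in range(len(playlist))
            for j in range(i + 1, len(playlist))]
```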
To evaluate the performance of PREREQ on the datasets, we split the concept prerequisite edges into train and test sets: 60% of the given concept pairs are used for training and the remaining 40% for testing. Training a binary classifier requires both positive and negative instances, but the University Course Dataset has only positive samples: the manually annotated concept prerequisite pairs. We generate negative samples by sampling random unrelated pairs of phrases from the vocabulary, in addition to the reverses of the original positive pairs, to enable our model to learn prerequisite directionality. We oversample the negative instances to 1.5 times the number of positive examples in the training set to address the class imbalance. All presented results are averaged over 5 train-test splits.
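A sketch of this negative-sampling scheme, with illustrative names and a fixed seed of our choosing:

```python
import random

def make_negatives(positives, vocab, ratio=1.5, seed=0):
    """Build negative training pairs: the reverse of every positive pair
    (to teach directionality) plus random unrelated pairs, up to `ratio`
    times the number of positives (the paper oversamples to 1.5x)."""
    rng = random.Random(seed)
    pos = set(positives)
    negs = [(b, a) for (a, b) in positives]      # reversed positives
    target = int(ratio * len(positives))
    while len(negs) < target:
        a, b = rng.sample(vocab, 2)              # random unrelated pair
        if (a, b) not in pos and (a, b) not in negs:
            negs.append((a, b))
    return negs
```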
For all our experiments, we choose the number of topics K = 100 and a fixed Dirichlet parameter of 0.01, to encourage sparse topic distributions. The Siamese network is trained with a learning rate of 0.0001 and a batch size of 128 over 3500 iterations.
To compare the performance of PREREQ, we use the same evaluation metric as our main baseline CPR-Recover [eaai17], i.e., Precision@K = (Σ_{i=1}^{K} rel(i)) / K, where rel(i) is a binary indicator of the presence of the i-th ranked concept pair in the ground truth. We sort all concept pairs by their probability (as in Eq. 1, predicted by PREREQ) and choose the top K = 50 or K = 100 to calculate the precision. The x-axis in the performance graphs denotes the number of course prerequisite edges used to predict the concept prerequisite pairs. In addition, we also use precision, recall and F-score to compare PREREQ with all the baselines.
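The metric can be computed as, for example:

```python
def precision_at_k(ranked_pairs, ground_truth, k):
    """Precision@K: fraction of the K highest-scored concept pairs that
    appear in the ground-truth prerequisite set."""
    return sum(1 for pair in ranked_pairs[:k] if pair in ground_truth) / k
```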
The method CPR-Recover [eaai17] is, to our knowledge, the best performing method (though unsupervised) for inferring concept prerequisites as tested on the University Course Dataset, where it has been shown to outperform previous methods including CGL [liu2016learning]. MOOC-RF [acl2017prerequisite] is a supervised method designed for online video-based courses. Note that it addresses a different problem setting and does not use course/video prerequisite pairs to infer concept prerequisites. Hence, Precision@K over different sets of course prerequisite edges is not a computable measure for this method, so we compare with MOOC-RF using precision, recall and F-score.
In addition to these baselines, we use a simple count-based method, Freq, that scores a concept pair by the number of times the pair 'co-occurs' in course prerequisite pairs, as described in [eaai17]. Also, based on the parameters (β and η) inferred from the pairwise-link LDA model, we predict the directed relationship between a pair of concepts a and b by computing the score u_a^T η u_b, where u_c is the vector representation of concept c obtained from β: each column of β is exponentiated and normalized by dividing all elements of the column by the maximum element. For precision, recall and F-score computation we use a threshold of 0.5 on the score to distinguish the classes. We call this method Pairwise LDA.
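A sketch of this Pairwise LDA scoring baseline; indexing concepts by column position in β is our assumption:

```python
import numpy as np

def pairwise_lda_score(log_beta, eta, a, b):
    """Score for 'concept a is a prerequisite of concept b' from the
    learned parameters: u_a^T @ eta @ u_b, where u_c is the exponentiated
    column of log_beta for concept c, normalized by its maximum element."""
    B = np.exp(log_beta)
    B = B / B.max(axis=0, keepdims=True)   # divide each column by its max
    return float(B[:, a] @ eta @ B[:, b])
```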
From the University Course Dataset, 100, 200, …, 800 course prerequisite edges are randomly sampled and Precision@K values are computed and averaged over multiple iterations. Figure 5 shows the comparative results: PREREQ performs significantly better than CPR-Recover consistently across different numbers of course prerequisite edges. Figure 6 shows the Precision@K scores for K = 50 and K = 100 on the MOOC dataset obtained from the NPTEL video playlists. The results demonstrate that on video playlists, where course/video prerequisite information is not available, PREREQ is able to infer concept prerequisites accurately.
Table 2: Precision, recall and F-score of PREREQ, Pairwise LDA, CPR-Recover and MOOC-RF on the University Course Dataset and the MOOC Dataset.
Table 2 shows the precision, recall and F-score on both the University Course Dataset and MOOC Dataset. The F-score of PREREQ is higher than that of CPR-Recover and MOOC-RF in both the datasets.
Labeled concept prerequisites may be hard to obtain and may not be sufficient to train a supervised model. However, the simple Siamese architecture in PREREQ uses very few parameters and can be trained easily even with little training data, as seen in Figure 7(a), which compares the performance of PREREQ with different amounts of training data. We find that performance degrades only marginally even when only 40% of the available labeled data is used for training.
Our experiments (not shown) show that inferred topics from pairwise-link LDA can discriminate between related and unrelated documents (based on the prerequisite relation), but the topics do not have sufficient signal to determine the directionality of the prerequisite edge. Nevertheless, inferred topics are good concept representations. We test the performance of three different concept representations – (1) topic distributions obtained from pairwise-link LDA (as used in PREREQ), (2) topic distributions obtained directly from LDA learnt from the same document corpus, and (3) word2vec representations [w2v], trained over Wikipedia. Figure 7(b) shows that pairwise-link LDA based concept representations are superior to those based on LDA and word2vec.
We develop PREREQ, a supervised learning method, to learn concept prerequisites from course prerequisite data and from unlabeled video playlists (increasingly available from MOOCs), which obviates the need for manual creation of labeled course prerequisite datasets. PREREQ obtains latent representations of concepts through the pairwise-link LDA model, which are then used to train a Siamese network that can identify prerequisite relations accurately. PREREQ outperforms state-of-the-art methods for inferring prerequisite relations between educational concepts on benchmark datasets. We also empirically show that PREREQ can learn effectively from very little training data and from unlabeled video playlists. PREREQ can effectively utilize the large and increasing amount of online educational material in the form of text (course webpages) and video (MOOCs) to solve a fundamental problem that is essential for several online educational technology applications.