Personalized Thread Recommendation for MOOC Discussion Forums

06/22/2018 ∙ by Andrew S. Lan, et al. ∙ Zoomi, Inc.; Purdue University; Princeton University; The Hong Kong University of Science and Technology

Social learning, i.e., students learning from each other through social interactions, has the potential to significantly scale up instruction in online education. In many cases, such as in massive open online courses (MOOCs), social learning is facilitated through discussion forums hosted by course providers. In this paper, we propose a probabilistic model for the process of learners posting on such forums, using point processes. Different from existing works, our method integrates topic modeling of the post text, timescale modeling of the decay in post activity over time, and learner topic interest modeling into a single model, and infers this information from user data. Our method also varies the excitation levels induced by posts according to the thread structure, to reflect typical notification settings in discussion forums. We experimentally validate the proposed model on three real-world MOOC datasets, with the largest one containing more than 6,000 learners making more than 40,000 posts in more than 5,000 threads. Results show that our model excels at thread recommendation, achieving significant improvement over a number of baselines, thus showing promise of being able to direct learners to threads that they are interested in more efficiently. Moreover, we demonstrate analytics that our model parameters can provide, such as the timescales of different topic categories in a course.




1. Introduction

Online discussion forums have gained substantial traction over the past decade, and are now a significant avenue of knowledge sharing on the Internet. Attracting learners with diverse interests and backgrounds, some platforms (e.g., Stack Overflow, MathOverflow) target specific technical subjects, while others (e.g., Quora, Reddit) cover a wide range of topics from politics to entertainment.

More recently, discussion forums have become a significant component of online education, enabling students in online courses to learn socially as a supplement to their studying of the course content individually (Brinton et al., 2016); social interactions between learners have been seen to improve learning outcomes (Brusilovsky et al., 2016). In particular, massive open online courses (MOOCs) often have tens of thousands of learners within single sessions, making the social interactions via these forums critical to scaling up instruction (Brinton et al., 2014). In addition to serving as a versatile complement to self-regulated learning (Tomkins et al., 2016), research has shown that learner participation on forums can be predictive of learning outcomes (Wang et al., 2015).

In this paper, we ask: How can we model the activity of individual learners in MOOC discussion forums? Such a model, designed correctly, presents several opportunities to optimize the learning process, including personalized news feeds to help learners sort through forum content efficiently, and analytics on factors driving participation.

1.1. Prior work on discussion forums

Generic online discussion sites.

There is vast literature on analyzing user interactions in online social networks (e.g., on Facebook, Google+, and Twitter). Researchers have developed methods for tasks including link prediction (Kim and Leskovec, 2011; Miller et al., 2009), tweet cascade analysis (Farajtabar et al., 2015; Simma and Jordan, 2010), post topic analysis (Ritter et al., 2010), and latent network structure estimation (Linderman and Adams, 2014; Luo et al., 2015). These methods are not directly applicable to modeling MOOC discussion forums since MOOCs do not support an inherent social structure; learners cannot become “friends” or “follow” one another.

Generic online discussion forums (e.g., Stack Overflow, Quora) have also generated substantial research. Researchers have developed methods for tasks including question-answer pair extraction (Cong et al., 2008), topic dynamics analysis (Wu et al., 2010), post structure analysis (Wang et al., 2011), and user grouping (Shi et al., 2009). While these types of forums also lack explicit social structure, MOOC discussion forums exhibit several unique characteristics that need to be accounted for. First, topics in MOOC discussion forums are mostly centered around course content, assignments, and course logistics (Brinton et al., 2014), making them far more structured than generic forums; thus, topic modeling can be used to organize threads and predict future activity. Second, there are no sub-forums in MOOCs: learners all post in the same venue even though their interests in the course vary. Modeling individual interest levels on each topic can thus assist learners in navigating through posts.

MOOC forums.

A few studies on MOOC discussion forums have emerged recently. The works in (Ramesh et al., 2014; Ramesh et al., 2015) extracted forum structure and post sentiment information by combining unsupervised topic models with sets of expert-specified course keywords. In this work, our objective is to model learners’ forum behavior, which requires analyzing not only the content of posts but also individual learner interests and temporal dynamics of the posts.

In terms of learner modeling, the work in (Gillani et al., 2014) employed Bayesian nonnegative matrix factorization to group learners into communities according to their posting behavior. This work relies on topic labels of each discussion post, though, which are either not available or not reliable in most MOOC forums. The work in (Brinton et al., 2016) inferred learners’ topic-specific seeking and disseminating tendencies on forums to quantify the efficiency of social learning networks. However, this work relies on separate models for learners and topics, whereas we propose a unified model. The work in (Kardan et al., 2017) couples social network analysis and association rule mining for thread recommendation; while their approach considers social interactions among learners, they ignore the content and timing of posts.

As for modeling temporal dynamics, the work in (Brinton et al., 2014) proposed a method that classifies threads into different categories (e.g., small-talk, course-specific) and ranks thread relevance for learners over time. This model falls short of making recommendations, though, since it does not consider learners individually. The work in (Yang et al., 2014) employed matrix factorization for thread recommendation and studied the effect of window size, i.e., recommending only threads with posts in a recent time window. However, this model uses temporal information only in post-processing, which limits the insights it offers. The work in (Mi and Faltings, 2017) focuses on learner thread viewing rather than posting behavior, which is different from our study of social interactions since learners view threads independently.

The model proposed in (Mozer and Lindsey, 2016) is perhaps most similar to ours, as it uses point processes to analyze discussion forum posts and associates different timescales with different types of posts to reflect recurring user behavior. With the task of predicting which Reddit sub-forum a user will post in next, the authors base their point process model on self-excitations, as such behavior is mostly driven by a user’s own posting history. Our task, by contrast, is to recommend threads to learners taking a particular online course: here, excitations induced by other learners (e.g., explicit replies) can significantly affect a learner’s posting behavior. As a result, the model we develop incorporates mutual excitation. Moreover, (Mozer and Lindsey, 2016) labels each post based on the Reddit sub-forum it belongs to; no such sub-forums exist in MOOCs.

1.2. Our model and contributions

In this paper, we propose and experimentally validate a probabilistic model for learners posting on MOOC discussion forums. Our main contributions are as follows.

First, through point processes, our model captures several important factors that influence a learner’s decision to post. In particular, it models the probability that a learner makes a post in a thread at a particular point in time based on four key factors: (i) the interest level of the learner on the topic of the thread, (ii) the timescale of the thread topic (which corresponds to how fast the excitation induced by new posts on the topic decays over time), (iii) the timing of the previous posts in the thread, and (iv) the nature of the previous posts regarding this learner (e.g., whether they explicitly reply to the learner). Through evaluation on three real-world datasets (the largest having more than 6,000 learners making more than 40,000 posts in more than 5,000 threads), we show that our model significantly outperforms several baselines in terms of thread recommendation, thus showing promise of being able to direct learners to threads they are interested in.

Second, we derive a Gibbs sampling parameter inference algorithm for our model. While existing work has relied on thread labels to identify forum topics, such metadata is usually not available for MOOC forum threads. As a result, we jointly analyze the post timestamp information and the text of the thread by coupling the point process model with a topic model, enabling us to learn the topics and other latent variables through a single procedure.

Third, we demonstrate several types of analytics that our model parameters can provide, using our datasets as examples. These include: (i) identifying the timescales (measured as half-lives) of different topics, from which we find that course logistics-related topics have the longest-lasting excitations, (ii) showing that learners are much (20-30 times) more likely to post again in threads they have already posted in, and (iii) showing that learners receiving explicit replies in threads are much (300-500 times) more likely to post again in these threads to respond to these replies.

2. Point Processes Forum Model

An online course discussion forum is generally comprised of a series of threads, with each thread containing a sequence of posts and comments on posts. Each post/comment contains a body of text, written by a particular learner at a particular point in time. A thread can further be associated with a topic, based on analysis of the text written in the thread. Figure 1 (top) shows an example of a thread in a MOOC consisting of eight posts and comments. Moving forward, the terminology “posting in a thread” will refer to a learner writing either a post or a comment.

We postulate that a learner’s decision to post in a thread at a certain point in time is driven by four main factors: (i) the learner’s interest in the thread’s topic, (ii) the timescale of the thread’s topic, (iii) the number and timing of previous posts in the thread, and (iv) the learner’s prior activity in the thread (e.g., whether there are posts that explicitly reply to the learner). The first factor is consistent with the fact that MOOC forums generally have no sub-forums: in the presence of diverse threads, learners are most likely to post in those covering topics they are interested in. The second factor reflects the observation that different topics exhibit different patterns of temporal dynamics. The third factor captures the common options for thread-ranking that online forums provide to users, e.g., by popularity or recency; learners are more likely to visit those at the top of these rankings. The fourth factor captures the common setup of notifications in discussion forums: learners are typically subscribed to threads automatically once they post in them, and notified of any new posts (especially those that explicitly reply to them) in these threads. To capture these dynamics, we model learners’ posts in threads as events in temporal point processes (Daley and Vere-Jones, 2003), which will be described next.

Point processes.

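A point process, which generalizes the Poisson process, is characterized by a rate function $\lambda(t)$ that models the probability that an event will happen in an infinitesimal time window $dt$ (Daley and Vere-Jones, 2003). Formally, the rate function at time $t$ is given by

$$\lambda(t) = \lim_{dt \to 0} \frac{\mathbb{E}[N(t + dt) - N(t)]}{dt}, \tag{1}$$

where $N(t)$ denotes the number of events up to time $t$ (Daley and Vere-Jones, 2003). Assuming the time period of interest is $[0, T)$, the likelihood of a series of events at times $t_1, \ldots, t_N < T$ is given by:

$$\mathcal{L}\big(\{t_i\}_{i=1}^{N}\big) = \Big(\prod_{i=1}^{N} \lambda(t_i)\Big) \exp\Big(-\int_0^T \lambda(t)\,dt\Big). \tag{2}$$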
In this paper, we are interested in rate functions that are affected by excitations of past events (e.g., forum posts in the same thread). Thus, we resort to Hawkes processes (Mozer and Lindsey, 2016), which characterize the rate function at time $t$ given a series of past events at $t_1, \ldots, t_{N'} < t$ as

$$\lambda(t) = \mu + a \sum_{i=1}^{N'} \kappa(t - t_i),$$

where $\mu \ge 0$ denotes the constant background rate, $a \ge 0$ denotes the amount of excitation each event induces, i.e., the increase in the rate function after an event ($a\,\kappa(\cdot)$ is sometimes referred to in the literature as the impulse response (Linderman and Adams, 2014)), and $\kappa(\cdot)$ denotes a non-increasing decay kernel that controls the decay in the excitation of past events over time. In this paper, we use the standard exponential decay kernel $\kappa(t) = e^{-\gamma t}$, where $\gamma$ denotes the decay rate. Through our model, different decay rates can be associated with different topics (Mozer and Lindsey, 2016); as we will see, this model choice enables us to categorize posts into groups (e.g., course content-related, small talk, or course logistics) based on their timescales, which leads to better model analytics.
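As a rough illustration of the two ingredients above, the following Python sketch evaluates the rate of a Hawkes process with an exponential decay kernel and the corresponding point process log-likelihood on an observation window of length `T`. The function and argument names are illustrative assumptions rather than part of the model description, and the compensator uses the closed-form integral of the exponential kernel.

```python
import math

def hawkes_rate(t, event_times, mu, a, gamma):
    """Rate at time t: background mu plus decayed excitation from events strictly before t."""
    return mu + a * sum(math.exp(-gamma * (t - ti)) for ti in event_times if ti < t)

def hawkes_log_likelihood(event_times, T, mu, a, gamma):
    """Log-likelihood of events observed on [0, T) under the exponential-kernel Hawkes rate."""
    events = sorted(event_times)
    # Sum of log rates evaluated at each observed event time (requires mu > 0).
    log_rates = sum(math.log(hawkes_rate(ti, events, mu, a, gamma)) for ti in events)
    # Integral of the rate over [0, T): mu * T plus one closed-form kernel integral per event.
    compensator = mu * T + a * sum(
        (1.0 - math.exp(-gamma * (T - ti))) / gamma for ti in events
    )
    return log_rates - compensator
```

With `a = 0` the expression reduces to the log-likelihood of a homogeneous Poisson process, which is a convenient sanity check.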

Figure 1. An example of how threads are structured in MOOC discussion forums (top) and an illustration of corresponding rate functions (bottom) for two learners in this thread. Different posts induce different amounts of excitation depending on whether and how they refer to the learner.

Rate function for new posts.

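Let $U$, $K$, and $R$ denote the number of learners, topics, and threads in a discussion forum, indexed by $u$, $k$, and $r$, respectively. We assume that each thread $r$ functions independently, and that each learner’s activities in each thread and on each topic are independent. Further, let $z_r$ denote the topic of thread $r$, and let $P_r$ denote the total number of posts in the thread, indexed by $p$; for each post $p$, we use $u_p^r$ and $t_p^r$ to denote the learner index and time of the post, and we use $p_u^r(i)$ to denote the $i$-th post of learner $u$ in thread $r$. Note that posts in a thread are indexed in chronological order, i.e., $p < p'$ if and only if $t_p^r < t_{p'}^r$. Finally, let $\gamma_k$ denote the decay rate of each topic and let $a_{u,k}$ denote the interest level of learner $u$ on topic $k$. We model the rate function that characterizes learner $u$ posting in thread $r$ (on topic $k$) at time $t$ given all previous posts in the thread (i.e., posts $p$ with $t_p^r < t$) as

$$\lambda^r_{u,k}(t) = \begin{cases} a_{u,k} \sum\limits_{p:\, t_p^r < t} e^{-\gamma_k (t - t_p^r)} & \text{if } t < t^r_{p_u^r(1)}, \\[6pt] a_{u,k} \Big[ \sum\limits_{p < p_u^r(1)} e^{-\gamma_k (t - t_p^r)} + \alpha \sum\limits_{\substack{p \ge p_u^r(1),\; t_p^r < t \\ u \notin d_p^r}} e^{-\gamma_k (t - t_p^r)} + \alpha\beta \sum\limits_{\substack{p \ge p_u^r(1),\; t_p^r < t \\ u \in d_p^r}} e^{-\gamma_k (t - t_p^r)} \Big] & \text{otherwise.} \end{cases} \tag{3}$$

In our model, $a_{u,k}$ characterizes the base level of excitation that learner $u$ receives from posts in threads on topic $k$, which captures the different interest levels of learners on different topics. The exponential decay kernel models a topic-specific decay in excitation of rate $\gamma_k$ from the time of the post.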

Before $t^r_{p_u^r(1)}$ (the timestamp of the first post learner $u$ makes in thread $r$), learner $u$’s rate is given solely by the number and recency of posts in $r$ ($t^r_{p_u^r(1)} = \infty$ if the learner never posts in this thread), while all posts occurring after $t^r_{p_u^r(1)}$ induce additional excitation characterized by the scalar variable $\alpha$. This model choice captures the common setup in MOOC forums that learners are automatically subscribed to threads after they post in them. Therefore, we postulate that $\alpha > 1$, since new post notifications that come with thread subscriptions tend to increase a learner’s chance of viewing these new posts, in turn increasing their likelihood of posting again in these threads. The observation of users posting immediately after receiving notifications is sometimes referred to as the “bursty” nature of posts on social media (Farajtabar et al., 2015).

We further separate posts made after $t^r_{p_u^r(1)}$ by whether or not they constitute explicit replies to learner $u$. A post $p'$ is considered to be an explicit reply to a post $p$ in the same thread if $t^r_{p'} > t^r_p$ and one of the following conditions is met: (i) $p'$ makes direct reference (e.g., through name or the @ symbol) to the learner who made post $p$, or (ii) $p'$ is the first comment under $p$. (In this work, we restrict ourselves to these two concrete types of explicit replies; analyzing other, more ambiguous types is left for future work.) The set $d_p^r$ in (3) denotes the set of explicit recipients of $p$, i.e., if $p$ is an explicit reply to learner $u$, then $u \in d_p^r$, while if $p$ is not an explicit reply to any learners then $d_p^r = \emptyset$. This setup captures the common case of learners being notified of posts that explicitly reply to them in a thread. The scalar $\beta$ characterizes the additional excitation these replies induce; we postulate that $\beta > 1$, i.e., the personal nature of explicit replies to learners’ posts tends to further increase the likelihood of them posting again in the thread (e.g., to address these explicit replies).
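The piecewise rate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: posts are represented as hypothetical `(time, author, recipients)` tuples, and the explicit-reply boost is assumed to multiply the notification boost, a reading suggested by the reported 20-30 times and 300-500 times increases in posting likelihood.

```python
import math

def post_rate(t, posts, u, a_uk, gamma_k, alpha, beta):
    """Rate at which learner u posts in one thread at time t.

    posts: list of (time, author, recipients) tuples, where recipients is
    the set of learners that the post explicitly replies to.
    """
    # Time of u's first post in the thread (infinity if u never posted there).
    first = min((tp for tp, author, _ in posts if author == u), default=math.inf)
    rate = 0.0
    for tp, _, recipients in posts:
        if tp >= t:
            continue  # only posts made before t contribute excitation
        weight = 1.0
        if tp >= first:
            weight *= alpha  # u is subscribed, so new posts trigger notifications
            if u in recipients:
                weight *= beta  # assumed multiplicative boost for explicit replies
        rate += weight * a_uk * math.exp(-gamma_k * (t - tp))
    return rate
```

For a learner who has never posted in the thread, every earlier post contributes with weight 1, so the rate depends only on the number and recency of posts and on the learner's interest level.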

Rate function for initial posts.

We must also model the process of generating the initial posts in threads. We characterize the rate function of these posts as time-invariant:

$$\lambda_{u,k}(t) = \mu_{u,k}, \tag{4}$$

where $\mu_{u,k}$ denotes the background posting rate of learner $u$ on topic $k$. Separating the initial posts in threads from future posts in this way enables us to model learners’ knowledge seeking (i.e., starting threads) and knowledge disseminating (i.e., posting responses in threads) behavior (Brinton et al., 2016), through the background ($\mu_{u,k}$) and excitation levels ($a_{u,k}$), respectively.

Post text modeling.

Finally, we must also model the text of each thread. Given the topic $z_r$ of thread $r$, we model $w_r$, the bag-of-words representation of the text in $r$ across all posts, as being generated from the standard latent Dirichlet allocation (LDA) model (Blei et al., 2003), with topic-word distributions parameterized by $\phi_k$. Details on the LDA model and the posterior inference step for $\phi_k$ via collapsed Gibbs sampling in our parameter inference algorithm are omitted for simplicity of exposition.

Model intuition.

Intuitively, a learner will browse existing threads in the discussion forum when they are interested in a particular topic. If a relevant thread exists, they may make their first post there (e.g., Comment 1 by John under Post 2, in Figure 1), with the rate at which this occurs being governed by the previous activity in the thread (posts at times $t_p^r$) and the learner’s interest level in the topic of the thread ($a_{u,k}$). Together with the exponential decay kernel, this model setting reflects the observation that discussion forum threads are often sorted by recency (the time of last post) and popularity (typically quantified by the number of replies). Alternatively, if no such thread exists, the learner may decide to start a new thread on the topic (e.g., Post 1 by Bob), depending on their background rate ($\mu_{u,k}$). Once the learner has posted in a thread, they will receive notifications of new posts there (e.g., Lily will be notified of Post 4), which induces higher levels of excitation ($\alpha$); the personal nature of explicit replies to their posts (e.g., Anne’s mention of John in Comment 3 under Post 2) will induce even higher levels of excitation ($\beta$).

3. Parameter Inference

We now derive the parameter inference algorithm for our model. We perform inference using Gibbs sampling, i.e., iteratively sampling from the posterior distributions of each latent variable, conditioned on the other latent variables. The detailed steps are as follows:

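  • To sample from the posterior distribution of the topic of each thread, $z_r$, we put a uniform prior over each topic and arrive at the posterior

    $$P(z_r = k \mid \cdot) \propto P(w_r \mid z_r = k)\; P\big(\{t_1^{r'}\}_{r':\, u_1^{r'} = u_1^r,\; z_{r'} = k} \mid \mu_{u_1^r,k}\big) \prod_{u} P\big(\{t_p^r\}_{p:\, u_p^r = u} \mid a_{u,k}, \gamma_k, \alpha, \beta\big),$$

    where $\cdot$ denotes all variables except $z_r$. $P(w_r \mid z_r = k)$ denotes the likelihood of observing the text $w_r$ of thread $r$ given its topic. The second factor denotes the likelihood of observing the sequence of initial thread posts on topic $k$ made by the learner $u = u_1^r$ who also made the initial post in thread $r$ (if $u$ is not the initial poster in any thread $r'$ with $z_{r'} = k$, then this likelihood reduces to $e^{-\mu_{u,k} T}$); this is given by substituting (4) into (2) over the observation period $[0, T)$ as

    $$P\big(\{t_1^{r'}\}_{r':\, u_1^{r'} = u,\; z_{r'} = k} \mid \mu_{u,k}\big) = \mu_{u,k}^{\sum_{r'} \mathbb{1}[u_1^{r'} = u,\; z_{r'} = k]}\; e^{-\mu_{u,k} T},$$

    where $\mathbb{1}[x]$ denotes the indicator function that takes the value $1$ when condition $x$ holds and $0$ otherwise. The third factor denotes the likelihood of observing the sequence of posts made by learner $u$ in thread $r$ (if $u$ has not posted in $r$, then only the exponential term remains), given by

    $$P\big(\{t_p^r\}_{p:\, u_p^r = u} \mid a_{u,k}, \gamma_k, \alpha, \beta\big) = \Big(\prod_{p > 1:\, u_p^r = u} \lambda^r_{u,k}(t_p^r)\Big) \exp\Big(-\int_0^T \lambda^r_{u,k}(t)\,dt\Big),$$

    where the rate function for learner $u$ in thread $r$ (with topic $k$) is given by (3).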

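  • There is no conjugate prior distribution for the excitation decay rate variable $\gamma_k$. Therefore, we resort to a pre-defined set of decay rates $\{\gamma^{(1)}, \ldots, \gamma^{(M)}\}$. We put a uniform prior on $\gamma_k$ over values in this set, and arrive at the posterior given by

    $$P\big(\gamma_k = \gamma^{(m)} \mid \cdot\big) \propto \prod_{r:\, z_r = k} \prod_{u} P\big(\{t_p^r\}_{p:\, u_p^r = u} \mid a_{u,k}, \gamma^{(m)}, \alpha, \beta\big).$$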

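  • The conjugate prior of the learner background topic interest level variable $\mu_{u,k}$ is the Gamma distribution. Therefore, we put a prior on $\mu_{u,k}$ as $\mu_{u,k} \sim \mathrm{Gam}(c_\mu, b_\mu)$ (with shape $c_\mu$ and rate $b_\mu$) and arrive at the posterior distribution

    $$\mu_{u,k} \mid \cdot \;\sim\; \mathrm{Gam}\Big(c_\mu + \sum_{r} \mathbb{1}[u_1^r = u,\; z_r = k],\;\; b_\mu + T\Big).$$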


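  • The latent variables $a_{u,k}$, $\alpha$, and $\beta$ have no conjugate priors. As a result, we introduce an auxiliary latent variable $q_{p'}^r$ (Linderman and Adams, 2014; Simma and Jordan, 2010) for each post $p'$, where $q_{p'}^r = p$ means that post $p$ is the “parent” of post $p'$ in thread $r$, i.e., post $p'$ was caused by the excitation that the previous post $p$ induced. We first sample the parent variable for each post according to

    $$P\big(q_{p'}^r = p\big) \propto m^r_{u,p}\, a_{u,k}\, e^{-\gamma_k (t_{p'}^r - t_p^r)}, \qquad u = u_{p'}^r,\; k = z_r,\; p < p',$$

    where $m^r_{u,p} \in \{1, \alpha, \alpha\beta\}$ depending on the relationship between post $p$ and learner $u$ from our model, i.e., whether $p$ precedes the first post $p_u^r(1)$ of $u$ in the thread ($m^r_{u,p} = 1$), and if not, whether $p$ is an explicit reply to $u$, i.e., $u \in d_p^r$ ($m^r_{u,p} = \alpha\beta$), or not ($m^r_{u,p} = \alpha$). In general, the set of possible parents of $p'$ is all prior posts in $r$, but in practice, we make use of the structure of each thread to narrow down the set of possible parents for some posts. (For example, in Fig. 1, Post 2 is the only possible parent post of Comment 1 below it, as Comment 1 is an explicit reply to Post 2.) We omit the details of this step for simplicity of exposition.

    With these parent variables, we can write the likelihood of the series of posts learner $u$ makes in thread $r$ (with $k = z_r$) as

    $$\Big(\prod_{p' > 1:\, u_{p'}^r = u} m^r_{u,q_{p'}^r}\, a_{u,k}\, e^{-\gamma_k (t_{p'}^r - t^r_{q_{p'}^r})}\Big) \exp\Big(-a_{u,k} \sum_{p} m^r_{u,p} K_p^r\Big), \qquad K_p^r = \frac{1 - e^{-\gamma_{z_r} (T - t_p^r)}}{\gamma_{z_r}},$$

    where $K_p^r$ is the integral of the decay kernel of post $p$ over the remainder of the observation period $[0, T)$.

    We now see that Gamma distributions are conjugate priors for $a_{u,k}$, $\alpha$, and $\beta$. Specifically, if $a_{u,k} \sim \mathrm{Gam}(c_a, b_a)$, its posterior is given by $\mathrm{Gam}(c_a', b_a')$ where

    $$c_a' = c_a + n_{u,k}, \qquad b_a' = b_a + \sum_{r:\, z_r = k} \sum_{p} m^r_{u,p} K_p^r,$$

    and $n_{u,k}$ denotes the number of non-initial posts learner $u$ makes in threads on topic $k$. Similarly, if $\alpha \sim \mathrm{Gam}(c_\alpha, b_\alpha)$, the posterior is $\mathrm{Gam}(c_\alpha', b_\alpha')$ where

    $$c_\alpha' = c_\alpha + n_\alpha, \qquad b_\alpha' = b_\alpha + \sum_{r} \sum_{u} a_{u,z_r} \sum_{p \ge p_u^r(1)} \beta^{\mathbb{1}[u \in d_p^r]} K_p^r,$$

    and $n_\alpha$ denotes the number of posts whose parent carries the multiplier $\alpha$ or $\alpha\beta$. Finally, if $\beta \sim \mathrm{Gam}(c_\beta, b_\beta)$, the posterior is $\mathrm{Gam}(c_\beta', b_\beta')$ where

    $$c_\beta' = c_\beta + n_\beta, \qquad b_\beta' = b_\beta + \alpha \sum_{r} \sum_{u} a_{u,z_r} \sum_{p \ge p_u^r(1):\; u \in d_p^r} K_p^r,$$

    and $n_\beta$ denotes the number of posts whose parent carries the multiplier $\alpha\beta$.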

We iterate the sampling steps 1–4 above after randomly initializing the latent variables according to their prior distributions. After a burn-in period, we take samples from the posterior distribution of each variable over multiple iterations, and use the average of these samples as its estimate.
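Two of the sampling steps above have a simple generic form, which the following Python sketch illustrates. It assumes a Gamma prior parameterized by shape and rate, and all function names are hypothetical. The first helper draws from a discrete posterior given unnormalized log-probabilities (as needed for the thread topic and the decay rate on a pre-defined grid), and the second applies the conjugate Gamma update for a constant background rate observed over a window of length `T`.

```python
import math
import random

def sample_from_log_weights(log_weights, rng=random):
    """Draw an index with probability proportional to exp(log_weights[i])."""
    m = max(log_weights)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def gamma_posterior_for_rate(shape0, rate0, num_events, exposure):
    """Conjugate update for a constant Poisson rate with a Gamma(shape0, rate0) prior."""
    return shape0 + num_events, rate0 + exposure

def sample_background_rate(shape0, rate0, num_threads_started, T, rng=random):
    """Sample a learner's background rate on one topic from its Gamma posterior."""
    shape, rate = gamma_posterior_for_rate(shape0, rate0, num_threads_started, T)
    # random.gammavariate expects (shape, scale), so pass scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)
```

In a full sampler, `log_weights` would hold the log of each candidate's unnormalized posterior, for example the summed thread log-likelihoods evaluated at each decay rate in the grid.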

4. Experiments

In this section, we experimentally validate our proposed model using three real-world MOOC discussion forum datasets. In particular, we first show that our model obtains substantial gains in thread recommendation performance over several baselines. Subsequently, we demonstrate the analytics on forum content and learner behavior that our model offers.

Figure 2. Plot of recommendation performance over different lengths of the training time window on all datasets: (a) ml, (b) algo, (c) comp. Our model significantly outperforms every baseline.

4.1. Datasets

We obtained three discussion forum datasets from 2012 offerings of MOOCs on Coursera: Machine Learning (ml), Algorithms, Part I (algo), and English Composition I (comp). The number of threads, posts and learners appearing in the forums, and the duration (the number of weeks with non-zero discussion forum activity) of the courses are given in Table 1.

Dataset   Threads   Posts    Learners   Weeks
ml        5,310     40,050   6,604      15
algo      1,323     9,274    1,833      9
comp      4,860     17,562   3,060      14
Table 1. Basic statistics on the datasets.

Prior to experimentation, we perform a series of pre-processing steps. First, we prepare the text for topic modeling by (i) removing non-ASCII characters, URLs, punctuation, and words that contain digits, (ii) converting nouns and verbs to base forms, (iii) removing stopwords (we use the stopword list in the Python Natural Language Toolkit, which covers 15 languages), and (iv) removing words that appear fewer than 10 times or in more than 10% of threads. Second, we extract the following information for each post: (i) the ID of the learner who made the post ($u_p^r$), (ii) the timestamp of the post ($t_p^r$), and (iii) the set of learners it explicitly replies to as defined in the model ($d_p^r$). For posts made anonymously, we do not include rates for them when computing the likelihood of a thread, but we do include them as sources of excitation for non-anonymous learners in the thread.
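The text-cleaning steps can be approximated with the Python standard library alone, as in the sketch below. It is only an illustration under simplifying assumptions: the stopword set is a tiny placeholder rather than the NLTK list, the conversion of nouns and verbs to base forms is omitted, and the function names are hypothetical.

```python
import re
from collections import Counter

# Placeholder stopword set for illustration only; the paper uses the NLTK list.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "i"}

def tokenize(text):
    """Lowercase tokens with non-ASCII characters, URLs, punctuation, digit words and stopwords removed."""
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII characters
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)            # drop URLs
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())            # split on punctuation
    return [w for w in tokens
            if not any(c.isdigit() for c in w) and w not in STOPWORDS]

def build_vocabulary(thread_tokens, min_count=10, max_thread_frac=0.10):
    """Keep words seen at least min_count times and in at most max_thread_frac of threads."""
    total = Counter(w for toks in thread_tokens for w in toks)
    in_threads = Counter(w for toks in thread_tokens for w in set(toks))
    limit = max_thread_frac * len(thread_tokens)
    return {w for w in total if total[w] >= min_count and in_threads[w] <= limit}
```

Here `thread_tokens` holds one token list per thread, so the thread-frequency filter counts each word at most once per thread.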

Figure 3. Recommendation performance of the algorithms for varying testing window length on the algo dataset. The point process-based algorithms have the highest performance and are more robust to the testing window length.

4.2. Thread recommendation

Experimental setup.

We now test the performance of our model on personalized thread recommendation. We run three different experiments, splitting the dataset based on the time of each post. The training set includes only threads initiated during the time interval $[0, T_1)$, i.e., threads $r$ with $t_1^r \in [0, T_1)$, and only posts on those threads made before $T_1$, i.e., posts $p$ with $t_p^r < T_1$. The test set contains posts made in the time interval $[T_1, T_2)$, i.e., posts $p$ with $t_p^r \in [T_1, T_2)$, but excludes new threads initiated during the test interval.

In the first experiment, we hold the length of the testing interval fixed to 1 day, i.e., , and vary the length of the training interval as , where denotes the number of weeks that the discussion forum stays active. We set to 10, 8, and 8 for ml, comp, and algo, respectively, to ensure the number of posts in the testing set is large enough. These numbers are less than those in Table 1 since learners drop out during the course, which leads to decreasing forum activity. In the second experiment, we hold the length of the training interval fixed at weeks and vary the length of the testing interval as . In the first two experiments, we fix , while in the third experiment, we fix the length of the training and testing intervals to weeks and week, respectively, and vary the number of latent topics as .

For training, we set the values of the hyperparameters to

, and . We set the pre-defined decay rates to correspond to half-lives (i.e., the time for the excitation of a post to decay to half of its original value) ranging from minutes to weeks. We run the inference algorithm for a total of iterations, with of these being burn-in iterations.777

We observe that the Markov chain achieves reasonable mixing after about



We compare the performance of our point process model (PPS) against four baselines: (i) Popularity (PPL), which ranks threads from most to least popular based on the total number of posts in each thread during the training time interval; (ii) Recency (REC), which ranks threads from newest to oldest based on the timestamp of their most recent post; (iii) Social influence (SOC), a variant of our PPS model that replaces learner topic interest levels with learner social influences (the “Hwk” baseline in (Farajtabar et al., 2015)); and (iv) Adaptive matrix factorization (AMF), our implementation of the matrix factorization-based algorithm proposed in (Yang et al., 2014).

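To rank threads in our model for each learner, we calculate the probability that learner $u$ will reply to thread $r$ during the testing time interval $[T_1, T_2)$ as

$$P(u \text{ posts in } r) = \sum_{k} P(z_r = k) \Big(1 - \exp\Big(-\int_{T_1}^{T_2} \lambda^r_{u,k}(t)\,dt\Big)\Big).$$

The rate function $\lambda^r_{u,k}(t)$ is given by (3). $P(z_r = k)$ is given by

$$P(z_r = k) \propto P(\text{text of } r \mid z_r = k)\; P(\text{initial post of } r \mid z_r = k)\; P(\text{other posts in } r \mid z_r = k),$$

where the likelihoods of the initial post and other posts are given by (2) and (3), and the thread text likelihood is given by the standard LDA model. The threads are then ranked from highest to lowest posting probability.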
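The ranking step can be sketched as follows, using the fact that for a Poisson process the probability of at least one event in an interval is one minus the exponential of the negative integrated rate. This Python sketch makes simplifying assumptions that are not part of the procedure described above: the thread topic (and hence the decay rate) is treated as known, each training post's excitation weight for the learner is precomputed, and no new posts arrive during the test interval. All names are hypothetical.

```python
import math

def kernel_integral(t_post, t_start, t_end, gamma):
    """Integral of exp(-gamma * (t - t_post)) over [t_start, t_end], with t_post <= t_start."""
    return (math.exp(-gamma * (t_start - t_post))
            - math.exp(-gamma * (t_end - t_post))) / gamma

def posting_probability(weighted_posts, t_start, t_end, gamma):
    """Probability of at least one post in [t_start, t_end).

    weighted_posts: list of (t_p, weight_p) pairs for training posts, where
    weight_p is the excitation the post induces on the learner.
    """
    expected = sum(w * kernel_integral(tp, t_start, t_end, gamma)
                   for tp, w in weighted_posts)
    return 1.0 - math.exp(-expected)

def rank_threads(threads, t_start, t_end):
    """threads: dict thread_id -> (weighted_posts, gamma). Returns ids, most likely first."""
    score = {r: posting_probability(wp, t_start, t_end, g)
             for r, (wp, g) in threads.items()}
    return sorted(score, key=score.get, reverse=True)
```

Handling topic uncertainty would amount to averaging `posting_probability` over candidate topics, weighted by the topic posterior of each thread.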

Evaluation metric.

We evaluate recommendation performance using the standard mean average precision for top-N recommendation (MAP@N) metric. This metric is defined by taking the mean (over all learners who posted during the testing time interval) of the average precision

$$\mathrm{AP}_u@N = \frac{\sum_{n=1}^{N} P_u@n \cdot \mathbb{1}[r_u(n) \in S_u]}{\min\{|S_u|, N\}},$$

where $S_u$ denotes the set of threads learner $u$ posted in during the testing time interval, $r_u(n)$ denotes the $n$-th thread recommended to the learner, and $P_u@n$ denotes the precision at $n$, i.e., the fraction of threads among the top $n$ recommendations that the learner actually posted in. We use $N = 5$ in the first two experiments, and vary $N$ in the third experiment.
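A compact Python sketch of the metric is given below. It assumes the common convention of normalizing the average precision by the smaller of N and the number of threads the learner actually posted in; the function names and the dictionary-based inputs are illustrative choices.

```python
def average_precision_at_n(recommended, relevant, n):
    """AP@N for one learner: recommended is a ranked list, relevant a set of thread ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, thread in enumerate(recommended[:n], start=1):
        if thread in relevant:
            hits += 1
            total += hits / rank  # precision at this rank, counted only on hits
    return total / min(len(relevant), n)

def map_at_n(recommendations, ground_truth, n=5):
    """Mean AP@N over learners with at least one post in the test interval."""
    learners = [u for u, threads in ground_truth.items() if threads]
    if not learners:
        return 0.0
    return sum(average_precision_at_n(recommendations.get(u, []), ground_truth[u], n)
               for u in learners) / len(learners)
```

For example, a learner who posted in threads `a` and `c` and was recommended `[a, b, c]` contributes an average precision of (1/1 + 2/3) / 2.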

Figure 4. Plot of recommendation performance of our model over the number of topics on the ml dataset. The best performance is obtained at , though performance is stable for .
Topic   Half-life   Top words
1       4 hours     gradient, row, element, iteration, return, transpose, logistic, multiply, initial, regularization
2       4 hours     layer, classification, probability, neuron, unit, hidden, digit, nn, sigmoid, weight
3       1 day       interest, group, computer, Coursera, study, hello, everyone, student, learning, software
4       1 day       Coursera, deadline, professor, hard, score, certificate, review, experience, forum, material
5       1 week      screenshot, speed, player, subtitle, chrome, firefox, summary, reproduce, open, graph
Table 2. Estimated half-lives and highest constituent words (obtained by sorting the estimated topic-word distribution parameter vectors) for selected topics in the ml dataset with at least 100 threads. Different types of topics (course content-related, small-talk, or course logistics) exhibit different half-lives.

Results and discussion.

Fig. 2 plots the recommendation performance of our model and the baselines over different lengths of the training time window for each dataset. Overall, we see that our model significantly outperforms the baselines in each case, achieving 15%-400% improvement over the strongest baseline. (Note that these findings are consistent across each dataset. Moving forward, we present one dataset in each experiment unless differences are noteworthy.) The fact that PPS outperforms the SOC baseline confirms our hypothesis that in MOOC forums, learner topic preference is a stronger driver of posting behavior than social influence, consistent with the fact that most forums do not have an explicit social network (e.g., of friends or followers). The fact that PPS outperforms the AMF baseline emphasizes the benefit of the temporal element of point processes in capturing the dynamics in thread activities over time, compared to the (mostly) static matrix factorization-based algorithms. Note also that as the amount of training data increases in the first several weeks, the recommendation performance tends to increase for the point processes-based algorithms while decreasing for PPL and REC. The observed fluctuations can be explained by the decreasing numbers of learners in the test sets as courses progress, since they tend to drop out before the end (see also Fig. 6).

Fig. 3 plots the recommendation performance over different lengths of the testing time window for the algo dataset. As in Fig. 2, our model significantly outperforms every baseline. We also see that recommendation performance tends to decrease as the length of the testing time window increases, but while the performance of point process-based algorithms decays only slightly, the performance of the PPL and AMF baselines decreases significantly (by around 50%). This observation suggests that our model excels at modeling long-term learner posting behavior.

Finally, Fig. 4 plots the recommendation performance of the PPS model over different numbers of topics for the ml dataset, for different choices of , and . In each case, the performance rises slightly up to and then drops for larger values (when overfitting occurs). Overall, the performance is relatively robust to , for .

Figure 5. Direct comparison of our model against the AMF and PPL baselines using the experimental setup in (Yang et al., 2014) on the comp dataset. Our model again significantly outperforms both baselines.

4.3. Direct comparison with AMF

The MAP@5 values we obtained for both the AMF and PPL baselines are significantly less than those reported in (Yang et al., 2014), where AMF is proposed. To investigate this, we also perform a direct, head-to-head comparison between our model and these baselines under our closest possible replication of the experimental setting in (Yang et al., 2014). In particular, we train on threads that have non-zero activity between weeks and , fix the testing time window to , and set . Since the exact procedures used in (Yang et al., 2014) to select the latent dimension in the “content level model,” to select the number of close peers in the “social peer connections”, and to aggregate these two into a single model for matrix factorization in AMF are not clear, we sweep over a range of values for these parameters and choose the values that maximize the performance of AMF.

Fig. 5 compares the MAP@5 performance of our model against that of the PPL and AMF baselines for a range of values of on the comp dataset (as in previous experiments, results on the other two datasets are similar). We see again that our model significantly outperforms both AMF and PPL in each case. Moreover, while AMF consistently outperforms PPL in agreement with the results in (Yang et al., 2014), the MAP@5 values of both baselines are significantly lower than the values reported in (Yang et al., 2014). We also emphasize that a testing window of 1 week is too coarse a timescale for thread recommendation in the MOOC discussion forum setting, where new discussions may emerge on a daily basis due to the release of new learning content, homework assignments, or exams.
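For readers unfamiliar with the metric, the following is a minimal sketch of how MAP@5 can be computed from ranked recommendations. It assumes one common convention, normalizing each learner's average precision by min(number of relevant threads, k). The exact definition used in the experiments is not restated in this section, so treat the normalization and the function names as assumptions.

```python
from typing import Dict, Hashable, Sequence, Set


def average_precision_at_k(ranked: Sequence[Hashable],
                           relevant: Set[Hashable],
                           k: int = 5) -> float:
    """AP@k for one learner: precision at each hit within the top k, averaged."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, thread in enumerate(ranked[:k], start=1):
        if thread in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumed normalization: the best achievable number of hits in the top k.
    return precision_sum / min(len(relevant), k)


def map_at_k(rankings: Dict[Hashable, Sequence[Hashable]],
             ground_truth: Dict[Hashable, Set[Hashable]],
             k: int = 5) -> float:
    """Mean of AP@k over all learners that appear in the ground truth."""
    scores = [
        average_precision_at_k(rankings.get(learner, []), relevant, k)
        for learner, relevant in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: thread IDs each learner actually posted in during the test window.
rankings = {"u1": ["t1", "t2", "t3"], "u2": ["t9", "t4"]}
ground_truth = {"u1": {"t1", "t3"}, "u2": {"t4"}}
print(map_at_k(rankings, ground_truth, k=5))
```

Because the score rewards relevant threads near the top of the list, a longer testing window changes which threads count as relevant. This is one reason the choice of window length matters when comparing reported values.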

4.4. Model analytics

Beyond thread recommendation, we also explore a few types of analytics that our trained model parameters can provide. For this experiment, we set in order to achieve finer granularity in the topics; we found that this leads to more useful analytics.

Figure 6. Plot of the total number of posts on each topic week-by-week in the ml dataset. The week-to-week activity levels vary significantly across topics.

Dataset                      ml     algo   comp
New activity notifications   29.0   23.3   33.6
Explicit replies             19.2   12.2   10.6
Table 3. Estimated levels of additional excitation brought by new activity notifications and explicit replies.

Topic timescales and thread categories.

Table 2 shows the estimated half-lives and most representative words for five selected topics in the ml dataset that are associated with at least 100 threads. Fig. 6 plots the total number of posts made on these topics each week during the course.

We observe topics with half-lives ranging from hours to weeks. We can use these timescales to categorize threads: course content-related topics (Topics 1 and 2) mostly have short half-lives of hours, small-talk topics (Topics 3 and 4) stay active for longer with half-lives of around one day, and course logistics topics (Topic 5) have much longer half-lives of around one week. Activities in threads on course content-related topics develop and decay rapidly, since they are most likely spurred by specific course materials or assignments. For example, posts on Topic 1 are about implementing gradient descent, which is covered in the second and third weeks of the course, and posts on Topic 2 are about neural networks, which are covered in the fourth and fifth weeks. Small-talk discussions are extremely common at the beginning and the end of the course, while course logistics discussions (e.g., concerning technical issues) are less frequent but steady in volume throughout the course.
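As an illustration of how a half-life relates to a decay parameter, the sketch below assumes an exponential decay kernel of the form exp(-rate * t), for which the half-life is ln(2) / rate. The kernel form, the cut-off values of 12 and 72 hours, and the function names are assumptions made for this example. They are chosen only to separate the three timescales discussed above (hours, about one day, about one week).

```python
import math


def half_life_hours(decay_rate_per_hour: float) -> float:
    """Half-life of an exponential kernel exp(-rate * t), in hours."""
    if decay_rate_per_hour <= 0:
        raise ValueError("decay rate must be positive")
    return math.log(2) / decay_rate_per_hour


def categorize(half_life: float) -> str:
    """Map a half-life in hours to a coarse thread category.

    The 12-hour and 72-hour thresholds are illustrative assumptions.
    """
    if half_life < 12:
        return "course content"
    if half_life < 72:
        return "small talk"
    return "course logistics"


# A rate of ln(2)/24 per hour corresponds to a one-day half-life.
one_day_rate = math.log(2) / 24
print(half_life_hours(one_day_rate), categorize(half_life_hours(one_day_rate)))
```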

Excitation from notifications.

Table 3 shows the estimated additional excitation induced by new activity notifications and explicit replies. In each course, we see that notifications increase the likelihood of participation significantly; for example, in ml, a learner’s likelihood of posting after an explicit reply is 473 times higher than without any notification. Notice also that the excitation from explicit replies is lowest while the excitation from new activity notifications is highest in comp. This observation is consistent with the fact that in humanities courses like comp the discussions in each thread will tend to be longer (Brinton et al., 2016), leading to more new activity notifications, while in engineering courses like ml and algo we would expect learners to more directly answer each other’s questions, leading to more explicit replies.
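To make the notion of additional excitation concrete, the sketch below assumes that each past post contributes an exponentially decaying term to a learner's posting rate, and that this term is multiplied by one factor if the learner received a new activity notification and by another if the post was an explicit reply to the learner. The multiplicative form, the names `alpha` and `beta`, and the numeric values are assumptions for illustration only. They are not the estimates in Table 3.

```python
import math


def excitation(t, past_posts, decay_rate, alpha, beta):
    """Total excitation at time t from posts made strictly before t.

    Each past post is a tuple (post_time, notified, replied). Its decayed
    contribution is scaled by `alpha` if it triggered a new activity
    notification and by `beta` if it was an explicit reply (assumed form).
    """
    total = 0.0
    for post_time, notified, replied in past_posts:
        if post_time >= t:
            continue  # posts at or after t cannot excite the rate at t
        scale = (alpha if notified else 1.0) * (beta if replied else 1.0)
        total += scale * math.exp(-decay_rate * (t - post_time))
    return total


# Illustrative values only.
plain = excitation(1.0, [(0.0, False, False)], decay_rate=0.5, alpha=2.0, beta=3.0)
both = excitation(1.0, [(0.0, True, True)], decay_rate=0.5, alpha=2.0, beta=3.0)
print(both / plain)
```

Under this assumed form, the ratio between the two cases depends only on the scaling factors and not on the elapsed time, which is why such factors can be read as "times more likely" statements.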

5. Conclusions and Future Work

In this paper, we proposed a point process-based probabilistic model for MOOC discussion forum posts, and demonstrated its performance in thread recommendation and analytics using real-world datasets. Possible avenues of future work include (i) jointly analyzing discussion forum data and time-varying learner grades (Lan et al., 2014, 2013) to better quantify the “flow of knowledge” between learners, (ii) incorporating up-votes and down-votes on the posts into the model, and (iii) leveraging the course syllabus to better model the emergence of new threads.


References
  • Blei et al. (2003) D. Blei, A. Ng, and M. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (Jan. 2003), 993–1022.
  • Brinton et al. (2016) C. G. Brinton, S. Buccapatnam, F. Wong, M. Chiang, and H. V. Poor. 2016. Social learning networks: Efficiency optimization for MOOC forums. In Proc. IEEE Conf. Comput. Commun. 1–9.
  • Brinton et al. (2014) C. G. Brinton, M. Chiang, S. Jain, H. Lam, Z. Liu, and F. Wong. 2014. Learning about social learning in MOOCs: From statistical analysis to generative model. IEEE Trans. Learn. Technol. 7, 4 (Oct. 2014), 346–359.
  • Brusilovsky et al. (2016) P. Brusilovsky, S. Somyürek, J. Guerra, R. Hosseini, V. Zadorozhny, and P. J. Durlach. 2016. Open social student modeling for personalized learning. IEEE Trans. Emerg. Topics Comput. 4, 3 (July 2016), 450–461.
  • Cong et al. (2008) G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. 2008. Finding question-answer pairs from online forums. In Proc. ACM SIGIR Conf. Res. Dev. Inf. Retr. 467–474.
  • Daley and Vere-Jones (2003) D. J. Daley and D. Vere-Jones. 2003. An Introduction to the Theory of Point Processes. Springer.
  • Farajtabar et al. (2015) M. Farajtabar, S. Yousefi, L. Tran, L. Song, and H. Zha. 2015. A Continuous-time mutually-exciting point process framework for prioritizing events in social media. arXiv preprint arXiv:1511.04145 (Nov. 2015).
  • Gillani et al. (2014) N. Gillani, R. Eynon, M. Osborne, I. Hjorth, and S. Roberts. 2014. Communication communities in MOOCs. arXiv preprint arXiv:1403.4640 (Mar. 2014).
  • Kardan et al. (2017) A. Kardan, A. Narimani, and F. Ataiefard. 2017. A hybrid approach for thread recommendation in MOOC forums. International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering 11, 10 (2017), 2195–2201.
  • Kim and Leskovec (2011) M. Kim and J. Leskovec. 2011. The network completion problem: Inferring missing nodes and edges in networks. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 47–58.
  • Lan et al. (2014) A. S. Lan, C. Studer, and R. G. Baraniuk. 2014. Time-varying learning and content analytics via sparse factor analysis. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 452–461.
  • Lan et al. (2013) A. S. Lan, C. Studer, A. E. Waters, and R. G. Baraniuk. 2013. Joint topic modeling and factor analysis of textual information and graded response data. In Proc. 6th Intl. Conf. Educ. Data Min. 324–325.
  • Linderman and Adams (2014) S. Linderman and R. Adams. 2014. Discovering latent network structure in point process data. In Proc. Intl. Conf. Mach. Learn. 1413–1421.
  • Luo et al. (2015) D. Luo, H. Xu, Y. Zhen, X. Ning, H. Zha, X. Yang, and W. Zhang. 2015. Multi-task multi-dimensional Hawkes processes for modeling event sequences. In Proc. Intl. Joint Conf. Artif. Intell. 3685–3691.
  • Mi and Faltings (2017) F. Mi and B. Faltings. 2017. Adaptive sequential recommendation for discussion forums on MOOCs using context trees. In Proc. Intl. Conf. Educ. Data Min. 24–31.
  • Miller et al. (2009) K. Miller, M. I. Jordan, and T. L. Griffiths. 2009. Nonparametric latent feature models for link prediction. In Proc. Adv. Neural Inform. Process. Syst. 1276–1284.
  • Mozer and Lindsey (2016) M. Mozer and R. Lindsey. 2016. Neural Hawkes process memories. In NIPS Symp. Recurrent Neural Netw.
  • Ramesh et al. (2014) A. Ramesh, D. Goldwasser, B. Huang, H. Daumé III, and L. Getoor. 2014. Understanding MOOC discussion forums using seeded LDA. In Proc. Conf. Assoc. Comput. Linguist. 28–33.
  • Ramesh et al. (2015) A. Ramesh, S. Kumar, J. Foulds, and L. Getoor. 2015. Weakly supervised models of aspect-sentiment for online course discussion forums. In Proc. Conf. Assoc. Comput. Linguist. 74–83.
  • Ritter et al. (2010) A. Ritter, C. Cherry, and B. Dolan. 2010. Unsupervised modeling of Twitter conversations. In Proc. Human Lang. Technol. 172–180.
  • Shi et al. (2009) X. Shi, J. Zhu, R. Cai, and L. Zhang. 2009. User grouping behavior in online forums. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 777–786.
  • Simma and Jordan (2010) A. Simma and M. I. Jordan. 2010. Modeling events with cascades of Poisson processes. In Proc. Conf. Uncertain. Artif. Intell. 546–555.
  • Tomkins et al. (2016) S. Tomkins, A. Ramesh, and L. Getoor. 2016. Predicting post-test performance from online student behavior: A high school MOOC case study. In Proc. Intl. Conf. Educ. Data Min. 239–246.
  • Wang et al. (2011) H. Wang, C. Wang, C. Zhai, and J. Han. 2011. Learning online discussion structures by conditional random fields. In Proc. ACM SIGIR Conf. Res. Dev. Inf. Retr. 435–444.
  • Wang et al. (2015) X. Wang, D. Yang, M. Wen, K. Koedinger, and C. Rosé. 2015. Investigating how student’s cognitive behavior in MOOC discussion forums affect learning gains. In Proc. Intl. Conf. Educ. Data Min. 226–233.
  • Wu et al. (2010) H. Wu, J. Bu, C. Chen, C. Wang, G. Qiu, L. Zhang, and J. Shen. 2010. Modeling dynamic multi-topic discussions in online forums. In Proc. Conf. Am. Assoc. Artif. Intell. 1455–1460.
  • Yang et al. (2014) D. Yang, M. Piergallini, I. Howley, and C. Rose. 2014. Forum thread recommendation for massive open online courses. In Proc. Intl. Conf. Educ. Data Min. 257–260.