1. Introduction
Online discussion forums have gained substantial traction over the past decade, and are now a significant avenue of knowledge sharing on the Internet. Attracting learners with diverse interests and backgrounds, some platforms (e.g., Stack Overflow, MathOverflow) target specific technical subjects, while others (e.g., Quora, Reddit) cover a wide range of topics from politics to entertainment.
More recently, discussion forums have become a significant component of online education, enabling students in online courses to learn socially as a supplement to their studying of the course content individually (Brinton et al., 2016); social interactions between learners have been seen to improve learning outcomes (Brusilovsky et al., 2016). In particular, massive open online courses (MOOCs) often have tens of thousands of learners within single sessions, making the social interactions via these forums critical to scaling up instruction (Brinton et al., 2014). In addition to serving as a versatile complement to self-regulated learning (Tomkins et al., 2016), research has shown that learner participation on forums can be predictive of learning outcomes (Wang et al., 2015).
In this paper, we ask: How can we model the activity of individual learners in MOOC discussion forums? Such a model, designed correctly, presents several opportunities to optimize the learning process, including personalized news feeds to help learners sort through forum content efficiently, and analytics on factors driving participation.
1.1. Prior work on discussion forums
Generic online discussion sites.
There is vast literature on analyzing user interactions in online social networks (e.g., on Facebook, Google+, and Twitter). Researchers have developed methods for tasks including link prediction (Kim and Leskovec, 2011; Miller et al., 2009), tweet cascade analysis (Farajtabar et al., 2015; Simma and Jordan, 2010), post topic analysis (Ritter et al., 2010)
, and latent network structure estimation
(Linderman and Adams, 2014; Luo et al., 2015). These methods are not directly applicable to modeling MOOC discussion forums since MOOCs do not support an inherent social structure; learners cannot become “friends” or “follow” one another.

Generic online discussion forums (e.g., Stack Overflow, Quora) have also generated substantial research. Researchers have developed methods for tasks including question-answer pair extraction (Cong et al., 2008), topic dynamics analysis (Wu et al., 2010), post structure analysis (Wang et al., 2011), and user grouping (Shi et al., 2009). While these types of forums also lack explicit social structure, MOOC discussion forums exhibit several unique characteristics that need to be accounted for. First, topics in MOOC discussion forums are mostly centered around course content, assignments, and course logistics (Brinton et al., 2014), making them far more structured than generic forums; thus, topic modeling can be used to organize threads and predict future activity. Second, there are no subforums in MOOCs: learners all post in the same venue even though their interests in the course vary. Modeling individual interest levels on each topic can thus assist learners in navigating through posts.
MOOC forums.
A few studies on MOOC discussion forums have emerged recently. The works in (Ramesh et al., 2014; Ramesh et al., 2015) extracted forum structure and post sentiment information by combining unsupervised topic models with sets of expert-specified course keywords. In this work, our objective is to model learners’ forum behavior, which requires analyzing not only the content of posts but also individual learner interests and the temporal dynamics of the posts.
In terms of learner modeling, the work in (Gillani et al., 2014) employed Bayesian nonnegative matrix factorization to group learners into communities according to their posting behavior. This work relies on topic labels of each discussion post, though, which are either not available or not reliable in most MOOC forums. The work in (Brinton et al., 2016) inferred learners’ topic-specific seeking and disseminating tendencies on forums to quantify the efficiency of social learning networks. However, this work relies on separate models for learners and topics, whereas we propose a unified model. The work in (Kardan et al., 2017) couples social network analysis and association rule mining for thread recommendation; while their approach considers social interactions among learners, they ignore the content and timing of posts.
As for modeling temporal dynamics, the work in (Brinton et al., 2014)
proposed a method that classifies threads into different categories (e.g., small talk, course-specific) and ranks thread relevance for learners over time. This model falls short of making recommendations, though, since it does not consider learners individually. The work in
(Yang et al., 2014) employed matrix factorization for thread recommendation and studied the effect of window size, i.e., recommending only threads with posts in a recent time window. However, this model uses temporal information only in post-processing, which limits the insights it offers. The work in (Mi and Faltings, 2017) focuses on learner thread viewing rather than posting behavior, which is different from our study of social interactions since learners view threads independently.

The model proposed in (Mozer and Lindsey, 2016) is perhaps most similar to ours, as it uses point processes to analyze discussion forum posts and associates different timescales with different types of posts to reflect recurring user behavior. With the task of predicting which Reddit subforum a user will post in next, the authors base their point process model on self-excitations, as such behavior is mostly driven by a user’s own posting history. Our task, on the contrary, is to recommend threads to learners taking a particular online course: here, excitations induced by other learners (e.g., explicit replies) can significantly affect a learner’s posting behavior. As a result, the model we develop incorporates mutual excitation. Moreover, (Mozer and Lindsey, 2016) labels each post based on the Reddit subforum it belongs to; no such subforums exist in MOOCs.
1.2. Our model and contributions
In this paper, we propose and experimentally validate a probabilistic model for learners posting on MOOC discussion forums. Our main contributions are as follows.
First, through point processes, our model captures several important factors that influence a learner’s decision to post. In particular, it models the probability that a learner makes a post in a thread at a particular point in time based on four key factors: (i) the interest level of the learner on the topic of the thread, (ii) the timescale of the thread topic (which corresponds to how fast the excitation induced by new posts on the topic decays over time), (iii) the timing of the previous posts in the thread, and (iv) the nature of the previous posts regarding this learner (e.g., whether they explicitly reply to the learner). Through evaluation on three real-world datasets—the largest having more than 6,000 learners making more than 40,000 posts in more than 5,000 threads—we show that our model significantly outperforms several baselines in terms of thread recommendation, thus showing promise of being able to direct learners to threads they are interested in.
Second, we derive a Gibbs sampling parameter inference algorithm for our model. While existing work has relied on thread labels to identify forum topics, such metadata is usually not available for MOOC forum threads. As a result, we jointly analyze the post timestamp information and the text of the thread by coupling the point process model with a topic model, enabling us to learn the topics and other latent variables through a single procedure.
Third, we demonstrate several types of analytics that our model parameters can provide, using our datasets as examples. These include: (i) identifying the timescales (measured as half-lives) of different topics, from which we find that course logistics-related topics have the longest-lasting excitations, (ii) showing that learners are much (20-30 times) more likely to post again in threads they have already posted in, and (iii) showing that learners receiving explicit replies in threads are much (300-500 times) more likely to post again in these threads to respond to these replies.
2. Point Processes Forum Model
An online course discussion forum is generally comprised of a series of threads, with each thread containing a sequence of posts and comments on posts. Each post/comment contains a body of text, written by a particular learner at a particular point in time. A thread can further be associated with a topic, based on analysis of the text written in the thread. Figure 1 (top) shows an example of a thread in a MOOC consisting of eight posts and comments. Moving forward, the terminology “posting in a thread” will refer to a learner writing either a post or a comment.
We postulate that a learner’s decision to post in a thread at a certain point in time is driven by four main factors: (i) the learner’s interest in the thread’s topic, (ii) the timescale of the thread’s topic, (iii) the number and timing of previous posts in the thread, and (iv) the learner’s prior activity in the thread (e.g., whether there are posts that explicitly reply to the learner). The first factor is consistent with the fact that MOOC forums generally have no subforums: in the presence of diverse threads, learners are most likely to post in those covering topics they are interested in. The second factor reflects the observation that different topics exhibit different patterns of temporal dynamics. The third factor captures the common options for thread ranking that online forums provide to users, e.g., by popularity or recency; learners are more likely to visit threads at the top of these rankings. The fourth factor captures the common setup of notifications in discussion forums: learners are typically subscribed to threads automatically once they post in them, and notified of any new posts (especially those that explicitly reply to them) in these threads. To capture these dynamics, we model learners’ posts in threads as events in temporal point processes (Daley and Vere-Jones, 2003), which will be described next.
Point processes.
A temporal point process, a generalization of the Poisson process, is characterized by a rate function that models the probability that an event will happen in an infinitesimal time window (Daley and Vere-Jones, 2003). Formally, the rate function at time t is given by

λ(t) = lim_{Δt→0} P(N(t + Δt) − N(t) = 1) / Δt,   (1)

where N(t) denotes the number of events up to time t (Daley and Vere-Jones, 2003). Assuming the time period of interest is [0, T], the likelihood of a series of events at times t_1 < t_2 < … < t_n is given by:

ℓ = (Π_{i=1}^{n} λ(t_i)) · exp(−∫_0^T λ(τ) dτ).   (2)
In this paper, we are interested in rate functions that are affected by excitations of past events (e.g., forum posts in the same thread). Thus, we resort to Hawkes processes (Mozer and Lindsey, 2016), which characterize the rate function at time t, given a series of past events at times t_1, …, t_n, as

λ(t) = μ + a · Σ_{i: t_i < t} κ(t − t_i),

where μ denotes the constant background rate, a denotes the amount of excitation each event induces, i.e., the increase in the rate function after an event (a is sometimes referred to in the literature as the impulse response (Linderman and Adams, 2014)), and κ(·) denotes a non-increasing decay kernel that controls the decay in the excitation of past events over time. In this paper, we use the standard exponential decay kernel κ(t − t_i) = e^{−δ(t − t_i)}, where δ > 0 denotes the decay rate. Through our model, different decay rates can be associated with different topics (Mozer and Lindsey, 2016); as we will see, this model choice enables us to categorize posts into groups (e.g., course content-related, small talk, or course logistics) based on their timescales, which leads to better model analytics.
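As a concrete illustration, the Hawkes rate above can be evaluated directly from the past event times; the following is a minimal sketch (the function and argument names are ours, not from the paper):

```python
import math

def hawkes_rate(t, event_times, mu, a, delta):
    """Hawkes process rate with exponential decay kernel:
    lambda(t) = mu + a * sum over past events t_i < t of exp(-delta * (t - t_i))."""
    return mu + a * sum(math.exp(-delta * (t - ti))
                        for ti in event_times if ti < t)
```

With no past events the rate reduces to the background rate μ; each past event adds an excitation a that decays at rate δ.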
Rate function for new posts.
Let U, K, and Q denote the number of learners, topics, and threads in a discussion forum, indexed by u, k, and q, respectively. We assume that each thread functions independently, and that each learner’s activities in each thread and on each topic are independent. Further, let k_q denote the topic of thread q, and let N_q denote the total number of posts in the thread, indexed by i; for each post i, we use u_{q,i} and t_{q,i} to denote the learner index and time of the post, and we use t′_{u,q} to denote the time of the first post learner u makes in thread q. Note that posts in a thread are indexed in chronological order, i.e., i < i′ if and only if t_{q,i} < t_{q,i′}. Finally, let δ_k denote the decay rate of topic k and let a_{u,k} denote the interest level of learner u on topic k. We model the rate function that characterizes learner u posting in thread q (on topic k_q) at time t, given all previous posts in the thread (i.e., posts i with t_{q,i} < t), as

λ_{u,q}(t) = a_{u,k_q} · Σ_{i: t_{q,i} < t} ω^{1[t_{q,i} > t′_{u,q}]} · (ω′)^{1[u ∈ R_{q,i}]} · e^{−δ_{k_q}(t − t_{q,i})},   (3)

where 1[c] denotes the indicator function that takes the value 1 when condition c holds and 0 otherwise, and R_{q,i} denotes the set of explicit recipients of post i (defined below).
In our model, a_{u,k_q} characterizes the base level of excitation that learner u receives from posts in threads on topic k_q, which captures the different interest levels of learners on different topics. The exponential decay kernel models a topic-specific decay in excitation at rate δ_{k_q} from the time of the post.

Before t′_{u,q} (the timestamp of the first post learner u makes in thread q), learner u’s rate is given solely by the number and recency of posts in q (t′_{u,q} = ∞ if the learner never posts in this thread), while all posts occurring after t′_{u,q} induce additional excitation characterized by the scalar variable ω. This model choice captures the common setup in MOOC forums that learners are automatically subscribed to threads after they post in them. Therefore, we postulate that ω > 1, since new post notifications that come with thread subscriptions tend to increase a learner’s chance of viewing these new posts, in turn increasing their likelihood of posting again in these threads. The observation of users posting immediately after receiving notifications is sometimes referred to as the “bursty” nature of posts on social media (Farajtabar et al., 2015).

We further separate posts made after t′_{u,q} by whether or not they constitute explicit replies to learner u. A post i′ is considered to be an explicit reply to a post i in the same thread if i′ > i and one of the following conditions is met: (i) i′ makes direct reference (e.g., through name or the @ symbol) to the learner who made post i, or (ii) i′ is the first comment under i. (In this work, we restrict ourselves to these two concrete types of explicit replies; analyzing other, more ambiguous types is left for future work.) R_{q,i} in (3) denotes the set of explicit recipients of post i, i.e., if post i is an explicit reply to learner u, then u ∈ R_{q,i}, while if post i is not an explicit reply to any learners then R_{q,i} = ∅. This setup captures the common case of learners being notified of posts that explicitly reply to them in a thread. The scalar ω′ characterizes the additional excitation these replies induce; we postulate that ω′ > 1, i.e., the personal nature of explicit replies to learners’ posts tends to further increase the likelihood of them posting again in the thread (e.g., to address these explicit replies).
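Putting the pieces together, the rate function (3) can be sketched as follows; this is an illustration under our notation (all names are ours), with the subscription and explicit-reply effects entering as multiplicative factors:

```python
import math

def post_rate(t, u, posts, a_uk, delta_k, omega, omega_p, first_time, recipients):
    """Sketch of the per-learner thread rate: each past post contributes
    a_uk * exp(-delta_k * (t - t_i)), multiplied by omega if it arrived
    after the learner's first post in the thread (subscription
    notification) and by omega_p if it explicitly replies to the learner.
    posts: list of (poster, time); first_time: time of u's first post in
    the thread (math.inf if none); recipients[i]: explicit recipients of post i."""
    rate = 0.0
    for (poster, ti), recip in zip(posts, recipients):
        if ti >= t:
            continue
        term = a_uk * math.exp(-delta_k * (t - ti))
        if ti > first_time:   # post arrived after u subscribed to the thread
            term *= omega
        if u in recip:        # post explicitly replies to u
            term *= omega_p
        rate += term
    return rate
```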
Rate function for initial posts.
We must also model the process of generating the initial posts in threads. We characterize the rate function of these posts as time-invariant:

λ⁰_{u,k}(t) = μ_{u,k},   (4)

where μ_{u,k} denotes the background posting rate of learner u on topic k. Separating the initial posts in threads from future posts in this way enables us to model learners’ knowledge seeking (i.e., starting threads) and knowledge disseminating (i.e., posting responses in threads) behavior (Brinton et al., 2016), through the background (μ_{u,k}) and excitation (a_{u,k}) levels, respectively.
Post text modeling.
Finally, we must also model the text of each thread. Given the topic k_q of thread q, we model W_q—the bag-of-words representation of the text in q across all posts—as being generated from the standard latent Dirichlet allocation (LDA) model (Blei et al., 2003), with topic-word distributions parameterized by φ_k. Details on the LDA model and the posterior inference step for φ_k via collapsed Gibbs sampling in our parameter inference algorithm are omitted for simplicity of exposition.
Model intuition.
Intuitively, a learner will browse existing threads in the discussion forum when they are interested in a particular topic. If a relevant thread exists, they may make their first post there (e.g., Comment 1 by John under Post 2, in Figure 1), with the rate at which this occurs being governed by the previous activity in the thread (posts at times t_{q,1}, t_{q,2}, …) and the learner’s interest level in the topic of the thread (a_{u,k_q}). Together with the exponential decay kernel, this model setting reflects the observation that discussion forum threads are often sorted by recency (the time of the last post) and popularity (typically quantified by the number of replies). Additionally or alternatively, if no such thread exists, the learner may decide to start a new thread on the topic (e.g., Post 1 by Bob), depending on their background rate (μ_{u,k}). Once the learner has posted in a thread, they will receive notifications of new posts there (e.g., Lily will be notified of Post 4), which induces higher levels of excitation (ω); the personal nature of explicit replies to their posts (e.g., Anne’s mention of John in Comment 3 under Post 2) will induce even higher levels of excitation (ω′).
3. Parameter Inference
We now derive the parameter inference algorithm for our model. We perform inference using Gibbs sampling, i.e., iteratively sampling from the posterior distributions of each latent variable, conditioned on the other latent variables. The detailed steps are as follows:

To sample from the posterior distribution of the topic of each thread, k_q, we put a uniform prior over each topic and arrive at the posterior

p(k_q = k | Θ_{−k_q}) ∝ p(W_q | k_q = k) · ℓ⁰_q(k) · Π_u ℓ_{u,q}(k),

where Θ_{−k_q} denotes all variables except k_q. p(W_q | k_q = k) denotes the likelihood of observing the text of thread q given its topic. ℓ⁰_q(k) denotes the likelihood of observing the sequence of initial thread posts on topic k made by the learner u_{q,1} who also made the initial post in thread q (if u_{q,1} is not the initial poster in any other thread q′ with k_{q′} = k, this sequence contains thread q alone); this is given by substituting (4) into (2) as

ℓ⁰_q(k) = μ_{u_{q,1},k}^{Σ_{q′} 1[u_{q′,1} = u_{q,1}] · 1[k_{q′} = k]} · e^{−μ_{u_{q,1},k} T},   (5)

where 1[c] denotes the indicator function that takes the value 1 when condition c holds and 0 otherwise. ℓ_{u,q}(k) denotes the likelihood of observing the sequence of posts made by learner u in thread q (if u has not posted in q, this reduces to the exponential term alone), given by

ℓ_{u,q}(k) = (Π_{i: u_{q,i} = u} λ_{u,q}(t_{q,i})) · exp(−∫_0^T λ_{u,q}(τ) dτ),   (6)

where the rate function λ_{u,q}(·) for learner u in thread q (with topic k_q = k) is given by (3).

There is no conjugate prior distribution for the excitation decay rate variable δ_k. Therefore, we resort to a predefined set of decay rates {δ̄_1, …, δ̄_S}. We put a uniform prior on δ_k over the values in this set, and arrive at the posterior given by

p(δ_k = δ̄_s | Θ_{−δ_k}) ∝ Π_{q: k_q = k} Π_u ℓ_{u,q}(k),

where ℓ_{u,q}(k) denotes the likelihood of observing the sequence of posts made by learner u in thread q, given by substituting (3) into (2).
The conjugate prior of the learner background posting rate variable μ_{u,k} is the Gamma distribution. Therefore, we put a prior on μ_{u,k} as μ_{u,k} ∼ Gamma(α, β) and arrive at the posterior distribution μ_{u,k} | Θ_{−μ_{u,k}} ∼ Gamma(α̃, β̃), where

α̃ = α + Σ_q 1[u_{q,1} = u] · 1[k_q = k],  β̃ = β + T.

The latent variables ω and ω′ have no conjugate priors. As a result, we introduce an auxiliary latent variable (Linderman and Adams, 2014; Simma and Jordan, 2010) z_{q,i} for each post i, where z_{q,i} = i′ means that post i′ is the “parent” of post i in thread q, i.e., post i was caused by the excitation that the previous post i′ induced. We first sample the parent variable for each post according to

p(z_{q,i} = i′ | Θ_{−z_{q,i}}) ∝ φ_{q,i,i′} · e^{−δ_{k_q}(t_{q,i} − t_{q,i′})},  i′ < i,

where φ_{q,i,i′} ∈ {a_{u_{q,i},k_q}, a_{u_{q,i},k_q}·ω, a_{u_{q,i},k_q}·ω′, a_{u_{q,i},k_q}·ω·ω′} depending on the relationship between posts i and i′ from our model, i.e., whether post i is the first post of learner u_{q,i} in the thread, and if not, whether post i′ is an explicit reply to learner u_{q,i}. In general, the set of possible parents of post i is all prior posts in q, but in practice, we make use of the structure of each thread to narrow down the set of possible parents for some posts. (For example, in Fig. 1, Post 2 is the only possible parent post of Comment 1 below it, as Comment 1 is an explicit reply to Post 2.) We omit the details of this step for simplicity of exposition.
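The parent-sampling step amounts to drawing a categorical variable whose weights combine the model coefficient of each candidate parent with the decayed kernel; a minimal sketch (coeffs[j] stands for the coefficient of candidate parent j, i.e., the interest level times any notification/reply multipliers; names are ours):

```python
import math
import random

def sample_parent(i, times, coeffs, delta):
    """Sample the parent of post i among prior posts j < i, with probability
    proportional to coeffs[j] * exp(-delta * (times[i] - times[j]))."""
    weights = [coeffs[j] * math.exp(-delta * (times[i] - times[j]))
               for j in range(i)]
    r = random.random() * sum(weights)
    for j, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            return j
    return i - 1  # guard against floating-point round-off
```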
With these parent variables, we can write ℓ_{u,q}, the likelihood of the series of posts learner u makes in thread q, as

ℓ_{u,q} = (Π_{i: u_{q,i} = u, i > 1} φ_{q,i,z_{q,i}} · e^{−δ_{k_q}(t_{q,i} − t_{q,z_{q,i}})}) · exp(−∫_0^T λ_{u,q}(τ) dτ).

We can then expand the exponential term using the closed-form integral of the exponential decay kernel as

∫_0^T λ_{u,q}(τ) dτ = a_{u,k_q} · Σ_{i=1}^{N_q} ω^{1[t_{q,i} > t′_{u,q}]} · (ω′)^{1[u ∈ R_{q,i}]} · (1 − e^{−δ_{k_q}(T − t_{q,i})}) / δ_{k_q}.

We now see that Gamma distributions are conjugate priors for a_{u,k}, ω, and ω′. Specifically, if a_{u,k} ∼ Gamma(α_a, β_a), its posterior is given by Gamma(α̃_a, β̃_a), where

α̃_a = α_a + Σ_{q: k_q = k} Σ_{i > 1} 1[u_{q,i} = u],
β̃_a = β_a + Σ_{q: k_q = k} Σ_i ω^{1[t_{q,i} > t′_{u,q}]} · (ω′)^{1[u ∈ R_{q,i}]} · (1 − e^{−δ_k(T − t_{q,i})}) / δ_k.

Similarly, if ω ∼ Gamma(α_ω, β_ω), the posterior is Gamma(α̃_ω, β̃_ω), where

α̃_ω = α_ω + Σ_q Σ_i 1[t_{q,z_{q,i}} > t′_{u_{q,i},q}],
β̃_ω = β_ω + Σ_q Σ_u a_{u,k_q} Σ_{i: t_{q,i} > t′_{u,q}} (ω′)^{1[u ∈ R_{q,i}]} · (1 − e^{−δ_{k_q}(T − t_{q,i})}) / δ_{k_q}.

Finally, if ω′ ∼ Gamma(α_{ω′}, β_{ω′}), the posterior is Gamma(α̃_{ω′}, β̃_{ω′}), where

α̃_{ω′} = α_{ω′} + Σ_q Σ_i 1[u_{q,i} ∈ R_{q,z_{q,i}}],
β̃_{ω′} = β_{ω′} + Σ_q Σ_u a_{u,k_q} Σ_{i: u ∈ R_{q,i}} ω^{1[t_{q,i} > t′_{u,q}]} · (1 − e^{−δ_{k_q}(T − t_{q,i})}) / δ_{k_q}.
We iterate the sampling steps 1–4 above after randomly initializing the latent variables according to their prior distributions. After a burnin period, we take samples from the posterior distribution of each variable over multiple iterations, and use the average of these samples as its estimate.
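The overall procedure above can be sketched generically as follows; the samplers dictionary stands in for the four conditional-sampling steps, and all names are illustrative:

```python
def gibbs(init, samplers, num_iters, burn_in):
    """Skeleton of the Gibbs sampler: repeatedly draw each latent variable
    from its conditional posterior given the others, discard burn-in draws,
    and average the remaining samples as point estimates."""
    state = dict(init)
    kept = []
    for it in range(num_iters):
        for name, draw in samplers.items():
            state[name] = draw(state)  # sample conditioned on the rest
        if it >= burn_in:
            kept.append(dict(state))
    # posterior-mean estimate of each variable
    return {name: sum(s[name] for s in kept) / len(kept) for name in state}
```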
4. Experiments
In this section, we experimentally validate our proposed model using three real-world MOOC discussion forum datasets. In particular, we first show that our model obtains substantial gains in thread recommendation performance over several baselines. Subsequently, we demonstrate the analytics on forum content and learner behavior that our model offers.
4.1. Datasets
We obtained three discussion forum datasets from 2012 offerings of MOOCs on Coursera: Machine Learning (ml), Algorithms, Part I (algo), and English Composition I (comp). The number of threads, posts, and learners appearing in the forums, and the duration (the number of weeks with nonzero discussion forum activity) of the courses are given in Table 1.

Dataset  Threads  Posts  Learners  Weeks
ml  5,310  40,050  6,604  15
algo  1,323  9,274  1,833  9
comp  4,860  17,562  3,060  14
Prior to experimentation, we perform a series of preprocessing steps. First, we prepare the text for topic modeling by (i) removing non-ASCII characters, URL links, punctuation, and words that contain digits, (ii) converting nouns and verbs to base forms, (iii) removing stopwords (we use the stopword list in the Python natural language toolkit, http://www.nltk.org/, which covers 15 languages), and (iv) removing words that appear fewer than 10 times or in more than 10% of threads. Second, we extract the following information for each post: (i) the ID of the learner who made the post (u_{q,i}), (ii) the timestamp of the post (t_{q,i}), and (iii) the set of learners it explicitly replies to as defined in the model (R_{q,i}). For posts made anonymously, we do not include rates for them when computing the likelihood of a thread, but we do include them as sources of excitation for non-anonymous learners in the thread.
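A minimal sketch of part of this text cleanup, covering steps (i) and (iii); it assumes a small stand-in stopword list rather than the NLTK list used in the experiments, and omits lemmatization (step (ii)) and corpus-level frequency filtering (step (iv)):

```python
import re

# stand-in stopword list; the experiments use the NLTK list instead
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "at", "not"}

def preprocess(text):
    """Strip URLs and non-ASCII characters, drop punctuation and words
    containing digits, lowercase, and remove stopwords."""
    text = re.sub(r"https?://\S+", " ", text)       # remove URL links
    text = text.encode("ascii", "ignore").decode()  # remove non-ASCII chars
    text = re.sub(r"[^\w\s]", " ", text)            # strip punctuation
    tokens = [w.lower() for w in text.split() if w.isalpha()]
    return [w for w in tokens if w not in STOPWORDS]
```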
4.2. Thread recommendation
Experimental setup.
We now test the performance of our model on personalized thread recommendation. We run three different experiments, splitting each dataset based on the time of each post. The training set includes only threads initiated during a training interval [0, T_tr], and only posts on those threads made before T_tr. The test set contains posts made in the testing interval (T_tr, T_tr + T_te], but excludes new threads initiated during the test interval.
In the first experiment, we hold the length of the testing interval fixed to 1 day and vary the length of the training interval up to W weeks, where W denotes the number of weeks that the discussion forum stays active. We set W to 10, 8, and 8 for ml, comp, and algo, respectively, to ensure the number of posts in the testing set is large enough. These numbers are less than those in Table 1 since learners drop out during the course, which leads to decreasing forum activity. In the second experiment, we hold the length of the training interval fixed and vary the length of the testing interval. In the first two experiments, we fix the number of latent topics, while in the third experiment, we fix the lengths of the training and testing intervals (the latter to 1 week) and vary the number of latent topics.
For training, we fix the values of the model hyperparameters. We set the predefined decay rates to correspond to half-lives (i.e., the time for the excitation of a post to decay to half of its original value) ranging from minutes to weeks. We run the inference algorithm for a fixed total number of iterations, with an initial portion of these being burn-in iterations (we observe that the Markov chain achieves reasonable mixing after the burn-in period).

Baselines.
We compare the performance of our point process model (PPS) against four baselines: (i) Popularity (PPL), which ranks threads from most to least popular based on the total number of posts in each thread during the training time interval; (ii) Recency (REC), which ranks threads from newest to oldest based on the timestamp of their most recent post; (iii) Social influence (SOC), a variant of our PPS model that replaces learner topic interest levels with learner social influences (the “Hwk” baseline in (Farajtabar et al., 2015)); and (iv) Adaptive matrix factorization (AMF), our implementation of the matrix factorization-based algorithm proposed in (Yang et al., 2014).
To rank threads in our model for each learner, we calculate the probability that learner u will reply to thread q during the testing time interval (T_1, T_2] as

P_{u,q} = 1 − Σ_k p(k_q = k | training data) · exp(−∫_{T_1}^{T_2} λ_{u,q}(τ | k_q = k) dτ).

The rate function λ_{u,q}(·) is given by (3). The topic posterior p(k_q = k | training data) is given by

p(k_q = k | training data) ∝ p(W_q | k_q = k) · ℓ⁰_q(k) · Π_{u′} ℓ_{u′,q}(k),

where the likelihoods of the initial post, ℓ⁰_q(k), and of the other posts, ℓ_{u′,q}(k), follow from (2) using the rate functions (4) and (3), respectively, and the thread text likelihood is given by the standard LDA model. The threads are then ranked from highest to lowest posting probability.
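Because the exponential kernel integrates in closed form, this posting probability needs no numerical integration; a sketch under our notation (coeffs[i] stands for the full multiplier of post i, i.e., the interest level times any notification/reply factors, an assumption of this illustration):

```python
import math

def posting_probability(t1, t2, post_times, coeffs, delta):
    """P(learner posts in (t1, t2]) = 1 - exp(-integral of the rate).
    For the exponential kernel, each training post t_i < t1 contributes
    (c_i / delta) * (exp(-delta * (t1 - t_i)) - exp(-delta * (t2 - t_i)))."""
    integral = 0.0
    for ti, c in zip(post_times, coeffs):
        if ti < t1:  # only posts observed during the training window
            integral += (c / delta) * (math.exp(-delta * (t1 - ti))
                                       - math.exp(-delta * (t2 - ti)))
    return 1.0 - math.exp(-integral)
```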
Evaluation metric.
We evaluate recommendation performance using the standard mean average precision for top-N recommendation (MAP@N) metric. This metric is defined by taking the mean (over all learners who posted during the testing time interval) of the average precision

AP@N_u = (Σ_{n=1}^{N} P_u(n) · 1[q_u(n) ∈ Q_u]) / min(N, |Q_u|),

where Q_u denotes the set of threads learner u posted in during the testing time interval, q_u(n) denotes the n-th thread recommended to the learner, and P_u(n) denotes the precision at n, i.e., the fraction of threads among the top n recommendations that the learner actually posted in. We use N = 5 in the first two experiments, and vary N in the third experiment.
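A minimal sketch of this metric (function names are ours):

```python
def average_precision(recommended, relevant, n):
    """AP@N for one learner: precision-at-rank summed over ranks where a
    relevant thread appears, normalized by min(N, #relevant threads)."""
    hits, score = 0, 0.0
    for rank, thread in enumerate(recommended[:n], start=1):
        if thread in relevant:
            hits += 1
            score += hits / rank
    denom = min(n, len(relevant))
    return score / denom if denom else 0.0

def map_at_n(recs_by_learner, relevant_by_learner, n):
    """Mean of AP@N over learners who posted during the testing interval."""
    aps = [average_precision(recs_by_learner[u], rel, n)
           for u, rel in relevant_by_learner.items()]
    return sum(aps) / len(aps)
```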
Topic  Half-life  Top words
1  4 hours  gradient, row, element, iteration, return, transpose, logistic, multiply, initial, regularization
2  4 hours  layer, classification, probability, neuron, unit, hidden, digit, nn, sigmoid, weight
3  1 day  interest, group, computer, Coursera, study, hello, everyone, student, learning, software
4  1 day  Coursera, deadline, professor, hard, score, certificate, review, experience, forum, material
5  1 week  screenshot, speed, player, subtitle, chrome, firefox, summary, reproduce, open, graph

Table 2. Estimated half-lives and highest constituent words (obtained by sorting the estimated topic-word distribution parameter vectors) for selected topics in the ml dataset with at least 100 threads. Different types of topics (course content-related, small talk, or course logistics) exhibit different half-lives.

Results and discussion.
Fig. 2 plots the recommendation performance of our model and the baselines over different lengths of the training time window for each dataset. Overall, we see that our model significantly outperforms the baselines in each case, achieving 15%-400% improvement over the strongest baseline. (These findings are consistent across each dataset; moving forward, we present one dataset in each experiment unless differences are noteworthy.) The fact that PPS outperforms the SOC baseline confirms our hypothesis that in MOOC forums, learner topic preference is a stronger driver of posting behavior than social influence, consistent with the fact that most forums do not have an explicit social network (e.g., of friends or followers). The fact that PPS outperforms the AMF baseline emphasizes the benefit of the temporal element of point processes in capturing the dynamics of thread activities over time, compared to the (mostly) static matrix factorization-based algorithms. Note also that as the amount of training data increases in the first several weeks, the recommendation performance tends to increase for the point process-based algorithms while decreasing for PPL and REC. The observed fluctuations can be explained by the decreasing numbers of learners in the test sets as courses progress, since they tend to drop out before the end (see also Fig. 6).
Fig. 3 plots the recommendation performance over different lengths of the testing time window for the algo dataset. As in Fig. 2, our model significantly outperforms every baseline. We also see that recommendation performance tends to decrease as the length of the testing time window increases, but while the performance of the point process-based algorithms decays only slightly, the performance of the PPL and AMF baselines decreases significantly (by around 50%). This observation suggests that our model excels at modeling long-term learner posting behavior.
Finally, Fig. 4 plots the recommendation performance of the PPS model over different numbers of latent topics for the ml dataset, for different choices of N. In each case, the performance rises slightly as the number of topics grows, and then drops for larger values (when overfitting occurs). Overall, the performance is relatively robust to the choice of the number of topics.
4.3. Direct comparison with AMF
The MAP@5 values we obtained for both the AMF and PPL baselines are significantly less than those reported in (Yang et al., 2014), where AMF is proposed. To investigate this, we also perform a direct, head-to-head comparison between our model and these baselines under our closest possible replication of the experimental setting in (Yang et al., 2014). In particular, we train on threads that have nonzero activity within the training window and fix the testing time window to 1 week. Since the exact procedures used in (Yang et al., 2014) to select the latent dimension in the “content level model,” to select the number of close peers in the “social peer connections,” and to aggregate these two into a single model for matrix factorization in AMF are not clear, we sweep over a range of values for these parameters and choose the values that maximize the performance of AMF.
Fig. 5 compares the MAP@5 performance of our model against that of the PPL and AMF baselines for a range of settings on the comp dataset (as in previous experiments, results on the other two datasets are similar). We see again that our model significantly outperforms both AMF and PPL in each case. Moreover, while AMF consistently outperforms PPL in agreement with the results in (Yang et al., 2014), the MAP@5 values of both baselines are significantly less than the values reported in (Yang et al., 2014). We also emphasize that setting the length of the testing window to 1 week is too coarse of a timescale for thread recommendation in the MOOC discussion forum setting, where new discussions may emerge on a daily basis due to the release of new learning content, homework assignments, or exams.
4.4. Model analytics
Beyond thread recommendation, we also explore a few types of analytics that our trained model parameters can provide. For this experiment, we use a larger number of latent topics in order to achieve finer granularity in the topics; we found that this leads to more useful analytics.
Dataset  ml  algo  comp
ω  29.0  23.3  33.6
ω′  19.2  12.2  10.6

Table 3. Estimated additional excitation induced by new activity notifications (ω) and explicit replies (ω′).
Topic timescales and thread categories.
Table 2 shows the estimated half-lives and most representative words for five selected topics in the ml dataset that are associated with at least 100 threads. Fig. 6 plots the total number of posts made on these topics each week during the course.
We observe topics with half-lives ranging from hours to weeks. We can use these timescales to categorize threads: course content-related topics (Topics 1 and 2) mostly have short half-lives of hours, small-talk topics (Topics 3 and 4) stay active for longer with half-lives of around one day, and course logistics topics (Topic 5) have much longer half-lives of around one week. Activities in threads on course content-related topics develop and decay rapidly, since they are most likely spurred by specific course materials or assignments. For example, posts on Topic 1 are about implementing gradient descent, which is covered in the second and third weeks of the course, and posts on Topic 2 are about neural networks, which is covered in the fourth and fifth weeks. Small-talk discussions are extremely common at the beginning and the end of the course, while course logistics discussions (e.g., concerning technical issues) are less frequent but steady in volume throughout the course.
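The half-life h of a topic and its decay rate δ are related by δ = ln 2 / h, since e^{−δh} = 1/2; a small sketch for converting between the two:

```python
import math

def decay_rate(h):
    """Decay rate delta such that exp(-delta * h) = 1/2."""
    return math.log(2.0) / h

def half_life(delta):
    """Time for a post's excitation exp(-delta * t) to fall to one half."""
    return math.log(2.0) / delta
```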
Excitation from notifications.
Table 3 shows the estimated additional excitation induced by new-activity notifications and explicit replies. In each course, we see that notifications increase the likelihood of participation significantly; for example, in ml, a learner’s likelihood of posting after an explicit reply is 473 times higher than without any notification. Notice also that the excitation from explicit replies is lowest in comp, while that from new-activity notifications is highest. This observation is consistent with the fact that in humanities courses like comp the discussions in each thread tend to be longer (Brinton et al., 2016), leading to more new-activity notifications, while in engineering courses like ml and algo we would expect learners to more directly answer each other’s questions, leading to more explicit replies.
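One way to picture these effects is as multiplicative boosts on the excitation terms of a learner's posting intensity. The sketch below is a simplified, hypothetical parameterization of a Hawkes-style intensity with per-event-type boosts; it illustrates the mechanism rather than reproducing the paper's exact model:

```python
import math

def intensity(t, base_rate, events, decay, notif_boost=1.0, reply_boost=1.0):
    """Posting intensity of one learner in one thread at time t.

    events: list of (time, kind) pairs, kind in {"post", "notification",
    "reply"}. Each past event contributes an exponentially decaying
    excitation term; events that notified the learner ("notification")
    or explicitly replied to them ("reply") are scaled up by a boost
    factor. (Hypothetical sketch of the notification/reply effects.)
    """
    boost = {"post": 1.0, "notification": notif_boost, "reply": reply_boost}
    lam = base_rate
    for s, kind in events:
        if s < t:
            lam += boost[kind] * math.exp(-decay * (t - s))
    return lam
```

With a large `reply_boost`, a recent explicit reply dominates the intensity, capturing why notified learners are far more likely to post again soon.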
5. Conclusions and Future Work
In this paper, we proposed a point process-based probabilistic model for MOOC discussion forum posts, and demonstrated its performance in thread recommendation and analytics using real-world datasets. Possible avenues of future work include (i) jointly analyzing discussion forum data and time-varying learner grades (Lan et al., 2014, 2013) to better quantify the “flow of knowledge” between learners, (ii) incorporating upvotes and downvotes on the posts into the model, and (iii) leveraging the course syllabus to better model the emergence of new threads.
References
 Blei et al. (2003) D. Blei, A. Ng, and M. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (Jan. 2003), 993–1022.
 Brinton et al. (2016) C. G. Brinton, S. Buccapatnam, F. Wong, M. Chiang, and H. V. Poor. 2016. Social learning networks: Efficiency optimization for MOOC forums. In Proc. IEEE Conf. Comput. Commun. 1–9.
 Brinton et al. (2014) C. G. Brinton, M. Chiang, S. Jain, H. Lam, Z. Liu, and F. Wong. 2014. Learning about social learning in MOOCs: From statistical analysis to generative model. IEEE Trans. Learn. Technol. 7, 4 (Oct. 2014), 346–359.
 Brusilovsky et al. (2016) P. Brusilovsky, S. Somyürek, J. Guerra, R. Hosseini, V. Zadorozhny, and P. J. Durlach. 2016. Open social student modeling for personalized learning. IEEE Trans. Emerg. Topics Comput. 4, 3 (July 2016), 450–461.
 Cong et al. (2008) G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. 2008. Finding question-answer pairs from online forums. In Proc. ACM SIGIR Conf. Res. Dev. Inf. Retr. 467–474.
 Daley and Vere-Jones (2003) D. J. Daley and D. Vere-Jones. 2003. An Introduction to the Theory of Point Processes. Springer.
 Farajtabar et al. (2015) M. Farajtabar, S. Yousefi, L. Tran, L. Song, and H. Zha. 2015. A continuous-time mutually-exciting point process framework for prioritizing events in social media. arXiv preprint arXiv:1511.04145 (Nov. 2015).
 Gillani et al. (2014) N. Gillani, R. Eynon, M. Osborne, I. Hjorth, and S. Roberts. 2014. Communication communities in MOOCs. arXiv preprint arXiv:1403.4640 (Mar. 2014).
 Kardan et al. (2017) A. Kardan, A. Narimani, and F. Ataiefard. 2017. A hybrid approach for thread recommendation in MOOC forums. International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering 11, 10 (2017), 2195–2201.
 Kim and Leskovec (2011) M. Kim and J. Leskovec. 2011. The network completion problem: Inferring missing nodes and edges in networks. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 47–58.
 Lan et al. (2014) A. S. Lan, C. Studer, and R. G. Baraniuk. 2014. Time-varying learning and content analytics via sparse factor analysis. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 452–461.
 Lan et al. (2013) A. S. Lan, C. Studer, A. E. Waters, and R. G. Baraniuk. 2013. Joint topic modeling and factor analysis of textual information and graded response data. In Proc. 6th Intl. Conf. Educ. Data Min. 324–325.
 Linderman and Adams (2014) S. Linderman and R. Adams. 2014. Discovering latent network structure in point process data. In Proc. Intl. Conf. Mach. Learn. 1413–1421.
 Luo et al. (2015) D. Luo, H. Xu, Y. Zhen, X. Ning, H. Zha, X. Yang, and W. Zhang. 2015. Multi-task multi-dimensional Hawkes processes for modeling event sequences. In Proc. Intl. Joint Conf. Artif. Intell. 3685–3691.
 Mi and Faltings (2017) F. Mi and B. Faltings. 2017. Adaptive sequential recommendation for discussion forums on MOOCs using context trees. In Proc. Intl. Conf. Educ. Data Min. 24–31.
 Miller et al. (2009) K. Miller, M. I. Jordan, and T. L. Griffiths. 2009. Nonparametric latent feature models for link prediction. In Proc. Adv. Neural Inform. Process. Syst. 1276–1284.
 Mozer and Lindsey (2016) M. Mozer and R. Lindsey. 2016. Neural Hawkes process memories. In NIPS Symp. Recurrent Neural Netw.
 Ramesh et al. (2014) A. Ramesh, D. Goldwasser, B. Huang, H. Daumé III, and L. Getoor. 2014. Understanding MOOC discussion forums using seeded LDA. In Proc. Conf. Assoc. Comput. Linguist. 28–33.
 Ramesh et al. (2015) A. Ramesh, S. Kumar, J. Foulds, and L. Getoor. 2015. Weakly supervised models of aspect-sentiment for online course discussion forums. In Proc. Conf. Assoc. Comput. Linguist. 74–83.
 Ritter et al. (2010) A. Ritter, C. Cherry, and B. Dolan. 2010. Unsupervised modeling of Twitter conversations. In Proc. Human Lang. Technol. 172–180.
 Shi et al. (2009) X. Shi, J. Zhu, R. Cai, and L. Zhang. 2009. User grouping behavior in online forums. In Proc. ACM SIGKDD Intl. Conf. Knowl. Discov. Data Min. 777–786.
 Simma and Jordan (2010) A. Simma and M. I. Jordan. 2010. Modeling events with cascades of Poisson processes. In Proc. Conf. Uncertain. Artif. Intell. 546–555.
 Tomkins et al. (2016) S. Tomkins, A. Ramesh, and L. Getoor. 2016. Predicting posttest performance from online student behavior: A high school MOOC case study. In Proc. Intl. Conf. Educ. Data Min. 239–246.
 Wang et al. (2011) H. Wang, C. Wang, C. Zhai, and J. Han. 2011. Learning online discussion structures by conditional random fields. In Proc. ACM SIGIR Conf. Res. Dev. Inf. Retr. 435–444.
 Wang et al. (2015) X. Wang, D. Yang, M. Wen, K. Koedinger, and C. Rosé. 2015. Investigating how student’s cognitive behavior in MOOC discussion forums affect learning gains. In Proc. Intl. Conf. Educ. Data Min. 226–233.
 Wu et al. (2010) H. Wu, J. Bu, C. Chen, C. Wang, G. Qiu, L. Zhang, and J. Shen. 2010. Modeling dynamic multi-topic discussions in online forums. In Proc. Conf. Am. Assoc. Artif. Intell. 1455–1460.
 Yang et al. (2014) D. Yang, M. Piergallini, I. Howley, and C. Rose. 2014. Forum thread recommendation for massive open online courses. In Proc. Intl. Conf. Educ. Data Min. 257–260.