Models for Predicting Community-Specific Interest in News Articles

08/27/2018 ∙ by Benjamin D. Horne, et al. ∙ Raytheon ∙ Rensselaer Polytechnic Institute

In this work, we ask two questions: 1. Can we predict the type of community interested in a news article using only features from the article content? and 2. How well do these models generalize over time? To answer these questions, we compute well-studied content-based features on over 60K news articles from 4 communities on Reddit. We train and test models over three different time periods between 2015 and 2017 to demonstrate which features degrade in performance the most due to concept drift. Our models can classify news articles into communities with high accuracy, ranging from 0.81 ROC AUC to 1.0 ROC AUC. However, while we can predict the community-specific popularity of news articles with high accuracy, practitioners should approach these models carefully. Predictions are both community-pair dependent and feature group dependent. Moreover, these feature groups generalize over time differently, with some degrading only slightly over time and others degrading greatly. Therefore, we make two recommendations. First, community-interest predictions should be done in a hierarchical structure, where multiple binary classifiers are used to separate community pairs, rather than with a traditional multi-class model. Second, these models should be retrained over time based on accuracy goals and the availability of training data.




I Introduction

Understanding community engagement ranges from important to crucial in a wide range of military missions. However, doing so in a meaningful and automated way has been, and continues to be, an extremely challenging problem. The military must be aware of micro-cultural issues and how populations will react to current events. Provided with this understanding, the military can adapt and spend resources on the areas most crucial to tipping population sentiment in our favor. We look to augment the work of anthropologists and sociologists by automating the understanding of distinct communities. To this end, we use Reddit as an initial source of micro-communities to detect how communities will react to events as reported by national and local media. Specifically, we ask the question: Q1: Can we predict the type of community interested in a news article using only features from the article content? Prior work has built models to predict general popularity, both with cold-starts (i.e., based only on message content, before the spread of information) [1, 2] and warm-starts (i.e., based on early popularity) [12, 13, 23]; however, there has been little work on predicting the community-specific popularity of a news article.

In this work, we approach the problem by building and validating cold-start machine learning models to classify the community of interest among 4 distinct news communities on Reddit. We use well-studied content-based features, used in previous studies for news credibility [7] and news popularity [2]. Further, we examine models using intuitive feature groups to better understand what signals predict best and how they differ between these communities.

One of the many challenges with predicting human activities is the constant evolution of behavior. This often creates a moving target for machine learning models, which can be complex and hard to control for. The problem we address in this paper has many naturally evolving parts: the news cycle, the political climate of a region, and the community preferences. Thus, to explore concept drift in community-specific popularity prediction, we ask a secondary question: Q2: How well do these models generalize over time? To answer this, we gather Reddit news community data from a 3-year time period and emulate a machine learning model that has not been retrained for 2 years. This analysis is done for each feature group to show which features generalize better over time.

Lastly, this work builds a foundation for several important future works, particularly in a military setting. Since this work focuses on community-specific popularity, rather than general popularity, it relates to better understanding targeted or malicious media coverage, where news articles or blogs are written with the intention of provoking certain communities. One of the objectives of our research is to eventually be able to detect such types of non-traditional attacks. Furthermore, the techniques explored in this work can be extended to many more types of communities, whether those are on different media platforms or centered in different countries.

Our results show that we can predict the community-specific popularity of news articles with high accuracy, but practitioners should approach these models carefully. While our models can predict community interest with close to 100% accuracy, these predictions are both community-pair dependent and feature dependent. In addition, these feature groups generalize over time differently, with some degrading only slightly over 2 years and others becoming useless. Hence, we make two recommendations: 1. Community-interest predictions should be done with a hierarchical model, where multiple feature-filtered binary classifiers are used to separate community-interest pairs, rather than with a traditional multi-class model. 2. These models should be retrained over time based on the application’s accuracy goals and the availability of data.

II Related Work

There are many prior works on general news popularity. The majority of these works use and develop content-based features to capture popularity, such as headline features [20, 22] or body content [5]. However, others have based predictions on other signals, such as user comments [24] and early popularity [12, 13, 23]. More generally, there is a large body of work on content popularity prediction that is not focused on news articles. These works include predicting the popularity of comments on Reddit [6, 10], predicting the popularity of tweets or hashtags on Twitter [16, 26], and predicting the popularity of videos [14, 23].

While news popularity has been studied extensively, community-specific popularity has not. The only prior work attempting to predict community-specific interest is [7], in which a simple content-based model is deployed on a 3-month Reddit data set, achieving 77% accuracy. We hope to both improve upon this model and gain a better understanding of what signals generalize well over time.

Community   Description
mainstream  News community that does not allow opinion-based articles or articles that do not properly report a story. (Generally reliable news stories)
conspiracy  Conspiracy theory community that does not censor posts and encourages news about unconfirmed hypotheses. (Questionable news stories)
bias1       News community that is focused on stories from one political viewpoint, that may or may not be misleading. (Hyper-partisan news stories)
bias2       News community that is focused on stories from one political viewpoint, opposite of bias1. (Hyper-partisan news stories)

TABLE I: Description of communities used in study.

III Data

In order to address the problem of community-specific popularity, we use data from a set of news communities on Reddit. Reddit is a social news-aggregation platform made up of interest-based communities called subreddits. Each subreddit has subscribers who can post URLs to news articles, comment on posts, and vote for a post, which roughly determines its placement on the page. Further, each subreddit has a moderation team that ensures content meets the community’s standards. These standards can vary widely, from requiring informative news to requiring news of a specific viewpoint. This clear structure and diversity of news-based communities provides an ideal setting to explore community-specific interest in news. While this problem can be thought of more generally (not only Reddit communities) and need not be focused on country-specific news communities, we use this data set to provide a clear testing bed for our methods. Further, Reddit is a widely used platform, ranking 6th in global popularity in 2018. Reddit has also been shown to be useful in other prediction tasks, such as ranking popular comments in discussions [6] and general popularity studies [4, 11, 25].

In this study, we use 4 distinct subreddits: a general news community, a conspiracy news community, and two hyper-partisan news communities. We will call these communities mainstream, conspiracy, bias1, and bias2, respectively. A description of each community can be found in Table I. It is clear that these communities capture diverse types of news in terms of both viewpoint and reliability. Specifically, we expect conspiracy to contain questionable/unconfirmed (conspiracy-theory) news, while bias1 and bias2 contain news from two extreme, opposing viewpoints. These 3 communities contrast with mainstream, which seeks to curate factual/non-opinion based news.

To construct this data set, we first collect posts from each community on Reddit for 3 months in 2017 using the Reddit API. This collection includes the news article URL, the post time, the post score (based on community members' votes), and the number of comments on the post. Using the post URLs, we run a generic news article scraper, used in [7], to extract the article title, body text, and source. We remove any post that has a score of 0 or less, as this means community members have disapproved of the article being posted, which is indicated by “downvoting” the post to a score of 0.
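The score filter described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the collection code used in the study: the dictionary keys (`url`, `score`, `num_comments`) are hypothetical field names, and the sample posts are made up.

```python
from typing import Dict, List

# Hypothetical post records; the keys are illustrative, not Reddit API field names.
posts: List[Dict] = [
    {"url": "https://example.com/a", "score": 12, "num_comments": 3},
    {"url": "https://example.com/b", "score": 0, "num_comments": 1},
    {"url": "https://example.com/c", "score": -4, "num_comments": 0},
]


def keep_approved(records: List[Dict]) -> List[Dict]:
    """Drop posts whose score is 0 or less (disapproved by the community)."""
    return [r for r in records if r["score"] > 0]


kept = keep_approved(posts)
print([r["url"] for r in kept])
```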

In addition to this primary data set, for a secondary test of our models over time, we collect and scrape news articles from posts in mainstream and conspiracy during 3 months in 2016 and 3 months in 2015. We choose these two communities for this supplementary test as they both have rich data dating back over 10 years, whereas bias1 and bias2 are younger communities.

In total, we analyze over 60K articles across 4 communities. To further illustrate the diversity of these communities, we compute the overlap of news articles posted (Table II), news sources posted (Table III), and named entities mentioned in news articles (Table IV). Specifically, news article overlap is the percent of identical news articles posted in a pair of communities, while news source overlap is the percent of articles posted that come from the same source in a pair of communities. Similarly, named entity overlap is the percent of articles that mention the same person, place, or group between a pair of communities. Details about our named entity extraction processes can be found in Section IV. This meta-data shows the largest overlap in news articles is 0.58% between mainstream and bias2, the largest overlap in sources is 14.1% between conspiracy and bias2, and the largest overlap in named entities mentioned is 14.0%, which occurs between conspiracy and each of bias1 and bias2. Overall, we see very little similarity between all 4 communities, illustrating a natural separation between the types of news shared in each.
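A pairwise overlap of this kind could be computed as in the sketch below. The text does not state the denominator, so this sketch assumes the union of distinct items in the two communities, which gives a symmetric measure like the ones reported in the tables; the function name and the sample URLs are hypothetical.

```python
from typing import Iterable


def percent_overlap(items_a: Iterable[str], items_b: Iterable[str]) -> float:
    """Shared distinct items as a percentage of all distinct items in the pair.

    Assumption: the union is used as the denominator because the source text
    does not specify one.
    """
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Made-up article identifiers for two communities.
mainstream_urls = ["u1", "u2", "u3", "u4"]
conspiracy_urls = ["u4", "u5", "u6", "u7"]
overlap = percent_overlap(mainstream_urls, conspiracy_urls)
print(f"{overlap:.2f}%")
```

The same function applies to source names or most-frequent named entities by passing those values instead of URLs.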

            mainstream  conspiracy  bias1   bias2
mainstream  100%        0.19%       0.56%   0.58%
conspiracy  0.19%       100%        0.53%   0.25%
bias1       0.56%       0.53%       100%    0.37%
bias2       0.58%       0.25%       0.37%   100%
TABLE II: Percentage of news article overlap in 2017 data set

            mainstream  conspiracy  bias1   bias2
mainstream  100%        6.5%        10.8%   9.6%
conspiracy  6.5%        100%        13.6%   14.1%
bias1       10.8%       13.6%       100%    13.7%
bias2       9.6%        14.1%       13.7%   100%
TABLE III: Percentage of news source overlap in 2017 data set

            mainstream  conspiracy  bias1   bias2
mainstream  100%        6.5%        11.8%   9.6%
conspiracy  6.5%        100%        14%     14%
bias1       11.8%       14%         100%    13.6%
bias2       9.6%        14%         13.6%   100%
TABLE IV: Percentage of overlap in most frequently mentioned named entities in 2017 data set

IV Features

In order to capture differences in the news read by each community, we compute 7 groups of features, all of which can be extracted from a news article alone. These feature groups are inspired by the techniques used in [7], in addition to the news popularity literature [2]. In [7], a similar set of features is used to predict both the reliability of news and the bias of news with very high accuracy. We expect tight-knit online communities to have preferences that parallel these news article classes to some degree, demonstrating the potential usefulness of these content feature groups. These feature groups are as follows:

Style features are built to capture the overall writing style and structure in a news article. These features include Parts-of-Speech (POS) [15], punctuation, use of all capitalized words, use of quotes, use of past, present, or future tense, quantification words [19], and swear words. In total, this feature group contains 45 features, which are computed on the body text and title text of the news article independently.

Complexity features capture the complexity of writing in a news article. These features include lexical diversity, reading grade complexity, number of stop words, average word length, and the length of the news article. In total, this feature group contains 7 features, which are computed on the body text and title text of the news article independently.

Bias features capture how opinionated or one-sided a news story is. These features include bias word count [17, 21], number of hedges (e.g., could, maybe, possibly) [17, 21], number of factives (a verb, adjective, or noun phrase presupposing the truth of a sentence), number of implicatives (e.g., manage to, failed to), certainty/tentativeness [19], and subjectivity [8]. In total, this feature group contains 11 features, which are computed on the body text and title text of the news article independently.

Named Entity features capture who and what is being talked about in a news article. Specifically, we extract the most frequently mentioned named entity from each news article and encode it into a unique number. Examples of named entities include: Stephen Hawking (person), Middle East (place), ISIS (group), and Illuminati (group). Named entity extraction is done using Python NLTK [15].

Sentiment features capture the emotion and affect in a news article. These features include positive emotion words [9, 19], negative emotion words, neutral emotion words, words that indicate anger, words that indicate assent [19], and the strength of those words [21]. In total, this feature group contains 16 features, which are computed on the body text and title text of the news article independently.

Entity Slant is a combination of our sentiment feature group and entity feature group. This is a simple method of capturing the affect towards a named entity in a news article. While this feature could be much more granular, such as being computed per sentence rather than per article, it should capture the relative slant towards or against the most frequently mentioned entity. In total this feature group contains 17 features.

Source simply captures what news sources a community prefers to read. To do this, we build a source encoding dictionary that assigns a unique number to each source. If two communities read from mutually exclusive sets of sources, this feature will completely separate the communities.
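To make a few of the feature groups above concrete, the sketch below computes a handful of complexity features, a hedge count from the bias group, and a source encoding dictionary. It is a simplified illustration rather than the feature extractor used in the study: the word lists are tiny placeholders (the hedge list contains only the examples named in the text), lexical diversity is assumed to be the type-token ratio, and all function names are hypothetical.

```python
import re
from typing import Dict, List

# Placeholder lexicons; real stop-word and hedge lists would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}
HEDGES = {"could", "maybe", "possibly"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def complexity_features(text: str) -> Dict[str, float]:
    """A few complexity features; lexical diversity is the type-token ratio."""
    words = tokenize(text)
    n = len(words)
    if n == 0:
        return {"word_count": 0, "lexical_diversity": 0.0,
                "avg_word_length": 0.0, "stop_word_count": 0}
    return {
        "word_count": n,
        "lexical_diversity": len(set(words)) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
        "stop_word_count": sum(w in STOP_WORDS for w in words),
    }


def hedge_count(text: str) -> int:
    """Count tokens that appear in the hedge lexicon."""
    return sum(w in HEDGES for w in tokenize(text))


def build_encoding(values: List[str]) -> Dict[str, int]:
    """Assign each distinct value a unique integer in order of first appearance."""
    encoding: Dict[str, int] = {}
    for value in values:
        if value not in encoding:
            encoding[value] = len(encoding)
    return encoding


body = "Officials could possibly delay the vote, and the delay could anger the public."
features = complexity_features(body)
hedges = hedge_count(body)
source_ids = build_encoding(["example-news.com", "example-blog.org", "example-news.com"])
print(features, hedges, source_ids)
```

The same `build_encoding` helper could serve for the named entity feature, since both encode a categorical value as a unique number.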

V Machine Learning Models

Using these feature groups, we implement several well-known machine learning algorithms to test which features are best for prediction. Specifically, we use linear-kernel Support Vector Machines (SVM) and Random Forest (RF) classifiers. Each algorithm’s hyper-parameters are tuned using 10-fold cross validation. Further, each model is trained using balanced class weights. The classes are naturally imbalanced due to varying posting behavior in each community. While this imbalance is not very extreme, it is best practice to train with balanced classes. This balance could also be achieved using minority-class oversampling or SMOTE balancing, which is a synthetic data point technique. These techniques are typically used when the imbalance in classes is extreme. All algorithms are implemented using Python Scikit-learn [18].
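As an illustration of what balanced class weights mean, the sketch below reproduces the heuristic that scikit-learn applies when `class_weight="balanced"` is set, namely `n_samples / (n_classes * count_of_class)`. The label counts are made up, and the helper name is hypothetical; in practice the option would simply be passed to the classifier.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


# Made-up, mildly imbalanced label set.
labels = ["mainstream"] * 6 + ["conspiracy"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, errors on the smaller class cost more during training, so neither community dominates the fit.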


[ROC curve plots not reproduced; matrix columns: conspiracy, bias1, bias2]
TABLE V: Confusion Matrix of ROC curves for each Random Forest model. Each cell in the table contains the ROC curves (and AUC) of each feature group for classifying news articles into the respective communities. For example, row 1, column 1 contains the ROC graph for classifying news articles between mainstream and conspiracy. To save space, we only show the top half of the matrix, as it is symmetric about the diagonal.

[ROC curve plots not reproduced; panels: (a) Train/Test 2017, (b) Train/Test 2016, (c) Train/Test 2015, (d) Test Over Time]
TABLE VI: ROC AUC scores for each feature group over time when classifying mainstream vs conspiracy. Specifically: (a) Training and testing each model on 2017 data, (b) Training and testing each model on 2016 data, (c) Training and testing each model on 2015 data, and (d) Training models with 2015 data, testing on 2016 and 2017 data.

VI Results

First, we examine how well each of our models performs on the 2017 data set. Due to limited space, we only show results for our Random Forest models. However, we found very similar results using linear-kernel SVMs. In addition, we try various thresholds of popularity using both the voting score and the number of comments (for example, training on the top 20% of articles by score, top 30% by score, etc.), but find little difference in performance. In fact, we find that when we use all posts, with the exception of posts with a score of 0, our performance is slightly better. Therefore, the results we show are using all posts with scores above 0 for training and testing.
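A popularity threshold such as "top 20% of articles by score" could be applied as in the sketch below. The rounding rule (ceiling, with at least one post kept) and the record layout are assumptions made for illustration; the text does not specify them.

```python
import math
from typing import Dict, List


def top_fraction_by_score(posts: List[Dict], fraction: float) -> List[Dict]:
    """Return the highest-scoring `fraction` of posts (rounded up, at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(posts, key=lambda p: p["score"], reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k]


# Made-up scores for ten posts.
scores = [50, 3, 17, 8, 1, 29, 2, 11, 5, 40]
posts = [{"id": i, "score": s} for i, s in enumerate(scores)]
top = top_fraction_by_score(posts, 0.2)
print([p["score"] for p in top])
```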

A confusion matrix of Receiver Operating Characteristic (ROC) curves for each community pair and feature group can be found in Table V. Specifically, we train and test classification between each binary pair of communities for each feature group. This allows us to understand exactly what features provide clear signal and how those signals differ between communities. In each graph’s legend, the ROC Area Under The Curve (AUC) values can be found. As a rule of thumb, ROC AUC values can be interpreted as such: 0.90 to 1.0 is excellent, 0.80 to 0.90 is good, 0.70 to 0.80 is fair, 0.60 to 0.70 is poor, and 0.50 to 0.60 is fail. In each graph, we plot a black dotted line to indicate a ROC AUC of 0.5, which is random chance.
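For readers less familiar with the metric, ROC AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition and applies the rule of thumb quoted above. The labels and scores are made up, and because the quoted bands share their boundary values, the helper assigns a boundary value to the higher band (an assumption).

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def interpret_auc(auc: float) -> str:
    """Map an AUC value onto the rule-of-thumb bands quoted in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3), interpret_auc(auc))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a library routine would normally be used on data of realistic size.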

VI-A Models for Prediction

We can predict which community is interested in a news article. In Table V, we see each community pair can be classified with reasonable accuracy. The community mainstream can be separated from each other community the best, achieving near 1.0 AUC. In contrast, separating conspiracy from bias1 achieves 0.81 AUC at its best, and separating conspiracy from bias2 achieves 0.84 AUC at its best.

These prediction models are community-pair dependent. While each community pair can be separated, very different sets of features are used to classify. Features that best differentiate mainstream articles from conspiracy articles are bias, entity, and source. Features that differentiate mainstream articles from bias1 articles are bias, entity, sentiment, entity slant, and style. Similarly, features that differentiate mainstream articles from bias2 articles are bias, entity, entity slant, and source. Bias and entity based features clearly separate mainstream news communities from alternative news communities. On the other hand, features that separate bias1 from bias2 are source and entity slant. Interestingly, we see that entity features on their own do very poorly (0.48 AUC), but entity slant does well (0.78 AUC). This shows that the hyper-partisan communities are talking about the same people, places, and things, but with a different affect towards them, as we naturally expect. Lastly, we see that conspiracy articles are separated from bias1 and bias2 articles only with source and entity features.

VI-B Generalizing Models Over Time

An important metric of performance for machine learning models is how well they work over long periods of time. This notion is often called “concept drift,” which refers to unforeseen changes in a target variable over time [27]. Concept drift becomes particularly important when prediction models are applied to quickly evolving situations, such as predicting social concepts, the news cycle, or fraud detection [3]. While a model can perform very well in a small time frame, its performance may degrade over time. To test this, we train classifiers using the 2015 data to predict news article interest in 2016 data and 2017 data. We only run this test for mainstream and conspiracy, as they have been very active communities for a long period of time, allowing us to maintain a large and rich data set in all 3 years. Further, we train and test new models on the 2016 data and 2015 data to show performance within those time-frames. These results can be found in Table VI.

Some feature groups can generalize over time, others cannot. In Table VI(d), we show the performance of each feature group when the model is trained on 2015 data and tested on test sets from all three time frames (2015, 2016, 2017). This test is meant to emulate the performance of a 2015-trained model that is not retrained for 3 years. In general, we see small decreases in performance for each feature group as time moves further away from the trained model. Overall, due to the ever-changing news cycle, this finding is expected. However, these decreases in performance vary for each feature group. The largest decreases in performance come from entity slant and sentiment features, whereas the source and entity features have a less significant drop. Considering the large amount of time between each data set, the drop in performance for source features is not very significant, dropping from 0.86 AUC to 0.76 AUC. Interestingly, while the performance for our bias features decreases in the first year, it significantly increases in the second year. This drastic change in performance may show a shift in news producer or news community behavior between 2016 and 2017. Specifically, the model’s knowledge of bias behavior remains in the same direction (conspiracy-interested articles being more biased than mainstream-interested articles), but the separation between the classes increases, allowing the model to make fewer mistakes using the 2015 decision boundary. This shift could be considered a phase transition, but more research is certainly required to understand exactly why that transition occurs and what effects this transition may have.
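The shape of this "train once, test later" protocol can be sketched as follows. Everything in the example is a stand-in: the data are synthetic single-feature rows invented to show degradation, the "model" is a toy that only learns the direction in which the feature separates the classes, and the 2015 score is computed on the training rows themselves, whereas a real evaluation would use a held-out test set for each year.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, int]  # (feature value, label); 1 = conspiracy, 0 = mainstream


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(rows: List[Row]) -> float:
    """Toy model: +1 if class 1 has the larger mean feature value, else -1."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0


def evaluate_over_time(data_by_year: Dict[int, List[Row]], train_year: int) -> Dict[int, float]:
    """Fit once on `train_year`, then score every year without retraining."""
    direction = fit_direction(data_by_year[train_year])
    results = {}
    for year in sorted(data_by_year):
        rows = data_by_year[year]
        labels = [y for _, y in rows]
        scores = [direction * x for x, _ in rows]
        results[year] = roc_auc(labels, scores)
    return results


# Synthetic data: class separation shrinks in later years.
data = {
    2015: [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)],
    2016: [(0.9, 1), (0.6, 1), (0.4, 1), (0.5, 0), (0.2, 0), (0.1, 0)],
    2017: [(0.7, 1), (0.4, 1), (0.3, 1), (0.6, 0), (0.5, 0), (0.1, 0)],
}
aucs = evaluate_over_time(data, train_year=2015)
print(aucs)
```

Running the same loop once per feature group would give one curve of AUC against year for each group, which is the comparison described above.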

Retraining the models over time can prevent the degradation in performance. Despite the breakdown in performance of certain feature models, we do see clear signal in the models when they are trained on data from the same time frame. Table VI(a), VI(b), and VI(c) show the ROC curves for each feature group in each time frame. We see that while performance and feature importance differ across the time frames, each one shows fair to excellent performance, illustrating that models can be retrained to maintain performance over time. How often a model should be retrained will depend on the accuracy tolerance for a particular application, and could be determined with more granular data.

VII Conclusions and Discussion

In this paper, we examine community-specific interest in news articles. We construct supervised machine learning models to predict which community will be most interested in a news article using distinct communities on Reddit. Additionally, we assess the concept drift effects on each feature group over a 3-year time frame. Our results show that we can predict community interest with high accuracy, but these models are community dependent, feature group dependent, and can degrade over time.

These results reveal several strategies to better tackle community-specific interest predictions. First, the strongly community dependent nature of the feature group importance suggests that hierarchical-binary models should be used over standard multi-class models. For example, looking at our 2017 results, we see mainstream articles can be separated from all other communities using bias features and entity features, but bias1, bias2, and conspiracy can only be separated by style, source, and entity-slant features. Hence, a prediction model can be built to first classify a news article as mainstream or not using bias and entity features. If that article is classified as not mainstream, the article can be passed through several binary classifiers which use different feature groups to classify the article into bias1, bias2, or conspiracy. On the other hand, if we used a multi-class model, we would have to include all feature groups and tune the model to all communities, which would likely underfit the data, leading to very poor performance.
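One possible shape for such a hierarchical model is sketched below. The stage-one and pairwise classifiers are toy threshold rules standing in for trained models, the feature names (`bias_score`, `entity_slant`, `source_id`) are hypothetical, and combining the second-stage binary classifiers by majority vote is an assumption; the text does not say how their outputs would be merged.

```python
from typing import Callable, Dict, Tuple

Article = Dict[str, float]


def classify_hierarchical(
    article: Article,
    is_mainstream: Callable[[Article], bool],
    pairwise: Dict[Tuple[str, str], Callable[[Article], str]],
) -> str:
    """Stage 1 separates mainstream from the rest; stage 2 takes a majority
    vote over pairwise classifiers (ties broken alphabetically)."""
    if is_mainstream(article):
        return "mainstream"
    votes: Dict[str, int] = {}
    for classifier in pairwise.values():
        winner = classifier(article)
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda community: votes[community])


# Toy stand-ins for trained, feature-filtered binary classifiers.
def stage_one(article: Article) -> bool:
    return article["bias_score"] < 0.2


pairwise_models = {
    ("bias1", "bias2"): lambda a: "bias1" if a["entity_slant"] > 0 else "bias2",
    ("conspiracy", "bias1"): lambda a: "conspiracy" if a["source_id"] >= 100 else "bias1",
    ("conspiracy", "bias2"): lambda a: "conspiracy" if a["source_id"] >= 100 else "bias2",
}

neutral = {"bias_score": 0.1, "entity_slant": 0.0, "source_id": 5}
partisan = {"bias_score": 0.7, "entity_slant": -0.5, "source_id": 12}
fringe = {"bias_score": 0.8, "entity_slant": 0.3, "source_id": 150}

predictions = [
    classify_hierarchical(a, stage_one, pairwise_models)
    for a in (neutral, partisan, fringe)
]
print(predictions)
```

Each classifier in the cascade can be trained on only the feature groups that separate its pair well, which is the motivation given above for preferring this design to a single multi-class model.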

Second, models should be critically analyzed for performance loss over time. The change in performance we found in this study illustrates the inherent complexity of the problem. With more granular data, we can learn how often models need to be retrained in order to maintain a certain level of performance. For example, if we test our models on week-long intervals of data, we can determine when performance has degraded too much according to a tolerance threshold. At this threshold point, we can automatically retrain the models. This pattern of performance loss is likely very dependent on the communities themselves, as some may evolve slower than others. Further, performance loss may not be a steady decline, as shown by our bias feature model, which actually has improved performance over the 3-year time frame, suggesting concept drift in these models can be very complex. From a more general point of view, these results suggest that all machine learning models should be put through a time test to ensure desired performance levels are met.
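The tolerance-threshold idea could be implemented along the lines of the sketch below. The weekly AUC values, the baseline, and the tolerance are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
from typing import List, Optional


def first_retrain_week(weekly_auc: List[float], baseline_auc: float,
                       tolerance: float) -> Optional[int]:
    """Return the index of the first week whose AUC falls more than
    `tolerance` below the baseline, or None if that never happens."""
    for week, auc in enumerate(weekly_auc):
        if baseline_auc - auc > tolerance:
            return week
    return None


# Synthetic weekly scores for a model that slowly degrades.
weekly_auc = [0.86, 0.85, 0.83, 0.80, 0.76]
week = first_retrain_week(weekly_auc, baseline_auc=0.86, tolerance=0.05)
print(week)
```

Because performance need not decline steadily, a production check might require several consecutive weeks below the threshold before retraining; that refinement is left out here.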

VIII Acknowledgment

This work was partially supported by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARL, NSF, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.


References

[1] I. Arapakis, B. B. Cambazoglu, and M. Lalmas, “On the feasibility of predicting news popularity at cold start,” in International Conference on Social Informatics. Springer, 2014.
[2] R. Bandari, S. Asur, and B. A. Huberman, “The pulse of news in social media: Forecasting popularity.” ICWSM, vol. 12, pp. 26–33, 2012.
[3] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, “Credit card fraud detection and concept-drift adaptation with delayed supervised information,” in Neural Networks (IJCNN), 2015 International Joint Conference on. IEEE, 2015, pp. 1–8.
[4] J. Hessel, L. Lee, and D. Mimno, “Cats and captions vs. creators and the clock: Comparing multimodal content to context in predicting relative popularity,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
[5] B. D. Horne and S. Adalı, “The impact of crowds on news engagement: A reddit case study,” in NECO Workshop, 2017.
[6] B. D. Horne, S. Adalı, and S. Sikdar, “Identifying the social signals that drive online discussions: A case study of reddit communities,” in International Conference on Computer Communication and Networks (ICCCN). IEEE, 2017.
[7] B. D. Horne, W. Dron, S. Khedr, and S. Adalı, “Assessing the news landscape: A multi-module toolkit for evaluating the credibility of news,” in WWW Companion, 2018.
[8] B. D. Horne, S. Khedr, and S. Adalı, “Sampling the news producers: A large news and feature data set for the study of the complex media landscape,” in ICWSM, 2018.
[9] C. J. Hutto and E. Gilbert, “Vader: A parsimonious rule-based model for sentiment analysis of social media text,” in ICWSM, 2014.
[10] A. Jaech, V. Zayats, H. Fang, M. Ostendorf, and H. Hajishirzi, “Talking to the crowd: What do people react to in online discussions?” arXiv preprint arXiv:1507.02205, 2015.
[11] H. Lakkaraju, J. J. McAuley, and J. Leskovec, “What’s in a name? understanding the interplay between titles, content, and communities in social media.” ICWSM, vol. 1, no. 2, p. 3, 2013.
[12] J. G. Lee, S. Moon, and K. Salamatian, “An approach to model and predict the popularity of online contents with explanatory factors,” in Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, vol. 1. IEEE, 2010, pp. 623–630.
[13] K. Lerman and T. Hogg, “Using a model of social dynamics to predict popularity of news,” in Proceedings of the 19th international conference on World wide web. ACM, 2010, pp. 621–630.
[14] H. Li, X. Ma, F. Wang, J. Liu, and K. Xu, “On popularity prediction of videos shared in online social networks,” in Proceedings of the 22nd ACM international conference on Information & Knowledge Management. ACM, 2013, pp. 169–178.
[15] E. Loper and S. Bird, “Nltk: The natural language toolkit,” in ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics, 2002, pp. 63–70.
[16] Z. Ma, A. Sun, and G. Cong, “On predicting the popularity of newly emerging hashtags in twitter,” Journal of the Association for Information Science and Technology, vol. 64, no. 7, pp. 1399–1410, 2013.
[17] S. Mukherjee and G. Weikum, “Leveraging joint interactions for credibility analysis in news communities,” in CIKM. ACM, 2015, pp. 353–362.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of Machine Learning Research, vol. 12, no. Oct, 2011.
[19] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and word count: Liwc 2001,” Mahway: Lawrence Erlbaum Associates, vol. 71, no. 2001, p. 2001, 2001.
[20] A. Piotrkowicz, V. Dimitrova, J. Otterbacher, and K. Markert, “Headlines matter: Using headlines to predict the popularity of news articles on twitter and facebook,” in ICWSM, 2017.
[21] M. Recasens, C. Danescu-Niculescu-Mizil, and D. Jurafsky, “Linguistic models for analyzing and detecting biased language.” in ACL (1), 2013, pp. 1650–1659.
[22] J. Reis, F. Benevenuto, P. V. de Melo, R. Prates, H. Kwak, and J. An, “Breaking the news: First impressions matter on online news,” in ICWSM, 2015.
[23] G. Szabo and B. A. Huberman, “Predicting the popularity of online content,” Communications of the ACM, 2010.
[24] A. Tatar, J. Leguay, P. Antoniadis, A. Limbourg, M. D. de Amorim, and S. Fdida, “Predicting the popularity of online articles based on user comments,” in Proceedings of the International Conference on Web Intelligence, Mining and Semantics. ACM, 2011, p. 67.
[25] T. Tran and M. Ostendorf, “Characterizing the language of online communities and its relation to community reception,” arXiv preprint arXiv:1609.04779, 2016.
[26] T. Zaman, E. B. Fox, E. T. Bradlow et al., “A bayesian approach for predicting the popularity of tweets,” The Annals of Applied Statistics, vol. 8, no. 3, pp. 1583–1611, 2014.
[27] I. Žliobaitė, “Learning under concept drift: an overview,” arXiv preprint arXiv:1010.4784, 2010.