As social media becomes more prevalent and the number of users on these platforms grows, so too does the rate at which scholarly content is shared and discussed there. Increasingly, academics are turning to these platforms for the insight they provide into research problems.
One reason scholars have turned to social media is to measure the influence their work is having in those spaces, an approach known as alternative metrics, or altmetrics (Alhoori and Furuta, 2014). Another is the knowledge online platforms provide about human behavior, an area of research known as social-media analytics (Krebs et al., 2018; Bayer et al., 2018; Rajadesingan et al., 2015). Studies in social-media analytics tend to focus either on text, using approaches such as Natural Language Processing (NLP), sentiment analysis, or opinion mining to reach and support their conclusions (Tian et al., 2017), or on the spread of content through online communities (Gabielkov et al., 2016). These approaches have proved effective for understanding and predicting many aspects of human behavior, but they leave a number of other expressive signals unexamined.
Click-based reactions, by contrast, remain a relatively underused resource in social-media research. Quick, ready-made expressive features of this kind are increasingly common across platforms, and they have attracted some research attention in the past few years (Pool and Nissim, 2016; Basile et al., 2017).
In this paper, we present a new dataset of click-based reactions to scholarly articles on Facebook and use it to gain insight into how users interact with scholarly articles on that platform. In addition to information about the articles themselves, our dataset records the count of each click-based feature we could access through Facebook’s Graph API. We use the dataset to train and test two machine learning algorithms, and our analysis of the results sheds light on some surprising relationships between features.
2. Building the dataset
Before going any further, it will be useful to define a few terms and features:
Click-based reactions - non-textual user interactions with shared content, sometimes referred to simply as reactions; these include Facebook Likes and Reactions, Re-shares, and Page visibility (the last two are defined below).
Reactions - the five click-based Reactions Love, Amazed, Laughing, Sad, and Angry (Facebook labels Amazed and Laughing as “Wow” and “Haha”); capitalized to distinguish them from the common term “reaction”.
Page visibility - the number of followers a Facebook page has.
Re-shares - the number of times users have re-shared a public post of an article into another location.
The roots of our dataset lie in Altmetric (https://www.altmetric.com), an online resource that tracks the impact scholarly articles have across a variety of social-media platforms. We used Altmetric as a jumping-off point, querying its API (http://api.altmetric.com/) for information on the articles we were interested in and on the public pages to which they had been shared. This gave us access to the titles, publication dates, subjects, and Facebook share URLs of nearly 1.5 million scholarly articles.
We targeted content shared on Facebook rather than on other social-media platforms for several reasons. First, Facebook offers its users a variety of click-based interactions with which they can personalize their response to content; other platforms we considered, such as Twitter, offer a more limited palette of click-based reactions. Second, Facebook’s enormous user base increases the likelihood that content shared there will receive attention: it has about 2.27 billion monthly active users, almost seven times Twitter’s 330 million. Third, the impact of scholarly articles on Twitter users has been the subject of many recent studies (Priem and Costello, 2010; Shuai et al., 2012; Gabielkov et al., 2016), whereas the response to this type of content on Facebook remains largely unexamined.
With our list of Facebook URLs for article shares, we queried Facebook’s Graph API (https://developers.facebook.com/docs/graph-api/) for the reaction counts on each post. Our dataset records the API’s responses, collected between December 1 and 13, 2018; the pace of collection was set by the API’s limit of 200 queries per hour. The resulting dataset is publicly available on OSF (https://osf.io/4kh7r/) as a comma-separated values (CSV) file.
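The query loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used to build the dataset: the field-expansion string assumes the Graph API’s `reactions` edge with per-type summaries and the post’s `shares` field, and the helper names (`reaction_fields`, `min_interval`, `throttled`) and the injected `fetch` function are hypothetical. Authentication, API versioning, and error handling are omitted. At 200 queries per hour, the pacing works out to one request every 18 seconds.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed Graph API reaction types; the paper's "Amazed" and "Laughing"
# correspond to Facebook's WOW and HAHA.
REACTION_TYPES: List[str] = ["LIKE", "LOVE", "WOW", "HAHA", "SAD", "ANGRY"]


def reaction_fields() -> str:
    """Build a field-expansion string asking only for per-type totals.

    limit(0) suppresses the list of individual reactors, so only the
    aggregate summary count is requested, aliased by reaction type.
    """
    parts = [
        f"reactions.type({t}).limit(0).summary(total_count).as({t.lower()})"
        for t in REACTION_TYPES
    ]
    parts.append("shares")  # assumed field holding the post's share count
    return ",".join(parts)


def min_interval(requests_per_hour: int) -> float:
    """Seconds to wait between calls to stay within an hourly quota."""
    return 3600.0 / requests_per_hour


def throttled(
    post_ids: Iterable[str],
    fetch: Callable[[str], Dict],  # hypothetical: performs one API request
    requests_per_hour: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, Dict]]:
    """Yield (post_id, response) pairs, pausing between successive requests."""
    interval = min_interval(requests_per_hour)
    for i, post_id in enumerate(post_ids):
        if i > 0:
            sleep(interval)
        yield post_id, fetch(post_id)
```

Requesting only summary counts also fits the aggregate-only collection policy described later in this section, since no list of individual reacting users is asked for.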
We limited our collection to scholarly articles published in 2017. Choosing this year accomplished three goals. (i) Facebook released Reactions in February 2016 (Krug, 2016), so any articles we looked at had to be published after that time to have meaningful data on this feature. (ii) Whenever a new feature is rolled out, users need time to learn how to use it: Shah (2018) finds that Reactions grew from 2.4% of all interactions in April 2016 to 5.8% by June 2016, and to 12.8% by June 2018; by the time of our data collection in December 2018, a large enough share of users were comfortable expressing themselves with the feature to warrant more scholarly attention. (iii) By the time we began collecting, enough time had passed (between 11 and 23 months) for articles to be widely shared and reacted to.
Of all the articles tracked by Altmetric, we found that 296,052 were published in 2017 and had been shared on Facebook at least once. Eliminating entries that lacked data on the pages to which the articles had been shared reduced our set to 135,635 articles. We further narrowed the scope to articles whose Scopus (https://www.scopus.com/) subjects fall in the scientific domain, focusing on four subjects: Health Sciences, Physical Sciences, Social Sciences, and Environmental Science. Figure 1 displays the full list of subjects among the 2017 articles and gives a sense of their distribution; the mean and two standard deviations are marked with blue lines on both axes, and arrows indicate the four subjects we target. For these four subjects, both the article counts and the total numbers of Facebook shares fall within one standard deviation of the mean; the only exception is Health Sciences, whose article count exceeds the mean by more than one standard deviation. Limiting the subjects reduced the number of articles to process to just over 31,000. After removing articles with missing features such as abstract or title, we were left with 11,474 articles; these are the articles recorded in our dataset.
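The successive filters can be expressed as a staged pipeline that reports how many articles survive each step, which makes the reported counts (296,052, then 135,635, then just over 31,000, then 11,474) easy to audit. This is a hedged sketch: the record field names (`year`, `facebook_shares`, `facebook_pages`, `subject`, `title`, `abstract`) are hypothetical stand-ins rather than Altmetric’s actual response schema, and each article is assumed to carry a single subject label.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# The four Scopus subjects targeted in the paper.
TARGET_SUBJECTS = {
    "Health Sciences",
    "Physical Sciences",
    "Social Sciences",
    "Environmental Science",
}

# Each stage is (description, predicate); field names are hypothetical.
STAGES: List[Tuple[str, Callable[[Record], bool]]] = [
    ("published in 2017 and shared on Facebook",
     lambda r: r.get("year") == 2017 and r.get("facebook_shares", 0) > 0),
    ("has data on the pages it was shared to",
     lambda r: bool(r.get("facebook_pages"))),
    ("has a target subject",
     lambda r: r.get("subject") in TARGET_SUBJECTS),
    ("has a title and abstract",
     lambda r: bool(r.get("title")) and bool(r.get("abstract"))),
]


def filter_articles(records: List[Record]) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Apply each stage in order, recording how many articles survive it."""
    current = list(records)
    counts: List[Tuple[str, int]] = []
    for description, keep in STAGES:
        current = [r for r in current if keep(r)]
        counts.append((description, len(current)))
    return current, counts
```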
In our data collection process, we took the utmost care to respect Altmetric’s and Facebook’s specifications for how and why their data can be accessed and used, and to protect the personal information of social-media users. Our interest is only in how people interact in the aggregate with scholarly content on social-media platforms, not in the specific ways users’ beliefs or opinions may influence their behavior. We recognize that identifying information could in some instances be inferred a posteriori from some of the data we collect; however, our method of data collection does not target anything that could be used to consistently identify individual users, and it avoids collecting identifying information about individuals.
3. Data Exploration
The click-based features of our dataset are displayed along the axes of Figure 2, together with the Pearson correlation coefficient for every feature pair. The most highly correlated pairs are Like and Love, Sad and Angry, Like and Amazed, and Love and Re-shares. We interpret high positive correlation as a sign that users employ features in similar contexts, and that the emotional expressions those features represent overlap. For example, a Like seems to carry a meaning comparable to a Love, to a lesser extent an Amazed, and to an even lesser extent a Laughing reaction. These relationships may not surprise us, because they all represent positive emotional states; other pairs with related expressive values in usage, such as Angry and Sad, are less intuitive.
Likes and Re-shares are correlated with the greatest number of other features. This might be explained by the fact that they are the oldest of these features, but we also notice that they are correlated with the emotionally positive Reactions Love and Amazed, and not with the negative Sad or Angry. This suggests that by Liking or Re-sharing a post, a user tends to express a positive emotional reaction to its content. Viewed from another angle, content that is more likely to provoke a negative reaction from users may be less likely to be Re-shared or Liked.
High correlation between features (multicollinearity) can increase the variance of model estimates. To counter this, modelers often drop one feature from each correlated pair. Rather than remove features and lose information from our sparse dataset, we combined Love with Amazed and Sad with Angry into two composite features for our models.
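A minimal sketch of building the two composite features follows. The paper does not state how the pairs were combined, so summing the raw counts is an assumption here, as are the column names.

```python
from typing import Dict, Tuple

# Composite features built from correlated pairs. Summing counts is an
# assumption; the combination rule and column names are hypothetical.
COMPOSITES: Dict[str, Tuple[str, str]] = {
    "love_amazed": ("love", "amazed"),
    "sad_angry": ("sad", "angry"),
}


def add_composites(row: Dict[str, int]) -> Dict[str, int]:
    """Replace each correlated pair with a single summed feature."""
    merged = {part for parts in COMPOSITES.values() for part in parts}
    out = {name: count for name, count in row.items() if name not in merged}
    for name, parts in COMPOSITES.items():
        out[name] = sum(row.get(part, 0) for part in parts)
    return out
```

Summing keeps every click in the data, which matches the stated motivation of not losing information from a sparse dataset; averaging or taking the maximum would be equally simple alternatives.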
Low correlation signifies that features have relatively distinct use values. Among the lowest coefficients are those for Love/Angry and Laughing/Sad, which makes intuitive sense, as these reactions nominally express opposite emotions. Laughing/Page visibility is another low-correlation pair, which may suggest that articles that inspire humor tend to be posted to public pages with relatively few followers. This relationship may be a result of our choice to limit the dataset to articles in the scientific domain, where humor is an underused affect.
Our dataset also contains outliers in every feature category. To correct for them, we rescaled each feature to the range from 0 to 1 and then took the cube root of the rescaled values (Equation 1):

x' = ((x − min(x)) / (max(x) − min(x)))^(1/3)    (1)

This root normalization smoothed the distribution of values, raising small values proportionally more than large ones. The result after combination and normalization is displayed in Figure 3.
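A sketch of the root normalization described above, assuming the rescaling to [0, 1] is ordinary min-max scaling computed over the whole feature column (whether the minimum and maximum were taken over the full dataset or per training split is not stated).

```python
from typing import List, Sequence


def root_normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale a feature to [0, 1], then take the cube root."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant feature: nothing to rescale
        return [0.0] * len(values)
    return [((v - lo) / span) ** (1.0 / 3.0) for v in values]
```

For instance, scaled values of 0.001 and 0.729 become 0.1 and 0.9: the smaller value grows a hundredfold, the larger by less than a quarter. Because count features bottom out at zero, min(x) is typically 0, so zero counts stay zero, which is why the transformed data remain sparse.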
Even after transformation, the dataset remains sparse, because the transformation leaves zeros unchanged. Features with greater variance, such as Visibility and Likes, show less spread between the interquartile range (IQR) and their outliers. The median value of every Reaction is zero, and all non-zero values of those features fall in the fourth quartile. Likes have the largest IQR, though their median is still close to zero; Page visibility and Likes have the highest medians of all features.
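The statistics behind these box-plot observations (median, IQR, and share of zeros per feature) can be computed with the standard library. The sketch assumes each feature is a list of transformed values; `statistics.quantiles` uses its default “exclusive” method, which may differ slightly from the quartile convention used to draw Figure 3.

```python
from statistics import median, quantiles
from typing import Dict, Sequence


def sparsity_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median, interquartile range, and fraction of zeros for one feature."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {
        "median": median(values),
        "iqr": q3 - q1,
        "zero_share": sum(1 for v in values if v == 0) / len(values),
    }
```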
4. Supervised-learning models
To explore the relationships in our dataset further, we isolated two feature subsets, detailed in Table 1, and used each to train two supervised classification algorithms: Decision Tree and Random Forest. We chose these algorithms for the insight they provide into the relationships between features. Because we wanted to know to what extent users’ interactions relate to an article’s subject matter, we used article subjects as the class labels, giving our multiclass models four targets to predict.
With the first set (A), we asked to what extent the click-based reactions that are immediately available to users on the post itself can be used to estimate an article’s subject. The second set (B) shows how extended click-based features, such as Page visibility, can be used to approximate the subject matter of posts.
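The training and evaluation step for one feature set can be sketched with scikit-learn. This is an illustration under assumptions, not the authors’ exact protocol: the 75/25 stratified split, the 100-tree forest, the seed, and the function name `evaluate` are all hypothetical, and the columns of each feature set (Table 1) are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit both classifiers on one feature set; report accuracy, AUC, importances."""
    # Assumed protocol: a single stratified 75/25 train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # One-vs-rest AUC over the subject classes; labels=model.classes_
        # keeps the probability columns aligned with the class names.
        proba = model.predict_proba(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "auc": roc_auc_score(y_test, proba, multi_class="ovr", labels=model.classes_),
            "importances": model.feature_importances_,
        }
    return results
```

Calling `evaluate` once with the set A columns and once with the set B columns mirrors the structure of the comparison in Table 1, though not its numbers.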
Table 1 displays the accuracy and Area Under the Curve (AUC) of our models, and Figure 4 compares their results across several metrics. For reference, scores are shown against a baseline of random guesses among the four class labels. Feature set B produced substantially better results than A with both algorithms: the average accuracy of models using set B is 160% greater than the baseline, while that of set A is only 58% greater.
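To make the relative figures concrete: if the random baseline is a uniform guess among the four subjects, its expected accuracy is 25%, and the reported gains translate into absolute accuracies as computed below. The uniform baseline is an assumption; a baseline that guesses in proportion to class frequencies would have a different expected accuracy.

```python
def relative_gain(accuracy: float, baseline: float) -> float:
    """Fractional improvement of `accuracy` over `baseline` (1.6 means 160%)."""
    return (accuracy - baseline) / baseline


def implied_accuracy(gain: float, baseline: float) -> float:
    """Invert relative_gain: the accuracy that yields `gain` over `baseline`."""
    return baseline * (1.0 + gain)


# Assumption: the baseline is a uniform random guess among four subjects.
UNIFORM_BASELINE = 1.0 / 4.0

acc_b = implied_accuracy(1.60, UNIFORM_BASELINE)  # 0.65
acc_a = implied_accuracy(0.58, UNIFORM_BASELINE)  # 0.395
```

Under this assumption, set B models average about 65% accuracy and set A models about 40%.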
Figure 5 shows the relative importance of each feature in our models. In feature set A, Likes carry the greatest weight, accounting on average for 51% of the total feature importance across the two algorithms; Re-shares rank second, with an average of 27%. In feature set B, Visibility dominates, accounting for an average of 94% of the importance, and the remaining weight is spread relatively evenly among the other features.
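Averaging importances “between the two algorithms” can be sketched as below. The function name is hypothetical; it works with any fitted model exposing a `feature_importances_` array, which for scikit-learn trees and forests is the normalized impurity-based importance, so each model’s weights sum to 1. The example values are purely illustrative, not results from our models.

```python
from types import SimpleNamespace
from typing import Dict, Sequence


def mean_importances(models: Sequence, feature_names: Sequence[str]) -> Dict[str, float]:
    """Average each feature's importance across several fitted models."""
    totals = {name: 0.0 for name in feature_names}
    for model in models:
        weights = list(model.feature_importances_)
        if len(weights) != len(feature_names):
            raise ValueError("feature_names does not match the model's inputs")
        for name, weight in zip(feature_names, weights):
            totals[name] += float(weight)
    return {name: total / len(models) for name, total in totals.items()}


# Illustrative stand-ins for a fitted tree and forest (made-up weights).
tree = SimpleNamespace(feature_importances_=[0.6, 0.3, 0.1])
forest = SimpleNamespace(feature_importances_=[0.4, 0.2, 0.4])
averaged = mean_importances([tree, forest], ["likes", "reshares", "other"])
```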
5. Discussion and Conclusion
Our new dataset of click-based reactions to scholarly content on Facebook offers a wealth of possibilities for researchers interested in social-media analytics. We have demonstrated how it can be used to explore user interactions with scholarly content on Facebook, and how click-based reactions can serve as a data source for investigating indicators of users’ emotional attitudes.
Results from the models trained and tested on our dataset suggest that the number of followers a page has (Visibility) may be predictive of article subject matter. This points to possible patterns linking the content shared on Facebook pages to the number of followers those pages have. It may prove useful for researchers to explore how Facebook page popularity is stratified by the type of content a page displays.
We have also suggested interpretations of Facebook click-based reactions that are not immediately apparent: notably, that Re-shares convey positive feelings toward content, and that Sad and Angry Reactions express similar affects. These relationships are not obvious, and they give us insight into how these features are being used in practice.
- Alhoori and Furuta (2014) Hamed Alhoori and Richard Furuta. 2014. Do Altmetrics Follow the Crowd or Does the Crowd Follow Altmetrics?. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’14). 375–378.
- Basile et al. (2017) Angelo Basile, Tommaso Caselli, and Malvina Nissim. 2017. Predicting Controversial News Using Facebook Reactions. In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it). 28–33.
- Bayer et al. (2018) Joseph Bayer, Nicole Ellison, Sarita Schoenebeck, Erin Brady, and Emily B Falk. 2018. Facebook in context(s): Measuring emotional responses across time and space. New Media & Society 20, 3 (2018), 1047–1067.
- Gabielkov et al. (2016) Maksym Gabielkov, Arthi Ramachandran, Augustin Chaintreau, and Arnaud Legout. 2016. Social Clicks: What and Who Gets Read on Twitter?. In ACM SIGMETRICS / IFIP Performance 2016. Antibes Juan-les-Pins, France.
- Krebs et al. (2018) Florian Krebs, Bruno Lubascher, Tobias Moers, Pieter Schaap, and Gerasimos Spanakis. 2018. Social Emotion Mining Techniques for Facebook Posts Reaction Prediction. ICAART 2 (2018), 211–220.
- Krug (2016) Sammi Krug. 2016. Reactions Now Available Globally. https://newsroom.fb.com/news/2016/02/reactions-now-available-globally/. Accessed: Nov. 30, 2018.
- Pool and Nissim (2016) Chris Pool and Malvina Nissim. 2016. Distant supervision for emotion detection using Facebook reactions. In Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES@COLING 2016). 30–39.
- Priem and Costello (2010) Jason Priem and Kaitlin Light Costello. 2010. How and why scholars cite on Twitter. Proceedings of the American Society for Information Science and Technology 47, 1 (2010), 1–4.
- Rajadesingan et al. (2015) Ashwin Rajadesingan, Reza Zafarani, and Huan Liu. 2015. Sarcasm Detection on Twitter: A Behavioral Modeling Approach. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM ’15). ACM, New York, NY, USA, 97–106.
- Shah (2018) Pritam Shah. 2018. Facebook’s new Reactions are being used more - a lot more. https://www.quintly.com/blog/new-facebook-reaction-study. Accessed: Jan. 24, 2019.
- Shuai et al. (2012) Xin Shuai, Alberto Pepe, and Johan Bollen. 2012. How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions, and Citations. PLOS ONE 7, 11 (Nov. 2012), 1–8.
- Tian et al. (2017) Ye Tian, Thiago Galery, Giulio Dulcinati, Emilia Molimpakis, and Chao Sun. 2017. Facebook sentiment: Reactions and Emojis. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, 11–16.