A Large-scale Study of Social Media Sources in News Articles

10/31/2018 ∙ by Md Main Uddin Rony, et al.

In this study, we closely look at the use of social media content as a source or reference in the U.S. news media. Specifically, we examine about 60 thousand news articles published within the five-year period of 2013-2017 by 153 U.S. media outlets and analyze the use of social media content as a source compared to other sources. We designed a social media source extraction algorithm and investigated the extent and nature of social media source usage across different news topics. Our results show that the use of social media content in news almost doubled in five years. Unreliable media outlets rely on social media more than the mainstream media. Both mainstream and unreliable sites prefer Twitter to Facebook as a source of information.




1. Introduction

The use of social media content (e.g., Tweets and Facebook posts) in news stories has become a common practice in newsrooms across the world (Broersma and Graham, 2013; Hladík and Štětka, 2017; Paulussen and Harder, 2014). Journalists regularly quote and paraphrase content from social media pages. For instance, an article from the National Broadcasting Company (NBC) news (https://www.nbcnews.com/politics/donald-trump/trump-slams-release-secretly-recorded-cohen-conversation-so-sad-n894401) says: “What kind of a lawyer would tape a client? So sad! Is this a first, never heard of it before?” Trump tweeted. This article has used a piece of social media content (in this case, a Tweet) as a source. According to (Broersma and Graham, 2013), social media content is being used as a source because it is “convenient, cheap and effective”. Some researchers have investigated the extent to which mainstream news media in some European countries used Facebook, Twitter and YouTube content in news (Broersma and Graham, 2013; Hladík and Štětka, 2017; Paulussen and Harder, 2014). However, such a study on U.S. news media is absent. Also, the previous studies were performed over small samples, which limits the scope of the findings. The purpose of this large-scale study is to examine the extent to which mainstream U.S. based news media use social media content (Facebook and Twitter) as sources of information. We also examine similar practices on many online news portals that are popular but considered by many (Wikipedia, 2018; Zimdars, 2016; informationisbeautiful.net, 2016) as unreliable. We compare social media source usage with traditional source usage. We further investigate the relation between social media source usage and news topic category.

There is a set of challenges which we had to overcome to conduct this study. First, a large-scale dataset of news articles from mainstream and unreliable U.S. media outlets is not available. There are some datasets that cover only the headlines (Rony et al., 2017) or cover a limited number of media outlets (Times, 2018). However, these datasets are not adequate for this study, as our objective is to examine patterns of social media content usage across a range of media categories over a reasonable period of time. For this reason, we carefully collected about 60 thousand news articles which were published from 2013 to 2017 by 153 U.S. media outlets. The next challenge is to identify the sources used in the news articles. Due to the large-scale nature of the data, it is not feasible to extract sources and quotes manually from the news articles. So, we depend on automatic source extraction. We design a rule-based classifier that automatically identifies whether a direct quote is sourced from social media (Facebook, Twitter) or not with 89.80% precision. We further extend the classifier to identify paraphrased (not direct) quotes as well, which achieves a precision of 94.34%. Using these classifiers, we process all the collected news articles and analyze the underlying social media source usage patterns within mainstream and unreliable media over time. Our analysis shows that the practice of sourcing from Facebook and Twitter has doubled within the five-year period. In a nutshell, we make the following contributions in this paper: i) we prepared a large dataset of news articles published by mainstream and unreliable U.S. media, and we plan to share this dataset with the research community upon acceptance of the manuscript; ii) we developed an automated social media source classifier and evaluated its performance; iii) using the dataset and the classifier, we analyze the social media source usage patterns in U.S. media. To the best of our knowledge, no other researchers have explored this before.

This interdisciplinary study is a substantial addition to the literature on the journalistic use of social media as it relates to sourcing practices. Despite the importance of research on this topic articulated in scholarly and professional discussions, no study examined the practices in U.S. media. The current study seeks to fill that gap. By examining this new sourcing practice, the study provides an in-depth look at how social media contents are shaping public discourses in the United States.

2. Related Work

Influences on Sourcing Routine: A number of endogenous and exogenous factors, described by  (Shoemaker and Reese, 2013) as a hierarchy of influences, determine the process of news production. Several scholars used the Hierarchy of Influences model to explain sourcing practices in newsrooms  (Kruikemeier and Lecheler, 2018; Turcotte, 2017; Yamamoto et al., 2017). Some key factors behind source selection include personal relationship, relevance, accessibility or willingness of a source to talk to a reporter, and credibility (Gans, 1979; Reich, 2011).  (Gans, 1979) suggested that a combination of economic and authoritative considerations aimed at producing quality news coverage with limited resources determined the source selection processes in traditional newsrooms such as newspapers, magazines, and network news. The economic consideration refers to efficiency or optimal use of available resources while the authoritative consideration refers to the perceived degree of authority attributed to sources  (Paulussen and Harder, 2014). These considerations often lead journalists to choose sources from a small pool of known sources–mostly government officials and the powerful elites–who had already established their credibility and relevance (Kruikemeier and Lecheler, 2018; Lasorsa and Reese, 1990; Reich, 2011). Social hierarchy appeared in the literature as an undisputed predictor of news sourcing practices. All major theoretical frameworks used in the studies on news sourcing patterns– (Shoemaker and Reese, 2013) ‘hierarchy of influences’,  (Gans, 1979) ‘hierarchies of nation and society’, and  (Becker, 1967) ‘hierarchy of credibility’ –point to the same conclusion: Journalists rely more on institutional sources than ordinary people  (Thurman, 2008; Reich, 2011). 
Though credibility was apparently the most important factor behind source selection, resource constraints of news organizations and easier access to media had played a key role in establishing elite dominance in news  (Hermida et al., 2014; Gans, 1979; Reich, 2011).

Integration of Social Media in Sourcing: The Internet and new technologies–particularly social media–offer news reporters easy access to a large pool of diverse sources who would otherwise be hard to approach  (Broersma and Graham, 2013; Hermida, 2010). Various studies show that news reporters are increasingly integrating social media content in news  (Broersma and Graham, 2013; Chadwick, 2017; Messner and Distaso, 2008; Parmelee and Bichard, 2011; Paulussen and Harder, 2014; Kruikemeier and Lecheler, 2018). Such content includes quotes and paraphrases from posts on social media such as blogs, Tweets, Facebook, and YouTube posts. A survey of British journalists examined how journalists in UK, Germany, Sweden, and Finland view and use social media  (Gulyás, 2011). The study found that nearly all journalists in the UK (97%) use social media for work, but many journalists are skeptical of the reliability of social media content. News organizations use social media more for publishing and distributing content than for sourcing. Mainstream news media organizations (e.g., BBC, CNN, The New York Times, The Washington Post) have long been integrating social media content into news (Messner and Distaso, 2008). (Paulussen and Harder, 2014) did a study on Belgian newspapers while  (Hladík and Štětka, 2017) studied Czech newspapers and came up with evidence of the use of content from Facebook, Twitter, and YouTube. Though the use of social media content in news has become a trend, not all journalists see social media as an influential news source. (Hedman and Djerf-Pierre, 2013; Yamamoto et al., 2017; Gulyás, 2011) suggest that those who have longer professional experience and those who work for private newspapers consider social media as less influential or less newsworthy. On the other hand, younger editors coming from pluralistic society–especially the ones who work for publicly owned newspapers–see social media content as an influential source.  
(Lariscy et al., 2009) found that acceptability of social media as a source is low among business journalists compared to the general trend. In sum, integration of social media content in news continued to grow despite some mild opposition from a section of journalists. Many reporters rely heavily on social media to find and verify information. Worsening financial situation of news organizations, as well as increasing user expectation for news on demand, will continue to force newsrooms to use social media as a source  (Broersma and Graham, 2013). As the integration of social media content in news has become a fait accompli, many scholars debated how this is reshaping the sourcing routine in newsrooms.  (Van Leuven and Deprez, 2017) identified two opposing views that dominated this debate. A section of the literature suggests that social media helped legacy news media diversify its sources and include more voices of ordinary citizens in news. It is, thus, replacing the existing power-to-people hierarchy with a bottom-up approach. Another group of scholars suggested that there was no change in traditional sourcing routine as elite sources, also known as experts  (Freedman et al., 2010), continued to dominate news media on the web.

3. Research Questions

The existing literature covers various aspects of journalistic use of social media. But it lacks systematic research on how U.S. news media deploy social media content in news. This study seeks to address three major aspects of social media content use in news–frequency of use, processing of content, and relation to news topic–and asks the following research questions.

  • RQ1: How often do mainstream and unreliable news websites use Facebook and Twitter content in articles?

  • RQ2: To what extent do mainstream and unreliable news media process Facebook and Twitter content used in articles?

  • RQ3: Does the use of social media source vary for different news topics?

4. Methodology

4.1. Data Collection

We prepared a list of websites of mainstream and unreliable news media based on circulation, rating and popularity among Internet users (informationisbeautiful.net, 2016; Schneider, 2016; Zimdars, 2016). The list included websites of U.S. mainstream media ( print news media, broadcast), and unreliable media. The unreliable media included websites described as conspiracy, clickbait, satire, and junk science by (informationisbeautiful.net, 2016) and (Zimdars, 2016). Further details about the mainstream and unreliable media selection process can be found in (Rony et al., 2017). We followed the official Facebook pages of these media and, using the Facebook Graph API, we collected up to posts per media per year. These posts were published on Facebook between January 1, 2013 and December 31, 2017. A total of Facebook posts were collected. A Facebook post may contain a photo, a video, or a link to an external source. For each post, we collected the headline (title of a video or headline of an article), status type, link to the main article, and the status message. Of the total posts, only contained links to the corresponding news article. For each link, we used a publicly available Python package named Newspaper3k (https://newspaper.readthedocs.io) to collect the news article content. Some links led to unavailable web pages and some were restricted due to subscription limits. At the end, we had articles from mainstream media and from unreliable media. We took a random sample of the unreliable news and prepared a balanced dataset of 59,356 news articles.

4.2. Extraction of Social Media Source

Identifying social media content that is used as a source in a news article is a challenging task because social media content can take different forms. For instance, the content can be text, an image, or a video. In this study, we restricted ourselves to the text format only. Still, the linguistic variations in how a Tweet or a Facebook post can be cited as a source posed a big challenge. For example, "she tweeted", "the tweet read in part", and "took to Twitter" can all be used to cite a Twitter source. To identify these language variations, we examined a set of news articles manually and carefully curated a list of patterns. Specifically, we took a stratified random sample of news articles which contained any of the following keywords: facebook, twitter, post, tweet. Then, we manually inspected these articles and identified citation patterns ( patterns for Facebook and patterns for Twitter) that were used to cite a social media post as a source. Figure 1 shows some examples of these citation patterns. These patterns were used to automatically extract social media sources from the collected data.

4.2.1. Quotation, Paraphrase, and Embedding

We observe that a social media source is often directly quoted, fully or partially (for instance, the NBC example in Section 1). Sometimes, a source is processed and paraphrased by the news reporter. For example, she tweeted that she was glad to have lost 6 pounds (https://www.smh.com.au/entertainment/celebrity/kim-kardashians-flu-weight-loss-tweet-should-have-been-celebrated-not-condemned-20170421-gvpw8l.html). And sometimes, a source is directly embedded (https://developer.twitter.com/en/docs/twitter-for-websites/embedded-tweets/overview.html) without quoting or paraphrasing. Embedding happens mostly in the case of Twitter sources, and embedded sources are easier to identify using a regular expression pattern. We use the patterns described above to identify a source usage and then categorize it into one of three types: Quotation, Paraphrase, and Embedding. Specifically, we first segment an article into sentences using the NLTK (https://www.nltk.org/) Python package. Then, for each sentence, we check if it contains one of the patterns. If it matches the embedding regular expression, we categorize the source usage as an Embedding. If it matches one of the other patterns, we examine whether the sentence contains quotation marks (" " or “ ” or ‘ ’ or other similar variants). If it does, we categorize the source usage as a Quotation. Otherwise, we categorize it as a Paraphrase. We did not consider Facebook embedding as we found that Facebook posts were rarely embedded in web pages.
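The categorization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the citation patterns are hypothetical examples modelled on the phrases quoted in Section 4.2, the embedding expression is an assumed signature of an embedded Tweet in extracted text, and a naive regular-expression splitter stands in for the NLTK sentence tokenizer so the sketch has no external dependencies.

```python
import re

# Hypothetical citation patterns modelled on the examples in the text
# ("she tweeted", "the tweet read in part", "took to Twitter").
# The curated pattern list from the study is not reproduced here.
CITATION_PATTERNS = [
    re.compile(r"\btweeted\b", re.IGNORECASE),
    re.compile(r"\bthe tweet read\b", re.IGNORECASE),
    re.compile(r"\btook to (?:twitter|facebook)\b", re.IGNORECASE),
    re.compile(r"\b(?:wrote|posted) on (?:twitter|facebook)\b", re.IGNORECASE),
]

# Assumed embedding signature: extracted text of an embedded Tweet that
# carries a pic.twitter.com link or a status URL.
EMBED_PATTERN = re.compile(r"pic\.twitter\.com/\w+|twitter\.com/\w+/status/\d+")

# Straight and curly quotation marks. A curly apostrophe would also match,
# which is a known weakness of this simple check.
QUOTE_MARKS = re.compile('["“”‘’]')


def split_sentences(text):
    """Naive splitter standing in for NLTK's sentence tokenizer."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify_sentence(sentence):
    """Return 'Embedding', 'Quotation', 'Paraphrase', or None."""
    if EMBED_PATTERN.search(sentence):
        return "Embedding"
    if any(p.search(sentence) for p in CITATION_PATTERNS):
        return "Quotation" if QUOTE_MARKS.search(sentence) else "Paraphrase"
    return None


def extract_sources(article_text):
    """Collect (category, sentence) pairs for every matching sentence."""
    results = []
    for sentence in split_sentences(article_text):
        category = classify_sentence(sentence)
        if category is not None:
            results.append((category, sentence))
    return results


article = (
    'The senator took to Twitter on Monday. '
    '"This bill is a mistake," she tweeted. '
    'She later tweeted that she would vote no. '
    'The vote is scheduled for Friday.'
)
for category, sentence in extract_sources(article):
    print(category, "|", sentence)
```

The embedding check runs first so that an embedded Tweet whose text happens to contain quotation marks is not counted as a Quotation, which mirrors the order of checks described above.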

Category         Precision   Recall    F1
Quotation        89.80%      73.33%    80.73%
Paraphrase       94.34%      79.37%    86.21%
Embedding        100%        100%      100%
Macro-average    94.71%      84.23%    88.98%
Micro-average    97.85%      92.62%    95.16%
Table 1. Performance of Social Media Source Identification

Figure 1. Most frequent Twitter and Facebook patterns

4.2.2. Performance Evaluation

To evaluate the performance of the social media source identification and categorization program explained above, we randomly sampled 100 news articles where the program found at least one social media source and another 100 random news articles where the program did not find any social media source. Then, we manually inspected the articles to identify the social media sources and their categories. Our manual analysis found that in total there were 393 social media source usages (60 Quotations, 63 Paraphrases, and 270 Embeddings). Our program found in total 372 sources (49 Quotations, 53 Paraphrases, and 270 Embeddings). Out of these 372 sources, 364 (44 Quotations, 50 Paraphrases, and 270 Embeddings) were accurate and 8 were falsely marked as a social media source. Out of the 393 true social media sources, the program could not identify 29 sources. Table 1 shows the recall and precision of the program. As Embedding follows a fixed pattern, the program could identify all the embeddings correctly. The reason behind the lower recall for Quotation and Paraphrase is the use of uncommon linguistic patterns (e.g., “put his two cents on Twitter”, “137 characters of angry tweet”) by writers. To avoid over-fitting, we decided not to add these uncommon linguistic patterns to our list of patterns.
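The evaluation figures can be recomputed from the counts reported in this section. The sketch below uses only those counts (correct, found by the program, found manually); the function and variable names are illustrative and not taken from the study.

```python
def precision_recall_f1(correct, predicted, actual):
    """Standard precision, recall, and F1 from raw counts."""
    precision = correct / predicted
    recall = correct / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# category: (correctly identified, found by the program, found manually)
counts = {
    "Quotation": (44, 49, 60),
    "Paraphrase": (50, 53, 63),
    "Embedding": (270, 270, 270),
}

per_category = {name: precision_recall_f1(*c) for name, c in counts.items()}

# Macro-average: unweighted mean of the per-category scores.
macro = tuple(
    sum(scores[i] for scores in per_category.values()) / len(per_category)
    for i in range(3)
)

# Micro-average: pool the counts over all categories, then score once.
micro = precision_recall_f1(
    sum(c[0] for c in counts.values()),
    sum(c[1] for c in counts.values()),
    sum(c[2] for c in counts.values()),
)

for name, scores in {**per_category, "Macro": macro, "Micro": micro}.items():
    print(name, [f"{100 * s:.2f}%" for s in scores])
```

With these counts, the F1 score works out to about 80.73% for Quotation and 86.21% for Paraphrase.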

5. Analysis

Using the collected data and the developed source identification program, we answer the research questions posed in Section 3.

5.1. RQ1: Use of Social Media as Source

RQ1 asks how often mainstream and unreliable news websites use Facebook and Twitter content in articles. We applied the social media source identification program to all the mainstream and unreliable articles. Table 2 shows the results. In total, we find that 5,430 articles (9.15% of all data) contained at least one social media post as a source. Note that an article may contain both Facebook and Twitter sources. However, far more articles use Twitter (4,824) as a source than Facebook (701). The underlying reason could be that Tweets are generally public whereas Facebook posts are not. Also, it is convenient to embed a Tweet, whereas Facebook post embedding is rarely seen in news articles. Moreover, politicians and celebrities use Twitter more frequently than Facebook to engage with the public. We also find that the unreliable organizations use social media posts as a source more often than the mainstream media: 6.68% of the mainstream articles contained at least one social media source, compared to 11.61% (almost double) of the unreliable articles. In total, there are 4,207 social media sources in 1,982 mainstream articles (2.12 sources per article) and 12,436 sources in 3,448 unreliable articles (3.61 sources per article). We observe that even though Twitter dominates Facebook in terms of source usage, the mainstream media use Facebook sources more often than unreliable media outlets. 10.29% of all social media sources in mainstream articles are from Facebook, whereas only 2.85% of social media sources in unreliable articles are from Facebook.
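The shares and per-article averages quoted above follow directly from the counts in Table 2. The sketch below recomputes them; the dictionary keys and the helper name are illustrative choices, while the numbers are the ones reported in the table.

```python
# Counts taken from Table 2.
table2 = {
    "Mainstream": {"articles": 29656, "with_sm": 1982,
                   "twitter_sources": 3774, "facebook_sources": 433},
    "Unreliable": {"articles": 29700, "with_sm": 3448,
                   "twitter_sources": 12081, "facebook_sources": 355},
}


def summarize(row):
    """Derive the usage statistics discussed in the text from one table row."""
    total_sources = row["twitter_sources"] + row["facebook_sources"]
    return {
        # Share of articles that contain at least one social media source.
        "share_with_sm": row["with_sm"] / row["articles"],
        # Average number of sources among articles that contain any.
        "sources_per_article": total_sources / row["with_sm"],
        # Share of all social media sources that come from Facebook.
        "facebook_share": row["facebook_sources"] / total_sources,
    }


summary = {media: summarize(row) for media, row in table2.items()}
for media, stats in summary.items():
    print(
        media,
        f"{100 * stats['share_with_sm']:.2f}% of articles,",
        f"{stats['sources_per_article']:.2f} sources per article,",
        f"{100 * stats['facebook_share']:.2f}% from Facebook",
    )
```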

Overall
Media Type    # Articles    # Articles Containing SM Source    # SM Sources (Twitter + Facebook)
Mainstream    29656         1982 (6.68%)                       4207
Unreliable    29700         3448 (11.61%)                      12436
Total         59356         5430 (9.15%)                       16643

Twitter Source
Media Type    # of Articles    Quotation        Paraphrase      Embedding        Total
Mainstream    1654             1065 (28.22%)    866 (22.95%)    1843 (48.83%)    3774 (89.71%)
Unreliable    3170             1137 (9.41%)     1130 (9.35%)    9814 (81.24%)    12081 (97.15%)
Total         4824 (88.84%)    2202             1996            11657            15855

Facebook Source
Media Type    # of Articles    Quotation       Paraphrase      Total
Mainstream    377              228 (52.66%)    205 (47.34%)    433 (10.29%)
Unreliable    324              178 (50.14%)    177 (49.86%)    355 (2.85%)
Total         701              406             382             788
Table 2. Social media (Twitter and Facebook) content usage as a source by mainstream and unreliable media

# Articles with Social Media Source
Topic                   All Media    Mainstream    Unreliable
Politics                2080         369           1711
Arts & Entertainment    798          491           307
Sensitive Subjects      784          300           484
Law & Government        340          112           228
Sports                  283          213           70
People & Society        239          68            171
Health                  76           25            51

Mainstream
Topic                   Quotation       Paraphrase      Embedding
Politics                377 (47.72%)    238 (30.13%)    175 (22.15%)
Arts & Entertainment    303 (26.84%)    241 (21.35%)    585 (51.82%)
Sensitive Subjects      187 (32.86%)    165 (29%)       217 (38.14%)
Law & Government        69 (34.85%)     73 (36.87%)     56 (28.28%)
Sports                  97 (17.05%)     380 (66.78%)    92 (16.17%)
People & Society        22 (18.18%)     38 (31.40%)     61 (50.41%)
Health                  11 (39.29%)     9 (32.14%)      8 (28.57%)

Unreliable
Topic                   Quotation       Paraphrase      Embedding
Politics                665 (9.77%)     648 (9.52%)     5495 (80.71%)
Arts & Entertainment    197 (14.71%)    116 (8.66%)     1026 (76.62%)
Sensitive Subjects      217 (13.79%)    172 (10.93%)    1185 (75.28%)
Law & Government        75 (11.28%)     74 (11.13%)     516 (77.59%)
Sports                  30 (17.05%)     28 (15.91%)     118 (67.04%)
People & Society        64 (11.79%)     66 (12.15%)     413 (76.06%)
Health                  21 (18.58%)     26 (23.01%)     66 (58.41%)
Table 3. Processing of social media sources in different news topics

We further examine how social media source usage has evolved over time. We grouped the articles by year of publication. The distribution of articles over the years is as follows: 2013: 7,176; 2014: 10,725; 2015: 14,585; 2016: 12,694; 2017: 14,176. We observe that the practice of citing social media content is increasing over time. For example, in 2013, about 3.85% of articles (276 out of 7,176) used social media as a source, whereas in 2017 the percentage was about 15.05% (2,134 out of 14,176 articles). For each year, Figure 2 shows the percentage of articles from mainstream and unreliable media that use social media (Facebook/Twitter) as a source. The practice is increasing in both categories.

Media Type    # Direct Quotes (Avg. Per Article)    # Social Media Sources    Social Media Sources : Direct Quotes
Mainstream    201924 (6.81)                         4207                      1:48
Unreliable    185182 (6.23)                         12436                     1:14.89
Table 4. Social media source vs. all direct quotations

We also study the extent of social media source usage with respect to all kinds of sources, including regular, non-social media based sources (e.g., interview, book, press release). For instance, “I’m just going to pay my respects,” Trump told Fox News on Monday night (https://www.reuters.com/article/us-pennsylvania-shooting/trump-to-visit-pittsburgh-amid-funerals-calls-for-him-to-stay-away-idUSKCN1N418P) is an example of a direct quote from a source which is not from social media. Automatically extracting all kinds of sources is a very challenging task, as a source can be directly quoted or paraphrased using many linguistic variations. In this study, we restricted the comparison to direct quotes only, which are relatively easier to extract automatically. Details of our direct quote extraction method can be found in (Muzny et al., 2017; Manning et al., 2014). Table 4 shows the comparison between the use of direct quotations and social media sources. On average, a mainstream article uses more direct quotations (6.81 quotes per article) than an unreliable article (6.23 quotes per article). However, an average unreliable article uses one social media source for every 14.89 direct quotes, whereas an average mainstream article uses a significantly lower number of social media sources (one for every 48 direct quotes).
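The ratios in Table 4 are the number of direct quotes divided by the number of social media sources for each media type. A short sketch of that arithmetic, using the counts from the table (the variable names are illustrative):

```python
# Counts taken from Table 4: (direct quotes, social media sources).
table4 = {
    "Mainstream": (201924, 4207),
    "Unreliable": (185182, 12436),
}

# Direct quotes per social media source, i.e. the "1:x" ratio in the table.
quotes_per_sm_source = {
    media: direct_quotes / sm_sources
    for media, (direct_quotes, sm_sources) in table4.items()
}

for media, ratio in quotes_per_sm_source.items():
    print(f"{media}: one social media source for every {ratio:.2f} direct quotes")
```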

Figure 2. Social media source usage is increasing over time

5.2. RQ2: Processing of Social Media Content

RQ2 asks to what extent mainstream and unreliable news media process the Facebook and Twitter content used in articles. We wanted to understand whether the media outlets embed a source (Embedding), directly quote a source (Quotation), or paraphrase a source (Paraphrase). Using the program described in Section 4.2.1, we examine how the media process a source. Table 2 summarizes the results. We observe that in the case of Twitter, both media types tend to process the sources as Embeddings more than as Quotations or Paraphrases. The unreliable media use Quotation and Paraphrase almost equally, whereas the mainstream media use more Quotation than Paraphrase. In the case of Facebook sources, the distribution of Quotation versus Paraphrase is more balanced. We also infer that, irrespective of the social platform (Twitter, Facebook), mainstream media use a larger share of social media sources as Quotations than unreliable media, though the difference is substantial only for Twitter.

Media Type    Topic                   # Articles    # Articles with Social Media Source
Mainstream    Arts & Entertainment    5943          491 (8.26%)
              Sensitive Subjects      3391          300 (8.85%)
              Law & Government        2793          112 (4.01%)
              Sports                  2592          213 (8.22%)
              Politics                2389          369 (15.45%)
Unreliable    Politics                7104          1711 (24.09%)
              Sensitive Subjects      3790          484 (12.77%)
              People & Society        2889          171 (5.92%)
              Law & Government        2835          228 (8.04%)
              Health                  2546          51 (2%)
Table 5. Extent of social media source usage in the top-5 news topics for each media

5.3. RQ3: Relation With News Topic

RQ3 asks whether the use of social media sources varies across news topics. To answer this question, we categorize each of the articles into topics using the Google Cloud Natural Language API (https://cloud.google.com/natural-language/). A complete list of the topics can be found at https://cloud.google.com/natural-language/docs/categories. Table 5 shows the top-5 topics for each media category. These topics cover 58% and 64% of all the mainstream and unreliable articles, respectively. Both media types use social media sources in Politics related news more often than in other news topics. Also, in all three common topics (Sensitive Subjects, Law & Government, and Politics), unreliable media use social media sources more than mainstream media. We further investigate the processing of social media sources in the top-5 topics. Table 3 shows the distribution of social media source processing categories (Quotation, Paraphrase, Embedding) for the seven topics that form the union of the top-5 topics in each media category: Politics, Arts & Entertainment, Sensitive Subjects, Law & Government, Sports, People & Society, and Health. In all seven categories, unreliable media process social media sources as Embeddings significantly more than mainstream media. On the other hand, mainstream media use Quotation more often than unreliable media in six of these seven categories.

6. Discussion

The main objective of this study is to provide a glimpse of how deeply social media content has been rooted in news. It sought to understand the extent to which online news media uses Facebook and Twitter as a source of information. It also compared differences between mainstream news media and unreliable media as they use social media as a news gathering tool. The study has been longitudinal in nature since it provides an overview of five years. Following is the discussion of some major findings.

First, the data identified several patterns that support previous studies suggesting Twitter, among other social networking sites, is the preferred source of information to online news and informational content creators  (Broersma and Graham, 2013; Paulussen and Harder, 2014). For instance, the findings suggest that both mainstream and unreliable news websites generally prefer Twitter to Facebook as a news source. Second, we find that the number of social media sources in news is increasing by a large number every year. This number has almost doubled in five years. This finding confirms previous research suggesting that uses of social media in news were increasing  (Broersma and Graham, 2013; Chadwick, 2017; Kruikemeier and Lecheler, 2018; Messner and Distaso, 2008; Parmelee and Bichard, 2011; Paulussen and Harder, 2014). Third, the results show that unreliable websites are more dependent on social media than the mainstream media, which speaks of their weak organizational structure and lack of resources to produce quality news content  (Atton and Wickenden, 2005). This also supports the assumption that they rely on free and biased content to fill pages and shore up their agenda. In general, the study has several contributions to the journalism literature. First, it gives an overview of social media content uses in U.S. Second, the study examined five years of data showing a steady increase in citing social media sources over time. Previous research only assumed this, but the current study conducted a systematic study on a large data set. Third, this study also examined practices of unreliable media websites that are rarely studied.

However, this study has several limitations. For instance, we only considered Facebook and Twitter as social media platforms, whereas there are many other popular sites such as Instagram, YouTube, and Tumblr. However, we believe Facebook and Twitter are the most prominent social networks in use today. Also, our source and quotation identification methods are simple, yet accurate, as explained in the Methodology section. These could be further improved by applying machine learning and natural language processing techniques. In the future, we want to overcome these limitations and at the same time explore other possible directions.


References
  • Atton and Wickenden (2005) Chris Atton and Emma Wickenden. 2005. Sourcing routines and representation in alternative journalism: A case study approach. Journalism Studies 6, 3 (2005), 347–359.
  • Becker (1967) Howard S Becker. 1967. Whose side are we on? Social problems 14, 3 (1967), 239–247.
  • Broersma and Graham (2013) Marcel Broersma and Todd Graham. 2013. Twitter as a news source: How Dutch and British newspapers used tweets in their news coverage, 2007–2011. Journalism practice 7, 4 (2013), 446–464.
  • Chadwick (2017) Andrew Chadwick. 2017. The hybrid media system: Politics and power. Oxford University Press.
  • Freedman et al. (2010) Eric Freedman, Frederick Fico, and Megan Durisin. 2010. Gender diversity absent in expert sources for elections. Newspaper Research Journal 31, 2 (2010), 20–33.
  • Gans (1979) Herbert Gans. 1979. Deciding What’s News. New York. Pantheon 241 (1979).
  • Gulyás (2011) A Gulyás. 2011. Perceptions and Use of Social Media among Journalists in the UK. (2011).
  • Hedman and Djerf-Pierre (2013) Ulrika Hedman and Monika Djerf-Pierre. 2013. The social journalist: Embracing the social media life or creating a new digital divide? Digital Journalism 1, 3 (2013), 368–385.
  • Hermida (2010) Alfred Hermida. 2010. Twittering the news: The emergence of ambient journalism. Journalism practice 4, 3 (2010), 297–308.
  • Hermida et al. (2014) Alfred Hermida, Seth C Lewis, and Rodrigo Zamith. 2014. Sourcing the Arab Spring: A case study of Andy Carvin’s sources on Twitter during the Tunisian and Egyptian revolutions. Journal of Computer-Mediated Communication 19, 3 (2014), 479–499.
  • Hladík and Štětka (2017) Radim Hladík and Václav Štětka. 2017. The powers that tweet: Social media as news sources in the Czech Republic. Journalism Studies 18, 2 (2017), 154–174.
  • informationisbeautiful.net (2016) informationisbeautiful.net. 2016. Unreliable/Fake News Sites & Sources. https://docs.google.com/spreadsheets/d/1xDDmbr54qzzG8wUrRdxQl_C1dixJSIYqQUaXVZBqsJs. (2016).
  • Kruikemeier and Lecheler (2018) Sanne Kruikemeier and Sophie Lecheler. 2018. News consumer perceptions of new journalistic sourcing techniques. Journalism Studies 19, 5 (2018), 632–649.
  • Lariscy et al. (2009) Ruthann Weaver Lariscy, Elizabeth Johnson Avery, Kaye D Sweetser, and Pauline Howes. 2009. An examination of the role of online social media in journalists’ source mix. Public relations review 35, 3 (2009), 314–316.
  • Lasorsa and Reese (1990) Dominic L Lasorsa and Stephen D Reese. 1990. News source use in the crash of 1987: A study of four national media. Journalism Quarterly 67, 1 (1990), 60–71.
  • Manning et al. (2014) Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Association for Computational Linguistics (ACL) System Demonstrations. 55–60. http://www.aclweb.org/anthology/P/P14/P14-5010
  • Messner and Distaso (2008) Marcus Messner and Marcia Watson Distaso. 2008. The source cycle: How traditional media and weblogs use each other as sources. Journalism Studies 9, 3 (2008), 447–463.
  • Muzny et al. (2017) Grace Muzny, Michael Fang, Angel X. Chang, and Dan Jurafsky. 2017. A Two-stage Sieve Approach for Quote Attribution. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL). https://nlp.stanford.edu/pubs/muzny2017twostage.pdf
  • Parmelee and Bichard (2011) John H Parmelee and Shannon L Bichard. 2011. Politics and the Twitter revolution: How tweets influence the relationship between political leaders and the public. Lexington Books.
  • Paulussen and Harder (2014) Steve Paulussen and Raymond A Harder. 2014. Social media references in newspapers: Facebook, Twitter and YouTube as sources in newspaper journalism. Journalism Practice 8, 5 (2014), 542–551.
  • Reich (2011) Zvi Reich. 2011. Source credibility and journalism: Between visceral and discretional judgment. Journalism Practice 5, 1 (2011), 51–67.
  • Rony et al. (2017) Md Main Uddin Rony, Naeemul Hassan, and Mohammad Yousuf. 2017. Diving Deep into Clickbaits: Who Use Them to What Extents in Which Topics with What Effects?. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, 232–239.
  • Schneider (2016) Schneider. 2016. Most Watched Television Networks: Ranking 2016’s Winners and Losers. IndieWire. http://www.indiewire.com/2016/12/cnn-fox-news-msnbc-nbc-ratings-2016-winners-losers-1201762864/. (2016).
  • Shoemaker and Reese (2013) Pamela J Shoemaker and Stephen D Reese. 2013. Mediating the message in the 21st century: A media sociology perspective. Routledge.
  • Thurman (2008) Neil Thurman. 2008. Forums for citizen journalists? Adoption of user generated content initiatives by online news media. New media & society 10, 1 (2008), 139–157.
  • Times (2018) The New York Times. (accessed October 29, 2018). The New York Times Annotated Corpus. https://catalog.ldc.upenn.edu/ldc2008t19
  • Turcotte (2017) Jason Turcotte. 2017. Who’s Citing Whom? Source Selection and Elite Indexing in Electoral Debates. Journalism & Mass Communication Quarterly 94, 1 (2017), 238–258.
  • Van Leuven and Deprez (2017) Sarah Van Leuven and Annelore Deprez. 2017. ‘To follow or not to follow?’: How Belgian health journalists use Twitter to monitor potential sources. Journal of Applied Journalism & Media Studies 6, 3 (2017), 545–566.
  • Wikipedia (2018) Wikipedia. (accessed September 24, 2018). List of fake news websites. https://en.wikipedia.org/wiki/List_of_fake_news_websites
  • Yamamoto et al. (2017) Masahiro Yamamoto, Seungahn Nah, and Deborah Chung. 2017. US Newspaper Editors’ Ratings of Social Media as Influential News Sources. International Journal of Communication 11 (2017), 17.
  • Zimdars (2016) Melissa Zimdars. 2016. False, Misleading, Clickbait-y, and/or Satirical “News” Sources. https://docs.google.com/document/d/10eA5-mCZLSS4MQY5QGb5ewC3VAL6pLkT53V_81ZyitM/preview. (2016).