The communication of health and medical research online provides a critical resource for the public. More than three-quarters of the UK public report an interest in biomedical research, with 42% having actively sought out content relating to medical or health research in 2015. Nearly all searches for health information take place online via search engines [3, 17, 9, 11], making Internet searches a common way for people to engage with medical research and associated media communications, with the potential to alter their healthcare beliefs and decisions.
Research communications in news and social media face several challenges. Studies with fewer participants and of lower methodological rigour are more common in news media [13, 32], and research from authors with conflicts of interest tends to receive more attention in news and social media. As many as half of all news reports manipulate or sensationalise study results to emphasise the benefits of experimental treatments.
Despite issues with the reliability of health information online, most people trust what they encounter [10, 11], and are inconsistent in their efforts to validate health information using appropriate sources [8, 11], likely because they find it difficult to do so. Where attempts to assess the credibility of health information are made, the visibility and accessibility of sources such as scientific research articles are an important criterion by which users assess the quality of online health communications [8, 11]. Individuals are also subject to order-effect biases that impact their perception of the evidence presented by online communications of health research, and tend to believe information that aligns with their current knowledge of a health topic.
The representation of medical research in the public domain is particularly important in relation to vaccination, where vocal critics actively seek to erode trust in the safety and effectiveness of vaccines and immunisation programs. In 2019, the World Health Organisation listed vaccine hesitancy (the reluctance or refusal to vaccinate) as one of the ten most significant threats to global health. There is a clear risk that the misrepresentation of scientific evidence and amplification of misinformation by social media may be major contributing factors to further outbreaks of vaccine-preventable diseases in future.
The rise of vaccine hesitancy as a global public health issue is in part driven by the increased pervasiveness of anti-vaccination sentiment in search engine results and the mainstream news media, as well as the growth of social media as a platform for the provision of a diverse range of information sources to the public. Discussion of the safety and efficacy of vaccines is a common theme in news reports, and low-quality information is common. On webpages specifically advocating against vaccination, the majority cite safety risks including illness, damage, or death [2, 19].
To be able to identify biases and misrepresentation in the communication of health research online, we need to be able to quickly identify the original source literature for that research. While existing services such as Altmetric (https://www.altmetric.com/) can be used to identify links to scientific source material using Digital Object Identifiers (DOIs), Uniform Resource Locators (URLs), or other identifiers such as PubMed IDs (PMIDs), in most cases these identifiers must be embedded in hyperlinks to enable their tracking. Other media services that offer more complete tracking of media mentions of research tend to be for-profit subscription services that support organisations wanting to keep track of their research outputs. These services are source-centric—they start with a research article and track the media that references it—and may not easily support use cases where a member of the public is interested in accessing the source material that informs the health research communications they access online.
Our aim was to evaluate methods for automatically identifying source literature by recommending articles for webpages communicating vaccination research to the public. To do this, we made use of a large set of reported links, tracked by Altmetric, between vaccination-related webpages and the scientific literature they reference.
2.1 Study data
The study data comprised a set of research articles from PubMed linked to a set of webpages via Altmetric. To construct the corpus of research articles from PubMed, we retrieved all articles from PubMed by searching for “vaccine”, automatically expanded to include searches for the plural form and “vaccine” as a Medical Subject Heading (MeSH) term. Title and abstract text for each article were extracted using the National Center for Biotechnology Information (NCBI) E-Utilities Application Programming Interface (API). Any PubMed articles that did not include at least 100 words after concatenating title and abstract were excluded from the analysis, and the remaining 207,538 articles formed the PubMed corpus (Figure 1). The search was conducted in July 2018.
We then used the Altmetric API to identify the set of all news articles and non-social media blog posts that linked to one or more of the articles in the PubMed corpus. We crawled each URL to access the webpages and concatenated contiguous blocks of text from each webpage to form the basis of the data used in the following analyses. Webpages were excluded if they did not include at least 100 words of text, as were any identified as non-English using the Google Code language-detection library (https://code.google.com/p/language-detection/). We also excluded web articles with significant amounts of duplicate text. This was common where articles were published on multiple online platforms owned by a single entity, often with only minor changes in title, content, or formatting. To remove these duplicates, we identified webpages for which the longest common substring between any two records linked to a PMID was greater than 50% of the total length of the longest webpage, and then randomly selected webpages such that no PMID was mapped to more than one of any group of highly similar web articles. Text from the set of webpages was accessed in July 2018.
The resulting dataset included 207,538 research articles, of which 4,333 had known links to one or more of 8,458 distinct webpages (Figure 1). There were 1,934 articles that were referenced on two or more webpages, with one article referenced by 98 non-duplicate webpages. Conversely, 1,418 webpages referenced two or more articles, one of which had known links to 68 of the articles in the PubMed corpus. To generate a final set of reported links for which no webpage or article was represented more than once, we first selected any article and webpage pairs for which the corresponding PMID and URL were both present only once in the dataset (1:1 links). For each of the remaining articles, we instead selected the linked webpage with the greatest number of words that was not yet present in the final corpus. This resulted in a final set of 3,573 PMID-URL pairs of individually linked articles and webpages, which we refer to as the known links set.
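The duplicate check described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes the comparison is made on characters, that "longest webpage" means the longer of the two pages being compared, and that duplicates are dropped greedily after a seeded shuffle. The function names are hypothetical.

```python
import random
from difflib import SequenceMatcher


def longest_common_substring_ratio(text_a, text_b):
    """Length of the longest common substring divided by the longer text's length."""
    if not text_a or not text_b:
        return 0.0
    # autojunk=False stops difflib from ignoring frequent characters in long texts
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    match = matcher.find_longest_match(0, len(text_a), 0, len(text_b))
    return match.size / max(len(text_a), len(text_b))


def is_near_duplicate(text_a, text_b, threshold=0.5):
    """True when the shared substring exceeds the threshold (50% in the text above)."""
    return longest_common_substring_ratio(text_a, text_b) > threshold


def select_distinct_pages(pages, threshold=0.5, seed=0):
    """Randomly order the pages linked to one PMID, keeping one page per near-duplicate group."""
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)  # the seed is an assumption, for repeatability
    kept = []
    for page in shuffled:
        if not any(is_near_duplicate(page, other, threshold) for other in kept):
            kept.append(page)
    return kept
```

A pairwise comparison like this scales quadratically with the number of webpages linked to a PMID, which is acceptable when each article has at most tens of linked pages.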
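The two-step selection of known links might look like the sketch below. The order in which the remaining articles are processed is not stated in the text, so sorting by PMID is an assumption, as are the function and argument names.

```python
from collections import Counter


def select_one_to_one(links, word_counts):
    """Reduce (pmid, url) links so that no PMID or URL appears more than once.

    links: list of (pmid, url) tuples; word_counts: dict mapping url -> number of words.
    """
    pmid_freq = Counter(pmid for pmid, _ in links)
    url_freq = Counter(url for _, url in links)
    selected = {}
    used_urls = set()

    # Step 1: keep pairs whose PMID and URL each occur exactly once (1:1 links).
    for pmid, url in links:
        if pmid_freq[pmid] == 1 and url_freq[url] == 1:
            selected[pmid] = url
            used_urls.add(url)

    # Step 2: for each remaining article, take the longest linked page not yet used.
    remaining = {}
    for pmid, url in links:
        if pmid not in selected:
            remaining.setdefault(pmid, []).append(url)
    for pmid in sorted(remaining):  # processing order is an assumption
        candidates = [url for url in remaining[pmid] if url not in used_urls]
        if candidates:
            best = max(candidates, key=lambda url: word_counts[url])
            selected[pmid] = best
            used_urls.add(best)
    return selected
```

Articles whose linked webpages have all been claimed by other articles drop out of the final set, which is consistent with the known links set being smaller than the number of linked articles.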
2.2 Feature extraction and dimensionality reduction
To generate a term-based vector representation of each of the linked articles and webpages, we pre-processed each document by removing punctuation and words consisting entirely of numeric characters. We then used the remaining words to construct a vocabulary of terms common to both corpora (terms that existed in at least one research article and at least one webpage).
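As an illustration of this pre-processing step, the sketch below tokenises documents and builds the shared vocabulary. Lowercasing and splitting on any non-alphanumeric character are assumptions; the text specifies only that punctuation and purely numeric words were removed.

```python
import re

# Runs of letters and digits; punctuation and underscores act as separators.
TOKEN_PATTERN = re.compile(r"[^\W_]+")


def tokenize(text):
    """Split text into lowercase tokens, dropping tokens made up entirely of digits."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if not token.isdigit()]


def shared_vocabulary(articles, webpages):
    """Terms that occur in at least one article and at least one webpage."""
    article_terms = {token for doc in articles for token in tokenize(doc)}
    webpage_terms = {token for doc in webpages for token in tokenize(doc)}
    return sorted(article_terms & webpage_terms)
```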
Each article or webpage was then represented as a vector of numeric values based on one of three standard vector representations: binary, term frequency (TF), and term frequency-inverse document frequency (TF-IDF). Binary vectors were generated by recording the presence (value = 1) or absence (value = 0) of vocabulary terms in each document. The TF vector representation was defined as a count of the number of times each word appeared in the document. The TF-IDF score is given by the log-transformed TF value multiplied by the log-transformed inverse of the proportion of documents in which the feature was present. In contrast to term frequency, TF-IDF weights vary depending on how common the term is across the entire corpus, based on the assumption that words appearing more often in fewer documents (like the name of a specific vaccine or the outcomes measured in a research study) are likely to be more informative, while those that appear often across many documents (like “and”, “the”, or “vaccination”) are less informative.
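The three representations can be compared on a toy corpus with the sketch below. The exact TF-IDF weighting used in the study is not specified beyond the description above, so the code assumes one common sublinear variant, (1 + ln tf) × ln(N / df), where N is the number of documents and df is the number of documents containing the term.

```python
import math
from collections import Counter


def binary_vector(tokens, vocabulary):
    """1 if the vocabulary term occurs in the document, otherwise 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def tf_vector(tokens, vocabulary):
    """Raw count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vectors(documents, vocabulary):
    """TF-IDF vectors for tokenised documents (assumed weighting, see lead-in)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term]
            if tf == 0:
                row.append(0.0)
            else:
                idf = math.log(n_docs / doc_freq[term])
                row.append((1.0 + math.log(tf)) * idf)
        vectors.append(row)
    return vectors
```

Under this weighting a term that appears in every document, such as "vaccine" in a vaccination corpus, receives a weight of zero, while a term confined to a single document receives the largest inverse document frequency.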
In information retrieval methods, sparse representations of documents may be less useful for measuring document similarity or finding documents relevant to a search. This is expected in particular for short documents. To address issues of sparsity, dimensionality reduction methods either remove features that are expected to be less useful or transform the vector space representation into fewer dimensions.
We evaluated the use of two approaches. The first was a simple feature reduction method that uses threshold parameters. Features were removed by applying the maximum document frequency limit of 0.85 to the combined corpora vocabulary. As a result, those terms common to more than 85% of articles and webpages in the corpus were excluded from the term-based vector representation.
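A minimal sketch of the threshold-based reduction is shown below, assuming documents are already tokenised. The function name is hypothetical, and the comparison keeps terms whose document frequency is at or below the limit.

```python
def filter_by_max_df(documents, vocabulary, max_df=0.85):
    """Drop vocabulary terms that occur in more than max_df of the documents."""
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]
    kept = []
    for term in vocabulary:
        doc_count = sum(1 for terms in doc_sets if term in terms)
        if doc_count / n_docs <= max_df:
            kept.append(term)
    return kept
```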
For the second dimensionality reduction approach we used truncated singular value decomposition (T-SVD). T-SVD works in a similar way to singular value decomposition (SVD) by decomposing a matrix into a product of matrices that contain singular vectors and singular values. The singular values can be used to understand the amount of variance in the data captured by the singular vectors. T-SVD allows more efficient computation than SVD since T-SVD approximates the decomposition by only considering a select few components, specified as an argument to the algorithm.
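To make the idea of retaining only a few components concrete, the sketch below finds the leading singular values and right singular vectors by power iteration with deflation, using only the standard library. This is an illustrative solver and an assumption on our part; production libraries typically use randomised or Lanczos-based solvers, and the text does not state which was used in the study.

```python
import math


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def _matvec(matrix, v):
    """Compute A v for a matrix stored as a list of rows."""
    return [_dot(row, v) for row in matrix]


def _rmatvec(matrix, u):
    """Compute A^T u for a matrix stored as a list of rows."""
    n_features = len(matrix[0])
    return [sum(row[j] * u_i for row, u_i in zip(matrix, u)) for j in range(n_features)]


def _orthonormalise(v, basis):
    """Remove the directions in basis from v, then scale v to unit length."""
    for b in basis:
        projection = _dot(v, b)
        v = [x - projection * y for x, y in zip(v, b)]
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("no further component found (rank too low or poor start vector)")
    return [x / norm for x in v]


def truncated_svd(matrix, n_components, n_iter=100):
    """Return (reduced rows, right singular vectors, singular values) for the top components."""
    n_features = len(matrix[0])
    components, singular_values = [], []
    for _ in range(n_components):
        v = _orthonormalise([1.0] * n_features, components)
        for _ in range(n_iter):
            # One power-iteration step on A^T A, kept orthogonal to earlier components.
            v = _orthonormalise(_rmatvec(matrix, _matvec(matrix, v)), components)
        projected = _matvec(matrix, v)
        singular_values.append(math.sqrt(_dot(projected, projected)))
        components.append(v)
    reduced = [[_dot(row, c) for c in components] for row in matrix]
    return reduced, components, singular_values
```

Each row of `reduced` is a document expressed in `n_components` dimensions. Comparing the squared singular values that are retained against the total squared magnitude of the matrix gives a rough sense of how much of the data the retained components capture.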
2.3 Ranking methods
To measure the similarity between webpages and articles we used cosine similarity, a standard measure that is common across most applications of document similarity. For each webpage, we calculated the cosine similarity to all 205,037 articles in the test portion of the final document corpus to produce a ranked list.
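The ranking step can be sketched as below, with plain Python lists standing in for the document vectors. The identifiers in the usage note are placeholders rather than real PMIDs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_articles(webpage_vector, article_vectors):
    """Return article identifiers ordered from most to least similar to the webpage.

    article_vectors: dict mapping an article identifier to its vector.
    """
    scores = {
        pmid: cosine_similarity(webpage_vector, vector)
        for pmid, vector in article_vectors.items()
    }
    return sorted(scores, key=lambda pmid: scores[pmid], reverse=True)
```

For example, `rank_articles([1, 1, 0], {"A": [1, 1, 0], "B": [1, 0, 0], "C": [0, 0, 1]})` places "A" first because its vector points in the same direction as the webpage vector.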
We expected that there would be consistent differences between the language style used in article titles and abstracts, compared to that used in online research communications. For example, we expected that communications would replace technical jargon with simpler synonyms. Canonical correlation analysis (CCA) is an algorithm designed to identify linear combinations of maximally correlated variables between complex, multivariate datasets. CCA captures and maps the correlations between two sets of variables into a single space, and thus the comparison for ranking can be made using a standard similarity measure. CCA has been used for joint dimensionality reduction across different spaces (e.g., text and images, text and text, etc.) [27, 31]. As a result, the CCA approach could be used to learn the alignment between the terms used in the articles and the terms used to describe the same concepts in research communications presented online. To test the CCA approach, we added it as an extra process in the pipeline, using training data to construct a transform (a matrix that may modify the number of features), and then applying that transform to the testing data before calculating the distance (Figure 2).
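For reference, the standard CCA objective for the first pair of projection vectors can be written as below. The notation is ours and is an assumption: $X$ denotes webpage vectors, $Y$ denotes article vectors, $\Sigma_{XX}$ and $\Sigma_{YY}$ are their covariance matrices, and $\Sigma_{XY}$ is the cross-covariance estimated from the training pairs.

```latex
\[
(\mathbf{a}^{*}, \mathbf{b}^{*}) =
\arg\max_{\mathbf{a},\,\mathbf{b}}
\frac{\mathbf{a}^{\top} \Sigma_{XY}\, \mathbf{b}}
     {\sqrt{\mathbf{a}^{\top} \Sigma_{XX}\, \mathbf{a}}\;
      \sqrt{\mathbf{b}^{\top} \Sigma_{YY}\, \mathbf{b}}}
\]
```

Subsequent pairs are found in the same way, subject to being uncorrelated with the pairs already found. Stacking the first $d$ pairs as columns of matrices $A$ and $B$ gives the transforms $A^{\top}\mathbf{x}$ and $B^{\top}\mathbf{y}$, which place a webpage and an article in a shared $d$-dimensional space where cosine similarity can be computed.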
2.4 Experiments and outcome measures
While standard document similarity methods typically do not need to be constructed on one set of data and tested on another, the CCA approach learns an alignment between articles and webpages based on a set of training data, and its ability to generalise to unseen data is best tested on a separate dataset. To examine the effect of adding CCA to the pipeline, we constructed training and testing sets by randomly assigning each PMID-URL pair. The resulting training dataset comprised 70% (2,501) of the known links, with the remaining 30% of PMID-URL pairs allocated to the testing set. To replicate the work of searching a large corpus or database for relevant scientific publications, we also added the 203,965 eligible articles not already captured in either the training or testing datasets, resulting in a testing set of 1,072 linked articles and webpages plus the set of 203,965 articles with no linked webpages.
The set of experiments was split into two phases. In the first phase, we examined how differences in the vector space representations might affect the performance of the ranking methods, comparing the binary, TF, and TF-IDF representations in combination with either threshold or T-SVD feature reduction. In the second, we tested the effect of transforming the best performing feature representation using CCA.
We used three related performance measures to evaluate the success of each of these approaches in correctly linking articles to the webpages that reference them. For the first measure, we ranked the similarity of each of the 1,072 webpages in the testing set to each of the 205,037 available articles and computed the median rank of the correct article across all webpages. The second measure was the percentage of known links correctly identified for each experimental group (i.e. the percentage of webpages for which the matching source article was ranked first out of all possible 205,037 articles, recall@1). The third was the percentage of links ranked within the top 50 PubMed articles in the testing set (i.e. recall@50), representing the ability to include the correct article within the first page of results returned. Finally, we visualised the results by plotting recall@N for all values between 1 and the total number of articles in the testing dataset.
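The split sizes reported above can be reproduced with a simple shuffled split, sketched below. The seed and the use of truncation to set the training size are assumptions, and the identifiers in the usage note are placeholders.

```python
import random


def split_known_links(pairs, train_fraction=0.7, seed=0):
    """Randomly assign PMID-URL pairs to training and testing sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed assumed, for repeatability
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

With 3,573 known links this gives 2,501 training pairs and 1,072 testing pairs. Adding the 203,965 unlinked articles to the 1,072 testing articles gives the 205,037 candidate articles against which each testing webpage is ranked.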
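The outcome measures reduce to simple operations on the rank of the correct article for each webpage, as in the sketch below. Function names are hypothetical, and ranks are 1-based.

```python
from statistics import median


def rank_of_correct(ranked_pmids, correct_pmid):
    """1-based position of the correct article in a ranked list of identifiers."""
    return ranked_pmids.index(correct_pmid) + 1


def median_rank(ranks):
    """Median rank of the correct article across all testing webpages."""
    return median(ranks)


def recall_at(ranks, n):
    """Percentage of webpages whose correct article is ranked in the top n."""
    return 100.0 * sum(1 for rank in ranks if rank <= n) / len(ranks)


def recall_curve(ranks, max_n):
    """recall@N for N = 1..max_n, suitable for plotting."""
    return [recall_at(ranks, n) for n in range(1, max_n + 1)]
```

Calling `recall_at(ranks, 1)` and `recall_at(ranks, 50)` gives the second and third measures; `recall_curve` recomputes each point from scratch, which is adequate for a sketch but would be replaced by a cumulative count for a curve over all 205,037 ranks.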
All methods and experiments used Python 3.6. The code is on GitHub (https://github.com/lizaharrison/web2pubmed).
Among the 207,538 articles that were returned by the search and met the inclusion criteria for the analysis, 4,333 had one or more links to webpages recorded by Altmetric and were also eligible for inclusion in study analyses. The most popular article was used as source information on 98 webpages, while 22% (2,535 of 11,319 known links) were used as source information on one webpage (Figure 3). To construct a representative dataset in which no article or webpage was represented more than once, we selected a final set of 3,573 PMID-URL pairs.
Within this final set of 3,573 articles and webpages with known PMID-URL links and 203,965 additional articles with no known links, we identified 41,810 terms used at least once in both the set of webpages and the set of articles. Where we applied threshold parameters (limiting the vocabulary to exclude terms used in more than 85% of corpus documents), this vocabulary was reduced to 39,942 terms, representing the greatest number of features used in the following analyses. For experiments instead using the T-SVD method of feature reduction, the number of components retained in the dataset varied between 100 and 1,600.
Of the methods of representing the text of articles and webpages, we observed that TF-IDF consistently produced the highest performance (Table 1). Regardless of the feature reduction approach used, experiments using the TF-IDF representation of document text outperformed the binary and TF representations.
Of the two feature reduction methods, the threshold approach outperformed the T-SVD approach for all outcome measures (Table 1). However, because the performance improved roughly linearly as the number of T-SVD components was increased, the results suggest that the number of features used may be a more important factor than the choice of feature reduction method. Overall, the highest performance was achieved using TF-IDF to represent the text as term features and the threshold to reduce the number of features. In the testing dataset, the method ranked the correct source article first for more than one in four webpages and placed the correct source article in the top 50 ranked candidate articles for more than half of the webpages.
| Feature representation & reduction methods | Median rank (IQR) | Percentage correct | Percentage correct in top 50 |
|---|---|---|---|
| T-SVD (100 components) | | | |
| T-SVD (200 components) | | | |
| T-SVD (400 components) | | | |
| T-SVD (800 components) | | | |
| T-SVD (1600 components) | | | |
*Experiments for which results have also been included in Table 2. IQR: interquartile range; TF: term frequency; TF-IDF: term frequency-inverse document frequency; T-SVD: truncated singular value decomposition.
The addition of CCA was expected to improve the performance of the method by finding an alignment between the terms used in the webpages and articles rather than exact matches between terms. We found that adding CCA to the process improved the performance for experiments where the number of T-SVD components was relatively low (Table 2). However, as we increased the number of T-SVD components above 400, the improvements gained from adding CCA started to diminish. The maximum gain in performance from adding CCA was achieved for the experiment that used 400 T-SVD components transformed into 200 feature dimensions by the trained CCA model, where the correct source article was placed within the top 50 ranked candidates for 38.0% of the webpages (Figure 4). As the number of feature dimensions was increased further, the approach failed because the CCA did not converge, owing to the sparsity of the feature space. Overall, the results show that we were able to identify a maximum performance within the parameter space for which the CCA approach could be used, but that no CCA configuration outperformed the simpler approach that used thresholds rather than T-SVD and did not use CCA (Figure 5).
| Method (CCA dimensions) | Median rank (IQR) | Percentage correct | Percentage correct in top 50 |
|---|---|---|---|
| 100 T-SVD components | | | |
| No CCA* | 2768 (203.5-24884.5) | 7.0 | 16.79 |
| 200 T-SVD components | | | |
| No CCA* | 1513 (84.75-15572.25) | 9.7 | 22.48 |
| 400 T-SVD components | | | |
| No CCA* | 720 (36-9674.25) | 12.59 | 27.61 |
| 800 T-SVD components | | | |
| No CCA* | 385.5 (13-6211.25) | 15.02 | 33.49 |
| 1600 T-SVD components | | | |
| No CCA* | 219 (6-4145.75) | 17.44 | 37.13 |
*Experiments for which results also appear in Table 1. †Experiments in which the CCA did not converge. IQR: interquartile range; CCA: canonical correlation analysis; T-SVD: truncated singular value decomposition.
In this study we evaluated methods that could be used as part of tools that support the identification of missing links between research communications and the source literature they use. We used vaccination research as an example application domain where there are common problems with bias and misrepresentation in research communications. We started with the assumptions that many webpages are not reliably connected to the research on which they are based, and that readers may not have the time or expertise to construct a search query to identify relevant articles in bibliographic databases. We tested methods that seek to circumvent the need for expert construction of search queries and instead automatically recommend articles that are likely to be relevant. While the use of a CCA-based approach did not outperform our baseline methods, the results suggest that such tools are likely feasible.
4.1 Methods for automatic recommendations from text
We tested two standard information retrieval methods and found that the simpler approach using a TF-IDF representation and a maximum document frequency limit outperformed a more sophisticated approach of transforming the feature space using CCA. While we know of no previous studies that have developed tools for the same purpose, the structure of the problem is common. The combination of TF-IDF and cosine distance has previously been used to identify missing links between trial registrations on ClinicalTrials.gov and articles in PubMed reporting trial results. Similarly, the use of TF-IDF has been shown to facilitate the detection of similarities between patent documents and scientific publications. The results were consistent with ours: increasing the number of SVD components improved the accuracy, but the best performance was achieved without the use of SVD.
There are a range of other more complex approaches that could be applied to a problem of this structure: the identification of missing links between two distinct sets of documents that may be matched using similarity of content and a relatively sparse bipartite graph connecting the two sets of documents. These might include alternative feature representations like pre-trained language models, word embedding, or both [1, 15, 28, 30]; as well as other algorithms for recommendation or ranking related to collaborative filtering [16, 21], and learning-to-rank methods [18, 25].
An expert might take an alternative approach to manually identifying source articles from research communications, making use of specific information including the names of authors, institutions, or journals. Rule-based approaches that make use of this information may yield improvements. Other similar approaches might make use of the date of publication extracted from webpages and articles in bibliographic databases, under the assumption that online communications of research tend to be reported soon after the research is published.
4.2 Implications and future applications
The results indicate that it is likely feasible to build a tool that could be used to help find missing links between health research communications and source literature for the purpose of checking the veracity of the communications and identifying biases. One way to operationalise this type of tool would be to develop browser plugins that automatically augment webpages with a list of recommended relevant peer-reviewed research. Hyperlinks might be added to the terms or phrases that most contribute to the recommendation based on the weights of the terms that contribute to the similarity.
Existing checklists are designed to be used to manually evaluate the credibility of health information and health research communications, but little work has been done to use these checklists as the basis for automatically estimating the credibility of webpages. We know of no studies that have attempted to automatically compare the text of research communications with the abstract or full text of research articles to detect specific differences that might be indicative of misrepresentation or distortion of research conclusions. For example, tools able to identify scenarios where studies of association are written as causation in communications would be of clear benefit, particularly when discussing vaccination [20, 29].
Tools extending the work we present here could also be used to help educate non-experts on when it is appropriate to search for source articles when reading research communications online, and to train them on how to construct useful search queries. First, the distances to the top-ranked articles might be suggestive of whether the text on a webpage is based on any form of peer-reviewed research. This could be used to indicate a common practice in anti-vaccine blogs where writers provide circular links within a network of other blogs that are all equally disconnected from clinical evidence. Second, the tool could be used to show users a search query that is automatically generated from the text of research communications for use with bibliographic databases like PubMed, educating users on how to search bibliographic databases for clinical evidence.
This study had several limitations. First, while the use of Altmetric helped us to quickly construct a large dataset of reported links, the dataset might be a biased sample of research communications. Communications that include hyperlinks to journal webpages, PubMed, or link to articles using their DOIs may be of higher quality or may be targeted at specialised audiences. Other research communications not using hyperlinks were not included in the dataset, and these may be different to those tracked by Altmetric. Testing the approaches on a more general set of examples before deployment would be necessary. Second, there are a wide range of alternative approaches to feature representation and recommender systems. While we discuss the potential advantages of some of these approaches above, we are at present only able to speculate on which of them are likely to perform best as part of a tool or service aimed at improving the detection of distortion in research communications online. Finally, while vaccination is an important application domain, we did not test what might happen if we had selected a much broader sample of webpages and articles, or if we constructed models specifically designed to find missing links for individual fields or topics of research. It is possible that more general or more specific datasets may influence the performance of the methods we tested.
The results indicate the feasibility of tools designed to support the identification of missing links between health research communications and the scientific literature on which they are based. Such tools have the potential to help people better discern the veracity and quality of what they read online. While standard feature representation and document similarity methods were moderately successful in this task, further investigation is warranted.
- [1] A. L. Beam, B. Kompa, I. Fried, N. P. Palmer, X. Shi, T. Cai, and I. S. Kohane. Clinical Concept Embeddings Learned from Massive Sources of Medical Data. CoRR, abs/1804.0, 2018.
- [2] S. J. Bean. Emerging and continuing trends in vaccine opposition website content. Vaccine, 29(10):1874–1880, 2011. doi: 10.1016/j.vaccine.2011.01.003.
- [3] S. Castell, A. Charlton, M. Clemence, N. Pettigrew, S. Pope, A. Quigley, J. Navin Shah, and T. Silman. Public Attitudes to Science 2014. Technical report, Ipsos MORI Social Research Institute, 2014.
- [4] D. Charnock and S. Shepperd. Learning to DISCERN online: applying an appraisal tool to health websites in a workshop setting. Health Education Research, 19(4):440–446, 2004. doi: 10.1093/her/cyg046.
- [5] D. Charnock, S. Shepperd, G. Needham, and R. Gann. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. Journal of Epidemiology and Community Health, 53(2):105–11, 1999.
- [6] S. C. Cooper Robbins, C. Pang, and J. Leask. Australian Newspaper Coverage of Human Papillomavirus Vaccination, October 2006–December 2009. Journal of Health Communication, 17(2):149–159, 2012. doi: 10.1080/10810730.2011.585700.
- [7] A. G. Dunn, E. Coiera, and F. T. Bourgeois. Unreported links between trial registrations and published articles were identified using document similarity measures in a cross-sectional analysis of ClinicalTrials.gov. Journal of Clinical Epidemiology, 95(Mar):94–101, 2018. doi: 10.1016/j.jclinepi.2017.12.007.
- [8] G. Eysenbach. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. BMJ, 324(7337):573–577, 2002. doi: 10.1136/bmj.324.7337.573.
- [9] S. Fox and M. Duggan. Health Online 2013. Technical report, Pew Research Center, 2013.
- [10] S. Fox and L. Rainie. The online health care revolution: How the Web helps Americans take better care of themselves. Technical report, 2000.
- [11] S. Fox and L. Rainie. Vital decisions: how Internet users decide what information to trust when they or their loved ones are sick. Technical report, 2002.
- [12] Q. Grundy, A. G. Dunn, F. T. Bourgeois, E. Coiera, and L. Bero. Prevalence of Disclosed Conflicts of Interest in Biomedical Research and Associations With Journal Impact Factors and Altmetric Scores. JAMA, 319(4):408–409, 2018. doi: 10.1001/jama.2017.20738.
- [13] R. Haneef, P. Ravaud, G. Baron, L. Ghosn, and I. Boutron. Factors associated with online media attention to research: a cohort study of articles evaluating cancer treatments. Research Integrity and Peer Review, 2(9):1–8, 2017. doi: 10.1186/s41073-017-0033-z.
- [14] H. Hotelling. Relations Between Two Sets of Variates. Biometrika, 28(3/4):321, 1936. doi: 10.2307/2333955.
- [15] J. Howard and S. Ruder. Universal Language Model Fine-tuning for Text Classification. CoRR, abs/1801.0:1–12, 2018.
- [16] Z. Huang, X. Li, and H. Chen. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL 2005, pages 141–142, New York, New York, USA, 2005. ACM Press. doi: 10.1145/1065385.1065415.
- [17] T. Huskinson, N. Gilby, H. Evans, J. Stevens, and S. Tipping. Wellcome Trust Monitor Report Wave 3 Tracking public views on science and biomedical research Wellcome Trust Monitor: Wave 3. Technical report, 2016.
- [18] O. A. S. Ibrahim and D. Landa-Silva. ES-Rank: evolution strategy learning to rank approach. In Proceedings of the Symposium on Applied Computing - SAC ’17, pages 944–950, New York, New York, USA, 2017. ACM Press. doi: 10.1145/3019612.3019696.
- [19] A. Kata. A postmodern Pandora’s box: Anti-vaccination misinformation on the Internet. Vaccine, 28(7):1709–1716, 2010. doi: 10.1016/J.VACCINE.2009.12.022.
- [20] A. Kata. Anti-vaccine activists, Web 2.0, and the postmodern paradigm – An overview of tactics and tropes used online by the anti-vaccination movement. Vaccine, 30(25):3778–3789, 2012. doi: 10.1016/J.VACCINE.2011.11.112.
- [21] Y. Koren, R. Bell, and C. Volinsky. Matrix Factorization Techniques for Recommender Systems. Computer, 42(8):30–37, 2009. doi: 10.1109/MC.2009.263.
- [22] H. J. Larson. The biggest pandemic risk? Viral misinformation. Nature, 562(7727):309–309, 2018. doi: 10.1038/d41586-018-07034-4.
- [23] H. J. Larson, L. Z. Cooper, J. Eskola, S. L. Katz, and S. Ratzan. Addressing the vaccine confidence gap. The Lancet, 378:526–535, 2011. doi: 10.1016/S0140.
- [24] A. Y. S. Lau and E. W. Coiera. Do People Experience Cognitive Biases while Searching for Information? Journal of the American Medical Informatics Association, 14(5):599–608, 2007. doi: 10.1197/jamia.M2411.
- [25] T.-Y. Liu. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009. doi: 10.1561/1500000016.
- [26] T. Magerman, B. van Looy, and X. Song. Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications. Scientometrics, 82(2):289–306, 2010. doi: 10.1007/s11192-009-0046-6.
- [27] A. K. Menon, D. Surian, and S. Chawla. Cross-Modal Retrieval: A Pairwise Classification Approach. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 199–207, Philadelphia, PA, 2015. Society for Industrial and Applied Mathematics. doi: 10.1137/1.9781611974010.23.
- [28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. In NIPS’13 Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, pages 3111–3119, 2013.
- [29] M. B. Moran, M. Lucas, K. Everhart, A. Morgan, and E. Prickett. What makes anti-vaccine websites persuasive? A content analysis of techniques used by anti-vaccine websites to engender anti-vaccine sentiment. Journal of Communication in Healthcare, 9(3):151–163, 2016. doi: 10.1080/17538068.2016.1235531.
- [30] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. In Proceedings of NAACL-HLT 2018, pages 2227–2237, 2018.
- [31] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In Proceedings of the International Conference on Multimedia - MM ’10, pages 251–260, New York, New York, USA, 2010. ACM Press. doi: 10.1145/1873951.1873987.
- [32] S. Selvaraj, D. S. Borkar, and V. Prasad. Media coverage of medical journals: Do the best articles make the news? PLoS ONE, 9(1), 2014. doi: 10.1371/journal.pone.0085355.
- [33] Z. Shah, D. Surian, K. D. Mandl, and A. G. Dunn. Automatically applying a credibility appraisal tool to track vaccination-related communications shared on social media. arXiv, page arXiv:1903.07219 [cs.SI], 2019.
- [34] M. S. Steffens, A. G. Dunn, and J. Leask. Meeting the challenges of reporting on public health in the new media landscape. Australian Journalism Review, 39(2):119–132, 2017.
- [35] J. B. Weaver, N. J. Thompson, S. S. Weaver, and G. L. Hopkins. Healthcare non-adherence decisions and internet health information. Computers in Human Behavior, 25(6):1373–1380, 2009. doi: 10.1016/J.CHB.2009.05.011.
- [36] World Health Organization. Ten threats to global health in 2019, 2019. URL https://www.who.int/emergencies/ten-threats-to-global-health-in-2019.
- [37] A. Yavchitz, I. Boutron, A. Bafeta, I. Marroun, P. Charles, J. Mantz, and P. Ravaud. Misrepresentation of Randomized Controlled Trials in Press Releases and News Coverage: A Cohort Study. PLoS Medicine, 9(9):e1001308, 2012. doi: 10.1371/journal.pmed.1001308.
- [38] D. Zeraatkar, M. Obeda, J. S. Ginsberg, and J. Hirsh. The development and validation of an instrument to measure the quality of health research reports in the lay media. BMC Public Health, 17(1):343, 2017. doi: 10.1186/s12889-017-4259-y.