A market is a social system composed of stakeholders engaged in the exchange of commodities. From this perspective, academia is also a marketplace comprising scholars, journals and publishers, institutions, funding agencies, etc.
Based on the roles of these entities, notably journals and scholars, academia can be classified into two broad business models: subscription-based and open access (OA). In both models, scholars have their research outcomes published in journals. The difference between the two models boils down to the party that pays for article processing. The subscription-based model charges the readers, e.g., libraries, governments, institutions, and scholars, and thereby limits the readership to solvent scholars and institutions. In contrast, the OA model allows everyone to access the full paper without payment; instead, it bills the authors an article processing charge (APC) [6, 16]. With the Internet revolution, the OA model has gradually been accepted by academia. In the early days of OA, scholars opined that the movement would enable them to give back to society, and consequently benefit academia and catalyse scientific literacy and advancement.
However, while this vision has partially materialised, the rise of a clandestine segment of the OA model that promotes publications of questionable standards with the sole purpose of profiteering has been observed worldwide [3, 11, 31]. Questionable journals (QJs) and questionable publishers (QPs) infiltrate academia with a perfunctory peer-review process by exploiting a weakness of the evaluation system, i.e., its reliance on journal indices. For instance, increasing academic competition has forced scholars to have more research published in journals listed in highly visible indices, e.g., the Science Citation Index Expanded, the Social Sciences Citation Index, and Scopus. QJs are often accidentally indexed in renowned citation databases despite the existence of preventive mechanisms. An indexed QJ can then offer expeditious publication without proper peer review. This may entice authors who are at a critical juncture in their academic careers, increasing their likelihood of preferring QJs. Critics of the OA model warned academia of the scientific stalemate and financial losses that would follow the publishing of unqualified articles. However, not all OA journals are questionable. Therefore, to protect the padawans from the dark side, critics have presented criteria to identify questionable publications along with lists of suspicious publishers and journals [3, 2, 25, 4].
Despite the awareness of predatory publications, the market share of questionable publications has steadily increased (Supplementary Fig. S1) [44, 30] as they carefully devise the stratagems necessary to expand their business in academia. QJs occasionally recruit respected editors to assert their credibility. Furthermore, they have been found to appropriate the names of reputed scholars. Authors pressed for time demand prompt publication after submission [4, 42, 9, 18, 41], which coincides with the offer of the questionable publications. Recent studies have reported the risks of questionable publishing, e.g., plagiarism and insufficient peer review. Beyond the warnings from critics, these risks can also be deduced from qualitative evidence. However, this type of evidence is limited by the small sample of publications. Therefore, an unbiased and more quantitative approach is required to reveal the questionable publications. A few researchers have in fact quantitatively analysed the questionable publications. For instance, a study reported that papers published in QJs received fewer citations on average; however, it only presented a simple citation count. Prior journal impact, which is often not considered in questionable publication studies, is a crucial factor in determining whether a paper will receive future citations. Citation distributions are known to be heavy-tailed, which implies that the majority of journals are rarely cited. In other words, there are numerous journals besides the QJs that have been rarely cited. Thus, the difference between the QJs and the unaccused journals appears insignificant when investigated at a superficial level as opposed to a fundamental one. The research question we attempt to answer in this study is: do questionable publications harm the academic ecosystem?
Journal self-citation is measured to detect suspicious journals that have overstated their citation scores [1, 17, 22, 48]. A logical step is therefore to quantify the self-citation of the potential QJs. However, measuring journal self-citation may not be sufficient, because an author may submit her/his paper to several journals before publication, including during the revision stage. This obscures the timeline of the citation. Moreover, the majority of publishers operate multiple journals; thus, the role of academic publishing has also been transferred from independent journals to publishers. Therefore, instead of solely relying on journal self-citations, we must also focus on the publishers, which have not been considered owing to their reputation.
In this study, we systematically analyse billions of citation records of questionable publications at various levels and for multiple entities, i.e., papers, authors, journals, and publishers. With the large-scale data of 48,579,504 papers, 277,218 journals, and 2,714 publishers indexed and published in Scopus between 1996 and 2018, we analysed the hidden citation patterns of questionable publications that can reflect graft. In addition to the simple citation count used in the earlier study, we also collected the set of journals closest to the QJs in terms of fame for comparison (see Methods). This approach allowed us to systematically monitor the unique patterns of questionable publications that distinguish them from the unquestioned publications.
A clear advantage of the comparative analysis of bibliographic data over the mere investigation of the citation count of the QJs is that it allows us to compare the trajectories of the unquestioned journals (UJs) and QJs. That is, it does not overlook the possibility that the QJs are simply struggling because of the negative response of academia to their business model. However, the results of this analysis indicate that the QJs are not simply a struggling new model but negatively affect academia when compared to the reference UJs.
The main objective of this study was to detect anomalies in the citation patterns of QJs. We used the January 2019 dump of Scopus CUSTOM XML DATA, which includes all Scopus citation records from 1996 to 2018, along with the bibliographic metadata. Because the dataset itself does not reveal QJs, we identified them by systematically extracting the ISSNs from the journal websites using Beall’s list of potential predatory journals and publishers (see Methods). To improve the accuracy, we manually cross-verified the QJs by collecting the names of their journals and publishers. With the acquired ISSNs and titles, we identified the Scopus source id for each journal. In total, we collected 766 QJs indexed in Scopus at least once between 1996 and 2018. Before proceeding to the comparative analysis of QJs, we emphasise that a journal’s repute strongly influences the chance of its future citation. The effect of the accumulated citation count makes it infeasible to directly compare the citation patterns of journals with different degrees of reputation. To compensate for the effect of reputation, we collected reference journals using the journal impact, which is estimated in the manner of the Journal Citation Reports (JCR). The criteria for selecting a reference journal were as follows: i) it shall not be present in the list of QJs, ii) it shall be in the same subject category as the target QJ, and iii) its journal impact shall be the most similar to that of the target QJ (see Methods). Unless specified otherwise, the reference journals are the UJs mentioned above. Using these two groups, we performed a comparative investigation of the citation patterns from various aspects.
2.1 Publishers and journals as liaisons in academia
As mentioned earlier, a market is composed of buyers and sellers. In this sense, academia can also be considered a market sustained by the exchange of knowledge between authors (buyers) and journals/publishers (vendors; see Fig. 1a). Therefore, both parties must be accounted for to comprehend the entire landscape of academia. We began with the publishers and journals, which are the liaisons of academia. Journals select the articles to be distributed in academia based on their scope; thus, the direction of a journal is reflected in its publication record. Additionally, the citation pattern also reflects its strategy because citations are widely used to gauge the performance or reputation of a journal [24, 32]. For instance, a journal’s self-citation rate indicates the eagerness with which it pursues high citation scores; however, indexing services have been restricting this endeavour. Similarly, we introduced two simple metrics to characterise the collective citation patterns: the citation rate and reference rate , to account for certain journals (publishers) and as follows.
where is the number of citations a journal(publisher) attributes to a paper published in the target group , whereas is the set of all journals in the dataset. Here, the target group can be: i) an individual journal, ii) a group of journals, iii) a publisher composed of multiple journals, or iv) a group of publishers (Fig. 1b-d). The value of this metric increases when the collective volume of citations has increased for a certain group; thus, this value represents the degree of a citation preference between the groups. Hereinafter, for the special case of , metrics and shall be referred to as the self-citation and self-reference rates, respectively. To illustrate with an example, a publisher’s self-reference rate can be calculated by , where is a set of journals belonging to the same publisher of journal ; that is, the fraction between the total count of references citing the publisher and the total number of references to a journal.
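As a minimal sketch of these two rates, assume that the citation counts are stored in a nested dictionary `cites[a][b]` holding the number of citations that journal `a` gives to journal `b`. The function names, the data layout, and the toy numbers are illustrative assumptions and not part of the original analysis. Here the reference rate of a journal towards a target group is taken as the share of its outgoing references that point to the group, and the citation rate as the share of its incoming citations that come from the group.

```python
def reference_rate(cites, journal, group):
    """Share of `journal`'s outgoing references that point to journals in `group`."""
    outgoing = cites.get(journal, {})
    total = sum(outgoing.values())
    if total == 0:
        return 0.0
    return sum(n for target, n in outgoing.items() if target in group) / total


def citation_rate(cites, journal, group):
    """Share of `journal`'s incoming citations that come from journals in `group`."""
    total = sum(row.get(journal, 0) for row in cites.values())
    if total == 0:
        return 0.0
    from_group = sum(row.get(journal, 0) for src, row in cites.items() if src in group)
    return from_group / total


# Toy data (hypothetical): journals A and B share a publisher, C does not.
cites = {
    "A": {"A": 2, "B": 3, "C": 5},
    "B": {"A": 1, "C": 4},
    "C": {"A": 2, "B": 2, "C": 6},
}
publisher_of_a = {"A", "B"}

journal_self_reference = reference_rate(cites, "A", {"A"})              # 2 of 10 references
publisher_self_reference = reference_rate(cites, "A", publisher_of_a)   # 5 of 10 references
publisher_self_citation = citation_rate(cites, "A", publisher_of_a)     # 3 of 5 citations
```

Passing a single journal, a publisher's journal set, or a union of publishers as `group` covers the four kinds of target group listed above with the same two functions.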
The next step is to investigate the difference between QJs and UJs using these metrics. We pay special attention to the distinct patterns of the QJs, and therefore examine the difference between the mean self-reference rates of the QJs and UJs. We observed that the self-reference rate of the QJs was lower than that of the UJs (Fig. 1b). The mean self-citation rate of the QJs was also lower than that of the UJs, except in 2017 and 2018. In summary, the journal-based self-citation metrics did not reveal any peculiar behaviour of the QJs.
On the contrary, we observed that the QJs had a collective tendency to preferentially cite other QJs (Fig. 1c). Specifically, the QJs received of the citations from other QJs, and only from the UJs (Fig. 1c). Note that we selected UJs from the same list of journals as the QJ in terms of journal impact, size, and subject category (see Methods). Thus, the QJs and UJs were expected to be in an equivalent proportion. However, more QJs have received citations from other QJs than UJs have received from other UJs. In other words, the journal impact of QJs depends more on citations from other QJs than from UJs. This observation along with the lower self-citation rate of QJs indicates a possible citation preference among the publishers of QJs.
The contrasting observation raises an intriguing question: does the citation preference indeed originate from the publisher? To answer this question, we inspected the self-citation preference of the publishers (Fig. 1d). By calculating the fraction of the publishers’ self-citation for both the QJs and UJs, we found that the UJs cited their own publisher more often than the QJs did. However, the heterogeneity in the number of publications by publishers possibly affected the self-citation and self-reference probabilities. That is, it is normal to cite more studies from a larger, more reputed publisher even without any explicit preference. We also found a large heterogeneity in the size of publishers (see Supplementary Fig. S2). Large publishers, such as Springer Nature, Elsevier, Wiley-Blackwell, and Taylor & Francis, have increased their market share and account for half of all academic publications (Fig. 1e). This stark heterogeneity makes it unfair to directly compare the publisher self-reference rates. Therefore, we must account for the market share of the publishers. To compensate for the heterogeneity, we proposed the publication solidarity index for a journal , which is expressed as follows:
where and are the expectations of self-reference and self-citation rates for a publisher, respectively, which are defined as:
and is the total number of papers published in journal . The term is the number of citations received by journal from papers in journal , and is the set of journals in the publisher of journal .
The self-reference rate shows the number of times the journal cites its publisher, which cannot adequately represent the entire academic landscape. However, a high self-reference rate is not necessarily evidence of malpractice. First, as shown in Fig. 1d, large publishers tend to cite more (see Supplementary Fig. S3). Additionally, the self-reference rate can be high if there is no other journal covering the same subject. For instance, a high-profile journal may publish several significant studies in quick succession, and therefore researchers have few options from which to cite. Occasionally, it unavoidably cites papers published in the most reputed journals [28, 51]. Here, was designed to compensate these biases: we divided the self-reference rate i) by the size of the publisher as to compensate for the size effect and ii) by the self-citation rate as to account for the indispensability of a citation. Therefore, the proposed metric is mostly free from the influence of the publisher size and journal’s profile. It has a high value when the journal assigns more citations to their publisher than the citations the latter receives from other sources.
To characterise and distinguish the citation behaviour of the QJs from that of the UJs, we compared the of QJs and their corresponding UJs. Note that we excluded standalone QJs and UJs because we designed the metric to capture the self-reference rates of publishers composed of multiple journals. We then calculated the ratio of of the QJs to their corresponding UJs; a ratio of implies the QJs have excessively self-referenced, and a ratio of indicates vice versa. We found that, despite two exceptions, the QJs have a higher than the UJs (Fig. 1f). The value may largely depend on the publisher size; however, the result is robust to the relative publisher size between QJs and UJs. The QJs have larger values even if the QP is larger than the unquestioned publisher (UP).
The size of the QP is observed to be larger than the UP for the two exceptions, where the ratio of . To verify this hypothesis, we manually examined two cases. The first exception is the Journal of Essential Oil-Bearing Plants, which was acquired by an UP in 2013; subsequently, the journal’s metric and improved. Specifically, CiteScore increased from 0.5 to 1.3, and from 0.1 to 0.01 between 2013 and 2018. The publisher of the second journal Insects was eventually excluded from Beall’s list. In summary, QJs are likely to embellish their journal impact with self-reference. Note that the ratio of is also larger than one for the relatively higher impact QJs, yet it is less than the lower-impact QJs (solid line in Fig. 1f). Thus, once the QJs have gained a place in academia, they gradually reduce publisher self-references.
2.2 Authors as prosumers in academia
Thus far, we have investigated the role of publishers and journals, which are the liaisons in academia. The next step is to observe the response of authors to QJs. Despite several warnings, the number of unique authors publishing in QJs has increased over time (Supplementary Fig. S4). Note that it is the author who decides where to submit a paper. There are two possibilities: the authors may submit an article to a QJ with or without prior knowledge of its nature. Thus, to inspect the authors’ behaviour, we must answer the question, “can authors distinguish QJs from journals with similar external appearances?” Moreover, we must investigate any unfair advantage they received from the questionable publication to fully understand their behaviour.
To understand the authors’ response to the rise of questionable publications, we first assess their ability to discriminate the questionable publications from the rest, especially from the unquestioned publications. According to the calculations in Section 2.1, the QJs demonstrated a higher publication solidarity index than the UJs, which indicates homophilic citations among QJs. Because we selected UJs with similar citation impacts and publication sizes, the above results suggest highly clustered citations among the QJs, which are possibly located in the periphery of the academic market because they are rarely cited (Supplementary Fig. S5). To test this hypothesis, we constructed a journal citation network consisting of all Scopus journals (as nodes) and citations (as links; see Methods). We measured various network centralities, namely betweenness centrality (BC), closeness centrality (CC), PathCore score (PC), and PageRank centrality (PR). Important nodes in the constructed network have large centralities [19, 10, 35].
According to the results, the centrality distributions of the UJs are likely to have a heavier tail than those of the QJs, except for PageRank (Fig. 2). Although QJs tend to be distributed over a higher PageRank range than UJs (Fig. 2d), this does not imply that QJs are more important than UJs in the academic market. For instance, with an increment in self-loops, the number of nodes with a high PageRank also increases. As discussed in the previous section, we found that QJs show a higher publication solidarity index, which underscores the highly clustered cycles in the citation network. The trend also holds when the centralities of QJs and UJs are compared directly. Specifically, UJs exhibited high BC, high CC, and high PC, respectively. Only UJs showed a higher PageRank than the QJs. Note that this observation is robust to the size of the citation window, the target year, and the link types (see Methods and Supplementary Figs. S15–S29). Overall, despite having similar disciplines, journal sizes, and citation impacts, UJs served more central roles than QJs in academia.
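The effect of self-loops on PageRank can be illustrated with a toy graph. The following is a minimal sketch using a plain power iteration with a damping factor of 0.85; the three-node graph, the function name, and the parameters are assumptions chosen for illustration and do not reproduce the network or the software used in this study.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Power-iteration PageRank for a directed graph given as {node: [targets]}.

    Assumes every node has at least one outgoing link (no dangling nodes).
    """
    nodes = sorted(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the uniform teleportation share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# A three-journal citation cycle, without and with a self-loop on journal A.
plain = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})
looped = pagerank({"A": ["A", "B"], "B": ["C"], "C": ["A"]})
```

In the plain cycle all three journals share the same score, whereas the self-loop lets journal A retain part of its own score and therefore raises its PageRank without any additional citation from other journals.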
To further understand the authors’ response to questionable publications, we examined the unfair advantage they may gain from questionable publication. The benefit of studying the massive citation dataset is that the citation patterns of individual authors and publications can be identified. The advantage of having a paper published in a QJ is the fast processing time. However, this advantage is accompanied by inadequate peer review, risking the publication of underqualified research [9, 7]. Because we selected UJs with similar citation impacts and sizes as their questionable counterparts, their citation distributions are similarly heavy-tailed (Fig. 3a). We also found that QJs publish more uncited articles (27.5%) than UJs (22.2%), albeit by an insignificant margin. In terms of the citation distribution itself, the two groups are nearly identical. Therefore, any differences in the quality of articles may result from surplus citations.
However, simple citation counts are not always an accurate measure of quality; e.g., the baseline of citation measures is largely determined by the discipline. Therefore, we must account for the hidden context of citations, which better reflects the quality of a publication. For instance, novelty plays an important role in the progress of science and technology and is measured using atypical combinations of references. Following this idea, we computed the z-score, which represents the rarity of reference pairs compared to the random null model (see Methods). A z-score of less than zero indicates that the reference combination is relatively rare and novel. On average, a single paper has more than ten references, which form hundreds of pairs. Therefore, we selected the 10th percentile and median of the z-scores as the representative statistics. The majority of the 10th percentile z-scores are located near zero for both UJs and QJs, with no significant difference between them (see Fig. 3b). There is also nearly zero disparity in the median z-score. In summary, both QJs and UJs have articles of a similar degree of novelty on average.
It may be argued that the novelty of the reference pairs cannot capture the type of contribution to academia because it does not consider the future citation impact. Highly disruptive articles, however, have the potential to introduce breakthroughs in science and technology, and thus have a large impact on society. From this perspective, there are two types of contributions to science and technology: i) disruptive research, which introduces new approaches and fundamental questions in the field, and ii) developmental research, which aims to apply well-known theories in new contexts. We adopted the disruptiveness index to measure the degree of disruption for individual papers, accounting for both references and citations simultaneously (see Methods). We found that QJs tend to publish fewer disruptive research papers. Disruptiveness decreases with the number of coauthors who contributed to the original study. The disruptiveness index of QJs is consistently less than that of UJs (see Fig. 3c). It saturates at for UJs, and at for QJs, if more than 15 persons co-authored the paper. Taken together, despite having a similar citation impact, the manuscripts published in QJs are more conventional than those of UJs.
Additionally, we examined the demographics of authors in terms of their publication preference (Fig. 3d). On average, QJ authors are younger in terms of academic experience than those who published in UJs (8.09 vs. 8.94 years, respectively; see Methods). On the one hand, QJ authors publish more in their careers (8.7 publications more than the UJ authors on average), but have fewer questionable publications (2.11) than unquestioned publications (2.67). Thus, QJ authors are more productive than UJ authors; however, they prefer UJs over QJs. On the other hand, QJ authors had self-citations in their manuscripts, while also having been self-cited by of their publications. This is a marginally smaller proportion compared with the UJ authors, who have self-citations in their manuscripts and were self-cited by of their publications. QJ and UJ authors have a similar number of publications cited by QJs and UJs, i.e., 2.12 and 2.35, respectively. On average, QJ authors cited questionable publications only once in their careers, while UJ authors cited unquestioned publications twice.
To summarise, authors tend to have less important research published in QJs, owing to inadequate peer review. QJs are located on the fringes of academia. Papers published in QJs are not as disruptive as those in UJs; however, they receive a similar number of citations on average. Although the motivation of each author cannot be pinned down, the above findings lead to the conclusion that authors who consistently publish in QJs generally anticipate exploitation for profiteering by the publishers.
In this study, we analysed the massive bibliographic metadata of Scopus to inspect the status quo of questionable publications as distinguished from unquestioned publications. Previous studies have investigated QJs in academia through small samples or simple citation counts, which have left many questions unanswered [3, 34, 23]. We attempted to quantitatively understand the behaviour of questionable publications. The analysis of journal self-citation and self-reference revealed self-favouring citations among QJs from the same publishers. Moreover, QJs play a less important role in the journal citation network, as reflected in the centrality metrics. Authors had few creative research works published in the QJs, as highlighted by the disruptiveness index. These observational data suggest that a comprehensive analysis of citation patterns could support the case against QJs, which will help fight corruption in academia. One limitation of this study is the limited diversity of the sample; it only considers the journals indexed in Scopus, which cannot represent all QJs. However, the sample, albeit a small subset of all questionable publications, qualifies as questionable publications according to the criteria set by indexing services [38, 13]. Moreover, non-indexed journals have insignificant visibility; therefore, their influence on academia is limited. Another limitation is that the criteria used to select UJs are imperfect, as the list of UJs might contain hidden QJs. We did not consider the fluctuations in citation patterns due to individual authors and publications because we investigated the collective tendencies of several publications. We also left the analysis of possible self-favouring groups among UJs for a future study.
This analysis may have notable implications for policymakers. Governments, institutions, and funding organisations occasionally promote researchers by incentivising their research outcomes [37, 26]. Evaluations for such incentives conventionally rely on quantitative measures, e.g., the number of publications, the number of citations, and the reputation (citation impact) of the publishing journals. However, this study suggests that the quality of research outcomes should also be assessed on the value of their contents, and not solely computed through simple quantitative metrics. In addition, recent reports suggest that authors in low-income countries are more likely to use the services of questionable publications. This warrants the active consideration of the inequality of science among countries [27, 12]. For instance, the publication fees of major journals exceed the average annual income per person in certain countries; the current APC waiver policy does not solve this problem. We believe each stakeholder should have a role in cultivating academia. An ironclad policy that judiciously combines quantitative analysis with context-specific assessment will help introduce more innovative ideas into academia worldwide.
4.1 Identifying questionable publications in Scopus
We drafted a list of QJs from Beall’s list of potential predatory journals and publishers along with its anonymous updates. The original Beall’s list was terminated in January 2017; nevertheless, it is still being updated by anonymous contributors (as of July 2021). We refer to Beall’s list as the one encompassing the original list as well as the new website, unless specified otherwise. It contains two types of questionable publication lists: i) standalone journals and ii) predatory publishers. We used both lists to identify questionable publications. To enhance the credibility of our list, we collected the data by two methods. First, we built a Web robot to crawl the information on the journal Web pages (Fig. 4). We programmed the bot in Python 3.6 using the Beautiful Soup package to extract the ISSN information from the websites of the journals and publishers in Beall’s list. The Web crawling was performed from September 12 to October 12, 2018. We attempted to access each website multiple times to mitigate the effect of temporary outages of Web services. However, the robot could only access 980 (including redirections to error pages) of the 1,201 websites. For all subpages belonging to the target domain, we examined the five consecutive tokens (a 5-gram), separated by spaces or punctuation, that followed the keyword ISSN or ISSN:, and extracted any token formatted as XXXX-XXXX. We filtered out the ISSN candidates that failed to pass the ISSN checksum test. In this way, we collected 6,250 distinct ISSNs from the Web pages of 381 journals or publishers; the rest of the websites did not contain ISSNs.
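The ISSN extraction and checksum filtering can be sketched as follows. This is a minimal illustration that works on plain text already extracted from a Web page, not the crawler used in the study; the function names, the tokenisation rule, and the sample string are assumptions. The check digit follows the standard ISSN rule: the first seven digits are weighted 8 down to 2, and the check character is the value that makes the weighted sum divisible by 11, with X standing for 10.

```python
import re

# Candidate format: four digits, a hyphen, three digits, and a digit or X.
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dX]")


def issn_is_valid(issn):
    """Return True if the ISSN check character matches the first seven digits."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def extract_issns(text, window=5):
    """Collect valid ISSNs among the `window` tokens that follow the keyword ISSN."""
    tokens = re.split(r"[\s,;()]+", text)
    found = []
    for i, token in enumerate(tokens):
        if token.upper().rstrip(":") != "ISSN":
            continue
        for raw in tokens[i + 1 : i + 1 + window]:
            candidate = raw.strip(".:").upper()
            if (ISSN_PATTERN.fullmatch(candidate)
                    and issn_is_valid(candidate)
                    and candidate not in found):
                found.append(candidate)
    return found


# Hypothetical page text; the last candidate fails the checksum and is dropped.
sample = "ISSN: 0378-5955 (Print), ISSN 2434-561X (Online), ISSN: 1234-5678"
issns = extract_issns(sample)
```

The sketch does not handle a keyword fused with the number (for example, ISSN:0378-5955 without a space), which a production crawler would need to cover.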
In addition to Web crawling, we manually inspected the journal and publisher websites to collect their names (Fig. 4). We accessed the websites in early 2019: i) from January 24 to February 28 for the original list and ii) from March 1 to 8 for the updated list. We used Beall’s list as updated on March 1, 2019; at that time, a connection could be established to 953 of the 1,236 websites. Subsequently, we searched for the publisher and journal names on Scopus, together with the SCImago Journal & Country Rank, to obtain the ISSNs of the publications. Thus, we collected 713 ISSNs of the journals in Scopus by manual inspection. Finally, we matched the ISSNs with the source list of Scopus to obtain the source id of the QJs listed in Scopus. We selected the journals indexed in Scopus at least once between 1996 and 2018. Thus, we selected 766 sources comprising 848,042 publications from Scopus as the group of QJs.
4.2 Selection criteria of the unquestioned journals
Note that the expected number of citations for the QJs was significantly less than that for journals in the entire Scopus database (Supplementary Fig. S5). Because of this reputation effect, it would be unfair to compare the QJs with arbitrary journals. We therefore chose UJs that had a journal impact similar to that of the QJs in the academic market but were not listed in Beall’s list. Along with the journal impact, we also considered the journal’s subject classification and publication size. First, we calculated the journal impact in the manner of the JCR impact factor, but with Scopus data, as follows:
For each subject category, denoted by the first two digits of the All Science Journal Classification (ASJC) code, we arranged the journals in the order of their impact. To compensate for the journal size, we classified the journals into three groups by their publication size: i) large (top 33rd percentile), ii) moderate (from the 33rd to the 66th percentile), and iii) small (the rest; for details, see Supplementary Methods). A journal was considered inactive if it published fewer than 30 papers annually; such journals were excluded from the selection process. Subsequently, we chose the UJs by the following conditions: the journal i) was classified under the same subject category as the target QJ, ii) had the journal impact most similar to that of the target QJ, and iii) was in the same journal-size group. For a QJ classified under multiple subject categories, we separately selected a UJ per category. Thus, one QJ may be paired with several UJs.
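The selection conditions above can be sketched in a few lines. The record layout (the keys `id`, `category`, `size_group`, `impact`, and `papers_per_year`), the function names, and the toy journals are hypothetical; the impact helper assumes the standard two-year JCR definition, i.e., the citations received in a year by items published in the two preceding years, divided by the number of those items.

```python
def two_year_impact(citations_in_year, papers_prev_two_years):
    """JCR-style two-year impact (assumed definition, see the text above)."""
    if papers_prev_two_years == 0:
        return 0.0
    return citations_in_year / papers_prev_two_years


def select_reference_journal(target, candidates, questionable_ids, min_papers=30):
    """Pick the unquestioned journal whose impact is closest to that of `target`."""
    eligible = [
        j for j in candidates
        if j["id"] not in questionable_ids            # not a QJ
        and j["category"] == target["category"]       # same subject category
        and j["size_group"] == target["size_group"]   # same journal-size group
        and j["papers_per_year"] >= min_papers        # inactive journals excluded
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda j: abs(j["impact"] - target["impact"]))


# Toy example (hypothetical journals).
qj = {"id": "Q1", "category": "27", "size_group": "small",
      "impact": 0.80, "papers_per_year": 120}
pool = [
    {"id": "U1", "category": "27", "size_group": "small", "impact": 1.50, "papers_per_year": 100},
    {"id": "U2", "category": "27", "size_group": "small", "impact": 0.90, "papers_per_year": 80},
    {"id": "U3", "category": "27", "size_group": "small", "impact": 0.82, "papers_per_year": 20},
    {"id": "U4", "category": "13", "size_group": "small", "impact": 0.80, "papers_per_year": 90},
    {"id": "Q2", "category": "27", "size_group": "small", "impact": 0.81, "papers_per_year": 150},
]
match = select_reference_journal(qj, pool, questionable_ids={"Q1", "Q2"})
```

For a QJ with several subject categories, the same function would be called once per category, which yields several UJs for one QJ.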
Another challenge is that the value of the citation impact may inflate owing to the increase in the overall number of publications. Thus, we normalised the citations by readjusting the number of citations for a given paper in the year as follows:
where is the normalised citation count for a given paper in the year , is the raw citation count for a given paper in the year , and is the number of articles in the top-cited field of the study in the year . The normalisation process made the journal impact independent of time, enabling the comparison of citations in papers published several years apart (see Supplementary Fig. S7).
4.3 Construction of the journal citation network
To compute the centrality, we constructed a journal citation network for each year. We set the journals that published in the target year as nodes and the citations received within two years of the target year as links. Note that the citation network was directed, and the in-degree was the same as the count of citations received within two years after publication. The out-degree corresponds to the total count of references to papers published in the past two years. We demonstrated the robustness of the results to the network construction by validating them: i) using references as links instead of citations and ii) using a different time window (for details, see Supplementary Methods).
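A minimal sketch of this construction is given below. The tuple layout of the citation records, the function names, and the inclusive window from the target year to two years later are assumptions made for illustration; the links are weighted by the number of citations between a pair of journals.

```python
from collections import Counter


def build_journal_network(citations, target_year, window=2):
    """Build weighted directed links (citing journal -> cited journal).

    `citations` is an iterable of
    (citing_journal, citing_year, cited_journal, cited_year) tuples.
    Only papers published in `target_year` count as cited items, and only
    citations made up to `window` years later are kept.
    """
    links = Counter()
    for citing_j, citing_y, cited_j, cited_y in citations:
        if cited_y == target_year and target_year <= citing_y <= target_year + window:
            links[(citing_j, cited_j)] += 1
    return links


def in_degree(links):
    """Weighted in-degree: citations received by each journal within the window."""
    degree = Counter()
    for (_, cited_j), weight in links.items():
        degree[cited_j] += weight
    return degree


# Toy records (hypothetical) for the target year 2010.
records = [
    ("A", 2011, "B", 2010),
    ("A", 2012, "B", 2010),
    ("C", 2013, "B", 2010),   # outside the two-year window
    ("B", 2010, "A", 2010),
    ("C", 2011, "A", 2009),   # cited paper not published in the target year
]
network = build_journal_network(records, target_year=2010)
received = in_degree(network)
```

Swapping the roles of the citing and cited years in the filter gives the reference-based variant used for the robustness check.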
4.4 Z-score of atypical combination and disruptiveness index
First, we calculated the novelty by the atypical combination. We considered a reference combination atypical if it rarely appeared in the randomly shuffled citation network. We adopted the Markov chain Monte Carlo method to generate the randomly shuffled citation networks, preserving the number of citations, the number of references, and the timeline of the citations. To reduce the errors of random shuffling, we produced ten distinct random networks at the paper level. The z-score between two journals $i$ and $j$ was computed by standardising the observed frequency, $z_{ij} = (o_{ij} - \mu_{ij}) / \sigma_{ij}$, where $o_{ij}$ is the observed frequency of the journal pair in the original data, and $\mu_{ij}$ and $\sigma_{ij}$ are the average and standard deviation of its frequency in the randomly shuffled networks, respectively. The reference list of a single paper may comprise hundreds of journal pairs. To summarise the distribution of such z-scores, we examined the median and 10th percentile z-scores of a given paper as representative statistics of the central tendency and unusual combinations, respectively.
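A minimal sketch of the standardisation and the paper-level summary is given below. It assumes that the frequency of a journal pair has already been counted in the original data and in each shuffled network; the use of the population standard deviation and of the default percentile interpolation of Python's `statistics.quantiles` are choices of this sketch, not details taken from the study.

```python
import statistics


def pair_zscore(observed, shuffled):
    """Standardise the observed frequency of a journal pair against its
    frequencies in the shuffled networks; None if they do not vary."""
    mu = statistics.mean(shuffled)
    sigma = statistics.pstdev(shuffled)  # population std (an assumption)
    if sigma == 0:
        return None
    return (observed - mu) / sigma


def summarise(zscores):
    """Return the median and the 10th percentile of a paper's z-scores."""
    median = statistics.median(zscores)
    p10 = statistics.quantiles(zscores, n=10)[0]  # first decile cut point
    return median, p10


# Invented numbers: a pair observed 10 times, 4 and 6 times when shuffled.
z = pair_zscore(10, [4, 6])
paper_summary = summarise([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(z, paper_summary)
```

A low 10th percentile indicates that a paper's reference list contains at least some journal pairs that are rarer than expected by chance.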
We also calculated the disruptiveness index using the co-citation of publications as $D = (n_i - n_j) / (n_i + n_j + n_k)$, where $n_i$ is the number of publications that cite a publication $p$ while not citing any reference of $p$, $n_j$ is the number of publications that cite both $p$ and its references, and $n_k$ is the number of publications that cite any reference of $p$ but not $p$ itself. The disruptiveness index ranges from $-1$ to $1$, where $-1$ indicates that all the publications cite both $p$ and its references (implying the least disruptive case), whereas $1$ indicates that all publications cite $p$ but none of its references (implying the most disruptive case).
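The index can be computed directly from two sets of citing publications, as in the following sketch. The three counts are the publications citing only the focal publication, those citing both it and its references, and those citing only its references; the index is the first count minus the second, divided by the sum of all three. The function and argument names are hypothetical.

```python
def disruptiveness(focal_citers, reference_citers):
    """Disruptiveness index of a focal publication.

    focal_citers: set of publications citing the focal publication.
    reference_citers: set of publications citing at least one of its references.
    """
    only_focal = len(focal_citers - reference_citers)  # cite focal only
    both = len(focal_citers & reference_citers)        # cite focal and references
    only_refs = len(reference_citers - focal_citers)   # cite references only
    total = only_focal + both + only_refs
    if total == 0:
        return None  # undefined for a publication with no citing records
    return (only_focal - both) / total


# Invented example: three citers of the focal paper, one of which also cites
# its references, plus one paper citing only the references.
d = disruptiveness({"a", "b", "c"}, {"c", "d"})
print(d)
```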
4.5 Disambiguation of scholars in Scopus
Author disambiguation in Scopus is still imperfect. Therefore, we additionally merged the authors with a two-step disambiguation. In the first step, we calculated the similarity between two papers using self-citation, the number of shared authors, the number of shared citations, and the number of shared references. If the similarity exceeded a threshold, we considered the two papers to be written by the same author. In the second step, we merged two groups of papers when the average similarity between them exceeded a second threshold. For both thresholds, we used the parameters from the original study. The disambiguation of uncited single-authored papers may encounter errors. In this case, the author was classified as a distinct author who published only one article; we excluded these authors and their papers after the disambiguation to eliminate the errors because the volume of their papers accounted for less than 1%. We used the merged authors to calculate the author-based metrics. For instance, an author's academic age was computed as the time elapsed between her/his first and last publications. Before disambiguation, there were 21,805,486 Scopus author ids (auid) with 1,895,356 unique author names, which were merged into 15,022,380 unique author names after the disambiguation.
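The two-step procedure can be sketched as follows. This is a simplified illustration rather than the implementation of the original study: the `similarity` function is supplied by the caller, the threshold values in the example are hypothetical, and the greedy merging order in the second step is an assumption.

```python
from itertools import combinations


def disambiguate(papers, similarity, pair_threshold, group_threshold):
    """Group papers that are presumed to be written by the same author."""
    # Step 1: link papers whose pairwise similarity exceeds pair_threshold.
    parent = {p: p for p in papers}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(papers, 2):
        if similarity(a, b) > pair_threshold:
            parent[find(a)] = find(b)

    by_root = {}
    for p in papers:
        by_root.setdefault(find(p), []).append(p)
    groups = list(by_root.values())

    # Step 2: merge two groups while their average similarity is high enough.
    merged = True
    while merged:
        merged = False
        for g, h in combinations(groups, 2):
            avg = sum(similarity(a, b) for a in g for b in h) / (len(g) * len(h))
            if avg > group_threshold:
                groups.remove(g)
                groups.remove(h)
                groups.append(g + h)
                merged = True
                break
    return groups


# Invented similarities and hypothetical thresholds.
scores = {frozenset(("p1", "p2")): 0.9,
          frozenset(("p1", "p3")): 0.5,
          frozenset(("p2", "p3")): 0.5}


def toy_similarity(a, b):
    return scores.get(frozenset((a, b)), 0.0)


groups = disambiguate(["p1", "p2", "p3", "p4"], toy_similarity, 0.8, 0.4)
print(sorted(sorted(g) for g in groups))
```

In the example, p1 and p2 are linked in the first step, p3 joins them in the second step, and p4 remains a single-paper author.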
-  Journal self-citation in the Journal Citation Reports – Science Edition (2002) - Web of Science Group. Note: https://clarivate.com/webofsciencegroup/essays/journal-self-citation-jcr/ Accessed: 2020-11-03. Cited by: §1, §4.2.
-  (2017) Cabell’s new predatory journal blacklist: a review. The Scholarly Kitchen. Cited by: §1.
-  (2012) Predatory publishers are corrupting open access. Nature News 489 (7415), pp. 179. Cited by: §1, §3.
-  (2016) Essential information about predatory publishers and journals. International Higher Education (86), pp. 2–3. Cited by: §1, §1.
-  Beall’s list of predatory journals and publishers. Note: https://beallslist.weebly.com/ Accessed: 2018-09-12. Cited by: §4.1.
-  (2012) Open access versus subscription journals: a comparison of scientific impact. BMC Medicine 10 (1), pp. 1–10. Cited by: §1.
-  (2013) Who’s afraid of peer review?. Science 342 (6154), pp. 60–65. Cited by: §1, §2.2.
-  (2003) ISSN manual. cataloging part: international. standard. serial. number. ISSN Internacional Centre. Cited by: §4.1.
-  (2019) Knowledge and motivations of researchers publishing in presumed predatory journals: a survey. BMJ Open 9 (3), pp. e026516. Cited by: §1, §2.2.
-  (2016) Detection of core–periphery structure in networks using spectral methods and geodesic paths. European Journal of Applied Mathematics 27 (6), pp. 846–887. Cited by: §2.2.
-  (2018) Fake news. fake journals. fake conferences. what we can do. Journal of Clinical Nursing 27 (9-10), pp. 1727–1729. Cited by: §1.
-  (2018) Predatory journals: who publishes in them and why?. Journal of Informetrics 12 (4), pp. 1296–1311. Cited by: §3.
-  (2020) Predatory open access journals are indexed in reputable databases: a revisiting issue or an unsolved problem. Medical Archives 74 (4), pp. 318. Cited by: §3.
-  (2018) Inside the fake science factory 2018. In Presentation at DEF CON, Vol. 26. Cited by: §1.
-  Scopus. Note: https://www.scopus.com/ Accessed: 2017-04-22. Cited by: §1, §4.1.
-  (2006) Citation advantage of open access articles. PLOS Biology 4 (5), pp. e157. Cited by: §1.
-  (2007) Does self-citation pay?. Scientometrics 72 (3), pp. 427–437. Cited by: §1.
-  (2019) Why do researchers decide to publish in questionable journals? a review of the literature. Learned Publishing 32 (1), pp. 57–62. Cited by: §1.
-  (1978) Centrality in social networks conceptual clarification. Social Networks 1 (3), pp. 215–239. Cited by: §2.2.
-  (2006) The history and meaning of the journal impact factor. JAMA 295 (1), pp. 90–93. Cited by: §2.
-  (2016) Influence of short cycles on the pagerank distribution in scale-free random graphs. Master’s Thesis, University of Twente. Cited by: §2.2.
-  (2016) From excessive journal self-cites to citation stacking: analysis of journal self-citation kinetics in search for journals, which boost their scientometric indicators. PLOS One 11 (4), pp. e0153730. Cited by: §1.
-  (2019) The evolving ecosystem of predatory journals: a case study in indian perspective. In International Conference on Asian Digital Libraries, pp. 78–92. Cited by: §3.
-  Success and luck in creative careers. EPJ Data Science 9 (1), pp. 9. Cited by: §2.1.
-  (2019) Kscien’s list; a new strategy to hoist predatory journals and publishers. International Journal of Surgery Open 17, pp. 5–7. Cited by: §1.
-  (2016) How do scientists respond to performance-based incentives? evidence from south korea. International Public Management Journal 19 (1), pp. 31–52. Cited by: §3.
-  (2017) How stakeholders can respond to the rise of predatory journals. Nature Human Behaviour 1 (12), pp. 852–855. Cited by: §3.
-  (2015) The oligopoly of academic publishers in the digital era. PLOS One 10 (6), pp. e0127502. Cited by: §1, §2.1, §2.1.
-  (2010) The discipline dependence of citation statistics. Scientometrics 84 (3), pp. 749–762. Cited by: §2.2.
-  (2017) Predatory journals in scopus. Národohospodářskỳ ústav AV ČR vvi. Cited by: §1.
-  (2019) Revisiting the term predatory open access publishing. Journal of Korean Medical Science 34 (13). Cited by: §1.
-  (2006) Citation analysis in research evaluation. Vol. 9, Springer Science & Business Media. Cited by: §2.1.
-  (2021) Citation contagion: a citation analysis of selected predatory marketing journals. Scientometrics 126 (1), pp. 485–506. Cited by: §1, §1.
-  (2019) Plagiarism in predatory publications: a comparative study of three nursing journals. Journal of Nursing Scholarship 51 (3), pp. 356–363. Cited by: §1, §3.
-  (1999) The pagerank citation ranking: bringing order to the web.. Technical report Stanford InfoLab. Cited by: §2.2.
-  (2019) Methods to account for citation inflation in research evaluation. Research Policy 48 (7), pp. 1855–1865. Cited by: §4.2.
-  (2017) Publish or impoverish: an investigation of the monetary reward system of science in china (1999-2016). Aslib Journal of Information Management. Cited by: §3.
-  (2019) Predatory publications in evidence syntheses. Journal of the Medical Library Association: JMLA 107 (1), pp. 57. Cited by: §3.
-  (2014) Exploiting citation networks for large-scale author name disambiguation. EPJ Data Science 3, pp. 1–14. Cited by: §4.5.
-  SCImago journal & country rank. Note: http://www.scimagojr.com Accessed: 2017-08-30. Cited by: §4.1.
-  (2018) Where arab social science and humanities scholars choose to publish: falling in the predatory journals trap. Learned Publishing 31 (3), pp. 222–229. Cited by: §1.
-  ‘Predatory’ open access: a longitudinal study of article volumes and market characteristics. BMC Medicine 13 (1), pp. 1–15. Cited by: §1.
-  (2017) Predatory journals recruit fake editor. Nature News 543 (7646), pp. 481. Cited by: §1.
-  (2016) Riding with the metric tide: predatory journals in scopus. Higher Education in Russia and Beyond 1 (7), pp. 9–12. Cited by: §1.
-  (2013) Atypical combinations and scientific impact. Science 342 (6157), pp. 468–472. Cited by: Figure 3, §2.2, §4.4.
-  (2012) Universality of citation distributions revisited. JASIST 63 (1), pp. 72–77. Cited by: §1.
-  (2019) Predator journals: an urgent and global problem. SAGE Publications Sage CA: Los Angeles, CA. Cited by: §1.
-  (2012) Coercive citation in academic publishing. Science 335 (6068), pp. 542–543. Cited by: §1.
-  (2019) Large teams develop and small teams disrupt science and technology. Nature 566 (7744), pp. 378–382. Cited by: Figure 3, §2.2, §4.4, §4.4.
-  (2011) Citation count prediction: learning to estimate future citations for literature. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1247–1252. Cited by: §1, §2.
-  (2021) Self-citation and citation of top journal publishers and their interpretation in the journal-discipline context. Scientometrics, pp. 1–28. Cited by: §2.1.
The National Research Foundation (NRF) of Korea Grant funded by the Korean Government supported this work through Grant No. NRF-2020R1A2C1100489 (T.Y. and J.Y.). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
6 Author contributions
All authors designed the experiment and wrote the manuscript. T.Y., J.P., and J.Y. collected the data. T.Y. analysed the data. All authors reviewed, edited, and approved the final draft.
7 Additional information
7.1 Competing interests
The authors declare no conflict of interest.
8 Supplementary Information
Disturbance of greedy publishing to academia
S1 Brief description of questionable journals in Beall’s list
In this study, we collected 766 questionable journals (QJs), comprising 643 journals in the original list by Beall and 123 journals in the updated list contributed by the community. Note that QJs that published fewer than 30 articles annually were excluded to prevent erroneous computations. The results revealed a massive surge in the share of QJs in 2010, as shown in Supplementary Fig. S1. Based on the ASJC of Scopus, QJs were mainly classified into medicine (31.7%), engineering (18.1%), biology (16.0%), and computer science (15.6%). Some QJs were simultaneously classified into multiple subject categories; for instance, journals belonging to biology were occasionally co-classified under medicine. Only a small fraction of the QJs published 100 or more articles annually, and the correlation between annual publication size and journal impact was only 0.22.
S2 Note on selection criteria of unquestioned journals
S2.1 Publication size
The ecosystem of mega journals differs from that of small journals, as reflected in the publication size. To compensate for the effect of journal size, we included the annual publication size of journals in the selection criteria. However, it is impossible to match both the journal size and the journal impact exactly. On the one hand, if the margins for matching the journal size were set too narrow, no journal would fit the journal impact. On the other hand, if the margins were too broad, the UJs might have a publication policy different from that of the corresponding QJs.
In this study, we tested three candidate divisions of journal size to find the one that minimises the differences in journal size and impact between QJs and UJs: (a) a division on a log scale, (b) 33rd percentiles, and (c) quartiles (Supplementary Fig. S6). We chose the candidate UJs based on the selection criteria with each of the three types of journal-size matching. Within a subject category, the publication size followed a heavy-tailed distribution; thus, we considered the division on a log scale as the first candidate. However, (a) resulted in a larger difference in annual publication size than the other divisions (Supplementary Fig. S6). Both (b) 33rd percentiles and (c) quartiles yielded better results than (a), and of these two, the division by 33rd percentiles yielded smaller differences in journal impact and size. Therefore, we concluded that the 33rd percentiles provide an adequate division.
S2.2 Additional analysis using metrics of the Journal Citation Reports
We also tested three well-known metrics used by the Journal Citation Reports (JCR) to compare the citation characteristics of QJs and the selected UJs. The immediacy index measures the speed of citation as the average citation count in the year of publication (Supplementary Fig. S8). The cited half-life estimates the lifetime of an article as the median age of the forward citations (Supplementary Fig. S9). The citing half-life estimates the age of the topic of an article as the median age of the backward citations (Supplementary Fig. S10). The results showed that QJs and UJs had similar indices for most journals. To conclude, UJs do not significantly differ from QJs in terms of the citation timescale.
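The three indices can be sketched as follows, using the simplified definitions given in this section (an average for the immediacy index and a plain median age for both half-lives) rather than the exact JCR computation. The `(citing_year, cited_year)` record layout is an assumption of this sketch.

```python
import statistics


def immediacy_index(citations_in_publication_year, n_articles):
    """Average number of citations received in the year of publication."""
    return citations_in_publication_year / n_articles


def median_age(citation_pairs):
    """Median of (citing_year - cited_year) over a list of citation pairs.

    Pass citations received by a journal to obtain its cited half-life
    (forward citations), or citations made by the journal to obtain its
    citing half-life (backward citations).
    """
    ages = [citing_year - cited_year for citing_year, cited_year in citation_pairs]
    return statistics.median(ages)


# Invented example: 5 same-year citations to 20 articles, and three forward
# citations to a 2010 paper made in 2015, 2016, and 2020.
index = immediacy_index(5, 20)
half_life = median_age([(2015, 2010), (2016, 2010), (2020, 2010)])
print(index, half_life)
```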
S3 Additional analysis with disruptiveness index
We conducted further comparisons between QJs and UJs in terms of the disruptiveness index. First, we computed the average disruptiveness index for a year (Supplementary Fig. S11). The UJs consistently scored a higher disruptiveness index, except during 1998–2000. There were only a few QJs in the late 1990s; thus, the index may fluctuate strongly.
At the journal level, however, we found no difference in the disruptiveness index. The average number of authors (Supplementary Fig. S12), average disruptiveness index (Supplementary Fig. S13), and journal impact (Supplementary Fig. S14) differed only by an insignificant margin between QJs and UJs. Specifically, the QJs and UJs had a similar number of authors and a similar disruptiveness index on average. The average disruptiveness index fluctuated more for journals with smaller impacts, while other journals showed only a small difference. To summarise, while the publication-level disruptiveness index is higher for UJs, the journal-level disruptiveness indices are similar for QJs and UJs.
S4 Robustness of constructed network
In this section, we demonstrate the robustness of the centrality results with respect to network construction. Several types of journal citation networks can be constructed depending on which citations are chosen as links. For instance, one may choose between the citations and the references of publications and adjust the time window of the citations (references). In addition to the network presented in the main text, we tested four types of networks composed of: i) two-year citation data since publication (Supplementary Figs. S15–S18), ii) five-year citation data since publication (Supplementary Figs. S19–S22), iii) two-year reference data before publication (Supplementary Figs. S23–S26), and iv) five-year reference data before publication (Supplementary Figs. S27–S30). Because we focus on the difference between QJs and UJs, we inspected the overall trend rather than the individual values. For instance, Supplementary Figs. S15, S19, S23, and S27 display the BC of the QJs and UJs for the four types of networks. The results indicate that the UJs had a higher BC than the QJs irrespective of the network type. Similarly, the other centralities exhibited robust patterns. Therefore, we concluded that the centrality results are not sensitive to network construction.