Seasonal Entropy, Diversity and Inequality Measures of Submitted and Accepted Papers Distributions In Peer-Reviewed Journals

10/13/2019 ∙ by Marcel Ausloos, et al. ∙ 0

This paper presents a novel method for finding features in the analysis of variable distributions stemming from time series. We apply the methodology to the case of submitted and accepted papers in peer-reviewed journals. We provide a comparative study of editorial decisions for papers submitted to two peer-reviewed journals: the Journal of the Serbian Chemical Society (JSCS) and this MDPI Entropy journal. We cover three recent years for which the fate of submitted papers, about 600 papers to JSCS and 2500 to Entropy, is completely determined. Instead of comparing the number distributions of these papers as a function of time with respect to a uniform distribution, we analyze the relevant probabilities, from which we derive the information entropy. It is argued that such probabilities are indeed more relevant for authors than the actual number of submissions. We tie this entropy analysis to the so-called diversity of the variable distributions. Furthermore, we emphasize the correspondence between the entropy and the diversity with inequality measures, like the Herfindahl-Hirschman index and the Theil index, the latter itself belonging to the class of entropy measures; the Gini coefficient, which also measures the diversity in ranking, is calculated for further discussion. In this sample, the seasonal aspects of the peer review process are outlined. It is found that the use of such indices, which are non-linear transformations of the data distributions, allows one to distinguish features and evolutions of the peer review process as a function of time, as well as to compare the non-uniformity of distributions. Furthermore, t- and z-statistical tests are applied in order to measure the significance (p-level) of the findings, i.e., whether papers are more likely to be accepted if they are submitted during a few specific months or "season"; the predictability strength depends on the journal.

1 Introduction

Authors who submit (by their own assumption) high-quality papers to scholarly journals are interested in knowing whether there are factors which may increase the probability that their papers are accepted. One such factor may be related to the month of submission, or to the day of submission, as recently discussed [1]. Indeed, authors might wonder about editor and reviewer overload at some times of the year. Moreover, the number of submitted papers is relevant for editors and for the automated handling systems of publishers, to the point that artificial intelligence can be useful for helping journal editors [2, 3]. More generally, informetrics and bibliometrics are also interested in manuscript submission timing, especially in light of the enormous increase in the number of electronic journals.

From the author's point of view, a rejection is often frustrating, be it an "editor desk rejection" or a rejection following a review process. A high editor desk rejection rate has sometimes been explained by an entrance barrier due to the editor load [4]. Thus, it is of interest to observe whether there is a high probability of submission during specific months or seasons. In fact, the non-uniformity of submissions has already been studied. However, the distribution of acceptances during a year, i.e. a "monthly bias", is rarely studied, because of publisher secrecy. Search engines do not provide any information at all on the timing of rejected papers.

Recently, Boja et al. [1] examined a large database on journals with high impact factor and reported that a day-of-the-week correlation effect occurs between "when a paper is submitted to a peer-reviewed journal (and) whether that paper is accepted". However, here again, there was no study of rejected papers, because of a lack of data. Thus, one may wonder whether, besides a "day of the week" effect, there is some "seasonal" effect. One may indeed imagine that researchers in academic surroundings do not have a constant occupation rate, due to teaching classes, holidays, congresses, and even budgetary conditions. Researchers have only specific times during the academic year for producing research papers.

From the "seasonal effect" point of view, Shalvi et al. [5] found a discrepancy in the pattern of "submission-per-month" and "acceptance-per-month" for Psychological Science, but not for Social Psychology Bulletin. Summer months inspired authors to submit more papers to Psychological Science, but the subsequent acceptance was not related to the effect of seasonal bias (based on a test for percentages); on the other hand, a very low rate of acceptance was recorded for manuscripts sent in November or December. The number of submissions to Social Psychology Bulletin, on the contrary, was the greatest during winter months, followed by a reduced "production" in April; however, the acceptance rate was the highest for papers submitted in the period August-September-October. Moreover, a significant "acceptance success dip" was noted for submissions made in winter months. One of the main reasons for such differences between journals was conjectured to lie in different rejection policies; some journals employ desk rejection, whereas others do not.

Schreiber [4] analysed the acceptance rate in a journal, Europhysics Letters, for a period of 12 years and found that the rate of manuscript submission exceeded the rate of their acceptance. The data revealed (Table 2 in [4]) that there is a maximum in the number of submissions in July, defined as a 10% increase compared to the annual mean, together with a minimum in February, even taking into account the "smaller length" of this month. He concluded that significant fluctuations exist between months. The acceptance rate ranged from 45% to 55%; in the most recent years, the highest acceptance rate was seen in July and the lowest in January.

Recently, Ausloos et al. [6] studied submission and subsequent acceptance data for two journals, respectively a specialized (chemistry) scientific journal and a multidisciplinary journal: the Journal of the Serbian Chemical Society (JSCS, http://shd.org.rs/JSCS/) and Entropy (http://www.mdpi.com/journal/entropy), each over a 3-year time interval. The authors found that fluctuations, as expected, occur: the number of submissions for JSCS is the greatest in July and September and the smallest in May and December. The highest rate of paper submission for Entropy was noted in October and December and the lowest in August. Concerning acceptance for JSCS, the proportion of accepted/submitted manuscripts is the greatest in January and October. Concerning acceptance for Entropy, the number of papers steadily increases from January until a peak in May, followed by a marked dip during summer time, before reaching a peak in October of the order of the May peak.

With respect to the number of submitted manuscripts, it was observed that the acceptance rate in JSCS was the highest if papers were submitted in January and February; it was significantly low if the submission occurred in December. In the case of Entropy, the highest rejection rate was for papers submitted in December and March, thus with a January-February peak; the lowest acceptance rate was for manuscripts submitted in June or December, the highest rate being for those sent in spring months, February to May. One recognizes a journal seasonal shift of the features. (We adapt the word "seasonal": even though changes in seasons occur on the 21st of various months, we approximate the season transition as occurring on the 1st day of the following month.)

Here, we propose another line of approach in order to study the diversity of submissions, acceptances, and rejections (numbers and rates), based on probabilities, with emphasis on the conditional probabilities, thereafter measuring the entropy and other characteristics of the distributions. Indeed, the entropy is a measure of disorder, and one of several ways to measure diversity. Researchers have their own preferences [7, 8] in measuring diversity. Below, we adapt in practice the classical measure of diversity, as used in ecology, but other cases of interest pertaining to information science [9, 10] can be mentioned.

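Let us recall that the general equation of diversity is often written in the form [11, 12]

$$ {}^{q}D = \left( \sum_{i=1}^{N} p_i^{\,q} \right)^{1/(1-q)} \tag{1.1} $$

in which $q \neq 1$ is the order of the diversity, and $p_i$ the measured variable. For $q \to 1$, ${}^{q}D$ reduces to the exponential of the Shannon entropy [13, 14]

$$ {}^{1}D = \exp\left( - \sum_{i=1}^{N} p_i \ln p_i \right) \tag{1.2} $$

to which we will only stick here.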
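As an illustration, the diversity of order q and its q → 1 limit can be evaluated numerically. The following is a minimal Python sketch, assuming the standard Hill-number form of the diversity [11, 12]; the function name `hill_diversity` and the monthly counts are hypothetical and only serve the example.

```python
import math

def hill_diversity(probs, q):
    """Diversity of order q (Hill number) of a probability distribution."""
    probs = [p for p in probs if p > 0]  # empty classes do not contribute
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: exponential of the Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))

# Hypothetical monthly counts (January..December), for illustration only
counts = [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17]
probs = [c / sum(counts) for c in counts]

print(hill_diversity(probs, 1))          # effective number of months
print(hill_diversity([1 / 12] * 12, 1))  # uniform case: close to 12
```

For a uniform distribution over 12 months, the diversity equals 12 for any order q; any non-uniformity lowers this "effective number of months".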

Several inequality measures are commonly used in the literature: in the class of entropy-related measures, one finds the exponential entropy [15], which measures the extent of a distribution, and the Theil index [16], which emerges as the most popular one [17, 18], besides the Hirschman-Herfindahl index [19], measuring "concentrations". Finally, upon ranking the measured variable according to size, the Gini coefficient [20] is a classical indicator of non-uniform distributions.

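The Theil index [16] is defined by

$$ Th = \frac{1}{N} \sum_{i=1}^{N} \frac{y_i}{\langle y \rangle} \ln \frac{y_i}{\langle y \rangle}, \qquad \langle y \rangle = \frac{1}{N} \sum_{j=1}^{N} y_j \tag{1.3} $$

It is easily seen that the Theil index can be expressed in terms of the negative entropy

$$ H = - \sum_{i=1}^{N} p_i \ln p_i, \qquad p_i = \frac{y_i}{\sum_{j=1}^{N} y_j} \tag{1.4} $$

indicating the deviation from the maximum disorder entropy, $\ln N$,

$$ Th = \ln N - H \tag{1.5} $$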
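The relation Th = ln N − H between the Theil index and the Shannon entropy can be checked numerically. The following is a minimal Python sketch, assuming the usual definitions (the Theil index computed from values relative to their mean, the entropy from the normalized shares); the function names and the monthly counts are hypothetical.

```python
import math

def shannon_entropy(values):
    """Shannon entropy of the shares v / sum(values), in natural units."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)

def theil_index(values):
    """Theil index: mean of (v / mean) * ln(v / mean)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values if v > 0) / n

# Hypothetical monthly counts (January..December), for illustration only
counts = [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17]

th = theil_index(counts)
h = shannon_entropy(counts)
print(th, math.log(len(counts)) - h)  # the two numbers agree up to rounding
```

A perfectly uniform set of counts gives a vanishing Theil index, i.e. an entropy equal to ln N.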

The exponential entropy [15] is

(1.6)

The Hirschman-Herfindahl index (HHI) [19] is an indicator of the "concentration" of variables, here the "amount of competition" between the months. The higher the value of HHI, the smaller the number of months with a large value of (submitted, or accepted, or accepted if submitted) papers in a given month. Formally, adapting the HHI notion to the present case,

$$ HHI = \sum_{i=1}^{N} \left( \frac{y_i}{\sum_{j=1}^{N} y_j} \right)^{2} \tag{1.7} $$

Notice that .
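For illustration, the HHI of a set of monthly counts can be computed as the sum of the squared monthly shares. This is a minimal Python sketch under that usual definition; the function name `hhi` and the counts are hypothetical. By construction, the index lies between 1/N (uniform distribution) and 1 (all papers in a single month).

```python
def hhi(values):
    """Hirschman-Herfindahl index: sum of squared shares."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)

# Hypothetical monthly counts (January..December), for illustration only
counts = [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17]

print(hhi(counts))        # slightly above 1/12 for mildly uneven months
print(hhi([1] * 12))      # uniform case
print(hhi([0] * 11 + [7]))  # fully concentrated case
```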

The Gini coefficient [20] has been widely used as a measure of income [21] or wealth inequality [22, 23]; nowadays, it is widely used in many other fields. In brief, one first defines the Lorenz curve as the percentage of the total value of the measured (and now ranked) variable that is contributed by a given bottom fraction of the variable population; one then obtains the Gini coefficient as twice the area between this Lorenz curve and the diagonal line in the plane. Such a diagonal represents perfect equality; whence, a Gini coefficient equal to 0 corresponds to perfect equality of the variables.
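The Lorenz-curve construction can be sketched numerically as follows. This is a minimal Python sketch, assuming a piecewise-linear Lorenz curve through the ranked cumulative shares and the trapezoidal rule for the area; the function names and the counts are hypothetical.

```python
def lorenz_curve(values):
    """Points (population fraction, cumulative share) for values ranked by size."""
    ranked = sorted(values)
    total = sum(ranked)
    n = len(ranked)
    xs, ys = [0.0], [0.0]
    running = 0.0
    for k, v in enumerate(ranked, start=1):
        running += v
        xs.append(k / n)
        ys.append(running / total)
    return xs, ys

def gini(values):
    """Twice the area between the diagonal and the Lorenz curve."""
    xs, ys = lorenz_curve(values)
    # area under the piecewise-linear Lorenz curve (trapezoidal rule)
    area = sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))
    return 1.0 - 2.0 * area

# Hypothetical monthly counts (January..December), for illustration only
counts = [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17]
print(gini(counts))
```

With this discrete construction, N equal values give 0, whereas a single non-zero value among N gives 1 − 1/N rather than exactly 1.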

Having set up the framework and presented the definition of the indices to be calculated, we turn to the data and its analysis, in Section 2 and Section 3 respectively. Their discussion and comments on the present study, together with a remark on its limitations, are found in the conclusion Section 4.

2 Data

In order to develop the method measuring the disorder of the time series, let us recall the necessary data. The raw data can be found in [6]. For completeness, the time series of papers submitted, and of papers accepted if submitted during a given month, to JSCS and to Entropy are recalled in Fig. 1 for the years in which the full data is available, i.e. for which the final decisions have been made on the submitted papers.

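Let us introduce notations:

  • the number of monthly submissions in a given month ($m = 1, \dots, 12$) in year ($y$) is called $N_s^{(m,y)}$;

  • the percentage of this set is the probability of submission in a given month for a specific year, $q_s^{(m,y)} = N_s^{(m,y)} / \sum_{m} N_s^{(m,y)}$;

  • similarly, one can define $N_a^{(m,y)}$ as being the number of accepted papers when submitted in year ($y$) in a specific month ($m$),

  • and for the related percentage, one has $q_a^{(m,y)} = N_a^{(m,y)} / \sum_{m} N_a^{(m,y)}$;

  • more importantly, for authors, the (conditional) probability of a paper acceptance when submitted in a given month may be considered and estimated before submission:

    $$ p^{(m,y)} = \frac{N_a^{(m,y)}}{N_s^{(m,y)}} \tag{2.1} $$

Thereafter, one can deduce the relevant "monthly information entropies", written here for a generic monthly probability $\pi^{(m,y)}$ standing for any of the above,

$$ S^{(m,y)} = - \pi^{(m,y)} \ln \pi^{(m,y)} $$

and the overall information entropy:

$$ S^{(y)} = \sum_{m=1}^{12} S^{(m,y)} $$

in order to pinpoint whether the yearly distributions are disordered.

Moreover, we can discuss the data not only by comparing different years, but also through the cumulated data per month in the examined time interval, as if all years were "equivalent":

  • $C_s^{(m)} = \sum_{y} N_s^{(m,y)}$, from which one deduces $q_s^{(m)} = C_s^{(m)} / \sum_{m} C_s^{(m)}$,

  • and similarly for the accepted papers, $C_a^{(m)} = \sum_{y} N_a^{(m,y)}$ and $q_a^{(m)} = C_a^{(m)} / \sum_{m} C_a^{(m)}$,

  • leading to the ratio between cumulated monthly data

    $$ p^{(m)} = \frac{C_a^{(m)}}{C_s^{(m)}} \tag{2.2} $$

  • and to the corresponding "monthly cumulated entropy", $S^{(m)} = - p^{(m)} \ln p^{(m)}$,

  • finally to $S = \sum_{m=1}^{12} S^{(m)}$,

which will be called the "conditional entropy".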
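The quantities introduced in this section can be computed directly from monthly counts. The following is a minimal Python sketch under explicit assumptions: the counts are hypothetical (they are not the data of [6]), the variable and function names are illustrative, and the monthly entropy terms are taken in the Shannon form −p ln p, with the "conditional entropy" obtained as their sum over the cumulated monthly ratios.

```python
import math

# Hypothetical counts: 12 monthly values (January..December) per year
submitted = {
    "year_1": [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17],
    "year_2": [18, 16, 11, 12, 10, 15, 20, 16, 23, 22, 19, 14],
    "year_3": [22, 17, 13, 11, 8, 13, 24, 19, 26, 20, 18, 15],
}
accepted = {
    "year_1": [10, 8, 5, 4, 3, 6, 9, 7, 11, 10, 7, 6],
    "year_2": [9, 7, 4, 5, 4, 6, 8, 6, 10, 11, 8, 5],
    "year_3": [12, 9, 5, 4, 3, 5, 10, 8, 12, 9, 8, 5],
}

def shares(counts):
    """Monthly percentages (probabilities) of a yearly or cumulated total."""
    total = sum(counts)
    return [c / total for c in counts]

def ratios(acc, sub):
    """Conditional probability of acceptance given the month of submission."""
    return [a / s for a, s in zip(acc, sub)]

def entropy_terms(probs):
    """Monthly terms -p ln p; their sum is the overall entropy."""
    return [-p * math.log(p) if p > 0 else 0.0 for p in probs]

# Per-year quantities
for year in submitted:
    q_s = shares(submitted[year])
    p_cond = ratios(accepted[year], submitted[year])
    print(year, sum(entropy_terms(q_s)), sum(entropy_terms(p_cond)))

# Cumulated data per month, as if all years were equivalent
cum_sub = [sum(month) for month in zip(*submitted.values())]
cum_acc = [sum(month) for month in zip(*accepted.values())]
cum_ratio = ratios(cum_acc, cum_sub)
conditional_entropy = sum(entropy_terms(cum_ratio))
print(conditional_entropy)
```

Note that the monthly shares sum to one, so their entropy is bounded by ln 12, whereas the conditional probabilities are ratios that need not sum to one over the months.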

Relevant values are given in Tables 1-4 both for JSCS and for Entropy. The diversity and the inequality index values are given in Table 5. Most of the results stem from the use of free online software [24].

3 Data analysis

3.1 Data

First, notice that the 3-year-long time series in itself is not part of the main aim of the paper; this is because we intend to compare data with an equivalent number of degrees of freedom, i.e. 11, for all studied cases. Nevertheless, for completeness, and in order not to distract readers from our framework, we provide relevant figures in the Appendix, together with a note on the corresponding discrete Fourier transform.

3.2 Analysis

The relevant values for the various indices, given in Tables 1-4, both for JSCS and for Entropy, serve for the following analysis. We consider 3 aspects: (i) a posteriori features findings, (ii) non-linear entropy indices, and (iii) forecasting aspects.

3.2.1 A posteriori features findings

Browsing through Table 1, it can be noticed that the probability of submission is lower during the February-May months for JSCS, but is rather high for the fall and winter months. For Entropy, the highest probability of submission also occurs in October-December, and is preceded by a low rate of submissions, the lowest being in February and in August, i.e., one might say, at vacation times. Let us recall that the extremum entropy (for "perfect disorder") is here $\ln(12) \simeq 2.485$.

Apparently, this submission evolution pattern is reflected, see Table 2, in the acceptance rate, except for JSCS, which has a low acceptance rate for papers submitted in winter 2014. For Entropy, the weaker acceptance rates occur for papers submitted during the August-September months, say at the end of summer time.

Statistical tests of uniformity can be provided to ensure the validity of these findings for percentages, while taking into account the number of observations. In all cases, such a test demonstrates that the distributions are far from uniform, which suggests looking further for the major deviations. See the discussion of other tests in subsection 3.2.3.
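As one possible uniformity test, a Pearson chi-square statistic against a uniform monthly distribution can be sketched as follows. The choice of this particular test is an assumption made for illustration, as are the function name and the monthly counts, which are hypothetical; the statistic is compared to the tabulated 5% critical value for 11 degrees of freedom (19.675).

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed counts against a uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Tabulated chi-square critical value, 11 degrees of freedom, 5% level
CRITICAL_5PCT_DF11 = 19.675

# Hypothetical monthly counts (January..December), for illustration only
counts = [20, 15, 12, 10, 9, 14, 22, 18, 25, 21, 17, 17]

stat = chi_square_uniform(counts)
rejects_uniformity = stat > CRITICAL_5PCT_DF11
print(stat, rejects_uniformity)
```

Because the statistic is built from counts rather than percentages, the same relative deviations become more significant as the number of observations grows.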

However, such values only measure the probability of monthly acceptances without considering the number of submissions in a given month. It is in this respect more appropriate to look at the conditional probabilities, as in Table 3. For JSCS, the highest values are found for winter months, with a notable maximum in January, and the lowest for spring-summer time, from March till August. There is a shift of such a pattern for Entropy: the highest conditional probabilities occur during spring time, except in 2016.

The corresponding values of the monthly entropy, for the given years and for the cumulated distributions, are found in Table 4. All values of the entropy are remarkably similar, both for JSCS and Entropy, suggesting some sort of universality. One can notice that the entropy steadily increases as a function of time both for JSCS and Entropy, the growth rate being about twice as large for the latter journal. This is somewhat surprising, since one should expect an averaging effect in the case of Entropy because of the multidisciplinarity of the involved topics. Comparing such values indicates that the distributions are indeed far from uniform. (The slight difference between the last lines of Table 3 and Table 4, displaying the "conditional entropy", is merely due to rounding errors.)

3.2.2 Non-linear entropy indices

The diversity and inequality measures are given in Table 5. The diversity index is remarkably similar for both journals for the submitted papers and accepted papers distributions. The similarity also holds for the HHI, although it is a little bit lower for the journal . The diversity index for the conditional probability distributions is however rather different: both increase as a function of time, indicating an increase in concentrations for the in favor of relevant months. This increase rate is much higher for than for .

The inequality between months is rather weak, as is further seen in the Gini coefficient. However, there is a factor in favor of JSCS, which we interpret as due to the greater specificity of JSCS, implying a smaller involved community and specially favored topics. This numerical observation reinforces what can be deduced from the Theil index, which leads to the same conclusion.

3.2.3 Forecasting aspects

Considering the rather small sizes of both samples (not our faults!), it is of interest to discuss the significance of the findings, in some sense in view of suggesting some ”strategy” after the ”diagnosis”. The notions of ”false positives” and ”false negatives”, as in medical testing, can be applied in our framework.

In brief, a "false positive" occurs as an error when a test result improperly indicates the presence (high probability) of an outcome, when in reality it is not present; obviously, a contrario, a "false negative" is an error in which a test result improperly indicates no presence of a condition (the result is negative), when in reality it is present. This corresponds to rejecting (or accepting) a null hypothesis, e.g., in econometrics. Thus, two statistical tests have been used for such a discussion: (i) the Student t-test and (ii) the z-test. Recall that the former is used if one does not know, and the latter if one knows, the variance (or standard deviation) of the test distribution. Such characteristics are given in Tables 1-4 for each relevant quantity.
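The difference between the two tests can be sketched with one-sample statistics. This is a minimal, generic Python sketch: the null value `mu0`, the "known" standard deviation `sigma`, and the sample of monthly conditional probabilities are all hypothetical, and the text does not specify which hypotheses were actually tested.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample Student t statistic: the variance is estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: the standard deviation sigma is assumed known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

# Hypothetical monthly conditional probabilities, for illustration only
sample = [0.50, 0.53, 0.42, 0.40, 0.33, 0.43, 0.41, 0.39, 0.44, 0.48, 0.41, 0.35]

print(t_statistic(sample, mu0=0.40))              # compared to a Student law with 11 d.o.f.
print(z_statistic(sample, mu0=0.40, sigma=0.05))  # compared to the standard normal law
```

With only 12 monthly values, the Student distribution has noticeably heavier tails than the normal one, so the same statistic value is less significant under the t-test.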

For completeness, the confidence interval has also been given. It is easily seen that there is no outlier. This observation would lead one, like other authors, to claim that there is no anomaly in the monthly numbers and subsequent percentages, in contradistinction with the uniformity tests. We should here point out that the Student t-test leads to a p-value 0.0001, whence to a quite significant result. Concentrating our attention on the (monthly and annual) conditional probabilities, the test gives the significance reported in Table 4. The values (the so-called error of type I) in hypothesis testing indicate that the correct conclusion is to reject the null hypothesis and to consider the existence of "false positives". This is essentially due to the sample size. It is remarkable that the order of magnitude differs for JSCS and for Entropy.

4 Conclusion

The data on the number of submitted papers is relevant for editors, and the more so nowadays for publishers due to the automatic handling of papers. The relative number of accepted papers is less significant in that respect, but the conditional probability of having a paper accepted if it is submitted in a given month is much more relevant for authors. Authors expect a fast and (hopefully) positive response from journals, as they are probably interested in discovering the best timing for their submission in order to avoid a possible negative effect of editor overload at a particular moment. For these authors, the possible seasonal bias issue is expected to be relevant, as they would like to know whether a specific month of submission will increase the chance that their paper will be accepted. Thus, the probability of acceptance, the so-called "acceptance rate", is the relevant variable to be studied! Instead of uniformity tests or observing the "confidence interval" on monthly distributions, we have proposed a new line of approach: considering the diversity and inequality in the distributions of papers submitted, accepted, or accepted if submitted in a given month through information indices, like the Shannon entropy [25], the diversity index, the Gini coefficient and the Hirschman-Herfindahl index.

From this case study, a seasonal bias seems stronger in the specialized (JSCS) journal. The features are emphasized because we use a non-linear transformation of the data, through information concepts, which have had their usefulness demonstrated in many other fields [26]. In the present cases, seasonal bias effects are observed. The overall significance and the universality features might have to be re-examined if more data were available. Indeed, the values (the so-called error of type I) in hypothesis testing indicate that the correct conclusion is to consider the existence of "false positives".

Our outlined findings suggest intrinsic behavioral hypotheses for future research. Complementary aspects must be used as ingredients in order to understand whether some seasonal bias occurs [27, 28]. One has notably to take into account the scientific work environment, besides the topics favored by the journal.


Acknowledgements

MA greatly thanks the MDPI Entropy Editorial staff for gathering and cleaning up the raw data, and in particular Yuejiao Hu, Managing Editor. Thanks go also to the reviewers and editor.

References

  • [1] Boja, C. E., Herţeliu, C., Dârdală, M., & Ileanu, B. V. (2018). Day of the week submission effect for accepted papers in Physica A, PLOS ONE, Nature and Cell. Scientometrics 117, 887-918.
  • [2] Mrowinski, M.J., Fronczak, A., Fronczak, P., Nedic, O., & Ausloos, M. (2016). ”Review times in peer review: quantitative analysis and modelling of editorial work flows”, Scientometrics 107, 271-286.
  • [3] Mrowinski, M.J., Fronczak, A., Fronczak, P., Nedic, O., & Ausloos, M. (2017). ”Artificial intelligence in peer review: how can evolutionary computation support journal editors?”. PLoS One 12, e0184711.
  • [4] Schreiber, M. (2012). ”Seasonal bias in editorial decisions for a physics journal: you should write when you like, but submit in July”. Learned Publishing 25, 145-151.
  • [5] Shalvi, S., Baas, M., Handgraaf, M.J.J., & De Dreu, C.K.W. (2010). ”Write when hot - submit when not: seasonal bias in peer review or acceptance?” Learned Publishing 23, 117-123.
  • [6] Ausloos, M., Nedič, O., & Dekanski, A. (2019). Correlations between submission and acceptance of papers in peer review journals. Scientometrics, 1-24.
  • [7] Marhuenda, Y., Morales, D. and Pardo, M.C., (2005). A comparison of uniformity tests. Statistics 39(4), 315-327.
  • [8] Alizadeh Noughabi, H.A., (2017). Entropy-based tests of uniformity: A Monte Carlo power comparison. Communications in Statistics-Simulation and Computation 46(2), 1266-1279.
  • [9] Rousseau, R., (1992). Concentration and diversity of availability and use in information systems: A positive reinforcement model. Journal of the American Society for Information Science, 43(5), 391-395.
  • [10] Leydesdorff, L. and Rafols, I., (2011). Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations. Journal of Informetrics, 5(1), 87-100.
  • [11] Hill, M. O. (1973). Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427-432.
  • [12] Jost, L. (2006). Entropy and diversity. Oikos 113, 363-375.
  • [13] Shannon, C. (1948). A mathematical theory of communication, Bell Syst. Tech. J. 27, 379-423; Bell Syst. Tech. J. 27, 623-656.
  • [14] Shannon, C. (1951). Prediction and entropy of printed English, Bell Syst. Tech. J. 30, 50-64.
  • [15] Campbell, L.L., (1966). Exponential entropy as a measure of extent of a distribution. Probability Theory and Related Fields, 5(3), 217-225.
  • [16] Theil, H. (1967). Economics and Information Theory, (Rand McNally and Company, Chicago).
  • [17] Beirlant, J., Dudewicz, E.J., Györfi, L. and Van der Meulen, E.C., (1997). Nonparametric entropy estimation: An overview. International Journal of Mathematical and Statistical Sciences 6(1), 17-39.
  • [18] Oancea, B. and Pirjol, D. (2019). Extremal properties of the Theil and Gini measures of inequality. Quality & Quantity 53, 859-869.
  • [19] Hirschman, A.O. (1964). The paternity of an index, The American Economic Review 54(5), 761-762.
  • [20] Gini, C. (1910). Índice di Concentrazione e di Dipendenza. Biblioteca dell’Economista, serie V, vol. XX, Utet, Torino. (in Italian); English translation in Rivista di Politica Economica 87, 769-789 (1997).
  • [21] Atkinson, A. B. and Bourguignon, F. (Eds.). (2014). Handbook of income distribution (Vol. 2). Elsevier.
  • [22] Cerqueti, R., and Ausloos, M. (2015). Statistical assessment of regional wealth inequalities: the Italian case. Quality & Quantity 49(6) 2307-2323.
  • [23] Cerqueti R. and Ausloos, M. (2015). Socio-economical Analysis of Italy: the case of hagiotoponym cities. The Social Science Journal
  • [24] Wessa, P. (2014). Free Statistics Software, Office for Research Development and Education, version 1.1.23-r. http://www.wessa.net/.
  • [25] Crooks, G.E. (2015). On Measures of Entropy and Information. Tech. Note 009 .7. http://threeplusone.com/info
  • [26] Clippe, P. and Ausloos, M. (2012). Benford’s law and Theil transform of financial data, Physica A 391, 6556-6567.
  • [27] Nedić, O., Drvenica, I., Ausloos, M., and Dekanski, A. B. (2018). Efficiency in managing peer-review of scientific manuscripts-editors’ perspective. J. Serb. Chem. Soc. 83, 1391-1405.
  • [28] Drvenica, I., Bravo, G., Vejmelka, L., Dekanski, A.,  and Nedić, O. (2019). Peer Review of Reviewers: The Author’s Perspective. Publications 7, 1.