
The CI-index: a new index to characterize the scientific output of researchers

03/15/2019
by   Xuehua Yin, et al.

We propose a simple new index, named the CI-index, based on the Choquet integral to characterize the scientific output of researchers. This index is an improvement of the A-index and R-index and has the notable feature that highly cited papers receive high weights and lowly cited papers receive low weights. In applications, many researchers may have the same h-index, g-index or R-index. The CI-index provides an effective method for distinguishing among such researchers.

1 Introduction

Since the physicist Hirsch (2005) introduced the so-called h-index as a new indicator of the impact of a researcher's scientific output, it has attracted a lot of attention in both the scientific community and the scientometrics (informetrics) literature for its good properties as a measure of scientific production; more than 8430 articles have been written on the h-index (data from Google Scholar as of March 15, 2019). The result was also reviewed in Nature (Ball (2005)) and Science (Anon (2005)). A large part of the literature building on Hirsch's work is concerned with introducing variants, extensions, and generalizations of the h-index. Bornmann et al. (2011) listed no fewer than 37 variants of the h-index, and a more recent study (Bornmann (2014)) reports around 50 variants.

The h-index is a simple single number incorporating both quantitative and qualitative aspects. It is robust in the sense that it is insensitive to a set of uncited (or lowly cited) papers, but it is also insensitive to one or several outstandingly highly cited papers. This last aspect can be considered a drawback of the h-index; for more details we refer to Egghe (2006a). For more advantages and disadvantages of the h-index see also Hirsch (2005, 2007) and Jin et al. (2007). In order to overcome some of these limitations, scientists have proposed several new indicators based on the h-index, intended either to replace or to complement the original h-index. To overcome the well-known insensitivity of the h-index to the number of citations received by highly cited papers, Egghe (2006a, 2006b) first developed the g-index and Jin (2006) followed with the A-index. Jin et al. (2007) suggested correcting the h-index for the aging of papers using the AR-index, Egghe and Rousseau (2008) proposed a citation-weighted h-index, and Anderson et al. (2008) described a new version of the h-index, the “tapered h-index”, which positively scores all of an author’s citations, accounting for the tapered distribution of citations associated with highly cited papers rather than using a cut-off at $h$. Hirsch (2010) proposed an index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Alonso et al. (2010) proposed a new index, called the hg-index ($hg = \sqrt{h \cdot g}$), to characterize the scientific output of researchers; it is based on both the h-index and the g-index and aims to overcome the limitations of both.

Recall that the A-index is simply defined as the average number of citations received by the publications included in the Hirsch core. Mathematically, this is

$$A = \frac{1}{h} \sum_{j=1}^{h} c_j,$$

where the numbers of citations $c_j$ are ranked in decreasing order. Note that all the $c_j$ have the same weight $1/h$. Closely related to the A-index is the R-index introduced in Jin et al. (2007), which is defined as $R = \sqrt{\sum_{j=1}^{h} c_j}$. Recently, Perry and Reny (2016) introduced the Euclidean index $E = \sqrt{\sum_{i=1}^{n} c_i^2}$, which is the Euclidean length of the citation list $(c_1, \dots, c_n)$. Although the Euclidean index avoids several shortcomings of the h-index and its successors, it still has drawbacks. For example, consider two researchers $P$ and $Q$: $P$ has 10 papers, each with 10 citations ($h = 10$), and $Q$ has 1 paper with 100 citations ($h = 1$). However, $P$ has Euclidean index 31.6, whereas $Q$ has Euclidean index 100.
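The indices recalled above are easy to compute from a citation list. The following is a minimal sketch, assuming the usual definitions (the h-index as the largest rank whose paper has at least that many citations, the A-index as the mean and the R-index as the square root of the total over the first h papers, and the Euclidean index as the Euclidean length of the full list). The function names are illustrative and not taken from the paper.

```python
import math


def h_index(citations):
    """Largest h such that the h most cited papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Average number of citations of the h most cited papers (0.0 if h = 0)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:h]) / h


def r_index(citations):
    """Square root of the total citations of the h most cited papers."""
    h = h_index(citations)
    ranked = sorted(citations, reverse=True)
    return math.sqrt(sum(ranked[:h]))


def euclidean_index(citations):
    """Euclidean length of the citation list."""
    return math.sqrt(sum(count * count for count in citations))


# The two researchers from the example: ten papers with ten citations each,
# and a single paper with one hundred citations.
ten_by_ten = [10] * 10
single_hit = [100]
print(h_index(ten_by_ten), round(euclidean_index(ten_by_ten), 1))
print(h_index(single_hit), round(euclidean_index(single_hit), 1))
```

Both researchers have 100 citations in total and the same R-index, yet their h-indices and Euclidean indices rank them in opposite orders.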

The aim of this paper is to present a new index, called the Choquet integral index (CI-index for short), to characterize the scientific output of researchers. This index is an improvement of the A-index and R-index and has the notable feature that highly cited papers receive high weights and lowly cited papers receive low weights. To the best of our knowledge, such an index has not been studied before.

In the following, we first recall the definitions of a distortion function and of the distortion expectation, or Choquet integral. We then introduce three CI-indices: the CI-index in the h-core, the CI-index in the g-core, and the CI-index in the core of all citations.

2 Preliminaries

Definition 2.1.

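A vector $w = (w_1, \dots, w_n)$ is called a weight if it has the properties

$$w_i \in [0, 1] \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} w_i = 1.$$

Moreover, if $w_1 \le w_2 \le \cdots \le w_n$, we call $w$ increasing; if $w_1 \ge w_2 \ge \cdots \ge w_n$, we call it decreasing.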

Definition 2.2.

A distortion function is a non-decreasing function $\psi : [0, 1] \to [0, 1]$ such that $\psi(0) = 0$ and $\psi(1) = 1$.

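The notion of a distortion function was proposed by Yaari (1987) in the dual theory of choice under risk; since then many different distortions have been proposed in the literature. In the computer science and artificial intelligence literature a distortion function is also called a regular increasing monotone quantifier, see Yager (1996). Here we list some commonly used distortion functions:

Incomplete beta function $\psi(x) = \frac{1}{\beta(a, b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$, where $a > 0$ and $b > 0$ are parameters and $\beta(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1} \, dt$. Setting $b = 1$ gives the power distortion $\psi(x) = x^a$; setting $a = 1$ gives the dual-power distortion $\psi(x) = 1 - (1-x)^b$.
The Wang distortion $\psi(x) = \Phi(\Phi^{-1}(x) + \Phi^{-1}(p))$, $0 < p < 1$, where $\Phi$ is the distribution function of the standard normal.
The lookback distortion $\psi(x) = x^p (1 - p \ln x)$, $p \in (0, 1]$.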
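To make the list concrete, here is a small sketch of these distortions as plain Python functions. It assumes the closed forms $x^a$ and $1-(1-x)^b$ for the power and dual-power distortions, the Wang distortion with shift $\Phi^{-1}(p)$, and the lookback distortion $x^p(1 - p\ln x)$; the function and parameter names are illustrative choices, not notation from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power(x, a):
    """Power distortion x**a (incomplete beta with b = 1), a > 0."""
    return x ** a


def dual_power(x, b):
    """Dual-power distortion 1 - (1 - x)**b (incomplete beta with a = 1), b > 0."""
    return 1.0 - (1.0 - x) ** b


def wang(x, p):
    """Wang distortion Phi(Phi^{-1}(x) + Phi^{-1}(p)) for 0 < p < 1."""
    # inv_cdf is undefined at 0 and 1, so the endpoints are set by continuity.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    shift = _STD_NORMAL.inv_cdf(p)
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(x) + shift)


def lookback(x, p):
    """Lookback distortion x**p * (1 - p*ln(x)) for 0 < p <= 1."""
    # The limit as x -> 0+ is 0, which avoids evaluating log(0).
    if x <= 0.0:
        return 0.0
    return x ** p * (1.0 - p * math.log(x))


grid = [i / 10 for i in range(11)]
print([round(power(u, 2.0), 3) for u in grid])
print([round(dual_power(u, 2.0), 3) for u in grid])
```

Each function maps 0 to 0 and 1 to 1 and is non-decreasing on $[0, 1]$, as a distortion function must be. With exponent 2 the power distortion is convex and the dual-power distortion is concave.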

Let us recall the standard definitions of convexity and concavity of functions.

Definition 2.3.

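Let $I$ be an interval in the real line $\mathbb{R}$. Then the function $f : I \to \mathbb{R}$ is said to be convex if for all $x, y \in I$ and all $\alpha \in [0, 1]$, the inequality

$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$$

holds. If this inequality is strict for all $x \ne y$ and $\alpha \in (0, 1)$, then $f$ is said to be strictly convex. A closely related concept is that of concavity: $f$ is said to be (strictly) concave if, and only if, $-f$ is (strictly) convex.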

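Assume that $x_1 \le x_2 \le \cdots \le x_n$ are positive numbers. Given a distortion function $\psi$, consider the weighted sum

$$\sum_{i=1}^{n} w_i x_i, \tag{2.1}$$

where $w = (w_1, \dots, w_n)$ is the weight generated by $\psi$ as follows:

$$w_i = \psi\left(\frac{n-i+1}{n}\right) - \psi\left(\frac{n-i}{n}\right), \qquad i = 1, \dots, n.$$

Because $\psi$ is non-decreasing, it follows that $w_i \ge 0$. Furthermore, from $\psi(0) = 0$ and $\psi(1) = 1$, it follows that $\sum_{i=1}^{n} w_i = 1$. If $\psi$ is convex, then $w$ is monotonic decreasing; if $\psi$ is concave, then $w$ is monotonic increasing (see, e.g., Sha et al. (2018) for details). We remark that (2.1) can be written as a Choquet integral (see, e.g., Denneberg (1994)) of a random variable $X$ with probability distribution $P(X = x_i) = 1/n$, $i = 1, \dots, n$:

$$\rho_\psi[X] = \int_0^\infty \psi(S_X(x)) \, dx = \sum_{i=1}^{n} w_i x_i, \tag{2.2}$$

where $S_X(x) = P(X > x)$ is the decumulative distribution function of $X$. For example, for distinct values, $S_X(x_i) = (n-i)/n$ for the case of $x_1 < \cdots < x_n$. If the values are instead labeled in decreasing order, $x_1 > \cdots > x_n$, then $S_X(x_i) = (i-1)/n$, and the same integral is obtained with the weights $w_i = \psi(i/n) - \psi((i-1)/n)$.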

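Obviously, the identity function $\psi(x) = x$ is the smallest concave distortion function and also the largest convex distortion function. Any concave distortion function gives more weight to the tail than the identity function, whereas any convex distortion function gives less weight to the tail than the identity function. If $\psi(x) = x$, then the Choquet integral $\rho_\psi[X]$ of (2.2) equals $E[X]$, the expectation of $X$. If $\psi$ is concave, then

$$\rho_\psi[X] \ge E[X], \tag{2.3}$$

and if $\psi$ is convex, then

$$\rho_\psi[X] \le E[X].$$

Clearly, if $\psi_1(x) \le \psi_2(x)$ for $x \in [0, 1]$, then $\rho_{\psi_1}[X] \le \rho_{\psi_2}[X]$ for any random variable $X$.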
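The ordering between concave, identity and convex distortions can be checked numerically. The sketch below assumes a random variable that puts mass $1/n$ on each of $n$ non-negative values and computes the Choquet integral as the integral of the distorted decumulative distribution function; `choquet_integral` and the sample values are illustrative, not taken from the paper.

```python
def choquet_integral(values, distortion):
    """Choquet integral of a non-negative sample with equal point masses 1/n.

    Computed as the integral over [0, inf) of distortion(P(X > t)) dt.
    """
    ordered = sorted(values)  # ascending order statistics
    n = len(ordered)
    total = 0.0
    previous = 0.0
    for i, x in enumerate(ordered, start=1):
        # For t in [previous, x) exactly n - i + 1 sample points exceed t.
        total += (x - previous) * distortion((n - i + 1) / n)
        previous = x
    return total


sample = [1, 4, 9]
mean = sum(sample) / len(sample)

concave = choquet_integral(sample, lambda u: u ** 0.5)  # concave distortion
identity = choquet_integral(sample, lambda u: u)        # recovers the mean
convex = choquet_integral(sample, lambda u: u ** 2)     # convex distortion

print(round(concave, 3), round(identity, 3), round(convex, 3))
```

For this sample the concave distortion gives a value above the mean and the convex distortion a value below it, in line with the inequalities above.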

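From (2.2) we see that the Choquet integral $\rho_\psi[\cdot]$ satisfies the following properties:

a) Positive homogeneity: $\rho_\psi[cX] = c \, \rho_\psi[X]$ for any non-negative constant $c$;

b) Translation invariance: $\rho_\psi[X + c] = \rho_\psi[X] + c$ for any constant $c$;

c) Monotonicity: $\rho_\psi[X] \le \rho_\psi[Y]$ for any two random variables $X$ and $Y$, where $X \le Y$ with probability 1.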
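These three properties can be verified on a small example. The sketch below again assumes equal point masses on non-negative values and a non-negative shift, so that the integral over $[0, \infty)$ suffices; the helper name `choquet`, the square-root distortion and the numbers are illustrative assumptions.

```python
import math


def choquet(values, distortion):
    """Choquet integral of a non-negative sample with equal point masses 1/n."""
    ordered = sorted(values)
    n = len(ordered)
    total, previous = 0.0, 0.0
    for i, x in enumerate(ordered, start=1):
        # distortion of the probability that the variable exceeds t on [previous, x)
        total += (x - previous) * distortion((n - i + 1) / n)
        previous = x
    return total


def psi(u):
    """A concave distortion chosen only for illustration."""
    return math.sqrt(u)


x = [2.0, 3.0, 7.0]
y = [2.5, 3.0, 9.0]  # y dominates x outcome by outcome
c = 4.0

homogeneous = math.isclose(choquet([c * v for v in x], psi), c * choquet(x, psi))
translation = math.isclose(choquet([v + c for v in x], psi), choquet(x, psi) + c)
monotone = choquet(x, psi) <= choquet(y, psi)

print(homogeneous, translation, monotone)
```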

3 The Indices for Different Datasets

3.1 CI-index in the h-core

A paper belongs to the h-core of a scientist if it has at least $h$ citations (Hirsch 2010). Hence the h-core may contain more than $h$ elements; it contains exactly $h$ elements when no paper outside the top $h$ has exactly $h$ citations. We use $H$ to denote the h-core and $|H|$ to denote its number of elements. Note that $H$ is a multiset, which, unlike a set, allows for multiple instances of each of its elements. The cardinality of a multiset is obtained by summing up the multiplicities of all its elements. For example, in the multiset $\{1, 1, 6\}$ the element 1 has multiplicity 2, the element 6 has multiplicity 1, and the cardinality is 3.
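A short sketch of the h-core as a multiset of citation counts, assuming the core consists of all papers with at least h citations. The function names and citation lists are illustrative; `collections.Counter` is used to read off multiplicities.

```python
from collections import Counter


def h_index(citations):
    """Largest h such that the h most cited papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_core(citations):
    """Citation counts of all papers with at least h citations, in decreasing order."""
    h = h_index(citations)
    if h == 0:
        return []
    return sorted((count for count in citations if count >= h), reverse=True)


core_a = h_core([10, 8, 5, 4, 3])     # h = 4, core has exactly 4 elements
core_b = h_core([6, 4, 4, 4, 4, 1])   # h = 4, core has 5 elements because of ties

multiset = Counter([1, 1, 6])          # multiplicities: 1 -> 2, 6 -> 1
cardinality = sum(multiset.values())   # sum of multiplicities

print(core_a, core_b, dict(multiset), cardinality)
```

The second citation list shows how ties at the threshold make the core larger than h.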

Let be the elements of -core which are ranked in decreasing order, where is the -index. Note that . The -index in the -core is defined as

(3.1)

where

In particular, when ,

(3.2)

where

Here $\psi$ is a distortion function. The root is taken to prevent the index from becoming too large. This does not change the distinguishing power of the index, since taking the root is a strictly increasing transformation.

If is a concave distortion function, then by (2.3) we get , where is the -index which defined by (see e.g. Jin et al. (2007)). Taking in (3.1) yields

which can be seen as the modified version of -index. The -index are well-defined regardless of or not.

If , then

Therefore,

(3.3)

In particular, if , then

(3.4)

To illustrate the calculation of the CI-index, we consider the following toy example.

Example 1. Suppose that .
(1) If , then , and

(2) If , then , and

(3) If , then , and

(4) If , then , and

(5) If , then , and

(6) If , then , and

(7) If , then , and

(8) If , then , and

(9) If , then , and

(10) If , then , and

Note that in the cases (1),(3) and (5) have the same , and , but with different -index. The following toy example considers three cases with the same -index and -index but with different -indices; in the cases (2),(4) and (6) have the same , and , but with different -indices. Therefore, the index has a good distinguishability.

3.2 CI-index in the g-core

In order to give more weight to highly cited articles, Egghe (2006a, b) proposed the g-index as a simple variant of the h-index. A set of papers has g-index $g$ if $g$ is the highest rank such that the top $g$ papers have, together, at least $g^2$ citations. This also means that the top $g+1$ papers have fewer than $(g+1)^2$ citations. Egghe and Rousseau (2008) pointed out that a small variant of the g-index is possible by not limiting it to $g \le T$, where $T$ stands for the total number of papers. This means that, in these cases, fictitious articles with 0 citations have to be added.
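The definition and its unrestricted variant can be sketched as follows. The code assumes the standard reading of the g-index (largest rank $g$ whose top $g$ papers together have at least $g^2$ citations) and models the Egghe and Rousseau variant by padding the list with zero-citation papers. The flag name `allow_fictitious` is a hypothetical choice for this illustration.

```python
import math


def g_index(citations, allow_fictitious=False):
    """Largest g such that the g most cited papers have at least g*g citations in total.

    With allow_fictitious=True the list is treated as padded with zero-citation
    papers, so g may exceed the actual number of papers.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
        else:
            break
    # Padding can only help when every real paper already satisfies the condition;
    # beyond that point the total stays fixed, so g can grow up to floor(sqrt(total)).
    if allow_fictitious and g == len(ranked):
        g = max(g, math.isqrt(running_total))
    return g


print(g_index([100]), g_index([100], allow_fictitious=True))
print(g_index([10] * 10), g_index([3] + [1] * 9))
```

A single paper with 100 citations has g-index 1 under the restriction to actual papers, but 10 once fictitious zero-citation papers are allowed.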

A paper belongs to the -core of a scientist if it has citations. Here we restrict ourselves . We will use stands for the set of -core and the number of elements in is denoted by . Obviously, .

Let be the elements of -core which are ranked in decreasing order, where is the -index. One can define an analogous quantity of -index in the -core

(3.5)

where

Here is a distortion function.

In particular, if , then . If is a concave distortion function, then by (2.3) we get where , which is closely related to -index (see, Schreiber (2010)). Taking in (3.3) yields .

If , then

(3.6)

3.3 CI-index in the core of all citations

Highly cited papers are, of course, important for determining the values of the h-index and g-index. However, these indices do not take into account the “tail” papers (those with a low number of citations), and many of the citations that accompany the most highly cited papers may effectively contribute nothing. A bibliometric measure of publication output should assign a positive score to each new citation as it occurs. It is therefore necessary to consider the CI-index in the core of all citations.

Let $T$ denote the number of articles published by a scientist, and let $c_i$, $i = 1, \dots, T$, denote the number of citations of the $i$-th most cited article, so that $c_1 \ge c_2 \ge \cdots \ge c_T$. Assume that $N = \sum_{i=1}^{T} c_i$ represents the total number of citations received. The CI-index in the core of all citations is defined as

(3.1)

where

Here is a distortion function. If is a concave distortion function, then by (2.3) where .

If , then

(3.2)

4 Concluding Remarks

Based on the Choquet integral and building on the h-index and g-index, we have introduced the CI-indices within the h-core and g-core. These new indices eliminate some of the disadvantages of the h-index, g-index, A-index and R-index and have the notable feature that highly cited papers receive high weights and lowly cited papers receive low weights. The new indices discussed in the paper are useful complements to the h-index of a scientist for quantifying his or her scientific achievement. Finally, we considered the CI-index in the core of all citations. It would be of interest to define the CI-index within other cores as well. This research has not taken into account the effect of multiple authorship as in Hirsch (2010, 2019) or the effect of self-citation as in Bartneck and Kokkelmans (2011) and others, which could be an excellent direction for further research. We hope that this new CI-index will be further studied and used in practical assessments.

Acknowledgements.  The research was supported by the National Natural Science Foundation of China (No. 11571198).

References

  • [1] Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E. and Herrera, F. (2010). hg-index: a new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics 82, 391-400.
  • [2] Anderson, T. R., Hankin, R. S. and Killworth, D. (2008). Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics 76(3), 577-588.
  • [3] Anonymous (2005). Data point. Science 309(5738), 1181.
  • [4] Ball, P. (2005). Index aims for fair ranking of scientists. Nature 436(7053), 900.
  • [5] Bartneck, C. and Kokkelmans, S. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics 87, 85-98.
  • [6] Bornmann, L., Mutz, R., Hug, S. and Daniel, H. (2011). A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics 5(3), 346-359.
  • [7] Denneberg, D. (1994). Non-additive Measure and Integral. Theory and Decision Library 27, Kluwer Academic Publishers.
  • [8] Egghe, L. (2006a). How to improve the h-index. The Scientist 20(3), 315-321.
  • [9] Egghe, L. (2006b). An improvement of the h-index: the g-index. ISSI Newsletter 2(1), 8-9.
  • [10] Egghe, L. and Rousseau, R. (2008). An h-index weighted by citation impact. Information Processing and Management 44, 770-780.
  • [11] Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA 102(46), 16569-16572.
  • [12] Hirsch, J. E. (2007). Does the h index have predictive power? Proceedings of the National Academy of Sciences of the USA 104(49), 19193-19198.
  • [13] Hirsch, J. E. (2010). An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics 85, 741-754.
  • [14] Hirsch, J. E. (2019). $h_\alpha$: An index to quantify an individual’s scientific leadership. Scientometrics 118(2), 673-686.
  • [15] Jin, B. H. (2006). H-index: an evaluation indicator proposed by scientist. Science Focus 1(1), 8-9.
  • [16] Jin, B., Liang, L., Rousseau, R. and Egghe, L. (2007). The R- and AR-indices: complementing the h-index. Chinese Science Bulletin 52(6), 855-863.
  • [17] Perry, M. and Reny, J. (2016). How to count citations if you must. American Economic Review 106(9), 2722-2741.
  • [18] Schreiber, M. (2010). Revisiting the g-index: The average number of citations in the g-core. Journal of the American Society for Information Science and Technology 61(1), 169-174.
  • [19] Sha, X. Y., Xu, Z. S. and Yin, C. C. (2018). Elliptical distribution-based weight-determining method for ordered weighted averaging operators. International Journal of Intelligent Systems 2018, 1-20, DOI: 10.1002/int.22078.
  • [20] Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica 55, 95-115.
  • [21] Yager, R. R. (1996). Quantifier guided aggregation using OWA operators. International Journal of Intelligent Systems 11, 49-73.