It is well known that the ranking of journals, whether in science, technology and engineering or in the social sciences such as economics, is a contentious issue. For many subjects there is no single correct ranking, but a universe of rankings, each the result of subjective criteria chosen by its creators. In this regard, the following studies are instructive: (Engemann and Wall (2009), Jangid et al. (2014)). Even when the creators’ choices and rules are laid out explicitly, the users of such rankings still need to apply their own judgment and institutional requirements to choose ranks appropriately. The subjective element in journal rankings complicates not only the question of which ranking, if any, is correct, but also any outcome that depends crucially on the adoption and analysis of rankings. For science and related subjects, SCOPUS and SCIMAGO maintain some of the best journal ranking systems to date, ranking journals with their Cite Score and SJR indicators respectively (Kianifar et al. (2014)). However, owing to the way in which these indicators are constructed, the resulting ranking may not always reflect the true quality and outreach of a specific scientific journal. This could hold for a large number of subjects across the length and breadth of contemporary research; therefore, recourse to a scientifically more acceptable method should be of interest, and often beneficial, to a large set of users. To demonstrate this, we consider the case of the journal Astronomy and Computing within the context of SCOPUS journals in the relevant domain of astroinformatics (Bora et al. (2016)) in particular, and astronomy and astrophysics in general.
The primary focus of this case study is to determine the standing of the journal Astronomy and Computing with respect to other journals that were established before it. More importantly, the reasons for such standing need to be investigated, which requires a more complex and qualitative study. The algorithm also tests the validity of the existing ranking and suggests an alternative rank that uses a more holistic approach towards the features. While this paper focuses on a specific journal, the construct is general, and the algorithms can be adopted in numerous other subjects grappling with the same problem.
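Table 1: Ranks of Astronomy and Computing and selected journals under our ℓ1-norm SVD scheme and the SJR-based ranking
|Journal||ℓ1-norm SVD Rank||SJR-based Rank||Year|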
|Astronomy and Computing||39||31||2013|
|Astronomy and Astrophysics Review||40||5||1999|
|Radiophysics and Quantum Electronics||41||51||1969|
|Solar System Research||42||48||1999|
|Living Reviews in Solar Physics||43||3||2005|
|Journal of Astrophysics and Astronomy||45||55||1999|
|Revista Mexicana de Astronomia y Astrofisica||46||23||1999|
|Journal of the Korean Astronomical Society||48||32||2009|
|Geophysical and Astrophysical Fluid Dynamics||50||46||1999|
|New Astronomy Reviews||51||12||1999|
|Kinematics and Physics of Celestial Bodies||52||65||2009|
|Astronomy and Geophysics||53||67||1996|
|Chinese Astronomy and Astrophysics||54||72||1981|
2 Motivation: The ranking scheme
We implemented the ℓ1-norm SVD scheme on the publicly available SCOPUS dataset to rank all the journals it covers and, at the same time, to assess the potency of the algorithm. The outcome of the ranking raised interesting and compelling questions, which led us to model the growing influence of one particular journal. We describe the method in detail in Appendix A, because the focus of this manuscript is not on the ranking method but on the formulation and interpretation of a model that explains the resulting rank. SCOPUS lists approximately 46,000 journals across different domains. Apart from a few redundancies, SCOPUS covers a wide range of metrics and provides adequate resources for verification. For this demonstration, we used 7 SCOPUS metrics as features in our algorithm: Citation Count, Scholarly Output, SNIP, SJR, Cite Score, Percentile and Percent Cited.
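The ranking scheme itself is detailed in Appendix A and in Aedula et al. (2018). For orientation only, the sketch below shows the general shape of an SVD-based composite ranking under stated assumptions: it standardizes a journals-by-metrics matrix, takes an ordinary (ℓ2) SVD instead of the paper's norm-based variant, and scores each journal by its projection on the leading singular direction. The function name `svd_rank`, the standardization step and the toy numbers are hypothetical.

```python
import numpy as np


def svd_rank(features):
    """Rank journals (rows) by a composite score from an ordinary SVD.

    Illustrative stand-in only: a plain (l2) SVD, not the norm-based SVD
    scheme of Appendix A. Returns 1-based ranks, 1 = highest score.
    """
    X = np.asarray(features, dtype=float)
    # Standardize each metric so that no feature dominates merely by scale.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Singular vectors are unique only up to sign; orient the leading
    # direction so that larger metric values raise the score.
    sign = 1.0 if Vt[0].sum() >= 0 else -1.0
    scores = sign * s[0] * U[:, 0]
    order = np.argsort(-scores)  # row indices by descending score
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


# Hypothetical toy data: 4 journals x 3 metrics (e.g. citations, output, SNIP).
X = [[10, 5, 1.0],
     [20, 9, 2.0],
     [5, 2, 0.5],
     [15, 7, 1.5]]
print(svd_rank(X))  # -> [3 1 4 2]
```

The sign orientation matters: without it, the ranking could come out reversed, since an SVD fixes each singular vector only up to a factor of -1.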
To cross-verify the results of the algorithm, we compared them with the SJR-based ranking of SCIMAGO, which allows the discrepancies to be articulated. The ℓ1-norm SVD scheme appears to work well (Aedula et al. (2018)) in rating the journals, and it approaches the data in a more comprehensive way. The resulting ranking places Astronomy and Computing above most of the older journals while highlighting the niche prominence of this journal. The method likewise highlights the rise of other journals that were underrepresented when only the SCOPUS and SCIMAGO indicators were used, and in this sense it has largely succeeded in rectifying the rank of such journals. Importantly, the ℓ1-norm SVD scheme can be extended to other data as well, for example to study the impact of individual articles. Using similar features such as Total Citation, Self Citation and NLIQ (Ginde et al. (2016)), the algorithm could rank articles within a journal accurately and with holistic coverage. To restate the scope of this research, recall that the common practice (Engemann and Wall (2009)) has been to control for the size of the journal (measured by pages, number of articles or even characters), the age of articles, the age of citations, reference intensity, the exclusion of self-citations, and so on.
More precisely, the ranking scheme raises some important and reasonably challenging questions. Standard scientometric features used to study the influence or reputation of journals are not adequate to explain the rise in influence of Astronomy and Computing (ASCOM). The importance of investigating intrinsic dynamics is rarely stressed in the scientometric literature (Fei et al. (2015)); usually the analysis is static, based on citations and other factors. We intend to bring out the missing dynamics through a model based on delay differential equations (DDEs). This study addresses the following questions. What are the non-quantitative factors (which may be qualitative and difficult to quantify) that explain the rapid growth of this journal? What is the direction of causation, and how do we frame it? Does the big data landscape help? Can we formulate a model that reasonably accounts for such a surge in influence? Are there features or factors that are not statistically significant but play crucial roles as implicit control variables in the phenomenon? The proposed model (Section 4 onward) addresses these questions.
2.1 Knowledge Discovery and the Evolution of ASCOM: Key Motivation for the model
Although Astronomy and Computing (ASCOM) has been in publication for only five years, its reputation has grown quickly, as can be observed from the ranking system proposed here. This is despite the fact that ASCOM is severely handicapped in size. No other journal focuses on the interface of astronomy and computing in the same way as ASCOM. It can be observed from Table 1 that ASCOM is significantly younger than the other journals listed. Unless the number of volumes and issues published is significant, a journal is unlikely to achieve the impact of an established journal. This is a notable handicap for any new journal, and ASCOM is no exception. We call this the “size handicap”.
Despite this “size handicap”, ASCOM is ranked 39 by our method, slightly lower than its SJR-based rank of 31. This is likely because we did not use “citations from more prestigious journals” as a feature (these data are not readily available). Nonetheless, it is ranked higher than many of its peers that have been in publication for over 20 years. We attribute this to ASCOM being “one of its kind”: it is uniquely positioned in the scientific space and steered by appropriate editorial support. However subjective the statement may sound, interdisciplinarity, diversity in the backgrounds of editors and authors, and novelty of theme appear to have been instrumental in positioning journals uniquely (Nakawatase (2017); Rodríguez (2016); Jacobs Rebecca. (2012); Erfanmanesh (2017)). Such qualitative features are, regrettably, not visible from the big data landscape alone. This is another significant reason for framing and interpreting a novel model that explains the trends arising from the big data landscape.
There is another interesting observation. By ignoring the “size does matter” paradigm, our method lowered the ranks of some journals that have been in publication for many years with several volumes and issues. Examples include Living Reviews in Solar Physics, ranked 43 by our scheme but 3 by SJR, and Astronomy and Astrophysics Review, ranked 40 by our scheme but 5 by SJR. This reversal of positions should be considered an important finding, because existing methods do not give appropriate weight to new journals, even when they cater to a niche and important area of research. In other words, the results indicate that years in publication may sometimes dominate other indicators of quality, so that the growth of journals in “short time windows” is not captured. Our study also suggests that ASCOM is indeed a quality journal as far as early promise is concerned.
Scientometrics deals with analyzing and quantifying work in science, technology and innovation, with a focus on quality rather than quantity. Journals are evaluated against several metrics, such as journal impact, citations, and the SJR and SNIP indicators, as well as indicators used in policy and management contexts. Using journal metrics for evaluation involves handling a large volume of data to derive useful patterns and conclusions (Manyika et al.). These metrics play an important role in measuring and evaluating research performance. Because most metrics are susceptible to manipulation and misuse, it becomes essential to evaluate a journal using a single metric or a reduced set of significant metrics. We proposed ℓ1-norm Singular Value Decomposition (ℓ1-SVD) (Aedula et al. (2018)) to solve this problem efficiently. The code of the proposed method is available in the project's GitHub repository (Aedula).
2.2 The Big Data Landscape
The appeal of modern-day computing is its flexibility in handling large volumes of data through coordination and integration. Advances in Big Data frameworks (Apache) and technologies have allowed us to break the memory barriers of computing and to implement methods and algorithms in a more scalable way. The journal ranking scheme described above is one such algorithm that benefits from the improved scalability of Big Data tools. With optimized additions such as Apache Spark to the distributed computing family, regularization and Singular Value Decomposition can now be carried out at a much larger scale. Implementing the SVD algorithm with Spark can improve both memory (spatial) and time (temporal) efficiency. The ℓ1-norm SVD scheme uses the SVD and regularization implementations of the ARPACK and LAPACK libraries together with a cluster setup, which speeds up execution by a factor of at least three, depending on the configuration. Collecting data is also a very important aspect of the Big Data landscape: a cluster-based system is of little use without the data to substantiate it. Scientometric data usually describe properties of journals such as Total Citation and Self-Citation. These data can be collected with web scraping, but most journal ranking organizations, such as SCOPUS and SCIMAGO, also make them available for open use. For the ℓ1-norm SVD scheme we used SCOPUS, as it offers an eclectic set of features that we deemed appropriate for showcasing the effectiveness of the algorithm. Including both the Cite Score and the SJR indicator gave better results than considering either one alone. For more information about the data and code used to develop this algorithm, please refer to Aedula (2018) and the GitHub repository of the project.
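As a single-machine illustration of the ARPACK/LAPACK split mentioned above (not the project's Spark code), the sketch below computes the three largest singular values of a random, hypothetical journals-by-metrics matrix in two ways: with SciPy's ARPACK-backed truncated solver `scipy.sparse.linalg.svds` and with NumPy's LAPACK-backed dense `numpy.linalg.svd`. The two should agree to numerical precision; the truncated solver pays off when only a few singular triplets of a very large matrix are needed.

```python
import numpy as np
from scipy.sparse.linalg import svds


def top_singular_values(X, k):
    """Return the k largest singular values of X from ARPACK and from LAPACK."""
    # Truncated SVD through ARPACK: only k singular triplets are computed.
    _, s_arpack, _ = svds(X, k=k, solver="arpack")
    s_arpack = np.sort(s_arpack)[::-1]  # do not rely on the output order of svds
    # Full dense SVD through LAPACK (NumPy); values come back in descending order.
    s_lapack = np.linalg.svd(X, compute_uv=False)[:k]
    return s_arpack, s_lapack


rng = np.random.default_rng(0)
X = rng.random((200, 7))  # hypothetical: 200 journals x 7 metrics
s_arpack, s_lapack = top_singular_values(X, k=3)
print(np.allclose(s_arpack, s_lapack))  # expected: True
```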
The landscape and the framework, although attractive, are insufficient to explain the rapid growth of ASCOM. The remainder of the paper is organized as follows. We begin by motivating Delay Differential Equation (DDE) based modeling and outlining the key strengths of this modeling concept. In Section 4, our main contribution, we consider a time-reversed DDE that models the growth by including historical effects. Section 5 presents the solutions, investigated analytically and computationally and interpreted in light of the big data landscape. Section 6 modifies the model by adding editorial reputation and publisher goodwill value. We discuss the implications of these additional factors and the fundamental assumptions in the Discussion and Conclusion sections, Sections 7 and 8.
3 Scope of our study and Motivation for modeling via DDE
The manuscript strives to achieve two fundamental objectives:
We establish and quantify current journal influence as a function of its past influence. If the past influence is positive (good inheritance), the present journal influence benefits substantially from it (see Sections 4 and 5 and Figs. 2, 3 and 4).
The manuscript proposes a doctrine of “self-serving incentivization” that exploits implicit control variables (publisher goodwill value and editorial reputation, i.e., the celebrity effect). The so-called “incentivized model” is proposed to propagate a positive “start-up boost” to the journal influence. These control variables and the resulting modifications form a second, more advanced and complex layer in modeling journal influence (see Sections 6 and 7 and Figs. 5 to 10) and help quantify the theory of the “celebrity effect”.
The factors mentioned above and the resulting model explained in the subsequent sections also account for the remarkable growth in the influence of ASCOM discussed in Sections 1 and 2. We achieve this through the DDE-based model presented below.
DDEs have been known for over two centuries and have found application in the dynamical modeling of biomedical systems and biochemical reactions, as well as in newer models of interpersonal and romantic relationships. DDEs are also useful for modeling dynamic population growth, economic growth, and the dynamics of diseases such as HIV and cancer. Delay differential equations belong to the broader class of functional differential equations; like partial differential equations (and unlike ordinary ones), they define infinite-dimensional dynamical systems. The scientific community uses them to model dynamic systems because of several clear advantages. These equations describe the rate of change of a function at time $t$ in terms of its values at earlier times. A DDE with a single constant delay $\tau > 0$ can be written in the general form
$$\frac{dx(t)}{dt} = f\big(t,\, x(t),\, x(t-\tau)\big).$$
Some of the advantages of DDEs are:
DDEs account for “hereditary effects” when modeling a system. If the influence of a journal was positive in the past, or intrinsic factors were responsible for a surge in its reputation, such features are modeled naturally through the delayed terms of a DDE.
In system modeling, it is desirable that the model be close to the real process (in our case, the diffusion and percolation of influence), and DDEs have been observed to offer a better model than alternatives without delay terms.
DDEs provide better control over the system, since historical data are modeled directly into the system (using the time-reversed structure).
For a DDE, the initial condition is a function defined over the interval $[-\tau, 0]$, not just a single point, and the state of the solution at any time $t$ is likewise the function segment on $[t-\tau, t]$. Hence the state space is infinite-dimensional, unlike that of an ODE. Moreover, a DDE models the rate of growth of the system directly, which is a more robust way of looking at the real-world problem than merely reading off and inferring from hereditary events.
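The point that the initial datum of a DDE is a whole function can be made concrete with a short sketch. It integrates the generic linear constant-delay equation $x'(t) = a\,x(t) + b\,x(t-\tau)$ with forward Euler steps, reading the delayed value from stored past values (the method of steps on a uniform grid). The linear right-hand side, the coefficients and the step size are illustrative assumptions; this is a textbook scheme, not the model of Section 4.

```python
import numpy as np


def solve_linear_dde(a, b, tau, history, t_end, dt=1e-3):
    """Forward-Euler sketch for x'(t) = a*x(t) + b*x(t - tau) on [0, t_end].

    `history` is a callable giving x(s) for s in [-tau, 0]: the initial datum
    of a DDE is a whole function, not a single number.
    """
    lag = int(round(tau / dt))  # delay measured in grid steps
    n = int(round(t_end / dt))  # number of steps after t = 0
    # Uniform grid on [-tau, t_end]; the first lag + 1 nodes carry the history.
    t = np.linspace(-lag * dt, n * dt, lag + n + 1)
    x = np.empty_like(t)
    x[: lag + 1] = [history(s) for s in t[: lag + 1]]
    for i in range(lag, lag + n):
        # x[i - lag] is the stored value at the delayed time t_i - tau.
        x[i + 1] = x[i] + dt * (a * x[i] + b * x[i - lag])
    return t[lag:], x[lag:]


# On [0, tau] a constant history x = 1 makes the delayed term constant, so
# x(t) = (1 + b/a) * exp(a t) - b/a there; use it to check the scheme.
a, b, tau = -0.5, 0.3, 1.0
t, x = solve_linear_dde(a, b, tau, history=lambda s: 1.0, t_end=tau, dt=1e-4)
exact = (1 + b / a) * np.exp(a * t[-1]) - b / a
print(abs(x[-1] - exact))  # small: the Euler error is O(dt)
```

Passing a different `history` function changes the whole trajectory even when $x(0)$ is the same, which is exactly why the state of a DDE cannot be summarized by a single number.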
4 TIME REVERSED DDE: Our Contribution
Let $p'(t)$ denote the rate of change of influence over time, $p(t)$ the influence at time $t$, and $p(-t)$ the influence at the mirrored time $-t$ (and so on). The time-reversed equation can now be written as
$$p'(t) = a\,p(t) + b\,p(-t),$$
which implies the rate of change of influence is represented as a combination of present and past influence. Let us consider a simple growth model given as
where  is not the initial condition but the value at an instant of time within the interval under consideration. We represent this linear growth in the form of time-reversed structures as follows:
There are two possibilities, symmetric and non-symmetric influence; here we assume a symmetric influence function, $p(t) = p(-t)$. Differentiating with respect to $t$,
Note: the influence may exhibit linear growth under the assumption that there is a certain repeatability in the journal influence.
4.1 The model under non-symmetric influence:
We now drop the symmetric influence assumption, since it is too strong to begin with (fluctuations are absent, the slope is unidirectional, and elements of uncertainty are almost absent). Let us consider the same model, given as
without the assumption of symmetric influence ($p(t) = p(-t)$). Here again,  is not the initial condition but the value at an instant of time within the interval under consideration. Rearranging Eq. (3),
Assuming  and  are continuous functions, we fix  and . Substituting these into the equation, we obtain
Let  be an integrating factor (Saha (2011)). Multiplying both sides of the equation by  and integrating, we arrive at the following form:
and eventually the expression for journal influence is written as
Under the assumption of non-symmetric influence (the more realistic case), the influence exhibits either exponential growth or exponential decay, depending on the coefficients, but not a combination of both in a single expression. A different picture emerges in the next subsection, where a slightly more complicated time-reversed model produces non-linear growth in influence.
Remark: the above model does not contain “history” functions. Hence its solution is not a linear combination of growing and decaying exponentials, a form that can be readily interpreted in light of historical data. This contrasts with the symmetric case, where we can safely conclude that if either the historic influence or the current influence of the journal is high, the journal is likely to experience a further rise in influence in the near future.
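For reference, the standard integrating-factor solution of a first-order linear equation is recalled below, written with generic coefficient functions $P(t)$ and $Q(t)$; this notation is ours and may differ from that of Eq. (3).

$$p'(t) + P(t)\,p(t) = Q(t), \qquad \mu(t) = e^{\int P(t)\,dt},$$
$$\frac{d}{dt}\big[\mu(t)\,p(t)\big] = \mu(t)\,Q(t) \;\Longrightarrow\; p(t) = \frac{1}{\mu(t)}\left(\int \mu(t)\,Q(t)\,dt + C\right).$$

With constant coefficients $P \equiv -\alpha$ and $Q \equiv 0$, this reduces to $p(t) = C e^{\alpha t}$: pure exponential growth for $\alpha > 0$ or pure decay for $\alpha < 0$, never a mixture of the two, which matches the qualitative behavior described above for the non-symmetric model.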
4.2 Modeling Non-linear growth using symmetric influence effects
Let us consider eq.(1) with the condition by mapping these to the following DDE:
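Consider $\tau = 2t$, so that the delayed term $p(t-\tau)$ becomes the mirrored term $p(-t)$. Our proposed model is therefore a special case of a DDE, and it will be shown later that the equation has at least one solution, which is not necessarily unique.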
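Solution methodology: consider the time-reversed model
$$p'(t) = a\,p(t) + b\,p(-t).$$
By symmetry (replacing $t$ with $-t$), we have
$$p'(-t) = a\,p(-t) + b\,p(t).$$
Differentiating the time-reversed model with respect to $t$, we get
$$p''(t) = a\,p'(t) - b\,p'(-t).$$
Again by symmetry, substituting $p'(-t)$ from above and $b\,p(-t) = p'(t) - a\,p(t)$ from the model,
$$p''(t) = a\,p'(t) - a\big(p'(t) - a\,p(t)\big) - b^2 p(t) = (a^2 - b^2)\,p(t).$$
The solution is of the form
$$p(t) = A e^{rt} + B e^{-rt}, \qquad r = \sqrt{a^2 - b^2}.$$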
It is evident that $p(t)$ is built from exponential functions; $A$ and $B$ are then solved in terms of $a$ and $b$ using the initial conditions.
Depending on the coefficient values, either positive or negative exponents will dominate. The two possible solutions depend on the value of r.
When $r^2 = a^2 - b^2 > 0$, $r$ is real and we can expect an exponential real solution.
When $r^2 = a^2 - b^2 < 0$, $r$ is imaginary and the solutions are oscillatory. These solutions are deemed infeasible here because journal influence shows no fixed periodicity.
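The derivation can be checked numerically, assuming the time-reversed model has the form $p'(t) = a\,p(t) + b\,p(-t)$ with $r^2 = a^2 - b^2$, which is the form consistent with the constants $a$, $b$ and the exponent $r$ used in this section. Substituting $p(t) = A e^{rt} + B e^{-rt}$ back into the model and matching the coefficients of $e^{rt}$ gives $B = (r - a)A/b$, so one constant remains free, in line with the remark that the solution is not necessarily unique. The coefficient values below are hypothetical.

```python
import numpy as np


def regime(a, b):
    """Classify the solution type by the sign of r**2 = a**2 - b**2."""
    d = a**2 - b**2
    if d > 0:
        return "exponential"
    if d < 0:
        return "oscillatory"
    return "degenerate (r = 0)"


def time_reversed_solution(a, b, A):
    """Closed-form p(t) = A e^{rt} + B e^{-rt} for p'(t) = a p(t) + b p(-t).

    Assumes a**2 > b**2 (real r) and b != 0. B is not free: matching the
    coefficients of e^{rt} after substitution gives B = (r - a) A / b.
    """
    r = np.sqrt(a**2 - b**2)
    B = (r - a) * A / b

    def p(t):
        return A * np.exp(r * t) + B * np.exp(-r * t)

    def dp(t):
        return r * A * np.exp(r * t) - r * B * np.exp(-r * t)

    return p, dp


a, b, A = 0.8, 0.3, 1.0  # hypothetical values with a**2 > b**2
p, dp = time_reversed_solution(a, b, A)
t = np.linspace(-3.0, 3.0, 61)
residual = dp(t) - (a * p(t) + b * p(-t))
print(regime(a, b), np.max(np.abs(residual)))  # residual at round-off level
```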
The reference instant is taken to be in the middle of a short time frame in which we measure the influence. Hence this is not treated as an initial value problem, and a unique solution is not guaranteed. The history term $p(-t)$ is the mirror image of $p(t)$, and it produces a sharp spike in influence when its value is high. This is typically observed in a short time window and averages out over a longer time span. Depending on the values of the parameters $a$ and $b$, either the historical or the current data dominate. The curve shows that in the first few years the influence is largely dominated by the past reputation of the editors, represented by the historical part of the DDE. After a certain point (which we assume to be at the center of the time-series data), other parameters, such as the current journal citations and the current reputation of the editors, begin to be reflected in the influence.
5 Model Fitting:
Let us recall Eq.:
We also use the finite-difference approximation
where is the step size. Let us consider the spread at discrete time intervals corresponding to one to five years (obtained from the dataset), indicated as respectively. Here, we can write . Also,
The value on the left-hand side is obtained from the dataset. Similarly, we can compute
etc., which can be obtained from the dataset, where the fractions represent the quarters of a year. We now need to estimate the coefficients $a$, $b$, $A$ and $B$. This is an overdetermined problem, with more equations than unknowns. We solve it by the method of least squares and use the solution to predict the future rate of spread of journal influence.
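A minimal sketch of this fitting step, assuming the model $p'(t) = a\,p(t) + b\,p(-t)$: each observation time contributes one equation $p'(t_i) \approx a\,p(t_i) + b\,p(-t_i)$, and the resulting overdetermined system is solved with NumPy's least-squares routine `numpy.linalg.lstsq`. The data are synthetic, generated from an exact solution of the model ($A = 1$, $B = (r-a)/b$, $r = \sqrt{a^2-b^2}$), and the derivative uses a central difference, which is one choice among several; the helper name and all numbers are hypothetical.

```python
import numpy as np


def fit_time_reversed(p, t, h):
    """Least-squares estimate of (a, b) in p'(t) = a p(t) + b p(-t).

    `p` is a callable giving influence at arbitrary times (in practice a
    hypothetical interpolant of the data); p' is approximated by a central
    difference with step h.
    """
    t = np.asarray(t, dtype=float)
    dp = (p(t + h) - p(t - h)) / (2.0 * h)   # finite-difference p'(t_i)
    design = np.column_stack([p(t), p(-t)])  # one row per time: [p(t_i), p(-t_i)]
    (a, b), *_ = np.linalg.lstsq(design, dp, rcond=None)
    return a, b


# Synthetic influence curve built from the exact solution with a = 0.8, b = 0.3.
a_true, b_true = 0.8, 0.3
r = np.sqrt(a_true**2 - b_true**2)
B = (r - a_true) / b_true  # with A = 1


def p(t):
    return np.exp(r * t) + B * np.exp(-r * t)


t_obs = np.arange(1, 21) / 4.0  # quarterly times over five years (20 equations)
a_hat, b_hat = fit_time_reversed(p, t_obs, h=1e-3)
print(a_hat, b_hat)  # close to 0.8 and 0.3
```

With real data the derivative would come from the quarterly differences described above rather than from a callable, which adds discretization error of order the step size.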
5.1 Least Square Method to fit the data:
From eq. (1), we obtain
Let . Therefore, eq. (1) becomes
Differentiating w.r.t a,
Differentiating w.r.t b,
On solving Eq. (11) and Eq. (12), we obtain the values of $a$ and $b$.
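Assuming the model $p'(t) = a\,p(t) + b\,p(-t)$ and writing $p_i = p(t_i)$, $q_i = p(-t_i)$ and $d_i$ for the finite-difference estimate of $p'(t_i)$ (our notation, which may differ from that of Eqs. (11) and (12)), minimizing the sum of squared residuals leads to the normal equations:

$$E(a,b) = \sum_i \big(d_i - a\,p_i - b\,q_i\big)^2,$$
$$\frac{\partial E}{\partial a} = 0 \;\Longrightarrow\; a\sum_i p_i^2 + b\sum_i p_i q_i = \sum_i d_i\,p_i,$$
$$\frac{\partial E}{\partial b} = 0 \;\Longrightarrow\; a\sum_i p_i q_i + b\sum_i q_i^2 = \sum_i d_i\,q_i.$$

This 2 × 2 linear system has a unique solution whenever the vectors $(p_i)$ and $(q_i)$ are not proportional, since its determinant $\sum_i p_i^2 \sum_i q_i^2 - \big(\sum_i p_i q_i\big)^2$ is then positive by the Cauchy-Schwarz inequality.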
ESTIMATING ’A’ and ’B’:
We have found that
On solving Eq. (13) and Eq. (14), we obtain the values of w1 and w2, and hence the values of $A$ and $B$. We present the algorithm below.
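Once $a$ and $b$, and hence $r = \sqrt{a^2 - b^2}$, are known, one simple way to obtain $A$ and $B$ (possibly different from the route through w1 and w2 in Eqs. (13) and (14)) is a second linear least-squares fit of $p(t_i) \approx A e^{r t_i} + B e^{-r t_i}$. The sketch assumes $a^2 > b^2$; the helper name and the synthetic observations are hypothetical.

```python
import numpy as np


def fit_amplitudes(t, p_obs, r):
    """Least-squares estimate of (A, B) in p(t) = A e^{rt} + B e^{-rt} for known r."""
    t = np.asarray(t, dtype=float)
    # Two basis columns: the growing and the decaying exponential.
    basis = np.column_stack([np.exp(r * t), np.exp(-r * t)])
    (A, B), *_ = np.linalg.lstsq(basis, np.asarray(p_obs, dtype=float), rcond=None)
    return A, B


# Hypothetical check: sample a known curve and recover its amplitudes.
a, b = 0.8, 0.3
r = np.sqrt(a**2 - b**2)
A_true, B_true = 1.0, (r - a) / b
t_obs = np.arange(1, 21) / 4.0
p_obs = A_true * np.exp(r * t_obs) + B_true * np.exp(-r * t_obs)
A_hat, B_hat = fit_amplitudes(t_obs, p_obs, r)
print(A_hat, B_hat)
```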
6 Model Modification to accommodate implicit control variables
Additionally, we consider implicit control variables that play important roles in the growth of any journal. These variables pose challenges to the modeling setup, but without them the scope is limited to small-scale empirical verification. The next step is to modify this structure to accommodate implicit control parameters such as publisher goodwill value and the “start-up initiative” of the editors (editorial reputation). We define this initiative as the reputation of the editors who steered the journal and offered a strong attraction for quality submissions from scholars across the globe. It is realistic to hypothesize that reputed scholars acting as editors add value and credibility to an emerging journal. This value, however, is extremely hard to quantify, so modeling such a phenomenon is both novel and necessary for understanding the journal’s growth pattern. We present the model and its analytical solution, repeat the exercise of Sections 3 and 4, and discuss the implications of the proposed modification.
The Time Reversed equation with the additive influence term (Publisher goodwill value) can now be re-written as
where  is an additive term representing the goodwill of the publishing house (Elsevier, in our case), while  represents the editors’ reputation.
6.1 Additional Considerations
Let us assume publisher goodwill to be either linear or exponential. Such considerations are justified since any reputed publisher, in order to remain competitive, would strive to enhance its goodwill; thus, goodwill cannot be a constant.
We next address the quantification of goodwill. We model it as a function of the percentage of accepted papers over time, reflecting a trend in which a fixed number of articles is accepted and the selection criteria for additional papers become increasingly stringent. It is modeled as
where  is the percentage of articles accepted after the initial threshold of articles, and  is the initial threshold, set conveniently to ensure that the influence does not become negative.
Thus, the acceptance percentage is a control variable in the formulation and explanation of publisher goodwill: over time, the percentage of accepted articles diminishes, and such stringent peer-review measures bolster publisher goodwill.
With this formulation in place, we now integrate it into the modified model.
Editorial reputation may take any of three forms: a constant function, linear growth or exponential growth. The first is the most likely, since the editors of ASCOM are well established in their fields; it is therefore unlikely that their influence is still growing at a quadratic rate or faster. In fact, we have observed that their influence pattern (citations) is steady. Nonetheless, we consider all three possibilities and discuss the implications after integrating editorial reputation (a function of time) into the model.
6.2 Temporal evolution of publisher goodwill value
Figures 5(a) and 5(b) offer some useful insights. We hypothesize that the linear graph (Fig. 5(b)) is a subset of the non-linear one (Fig. 5(a)). Fig. 5(b), a time-series plot of publisher goodwill value, is linear when fitted to the ASCOM data. Fig. 5(a) is an extended time-window plot for the same journal, obtained by simulating the data available for 5 years out to 10 years. Taking the 5-year time slice from Fig. 5(a) reproduces Fig. 5(b). This supports the hypothesis that the available data are insufficient to understand and predict longer-term average behavior.
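The hypothesis that a linear 5-year trend can be a slice of a non-linear longer-term curve is easy to illustrate. The sketch below uses a hypothetical exponential goodwill curve (the growth rate is an assumption, not a value fitted to ASCOM) and compares how well a straight line fits its first 5 years and its full 10 years, using the coefficient of determination $R^2$.

```python
import numpy as np


def linear_fit_r2(t, y):
    """Coefficient of determination of an ordinary least-squares line y ~ c0 + c1*t."""
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)


k = 0.2                               # hypothetical goodwill growth rate per year
t_long = np.linspace(0.0, 10.0, 41)   # simulated 10-year window (quarterly grid)
g_long = np.exp(k * t_long)           # hypothetical non-linear goodwill curve
short = t_long <= 5.0                 # the 5-year slice actually observed
r2_short = linear_fit_r2(t_long[short], g_long[short])
r2_long = linear_fit_r2(t_long, g_long)
print(f"R^2 over 5 years: {r2_short:.4f}, over 10 years: {r2_long:.4f}")
```

Because the curvature of an exponential over a window grows with the window length, the straight line fits the 5-year slice better than the full 10-year window, so a good linear fit to short data says little about long-run behavior.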
This synthetic experiment implies that if the good work of the past continues (good inheritance, in terms of the positive influence of the implicit control variable, publisher goodwill), goodwill will continue to grow in a non-linear fashion. The observation is consistent with the practices of the publisher in question, Elsevier, which pursues aggressive and stringent quality practices toward the larger goal of a monopoly in the publishing business. We note that the non-linear, time-dependent trend will influence the journal's overall growth in influence to a greater extent than the time-independent model we assumed in Eq. (16). We draw such inferences by treating the goodwill value as a time series and re-solving the equation with a goodwill model fitted to the time-series data, as shown in the ensuing discussion and in Figs. 6, 7 and 8. Let us now consider Eq. (16). Adding the publisher's goodwill as a function of time, we obtain the equation
Solving the above equation along the lines outlined in Appendix C, we obtain expressions for journal influence in three cases, with editorial reputation constant, linear or exponential, and with publisher goodwill a function of time rather than a function of accepted articles as discussed earlier.
CASE 1 (Fig. 6): Let us assume that
CASE 2 (Fig. 7): Let us assume that editorial reputation is linear, of the form At + B.
CASE 3 (Fig. 8): Let us assume that editorial reputation is exponential:
These plots (Figures 5, 6, 7 and 8) clearly demonstrate that when publisher goodwill value is modeled as evolving in time, the influence of the journal grows at a faster pace in the longer run. This complements our observation that publisher goodwill value plays a small role in the growth of journal influence over a short time span, but its role grows gradually as time elapses.
6.3 Temporal evolution of Editorial reputation
We observe the celebrity effect here (Fei et al. (2015)). The editors are well-established scholars, and by the time they assumed editorial responsibility they were in a “cool-off state”, meaning that the surge in reputation they experienced as rising stars had stabilized. Therefore, a steep gradient is no longer expected. This is what we observe in Fig. 9, which plots the editorial influence between 2004 and 2014. Note that ASCOM was founded in 2013; the influence trend of all the editors during that time (2010-14) is approximately constant.
The next section discusses the contributions of these variables in particular, and of the model modification in general, to the rate of change in influence observed for ASCOM. The role of the control variables is evident in the visualizations presented below.
We developed a model to study the influence of ASCOM in the astronomy and computer science domains and analyzed the parameters that have contributed to building its reputation. In this case study of journal influence, the spread clearly depends on present as well as history-dependent functions, which strengthens the motivation for using a DDE model. The model explains the growth pattern of the journal well by capturing its intrinsic attributes and historical data. The time-reversed model works as a mirror and helps carry over the good deeds of the past (quality of articles in niche areas and open problems solved by interdisciplinary efforts, reflected in the citation history). Our model accounts for “hereditary effects”, and since the phenomenon of observing a journal in an emerging, interdisciplinary area is modeled as a function of spatial variables, the system has infinitely many degrees of freedom. Thus, the proposed model is robust and provides better control over the system.
However, the data are limited, since the journal has been in publication for just over five years. Therefore, the influence of historical data does not translate into overwhelming quantitative evidence in the way we would have liked. Nonetheless, if we extend the time window under consideration (since the historical data are assumed to influence the present), we observe profound effects (see the discussion of the temporal evolution of publisher goodwill value in Section 6.2 and Figs. 5(a) and 5(b), where the observed linear growth in goodwill is a 5-year snapshot of a longer window). Additionally, we considered implicit control variables, such as editorial reputation and publisher goodwill, that play important roles in the growth of any journal. These variables pose challenges to the modeling setup, but without them the scope is limited to small-scale empirical verification. We observe the following.
The graphs of $p(t)$ versus time, with $p(t)$ on the y-axis and time on the x-axis, have a parabolic shape. This shape is due to the presence of the symmetric history function $p(-t)$.
From the results of this study, it can be established that journal citations vary in a non-linear fashion. Initially, the citation score is usually low because the journal is not yet popular. This can be seen by comparing the editors' citations over time with the journal's influence over time.
Fig. 9 shows that the rate of change in influence is more or less constant over time. There is an initial irregularity, since the initial change in influence is directly related to the current influence. With time, however, the graph smooths out, because other parameters such as the citations and readership of the journal also begin to affect the rate of change of influence. This indicates a consistent and largely positive rate of change in influence, a hypothesis verified by the graphs discussed below. First, we plot the journal influence over time without considering publisher goodwill (as time-series data) and editorial influence. The curve is a simple parabola, which confirms our assumption that the influence has a global minimum and increases away from it.
After adding publisher goodwill, which we assumed to be exponential in form, the plot shows a small change in shape. Since publisher goodwill is a constant plus a negative exponential term, it can be added conveniently, as the solution terms are also exponential. This shows that publisher goodwill plays a small but non-negligible role in governing the journal's growth in influence. However, in line with our argument in Section 6, this small change will translate into a larger increment if the time window is expanded, i.e., goodwill will contribute more as time elapses. Plotting influence versus time for different editorial influence functions together with publisher goodwill, we obtain the graphs in Fig. 10(a), (b) and (c).
Figure 10: Journal influence versus time for different forms of the editorial influence function
Note, when . If this condition is not satisfied the values of the influence becomes negative for initial values of time which is not possible. The model ensures that. When the editorial influence is exponential it gets added to the positive part of the delay differential equation.
We computed the goodwill value for every year. As expected, the value changes over time, so it can be interpreted as time series data, fitted as a function of time and used in the modified equation as a time-dependent function. With this modification, we re-solved the DDE and noted changes in the trend: the gradient (rate of change in influence) is greater than in the earlier model, in which goodwill was treated as a function of the percentage of accepted articles.
The relative magnitudes of the two constants determine whether the recent or the historical part of the equation dominates the curve.
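To illustrate how the weights on the recent and historical parts shape the curve, the following is a minimal sketch that integrates a generic linear DDE, x'(t) = a x(t) + b x(t - tau), with forward Euler. The equation form, the names a, b and tau, the constant history and the step size are all assumptions made for illustration; this is not the fitted influence model of this paper.

```python
import numpy as np


def simulate_dde(a, b, tau, history=1.0, t_end=10.0, dt=0.01):
    """Forward-Euler integration of x'(t) = a*x(t) + b*x(t - tau).

    Assumes a constant history x(t) = history for t <= 0 and that tau is
    a multiple of dt, so the delayed value falls exactly on the time grid.
    """
    lag = int(round(tau / dt))           # number of grid steps in one delay
    n_steps = int(round(t_end / dt))
    x = np.empty(lag + n_steps + 1)
    x[: lag + 1] = history               # x on the interval [-tau, 0]
    for i in range(lag, lag + n_steps):
        recent = a * x[i]                # "recent" (current-state) term
        historic = b * x[i - lag]        # "historic" (delayed) term
        x[i + 1] = x[i] + dt * (recent + historic)
    t = np.linspace(-tau, t_end, lag + n_steps + 1)
    return t, x


# Same total a + b, but the weight shifts between the two terms.
t, x_recent = simulate_dde(a=0.30, b=0.05, tau=1.0)
_, x_historic = simulate_dde(a=0.05, b=0.30, tau=1.0)
print(x_recent[-1] > x_historic[-1])  # True
```

With a + b held fixed, the run dominated by the recent term grows faster, because the current state already includes the growth that the delayed term only sees after a lag of tau.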
Effect of implicit control variables: editorial reputation and publisher goodwill value are indeed positive factors. Comparing the figures (figs. 2, 6, 7 and 10), we observe a non-zero start-up boost in figs. 6, 7, 8 and 10. By contrast, when we investigated the original DDE (fig. 2) without these implicit variables, the evolution of influence begins from 0. This positive boost (the graphs in figs. 8, 9 and 10 start from a value above 0) is due to the control variables, which are otherwise hard to quantify and model. Our arguments in favor of including these variables in the influence model therefore stand vindicated.
The model suitably explains the growth pattern of the journal by capturing its intrinsic attributes and historical data. We observe that celebrity authorship, in the role of editors, contributes to the growth in influence of ASCOM, as does the goodwill that an established publishing house brings. These effects, dynamic in nature, have not been studied before. The contribution of the manuscript is therefore two-fold. First, a novel DDE model is exploited to study the influence of a journal in an emerging area. Second, qualitative and dynamic control variables (editorial reputation and publisher goodwill value), hitherto unexploited for the simple reason of their complexity, have been quantified and integrated into the model. The time-reversed model works as a mirror and helps carry over the achievements of the past (quality of articles in niche areas and open problems solved by interdisciplinary efforts).
As a final note, our model takes care of the “hereditary effects” through the delayed term of the equation. The phenomenon of observing a journal in an emerging and interdisciplinary area is modeled as a function of spatial variables, which gives the system infinitely many degrees of freedom. The proposed model is therefore robust and provides better control over the system. Note, however, that data are limited given the journal's young age, so the influence of historical data does not translate into overwhelming quantitative evidence. The model also holds promise because of its control structure and its ability to accommodate implicit control variables: editorial reputation and publisher goodwill value have been found to have significant implications overall. We also establish the no less remarkable fact that these implicit control variables act as incentives for a new journal in an emerging area.
This is a much-needed boost for the journal; without it, the journal might have to struggle much harder to attract readership and scholarship. We conclude by noting that, unlike most scholarly work in the scientometric literature, which consists of post-facto statistical studies, our work investigates the background responsible for certain trends observed in a journal in a niche area. This is where the manuscript differs markedly.
We would like to thank the Science and Engineering Research Board (SERB)-Department of Science and Technology (DST), Government of India, for supporting our research. The project reference number is SERB-EMR/2016/005687.
- Aedula, R. (n.d.). Retrieved 2018-04-24 from https://github.com/rahul-aedula95/L1_Norm
- Aedula, R. (2018). Github repository.
- Aedula, R., Madhukumar, Y., Saha, S., Mathur, A., Bora, K. & Agrawal, S. (2018). L1 norm SVD based ranking scheme: A novel method in big data mining. https://doi.org/10.13140/rg.2.2.23721.29280
- Bach, F., Jenatton, R., Mairal, J. & Obozinski, G. (2012, November). Structured sparsity through convex optimization. Statistical Science, 27(4), 450–468. https://doi.org/10.1214/12-sts394
- Bora, K., Saha, S., Agrawal, S., Safonova, M., Routh, S. & Narasimhamurthy, A. (2016, October). CD-HPF: New habitability score via data analytic modeling. Astronomy and Computing, 17, 129–143. https://doi.org/10.1016/j.ascom.2016.08.001
- Engemann, K. M. & Wall, H. J. (2009). A journal ranking for the ambitious economist. Review, May, 127–140. https://ideas.repec.org/a/fip/fedlrv/y2009imayp127-140nv.91no.3.html
- Erfanmanesh, A. (2017, June). Composition of journals’ editorial board members as an indicator of the interdisciplinarity: The case of Iranian journals in social sciences and humanities. Library and Information Science, 19(1), 81–106.
- Fei, Q., Chong, H. G. & Bell, R. (2015, March). The diminishing influence of celebrity authors in a diversified world of accounting journals. 15, 37–57.
- Ginde, G., Aedula, R., Saha, S., Mathur, A., Dey, S. R., Sampatrao, S. & Sagar, B. S. (2017). Big data acquisition, preparation and analysis using Apache Software Foundation projects. In A. K. Somani & G. C. Deka (Eds.), Big data analytics tools and technology for effective planning (ch. 9). CRC Press.
- Ginde, G., Saha, S., Mathur, A., Venkatagiri, S., Vadakkepat, S., Narasimhamurthy, A. & Daya Sagar, B. S. (2016, September 1). ScientoBASE: A framework and model for computing scholastic indicators of non-local influence of journals via native data acquisition algorithms. Scientometrics, 108(3), 1479–1529. https://doi.org/10.1007/s11192-016-2006-2
- González-Pereira, B., Bote, V. P. G. & de Moya Anegón, F. (2009). The SJR indicator: A new indicator of journals’ scientific prestige. CoRR, abs/0912.4141. http://arxiv.org/abs/0912.4141
- Jacobs, A. & Rebecca, H. (2012, August). Interdisciplinarity in recently founded academic journals. Scholarly Common Repository.
- Jangid, N., Saha, S., Gupta, S. & Rao, J. M. (2014). Ranking of journals in science and technology domain: A novel and computationally lightweight approach. IERI Procedia, 10, 57–62. International Conference on Future Information Engineering (FIE 2014). http://www.sciencedirect.com/science/article/pii/S2212667814001397 https://doi.org/10.1016/j.ieri.2014.09.091
- Kalman, D. (1996). A singularly valuable decomposition: The SVD of a matrix. College Math Journal, 27, 2–23.
- Kianifar, H., Sadeghi, R. & Zarifmahmoudi, L. (2014). Comparison between impact factor, eigenfactor metrics, and SCimago journal rank indicator of pediatric neurology journals. Acta Informatica Medica, 22(2), 103–106. https://doi.org/10.5455/aim.2014.22.103-106
- Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. & Byers, A. H. (n.d.). Big data: The next frontier for innovation, competition, and productivity. Retrieved 2018-04-24 from https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
- The present state analysis of emerging interdisciplinary research areas: An analysis based on a classified table of Grant-in-Aid for Scientific Research. Joho Chishiki Gakkaishi, 27(1), 1–5. https://doi.org/10.2964/jsik_2017_007
- Rodríguez, J. M. (2016, November). Disciplinarity and interdisciplinarity in citation and reference dimensions: Knowledge importation and exportation taxonomy of journals. Scientometrics, 110(2), 617–642. https://doi.org/10.1007/s11192-016-2190-0
- Saha, S. (2011). Ordinary differential equations: A structured approach. Cognella. https://www.amazon.com/Ordinary-Differential-Equations-Structured-Approach/dp/160927704X?SubscriptionId=0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=160927704X
Appendix A First Appendix
a.1 Singular Value Decomposition
Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. Large-scale scientometric data are mined using suitable web-scraping techniques and modeled as a matrix in which the rows represent the articles published in a journal over the years and the columns represent the various scientometric indicators proposed by experts of evaluation agencies (Kalman (1996)). The original data matrix A, of dimension m×n and rank k, is factorized into three matrices U, V and W^H, with $A \approx U V W^H$.
- U: matrix of left singular vectors, of dimension m×r.
- V: diagonal matrix of dimension r×r containing the singular values in decreasing order along the diagonal.
- W^H: conjugate (Hermitian) transpose of W, the matrix of right singular vectors of dimension n×r. Taking the conjugate transpose changes the dimension to r×n, so the product U V W^H has the original dimension m×n. Since scientometric data are represented as a real matrix, the Hermitian transpose is simply the transpose of W.
r is typically a small number representing the approximate rank of the matrix, i.e. the number of “concepts” in the data matrix A. Concepts are latent dimensions or latent factors that show the association between the singular values and the individual components (Kalman (1996)). The choice of r plays a vital role in deciding the accuracy and computation time of the decomposition. If r equals k, the SVD is a full-rank decomposition of A. A truncated SVD, or reduced-rank approximation of A, is obtained by setting all but the r largest singular values to zero and using the first r columns of U and W (Ginde et al. (2017)).
Therefore, choosing r closer to k gives a more accurate approximation, whereas a smaller r saves computation time and increases efficiency.
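The accuracy/cost trade-off in r can be seen in a minimal NumPy sketch of the reduced-rank approximation. The data matrix here is random and purely hypothetical; NumPy returns the factors as (U, s, Vh), which correspond to U, the diagonal of V, and W^H in the notation above.

```python
import numpy as np


def truncated_svd(A, r):
    """Reduced-rank approximation A_r = U_r V_r W_r^H in this appendix's notation.

    NumPy returns (U, s, Wh): s holds the diagonal of V in decreasing order
    and Wh is W^H, so only the first r singular triplets are kept.
    """
    U, s, Wh = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Wh[:r, :]


rng = np.random.default_rng(0)
A = rng.random((6, 4))               # hypothetical: 6 articles x 4 indicators
k = np.linalg.matrix_rank(A)
errors = [np.linalg.norm(A - truncated_svd(A, r)) for r in range(1, k + 1)]
print(k, [round(float(e), 4) for e in errors])
```

By the Eckart–Young theorem, the Frobenius error of the rank-r truncation equals the square root of the sum of the squared discarded singular values, so the error shrinks as r approaches k and vanishes (up to rounding) at r = k.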
a.2 Regularization Norms
In the case of Big Data, parsimony is central to variable and feature selection, which makes the data model more intelligible and less expensive in terms of processing.
The lp-norm of a matrix or vector x, written $\|x\|_p$, is defined as
$$\|x\|_p = \Big(\sum_i |x_i|^p\Big)^{1/p},$$
i.e. the p-th root of the sum of the absolute values of all the elements raised to the power p. Hence, by definition, the l1 norm is $\|x\|_1 = \sum_i |x_i|$.
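The definition translates directly into code. This is a small illustrative helper (the name lp_norm is hypothetical); for matrices it treats the entries as one long vector, matching the entrywise reading of the definition.

```python
import numpy as np


def lp_norm(x, p):
    """lp norm: p-th root of the sum of |x_i|**p (entrywise for a matrix)."""
    x = np.abs(np.asarray(x, dtype=float)).ravel()
    return np.sum(x ** p) ** (1.0 / p)


x = [3.0, -4.0]
print(lp_norm(x, 1))   # 7.0
print(lp_norm(x, 2))   # 5.0
```

For vectors the result agrees with `np.linalg.norm(x, p)`; the explicit version simply makes the "p-th root of the sum" visible.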
Sparse approximation, which induces structural sparsity as well as regularization, is achieved through a number of norms, the most common being the l1 norm and the mixed group l1/lq norm. In l1 regularization each variable is selected individually, so the relative structure and position of the variables in the input vector, and hence the inter-relationships between them, are inconsequential. Prior knowledge helps improve the efficacy of estimation with these techniques.
The l1 norm corresponds only to the cardinality constraint and is unaware of any other information available about the patterns of non-zero coefficients (Bach et al. (2012)).
a.3 Sparsity via the l1 norm
Most variable or feature selection problems are posed as combinatorial optimization problems, in which the optimal solution is selected from a discrete, finite set of feasible solutions. The l1 norm turns these problems into convex problems by replacing the non-convex cardinality constraint; this is known as convex relaxation. A convex problem is one in which the constraints are convex and the objective function is convex when minimizing (or concave when maximizing).
l1 regularization for sparsity in supervised learning involves predicting a vector y from a set of (usually reduced) observations forming a vector x of the original data matrix. This mapping function is often called the hypothesis, $h : x \mapsto y$. To achieve this, we assume there exists a joint probability distribution P(x, y) over x and y, which helps us model anomalies such as noise in the predictions.
In addition, a loss function L(y', y) is required to measure the difference between the prediction y' = h(x) and the true result y. A quantity called the risk, R(h), associated with the loss function, and hence with the hypothesis h(x), is defined as the expectation of the loss:
$$R(h) = \mathbb{E}_{(x,y)\sim P}\big[L(h(x), y)\big] = \int L(h(x), y)\, dP(x, y).$$
The hypothesis chosen for the mapping should therefore minimize the risk R(h); this is known as risk minimization. In most cases, however, the joint probability distribution P(x, y) of the problem at hand is not known, so an approximation called the empirical risk is computed by averaging the loss over all the observations:
$$R_{\mathrm{emp}}(h) = \frac{1}{p}\sum_{i=1}^{p} L\big(h(x_i), y_i\big).$$
The empirical risk minimization principle states that the selected hypothesis h' must minimize the empirical risk over the hypothesis class $\mathcal{H}$:
$$h' = \arg\min_{h \in \mathcal{H}} R_{\mathrm{emp}}(h).$$
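Empirical risk minimization can be made concrete with a toy example: average the loss of each candidate hypothesis over the observed pairs and keep the smallest. The data, the three linear candidates and the half-squared loss below are hypothetical choices for illustration only.

```python
import numpy as np


def empirical_risk(h, X, y, loss):
    """Average loss of hypothesis h over the observed pairs (x_i, y_i)."""
    return float(np.mean([loss(h(x_i), y_i) for x_i, y_i in zip(X, y)]))


def square_loss(y_pred, y_true):
    return 0.5 * (y_true - y_pred) ** 2


# Hypothetical toy data and a small, finite set of candidate hypotheses.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.1, 5.9])
hypotheses = {f"slope={w}": (lambda x, w=w: w * x) for w in (1.0, 2.0, 3.0)}

risks = {name: empirical_risk(h, X, y, square_loss) for name, h in hypotheses.items()}
best = min(risks, key=risks.get)   # ERM: smallest empirical risk wins
print(best)   # slope=2.0
```

In practice the hypothesis class is continuous (e.g. all weight vectors w), so the arg-min is found by optimization rather than enumeration.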
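When mapping observations $x_i \in \mathbb{R}^n$ to outputs $y_i$, we consider p pairs of data points $(x_i, y_i)$, where i = 1, 2, …, p. For linear hypotheses $h(x) = w^\top x$, the optimization problem for the scientometric data matrix thus takes the form:
$$\min_{w \in \mathbb{R}^n} \frac{1}{p}\sum_{i=1}^{p} L\big(w^\top x_i, y_i\big).$$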
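L is a loss function, which can be either the square loss for least-squares regression, $L(y', y) = \frac{1}{2}(y - y')^2$, or the logistic loss, $L(y', y) = \log\big(1 + e^{-y y'}\big)$ for $y \in \{-1, +1\}$. Adding a regularization term $\Omega$ weighted by $\lambda > 0$, the problem takes the form:
$$\min_{w \in \mathbb{R}^n} \frac{1}{p}\sum_{i=1}^{p} L\big(w^\top x_i, y_i\big) + \lambda\,\Omega(w).$$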
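For the simplest sparsity-inducing choice, the square loss with an l1 penalty, the problem minimises (1/(2p))·||y − Xw||² + λ·||w||₁ (the lasso). The sketch below solves it with ISTA (proximal gradient descent), a standard solver swapped in for illustration; it is not necessarily the solver used in this work, and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/(2p)) * ||y - X w||_2^2 + lam * ||w||_1 with ISTA."""
    p = X.shape[0]                              # number of (x_i, y_i) pairs
    step = p / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / p            # gradient of the square-loss term
        w = soft_threshold(w - step * grad, step * lam)
    return w


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))               # hypothetical: 50 observations, 10 features
w_true = np.zeros(10)
w_true[[0, 3]] = [2.0, -1.5]
y = X @ w_true
w_hat = lasso_ista(X, y, lam=0.1)
print(np.round(w_hat, 2))
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is the sparsity the l1 norm is meant to induce; the surviving coefficients are shrunk slightly toward zero by roughly λ.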
Since the variables can be organized into groups, which may overlap, it is ideal to choose $\Omega$ to be a group norm for better predictive performance and structure. The m rows of the data matrix A are treated as vectors, or groups g, of these variables, forming a partition of the index set [1:n]. If G is the set of all these groups, $w_g$ is the sub-vector of w restricted to the indices in g, and $d_g$ is a scalar weight indexed by each group g, the norm is an l1/lq norm (Bach et al. (2012)):
$$\Omega(w) = \sum_{g \in G} d_g\, \|w_g\|_q.$$
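The l1/lq group norm is straightforward to evaluate. In this hypothetical example the groups partition five indices (0-based), and the weights use d_g = sqrt(|g|), a common size-based choice (as in the group lasso) assumed here for illustration; with unit weights and q = 1 the group norm reduces to the ordinary l1 norm.

```python
import numpy as np


def group_norm(w, groups, weights, q=2):
    """Mixed l1/lq norm: sum over groups g of d_g * ||w_g||_q."""
    w = np.asarray(w, dtype=float)
    return sum(d * np.linalg.norm(w[list(g)], ord=q) for g, d in zip(groups, weights))


w = np.array([3.0, 4.0, 0.0, 0.0, 1.0])
groups = [(0, 1), (2, 3), (4,)]                  # hypothetical partition (0-based indices)
weights = [np.sqrt(len(g)) for g in groups]      # d_g = sqrt(|g|)
print(group_norm(w, groups, weights))            # 5*sqrt(2) + 0 + 1
```

Penalizing each group by its lq norm makes whole groups vanish together: the second group contributes nothing because all of its coefficients are zero.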
The choice of the weights $d_g$ is critical because they account for discrepancies in size between the groups. They must also compensate for the penalization of parameters, which can increase under high-dimensional scaling. The factors that affect this choice are the value of q in the group norm and the consistency expected of the result. In addition, accuracy and efficiency can be enhanced by weighting each coefficient within a group rather than the group as a whole. The initial sparse data matrix is first transformed using the l1 norm (Bach et al. (2012)).
An estimate of a journal’s scholastic indices is necessary to judge its effective impact. Scientometric factors such as Total Citation Count and Self-citation Count come into play when deciding the impact of a journal. However, unless considered under ideal circumstances, these factors are not by themselves good indicators of a journal's importance: many anomalies arise when they are used directly, which may misrepresent or falsify a journal’s true influence. It is therefore necessary to use these indices within a ranking algorithm. Applying the l1-norm transformation yields a row matrix whose length equals the number of features of the pristine scientometric data; this row matrix effectively represents the entire dataset at any given iteration. Applying the Singular Value Decomposition to this row matrix is key to determining, through a recursive approach, which norm values to remove.
The resulting array contains the normalized singular values of all the individual l1-norm-transformed columns. These values act as scores when assessing the impact of a given journal; in terms of singular values, the journal with the lowest score is the most influential.
Appendix B Lemma : The equation , where and has a unique solution in .
Proof: Let us consider two solutions to this equation, , . Then,
Therefore, the DDE has a unique solution, which implies that if an oscillatory solution exists, there cannot be an exponential or linear family of solutions depending on the parameters.
Appendix C DDE solution
Let us consider a non-linear homogeneous DDE: The solution for this equation depends on the value of . Auxiliary equation is:
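As an illustration of how an auxiliary equation is solved for a delay equation, assume, for illustration only, the scalar linear form x'(t) = a x(t) + b x(t − τ). Trying x(t) = e^{λt} gives λ = a + b e^{−λτ}, which has infinitely many roots expressible through the branches of the Lambert W function. The parameter values below are hypothetical; the sketch uses `scipy.special.lambertw`.

```python
import numpy as np
from scipy.special import lambertw


def characteristic_root(a, b, tau, branch=0):
    """Root of the auxiliary equation lam = a + b*exp(-lam*tau).

    With mu = lam - a, the equation becomes (mu*tau)*exp(mu*tau) = b*tau*exp(-a*tau),
    so mu*tau = W_k(b*tau*exp(-a*tau)) for each branch k of the Lambert W function.
    """
    return a + lambertw(b * tau * np.exp(-a * tau), k=branch) / tau


a, b, tau = -1.0, 0.5, 1.0                      # hypothetical parameter values
for k in (0, 1, -1):
    lam = characteristic_root(a, b, tau, branch=k)
    residual = lam - (a + b * np.exp(-lam * tau))
    print(k, np.round(lam, 4), abs(residual))
```

Complex roots correspond to oscillatory modes and real roots to exponential ones, while the sign of the real parts determines whether those modes grow or decay; this is the kind of parameter-dependent case distinction the auxiliary equation is used for.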