Software Citation in Theory and Practice

07/21/2018
by Daniel S Katz, et al.
IEEE

In most fields, computational models and data analysis have become a significant part of how research is performed, in addition to the more traditional theory and experiment. Mathematics is no exception to this trend. While the system of publication and credit for theory and experiment (journals and books, often monographs) has developed and has become an expected part of the culture, shaping how research is shared and how candidates for hiring and promotion are evaluated, software (and data) do not have the same history. A group working as part of the FORCE11 community developed a set of principles for software citation that fit software into the journal citation system and allow software to be published and then cited; there are now over 50,000 DOIs that have been issued for software. However, some challenges remain, including: promoting the idea of software citation to developers and users; collaborating with publishers to ensure that systems collect and retain required metadata; ensuring that the rest of the scholarly infrastructure, particularly indexing sites, includes software; working with communities so that software efforts "count"; and understanding how best to cite software that has not been published.



1 Introduction

In most fields, computational models and data analysis have become a significant part of how research is performed, in addition to the more traditional theory and experiment. Evidence of the increased role and importance of software in today’s research can be found in surveys and in papers, and while neither of these is specific to mathematics, mathematics is likely no exception.

Two recent surveys, one of UK academics at Russell Group Universities [9, 10] and one of members of the (US) National Postdoctoral Research Association [15, 14], asked researchers how important software is to them. They found that 67% / 63% (UK/US respectively) of respondents said, “my research would not be possible without software.” 21% / 31% said, “my research would be possible but harder,” while just 10% / 6% said, “it would make no difference.” A similar survey of mathematicians would be welcome.

One of the authors of this paper scanned six months of Science in mid-2013, and found that about half the papers were software-intensive projects, and most of the other papers also relied on some software. A formal study of 90 randomly selected papers in the biology literature in 2015 found that 80% mentioned software, and that those articles mentioned an average of 4.85 software packages [11]. A more recent study of Nature in Jan–Mar 2017 found software mentioned in 32 of 40 research articles, with an average of 6.5 software packages mentioned per article [16]. A similar study could be done of the mathematics literature. And while these studies have been manually performed by humans, natural language processing and machine learning could be used to expand their reach.

The system of publication and credit for theory and experiment (journals and books, often monographs) has developed and has become an expected part of the culture, shaping how research is shared and how candidates for hiring and promotion are evaluated; software (and data) do not have the same history. In order to cite software, we could overload the current citation system to add software, or alternatively we could develop a new citation system that works for all kinds of products. As developing a new citation system would be very difficult, current efforts related to software citation have focused on the overloading approach.

2 Software Citation Principles

FORCE11 (https://www.force11.org) is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. In 2015 and 2016, a FORCE11 Software Citation working group developed a set of software citation principles [19]. The group grew to about 60 members, including researchers, developers, publishers, repository developers and maintainers, and librarians.

The group worked on GitHub (https://github.com/force11/force11-scwg) and on the FORCE11 web site (https://www.force11.org/group/software-citation-working-group). It reviewed existing community practices and developed a set of use cases for software citation, and then drafted a software citation principles document. To do this, the group started with previously published data citation principles [5], updated them based on software use cases and related work, and further updated them based on working group discussions. This draft was then subjected to community feedback and review through a variety of channels, including a workshop at FORCE2016 in April 2016. In late 2016, the paper and its reviews were published [19]. The paper includes a set of six principles (general statements), use cases (where the principles should apply), and discussion (suggestions on how to apply the principles).

The software citation principles, quoting from [19], are:

  1. Importance. Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.

  2. Credit and Attribution. Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.

  3. Unique Identification. A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.

  4. Persistence. Unique identifiers and metadata describing the software and its disposition should persist – even beyond the lifespan of the software they describe.

  5. Accessibility. Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.

  6. Specificity. Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

There are now over 50,000 DOIs that have been issued for software, and more than 60% of them have been issued since the FORCE11 group published the first preprint of the principles paper [20].

3 Practices and Examples

In practice, the adoption of software citation depends on developing community guidelines that implement the software citation principles within the context of existing community scholarly communication and software development norms.

For some commonly used commercial software, there are mandatory citations, e.g. as specified by SAS [17] or Matlab [4]. In other cases, authors of research software may provide a recommended general citation referring to a suite of related software, e.g. the HSL Mathematical Software Library [18]. However, in many of these cases, the citations do not provide enough information to allow crediting of the software authors (Principle 2), a machine actionable unique identifier (Principle 3), persistent identifiers and metadata (Principle 4) or, in the case of HSL, an understanding of which version of the software was used (Principle 6).

Examples of mandatory and general software citations that do not fully implement the Software Citation Principles:

The output for this paper was generated using SAS/STAT software, Version 14.1 of the SAS System for Unix. Copyright ©2018 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

MATLAB and Statistics Toolbox Release 2012b, The MathWorks, Inc., Natick, Massachusetts, United States.

HSL. A collection of Fortran codes for large scale scientific computation. http://www.hsl.rl.ac.uk/
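The gaps noted above can be made concrete with a rough checklist. The Python sketch below is an illustration only and is not part of the principles: the `SoftwareCitation` record, the `is_doi` helper, and the mapping from missing fields to principles are all assumptions made for this example. In particular, treating "the identifier is not a DOI" as a proxy for persistence is a simplification.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Heuristic DOI pattern (assumption): a bare "10.xxxx/..." or a doi.org URL.
_DOI = re.compile(r"(?:^|doi\.org/)10\.\d{4,9}/\S+$")


@dataclass
class SoftwareCitation:
    """Hypothetical record of the fields a software citation might carry."""
    name: str
    authors: List[str] = field(default_factory=list)
    version: Optional[str] = None
    identifier: Optional[str] = None  # a DOI or a URL


def is_doi(identifier: str) -> bool:
    """Return True if the identifier looks like a DOI (rough check only)."""
    return _DOI.search(identifier.strip()) is not None


def missing_principles(c: SoftwareCitation) -> List[str]:
    """List the principles a citation appears not to satisfy."""
    gaps = []
    if not c.authors:
        gaps.append("Principle 2: no contributors named")
    if c.identifier is None:
        gaps.append("Principle 3: no identifier given")
    elif not is_doi(c.identifier):
        gaps.append("Principle 4: identifier is a plain URL, not a DOI")
    if not c.version:
        gaps.append("Principle 6: no version given")
    return gaps


# Field values taken from two citations that appear in this paper.
hsl = SoftwareCitation(name="HSL", identifier="http://www.hsl.rl.ac.uk/")
nashpy = SoftwareCitation(
    name="Nashpy",
    authors=["Vince Knight", "Ria Baldevia"],
    version="v0.0.13",
    identifier="http://doi.org/10.5281/zenodo.1163694",
)

for citation in (hsl, nashpy):
    print(citation.name, missing_principles(citation))
```

A check like this only tests whether fields are present. Whether an identifier is recognized by a community, or whether the named contributors are complete, still needs human judgment.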

Some software frameworks and platforms provide clear guidance on how to support particular versions or a specific citation for a package (Principle 6), e.g., by using the citation() function for R packages or the instructions for citing the GAP system for computational discrete algebra [23]. However, these still do not provide persistent, machine actionable identifiers.

Examples of citations of specific packages as recommended by the software platform they are distributed with that mostly implement the principles:

Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K. (2018). cluster: Cluster Analysis Basics and Extensions. R package version 2.0.7-1.

Emma J. Moore, Christopher D. Wensley, groupoids - a GAP package, 1.54, 29/11/2017, https://gap-packages.github.io/groupoids/

However, most software used in research does not provide guidance on how to cite it properly. If the software’s website, or a CITATION file or README file with the source code, specifies how to cite the software, the author should use this information; this might be a reference to a software paper or other publication. If the source code includes a codemeta.json [12] or Citation File Format (CFF) [7] file, the metadata in these files can be used with appropriate tooling to generate a citation automatically. Otherwise, the following guidance will help to construct a citation that implements the principles:

  • For the authors, try to include all contributors to the software or, if this is not clear, name the project as the author. This may encourage some projects to make citation metadata available, including listing the authors.

  • Include the name of the software, along with specific version/release information.

  • Try to include a method for identification that is machine actionable, globally unique and interoperable. This ideally is a DOI but if there is no DOI, a URL pointing to a specific release might be the next best option.

  • If there is a landing page that includes metadata, point to that, not directly to the software. Where you have the choice between a URL for a general landing page that includes metadata and a specific URL (e.g. to a tag of a version) that does not contain sufficient metadata, it is preferred to use the URL for the general landing page as the identifier, and to clearly state the version.

Examples of citations for software using the suggested guidelines:

Voevodsky, Vladimir and Ahrens, Benedikt and Grayson, Daniel and others. UniMath — a computer-checked library of univalent mathematics. https://github.com/UniMath/UniMath [accessed 2018-04-27]

Eigen Project. (2017). Eigen [software] version 3.3.4 Available from https://bitbucket.org/eigen/eigen/ [accessed 2018-04-27]
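The guidance above can be sketched as a small helper. This is a minimal Python illustration under stated assumptions, not a prescribed format: the function name `build_citation`, its parameters, and the output layout (loosely modelled on the Eigen example) are all hypothetical. It prefers named contributors over a project name, always states the version when one is known, and prefers a DOI over a URL.

```python
from typing import List, Optional


def build_citation(
    name: str,
    authors: Optional[List[str]] = None,
    project: Optional[str] = None,
    year: Optional[int] = None,
    version: Optional[str] = None,
    doi: Optional[str] = None,
    url: Optional[str] = None,
    accessed: Optional[str] = None,
) -> str:
    """Assemble a citation string following the four guidelines (sketch)."""
    # Guideline 1: list contributors, or fall back to the project as author.
    if authors:
        author_part = ", ".join(authors)
    elif project:
        author_part = project
    else:
        raise ValueError("need either contributors or a project name")
    parts = [author_part.rstrip(".") + "."]
    if year is not None:
        parts.append(f"({year}).")

    # Guideline 2: software name plus specific version or release.
    title = f"{name} [software]"
    if version:
        title += f" version {version}"
    parts.append(title + ".")

    # Guidelines 3 and 4: prefer a DOI; otherwise use a URL, ideally a
    # landing page with metadata, and record when it was accessed.
    if doi:
        parts.append(f"https://doi.org/{doi}")
    elif url:
        locator = f"Available from {url}"
        if accessed:
            locator += f" [accessed {accessed}]"
        parts.append(locator)
    return " ".join(parts)


# Values taken from the Eigen example in this paper.
eigen = build_citation(
    name="Eigen",
    project="Eigen Project",
    year=2017,
    version="3.3.4",
    url="https://bitbucket.org/eigen/eigen/",
    accessed="2018-04-27",
)
print(eigen)
```

Choosing between a landing page and a release-specific URL is left to the caller, since only a person can judge which page carries sufficient metadata.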

For developers of a piece of software, there are several things that can be done to make it easier for others to cite the software. At a minimum, the code should be published using a clear version number and license. If the code is in GitHub, the developer can make it easily citable using GitHub’s integration with Zenodo [8]. Alternatively, the developer can manually deposit it in a digital repository such as Zenodo or Figshare, supplying metadata including the authors, title and version. The repository then provides a Digital Object Identifier (DOI) and often a recommended citation that adheres to the Software Citation Principles. This information can be used to insert the citation that others should use into the software documentation, preferably as a CITATION file.

Example of a citation generated by Zenodo that implements the principles: Vince Knight, & Ria Baldevia. (2018, January 31). drvinceknight/Nashpy: v0.0.13 (Version v0.0.13). Zenodo. http://doi.org/10.5281/zenodo.1163694
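Once a repository has issued a DOI, the same metadata can be written into a citation file kept with the source code. The Python sketch below emits a minimal file in the style of the Citation File Format, using the values from the Zenodo example above. The function name `cff_text` is hypothetical, and the field names (`cff-version`, `message`, `title`, `version`, `doi`, `date-released`, `authors`, `given-names`, `family-names`) reflect one reading of CFF version 1.2.0, so they should be checked against the format's specification before use.

```python
import json
from typing import List, Tuple


def cff_text(
    title: str,
    version: str,
    doi: str,
    date_released: str,
    authors: List[Tuple[str, str]],
) -> str:
    """Return the text of a minimal CFF-style citation file (sketch).

    `authors` is a list of (given names, family names) pairs.
    Strings are written as double-quoted scalars via json.dumps, which
    produces valid YAML double-quoted strings for ordinary text.
    """
    q = json.dumps
    lines = [
        "cff-version: 1.2.0",
        f"message: {q('If you use this software, please cite it as below.')}",
        f"title: {q(title)}",
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - given-names: {q(given)}")
        lines.append(f"    family-names: {q(family)}")
    return "\n".join(lines) + "\n"


# Metadata taken from the Zenodo-generated citation in this paper.
nashpy_cff = cff_text(
    title="drvinceknight/Nashpy: v0.0.13",
    version="v0.0.13",
    doi="10.5281/zenodo.1163694",
    date_released="2018-01-31",
    authors=[("Vince", "Knight"), ("Ria", "Baldevia")],
)
print(nashpy_cff)
```

The YAML is written by hand here only to keep the example free of third-party dependencies; a real project would more likely use a YAML library or existing CFF tooling.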

Of course, the fact that swMath [21] exists means that citation should be integrated with it, providing suggested citations for software in it, and using it to track and understand citations of math software.

4 Challenges

In May 2017, the FORCE11 Software Citation Working Group ended, and a new Software Citation Implementation Working Group (https://www.force11.org/group/software-citation-implementation-working-group) started. This group has the goal of moving the software citation principles to implementation. Those interested in following the new group can join it.

Many challenges remain, including:

  • Encouraging citation of software by authors. Data citation is still not commonplace in many disciplines, let alone software citation. Author guidance for software citation is varied in the mathematical sciences. Both the Journal of Mathematical and Computer Simulation [22] and Journal of Statistical Software [13] provide guidance that follows the Software Citation Principles, but others, including the International Congress on Mathematical Software, do not. Changing this will require the community to work with journals, conferences, and publishers to implement the Software Citation Principles in a way that they can be adopted by researchers in the area, similar to efforts in astronomy [2]. Tools such as CiteAs [1] may also help.

  • Promoting the idea of software citation to developers. The benefits of making software more easily citable are not always obvious. The time taken to submit metadata can be reduced by the use of formats such as CodeMeta [12] and Citation File Format [7], particularly as they are adopted by repositories [3] and citation tools.

  • Citing unpublished software. When authors do not publish their software, there is no archival link a citer can point to. The in-progress work to build a software archive for all source code by Software Heritage [6] may solve this problem.

  • Ensuring quality of information. Even when information is provided, it may be discarded in the publication process. Collaboration with publishers, funders, and the identifier and citation infrastructure will be required to ensure that systems collect and retain required metadata, making it easier to discover and reuse software.

  • Giving credit for software through citation. Ultimately, software citation will become widely practiced when the rest of the scholarly infrastructure, particularly indexing sites, includes software, and research communities recognize the value of software as a research output, thus providing an incentive for developers and authors to publish and reuse research software.

5 Conclusions

Although software citation is currently neither standardized nor widely practiced, the publication of the Software Citation Principles has acted as a foundation on which to build community guidelines and improved tooling and infrastructure to support citation. The FORCE11 Software Citation Implementation Working Group is taking forward work to address the challenges standing in the way of software citation, and looks to the mathematical sciences community to work towards implementing the principles in the future.

References