Best Practices for Replicability, Reproducibility and Reusability of Computer-Based Experiments Exemplified by Model Reduction Software

07/05/2016 · Jörg Fehr, et al. · University of Stuttgart, Max Planck Society, University of Münster

In recent years, the importance of numerical experiments has gradually become more widely recognized. Nonetheless, sufficient documentation of how computational results have been obtained is often not available. This is crucial especially in the scientific computing and applied mathematics domains, since numerical experiments are usually employed to verify the hypothesis proposed in a publication. This work aims to propose standards and best practices for the setup and publication of numerical experiments. Naturally, this amounts to a guideline for the development, maintenance, and publication of numerical research software. Such a primer will enable the replicability and reproducibility of computer-based experiments and published results, and also promote the reusability of the associated software.


1 Introduction

In a publication in the fields of applied mathematics, numerical analysis, and scientific computing, a Computer-Based Experiment (CBEx) or its results can be of varying value. If a work contains strong and generally valid analytical findings, a CBEx may not be needed, or is just used to affirm a valid fact by some concrete numerical results. On the other hand, if the considered problem is very complex or very specific, a practical example might be necessary to justify a possibly wild combination of analytical estimates, intuitive assumptions, or heuristics. In the extreme case, there might be no analytical reasoning at all, and the whole research contribution rests on CBEx.

One may well say that, with the increasing complexity of the considered problems and with increasing computational capabilities, both the need for and the opportunity to provide a valid CBEx with a scientific work have grown.

Exemplarily, this general observation can be illustrated by comparing three papers from 1971, 1986, and 2010, which introduced nowadays commonly applied numerical methods. In Nitsche's 1971 paper [37] on a new variational approach to elliptic PDEs with non-homogeneous Dirichlet conditions, no numerical experiment is reported. Then, in the important 1986 paper [44] by Saad and Schultz on the GMRES algorithm, two out of 14 pages are devoted to numerical experiments. Finally, the 2010 paper [6] on DEIM by Chaturantabut and Sorensen devotes more than 30% of its content to numerical examples or reasoning based on numerical experiments.

Summing up, we assess that the value of a CBEx has risen significantly in comparison to analytical results over the last decades. However, the high standards on analytical findings, namely the requirement of a concise, comprehensible, and traceable derivation and documentation, seem not to be equally applied to numerical experiments and results in the scientific literature, cf. LeVeque's article Top Ten Reasons To Not Share Your Code (and why you should anyway) [29].

With the ever-growing sophistication of numerical simulations, a CBEx in the field of mathematics has gradually changed its nature: from a rather deterministic mathematical exercise on a computer (still remembered in the term numerics and, to some extent, in numerical, referring more to numerology than to floating point operations) towards a scientific experiment with inevitable uncertainties coming, e.g., from rounding errors or changing software and hardware environments. Thus, a CBEx should be seen in analogy with experiments from the natural sciences: the numerical result corresponds to the observation of the experiment, and the hard- and software correspond to the methods used to obtain the observations, such as the experimental setup, the design of the tests, the employed statistics, or the choice of samples.
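
As a minimal Python illustration of such rounding effects, note that floating-point addition is not associative, so even the order of operations in an otherwise identical computation can change the result:

    # Floating-point addition is not associative on IEEE-754 doubles,
    # so the summation order alone changes the computed result.
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))  # False
    print(f"{(a + b) + c:.17g}")       # 0.60000000000000009
    print(f"{a + (b + c):.17g}")       # 0.59999999999999998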

Once an experiment has been established, the question of reproducibility arises, since only an experiment whose observations can be reproduced is seen to give valid and reliable insights that can serve as the basis for further research. This principle has long been broadly accepted, and it found its formulation in Popper's 1935 work Logik der Forschung, later translated into English: "I only demand that every such statement must be capable of being tested; or in other words, I refuse to accept the view that there are statements in science which we have, resignedly, to accept as true merely because it does not seem possible, for logical reasons, to test them.", cf. [41, Ch. 1.8]. Note that the demand of testability of a hypothesis does not include a truth value as it is implied by the reproducibility of an experiment. However, as Popper states, an unreproducible singular discovery would not be published by a researcher, since "the 'discovery' would be only too soon rejected as chimerical, simply because attempts to test it would lead to negative results.", cf. [41, Ch. 1.8].

Reproducibility is commonly accepted as a necessary condition for good scientific practice, and its absence in some prominent works, but also in a statistically significant number of journal publications, as detected in recent years in, e.g., medicine [10], psychology [22], and computer science [7], has shaped the term reproducibility crisis, which has been broadly covered in scientific, public, and social media (newyorker.com/tech/elements/the-crisis-in-social-psychology-that-isnt, bjoern.brembs.net/2016/02/earning-credibility-in-post-factual-science).

The general concept of reproducibility was taken up in computer-based research in the 1990s [5] and adapted to the comparatively deterministic nature of software and its ability to easily enable the "open exchange of data, procedures and materials", as phrased in a code of ethics and values of the American Physical Society (aps.org/policy/statements/99_6.cfm). At that time, the term reproducible research [12] was shaped; it often referred to computational environments that allowed for simply transferring the experiments to different computers and rerunning them there; see [30] for an example in the field of archaeology and for references.

It is also in the nature of software that it can be duplicated and dissected, so that not only the results but also parts of the method itself can serve as the basis of new experiments; this is what is meant by reusability.

In this work, we develop, from first principles, notions related to Replicability, Reproducibility and Reusability (RRR) as they are relevant for CBEx. We describe conditions for their implementation in research and publications that are general enough to meet the particular needs of projects as well as the habits of researchers. To find the balance between a reliable framework and openness towards common practices, we add sections with concrete suggestions – a best practice guide.

In this contribution, details on code and data layout or on licensing and associated copyright issues are not covered; work on these topics can be found, for example, in [47] and [49], respectively. Also, for completeness, we mention that our work is about the way CBEx are conducted and documented. Hence, the principles considered here are to be distinguished from approaches that try to validate numerical results, like the notion of Verification and Validation (sciencenode.org/feature/why-should-i-believe-your-hpc-research-.php).

Overall, this work aims to make CBEx replicable in the sense of its basic definition, and to use the potential of software to enable easy reproducibility and even reusability.

1.1 Prior Work and State of the Discourse

The discrepancy between the potential of CBEx to be easily made RRR and the widespread lack of RRR in CBEx in the scientific literature has stimulated various initiatives and theoretical work on the implementation of RRR principles in scientific computing. We list but a few of the most recent publications:

The discussion on opening scientific source codes has become more noticeable in recent years. For example, in Nature, arguments against open source are refuted [3], more accurate results are predicted [33], partially opened codes are discussed [18], and a code availability section is suggested [35, 36]. In Science, the opening and review of research codes is not only discussed [24, 45, 23] but required by the editorial policies: "All computer codes involved in the creation or analysis of data must also be available to any reader of Science". Mathematical organizations are also discussing open scientific codes; examples are the AMS on the maintainability and necessity of open code accompanying publications [25], the ACM on advantages and disadvantages of releasing scientific codes [32], and SIAM on a publication of codes by default and attributable credit [2].

Several publications describe abstract software engineering and collaborative development techniques. In [27], basic practices for scientific software development are distilled, while in [15], software management principles are explained. A set of rules devised in [42] is concerned with code development but also with user-developer interaction, and the best practices in [52] summarize code development fundamentals. General recommendations on reproducibility for CBEx are also given in [1]. Furthermore, the practical reproduction of research results themselves is discussed, e.g., in [34].

Lastly, we note that various initiatives have been started to promote certain standards in CBEx. Foremost, the Science Code Manifesto (sciencecodemanifesto.org) states five principles (Code, Copyright, Citation, Credit, Curation) for the handling of research software to improve its use in science. The Recomputation Manifesto (recomputation.org) [13] also formulates rules to facilitate the repeatable realization of CBEx.

1.2 Outline

This introductory discussion is followed by a more refined analysis of replicability, reproducibility and reusability in Section 2. In Section 3, a technique to document code availability is described. Section 4 summarizes high-level considerations to facilitate RRR, while a minimal documentation for scientific codes and research software is proposed in Section 5. Finally, in Section 6, a sample software project is presented to illustrate the practical implementation of the herein suggested best practices.

2 The Three “R”s of Open Science

In this section, taking up the ideas of [51], we give a definition of the frequently used terms Replicability, Reproducibility, and Reusability and discuss how these basic scientific principles apply for assessing scientific software.

The distinct notions of Replicability and Reproducibility are used to qualify research in all fields of science in which experiments play a role, cf., e.g. [50] with a background in biology, [38] from psychology, or [8, 12] focusing on scientific computing.

In short, replicability refers to a repetition of the experiment with the same results by the same observers in the same environment; reproducibility refers to an independent repetition of the experiment and its outcomes in different circumstances.

Reproducibility points to a certain reliability of both the findings of the experiment and the procedure that was used to obtain the results [28]. Once reliability of a method is established, one can address reusability as the property that enables the use of the method for different setups and different purposes.

Note that these characteristics should be considered nested: reproducibility implies replicability, and reusability requires reproducibility.

In what follows, we extend, specify, and adapt these general notions to the case of scientific software and numerical simulations.

2.1 Replicability

The attribute Replicability describes the ability to repeat a CBEx and to come to the same (in a numerical sense) results. Sometimes the equivalent term Repeatability is used for this experimental property. Replicability requires some basic documentation on how to run the software (see Section 4.1 and Section 5) to obtain replicable results.

Replicability, in turn, is a basic requirement for reliable software as well as for its results, as it shows a certain robustness of the procedure against statistical influences and observer bias. Also, a replication can serve as a benchmark to which new methods can be compared, as pointed out in [51].

2.2 Reproducibility

In its native definition, Reproducibility of a CBEx means that it can be repeated by a different researcher in a different computer environment. This can be assured, first, through a documentation that provides enough mathematical and technical detail to set up a CBEx that will provide comparable results, including the software implementation of the algorithms; second, through the distribution of a software capable of producing the results on a large variety of machines; or third, through any combination of these two extremes – sufficient documentation and available software. If the CBEx depends on hardware, e.g., if runtime is measured, then for reproducibility the hardware needs to be available or sufficiently well documented.

2.3 Reusability

In the sphere of CBEx, Reusability refers to the possibility to reuse the software or parts thereof for different purposes, in different environments, and by researchers other than the original authors. In particular, Reusability enables the utilization of the test setup or parts of it for other experiments or related applications. Although, theoretically, any part of a software can be reused for different purposes, here, Reusability applies only to reproducible parts, since a building block of a CBEx that does not define reproducible or even replicable outcomes cannot be reused for a replicable or reproducible CBEx.

3 Code Availability Section

Even though availability of the source code associated with a CBEx is not a requirement for replicability and reproducibility (see Section 4), it is essential to open the CBEx to peer scrutiny and highly recommended by the authors. The availability of the source code itself is necessary for reusability and unconditionally desirable for reproducibility. This section makes the case for a Code Availability Section as introduced by Nature [31, 35, 36]. Such a section should by default be included in any publication presenting numerical results, like a "Materials and Methods" section in other sciences, and should state whether the utilized code is available and, if not, for what reason, e.g., third-party licenses, non-disclosure agreements, trade secrets, or the desire to keep a competitive advantage.

Different code availability models exist, which are listed and briefly commented on in the following.

Open source code, published under a public license

Compare, e.g., the iterative rational Krylov algorithm (IRKA) example in Section 6. This procedure is probably preferred by most scientists, and for some it is the only way to do proper science, compare, e.g., [18]. Referees and interested readers can check whether the code fulfills the necessary requirements for reproducibility, and they can modify and use the code for their own purposes. There are multiple possibilities how access to the code can be gained. Nowadays, a common and widely used procedure is the provisioning of source code via a publicly readable revision control repository located on a private server (e.g., gitlab.com) or at a third-party service provider (e.g., github.com, bitbucket.org). Alternatively, a download from a collection such as netlib (netlib.org) can be provided. A shining example of best practice in the field of open source code in combination with reproducible experiments is the Image Processing On Line (IPOL) Journal [19]. In this journal, each article is supplemented with its source code, an online demonstration facility, and an archive of experiments. Furthermore, the text as well as the source code are peer-reviewed.

Closed source, software available under a non-public license

This less desirable option gives readers and reviewers the opportunity to check, e.g., whether the proposed numerical procedures / experiments work with their own data, given a license is available. Often, the source code is encoded or obfuscated to protect intellectual property, which then allows a replication but not a comprehension of the results. Matlab code, as an example of an interpreted language, can be encoded via the pcode command or compiled into a binary format. However, as stated since Matlab Version 2014b [17]: "The pcode function obfuscates the code but does not encrypt it. While the content in a .p file is difficult to understand, it should not be considered secure." For programs written in a compiled language, such as C++, only executables or runtime libraries are provided. Hence, for reasons of trust, it is important that the software has a priori passed through a strictly documented verification & validation procedure. By providing and hosting the source via a version control repository (see Section 4.6), it is possible to grant certain people, e.g., the reviewers, access to the source code upon request. Alternatively, the source code may be provided directly to an eligible user via physical data volumes or direct file transfers.

Software as a Service (SaaS)

The availability of web access to computer programs or computer resources is an emerging strategy. This approach can also be used to enable interested users or reviewers to use the developed software as a service, e.g., to test if the program runs with their own, respectively modified, input data. SaaS offers many advantages: read access without the ability to copy the source code, time-limited access for users, resolution of third-party software dependencies, new licensing schemes, and so on. It should be noted that, while SaaS enables the use of a CBEx, it does not allow a dissection at the source code level.

Non-available code

The last and most undesirable option is non-availability. The source code, computer program, or required third-party software is neither available to nor purchasable by the interested reader. A review is hardly possible, and the proposed numerical scheme or ideas need to be written up in great detail, so that reproducibility of the work in a different environment is possible.

A sample Code Availability Section is enclosed in Figure 1. The linked source code archive should ideally be uniquely identified by a Digital Object Identifier (DOI, doi.org), which can be obtained for software releases, for example, from Zenodo (zenodo.org) for scientific codes. Alternatively, the source code can be enclosed in the supplemental materials or deposited at some stable location.

Code Availability / Licensing Option

The source code of the implementations used to compute the presented results can be obtained from:

doi:XXXXXXX/XXXXXXXX and is authored by: XXXX, XXXX

Please contact XXXXX for licensing information

Figure 1: Sample Code Availability Section.

Even though a simple statement on the (non-)availability of the source code improves neither the review process nor reproducibility (in the sense of Section 2.2), it can at least facilitate replicability through its assurance by the authors. Furthermore, it could be noted whether the referees had access to the implementation during the peer review process.

Moreover, due to the important role of computational results, not only in numerical analysis but also in many other sciences, this measure contributes to the basic idea of verifiability in science. Making the source code available as part of the publication, on the one hand, renders visible the effort invested into an openly available software implementation and, on the other hand, compels authors to comment on the means of the experimental setup. Lastly, a mandatory code availability section raises awareness for RRR.

4 Code Guidelines

In this section, based on the previous definitions of replicability, reproducibility, and reusability, guidelines for the design, documentation, and publication of CBEx and research software are summarized. The foundation for these guidelines is the interrelation of RRR: reusability implies reproducibility, which implies replicability. The guidelines are composed of mandatory requirements and optional recommendations. Requirements are limited to the minimal extent necessary, while recommendations enable a practical and comfortable realization of the replication, reproduction, or reuse. The interdependence of the requirements is to be understood as follows: a requirement for replicability is also a requirement for reproducibility, and similarly, a requirement for reproducibility is also a requirement for reusability. The recommendations are optional but strongly encouraged, and have no dependence on previous recommendations.

We will use the term "source code archive" to refer to the set of source code, build instructions (such as a makefile), configuration files, and input data (the source code archive may also include resulting data sets from the authors' experiments). For a summary of the following guidelines, see Figure 2.

4.1 Replicability Requirement: Basic Documentation

A fundamental requirement for replicability is a basic documentation, which encompasses instructions on how to generate an executable binary program in the case of a compiled language, and a description of how to run the program to obtain the results to be replicated (see also Section 5). This documentation is crucial to an experiment's replication, as it defines the technical implementation and ensures the practical repetition of the experiment.

Often, the numerically computed results are further processed to facilitate interpretation, for example by a visualization. A documentation of the evaluation of these results, descriptively or algorithmically, is needed to allow for replication, not only of the computational results, but also of their evaluation.

4.2 Replicability Recommendation: Automation and Testing

The automation of the experiment enables an easy and reliable check of the replicability of a CBEx. This typically means that a single script, or multiple scripts, automatically prepare and run the experiment as well as the post-processing of the results.
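
A hypothetical sketch of such an automation script is given below (in Python; all file and module names are placeholders for illustration and not part of the example in Section 6):

    # run_experiment.py -- hypothetical driver automating a CBEx end to end:
    # it prepares the configuration, runs the computation, and post-processes.
    import json
    import subprocess
    import sys

    def main():
        # 1. Prepare: record the exact configuration of this run.
        config = {"model": "fom", "reduced_order": 10, "tolerance": 1e-6}
        with open("config.json", "w") as f:
            json.dump(config, f, indent=2)
        # 2. Run: invoke the computation as a separate, scriptable step.
        subprocess.run([sys.executable, "compute.py", "config.json"], check=True)
        # 3. Post-process: turn raw results into the figures of the paper.
        subprocess.run([sys.executable, "postprocess.py", "results.json"], check=True)

    if __name__ == "__main__":
        main()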

Replicability requires replicable behavior of all building blocks of the experiment, for which the setup of dedicated tests is recommended. Commonly, three categories of tests are considered: unit tests, examining a small section of the source code; integration tests, checking a major component of the source code; and system tests, assessing the whole project [4, Chapter 3]. Tests usually involve a comparison of computed results to analytical ones, statistically significant sampling, or conformance to an accepted benchmark problem.
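
The following is a minimal sketch of a unit test in this spirit (in Python, purely for illustration): a computed quantity is compared to a known analytical result within the expected discretization error.

    import math
    import unittest

    def trapezoid(f, a, b, n):
        """Composite trapezoidal rule on [a, b] with n subintervals."""
        h = (b - a) / n
        return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

    class TestTrapezoid(unittest.TestCase):
        def test_against_analytical_result(self):
            # The analytical value of the integral of sin over [0, pi] is 2;
            # n = 1000 keeps the O(h^2) error well below the tolerance.
            self.assertAlmostEqual(trapezoid(math.sin, 0.0, math.pi, 1000), 2.0, places=5)

    if __name__ == "__main__":
        unittest.main()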

4.3 Reproducibility Requirement: Extensive Documentation

To enable the reproducibility of a CBEx, a sufficiently detailed description of the algorithms, implementation, test setup, and parameters needs to be provided. Here, sufficiency is achieved if the documentation contains all information needed to set up and run the experiment by a different researcher in a comparable environment.

However, to reproduce a CBEx in a different environment, a documentation of the utilized hardware and software is also needed. An essential part of this environment documentation is the listing of other software packages required to perform the CBEx. Documenting these dependencies means listing all software that is not available in a commonly assumed environment, together with the employed variant and version; this allows setting up the same or at least a similar software stack.

Depending on the programming language in which the considered CBEx is encoded, different types of dependencies arise. A compiled language requires a compiler and linked libraries to generate an executable file embodying the program that computes the results. The variant of the compiler and its version, as well as the variants and versions of the (statically and dynamically linked) libraries, make up the associated dependencies. Furthermore, a build system, which organizes the compilation and linking, may be used and then constitutes a dependency. An interpreted language requires an interpreter, which parses and executes the source at runtime. In this case, typical dependencies are the variant of the interpreter in a specific version, as well as the required toolboxes with their versions.
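
A small helper that records such environment information alongside the results can already cover a large part of this documentation. The following Python sketch is hypothetical, with numpy as a stand-in for any dependency; it writes platform, interpreter, and dependency versions to a text file:

    # record_environment.py -- hypothetical helper writing the software
    # environment of a CBEx run to a file for the reproducibility record.
    import platform
    import sys

    def record_environment(path="ENVIRONMENT.txt"):
        lines = [
            f"platform: {platform.platform()}",
            f"machine: {platform.machine()}",
            f"python: {sys.version.split()[0]}",
        ]
        try:  # version of a key dependency, if installed
            import numpy
            lines.append(f"numpy: {numpy.__version__}")
        except ImportError:
            lines.append("numpy: not installed")
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    if __name__ == "__main__":
        record_environment()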

4.4 Reproducibility Recommendation: Availability

The availability of the source code archive is highly recommended for reproducibility for two main reasons. First, the code itself may serve as documentation of the experiment. Second, the code may be used to realize the actual reproduction.

Therefore, the availability of the source code archive from a stable location is vitally important. A location can be considered stable if its main purpose is storing data. This does not imply lasting availability; hence, a second backup location is recommended.

The classic method of providing source code access is bundling it with the publication by including the source code archive as supplemental material. This affiliates the code with the publication, and it is conveniently obtainable together with the publication itself. Yet, a supplemental material section may not be available for all journals, or may only accept certain file types (with a maximum file size).

Recently, software depots for scientific source code have been established. For example, RunMyCode (runmycode.org) and ResearchCompendia (researchcompendia.org) are services storing source code archives and associating them with publications.

Alternatively, the source code archive can be published separately through platforms such as Zenodo (zenodo.org) or Figshare (figshare.com). An advantage of this method is the assignment of a digital object identifier (DOI) to such a software publication, which can then be stated in the Code Availability Section of the associated publication.

As for the dependencies, reproducibility is not inhibited by closed-source software. However, a statement on the applicability of an open-source variant of those dependencies, if available, is suggested. In any case, those parts of the experiments that are not part of the source code need to be documented as described in Section 4.3.

4.5 Reusability Requirement: Accessibility

A CBEx is reusable if it is accessible in a related or even a different context. Accessibility encompasses all means to (partially) apply the functionality of the original to another CBEx. The availability of source code fulfills the accessibility requirement for reusability, but access to a compiled executable, a library, or a remote service is also sufficient to comply.

4.6 Reusability Recommendation: Modularity, Software Management & Licensing

To be able to adapt a CBEx to differing environments and settings, the CBEx itself has to allow some parametrization to enable a certain configurability. Furthermore, modularity, i.e., the separation of experiment and method, enables the utilization of the method in other experiments or conducting the experiment with alternative methods. A more fine-grained modularization can, in addition, allow the exchange of components of the method or experiment, such as numerical solvers or service libraries. Modularity necessitates a definition of interfaces, which determine the communication between the interchangeable components. The documentation of such an interface is essential for it to fulfill its purpose and involves, e.g., a description of protocols, variables, types, and function signatures with their arguments and return values.
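
As a minimal sketch of such an interface (in Python; the names are hypothetical and merely illustrate the separation of method and experiment):

    # A documented interface decouples the experiment from the method, so
    # either side can be exchanged without touching the other.
    from abc import ABC, abstractmethod

    class ReductionMethod(ABC):
        """Interface every interchangeable model reduction method satisfies."""

        @abstractmethod
        def reduce(self, A, b, c, order):
            """Return reduced system matrices (Ar, br, cr) of the given order."""

    def run_experiment(method: ReductionMethod, A, b, c, order=10):
        # The experiment depends only on the documented interface,
        # not on any particular method implementation.
        Ar, br, cr = method.reduce(A, b, c, order)
        return Ar, br, cr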

Source code usually undergoes some evolution over time during which errors are fixed, and new features are introduced. Hence, software management methods, such as version control, are recommended for the organization of this development process.

A reusable software project is recommended to obey some versioning procedure. A version scheme allows a unique identification of different chronological stages of the project. Usually, such a version consists of at least two numbers delimited by a dot, describing the major and minor iteration of changes. More fine-grained versioning can be applied with further numbers. A release of a new version can be fixed by assigning a DOI.

To record the evolution of the source code, a version control system, such as git, mercurial, or bazaar, is an important tool. A version control system tracks changes to each controlled file and allows well-defined collaborative work on the source files. The set of all files under version control makes up a repository; a set of changes to a single file or to multiple files constitutes a revision of the repository; and a set of revisions defines a new version. A history of the revisions can also augment the documentation of the CBEx if the changes are recorded with comprehensive descriptions.

A license assigned to the source code archive, which governs the rights and duties associated with its use and reuse as well as indicating copyrights, is practically necessary for reusability. If an open-source license is selected, certain characteristics should be considered: the license should be approved by the Open Source Initiative (opensource.org) and the Free Software Foundation (fsf.org), as well as being compatible with the GNU General Public License (opensource.org/licenses/gpl-license). Generally, a central requirement for scientific software should be an attribution clause requiring the future inclusion of the copyright information, which usually notes authors and contributors. A non-permissive license may inhibit the reusability of the software in non-open projects, cf. [48]. To select a license, the service Choose a License (choosealicense.com) can be of help, and for an explanation of the selected license, a service like tl;dr Legal (tldrlegal.com) provides short summaries of a license's legal implications.

Replicability: Basic Documentation (requirement); Automation & Testing (recommendation).
Reproducibility: Extensive Documentation (requirement); Availability (recommendation).
Reusability: Accessibility (requirement); Modularity, Software Management & Licensing (recommendation).
Figure 2: Coding guidelines overview.

5 Basic Documentation

In terms of research software, it is important that the accompanying documentation enables the usage and the reproducibility of results. To this end, certain information on the tested hardware and software should be documented. In the following, a basic form of documentation is proposed, which includes the essential information to facilitate RRR.

A simple form of documentation is providing basic information in plain text files. These should be sequential files containing only printable ASCII characters [20] and consequently using a US-ASCII file encoding. If it is necessary to also use non-ASCII characters, a modern encoding with good cross-platform support, like UTF-8, should be used. Recently, these text files have often been decorated with commonmark (commonmark.org) markdown code, usually indicated by the file extension .md, which rather improves readability than inhibiting it; due to its widespread use, for example by github, this is considered an unofficial standard. Since scientific publications are typically composed in the English language, so should be these text files.

Certain default filenames are established to indicate a file's contents, such as README, LICENSE, AUTHORS, and CHANGELOG. Additionally, further files of relevance to the academic environment have been suggested, such as CITATION and CODE. This work proposes two more files, namely RUNME and DEPENDENCIES, to facilitate replicability.

5.1 Readme

The bare minimum of any code package, source code repository, or source code archive should be a README file. To uniquely identify the associated software project, this text file should state the project's name along with its version and release date. Normally, a brief description of the package's functionality and contents is also expected.

Often, the README file also includes a manual for the compilation or installation of the project. In case these procedures are more elaborate, a separate INSTALL file can be used and referenced inside the README. The same holds for the authors of and contributors to the project, who can be listed in the README or in an additional AUTHORS file. Relevant information for the README includes a project website, a (stable) download location, contact information, and sample usage (for example, referencing the RUNME file) of the associated software. Furthermore, the license and the LICENSE file (which holds the full license text, the copyright holders, and the release year), a record of the history of changes in the CHANGELOG file, a set of frequently asked questions in a FAQ file, and a documentation can be referenced.

In the case that the replicability of an experiment is targeted, the specifically used software stack and hardware environment should be documented, as well as all configurations, parameters and arguments defining the CBEx. For reproducibility, related publications should be cited, and for reusability, links to technical documentation, e.g. interfaces, or a version control repository could be listed. Generally, a README file can also act as a table of contents to the remaining files associated with the source code archive.

Preferably, the README presents the necessary information to start using the software in a quick and comprehensive way. Therefore, the general recommendation is to make it as detailed as necessary while at the same time keeping it as brief as possible. For in-depth discussions of the further details, a reference to the actual software documentation should be preferred.

5.2 Runme

To facilitate replicability, an additional file called RUNME is proposed in this work, which lists the steps required to replicate the results. This can be an executable script file, which upon execution automatically performs all steps necessary to replicate the results of an associated publication. In case multiple environments are supported, the respective environment can be highlighted by a file extension, for example RUNME.linux or RUNME.win. Alternatively, the RUNME file can describe these stages in pseudo-code or, in general, in a language that is not machine-readable.
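
As a minimal sketch, a descriptive (non-executable) RUNME could read as follows; all concrete names are placeholders:

    RUNME for <project>, version <x.y>
    1. Install the dependencies listed in the DEPENDENCIES file.
    2. Start the interpreter, or build the executable, in this directory.
    3. Run the main script; it recomputes all results of the publication.
    4. Compare the generated figures with those in the publication.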

5.3 Citation

The concept of a CITATION file has first been used by the R-project [43] and has also been adapted by GNU Octave [9]. This file contains information on how to cite the associated software project in other works. Besides a sample citation, a suggested BibTeX code is often provided in this file.

5.4 Dependencies

Modern software stacks encompass multiple layers of intermediary software on which a project may depend. To be able to build and use a provided source code package, such dependencies must be locally available. For projects with few dependencies, it is sufficient to list those in the README file; yet, for projects with many dependencies, it is suggested to include a DEPENDENCIES file that lists the necessary (third-party) software components including the required versions. Dependencies encompass, but are not limited to: runtime environments, libraries, toolboxes, source code archives, or executable files.
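
For a project whose dependencies are importable packages, a small script can check such a file automatically. The following Python sketch assumes, purely for illustration, a DEPENDENCIES file with lines of the form "numpy >= 1.20":

    # check_dependencies.py -- hypothetical helper reading a DEPENDENCIES
    # file and reporting which listed packages are locally importable.
    import importlib

    def check(path="DEPENDENCIES"):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                name = line.split()[0]  # package name before the version spec
                try:
                    module = importlib.import_module(name)
                    version = getattr(module, "__version__", "unknown")
                    print(f"{name}: found ({version}); required: {line}")
                except ImportError:
                    print(f"{name}: MISSING; required: {line}")

    if __name__ == "__main__":
        check()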

5.5 Code

The purpose of the CODE file is the listing of key meta-data on the associated software project. Initially, the idea of bundling code meta-data was proposed in [46] and formalized in [26]. The main intended purpose of this proposal was the assignment of transitive credit in software stacks utilized for scientific work. In publications about a software project, this meta-data also serves as a unique identification, as, for example, in the SoftwareX journal (www.journals.elsevier.com/softwarex). Another important reason for code meta-data is the classification and organization of scientific software, which facilitates reproducibility and reusability. This information could and should also be enclosed in the README file; yet, the focused CODE file is machine-readable and allows automatically generated directories.

Various file formats to encode this meta-data are conceivable. Among others, there are: ini (Initialization File), xml (Extensible Markup Language), yaml (YAML Ain't Markup Language), and json (JavaScript Object Notation); the latter is suggested in [46, 26]. Basic requirements for such a file are a plain text encoding and a human-readable formatting. Additionally, a simple syntax, understood as a small set of rules, as well as the availability of parsing facilities should be considered. Due to its renown and its easy readability for humans and machines, the authors suggest using the ini file format, as the more elaborate grammars of xml, yaml, and json require sophisticated parsers.

There is no standard defining the ini format, yet its widespread use establishes a quasi-standard: each line in an ini file holds a single key-value pair, delimited here by a colon. The other formats also provide hierarchies for their components, which allow the nesting of fields, for example grouping an author's properties under a common author key; but these hierarchies introduce an impediment to the automatic parsing of contents. To resolve the former example of multiple authors, in the case of the ini file, a comma-separated list can be used as the value.
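
Thanks to this flat structure, a complete parser fits in a few lines. The following Python sketch (illustrative, not a normative implementation) reads such a CODE file into a dictionary, splitting comma-separated values into lists:

    def parse_code_file(path="CODE"):
        """Parse a colon-delimited CODE meta-data file into a dictionary."""
        meta = {}
        with open(path) as f:
            for line in f:
                if ":" not in line:
                    continue  # skip blank or malformed lines
                key, _, value = line.partition(":")
                # comma-separated values (e.g. authors, keywords) become lists
                values = [v.strip() for v in value.split(",")]
                meta[key.strip()] = values if len(values) > 1 else values[0]
        return meta

Applied to the file of Figure 3, this yields, among others, meta["shortname"] == "emgr" and meta["dependencies"] == ["GNU Octave >= 3.8", "MATLAB >= 2011b"].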

Due to the wide range of possible meta-data across the sciences utilizing software, no one-size-fits-all list of keywords is given, but a list of suggestions which applies to most research software projects.

  • name The primary identifier of the software project.

  • shortname An alias or the name of the main executable.

  • version A unique state of the project, usually symbolized by numbers separated by decimal points indicating the major and minor revisions.

  • release-date The date this version has been released written in the ISO-8601 international format: YYYY-MM-DD [21].

  • doi A digital object identifier fixing a software release at a stable location.

  • authors The list of authors.

  • orcids The list of ORCID (orcid.org) identifiers corresponding to the list of authors.

  • topic A basic categorization of the project; for example, classifications such as MSC (msc2010.org), ACM (www.acm.org/about/class) or PACS (www.aip.org/publishing/pacs) may be used.

  • type The type of software, for example a program, library or toolbox.

  • license The license under which the software is released.

  • license-type Distinguishes between open and proprietary licenses.

  • repository The link to the project's source code repository.

  • repository-type The type of version control software of this repository.

  • languages A comma-separated list of the programming languages utilized in the software project. For larger projects, naming the major languages is sufficient. Since programming languages evolve over time, the version or standard of the employed language or dialect should also be provided.

  • dependencies A list of software required to use the project, such as libraries, toolboxes and runtimes.

  • systems A list of compatible operating systems or computational environments.

  • website If the CBEx is part of an enclosing research software project that has a website, the URL (Uniform Resource Locator) can be provided in this field to guide users to the available resources.

  • keywords A list of descriptive terms.

An example of such a code meta-data ini-file, from emgr - the empirical gramian framework [16], is shown in Figure 3.

name: Empirical Gramian Framework
shortname: emgr
version: 3.9
release-date: 2016-02-25
doi: 10.5281/zenodo.46523
authors: Christian Himpe
orcids: 0000-0003-2194-6754
topic: Model Reduction
type: Toolbox
license: 2-Clause BSD
license-type: Open
repository: github.com/gramian/emgr
repository-type: git
languages: Matlab
dependencies: GNU Octave >= 3.8, MATLAB >= 2011b
systems: Linux, Windows
website: gramian.de
keywords: empirical gramians, cross gramian, combined reduction

Figure 3: Sample CODE ini-file for the empirical gramian framework.

5.6 Source Code File Headers

Apart from the text files enclosed with the project, every source code file should state, in its first lines (the so-called header):

  1. the associated project,

  2. the authors and contributors,

  3. and the purpose of the file.

This establishes the affiliation of the source file with the project. The header can optionally also include license and version information. Additionally, the file header can hold citations of works used to compose the source code, or keywords categorizing the contents.
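
A hypothetical header in a Python source file could look as follows (the equivalent comment syntax applies in other languages, e.g., % in Matlab):

    # Project: <associated project name and version>
    # Authors: <authors and contributors of this file>
    # Purpose: <what this file computes or provides>
    # License: <license of the project>               (optional)
    # Reference: <citation of the implemented method> (optional)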

6 A Practical Example

Figure 4: Example IRKA results for the FOM model by Penzl and reduced order 10.

In this section, we discuss a very rudimentary and simple implementation of the iterative rational Krylov algorithm (IRKA) for model reduction proposed by Gugercin, Antoulas and Beattie [14]. The implementation of the algorithm was made as an exercise in a lecture on model reduction. The common denominator of the authors is the fact that their research lies within the area of model order reduction, but their backgrounds – scientific computing, mathematics, control, engineering – are different. Nevertheless, in our opinion, the sharing of code, good documentation, and modular programs which can be reused is essential for the further success of model order reduction.

The intention of this best-practice example is to show, by example, the files and rules for good CBEx; it may serve as a template for other research. During the implementation, we particularly paid attention to following the guidelines given in this work. The IRKA algorithm [14] was chosen, first, because it is a widely used, heavily cited algorithm, and second, because its publication has a well-documented examples section in which the numerical experiments used to verify the behavior of the algorithm are described, including the model. Also, the outcome of the algorithm is deterministic for many examples; therefore, replicability of the results of [14] is achieved.

The minimum requirement for replicability is the basic documentation, which documents the RUNME.m file and every single function. Two example files are given. The first example, RUNME.m, runs the IRKA algorithm and automatically produces the figures shown in Figure 4. The second example file, EXAMPLES.m, can be used to test the algorithm with different test examples, and is used to test the algorithm on various system architectures with different programs and different program versions. Documentation in the header of which architectures and programs work with the algorithm and the test examples is recommended. Furthermore, standardized benchmark examples, e.g., from the Oberwolfach Benchmark Collection (portal.uni-freiburg.de/imteksimulation/downloads/benchmark), are used to allow reproducibility of the results for other users.

Finally, to demonstrate the advantages of reusability, part of the implementation is based on the work of Panzer [39]. Since the source code of Panzer [39] is published under an open-source license, a reuse of his work is possible: we can modify and use the code for our own purposes. Consequently, to enable further reuse of the source code, this implementation is also published under a public license. The code was made public via a GitLab archive (gitlab.mpi-magdeburg.mpg.de/saak/best_practice_IRKA.git) and uniquely identified and archived via a Zenodo entry with a valid DOI [11]; the availability of the source code is stated in our Code Availability section below. Nevertheless, to show the possibility of combining open source code with closed source code, the function calculateFrequencyResponse.p is given in a p-coded version, which is obfuscated to protect intellectual property.
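
For readers unfamiliar with the method, the following minimal NumPy transcription illustrates the fixed-point iteration at the core of IRKA [14]. It is an illustrative sketch only, not the Matlab implementation published with this work: the initial shifts are chosen ad hoc, and robustness measures such as the realification of the projection bases for complex conjugate shift pairs are omitted.

    import numpy as np

    def irka(A, b, c, r, tol=1e-4, maxit=50):
        """Minimal IRKA sketch for a stable SISO system (A, b, c)."""
        n = A.shape[0]
        sigma = np.logspace(0, 2, r).astype(complex)  # ad-hoc initial shifts
        for _ in range(maxit):
            # Petrov-Galerkin bases from rational Krylov directions.
            V = np.column_stack([np.linalg.solve(s * np.eye(n) - A, b) for s in sigma])
            W = np.column_stack([np.linalg.solve(s * np.eye(n) - A.conj().T, c) for s in sigma])
            V, _ = np.linalg.qr(V)
            W, _ = np.linalg.qr(W)
            Ar = np.linalg.solve(W.conj().T @ V, W.conj().T @ A @ V)
            # New shifts are the mirror images of the reduced poles.
            sigma_new = np.sort_complex(-np.linalg.eigvals(Ar))
            if np.linalg.norm(sigma_new - np.sort_complex(sigma)) < tol * np.linalg.norm(sigma):
                sigma = sigma_new
                break
            sigma = sigma_new
        br = np.linalg.solve(W.conj().T @ V, W.conj().T @ b)
        cr = V.T @ c
        return Ar, br, cr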

The results shown in Figure 4 use Penzl's FOM benchmark example (see, e.g., [40, Section C.3.1]) and apply our implementation of the method from [14]. In the reported test, the initial shift parameters and the reduced order have been chosen such that the progress of the IRKA iteration becomes nicely visible. Larger reduced orders would allow for smaller error norms, while more clever choices of the initial shifts could lead to fewer overall iterations. Both are, however, beyond the scope of this presentation.

7 Closing Remarks

In this contribution, the notions of replicability, reproducibility, and reusability are discussed and classified by requirements and recommendations. The issue of code availability, and the reflection on the artifacts of the associated CBEx it implies, is exemplified, and simple formats for documentation and meta-data provisioning are described.

The best practices proposed in this work improve the scientific validity of CBEx, but they also aim to spark a discussion on RRR in this context. By no means are the suggested techniques to be understood as a strict rulebook with everlasting validity. The authors emphasize that the proposed practices, which are based on practical experience and standards as well as on general considerations of abstract concepts, are subject to change over time. Nonetheless, the strategies demonstrated herein do enhance replicability, reproducibility & reusability and thus, also in the absence of other general solutions or approaches, merit consideration for scientific CBEx in general and numerical CBEx in particular.

Code Availability


The source code of the implementations used to compute the presented results can be obtained from doi:10.5281/zenodo.55297 and is authored by Jörg Fehr and Jens Saak. Please contact Jörg Fehr and Jens Saak for licensing information.

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft, DFG EXC 1003 Cells in Motion – Cluster of Excellence, Münster, the Center for Developing Mathematics in Interaction, DEMAIN, Münster, Germany, and the Deutsche Forschungsgemeinschaft, DFG EXC 310/1 Simulation Technology at the University of Stuttgart.

Conflict of Interest

All authors declare no conflicts of interest in this paper.

References

  • [1] D.H. Bailey, J.M. Borwein, and V. Stodden. Facilitating reproducibility in scientific computing: Principles and practice. In Harald Atmanspacher and Sabine Maasen, editors, Reproducibility: Principles, Problems, Practices, and Prospects, pages 205–232. Wiley, July 2016.
  • [2] W. Bangerth and T. Heister. Quo Vadis, Scientific Software?. SIAM News, 2014.
  • [3] N. Barnes. Publish your computer code: it is good enough. Nature, 467:753, 2010.
  • [4] P. Bourque and R.E. Fairley, editors. Guide to the Software Engineering Body of Knowledge (SWEBOK), Version 3.0. IEEE Computer Society, 2014.
  • [5] J. B. Buckheit and D. L. Donoho. WaveLab and Reproducible Research. In Anestis Antoniadis and Georges Oppenheim, editors, Wavelets and Statistics, volume 103 of Lecture Notes in Statist., pages 55–81. Springer, New York, 1995.
  • [6] S. Chaturantabut and D. C. Sorensen. Nonlinear model reduction via discrete empirical interpolation. SIAM J. Sci. Comput., 32(5):2737–2764, 2010.
  • [7] C. Collberg, T. Proebsting, and A.M. Warren. Repeatability and Benefaction in Computer Systems Research. Technical report, University of Arizona, 2014.
  • [8] S.M. Easterbrook. Open code for open science?. Nature Geoscience, 7:779–781, 2014.
  • [9] J.W. Eaton, D. Bateman, S. Hauberg, and R. Wehbring. GNU Octave version 4.0.0 manual: a high-level interactive language for numerical computations. http://www.gnu.org/software/octave/octave.pdf, 2015.
  • [10] T.M. Errington, E. Iorns, W. Gunn, F.E. Tan, J. Lomax, and B.A. Nosek. An open investigation of the reproducibility of cancer biology research. eLife, 3:e04333, December 2014.
  • [11] J. Fehr and J. Saak. Iterative Rational Krylov Algorithm (IRKA), April 2016.
  • [12] S. Fomel and J.F. Claerbout. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering, 11(1):5–7, 2009.
  • [13] I.P. Gent. The Recomputation Manifesto. arXiv cs.GL, 2013.
  • [14] S. Gugercin, A. C. Antoulas, and C. A. Beattie. Model Reduction for Large-Scale Linear Dynamical Systems. SIAM J. Matrix Anal. Appl., 30(2):609–638, 2008.
  • [15] M.A. Heroux and J.M. Willenbring. Barely sufficient software engineering: 10 practices to improve your CSE software. In ICSE Workshop on Software Engineering for Computational Science and Engineering, pages 15–21, 2009.
  • [16] C. Himpe. emgr - Empirical Gramian framework (Version 3.9). gramian.de, 2016.
  • [17] The Mathworks Inc. Matlab, Product Help, Matlab Release 2014b. Mathworks Inc., Natick MA, USA, 2014.
  • [18] D.C. Ince, L. Hatton, and J. Graham-Cumming. The case for open computer programs. Nature, 482:485–488, 2012.
  • [19] IPOL Journal · Image Processing On Line. ipol.im.
  • [20] ISO. ISO 646 - Information technology – ISO 7-bit coded character set for information interchange. ISO, 1991.
  • [21] ISO. ISO 8601 - Data elements and interchange formats – Information interchange – Representation of dates and times. ISO, 2004.
  • [22] L. K. John, G. Loewenstein, and D. Prelec. Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science, 23(5):524–532, 2012.
  • [23] L.N. Joppa, D. Gavaghan, R. Harper, K. Takeda, and S. Emmott. Optimizing Peer Review of Software Code - Response. Science, 341(6143):237, 2013.
  • [24] L.N. Joppa, G. McInerny, R. Harper, L. Salido, K. Takeda, K. O’Hara, D. Gavaghan, and S. Emmott. Troubling Trends in Scientific Software Use. Science, 340(6134):814–815, 2013.
  • [25] D. Joyner and W. Stein. Open source mathematical software. Notices - American Mathematical Society, 54(10):1279, 2007.
  • [26] D.S. Katz and A.M. Smith. Transitive Credit and JSON-LD. Journal of Open Research Software, 3, 2015.
  • [27] D. Kelly, D. Hook, and R. Sanders. Five Recommended Practices for Computational Scientists Who Write Software. Computing in Science & Engineering, 11(5):48–53, 2009.
  • [28] S. Krishnamurthi and J. Vitek. The Real Software Crisis: Repeatability as a Core Value. Communications of the ACM, 58(3):34–36, 2015.
  • [29] R. J. LeVeque. Top Ten Reasons To Not Share Your Code (and why you should anyway). SIAM News, April 2013.
  • [30] B. Marwick. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory, pages 1–27, 2016.
  • [31] Scientific Data. Editorial and publishing policies. http://www.nature.com/sdata/for-authors/editorial-and-publishing-policies#code-avail, 2015.
  • [32] D. McCafferty. Should code be released?. Communications of the ACM, 53(10):16–17, 2010.
  • [33] Z. Merali. Computational science: …Error. Nature, 467:775–777, 2010.
  • [34] O. Mesnard and L.A. Barba. Reproducible and replicable CFD: it’s harder than you think. Technical report, arXiv (physics.comp-ph), 2016.
  • [35] Code Share. Nature, 514:536, 2014.
  • [36] Ctrl alt share. Scientific Data, 2, 2015.
  • [37] J. Nitsche. Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. Abh. Math. Semin. Univ. Hambg., 36(1):9–15, 1971.
  • [38] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251), 2015.
  • [39] H. K. F. Panzer. Model Order Reduction by Krylov Subspace Methods with Global Error Bounds and Automatic Choice of Parameters. Dissertation, Technische Universität München, München, 2014.
  • [40] T. Penzl. Lyapack Users Guide. Technical Report SFB393/00-33, Sonderforschungsbereich 393 Numerische Simulation auf massiv parallelen Rechnern, TU Chemnitz, 09107 Chemnitz, Germany, 2000. Available from http://www.tu-chemnitz.de/sfb393/sfb00pr.html.
  • [41] K.R. Popper. The Logic of Scientific Discovery. Classics Series. Routledge, 2002.
  • [42] A. Prlić and J.B. Procter. Ten Simple Rules for the Open Development of Scientific Software. PLoS Computational Biology, 8(12), 2012. doi:10.1371/journal.pcbi.1002802.
  • [43] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2014.
  • [44] Y. Saad and M. H. Schultz. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM J. Sci. Statist. Comput., 7(3):856–869, 1986.
  • [45] P. Sliz and A. Morin. Optimizing Peer Review of Software Code. Science, 341(6143):236–237, 2013.
  • [46] A.M. Smith. JSON-LD for software discovery, reuse and credit. http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit, 2014.
  • [47] V. Stodden. The Legal Framework for Reproducible Scientific Research: Licensing and Copyright. Computing in Science & Engineering, 11(1):35–40, 2009.
  • [48] V. Stodden. Enabling Reproducible Research: Open Licensing for Scientific Innovation. International Journal of Communications Law and Policy, pages 1–55, 2009.
  • [49] V. Stodden and S. Miguez. Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Journal of Open Research Software, 2(1), 2014.
  • [50] D. L. Vaux, F. Fidler, and G. Cumming. Replicates and repeats—what is the difference and is it significant?. EMBO reports, 13(4):291–296, 2012.
  • [51] J. Vitek and T. Kalibera. Repeatability, reproducibility, and rigor in systems research. In Proceedings of the 9th ACM international conference on Embedded software, pages 33–38, 2011.
  • [52] G. Wilson, D.A. Aruliah, C.T. Brown, N.P.C. Hong, M. Davis, R.T. Guy, S.H.D. Haddock, K.D. Huff, I.M. Mitchell, M. D. Plumbley, B. Waugh, E.P. White, and P. Wilson. Best practices for scientific computing. PLoS biology, 12(1), 2014.