Long term availability of raw experimental data in experimental fracture mechanics

03/20/2018 ∙ by Patrick Diehl, et al. ∙ Corporation de l'ecole Polytechnique de Montreal

Experimental data availability is a cornerstone for reproducibility in experimental fracture mechanics, which is crucial to the scientific method. This short communication focuses on the accessibility and long term availability of raw experimental data. The corresponding authors of the eleven most cited papers related to experimental fracture mechanics, for every year from 2000 up to 2016, were kindly asked about the availability of the raw experimental data associated with each publication. For the 187 e-mails sent: 22.46% resulted in outdated contact information, 57.75% of the authors received our request and did not reply, and 19.79% replied to our request. The availability of data is generally low, with only 11 available data sets (5.9%). Two main issues explain the lack of raw experimental data. First, the ability to retrieve data is strongly tied to the possibility of contacting the corresponding author. This study suggests that institutional e-mail addresses are an insufficient means for obtaining experimental data sets. Second, the lack of experimental data is also due to the fact that submission and publication do not require authors to make the raw experimental data available. The following solutions are proposed: (1) require unique identifiers, like ORCID or ResearcherID, to detach the author(s) from their institutional e-mail address, (2) provide DOIs, through repositories like Zenodo or Dataverse, to make raw experimental data citable, and (3) grant providing organizations should ensure that experimental data from publicly funded projects is available to the public.

1 Introduction

Reproducibility is the ability to obtain the same research results as another researcher, given that the same analysis is done on the same raw data. Reproducibility is crucial to the scientific method [2, 13]. Reproducibility in experimental mechanics can hardly be achieved without access to the raw data used by fellow researchers in their publications. A lack of scientific reproducibility has been shown for basic and preclinical research [5] and psychological science [7]. In biology [15], a study revealed that raw data sets could be obtained from % of papers containing experimental data and published from to .

Different stakeholders have addressed the limited availability of experimental data. The Organisation for Economic Co-operation and Development (OECD) was commissioned by different governments to develop a set of guidelines to provide cost-effective access to publicly funded research data [12, 11]. Publishers are currently investigating means to strengthen data-access practices [9] or support open data [8].

According to a recent study, more than 70% of researchers, out of more than 1,500 polled, have tried and failed to reproduce another scientist’s experiments [4]. However, the study also shows that physicists and engineers are confident that peer reviewed published data is reproducible.

Most publications in experimental fracture mechanics rely on data gathered from experiments. It has been our experience (and practice) that only quantities of interest are presented and the raw experimental data is usually missing. Moreover, even when experimental data is available, information related to the experimental setup itself is usually sparse (e.g., calibration of the measurement unit, software used, etc.), which prevents the experiment’s replication.

The modelling community is also highly interested in high fidelity and well documented experimental data to validate model predictions [16]. The data published in the literature usually lacks the information (boundary conditions, etc.) needed to ensure that the models represent, at least conceptually, the experiments they aim to reproduce.

This short communication focuses on the accessibility and long term availability of raw experimental data, as well as supporting information, in experimental fracture mechanics. We have contacted the authors of the eleven most cited papers related to experimental fracture mechanics for every year from 2000 up to 2016. We kindly asked these authors about the availability of raw experimental data associated with each publication. We studied whether the e-mail addresses were still up to date and considered the reply behavior for working e-mail addresses. Finally, we examined the availability of raw data among the positive responses from the authors.

Section 2 deals with the methodology for this study and the data collection. Section 3 presents the main results. Section 4 discusses the results and Section 5 concludes the paper.

2 Methodology

2.1 Data collection

The Web of Science database (https://webofknowledge.com) was queried in September with the following fields:
TOPIC: (CRACK) AND TOPIC: (DAMAGE) AND TOPIC: (EXPERIMENTAL) AND YEAR PUBLISHED: (year), where year varied from 2000 to 2016.

For each publication year, the eleven most cited papers containing experimental data generated by the authors, whether included as a reference to an online resource or as an appendix, were selected.
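The selection procedure can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the study: the `Record` type, the `has_own_experimental_data` flag, and the function names are hypothetical, and the 2000 to 2016 range is taken from the abstract. The query string only mirrors the fields quoted above and is not tied to any particular Web of Science interface.

```python
from dataclasses import dataclass

FIRST_YEAR, LAST_YEAR = 2000, 2016  # range stated in the abstract
PAPERS_PER_YEAR = 11


@dataclass
class Record:
    """Hypothetical stand-in for one search result."""
    title: str
    year: int
    citations: int
    has_own_experimental_data: bool  # assumed to be judged by a human reader


def build_query(year: int) -> str:
    """Return the topic query for one publication year."""
    return (
        "TOPIC: (CRACK) AND TOPIC: (DAMAGE) AND TOPIC: (EXPERIMENTAL) "
        f"AND YEAR PUBLISHED: ({year})"
    )


def select_top_cited(records, year, n=PAPERS_PER_YEAR):
    """Keep the n most cited papers of a year that report the authors' own experiments."""
    eligible = [
        r for r in records
        if r.year == year and r.has_own_experimental_data
    ]
    return sorted(eligible, key=lambda r: r.citations, reverse=True)[:n]


# One query per publication year.
queries = [build_query(year) for year in range(FIRST_YEAR, LAST_YEAR + 1)]
total_selected = len(queries) * PAPERS_PER_YEAR  # 17 years times 11 papers
```

With 17 publication years and eleven papers per year, the selection amounts to the 187 papers, and hence the 187 e-mails, reported in the abstract.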

We investigated: (1) the document’s meta-data provided by Web of Science (WoS), (2) the PDF document itself and finally (3) the publisher’s website to identify the corresponding author’s e-mail contact. Less than % of the papers published between and contained an e-mail address in the meta-data from WoS while % of the papers published after contained that information. The first corresponding author was selected for the communication attempt. The full list of references investigated can be found on GitHub in BibTeX format (https://github.com/OpenDataExpMechanics/Survey).

The generic e-mail detailed in Appendix .2 was sent to the selected authors on October 16th 2017. We asked the authors if they were willing to share their experimental data and, if so, how long, in minutes, it would take to gather.
The predefined answers were: (a) the data is not available, (b) the data is confidential, (c) one of the co-authors should be contacted to obtain the data. Furthermore, an open answer was available to the authors who were unable, or unwilling, to share this information.

A reminder was sent on November the 6th (three weeks after the first attempt) to all authors from whom we did not receive a reply and whose e-mail did not bounce. The communication contained the detailed query shown in Appendix .3.

Eight e-mails were sent a few weeks after the first iteration, to correct an error in the automated data acquisition. All responses received before December the 15th were considered in this survey.
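The survey timeline can be checked with simple date arithmetic. The sketch below uses the dates given in the text; the year 2017 for the cutoff is inferred from the other dates, and the helper `counts_for_survey` is hypothetical. It reads "before December the 15th" literally, so the cutoff day itself is excluded, which is an assumption.

```python
from datetime import date

first_email = date(2017, 10, 16)  # initial request
reminder = date(2017, 11, 6)      # follow-up to authors who had not replied
cutoff = date(2017, 12, 15)       # year inferred; replies before this day count

days_to_reminder = (reminder - first_email).days  # three weeks
days_to_cutoff = (cutoff - first_email).days


def counts_for_survey(reply_date: date) -> bool:
    """Return True if a reply arrived inside the survey window (cutoff day excluded)."""
    return first_email <= reply_date < cutoff
```

Under these assumptions, authors had a little over eight weeks after the first e-mail to respond.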

All e-mails were sent using the institutional e-mail address of one of the authors, as in other studies. The possibility for a biased reply-behavior when sending the e-mail as a student or as a professor was not addressed.

3 Results

Table 1: Analysis of the data obtained from the e-mails sent to the first author of the top-eleven cited papers from 2000 to 2016.

Out of the 187 papers selected, only one publication provided the raw experimental data as supplementary material on the journal’s website.

Table 1 lists the data analysis for the e-mails sent. The first column presents the number of e-mails that bounced. The second column shows the number of replies to either the first or second e-mail. Note that there is no distinction between a positive or negative reply with respect to sharing the data. Only authors that did not respond to the first e-mail responded after receiving the second. The third column presents the number of non-replies, meaning that we received neither an error from the mail server nor an answer in the weeks after sending the first e-mail.

Table 1 also lists the time, in minutes, required for the authors to retrieve the data (for those willing to share it). The following columns list the reasons the authors invoked for not providing the requested data. The last column lists the amount of available data sets.

Figure 5 shows the collected data with respect to the author responses as scatter plots, together with the linear regression between the year and the quantity of interest. Figure 5(a) shows the bounces for invalid or nonexistent e-mail addresses per year. Figure 5(b) presents the number of replies received to the first or second e-mail. Figure 5(c) presents the number of authors with working addresses who did not reply. Figure 5(d) shows the number of times the requested data was available per year.

Figure 5: Collected data with respect to the author responses as scatter plots, panels (a) to (d). A linear regression (black line) was fitted to the collected data.
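A trend line of this kind can be obtained with an ordinary least-squares fit of the yearly counts against the publication year. The sketch below assumes ordinary least squares, which the text does not state explicitly, and the function name `fit_line` is hypothetical. The counts are placeholder values invented for illustration; they are not the survey data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points and equally long inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = list(range(2000, 2017))
# Placeholder counts, NOT the survey data: a made-up declining series.
placeholder_counts = [max(0, 6 - (year - 2000) // 3) for year in years]

slope, intercept = fit_line(years, placeholder_counts)
trend = "declining" if slope < 0 else "flat or increasing"
```

The sign of the slope is what the discussion relies on: a negative slope corresponds to a quantity that declines over the observation period, and a slope near zero indicates no trend.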

4 Discussion

Outdated contact information

Bouncing e-mail addresses hindered contact with the original authors and limited the acquisition of original research data. The number of bouncing e-mail addresses declines during the observation period, as seen in Figure 5(a). From to , the average number of bouncing e-mails is , while this number drops to in the period.

No reaction to our requests

Figure 5(c) shows that the number of authors who did not reply to our e-mails increased over the years, while Figure 5(a) shows that the number of invalid e-mail addresses decreased over the years. This observation suggests that authors who published more recently are less responsive than authors who published in previous years.

Availability of raw experimental data

Figure 5(d) indicates that the availability is independent of the year: the linear regression shows no trend. The availability of data is generally low, with only 11 available data sets (5.9%).
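The shares quoted in the abstract can be cross-checked with a short calculation. The 187 e-mails and the 11 available data sets are stated in the abstract. The counts of 42 bounced and 37 answered e-mails are back-computed here from the 22.46 and 19.79 figures in the abstract, and the remainder is assumed to be unanswered e-mails; these counts are an assumption and are not quoted from Table 1.

```python
TOTAL_EMAILS = 187        # stated in the abstract
AVAILABLE_DATA_SETS = 11  # stated in the abstract

# Back-computed counts (assumption, see the paragraph above).
bounced = 42
replied = 37
no_reply = TOTAL_EMAILS - bounced - replied


def share(count, total=TOTAL_EMAILS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {
    "bounced": share(bounced),
    "replied": share(replied),
    "no reply": share(no_reply),
    "data available": share(AVAILABLE_DATA_SETS),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

The three response categories are mutually exclusive, so their counts add up to the number of e-mails sent, while the available data sets are a subset of the replies.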

Reasons invoked by the authors for not providing the data

Authors were able to provide the reasons for not sharing the experimental data related to their publication. For example:

  1. Retired author(s), or author(s) who left their institution, did not keep data backups;

  2. Author(s) have data storage plans and only keep large data sets for 5 to 10 years;

  3. Author(s) explained that sharing data would require work to render it usable by other researchers and they are not being paid to do so;

  4. One author explained that he believed that it is better for the experimentalist to do his own experiments.

5 Conclusion and Outlook

This work suggests that the availability of data sets in experimental fracture mechanics is very limited. Furthermore, the ability to retrieve the data is strongly attached to the possibility to contact the corresponding author. Retrieving the data becomes unlikely when the contact is lost with the corresponding author. Moreover, it seems that recent authors are less responsive to data sharing requests than authors who published in previous years.

These facts limit the ability of researchers to reproduce, build on, and check other scholars’ work. This study suggests that institutional e-mail addresses are an insufficient means for obtaining experimental data sets. The lack of experimental data could also result from the fact that granting agencies and publishers do not require authors to make their raw data publicly available [14]. A possible solution to this issue could be the requirement to present a data management plan at the beginning of every new project. This can be required by research institutions or organizations that provide grants to researchers. It is also important to note that providing the data is not sufficient: the data also has to be usable by other researchers. This means that the data must be labelled, explained, and put into context.

We propose the following steps to improve experimental data availability:

  • Requirement for ORCID [10], ResearcherID [6], or other unique identifiers for publications that detach author(s) from their institutional e-mail addresses;

  • Universities and other institutions listed as affiliations in scientific literature should provide forwarding e-mail addresses in case an author leaves their institution;

  • Provide DOIs, through repositories like Zenodo [3] or Dataverse [1], to make raw experimental data citable and provide more value for the academic curriculum. By making data sets citable, experimental researchers might take more time to prepare and store their data sets;

  • Grant providing organizations should ensure the availability of experimental data from publicly funded projects, e.g. by asking for a data management plan.
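To illustrate how a persistent author identifier can be handled on the receiving side, the sketch below validates the check character of an ORCID iD. ORCID documents this check as ISO 7064 MOD 11-2; the function names are hypothetical, and the example is an illustration of the identifier format rather than part of the study.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if any(ch not in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# Sample iD used in ORCID's own documentation.
sample_is_valid = is_valid_orcid("0000-0002-1825-0097")
```

A check like this only catches typing errors in the identifier; whether the iD resolves to the intended author still has to be confirmed through the ORCID registry.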

As future work, the authors of this publication would like to continue exploring the availability of experimental data through time by investigating the usability of data sets and determining guidelines to properly review an experimental data set in the field of mechanics, which could be done before uploading a data set to one of the larger existing repositories.

References

  • [1] dataverse: A data repository framework to share and publish research data, Dec. 2017. original-date: 2013-11-01T18:47:39Z.
  • [2] Scientific method, Nov. 2017. Page Version ID: 812148396.
  • [3] zenodo: Research. Shared, Dec. 2017. original-date: 2013-02-11T09:34:27Z.
  • [4] M. Baker. 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604):452, 2016.
  • [5] C. G. Begley and J. P. Ioannidis. Reproducibility in science. Circulation research, 116(1):116–126, 2015.
  • [6] Clarivate Analytics. researcherID.com, 2008.
  • [7] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, 2015.
  • [8] L. Finnegan. Publish or Perish – How can publishers support open data?, June 2015.
  • [9] NATURE — Editorial. Data-access practices strengthened, November 2014.
  • [10] ORCID, INC. ORCID — Connecting Research and Researchers, 2012.
  • [11] Organisation for Economic Co-operation and Development. Science, Technology and Innovation for the 21st Century. Meeting of the OECD Committee for Scientific and Technological Policy at Ministerial Level, 29-30 January 2004 - Final Communique, January 2004.
  • [12] Organisation for Economic Co-operation and Development. OECD Principles and Guidelines for Access to Research Data from Public Funding. Technical report, OECD PUBLICATIONS, 2007.
  • [13] R. D. Peng. Reproducible research in computational science. Science, 334(6060):1226–1227, 2011.
  • [14] H. Spencer. Thoughts on the sharing of data and research materials and the role of journal policies, Jan. 2010.
  • [15] T. H. Vines, A. Y. Albert, R. L. Andrew, F. Débarre, D. G. Bock, M. T. Franklin, K. J. Gilbert, J.-S. Moore, S. Renaut, and D. J. Rennison. The availability of research data declines rapidly with article age. Current biology, 24(1):94–97, 2014.
  • [16] Z. Zhuang and M. Maitireyimu. Recent research progress in computational solid mechanics. Chinese Science Bulletin, 57(36):4683–4688, Dec. 2012.

Appendix

.1 Data and Analysis

The data and code for this study are available on GitHub under the following DOI: 10.5281/zenodo.1203766.

.2 Initial email (sent 10/16/2017)

Dear Prof. author

My name is XXX and I am part of the Laboratory of Multi-Scale Mechanics at Polytechnique Montréal. We are currently working on a study aimed at determining how experimental data associated with publications changes through time.

We found your article title among the 30 most cited articles on Scopus for the “experimental crack mechanics” query in year. We would be delighted if your publication could be part of our study. We are interested in the long term availability of raw experimental data and work supporting data, which was partly used in publications like yours. The complete study is anonymous and your response will not be used with your name or the reference in the study. For our study it would help if you could answer the following questions.

  • Are you willing to share the experimental data with a peer to reproduce or to compare his simulations with the experiment?

    • Could you also let us know how long (in minutes) it would take you to find the data?

  • If your answer to the previous question is no, we would very much like to know the reason(s) behind that:

    • The data is not available or lost

    • The data is confidential

    • Can you name a contact of the co-authors who can we ask for the data?

    • Other reasons (If you like please explain them)

If you have any further question on the design of this study or, are interested in its results, please feel free to contact us.

Many thanks for your time and help.

.3 Follow-up email (sent 11/06/2017 if no response to our initial email was received)

Dear Prof. author

I am following up on an e-mail we sent three weeks ago: we are a group of researchers from Polytechnique Montreal and the University of Stuttgart. Our researches are related to experimental mechanics or simulation and modeling in mechanics. We are currently trying to examine how the availability of experimental data in publications changes over time. Here, we are interested if the data is still available for reproducibility or the usage for benchmarks in simulation.

We handle your answer anonymous and your response will not be used with your name or the reference in the study. For our study it would really help if you could answer the questionnaire for your article title published in year, it will take less than two minutes.

  • Are you willing to share the experimental data with a peer to reproduce or to compare his simulations with the experiment?

    • Could you also let us know how long (in minutes) it would take you to find the data?

  • If your answer to the previous question is no, we would very much like to know the reason(s) behind that:

    • The data is not available or lost

    • The data is confidential

    • Can you name a contact of the co-authors who can we ask for the data?

    • Other reasons (If you like please explain them)

If you have any further question on the design of this study or, are interested in its results, please feel free to contact us or visit our project’s blog [0].

Many thanks for your time and help.

[0] https://opendataexpmechanics.github.io/